Re: Linux 5.18.18

From: Greg Kroah-Hartman
Date: Wed Aug 17 2022 - 09:26:24 EST


diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index a74dfe52dd76..ca7f12aeb2da 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -42,5 +42,5 @@ KernelVersion: 5.10
Contact: SeongJae Park <sj@xxxxxxxxxx>
Description:
Whether to enable the persistent grants feature or not. Note
- that this option only takes effect on newly created backends.
+ that this option only takes effect on newly connected backends.
The default is Y (enable).
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 61fd173fabfe..98b41c3b6002 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -15,5 +15,5 @@ KernelVersion: 5.10
Contact: SeongJae Park <sj@xxxxxxxxxx>
Description:
Whether to enable the persistent grants feature or not. Note
- that this option only takes effect on newly created frontends.
+ that this option only takes effect on newly connected frontends.
The default is Y (enable).
diff --git a/Documentation/admin-guide/device-mapper/writecache.rst b/Documentation/admin-guide/device-mapper/writecache.rst
index 10429779a91a..724e028d1858 100644
--- a/Documentation/admin-guide/device-mapper/writecache.rst
+++ b/Documentation/admin-guide/device-mapper/writecache.rst
@@ -78,16 +78,16 @@ Status:
2. the number of blocks
3. the number of free blocks
4. the number of blocks under writeback
-5. the number of read requests
-6. the number of read requests that hit the cache
-7. the number of write requests
-8. the number of write requests that hit uncommitted block
-9. the number of write requests that hit committed block
-10. the number of write requests that bypass the cache
-11. the number of write requests that are allocated in the cache
+5. the number of read blocks
+6. the number of read blocks that hit the cache
+7. the number of write blocks
+8. the number of write blocks that hit uncommitted block
+9. the number of write blocks that hit committed block
+10. the number of write blocks that bypass the cache
+11. the number of write blocks that are allocated in the cache
12. the number of write requests that are blocked on the freelist
13. the number of flush requests
-14. the number of discard requests
+14. the number of discarded blocks

Messages:
flush
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 334544c893dd..c9a272acfb5b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5130,20 +5130,33 @@
Speculative Code Execution with Return Instructions)
vulnerability.

+ AMD-based UNRET and IBPB mitigations alone do not stop
+ sibling threads from influencing the predictions of other
+ sibling threads. For that reason, STIBP is used on
+ processors that support it, and SMT is mitigated on
+ processors that don't.
+
off - no mitigation
auto - automatically select a mitigation
auto,nosmt - automatically select a mitigation,
disabling SMT if necessary for
the full mitigation (only on Zen1
and older without STIBP).
- ibpb - mitigate short speculation windows on
- basic block boundaries too. Safe, highest
- perf impact.
- unret - force enable untrained return thunks,
- only effective on AMD f15h-f17h
- based systems.
- unret,nosmt - like unret, will disable SMT when STIBP
- is not available.
+ ibpb - On AMD, mitigate short speculation
+ windows on basic block boundaries too.
+ Safe, highest perf impact. It also
+ enables STIBP if present. Not suitable
+ on Intel.
+ ibpb,nosmt - Like "ibpb" above but will disable SMT
+ when STIBP is not available. This is
+ the alternative for systems which do not
+ have STIBP.
+ unret - Force enable untrained return thunks,
+ only effective on AMD f15h-f17h based
+ systems.
+ unret,nosmt - Like unret, but will disable SMT when STIBP
+ is not available. This is the alternative for
+ systems which do not have STIBP.

Selecting 'auto' will choose a mitigation method at run
time according to the CPU.
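
These options belong to the retbleed= kernel command line parameter (the
"Speculative Code Execution with Return Instructions" entry shown in the
hunk context above). For example, an affected AMD system without STIBP
could boot with

	retbleed=unret,nosmt

appended to the kernel command line (for instance via the bootloader
configuration), while retbleed=auto leaves the choice of mitigation to
the kernel, as described above.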
diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index aec2cd2aaea7..19754beb5a4e 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -612,8 +612,8 @@ the ``menu`` governor to be used on the systems that use the ``ladder`` governor
by default this way, for example.

The other kernel command line parameters controlling CPU idle time management
-described below are only relevant for the *x86* architecture and some of
-them affect Intel processors only.
+described below are only relevant for the *x86* architecture and references
+to ``intel_idle`` affect Intel processors only.

The *x86* architecture support code recognizes three kernel command line
options related to CPU idle time management: ``idle=poll``, ``idle=halt``,
@@ -635,10 +635,13 @@ idle, so it very well may hurt single-thread computations performance as well as
energy-efficiency. Thus using it for performance reasons may not be a good idea
at all.]

-The ``idle=nomwait`` option disables the ``intel_idle`` driver and causes
-``acpi_idle`` to be used (as long as all of the information needed by it is
-there in the system's ACPI tables), but it is not allowed to use the
-``MWAIT`` instruction of the CPUs to ask the hardware to enter idle states.
+The ``idle=nomwait`` option prevents the use of the ``MWAIT`` instruction of
+the CPU to enter idle states. When this option is used, the ``acpi_idle``
+driver will use the ``HLT`` instruction instead of ``MWAIT``. On systems
+running Intel processors, this option disables the ``intel_idle`` driver
+and forces the use of the ``acpi_idle`` driver instead. Note that in either
+case, the ``acpi_idle`` driver will function only if all the information
+needed by it is in the system's ACPI tables.
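
For example, booting an Intel machine with

	idle=nomwait

leaves ``acpi_idle`` in charge of idle states (assuming the ACPI tables
provide the required information), which can be confirmed after boot by
reading the /sys/devices/system/cpu/cpuidle/current_driver attribute
exposed by the ``CPUIdle`` core.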

In addition to the architecture-level kernel command line options affecting CPU
idle time management, there are parameters affecting individual ``CPUIdle``
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index d27db84d585e..0b4235b1f8c4 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -82,10 +82,14 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A57 | #1319537 | ARM64_ERRATUM_1319367 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A57 | #1742098 | ARM64_ERRATUM_1742098 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A72 | #853709 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A72 | #1319367 | ARM64_ERRATUM_1319367 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A72 | #1655431 | ARM64_ERRATUM_1742098 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 |
diff --git a/Documentation/devicetree/bindings/riscv/sifive-l2-cache.yaml b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.yaml
index e2d330bd4608..69cdab18d629 100644
--- a/Documentation/devicetree/bindings/riscv/sifive-l2-cache.yaml
+++ b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.yaml
@@ -46,7 +46,7 @@ properties:
const: 2

cache-sets:
- const: 1024
+ enum: [1024, 2048]

cache-size:
const: 2097152
@@ -84,6 +84,8 @@ then:
description: |
Must contain entries for DirError, DataError and DataFail signals.
maxItems: 3
+ cache-sets:
+ const: 1024

else:
properties:
@@ -91,6 +93,8 @@ else:
description: |
Must contain entries for DirError, DataError, DataFail, DirFail signals.
minItems: 4
+ cache-sets:
+ const: 2048

additionalProperties: false

diff --git a/Documentation/tty/device_drivers/oxsemi-tornado.rst b/Documentation/tty/device_drivers/oxsemi-tornado.rst
new file mode 100644
index 000000000000..0180d8bb0881
--- /dev/null
+++ b/Documentation/tty/device_drivers/oxsemi-tornado.rst
@@ -0,0 +1,129 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================================================
+Notes on Oxford Semiconductor PCIe (Tornado) 950 serial port devices
+====================================================================
+
+Oxford Semiconductor PCIe (Tornado) 950 serial port devices are driven
+by a fixed 62.5MHz clock input derived from the 100MHz PCI Express clock.
+
+The baud rate produced by the baud generator is obtained from this input
+frequency by dividing it by the clock prescaler, which can be set to any
+value from 1 to 63.875 in increments of 0.125, and then the usual 16-bit
+divisor is used as with the original 8250, to divide the frequency by a
+value from 1 to 65535. Finally a programmable oversampling rate is used
+that can take any value from 4 to 16 to divide the frequency further and
+determine the actual baud rate used. Baud rates from 15625000bps down
+to 0.933bps can be obtained this way.
+
+By default the oversampling rate is set to 16 and the clock prescaler is
+set to 33.875, meaning that the frequency to be used as the reference
+for the usual 16-bit divisor is 115313.653, which is close enough to the
+frequency of 115200 used by the original 8250 for the same values to be
+used for the divisor to obtain the requested baud rates by software that
+is unaware of the extra clock controls available.
+
+The oversampling rate is programmed with the TCR register and the clock
+prescaler is programmed with the CPR/CPR2 register pair[1][2][3][4].
+To switch away from the default value of 33.875 for the prescaler,
+the enhanced mode has to be explicitly enabled though, by setting bit 4
+of the EFR. In that mode setting bit 7 in the MCR enables the prescaler
+or otherwise it is bypassed as if the value of 1 was used. Additionally
+writing any value to CPR clears CPR2 for compatibility with old software
+written for older conventional PCI Oxford Semiconductor devices that do
+not have the extra prescaler's 9th bit in CPR2, so the CPR/CPR2 register
+pair has to be programmed in the right order.
+
+By using these parameters rates from 15625000bps down to 1bps can be
+obtained, with either exact or highly-accurate actual bit rates for
+standard and many non-standard rates.
+
+Here are the figures for the standard and some non-standard baud rates
+(including those quoted in Oxford Semiconductor documentation), giving
+the requested rate (r), the actual rate yielded (a) and its deviation
+from the requested rate (d), and the values of the oversampling rate
+(tcr), the clock prescaler (cpr) and the divisor (div) produced by the
+new `get_divisor' handler:
+
+r: 15625000, a: 15625000.00, d: 0.0000%, tcr: 4, cpr: 1.000, div: 1
+r: 12500000, a: 12500000.00, d: 0.0000%, tcr: 5, cpr: 1.000, div: 1
+r: 10416666, a: 10416666.67, d: 0.0000%, tcr: 6, cpr: 1.000, div: 1
+r: 8928571, a: 8928571.43, d: 0.0000%, tcr: 7, cpr: 1.000, div: 1
+r: 7812500, a: 7812500.00, d: 0.0000%, tcr: 8, cpr: 1.000, div: 1
+r: 4000000, a: 4000000.00, d: 0.0000%, tcr: 5, cpr: 3.125, div: 1
+r: 3686400, a: 3676470.59, d: -0.2694%, tcr: 8, cpr: 2.125, div: 1
+r: 3500000, a: 3496503.50, d: -0.0999%, tcr: 13, cpr: 1.375, div: 1
+r: 3000000, a: 2976190.48, d: -0.7937%, tcr: 14, cpr: 1.500, div: 1
+r: 2500000, a: 2500000.00, d: 0.0000%, tcr: 10, cpr: 2.500, div: 1
+r: 2000000, a: 2000000.00, d: 0.0000%, tcr: 10, cpr: 3.125, div: 1
+r: 1843200, a: 1838235.29, d: -0.2694%, tcr: 16, cpr: 2.125, div: 1
+r: 1500000, a: 1492537.31, d: -0.4975%, tcr: 5, cpr: 8.375, div: 1
+r: 1152000, a: 1152073.73, d: 0.0064%, tcr: 14, cpr: 3.875, div: 1
+r: 921600, a: 919117.65, d: -0.2694%, tcr: 16, cpr: 2.125, div: 2
+r: 576000, a: 576036.87, d: 0.0064%, tcr: 14, cpr: 3.875, div: 2
+r: 460800, a: 460829.49, d: 0.0064%, tcr: 7, cpr: 3.875, div: 5
+r: 230400, a: 230414.75, d: 0.0064%, tcr: 14, cpr: 3.875, div: 5
+r: 115200, a: 115207.37, d: 0.0064%, tcr: 14, cpr: 1.250, div: 31
+r: 57600, a: 57603.69, d: 0.0064%, tcr: 8, cpr: 3.875, div: 35
+r: 38400, a: 38402.46, d: 0.0064%, tcr: 14, cpr: 3.875, div: 30
+r: 19200, a: 19201.23, d: 0.0064%, tcr: 8, cpr: 3.875, div: 105
+r: 9600, a: 9600.06, d: 0.0006%, tcr: 9, cpr: 1.125, div: 643
+r: 4800, a: 4799.98, d: -0.0004%, tcr: 7, cpr: 2.875, div: 647
+r: 2400, a: 2400.02, d: 0.0008%, tcr: 9, cpr: 2.250, div: 1286
+r: 1200, a: 1200.00, d: 0.0000%, tcr: 14, cpr: 2.875, div: 1294
+r: 300, a: 300.00, d: 0.0000%, tcr: 11, cpr: 2.625, div: 7215
+r: 200, a: 200.00, d: 0.0000%, tcr: 16, cpr: 1.250, div: 15625
+r: 150, a: 150.00, d: 0.0000%, tcr: 13, cpr: 2.250, div: 14245
+r: 134, a: 134.00, d: 0.0000%, tcr: 11, cpr: 2.625, div: 16153
+r: 110, a: 110.00, d: 0.0000%, tcr: 12, cpr: 1.000, div: 47348
+r: 75, a: 75.00, d: 0.0000%, tcr: 4, cpr: 5.875, div: 35461
+r: 50, a: 50.00, d: 0.0000%, tcr: 16, cpr: 1.250, div: 62500
+r: 25, a: 25.00, d: 0.0000%, tcr: 16, cpr: 2.500, div: 62500
+r: 4, a: 4.00, d: 0.0000%, tcr: 16, cpr: 20.000, div: 48828
+r: 2, a: 2.00, d: 0.0000%, tcr: 16, cpr: 40.000, div: 48828
+r: 1, a: 1.00, d: 0.0000%, tcr: 16, cpr: 63.875, div: 61154
+
+With the baud base set to 15625000 and the unsigned 16-bit UART_DIV_MAX
+limitation imposed by `serial8250_get_baud_rate', standard baud rates
+below 300bps become unavailable in the regular way, e.g. the rate of
+200bps requires the baud base to be divided by 78125 and that is beyond
+the unsigned 16-bit range. The historic spd_cust feature can still be
+used by encoding the values for the prescaler, the oversampling rate
+and the clock divisor (DLM/DLL) as follows to obtain such rates if so
+required:
+
+ 31 29 28             20 19   16 15                            0
++-----+-----------------+-------+-------------------------------+
+|0 0 0|    CPR2:CPR     |  TCR  |            DLM:DLL            |
++-----+-----------------+-------+-------------------------------+
+
+Use a value encoded this way for the `custom_divisor' field along with the
+ASYNC_SPD_CUST flag set in the `flags' field in `struct serial_struct'
+passed with the TIOCSSERIAL ioctl(2), such as with the setserial(8)
+utility and its `divisor' and `spd_cust' parameters, and then select
+the baud rate of 38400bps. Note that the value of 0 in TCR sets the
+oversampling rate to 16 and prescaler values below 1 in CPR2/CPR are
+clamped by the driver to 1.
+
+For example the value of 0x1f4004e2 will set CPR2/CPR, TCR and DLM/DLL
+respectively to 0x1f4, 0x0 and 0x04e2, choosing the prescaler value,
+the oversampling rate and the clock divisor of 62.500, 16 and 1250
+respectively. These parameters will set the baud rate for the serial
+port to 62500000 / 62.500 / 1250 / 16 = 50bps.
+
+References:
+
+[1] "OXPCIe200 PCI Express Multi-Port Bridge", Oxford Semiconductor,
+ Inc., DS-0045, 10 Nov 2008, Section "950 Mode", pp. 64-65
+
+[2] "OXPCIe952 PCI Express Bridge to Dual Serial & Parallel Port",
+ Oxford Semiconductor, Inc., DS-0046, Mar 06 08, Section "950 Mode",
+ p. 20
+
+[3] "OXPCIe954 PCI Express Bridge to Quad Serial Port", Oxford
+ Semiconductor, Inc., DS-0047, Feb 08, Section "950 Mode", p. 20
+
+[4] "OXPCIe958 PCI Express Bridge to Octal Serial Port", Oxford
+ Semiconductor, Inc., DS-0048, Feb 08, Section "950 Mode", p. 20
+
+Maciej W. Rozycki <macro@xxxxxxxxxxx>
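
To make the spd_cust encoding above concrete, here is a small
stand-alone user-space sketch (illustrative only, not kernel code; the
helper name and structure are invented for this note) that decodes such
a `custom_divisor' word and computes the resulting baud rate from the
fixed 62.5MHz input clock:

#include <stdio.h>

#define TORNADO_CLK 62500000.0	/* fixed 62.5MHz baud generator input */

/* Decode a Tornado spd_cust word: CPR2:CPR in bits 28-20, TCR in
 * bits 19-16 and DLM:DLL in bits 15-0, as per the layout above. */
static double tornado_rate(unsigned int custom_divisor)
{
	unsigned int cpr = (custom_divisor >> 20) & 0x1ff;	/* prescaler * 8 */
	unsigned int tcr = (custom_divisor >> 16) & 0xf;	/* oversampling */
	unsigned int div = custom_divisor & 0xffff;		/* DLM:DLL */

	if (tcr == 0)
		tcr = 16;	/* a TCR of 0 selects 16x oversampling */
	if (cpr < 8)
		cpr = 8;	/* prescaler values below 1 are clamped to 1 */
	if (div == 0)
		div = 1;	/* guard against an all-zero divisor */

	/* CPR2:CPR holds the prescaler in increments of 0.125 */
	return TORNADO_CLK / (cpr / 8.0) / div / tcr;
}

int main(void)
{
	/* 0x1f4004e2: prescaler 62.500, oversampling 16, divisor 1250 */
	printf("%.2fbps\n", tornado_rate(0x1f4004e2));
	return 0;
}

For the 0x1f4004e2 example this prints 50.00bps, matching the hand
calculation of 62500000 / 62.500 / 1250 / 16 above.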
diff --git a/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst b/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
index 4cd7c541fc30..bcdc14144105 100644
--- a/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
+++ b/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
@@ -2975,7 +2975,7 @@ enum v4l2_mpeg_video_hevc_size_of_length_field -
* - __u8
- ``colour_plane_id``
-
- * - __u16
+ * - __s32
- ``slice_pic_order_cnt``
-
* - __u8
diff --git a/MAINTAINERS b/MAINTAINERS
index 2b70e2d21405..c7c7a96b62a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7599,9 +7599,6 @@ F: include/linux/fs.h
F: include/linux/fs_types.h
F: include/uapi/linux/fs.h
F: include/uapi/linux/openat2.h
-X: fs/io-wq.c
-X: fs/io-wq.h
-X: fs/io_uring.c

FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER
M: Riku Voipio <riku.voipio@xxxxxx>
@@ -10277,9 +10274,7 @@ L: io-uring@xxxxxxxxxxxxxxx
S: Maintained
T: git git://git.kernel.dk/linux-block
T: git git://git.kernel.dk/liburing
-F: fs/io-wq.c
-F: fs/io-wq.h
-F: fs/io_uring.c
+F: io_uring/
F: include/linux/io_uring.h
F: include/uapi/linux/io_uring.h
F: tools/io_uring/
diff --git a/Makefile b/Makefile
index ef8c18e5c161..23162e2bdf14 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
PATCHLEVEL = 18
-SUBLEVEL = 17
+SUBLEVEL = 18
EXTRAVERSION =
NAME = Superb Owl

@@ -1031,6 +1031,11 @@ KBUILD_CFLAGS += $(KCFLAGS)
KBUILD_LDFLAGS_MODULE += --build-id=sha1
LDFLAGS_vmlinux += --build-id=sha1

+KBUILD_LDFLAGS += -z noexecstack
+ifeq ($(CONFIG_LD_IS_BFD),y)
+KBUILD_LDFLAGS += $(call ld-option,--no-warn-rwx-segments)
+endif
+
ifeq ($(CONFIG_STRIP_ASM_SYMS),y)
LDFLAGS_vmlinux += $(call ld-option, -X,)
endif
@@ -1095,6 +1100,7 @@ export MODULES_NSDEPS := $(extmod_prefix)modules.nsdeps
ifeq ($(KBUILD_EXTMOD),)
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/
core-$(CONFIG_BLOCK) += block/
+core-$(CONFIG_IO_URING) += io_uring/

vmlinux-dirs := $(patsubst %/,%,$(filter %/, \
$(core-y) $(core-m) $(drivers-y) $(drivers-m) \
diff --git a/arch/Kconfig b/arch/Kconfig
index 31c4fdc4a4ba..ab45e0f6c21b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -214,6 +214,9 @@ config HAVE_FUNCTION_DESCRIPTORS
config TRACE_IRQFLAGS_SUPPORT
bool

+config TRACE_IRQFLAGS_NMI_SUPPORT
+ bool
+
#
# An arch should select this if it provides all these things:
#
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 7c16f8a2b738..13d788d1e6dc 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -133,6 +133,7 @@ dtb-$(CONFIG_ARCH_BCM_5301X) += \
bcm47094-luxul-xwr-3150-v1.dtb \
bcm47094-netgear-r8500.dtb \
bcm47094-phicomm-k3.dtb \
+ bcm53015-meraki-mr26.dtb \
bcm53016-meraki-mr32.dtb \
bcm94708.dtb \
bcm94709.dtb \
diff --git a/arch/arm/boot/dts/aspeed-ast2500-evb.dts b/arch/arm/boot/dts/aspeed-ast2500-evb.dts
index 1d24b394ea4c..a497dd135491 100644
--- a/arch/arm/boot/dts/aspeed-ast2500-evb.dts
+++ b/arch/arm/boot/dts/aspeed-ast2500-evb.dts
@@ -5,7 +5,7 @@

/ {
model = "AST2500 EVB";
- compatible = "aspeed,ast2500";
+ compatible = "aspeed,ast2500-evb", "aspeed,ast2500";

aliases {
serial4 = &uart5;
diff --git a/arch/arm/boot/dts/aspeed-ast2600-evb-a1.dts b/arch/arm/boot/dts/aspeed-ast2600-evb-a1.dts
index dd7148060c4a..d0a5c2ff0fec 100644
--- a/arch/arm/boot/dts/aspeed-ast2600-evb-a1.dts
+++ b/arch/arm/boot/dts/aspeed-ast2600-evb-a1.dts
@@ -5,6 +5,7 @@

/ {
model = "AST2600 A1 EVB";
+ compatible = "aspeed,ast2600-evb-a1", "aspeed,ast2600";

/delete-node/regulator-vcc-sdhci0;
/delete-node/regulator-vcc-sdhci1;
diff --git a/arch/arm/boot/dts/aspeed-ast2600-evb.dts b/arch/arm/boot/dts/aspeed-ast2600-evb.dts
index 788448cdd6b3..b8e55bf167aa 100644
--- a/arch/arm/boot/dts/aspeed-ast2600-evb.dts
+++ b/arch/arm/boot/dts/aspeed-ast2600-evb.dts
@@ -8,7 +8,7 @@

/ {
model = "AST2600 EVB";
- compatible = "aspeed,ast2600";
+ compatible = "aspeed,ast2600-evb-a1", "aspeed,ast2600";

aliases {
serial4 = &uart5;
diff --git a/arch/arm/boot/dts/bcm53015-meraki-mr26.dts b/arch/arm/boot/dts/bcm53015-meraki-mr26.dts
new file mode 100644
index 000000000000..14f58033efeb
--- /dev/null
+++ b/arch/arm/boot/dts/bcm53015-meraki-mr26.dts
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-or-later OR MIT
+/*
+ * Broadcom BCM470X / BCM5301X ARM platform code.
+ * DTS for Meraki MR26 / Codename: Venom
+ *
+ * Copyright (C) 2022 Christian Lamparter <chunkeey@xxxxxxxxx>
+ */
+
+/dts-v1/;
+
+#include "bcm4708.dtsi"
+#include "bcm5301x-nand-cs0-bch8.dtsi"
+#include <dt-bindings/leds/common.h>
+
+/ {
+ compatible = "meraki,mr26", "brcm,bcm53015", "brcm,bcm4708";
+ model = "Meraki MR26";
+
+ memory@0 {
+ reg = <0x00000000 0x08000000>;
+ device_type = "memory";
+ };
+
+ leds {
+ compatible = "gpio-leds";
+
+ led-0 {
+ function = LED_FUNCTION_FAULT;
+ color = <LED_COLOR_ID_AMBER>;
+ gpios = <&chipcommon 13 GPIO_ACTIVE_HIGH>;
+ panic-indicator;
+ };
+ led-1 {
+ function = LED_FUNCTION_INDICATOR;
+ color = <LED_COLOR_ID_WHITE>;
+ gpios = <&chipcommon 12 GPIO_ACTIVE_HIGH>;
+ };
+ };
+
+ keys {
+ compatible = "gpio-keys";
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ key-restart {
+ label = "Reset";
+ linux,code = <KEY_RESTART>;
+ gpios = <&chipcommon 11 GPIO_ACTIVE_LOW>;
+ };
+ };
+};
+
+&uart0 {
+ clock-frequency = <50000000>;
+ /delete-property/ clocks;
+};
+
+&uart1 {
+ status = "disabled";
+};
+
+&gmac0 {
+ status = "okay";
+};
+
+&gmac1 {
+ status = "disabled";
+};
+&gmac2 {
+ status = "disabled";
+};
+&gmac3 {
+ status = "disabled";
+};
+
+&nandcs {
+ nand-ecc-algo = "hw";
+
+ partitions {
+ compatible = "fixed-partitions";
+ #address-cells = <0x1>;
+ #size-cells = <0x1>;
+
+ partition@0 {
+ label = "u-boot";
+ reg = <0x0 0x200000>;
+ read-only;
+ };
+
+ partition@200000 {
+ label = "u-boot-env";
+ reg = <0x200000 0x200000>;
+ /* empty */
+ };
+
+ partition@400000 {
+ label = "u-boot-backup";
+ reg = <0x400000 0x200000>;
+ /* empty */
+ };
+
+ partition@600000 {
+ label = "u-boot-env-backup";
+ reg = <0x600000 0x200000>;
+ /* empty */
+ };
+
+ partition@800000 {
+ label = "ubi";
+ reg = <0x800000 0x7780000>;
+ };
+ };
+};
+
+&srab {
+ status = "okay";
+
+ ports {
+ port@0 {
+ reg = <0>;
+ label = "poe";
+ };
+
+ port@5 {
+ reg = <5>;
+ label = "cpu";
+ ethernet = <&gmac0>;
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+ };
+ };
+};
+
+&i2c0 {
+ status = "okay";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&pinmux_i2c>;
+
+ clock-frequency = <100000>;
+
+ ina219@40 {
+ compatible = "ti,ina219"; /* PoE power */
+ reg = <0x40>;
+ shunt-resistor = <60000>; /* = 60 mOhms */
+ };
+
+ eeprom@56 {
+ compatible = "atmel,24c64";
+ reg = <0x56>;
+ pagesize = <32>;
+ read-only;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ /* it's empty */
+ };
+};
+
+&thermal {
+ status = "disabled";
+ /* does not work, reads 418 degree Celsius */
+};
diff --git a/arch/arm/boot/dts/imx6qdl-apalis.dtsi b/arch/arm/boot/dts/imx6qdl-apalis.dtsi
index bd763bae596b..da919d0544a8 100644
--- a/arch/arm/boot/dts/imx6qdl-apalis.dtsi
+++ b/arch/arm/boot/dts/imx6qdl-apalis.dtsi
@@ -315,7 +315,7 @@ stmpe811@41 {
/* ADC conversion time: 80 clocks */
st,sample-time = <4>;

- stmpe_touchscreen: stmpe-touchscreen {
+ stmpe_touchscreen: stmpe_touchscreen {
compatible = "st,stmpe-ts";
/* 8 sample average control */
st,ave-ctrl = <3>;
@@ -332,7 +332,7 @@ stmpe_touchscreen: stmpe-touchscreen {
st,touch-det-delay = <5>;
};

- stmpe_adc: stmpe-adc {
+ stmpe_adc: stmpe_adc {
compatible = "st,stmpe-adc";
/* forbid to use ADC channels 3-0 (touch) */
st,norequest-mask = <0x0F>;
diff --git a/arch/arm/boot/dts/imx6ul.dtsi b/arch/arm/boot/dts/imx6ul.dtsi
index afeec01f6522..eca8bf89ab88 100644
--- a/arch/arm/boot/dts/imx6ul.dtsi
+++ b/arch/arm/boot/dts/imx6ul.dtsi
@@ -64,20 +64,18 @@ cpu0: cpu@0 {
clock-frequency = <696000000>;
clock-latency = <61036>; /* two CLK32 periods */
#cooling-cells = <2>;
- operating-points = <
+ operating-points =
/* kHz uV */
- 696000 1275000
- 528000 1175000
- 396000 1025000
- 198000 950000
- >;
- fsl,soc-operating-points = <
+ <696000 1275000>,
+ <528000 1175000>,
+ <396000 1025000>,
+ <198000 950000>;
+ fsl,soc-operating-points =
/* KHz uV */
- 696000 1275000
- 528000 1175000
- 396000 1175000
- 198000 1175000
- >;
+ <696000 1275000>,
+ <528000 1175000>,
+ <396000 1175000>,
+ <198000 1175000>;
clocks = <&clks IMX6UL_CLK_ARM>,
<&clks IMX6UL_CLK_PLL2_BUS>,
<&clks IMX6UL_CLK_PLL2_PFD2>,
@@ -149,6 +147,9 @@ soc {
ocram: sram@900000 {
compatible = "mmio-sram";
reg = <0x00900000 0x20000>;
+ ranges = <0 0x00900000 0x20000>;
+ #address-cells = <1>;
+ #size-cells = <1>;
};

intc: interrupt-controller@a01000 {
@@ -543,7 +544,7 @@ fec2: ethernet@20b4000 {
};

kpp: keypad@20b8000 {
- compatible = "fsl,imx6ul-kpp", "fsl,imx6q-kpp", "fsl,imx21-kpp";
+ compatible = "fsl,imx6ul-kpp", "fsl,imx21-kpp";
reg = <0x020b8000 0x4000>;
interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&clks IMX6UL_CLK_KPP>;
@@ -998,7 +999,7 @@ cpu_speed_grade: speed-grade@10 {
};

csi: csi@21c4000 {
- compatible = "fsl,imx6ul-csi", "fsl,imx7-csi";
+ compatible = "fsl,imx6ul-csi";
reg = <0x021c4000 0x4000>;
interrupts = <GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&clks IMX6UL_CLK_CSI>;
@@ -1007,7 +1008,7 @@ csi: csi@21c4000 {
};

lcdif: lcdif@21c8000 {
- compatible = "fsl,imx6ul-lcdif", "fsl,imx28-lcdif";
+ compatible = "fsl,imx6ul-lcdif", "fsl,imx6sx-lcdif";
reg = <0x021c8000 0x4000>;
interrupts = <GIC_SPI 5 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&clks IMX6UL_CLK_LCDIF_PIX>,
@@ -1028,7 +1029,7 @@ pxp: pxp@21cc000 {
qspi: spi@21e0000 {
#address-cells = <1>;
#size-cells = <0>;
- compatible = "fsl,imx6ul-qspi", "fsl,imx6sx-qspi";
+ compatible = "fsl,imx6ul-qspi";
reg = <0x021e0000 0x4000>, <0x60000000 0x10000000>;
reg-names = "QuadSPI", "QuadSPI-memory";
interrupts = <GIC_SPI 107 IRQ_TYPE_LEVEL_HIGH>;
diff --git a/arch/arm/boot/dts/imx7d-colibri-emmc.dtsi b/arch/arm/boot/dts/imx7d-colibri-emmc.dtsi
index af39e5370fa1..045e4413d339 100644
--- a/arch/arm/boot/dts/imx7d-colibri-emmc.dtsi
+++ b/arch/arm/boot/dts/imx7d-colibri-emmc.dtsi
@@ -13,6 +13,10 @@ memory@80000000 {
};
};

+&cpu1 {
+ cpu-supply = <&reg_DCDC2>;
+};
+
&gpio6 {
gpio-line-names = "",
"",
diff --git a/arch/arm/boot/dts/qcom-apq8064.dtsi b/arch/arm/boot/dts/qcom-apq8064.dtsi
index a1c8ae516d21..33a4d3441959 100644
--- a/arch/arm/boot/dts/qcom-apq8064.dtsi
+++ b/arch/arm/boot/dts/qcom-apq8064.dtsi
@@ -227,7 +227,7 @@ smem {
smd {
compatible = "qcom,smd";

- modem@0 {
+ modem-edge {
interrupts = <0 37 IRQ_TYPE_EDGE_RISING>;

qcom,ipc = <&l2cc 8 3>;
@@ -236,7 +236,7 @@ modem@0 {
status = "disabled";
};

- q6@1 {
+ q6-edge {
interrupts = <0 90 IRQ_TYPE_EDGE_RISING>;

qcom,ipc = <&l2cc 8 15>;
@@ -245,7 +245,7 @@ q6@1 {
status = "disabled";
};

- dsps@3 {
+ dsps-edge {
interrupts = <0 138 IRQ_TYPE_EDGE_RISING>;

qcom,ipc = <&sps_sic_non_secure 0x4080 0>;
@@ -254,7 +254,7 @@ dsps@3 {
status = "disabled";
};

- riva@6 {
+ riva-edge {
interrupts = <0 198 IRQ_TYPE_EDGE_RISING>;

qcom,ipc = <&l2cc 8 25>;
diff --git a/arch/arm/boot/dts/qcom-apq8074-dragonboard.dts b/arch/arm/boot/dts/qcom-apq8074-dragonboard.dts
index 83793b835d40..f47020cf7a90 100644
--- a/arch/arm/boot/dts/qcom-apq8074-dragonboard.dts
+++ b/arch/arm/boot/dts/qcom-apq8074-dragonboard.dts
@@ -16,331 +16,320 @@ aliases {
chosen {
stdout-path = "serial0:115200n8";
};
+};
+
+&blsp1_uart2 {
+ status = "okay";
+};
+
+&blsp2_i2c5 {
+ status = "okay";
+ clock-frequency = <200000>;

- soc {
- serial@f991e000 {
+ pinctrl-0 = <&i2c11_pins>;
+ pinctrl-names = "default";
+
+ eeprom: eeprom@52 {
+ compatible = "atmel,24c128";
+ reg = <0x52>;
+ pagesize = <32>;
+ read-only;
+ };
+};
+
+&otg {
+ status = "okay";
+
+ phys = <&usb_hs2_phy>;
+ phy-select = <&tcsr 0xb000 1>;
+ extcon = <&smbb>, <&usb_id>;
+ vbus-supply = <&chg_otg>;
+ hnp-disable;
+ srp-disable;
+ adp-disable;
+
+ ulpi {
+ phy@b {
status = "okay";
+ v3p3-supply = <&pm8941_l24>;
+ v1p8-supply = <&pm8941_l6>;
+ extcon = <&smbb>;
+ qcom,init-seq = /bits/ 8 <0x1 0x63>;
};
+ };
+};

- sdhci@f9824900 {
- bus-width = <8>;
- non-removable;
- status = "okay";
+&rpm_requests {
+ pm8841-regulators {
+ compatible = "qcom,rpm-pm8841-regulators";

- vmmc-supply = <&pm8941_l20>;
- vqmmc-supply = <&pm8941_s3>;
+ pm8841_s1: s1 {
+ regulator-min-microvolt = <675000>;
+ regulator-max-microvolt = <1050000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc1_pin_a>;
+ pm8841_s2: s2 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
};

- sdhci@f98a4900 {
- cd-gpios = <&msmgpio 62 0x1>;
- pinctrl-names = "default";
- pinctrl-0 = <&sdhc2_pin_a>, <&sdhc2_cd_pin_a>;
- bus-width = <4>;
- status = "okay";
+ pm8841_s3: s3 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
+ };

- vmmc-supply = <&pm8941_l21>;
- vqmmc-supply = <&pm8941_l13>;
+ pm8841_s4: s4 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
};
+ };

- usb@f9a55000 {
- status = "okay";
- phys = <&usb_hs2_phy>;
- phy-select = <&tcsr 0xb000 1>;
- extcon = <&smbb>, <&usb_id>;
- vbus-supply = <&chg_otg>;
- hnp-disable;
- srp-disable;
- adp-disable;
- ulpi {
- phy@b {
- status = "okay";
- v3p3-supply = <&pm8941_l24>;
- v1p8-supply = <&pm8941_l6>;
- extcon = <&smbb>;
- qcom,init-seq = /bits/ 8 <0x1 0x63>;
- };
- };
- };
-
-
- pinctrl@fd510000 {
- i2c11_pins: i2c11 {
- mux {
- pins = "gpio83", "gpio84";
- function = "blsp_i2c11";
- };
- };
-
- spi8_default: spi8_default {
- mosi {
- pins = "gpio45";
- function = "blsp_spi8";
- };
- miso {
- pins = "gpio46";
- function = "blsp_spi8";
- };
- cs {
- pins = "gpio47";
- function = "blsp_spi8";
- };
- clk {
- pins = "gpio48";
- function = "blsp_spi8";
- };
- };
-
- sdhc1_pin_a: sdhc1-pin-active {
- clk {
- pins = "sdc1_clk";
- drive-strength = <16>;
- bias-disable;
- };
-
- cmd-data {
- pins = "sdc1_cmd", "sdc1_data";
- drive-strength = <10>;
- bias-pull-up;
- };
- };
-
- sdhc2_cd_pin_a: sdhc2-cd-pin-active {
- pins = "gpio62";
- function = "gpio";
-
- drive-strength = <2>;
- bias-disable;
- };
-
- sdhc2_pin_a: sdhc2-pin-active {
- clk {
- pins = "sdc2_clk";
- drive-strength = <10>;
- bias-disable;
- };
-
- cmd-data {
- pins = "sdc2_cmd", "sdc2_data";
- drive-strength = <6>;
- bias-pull-up;
- };
- };
- };
-
- i2c@f9967000 {
- status = "okay";
- clock-frequency = <200000>;
- pinctrl-0 = <&i2c11_pins>;
- pinctrl-names = "default";
+ pm8941-regulators {
+ compatible = "qcom,rpm-pm8941-regulators";
+
+ vdd_l1_l3-supply = <&pm8941_s1>;
+ vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
+ vdd_l4_l11-supply = <&pm8941_s1>;
+ vdd_l5_l7-supply = <&pm8941_s2>;
+ vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
+ vin_5vs-supply = <&pm8941_5v>;
+
+ pm8941_s1: s1 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_s2: s2 {
+ regulator-min-microvolt = <2150000>;
+ regulator-max-microvolt = <2150000>;
+ regulator-boot-on;
+ };

- eeprom: eeprom@52 {
- compatible = "atmel,24c128";
- reg = <0x52>;
- pagesize = <32>;
- read-only;
- };
+ pm8941_s3: s3 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
};
+
+ pm8941_l1: l1 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_l2: l2 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ };
+
+ pm8941_l3: l3 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ };
+
+ pm8941_l4: l4 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ };
+
+ pm8941_l5: l5 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
+ pm8941_l6: l6 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l7: l7 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l8: l8 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
+ pm8941_l9: l9 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };
+
+ pm8941_l10: l10 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ };
+
+ pm8941_l11: l11 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
+ };
+
+ pm8941_l12: l12 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_l13: l13 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l14: l14 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
+ pm8941_l15: l15 {
+ regulator-min-microvolt = <2050000>;
+ regulator-max-microvolt = <2050000>;
+ };
+
+ pm8941_l16: l16 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
+ };
+
+ pm8941_l17: l17 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
+ };
+
+ pm8941_l18: l18 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
+ };
+
+ pm8941_l19: l19 {
+ regulator-min-microvolt = <3300000>;
+ regulator-max-microvolt = <3300000>;
+ regulator-always-on;
+ };
+
+ pm8941_l20: l20 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-system-load = <200000>;
+ regulator-allow-set-load;
+ regulator-boot-on;
+ };
+
+ pm8941_l21: l21 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l22: l22 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3000000>;
+ };
+
+ pm8941_l23: l23 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3000000>;
+ };
+
+ pm8941_l24: l24 {
+ regulator-min-microvolt = <3075000>;
+ regulator-max-microvolt = <3075000>;
+ regulator-boot-on;
+ };
+ };
+};
+
+&sdhc_1 {
+ status = "okay";
+
+ vmmc-supply = <&pm8941_l20>;
+ vqmmc-supply = <&pm8941_s3>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc1_pin_a>;
+};
+
+&sdhc_2 {
+ status = "okay";
+ cd-gpios = <&tlmm 62 0x1>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc2_pin_a>, <&sdhc2_cd_pin_a>;
+
+ vmmc-supply = <&pm8941_l21>;
+ vqmmc-supply = <&pm8941_l13>;
+};
+
+&tlmm {
+ i2c11_pins: i2c11 {
+ mux {
+ pins = "gpio83", "gpio84";
+ function = "blsp_i2c11";
+ };
+ };
+
+ spi8_default: spi8_default {
+ mosi {
+ pins = "gpio45";
+ function = "blsp_spi8";
+ };
+ miso {
+ pins = "gpio46";
+ function = "blsp_spi8";
+ };
+ cs {
+ pins = "gpio47";
+ function = "blsp_spi8";
+ };
+ clk {
+ pins = "gpio48";
+ function = "blsp_spi8";
+ };
+ };
+
+ sdhc1_pin_a: sdhc1-pin-active {
+ clk {
+ pins = "sdc1_clk";
+ drive-strength = <16>;
+ bias-disable;
+ };
+
+ cmd-data {
+ pins = "sdc1_cmd", "sdc1_data";
+ drive-strength = <10>;
+ bias-pull-up;
+ };
+ };
+
+ sdhc2_cd_pin_a: sdhc2-cd-pin-active {
+ pins = "gpio62";
+ function = "gpio";
+
+ drive-strength = <2>;
+ bias-disable;
};

- smd {
- rpm {
- rpm_requests {
- pm8841-regulators {
- s1 {
- regulator-min-microvolt = <675000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s2 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s3 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s4 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
- };
-
- pm8941-regulators {
- vdd_l1_l3-supply = <&pm8941_s1>;
- vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
- vdd_l4_l11-supply = <&pm8941_s1>;
- vdd_l5_l7-supply = <&pm8941_s2>;
- vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
- vin_5vs-supply = <&pm8941_5v>;
-
- s1 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
- regulator-always-on;
- regulator-boot-on;
- };
-
- s2 {
- regulator-min-microvolt = <2150000>;
- regulator-max-microvolt = <2150000>;
- regulator-boot-on;
- };
-
- s3 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- regulator-always-on;
- regulator-boot-on;
- };
-
- l1 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l2 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l3 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l4 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l5 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l6 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l7 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l8 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l9 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- l10 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- regulator-always-on;
- };
-
- l11 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
- };
-
- l12 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l13 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l14 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l15 {
- regulator-min-microvolt = <2050000>;
- regulator-max-microvolt = <2050000>;
- };
-
- l16 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l17 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l18 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l19 {
- regulator-min-microvolt = <3300000>;
- regulator-max-microvolt = <3300000>;
- regulator-always-on;
- };
-
- l20 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-allow-set-load;
- regulator-boot-on;
- regulator-system-load = <200000>;
- };
-
- l21 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l22 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3000000>;
- };
-
- l23 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3000000>;
- };
-
- l24 {
- regulator-min-microvolt = <3075000>;
- regulator-max-microvolt = <3075000>;
-
- regulator-boot-on;
- };
- };
- };
+ sdhc2_pin_a: sdhc2-pin-active {
+ clk {
+ pins = "sdc2_clk";
+ drive-strength = <10>;
+ bias-disable;
+ };
+
+ cmd-data {
+ pins = "sdc2_cmd", "sdc2_data";
+ drive-strength = <6>;
+ bias-pull-up;
};
};
};
diff --git a/arch/arm/boot/dts/qcom-apq8084.dtsi b/arch/arm/boot/dts/qcom-apq8084.dtsi
index 52240fc7a1a6..da50a1a0197f 100644
--- a/arch/arm/boot/dts/qcom-apq8084.dtsi
+++ b/arch/arm/boot/dts/qcom-apq8084.dtsi
@@ -470,7 +470,7 @@ rpm {
qcom,ipc = <&apcs 8 0>;
qcom,smd-edge = <15>;

- rpm_requests {
+ rpm-requests {
compatible = "qcom,rpm-apq8084";
qcom,smd-channels = "rpm_requests";

diff --git a/arch/arm/boot/dts/qcom-mdm9615.dtsi b/arch/arm/boot/dts/qcom-mdm9615.dtsi
index 4d4f37cebf21..96a66862875c 100644
--- a/arch/arm/boot/dts/qcom-mdm9615.dtsi
+++ b/arch/arm/boot/dts/qcom-mdm9615.dtsi
@@ -321,6 +321,7 @@ rtc@11d {

pmicgpio: gpio@150 {
compatible = "qcom,pm8018-gpio", "qcom,ssbi-gpio";
+ reg = <0x150>;
interrupt-controller;
#interrupt-cells = <2>;
gpio-controller;
diff --git a/arch/arm/boot/dts/qcom-msm8974-fairphone-fp2.dts b/arch/arm/boot/dts/qcom-msm8974-fairphone-fp2.dts
index 6d77e0f8ca4d..085591183592 100644
--- a/arch/arm/boot/dts/qcom-msm8974-fairphone-fp2.dts
+++ b/arch/arm/boot/dts/qcom-msm8974-fairphone-fp2.dts
@@ -5,7 +5,6 @@
#include <dt-bindings/input/input.h>
#include <dt-bindings/pinctrl/qcom,pmic-gpio.h>

-
/ {
model = "Fairphone 2";
compatible = "fairphone,fp2", "qcom,msm8974";
@@ -51,359 +50,358 @@ volume-up {

vibrator {
compatible = "gpio-vibrator";
- enable-gpios = <&msmgpio 86 GPIO_ACTIVE_HIGH>;
+ enable-gpios = <&tlmm 86 GPIO_ACTIVE_HIGH>;
vcc-supply = <&pm8941_l18>;
};
+};
+
+&blsp1_uart2 {
+ status = "okay";
+};
+
+&imem {
+ status = "okay";
+
+ reboot-mode {
+ mode-normal = <0x77665501>;
+ mode-bootloader = <0x77665500>;
+ mode-recovery = <0x77665502>;
+ };
+};
+
+&otg {
+ status = "okay";
+
+ phys = <&usb_hs1_phy>;
+ phy-select = <&tcsr 0xb000 0>;
+ extcon = <&smbb>, <&usb_id>;
+ vbus-supply = <&chg_otg>;
+
+ hnp-disable;
+ srp-disable;
+ adp-disable;

- smd {
- rpm {
- rpm_requests {
- pm8841-regulators {
- s1 {
- regulator-min-microvolt = <675000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s2 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s3 {
- regulator-min-microvolt = <1050000>;
- regulator-max-microvolt = <1050000>;
- };
- };
-
- pm8941-regulators {
- vdd_l1_l3-supply = <&pm8941_s1>;
- vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
- vdd_l4_l11-supply = <&pm8941_s1>;
- vdd_l5_l7-supply = <&pm8941_s2>;
- vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
- vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
- vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
- vdd_l21-supply = <&vreg_boost>;
-
- s1 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- s2 {
- regulator-min-microvolt = <2150000>;
- regulator-max-microvolt = <2150000>;
-
- regulator-boot-on;
- };
-
- s3 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l1 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l2 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l3 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l4 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l5 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l6 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l7 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l8 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l9 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- l10 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- l11 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1350000>;
- };
-
- l12 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l13 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l14 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l15 {
- regulator-min-microvolt = <2050000>;
- regulator-max-microvolt = <2050000>;
- };
-
- l16 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l17 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l18 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l19 {
- regulator-min-microvolt = <2900000>;
- regulator-max-microvolt = <3350000>;
- };
-
- l20 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- regulator-system-load = <200000>;
- regulator-allow-set-load;
- };
-
- l21 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l22 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3300000>;
- };
-
- l23 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3000000>;
- };
-
- l24 {
- regulator-min-microvolt = <3075000>;
- regulator-max-microvolt = <3075000>;
-
- regulator-boot-on;
- };
- };
- };
+ ulpi {
+ phy@a {
+ status = "okay";
+
+ v1p8-supply = <&pm8941_l6>;
+ v3p3-supply = <&pm8941_l24>;
+
+ extcon = <&smbb>;
+ qcom,init-seq = /bits/ 8 <0x1 0x64>;
};
};
};

-&soc {
- serial@f991e000 {
- status = "okay";
+&pm8941_gpios {
+ gpio_keys_pin_a: gpio-keys-active {
+ pins = "gpio1", "gpio2", "gpio5";
+ function = "normal";
+
+ bias-pull-up;
+ power-source = <PM8941_GPIO_S3>;
};
+};

- remoteproc@fb21b000 {
- status = "okay";
+&pronto {
+ status = "okay";

- vddmx-supply = <&pm8841_s1>;
- vddcx-supply = <&pm8841_s2>;
+ vddmx-supply = <&pm8841_s1>;
+ vddcx-supply = <&pm8841_s2>;
+ vddpx-supply = <&pm8941_s3>;

- pinctrl-names = "default";
- pinctrl-0 = <&wcnss_pin_a>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&wcnss_pin_a>;

- smd-edge {
- qcom,remote-pid = <4>;
- label = "pronto";
+ iris {
+ vddxo-supply = <&pm8941_l6>;
+ vddrfa-supply = <&pm8941_l11>;
+ vddpa-supply = <&pm8941_l19>;
+ vdddig-supply = <&pm8941_s3>;
+ };
+
+ smd-edge {
+ qcom,remote-pid = <4>;
+ label = "pronto";

- wcnss {
- status = "okay";
- };
+ wcnss {
+ status = "okay";
};
};
+};

- pinctrl@fd510000 {
- sdhc1_pin_a: sdhc1-pin-active {
- clk {
- pins = "sdc1_clk";
- drive-strength = <16>;
- bias-disable;
- };
+&remoteproc_adsp {
+ status = "okay";
+ cx-supply = <&pm8841_s2>;
+};

- cmd-data {
- pins = "sdc1_cmd", "sdc1_data";
- drive-strength = <10>;
- bias-pull-up;
- };
+&remoteproc_mss {
+ status = "okay";
+ cx-supply = <&pm8841_s2>;
+ mss-supply = <&pm8841_s3>;
+ mx-supply = <&pm8841_s1>;
+ pll-supply = <&pm8941_l12>;
+};
+
+&rpm_requests {
+ pm8841-regulators {
+ compatible = "qcom,rpm-pm8841-regulators";
+
+ pm8841_s1: s1 {
+ regulator-min-microvolt = <675000>;
+ regulator-max-microvolt = <1050000>;
};

- sdhc2_pin_a: sdhc2-pin-active {
- clk {
- pins = "sdc2_clk";
- drive-strength = <10>;
- bias-disable;
- };
+ pm8841_s2: s2 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
+ };

- cmd-data {
- pins = "sdc2_cmd", "sdc2_data";
- drive-strength = <6>;
- bias-pull-up;
- };
+ pm8841_s3: s3 {
+ regulator-min-microvolt = <1050000>;
+ regulator-max-microvolt = <1050000>;
};
+ };

- wcnss_pin_a: wcnss-pin-active {
- wlan {
- pins = "gpio36", "gpio37", "gpio38", "gpio39", "gpio40";
- function = "wlan";
+ pm8941-regulators {
+ compatible = "qcom,rpm-pm8941-regulators";
+
+ vdd_l1_l3-supply = <&pm8941_s1>;
+ vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
+ vdd_l4_l11-supply = <&pm8941_s1>;
+ vdd_l5_l7-supply = <&pm8941_s2>;
+ vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
+ vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
+ vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
+ vdd_l21-supply = <&vreg_boost>;
+
+ pm8941_s1: s1 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };

- drive-strength = <6>;
- bias-pull-down;
- };
+ pm8941_s2: s2 {
+ regulator-min-microvolt = <2150000>;
+ regulator-max-microvolt = <2150000>;
+ regulator-boot-on;
+ };

- bt {
- pins = "gpio35", "gpio43", "gpio44";
- function = "bt";
+ pm8941_s3: s3 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };

- drive-strength = <2>;
- bias-pull-down;
- };
+ pm8941_l1: l1 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };

- fm {
- pins = "gpio41", "gpio42";
- function = "fm";
+ pm8941_l2: l2 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ };

- drive-strength = <2>;
- bias-pull-down;
- };
+ pm8941_l3: l3 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
};
- };

- sdhci@f9824900 {
- status = "okay";
+ pm8941_l4: l4 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ };

- vmmc-supply = <&pm8941_l20>;
- vqmmc-supply = <&pm8941_s3>;
+ pm8941_l5: l5 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- bus-width = <8>;
- non-removable;
+ pm8941_l6: l6 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc1_pin_a>;
- };
+ pm8941_l7: l7 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ };

- sdhci@f98a4900 {
- status = "okay";
+ pm8941_l8: l8 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- vmmc-supply = <&pm8941_l21>;
- vqmmc-supply = <&pm8941_l13>;
+ pm8941_l9: l9 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };

- bus-width = <4>;
+ pm8941_l10: l10 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc2_pin_a>;
+ pm8941_l11: l11 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1350000>;
+ };
+
+ pm8941_l12: l12 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_l13: l13 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l14: l14 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
+ pm8941_l15: l15 {
+ regulator-min-microvolt = <2050000>;
+ regulator-max-microvolt = <2050000>;
+ };
+
+ pm8941_l16: l16 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
+ };
+
+ pm8941_l17: l17 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
+ };
+
+ pm8941_l18: l18 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
+ };
+
+ pm8941_l19: l19 {
+ regulator-min-microvolt = <2900000>;
+ regulator-max-microvolt = <3350000>;
+ };
+
+ pm8941_l20: l20 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-system-load = <200000>;
+ regulator-allow-set-load;
+ regulator-boot-on;
+ };
+
+ pm8941_l21: l21 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l22: l22 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3300000>;
+ };
+
+ pm8941_l23: l23 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3000000>;
+ };
+
+ pm8941_l24: l24 {
+ regulator-min-microvolt = <3075000>;
+ regulator-max-microvolt = <3075000>;
+ regulator-boot-on;
+ };
};
+};
+
+&sdhc_1 {
+ status = "okay";

- usb@f9a55000 {
- status = "okay";
+ vmmc-supply = <&pm8941_l20>;
+ vqmmc-supply = <&pm8941_s3>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc1_pin_a>;
+};

- phys = <&usb_hs1_phy>;
- phy-select = <&tcsr 0xb000 0>;
- extcon = <&smbb>, <&usb_id>;
- vbus-supply = <&chg_otg>;
+&sdhc_2 {
+ status = "okay";

- hnp-disable;
- srp-disable;
- adp-disable;
+ vmmc-supply = <&pm8941_l21>;
+ vqmmc-supply = <&pm8941_l13>;

- ulpi {
- phy@a {
- status = "okay";
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc2_pin_a>;
+};

- v1p8-supply = <&pm8941_l6>;
- v3p3-supply = <&pm8941_l24>;
+&tlmm {
+ sdhc1_pin_a: sdhc1-pin-active {
+ clk {
+ pins = "sdc1_clk";
+ drive-strength = <16>;
+ bias-disable;
+ };

- extcon = <&smbb>;
- qcom,init-seq = /bits/ 8 <0x1 0x64>;
- };
+ cmd-data {
+ pins = "sdc1_cmd", "sdc1_data";
+ drive-strength = <10>;
+ bias-pull-up;
};
};

- imem@fe805000 {
- status = "okay";
+ sdhc2_pin_a: sdhc2-pin-active {
+ clk {
+ pins = "sdc2_clk";
+ drive-strength = <10>;
+ bias-disable;
+ };

- reboot-mode {
- mode-normal = <0x77665501>;
- mode-bootloader = <0x77665500>;
- mode-recovery = <0x77665502>;
+ cmd-data {
+ pins = "sdc2_cmd", "sdc2_data";
+ drive-strength = <6>;
+ bias-pull-up;
};
};
-};

-&spmi_bus {
- pm8941@0 {
- gpios@c000 {
- gpio_keys_pin_a: gpio-keys-active {
- pins = "gpio1", "gpio2", "gpio5";
- function = "normal";
+ wcnss_pin_a: wcnss-pin-active {
+ wlan {
+ pins = "gpio36", "gpio37", "gpio38", "gpio39", "gpio40";
+ function = "wlan";
+
+ drive-strength = <6>;
+ bias-pull-down;
+ };
+
+ bt {
+ pins = "gpio35", "gpio43", "gpio44";
+ function = "bt";
+
+ drive-strength = <2>;
+ bias-pull-down;
+ };
+
+ fm {
+ pins = "gpio41", "gpio42";
+ function = "fm";

- bias-pull-up;
- power-source = <PM8941_GPIO_S3>;
- };
+ drive-strength = <2>;
+ bias-pull-down;
};
};
};
diff --git a/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts b/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts
index 069136170198..6537950c30ba 100644
--- a/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts
+++ b/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts
@@ -12,213 +12,31 @@ / {

aliases {
serial0 = &blsp1_uart1;
- serial1 = &blsp2_uart10;
+ serial1 = &blsp2_uart4;
};

chosen {
stdout-path = "serial0:115200n8";
};

- smd {
- rpm {
- rpm_requests {
- pm8841-regulators {
- s1 {
- regulator-min-microvolt = <675000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s2 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s3 {
- regulator-min-microvolt = <1050000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s4 {
- regulator-min-microvolt = <815000>;
- regulator-max-microvolt = <900000>;
- };
- };
-
- pm8941-regulators {
- vdd_l1_l3-supply = <&pm8941_s1>;
- vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
- vdd_l4_l11-supply = <&pm8941_s1>;
- vdd_l5_l7-supply = <&pm8941_s2>;
- vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
- vdd_l8_l16_l18_l19-supply = <&vreg_vph_pwr>;
- vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
- vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
- vdd_l21-supply = <&vreg_boost>;
-
- s1 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- s2 {
- regulator-min-microvolt = <2150000>;
- regulator-max-microvolt = <2150000>;
-
- regulator-boot-on;
- };
-
- s3 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l1 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l2 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l3 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l4 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l5 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l6 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l7 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l8 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l9 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- l10 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- l11 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
- };
-
- l12 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l13 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l14 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l15 {
- regulator-min-microvolt = <2050000>;
- regulator-max-microvolt = <2050000>;
- };
-
- l16 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l17 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l18 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l19 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3300000>;
- };
-
- l20 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- regulator-system-load = <200000>;
- regulator-allow-set-load;
- };
-
- l21 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l22 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3300000>;
- };
-
- l23 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3000000>;
- };
-
- l24 {
- regulator-min-microvolt = <3075000>;
- regulator-max-microvolt = <3075000>;
-
- regulator-boot-on;
- };
- };
- };
+ gpio-keys {
+ compatible = "gpio-keys";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&gpio_keys_pin_a>;
+
+ volume-up {
+ label = "volume_up";
+ gpios = <&pm8941_gpios 2 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_VOLUMEUP>;
+ };
+
+ volume-down {
+ label = "volume_down";
+ gpios = <&pm8941_gpios 3 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_VOLUMEDOWN>;
};
};

@@ -229,7 +47,7 @@ vreg_wlan: wlan-regulator {
regulator-min-microvolt = <3300000>;
regulator-max-microvolt = <3300000>;

- gpio = <&msmgpio 26 GPIO_ACTIVE_HIGH>;
+ gpio = <&tlmm 26 GPIO_ACTIVE_HIGH>;
enable-active-high;

pinctrl-names = "default";
@@ -237,525 +55,676 @@ vreg_wlan: wlan-regulator {
};
};

-&soc {
- serial@f991d000 {
- status = "okay";
+&blsp1_i2c1 {
+ status = "okay";
+ clock-frequency = <100000>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c1_pins>;
+
+ charger: bq24192@6b {
+ compatible = "ti,bq24192";
+ reg = <0x6b>;
+ interrupts-extended = <&spmi_bus 0 0xd5 0 IRQ_TYPE_EDGE_FALLING>;
+
+ omit-battery-class;
+
+ usb_otg_vbus: usb-otg-vbus { };
};

- pinctrl@fd510000 {
- sdhc1_pin_a: sdhc1-pin-active {
- clk {
- pins = "sdc1_clk";
- drive-strength = <16>;
- bias-disable;
- };
+ fuelgauge: max17048@36 {
+ compatible = "maxim,max17048";
+ reg = <0x36>;

- cmd-data {
- pins = "sdc1_cmd", "sdc1_data";
- drive-strength = <10>;
- bias-pull-up;
- };
- };
+ maxim,double-soc;
+ maxim,rcomp = /bits/ 8 <0x4d>;

- sdhc2_pin_a: sdhc2-pin-active {
- clk {
- pins = "sdc2_clk";
- drive-strength = <6>;
- bias-disable;
- };
+ interrupt-parent = <&tlmm>;
+ interrupts = <9 IRQ_TYPE_LEVEL_LOW>;

- cmd-data {
- pins = "sdc2_cmd", "sdc2_data";
- drive-strength = <6>;
- bias-pull-up;
- };
- };
+ pinctrl-names = "default";
+ pinctrl-0 = <&fuelgauge_pin>;

- i2c1_pins: i2c1 {
- mux {
- pins = "gpio2", "gpio3";
- function = "blsp_i2c1";
+ maxim,alert-low-soc-level = <2>;
+ };
+};

- drive-strength = <2>;
- bias-disable;
- };
- };
+&blsp1_i2c2 {
+ status = "okay";
+ clock-frequency = <355000>;

- i2c2_pins: i2c2 {
- mux {
- pins = "gpio6", "gpio7";
- function = "blsp_i2c2";
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c2_pins>;

- drive-strength = <2>;
- bias-disable;
- };
+ synaptics@70 {
+ compatible = "syna,rmi4-i2c";
+ reg = <0x70>;
+
+ interrupts-extended = <&tlmm 5 IRQ_TYPE_EDGE_FALLING>;
+ vdd-supply = <&pm8941_l22>;
+ vio-supply = <&pm8941_lvs3>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&touch_pin>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ rmi4-f01@1 {
+ reg = <0x1>;
+ syna,nosleep-mode = <1>;
};

- i2c3_pins: i2c3 {
- mux {
- pins = "gpio10", "gpio11";
- function = "blsp_i2c3";
- drive-strength = <2>;
- bias-disable;
- };
+ rmi4-f12@12 {
+ reg = <0x12>;
+ syna,sensor-type = <1>;
};
+ };
+};

- i2c11_pins: i2c11 {
- mux {
- pins = "gpio83", "gpio84";
- function = "blsp_i2c11";
+&blsp1_i2c3 {
+ status = "okay";
+ clock-frequency = <100000>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c3_pins>;
+
+ avago_apds993@39 {
+ compatible = "avago,apds9930";
+ reg = <0x39>;
+ interrupts-extended = <&tlmm 61 IRQ_TYPE_EDGE_FALLING>;
+ vdd-supply = <&pm8941_l17>;
+ vddio-supply = <&pm8941_lvs1>;
+ led-max-microamp = <100000>;
+ amstaos,proximity-diodes = <0>;
+ };
+};

- drive-strength = <2>;
- bias-disable;
- };
+&blsp2_i2c5 {
+ status = "okay";
+ clock-frequency = <355000>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c11_pins>;
+
+ led-controller@38 {
+ compatible = "ti,lm3630a";
+ status = "okay";
+ reg = <0x38>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ led@0 {
+ reg = <0>;
+ led-sources = <0 1>;
+ label = "lcd-backlight";
+ default-brightness = <200>;
};
+ };
+};
+
+&blsp2_i2c6 {
+ status = "okay";
+ clock-frequency = <100000>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c12_pins>;
+
+ mpu6515@68 {
+ compatible = "invensense,mpu6515";
+ reg = <0x68>;
+ interrupts-extended = <&tlmm 73 IRQ_TYPE_EDGE_FALLING>;
+ vddio-supply = <&pm8941_lvs1>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&mpu6515_pin>;

- i2c12_pins: i2c12 {
- mux {
- pins = "gpio87", "gpio88";
- function = "blsp_i2c12";
- drive-strength = <2>;
- bias-disable;
+ mount-matrix = "0", "-1", "0",
+ "-1", "0", "0",
+ "0", "0", "1";
+
+ i2c-gate {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ ak8963@f {
+ compatible = "asahi-kasei,ak8963";
+ reg = <0x0f>;
+ gpios = <&tlmm 67 0>;
+ vid-supply = <&pm8941_lvs1>;
+ vdd-supply = <&pm8941_l17>;
};
- };

- mpu6515_pin: mpu6515 {
- irq {
- pins = "gpio73";
- function = "gpio";
- bias-disable;
- input-enable;
+ bmp280@76 {
+ compatible = "bosch,bmp280";
+ reg = <0x76>;
+ vdda-supply = <&pm8941_lvs1>;
+ vddd-supply = <&pm8941_l17>;
};
};
+ };
+};
+
+&blsp1_uart1 {
+ status = "okay";
+};

- touch_pin: touch {
- int {
- pins = "gpio5";
- function = "gpio";
+&blsp2_uart4 {
+ status = "okay";

- drive-strength = <2>;
- bias-disable;
- input-enable;
- };
+ pinctrl-names = "default";
+ pinctrl-0 = <&blsp2_uart4_pin_a>;

- reset {
- pins = "gpio8";
- function = "gpio";
+ bluetooth {
+ compatible = "brcm,bcm43438-bt";
+ max-speed = <3000000>;

- drive-strength = <2>;
- bias-pull-up;
- };
- };
+ pinctrl-names = "default";
+ pinctrl-0 = <&bt_pin>;
+
+ host-wakeup-gpios = <&tlmm 42 GPIO_ACTIVE_HIGH>;
+ device-wakeup-gpios = <&tlmm 62 GPIO_ACTIVE_HIGH>;
+ shutdown-gpios = <&tlmm 41 GPIO_ACTIVE_HIGH>;
+ };
+};

- panel_pin: panel {
- te {
- pins = "gpio12";
- function = "mdp_vsync";
+&dsi0 {
+ status = "okay";

- drive-strength = <2>;
- bias-disable;
- };
- };
+ vdda-supply = <&pm8941_l2>;
+ vdd-supply = <&pm8941_lvs3>;
+ vddio-supply = <&pm8941_l12>;

- bt_pin: bt {
- hostwake {
- pins = "gpio42";
- function = "gpio";
- };
+ panel: panel@0 {
+ reg = <0>;
+ compatible = "lg,acx467akm-7";

- devwake {
- pins = "gpio62";
- function = "gpio";
- };
+ pinctrl-names = "default";
+ pinctrl-0 = <&panel_pin>;

- shutdown {
- pins = "gpio41";
- function = "gpio";
+ port {
+ panel_in: endpoint {
+ remote-endpoint = <&dsi0_out>;
};
};
+ };
+};

- blsp2_uart10_pin_a: blsp2-uart10-pin-active {
- tx {
- pins = "gpio53";
- function = "blsp_uart10";
+&dsi0_out {
+ remote-endpoint = <&panel_in>;
+ data-lanes = <0 1 2 3>;
+};

- drive-strength = <2>;
- bias-disable;
- };
+&dsi0_phy {
+ status = "okay";

- rx {
- pins = "gpio54";
- function = "blsp_uart10";
+ vddio-supply = <&pm8941_l12>;
+};

- drive-strength = <2>;
- bias-pull-up;
- };
+&mdss {
+ status = "okay";
+};

- cts {
- pins = "gpio55";
- function = "blsp_uart10";
+&otg {
+ status = "okay";

- drive-strength = <2>;
- bias-pull-up;
- };
+ phys = <&usb_hs1_phy>;
+ phy-select = <&tcsr 0xb000 0>;

- rts {
- pins = "gpio56";
- function = "blsp_uart10";
+ extcon = <&charger>, <&usb_id>;
+ vbus-supply = <&usb_otg_vbus>;

- drive-strength = <2>;
- bias-disable;
- };
+ hnp-disable;
+ srp-disable;
+ adp-disable;
+
+ ulpi {
+ phy@a {
+ status = "okay";
+
+ v1p8-supply = <&pm8941_l6>;
+ v3p3-supply = <&pm8941_l24>;
+
+ qcom,init-seq = /bits/ 8 <0x1 0x64>;
};
};
+};

- sdhci@f9824900 {
- status = "okay";
+&pm8941_gpios {
+ gpio_keys_pin_a: gpio-keys-active {
+ pins = "gpio2", "gpio3";
+ function = "normal";

- vmmc-supply = <&pm8941_l20>;
- vqmmc-supply = <&pm8941_s3>;
+ bias-pull-up;
+ power-source = <PM8941_GPIO_S3>;
+ };

- bus-width = <8>;
- non-removable;
+ fuelgauge_pin: fuelgauge-int {
+ pins = "gpio9";
+ function = "normal";

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc1_pin_a>;
+ bias-disable;
+ input-enable;
+ power-source = <PM8941_GPIO_S3>;
};

- sdhci@f98a4900 {
- status = "okay";
+ wlan_sleep_clk_pin: wl-sleep-clk {
+ pins = "gpio16";
+ function = "func2";

- max-frequency = <100000000>;
- bus-width = <4>;
- non-removable;
- vmmc-supply = <&vreg_wlan>;
- vqmmc-supply = <&pm8941_s3>;
+ output-high;
+ power-source = <PM8941_GPIO_S3>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc2_pin_a>;
+ wlan_regulator_pin: wl-reg-active {
+ pins = "gpio17";
+ function = "normal";

- #address-cells = <1>;
- #size-cells = <0>;
+ bias-disable;
+ power-source = <PM8941_GPIO_S3>;
+ };
+
+ otg {
+ gpio-hog;
+ gpios = <35 GPIO_ACTIVE_HIGH>;
+ output-high;
+ line-name = "otg-gpio";
+ };
+};

- bcrmf@1 {
- compatible = "brcm,bcm4339-fmac", "brcm,bcm4329-fmac";
- reg = <1>;
+&rpm_requests {
+ pm8841-regulators {
+ compatible = "qcom,rpm-pm8841-regulators";

- brcm,drive-strength = <10>;
+ pm8841_s1: s1 {
+ regulator-min-microvolt = <675000>;
+ regulator-max-microvolt = <1050000>;
+ };
+
+ pm8841_s2: s2 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&wlan_sleep_clk_pin>;
+ pm8841_s3: s3 {
+ regulator-min-microvolt = <1050000>;
+ regulator-max-microvolt = <1050000>;
+ };
+
+ pm8841_s4: s4 {
+ regulator-min-microvolt = <815000>;
+ regulator-max-microvolt = <900000>;
};
};

- gpio-keys {
- compatible = "gpio-keys";
+ pm8941-regulators {
+ compatible = "qcom,rpm-pm8941-regulators";
+
+ vdd_l1_l3-supply = <&pm8941_s1>;
+ vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
+ vdd_l4_l11-supply = <&pm8941_s1>;
+ vdd_l5_l7-supply = <&pm8941_s2>;
+ vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
+ vdd_l8_l16_l18_l19-supply = <&vreg_vph_pwr>;
+ vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
+ vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
+ vdd_l21-supply = <&vreg_boost>;
+
+ pm8941_s1: s1 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&gpio_keys_pin_a>;
+ pm8941_s2: s2 {
+ regulator-min-microvolt = <2150000>;
+ regulator-max-microvolt = <2150000>;
+ regulator-boot-on;
+ };

- volume-up {
- label = "volume_up";
- gpios = <&pm8941_gpios 2 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_VOLUMEUP>;
+ pm8941_s3: s3 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
};

- volume-down {
- label = "volume_down";
- gpios = <&pm8941_gpios 3 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_VOLUMEDOWN>;
+ pm8941_l1: l1 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ regulator-always-on;
+ regulator-boot-on;
};
- };

- serial@f9960000 {
- status = "okay";
+ pm8941_l2: l2 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&blsp2_uart10_pin_a>;
+ pm8941_l3: l3 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ };

- bluetooth {
- compatible = "brcm,bcm43438-bt";
- max-speed = <3000000>;
+ pm8941_l4: l4 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&bt_pin>;
+ pm8941_l5: l5 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- host-wakeup-gpios = <&msmgpio 42 GPIO_ACTIVE_HIGH>;
- device-wakeup-gpios = <&msmgpio 62 GPIO_ACTIVE_HIGH>;
- shutdown-gpios = <&msmgpio 41 GPIO_ACTIVE_HIGH>;
+ pm8941_l6: l6 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
};
- };

- i2c@f9967000 {
- status = "okay";
- pinctrl-names = "default";
- pinctrl-0 = <&i2c11_pins>;
- clock-frequency = <355000>;
- qcom,src-freq = <50000000>;
+ pm8941_l7: l7 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ };

- led-controller@38 {
- compatible = "ti,lm3630a";
- status = "okay";
- reg = <0x38>;
+ pm8941_l8: l8 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- #address-cells = <1>;
- #size-cells = <0>;
+ pm8941_l9: l9 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };

- led@0 {
- reg = <0>;
- led-sources = <0 1>;
- label = "lcd-backlight";
- default-brightness = <200>;
- };
+ pm8941_l10: l10 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
};
- };

- i2c@f9968000 {
- status = "okay";
- pinctrl-names = "default";
- pinctrl-0 = <&i2c12_pins>;
- clock-frequency = <100000>;
- qcom,src-freq = <50000000>;
-
- mpu6515@68 {
- compatible = "invensense,mpu6515";
- reg = <0x68>;
- interrupts-extended = <&msmgpio 73 IRQ_TYPE_EDGE_FALLING>;
- vddio-supply = <&pm8941_lvs1>;
-
- pinctrl-names = "default";
- pinctrl-0 = <&mpu6515_pin>;
-
- mount-matrix = "0", "-1", "0",
- "-1", "0", "0",
- "0", "0", "1";
-
- i2c-gate {
- #address-cells = <1>;
- #size-cells = <0>;
- ak8963@f {
- compatible = "asahi-kasei,ak8963";
- reg = <0x0f>;
- gpios = <&msmgpio 67 0>;
- vid-supply = <&pm8941_lvs1>;
- vdd-supply = <&pm8941_l17>;
- };
-
- bmp280@76 {
- compatible = "bosch,bmp280";
- reg = <0x76>;
- vdda-supply = <&pm8941_lvs1>;
- vddd-supply = <&pm8941_l17>;
- };
- };
+ pm8941_l11: l11 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
};
- };

- i2c@f9923000 {
- status = "okay";
- pinctrl-names = "default";
- pinctrl-0 = <&i2c1_pins>;
- clock-frequency = <100000>;
- qcom,src-freq = <50000000>;
+ pm8941_l12: l12 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_l13: l13 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l14: l14 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
+ pm8941_l15: l15 {
+ regulator-min-microvolt = <2050000>;
+ regulator-max-microvolt = <2050000>;
+ };

- charger: bq24192@6b {
- compatible = "ti,bq24192";
- reg = <0x6b>;
- interrupts-extended = <&spmi_bus 0 0xd5 0 IRQ_TYPE_EDGE_FALLING>;
+ pm8941_l16: l16 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
+ };

- omit-battery-class;
+ pm8941_l17: l17 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
+ };

- usb_otg_vbus: usb-otg-vbus { };
+ pm8941_l18: l18 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
};

- fuelgauge: max17048@36 {
- compatible = "maxim,max17048";
- reg = <0x36>;
+ pm8941_l19: l19 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3300000>;
+ };

- maxim,double-soc;
- maxim,rcomp = /bits/ 8 <0x4d>;
+ pm8941_l20: l20 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-system-load = <200000>;
+ regulator-allow-set-load;
+ regulator-boot-on;
+ };

- interrupt-parent = <&msmgpio>;
- interrupts = <9 IRQ_TYPE_LEVEL_LOW>;
+ pm8941_l21: l21 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&fuelgauge_pin>;
+ pm8941_l22: l22 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3300000>;
+ };

- maxim,alert-low-soc-level = <2>;
+ pm8941_l23: l23 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3000000>;
};
+
+ pm8941_l24: l24 {
+ regulator-min-microvolt = <3075000>;
+ regulator-max-microvolt = <3075000>;
+ regulator-boot-on;
+ };
+
+ pm8941_lvs1: lvs1 {};
+ pm8941_lvs3: lvs3 {};
};
+};

- i2c@f9924000 {
- status = "okay";
+&sdhc_1 {
+ status = "okay";

- clock-frequency = <355000>;
- qcom,src-freq = <50000000>;
+ vmmc-supply = <&pm8941_l20>;
+ vqmmc-supply = <&pm8941_s3>;

- pinctrl-names = "default";
- pinctrl-0 = <&i2c2_pins>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc1_pin_a>;
+};

- synaptics@70 {
- compatible = "syna,rmi4-i2c";
- reg = <0x70>;
+&sdhc_2 {
+ status = "okay";

- interrupts-extended = <&msmgpio 5 IRQ_TYPE_EDGE_FALLING>;
- vdd-supply = <&pm8941_l22>;
- vio-supply = <&pm8941_lvs3>;
+ max-frequency = <100000000>;
+ vmmc-supply = <&vreg_wlan>;
+ vqmmc-supply = <&pm8941_s3>;
+ non-removable;

- pinctrl-names = "default";
- pinctrl-0 = <&touch_pin>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc2_pin_a>;

- #address-cells = <1>;
- #size-cells = <0>;
+ #address-cells = <1>;
+ #size-cells = <0>;

- rmi4-f01@1 {
- reg = <0x1>;
- syna,nosleep-mode = <1>;
- };
+ bcrmf@1 {
+ compatible = "brcm,bcm4339-fmac", "brcm,bcm4329-fmac";
+ reg = <1>;

- rmi4-f12@12 {
- reg = <0x12>;
- syna,sensor-type = <1>;
- };
+ brcm,drive-strength = <10>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&wlan_sleep_clk_pin>;
+ };
+};
+
+&tlmm {
+ sdhc1_pin_a: sdhc1-pin-active {
+ clk {
+ pins = "sdc1_clk";
+ drive-strength = <16>;
+ bias-disable;
+ };
+
+ cmd-data {
+ pins = "sdc1_cmd", "sdc1_data";
+ drive-strength = <10>;
+ bias-pull-up;
};
};

- i2c@f9925000 {
- status = "okay";
- pinctrl-names = "default";
- pinctrl-0 = <&i2c3_pins>;
- clock-frequency = <100000>;
- qcom,src-freq = <50000000>;
+ sdhc2_pin_a: sdhc2-pin-active {
+ clk {
+ pins = "sdc2_clk";
+ drive-strength = <6>;
+ bias-disable;
+ };

- avago_apds993@39 {
- compatible = "avago,apds9930";
- reg = <0x39>;
- interrupts-extended = <&msmgpio 61 IRQ_TYPE_EDGE_FALLING>;
- vdd-supply = <&pm8941_l17>;
- vddio-supply = <&pm8941_lvs1>;
- led-max-microamp = <100000>;
- amstaos,proximity-diodes = <0>;
+ cmd-data {
+ pins = "sdc2_cmd", "sdc2_data";
+ drive-strength = <6>;
+ bias-pull-up;
};
};

- usb@f9a55000 {
- status = "okay";
+ i2c1_pins: i2c1 {
+ mux {
+ pins = "gpio2", "gpio3";
+ function = "blsp_i2c1";

- phys = <&usb_hs1_phy>;
- phy-select = <&tcsr 0xb000 0>;
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };

- extcon = <&charger>, <&usb_id>;
- vbus-supply = <&usb_otg_vbus>;
+ i2c2_pins: i2c2 {
+ mux {
+ pins = "gpio6", "gpio7";
+ function = "blsp_i2c2";

- hnp-disable;
- srp-disable;
- adp-disable;
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };

- ulpi {
- phy@a {
- status = "okay";
+ i2c3_pins: i2c3 {
+ mux {
+ pins = "gpio10", "gpio11";
+ function = "blsp_i2c3";
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };

- v1p8-supply = <&pm8941_l6>;
- v3p3-supply = <&pm8941_l24>;
+ i2c11_pins: i2c11 {
+ mux {
+ pins = "gpio83", "gpio84";
+ function = "blsp_i2c11";

- qcom,init-seq = /bits/ 8 <0x1 0x64>;
- };
+ drive-strength = <2>;
+ bias-disable;
};
};

- mdss@fd900000 {
- status = "okay";
+ i2c12_pins: i2c12 {
+ mux {
+ pins = "gpio87", "gpio88";
+ function = "blsp_i2c12";
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };

- mdp@fd900000 {
- status = "okay";
+ mpu6515_pin: mpu6515 {
+ irq {
+ pins = "gpio73";
+ function = "gpio";
+ bias-disable;
+ input-enable;
};
+ };

- dsi@fd922800 {
- status = "okay";
+ touch_pin: touch {
+ int {
+ pins = "gpio5";
+ function = "gpio";

- vdda-supply = <&pm8941_l2>;
- vdd-supply = <&pm8941_lvs3>;
- vddio-supply = <&pm8941_l12>;
+ drive-strength = <2>;
+ bias-disable;
+ input-enable;
+ };

- #address-cells = <1>;
- #size-cells = <0>;
+ reset {
+ pins = "gpio8";
+ function = "gpio";

- ports {
- port@1 {
- endpoint {
- remote-endpoint = <&panel_in>;
- data-lanes = <0 1 2 3>;
- };
- };
- };
+ drive-strength = <2>;
+ bias-pull-up;
+ };
+ };

- panel: panel@0 {
- reg = <0>;
- compatible = "lg,acx467akm-7";
+ panel_pin: panel {
+ te {
+ pins = "gpio12";
+ function = "mdp_vsync";

- pinctrl-names = "default";
- pinctrl-0 = <&panel_pin>;
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };

- port {
- panel_in: endpoint {
- remote-endpoint = <&dsi0_out>;
- };
- };
- };
+ bt_pin: bt {
+ hostwake {
+ pins = "gpio42";
+ function = "gpio";
};

- dsi-phy@fd922a00 {
- status = "okay";
+ devwake {
+ pins = "gpio62";
+ function = "gpio";
+ };

- vddio-supply = <&pm8941_l12>;
+ shutdown {
+ pins = "gpio41";
+ function = "gpio";
};
};
-};

-&spmi_bus {
- pm8941@0 {
- gpios@c000 {
- gpio_keys_pin_a: gpio-keys-active {
- pins = "gpio2", "gpio3";
- function = "normal";
+ blsp2_uart4_pin_a: blsp2-uart4-pin-active {
+ tx {
+ pins = "gpio53";
+ function = "blsp_uart10";

- bias-pull-up;
- power-source = <PM8941_GPIO_S3>;
- };
+ drive-strength = <2>;
+ bias-disable;
+ };

- fuelgauge_pin: fuelgauge-int {
- pins = "gpio9";
- function = "normal";
+ rx {
+ pins = "gpio54";
+ function = "blsp_uart10";

- bias-disable;
- input-enable;
- power-source = <PM8941_GPIO_S3>;
- };
-
- wlan_sleep_clk_pin: wl-sleep-clk {
- pins = "gpio16";
- function = "func2";
+ drive-strength = <2>;
+ bias-pull-up;
+ };

- output-high;
- power-source = <PM8941_GPIO_S3>;
- };
+ cts {
+ pins = "gpio55";
+ function = "blsp_uart10";

- wlan_regulator_pin: wl-reg-active {
- pins = "gpio17";
- function = "normal";
+ drive-strength = <2>;
+ bias-pull-up;
+ };

- bias-disable;
- power-source = <PM8941_GPIO_S3>;
- };
+ rts {
+ pins = "gpio56";
+ function = "blsp_uart10";

- otg {
- gpio-hog;
- gpios = <35 GPIO_ACTIVE_HIGH>;
- output-high;
- line-name = "otg-gpio";
- };
+ drive-strength = <2>;
+ bias-disable;
};
};
};
diff --git a/arch/arm/boot/dts/qcom-msm8974-samsung-klte.dts b/arch/arm/boot/dts/qcom-msm8974-samsung-klte.dts
index 96e1c978b878..9ef5a68747f1 100644
--- a/arch/arm/boot/dts/qcom-msm8974-samsung-klte.dts
+++ b/arch/arm/boot/dts/qcom-msm8974-samsung-klte.dts
@@ -13,201 +13,42 @@ / {
aliases {
serial0 = &blsp1_uart1;
mmc0 = &sdhc_1; /* SDC1 eMMC slot */
- mmc1 = &sdhc_2; /* SDC2 SD card slot */
+ mmc1 = &sdhc_3; /* SDC2 SD card slot */
};

chosen {
stdout-path = "serial0:115200n8";
};

- smd {
- rpm {
- rpm_requests {
- pma8084-regulators {
- compatible = "qcom,rpm-pma8084-regulators";
- status = "okay";
-
- pma8084_s1: s1 {
- regulator-min-microvolt = <675000>;
- regulator-max-microvolt = <1050000>;
- regulator-always-on;
- };
-
- pma8084_s2: s2 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- pma8084_s3: s3 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
- };
-
- pma8084_s4: s4 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- pma8084_s5: s5 {
- regulator-min-microvolt = <2150000>;
- regulator-max-microvolt = <2150000>;
- };
-
- pma8084_s6: s6 {
- regulator-min-microvolt = <1050000>;
- regulator-max-microvolt = <1050000>;
- };
-
- pma8084_l1: l1 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- pma8084_l2: l2 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- pma8084_l3: l3 {
- regulator-min-microvolt = <1050000>;
- regulator-max-microvolt = <1200000>;
- };
-
- pma8084_l4: l4 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1225000>;
- };
-
- pma8084_l5: l5 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- pma8084_l6: l6 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- pma8084_l7: l7 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- pma8084_l8: l8 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- pma8084_l9: l9 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- pma8084_l10: l10 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- pma8084_l11: l11 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
- };
-
- pma8084_l12: l12 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- regulator-always-on;
- };
-
- pma8084_l13: l13 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- pma8084_l14: l14 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- pma8084_l15: l15 {
- regulator-min-microvolt = <2050000>;
- regulator-max-microvolt = <2050000>;
- };
-
- pma8084_l16: l16 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- pma8084_l17: l17 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- pma8084_l18: l18 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- pma8084_l19: l19 {
- regulator-min-microvolt = <3300000>;
- regulator-max-microvolt = <3300000>;
- };
-
- pma8084_l20: l20 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-allow-set-load;
- regulator-system-load = <200000>;
- };
-
- pma8084_l21: l21 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-allow-set-load;
- regulator-system-load = <200000>;
- };
-
- pma8084_l22: l22 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3300000>;
- };
-
- pma8084_l23: l23 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3000000>;
- };
-
- pma8084_l24: l24 {
- regulator-min-microvolt = <3075000>;
- regulator-max-microvolt = <3075000>;
- };
-
- pma8084_l25: l25 {
- regulator-min-microvolt = <2100000>;
- regulator-max-microvolt = <2100000>;
- };
-
- pma8084_l26: l26 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2050000>;
- };
-
- pma8084_l27: l27 {
- regulator-min-microvolt = <1000000>;
- regulator-max-microvolt = <1225000>;
- };
-
- pma8084_lvs1: lvs1 {};
- pma8084_lvs2: lvs2 {};
- pma8084_lvs3: lvs3 {};
- pma8084_lvs4: lvs4 {};
-
- pma8084_5vs1: 5vs1 {};
- };
- };
+ gpio-keys {
+ compatible = "gpio-keys";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&gpio_keys_pin_a>;
+
+ volume-down {
+ label = "volume_down";
+ gpios = <&pma8084_gpios 2 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_VOLUMEDOWN>;
+ debounce-interval = <15>;
+ };
+
+ home-key {
+ label = "home_key";
+ gpios = <&pma8084_gpios 3 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_HOMEPAGE>;
+ wakeup-source;
+ debounce-interval = <15>;
+ };
+
+ volume-up {
+ label = "volume_up";
+ gpios = <&pma8084_gpios 5 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_VOLUMEUP>;
+ debounce-interval = <15>;
};
};

@@ -215,8 +56,8 @@ i2c-gpio-touchkey {
compatible = "i2c-gpio";
#address-cells = <1>;
#size-cells = <0>;
- sda-gpios = <&msmgpio 95 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
- scl-gpios = <&msmgpio 96 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
+ sda-gpios = <&tlmm 95 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
+ scl-gpios = <&tlmm 96 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
pinctrl-names = "default";
pinctrl-0 = <&i2c_touchkey_pins>;

@@ -240,8 +81,8 @@ i2c-gpio-led {
compatible = "i2c-gpio";
#address-cells = <1>;
#size-cells = <0>;
- scl-gpios = <&msmgpio 121 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
- sda-gpios = <&msmgpio 120 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
+ scl-gpios = <&tlmm 121 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
+ sda-gpios = <&tlmm 120 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
pinctrl-names = "default";
pinctrl-0 = <&i2c_led_gpioex_pins>;

@@ -259,7 +100,7 @@ gpio_expander: gpio@20 {
pinctrl-names = "default";
pinctrl-0 = <&gpioex_pin>;

- reset-gpios = <&msmgpio 145 GPIO_ACTIVE_LOW>;
+ reset-gpios = <&tlmm 145 GPIO_ACTIVE_LOW>;
};

led-controller@30 {
@@ -315,594 +156,719 @@ vreg_panel: panel-regulator {
};

/delete-node/ vreg-boost;
-
- adsp-pil {
- cx-supply = <&pma8084_s2>;
- };
};

-&soc {
- serial@f991e000 {
- status = "okay";
- };
+&blsp1_i2c2 {
+ status = "okay";

- /* blsp2_uart8 */
- serial@f995e000 {
- status = "okay";
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c2_pins>;

- pinctrl-names = "default", "sleep";
- pinctrl-0 = <&blsp2_uart8_pins_active>;
- pinctrl-1 = <&blsp2_uart8_pins_sleep>;
+ touchscreen@20 {
+ compatible = "syna,rmi4-i2c";
+ reg = <0x20>;

- bluetooth {
- compatible = "brcm,bcm43540-bt";
- max-speed = <3000000>;
- pinctrl-names = "default";
- pinctrl-0 = <&bt_pins>;
- device-wakeup-gpios = <&msmgpio 91 GPIO_ACTIVE_HIGH>;
- shutdown-gpios = <&gpio_expander 9 GPIO_ACTIVE_HIGH>;
- interrupt-parent = <&msmgpio>;
- interrupts = <75 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "host-wakeup";
- };
- };
+ interrupt-parent = <&pma8084_gpios>;
+ interrupts = <8 IRQ_TYPE_EDGE_FALLING>;

- gpio-keys {
- compatible = "gpio-keys";
+ vdd-supply = <&max77826_ldo13>;
+ vio-supply = <&pma8084_lvs2>;

pinctrl-names = "default";
- pinctrl-0 = <&gpio_keys_pin_a>;
+ pinctrl-0 = <&touch_pin>;

- volume-down {
- label = "volume_down";
- gpios = <&pma8084_gpios 2 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_VOLUMEDOWN>;
- debounce-interval = <15>;
- };
+ syna,startup-delay-ms = <100>;

- home-key {
- label = "home_key";
- gpios = <&pma8084_gpios 3 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_HOMEPAGE>;
- wakeup-source;
- debounce-interval = <15>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ rmi4-f01@1 {
+ reg = <0x1>;
+ syna,nosleep-mode = <1>;
};

- volume-up {
- label = "volume_up";
- gpios = <&pma8084_gpios 5 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_VOLUMEUP>;
- debounce-interval = <15>;
+ rmi4-f12@12 {
+ reg = <0x12>;
+ syna,sensor-type = <1>;
};
};
+};

- pinctrl@fd510000 {
- blsp2_uart8_pins_active: blsp2-uart8-pins-active {
- pins = "gpio45", "gpio46", "gpio47", "gpio48";
- function = "blsp_uart8";
- drive-strength = <8>;
- bias-disable;
- };
+&blsp1_i2c6 {
+ status = "okay";

- blsp2_uart8_pins_sleep: blsp2-uart8-pins-sleep {
- pins = "gpio45", "gpio46", "gpio47", "gpio48";
- function = "gpio";
- drive-strength = <2>;
- bias-pull-down;
- };
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c6_pins>;
+
+ pmic@60 {
+ reg = <0x60>;
+ compatible = "maxim,max77826";

- bt_pins: bt-pins {
- hostwake {
- pins = "gpio75";
- function = "gpio";
- drive-strength = <16>;
- input-enable;
+ regulators {
+ max77826_ldo1: LDO1 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
};

- devwake {
- pins = "gpio91";
- function = "gpio";
- drive-strength = <2>;
+ max77826_ldo2: LDO2 {
+ regulator-min-microvolt = <1000000>;
+ regulator-max-microvolt = <1000000>;
};
- };

- sdhc1_pin_a: sdhc1-pin-active {
- clk {
- pins = "sdc1_clk";
- drive-strength = <4>;
- bias-disable;
+ max77826_ldo3: LDO3 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
};

- cmd-data {
- pins = "sdc1_cmd", "sdc1_data";
- drive-strength = <4>;
- bias-pull-up;
+ max77826_ldo4: LDO4 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
};
- };

- sdhc2_pin_a: sdhc2-pin-active {
- clk-cmd-data {
- pins = "gpio35", "gpio36", "gpio37", "gpio38",
- "gpio39", "gpio40";
- function = "sdc3";
- drive-strength = <8>;
- bias-disable;
+ max77826_ldo5: LDO5 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
};
- };

- sdhc2_cd_pin: sdhc2-cd {
- pins = "gpio62";
- function = "gpio";
+ max77826_ldo6: LDO6 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <3300000>;
+ };

- drive-strength = <2>;
- bias-disable;
- };
+ max77826_ldo7: LDO7 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- sdhc3_pin_a: sdhc3-pin-active {
- clk {
- pins = "sdc2_clk";
- drive-strength = <6>;
- bias-disable;
+ max77826_ldo8: LDO8 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <3300000>;
};

- cmd-data {
- pins = "sdc2_cmd", "sdc2_data";
- drive-strength = <6>;
- bias-pull-up;
+ max77826_ldo9: LDO9 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
};
- };

- i2c2_pins: i2c2 {
- mux {
- pins = "gpio6", "gpio7";
- function = "blsp_i2c2";
+ max77826_ldo10: LDO10 {
+ regulator-min-microvolt = <2800000>;
+ regulator-max-microvolt = <2950000>;
+ };

- drive-strength = <2>;
- bias-disable;
+ max77826_ldo11: LDO11 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2950000>;
};
- };

- i2c6_pins: i2c6 {
- mux {
- pins = "gpio29", "gpio30";
- function = "blsp_i2c6";
+ max77826_ldo12: LDO12 {
+ regulator-min-microvolt = <2500000>;
+ regulator-max-microvolt = <3300000>;
+ };

- drive-strength = <2>;
- bias-disable;
+ max77826_ldo13: LDO13 {
+ regulator-min-microvolt = <3300000>;
+ regulator-max-microvolt = <3300000>;
};
- };

- i2c12_pins: i2c12 {
- mux {
- pins = "gpio87", "gpio88";
- function = "blsp_i2c12";
+ max77826_ldo14: LDO14 {
+ regulator-min-microvolt = <3300000>;
+ regulator-max-microvolt = <3300000>;
+ };

- drive-strength = <2>;
- bias-disable;
+ max77826_ldo15: LDO15 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
};
- };

- i2c_touchkey_pins: i2c-touchkey {
- mux {
- pins = "gpio95", "gpio96";
- function = "gpio";
- input-enable;
- bias-pull-up;
+ max77826_buck: BUCK {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
};
- };

- i2c_led_gpioex_pins: i2c-led-gpioex {
- mux {
- pins = "gpio120", "gpio121";
- function = "gpio";
- input-enable;
- bias-pull-down;
+ max77826_buckboost: BUCKBOOST {
+ regulator-min-microvolt = <3400000>;
+ regulator-max-microvolt = <3400000>;
};
};
+ };
+};

- gpioex_pin: gpioex {
- res {
- pins = "gpio145";
- function = "gpio";
+&blsp1_uart2 {
+ status = "okay";
+};

- bias-pull-up;
- drive-strength = <2>;
- };
- };
+&blsp2_i2c6 {
+ status = "okay";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c12_pins>;
+
+ fuelgauge@36 {
+ compatible = "maxim,max17048";
+ reg = <0x36>;
+
+ maxim,double-soc;
+ maxim,rcomp = /bits/ 8 <0x56>;
+
+ interrupt-parent = <&pma8084_gpios>;
+ interrupts = <21 IRQ_TYPE_LEVEL_LOW>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&fuelgauge_pin>;
+ };
+};
+
+&blsp2_uart2 {
+ status = "okay";

- wifi_pin: wifi {
- int {
- pins = "gpio92";
- function = "gpio";
+ pinctrl-names = "default", "sleep";
+ pinctrl-0 = <&blsp2_uart2_pins_active>;
+ pinctrl-1 = <&blsp2_uart2_pins_sleep>;

- input-enable;
- bias-pull-down;
+ bluetooth {
+ compatible = "brcm,bcm43540-bt";
+ max-speed = <3000000>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&bt_pins>;
+ device-wakeup-gpios = <&tlmm 91 GPIO_ACTIVE_HIGH>;
+ shutdown-gpios = <&gpio_expander 9 GPIO_ACTIVE_HIGH>;
+ interrupt-parent = <&tlmm>;
+ interrupts = <75 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "host-wakeup";
+ };
+};
+
+&dsi0 {
+ status = "okay";
+
+ vdda-supply = <&pma8084_l2>;
+ vdd-supply = <&pma8084_l22>;
+ vddio-supply = <&pma8084_l12>;
+
+ panel: panel@0 {
+ reg = <0>;
+ compatible = "samsung,s6e3fa2";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&panel_te_pin &panel_rst_pin>;
+
+ iovdd-supply = <&pma8084_lvs4>;
+ vddr-supply = <&vreg_panel>;
+
+ reset-gpios = <&pma8084_gpios 17 GPIO_ACTIVE_LOW>;
+ te-gpios = <&tlmm 12 GPIO_ACTIVE_HIGH>;
+
+ port {
+ panel_in: endpoint {
+ remote-endpoint = <&dsi0_out>;
};
};
+ };
+};

- panel_te_pin: panel {
- te {
- pins = "gpio12";
- function = "mdp_vsync";
+&dsi0_out {
+ remote-endpoint = <&panel_in>;
+ data-lanes = <0 1 2 3>;
+};

- drive-strength = <2>;
- bias-disable;
- };
+&dsi0_phy {
+ status = "okay";
+
+ vddio-supply = <&pma8084_l12>;
+};
+
+&gpu {
+ status = "okay";
+};
+
+&mdss {
+ status = "okay";
+};
+
+&otg {
+ status = "okay";
+
+ phys = <&usb_hs1_phy>;
+ phy-select = <&tcsr 0xb000 0>;
+
+ hnp-disable;
+ srp-disable;
+ adp-disable;
+
+ ulpi {
+ phy@a {
+ status = "okay";
+
+ v1p8-supply = <&pma8084_l6>;
+ v3p3-supply = <&pma8084_l24>;
+
+ qcom,init-seq = /bits/ 8 <0x1 0x64>;
};
};
+};

- sdhc_1: sdhci@f9824900 {
- status = "okay";
+&pma8084_gpios {
+ gpio_keys_pin_a: gpio-keys-active {
+ pins = "gpio2", "gpio3", "gpio5";
+ function = "normal";

- vmmc-supply = <&pma8084_l20>;
- vqmmc-supply = <&pma8084_s4>;
+ bias-pull-up;
+ power-source = <PMA8084_GPIO_S4>;
+ };

- bus-width = <8>;
- non-removable;
+ touchkey_pin: touchkey-int-pin {
+ pins = "gpio6";
+ function = "normal";
+ bias-disable;
+ input-enable;
+ power-source = <PMA8084_GPIO_S4>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc1_pin_a>;
+ touch_pin: touchscreen-int-pin {
+ pins = "gpio8";
+ function = "normal";
+ bias-disable;
+ input-enable;
+ power-source = <PMA8084_GPIO_S4>;
};

- sdhc_2: sdhci@f9864900 {
- status = "okay";
+ panel_en_pin: panel-en-pin {
+ pins = "gpio14";
+ function = "normal";
+ bias-pull-up;
+ power-source = <PMA8084_GPIO_S4>;
+ qcom,drive-strength = <PMIC_GPIO_STRENGTH_LOW>;
+ };

- max-frequency = <100000000>;
+ wlan_sleep_clk_pin: wlan-sleep-clk-pin {
+ pins = "gpio16";
+ function = "func2";

- vmmc-supply = <&pma8084_l21>;
- vqmmc-supply = <&pma8084_l13>;
+ output-high;
+ power-source = <PMA8084_GPIO_S4>;
+ qcom,drive-strength = <PMIC_GPIO_STRENGTH_HIGH>;
+ };

- bus-width = <4>;
+ panel_rst_pin: panel-rst-pin {
+ pins = "gpio17";
+ function = "normal";
+ bias-disable;
+ power-source = <PMA8084_GPIO_S4>;
+ qcom,drive-strength = <PMIC_GPIO_STRENGTH_LOW>;
+ };

- /* cd-gpio is intentionally disabled. If enabled, an SD card
- * present during boot is not initialized correctly. Without
- * cd-gpios the driver resorts to polling, so hotplug works.
- */
- pinctrl-names = "default";
- pinctrl-0 = <&sdhc2_pin_a /* &sdhc2_cd_pin */>;
- // cd-gpios = <&msmgpio 62 GPIO_ACTIVE_LOW>;
+ fuelgauge_pin: fuelgauge-int-pin {
+ pins = "gpio21";
+ function = "normal";
+ bias-disable;
+ input-enable;
+ power-source = <PMA8084_GPIO_S4>;
};
+};

- sdhci@f98a4900 {
- status = "okay";
+&remoteproc_adsp {
+ status = "okay";
+ cx-supply = <&pma8084_s2>;
+};

- #address-cells = <1>;
- #size-cells = <0>;
+&remoteproc_mss {
+ status = "okay";
+ cx-supply = <&pma8084_s2>;
+ mss-supply = <&pma8084_s6>;
+ mx-supply = <&pma8084_s1>;
+ pll-supply = <&pma8084_l12>;
+};

- max-frequency = <100000000>;
+&rpm_requests {
+ pma8084-regulators {
+ compatible = "qcom,rpm-pma8084-regulators";

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc3_pin_a>;
+ pma8084_s1: s1 {
+ regulator-min-microvolt = <675000>;
+ regulator-max-microvolt = <1050000>;
+ regulator-always-on;
+ };

- vmmc-supply = <&vreg_wlan>;
- vqmmc-supply = <&pma8084_s4>;
+ pma8084_s2: s2 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
+ };

- bus-width = <4>;
- non-removable;
+ pma8084_s3: s3 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
+ };

- wifi@1 {
- reg = <1>;
- compatible = "brcm,bcm4329-fmac";
+ pma8084_s4: s4 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- interrupt-parent = <&msmgpio>;
- interrupts = <92 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "host-wake";
+ pma8084_s5: s5 {
+ regulator-min-microvolt = <2150000>;
+ regulator-max-microvolt = <2150000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&wlan_sleep_clk_pin &wifi_pin>;
+ pma8084_s6: s6 {
+ regulator-min-microvolt = <1050000>;
+ regulator-max-microvolt = <1050000>;
};
- };

- usb@f9a55000 {
- status = "okay";
+ pma8084_l1: l1 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ };

- phys = <&usb_hs1_phy>;
- phy-select = <&tcsr 0xb000 0>;
- /*extcon = <&smbb>, <&usb_id>;*/
- /*vbus-supply = <&chg_otg>;*/
+ pma8084_l2: l2 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ };

- hnp-disable;
- srp-disable;
- adp-disable;
+ pma8084_l3: l3 {
+ regulator-min-microvolt = <1050000>;
+ regulator-max-microvolt = <1200000>;
+ };

- ulpi {
- phy@a {
- status = "okay";
+ pma8084_l4: l4 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1225000>;
+ };

- v1p8-supply = <&pma8084_l6>;
- v3p3-supply = <&pma8084_l24>;
+ pma8084_l5: l5 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- /*extcon = <&smbb>;*/
- qcom,init-seq = /bits/ 8 <0x1 0x64>;
- };
+ pma8084_l6: l6 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
};
- };

- i2c@f9924000 {
- status = "okay";
+ pma8084_l7: l7 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&i2c2_pins>;
+ pma8084_l8: l8 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- touchscreen@20 {
- compatible = "syna,rmi4-i2c";
- reg = <0x20>;
+ pma8084_l9: l9 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };

- interrupt-parent = <&pma8084_gpios>;
- interrupts = <8 IRQ_TYPE_EDGE_FALLING>;
+ pma8084_l10: l10 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };

- vdd-supply = <&max77826_ldo13>;
- vio-supply = <&pma8084_lvs2>;
+ pma8084_l11: l11 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&touch_pin>;
+ pma8084_l12: l12 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ };

- syna,startup-delay-ms = <100>;
+ pma8084_l13: l13 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };

- #address-cells = <1>;
- #size-cells = <0>;
+ pma8084_l14: l14 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- rmi4-f01@1 {
- reg = <0x1>;
- syna,nosleep-mode = <1>;
- };
+ pma8084_l15: l15 {
+ regulator-min-microvolt = <2050000>;
+ regulator-max-microvolt = <2050000>;
+ };

- rmi4-f12@12 {
- reg = <0x12>;
- syna,sensor-type = <1>;
- };
+ pma8084_l16: l16 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
};
- };

- i2c@f9928000 {
- status = "okay";
+ pma8084_l17: l17 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&i2c6_pins>;
-
- pmic@60 {
- reg = <0x60>;
- compatible = "maxim,max77826";
-
- regulators {
- max77826_ldo1: LDO1 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- max77826_ldo2: LDO2 {
- regulator-min-microvolt = <1000000>;
- regulator-max-microvolt = <1000000>;
- };
-
- max77826_ldo3: LDO3 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- max77826_ldo4: LDO4 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- max77826_ldo5: LDO5 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- max77826_ldo6: LDO6 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <3300000>;
- };
-
- max77826_ldo7: LDO7 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- max77826_ldo8: LDO8 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <3300000>;
- };
-
- max77826_ldo9: LDO9 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- max77826_ldo10: LDO10 {
- regulator-min-microvolt = <2800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- max77826_ldo11: LDO11 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2950000>;
- };
-
- max77826_ldo12: LDO12 {
- regulator-min-microvolt = <2500000>;
- regulator-max-microvolt = <3300000>;
- };
-
- max77826_ldo13: LDO13 {
- regulator-min-microvolt = <3300000>;
- regulator-max-microvolt = <3300000>;
- };
-
- max77826_ldo14: LDO14 {
- regulator-min-microvolt = <3300000>;
- regulator-max-microvolt = <3300000>;
- };
-
- max77826_ldo15: LDO15 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- max77826_buck: BUCK {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- max77826_buckboost: BUCKBOOST {
- regulator-min-microvolt = <3400000>;
- regulator-max-microvolt = <3400000>;
- };
- };
+ pma8084_l18: l18 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
};
- };

- i2c@f9968000 {
- status = "okay";
+ pma8084_l19: l19 {
+ regulator-min-microvolt = <3300000>;
+ regulator-max-microvolt = <3300000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&i2c12_pins>;
+ pma8084_l20: l20 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-system-load = <200000>;
+ regulator-allow-set-load;
+ };

- fuelgauge@36 {
- compatible = "maxim,max17048";
- reg = <0x36>;
+ pma8084_l21: l21 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-system-load = <200000>;
+ regulator-allow-set-load;
+ };

- maxim,double-soc;
- maxim,rcomp = /bits/ 8 <0x56>;
+ pma8084_l22: l22 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3300000>;
+ };

- interrupt-parent = <&pma8084_gpios>;
- interrupts = <21 IRQ_TYPE_LEVEL_LOW>;
+ pma8084_l23: l23 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3000000>;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&fuelgauge_pin>;
+ pma8084_l24: l24 {
+ regulator-min-microvolt = <3075000>;
+ regulator-max-microvolt = <3075000>;
};
+
+ pma8084_l25: l25 {
+ regulator-min-microvolt = <2100000>;
+ regulator-max-microvolt = <2100000>;
+ };
+
+ pma8084_l26: l26 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2050000>;
+ };
+
+ pma8084_l27: l27 {
+ regulator-min-microvolt = <1000000>;
+ regulator-max-microvolt = <1225000>;
+ };
+
+ pma8084_lvs1: lvs1 {};
+ pma8084_lvs2: lvs2 {};
+ pma8084_lvs3: lvs3 {};
+ pma8084_lvs4: lvs4 {};
+
+ pma8084_5vs1: 5vs1 {};
};
+};
+
+&sdhc_1 {
+ status = "okay";
+
+ vmmc-supply = <&pma8084_l20>;
+ vqmmc-supply = <&pma8084_s4>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc1_pin_a>;
+};
+
+&sdhc_2 {
+ status = "okay";
+ max-frequency = <100000000>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc3_pin_a>;
+
+ vmmc-supply = <&vreg_wlan>;
+ vqmmc-supply = <&pma8084_s4>;
+
+ non-removable;

- adreno@fdb00000 {
- status = "ok";
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ wifi@1 {
+ reg = <1>;
+ compatible = "brcm,bcm4329-fmac";
+
+ interrupt-parent = <&tlmm>;
+ interrupts = <92 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "host-wake";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&wlan_sleep_clk_pin &wifi_pin>;
};
+};

- mdss@fd900000 {
- status = "ok";
+&sdhc_3 {
+ status = "okay";
+ max-frequency = <100000000>;
+
+ vmmc-supply = <&pma8084_l21>;
+ vqmmc-supply = <&pma8084_l13>;
+
+ /*
+ * cd-gpio is intentionally disabled. If enabled, an SD card
+ * present during boot is not initialized correctly. Without
+ * cd-gpios the driver resorts to polling, so hotplug works.
+ */
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc2_pin_a /* &sdhc2_cd_pin */>;
+ /* cd-gpios = <&tlmm 62 GPIO_ACTIVE_LOW>; */
+};

- mdp@fd900000 {
- status = "ok";
- };
+&tlmm {
+ blsp2_uart2_pins_active: blsp2-uart2-pins-active {
+ pins = "gpio45", "gpio46", "gpio47", "gpio48";
+ function = "blsp_uart8";
+ drive-strength = <8>;
+ bias-disable;
+ };

- dsi@fd922800 {
- status = "ok";
+ blsp2_uart2_pins_sleep: blsp2-uart2-pins-sleep {
+ pins = "gpio45", "gpio46", "gpio47", "gpio48";
+ function = "gpio";
+ drive-strength = <2>;
+ bias-pull-down;
+ };

- vdda-supply = <&pma8084_l2>;
- vdd-supply = <&pma8084_l22>;
- vddio-supply = <&pma8084_l12>;
+ bt_pins: bt-pins {
+ hostwake {
+ pins = "gpio75";
+ function = "gpio";
+ drive-strength = <16>;
+ input-enable;
+ };

- #address-cells = <1>;
- #size-cells = <0>;
+ devwake {
+ pins = "gpio91";
+ function = "gpio";
+ drive-strength = <2>;
+ };
+ };

- ports {
- port@1 {
- endpoint {
- remote-endpoint = <&panel_in>;
- data-lanes = <0 1 2 3>;
- };
- };
- };
+ sdhc1_pin_a: sdhc1-pin-active {
+ clk {
+ pins = "sdc1_clk";
+ drive-strength = <4>;
+ bias-disable;
+ };

- panel: panel@0 {
- reg = <0>;
- compatible = "samsung,s6e3fa2";
+ cmd-data {
+ pins = "sdc1_cmd", "sdc1_data";
+ drive-strength = <4>;
+ bias-pull-up;
+ };
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&panel_te_pin &panel_rst_pin>;
+ sdhc2_pin_a: sdhc2-pin-active {
+ clk-cmd-data {
+ pins = "gpio35", "gpio36", "gpio37", "gpio38",
+ "gpio39", "gpio40";
+ function = "sdc3";
+ drive-strength = <8>;
+ bias-disable;
+ };
+ };

- iovdd-supply = <&pma8084_lvs4>;
- vddr-supply = <&vreg_panel>;
+ sdhc2_cd_pin: sdhc2-cd {
+ pins = "gpio62";
+ function = "gpio";

- reset-gpios = <&pma8084_gpios 17 GPIO_ACTIVE_LOW>;
- te-gpios = <&msmgpio 12 GPIO_ACTIVE_HIGH>;
+ drive-strength = <2>;
+ bias-disable;
+ };

- port {
- panel_in: endpoint {
- remote-endpoint = <&dsi0_out>;
- };
- };
- };
+ sdhc3_pin_a: sdhc3-pin-active {
+ clk {
+ pins = "sdc2_clk";
+ drive-strength = <6>;
+ bias-disable;
+ };
+
+ cmd-data {
+ pins = "sdc2_cmd", "sdc2_data";
+ drive-strength = <6>;
+ bias-pull-up;
};
+ };

- dsi-phy@fd922a00 {
- status = "ok";
+ i2c2_pins: i2c2 {
+ mux {
+ pins = "gpio6", "gpio7";
+ function = "blsp_i2c2";

- vddio-supply = <&pma8084_l12>;
+ drive-strength = <2>;
+ bias-disable;
};
};

- remoteproc@fc880000 {
- cx-supply = <&pma8084_s2>;
- mss-supply = <&pma8084_s6>;
- mx-supply = <&pma8084_s1>;
- pll-supply = <&pma8084_l12>;
+ i2c6_pins: i2c6 {
+ mux {
+ pins = "gpio29", "gpio30";
+ function = "blsp_i2c6";
+
+ drive-strength = <2>;
+ bias-disable;
+ };
};
-};

-&spmi_bus {
- pma8084@0 {
- gpios@c000 {
- gpio_keys_pin_a: gpio-keys-active {
- pins = "gpio2", "gpio3", "gpio5";
- function = "normal";
+ i2c12_pins: i2c12 {
+ mux {
+ pins = "gpio87", "gpio88";
+ function = "blsp_i2c12";

- bias-pull-up;
- power-source = <PMA8084_GPIO_S4>;
- };
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };

- touchkey_pin: touchkey-int-pin {
- pins = "gpio6";
- function = "normal";
- bias-disable;
- input-enable;
- power-source = <PMA8084_GPIO_S4>;
- };
+ i2c_touchkey_pins: i2c-touchkey {
+ mux {
+ pins = "gpio95", "gpio96";
+ function = "gpio";
+ input-enable;
+ bias-pull-up;
+ };
+ };

- touch_pin: touchscreen-int-pin {
- pins = "gpio8";
- function = "normal";
- bias-disable;
- input-enable;
- power-source = <PMA8084_GPIO_S4>;
- };
+ i2c_led_gpioex_pins: i2c-led-gpioex {
+ mux {
+ pins = "gpio120", "gpio121";
+ function = "gpio";
+ input-enable;
+ bias-pull-down;
+ };
+ };

- panel_en_pin: panel-en-pin {
- pins = "gpio14";
- function = "normal";
- bias-pull-up;
- power-source = <PMA8084_GPIO_S4>;
- qcom,drive-strength = <PMIC_GPIO_STRENGTH_LOW>;
- };
+ gpioex_pin: gpioex {
+ res {
+ pins = "gpio145";
+ function = "gpio";

- wlan_sleep_clk_pin: wlan-sleep-clk-pin {
- pins = "gpio16";
- function = "func2";
+ bias-pull-up;
+ drive-strength = <2>;
+ };
+ };

- output-high;
- power-source = <PMA8084_GPIO_S4>;
- qcom,drive-strength = <PMIC_GPIO_STRENGTH_HIGH>;
- };
+ wifi_pin: wifi {
+ int {
+ pins = "gpio92";
+ function = "gpio";

- panel_rst_pin: panel-rst-pin {
- pins = "gpio17";
- function = "normal";
- bias-disable;
- power-source = <PMA8084_GPIO_S4>;
- qcom,drive-strength = <PMIC_GPIO_STRENGTH_LOW>;
- };
+ input-enable;
+ bias-pull-down;
+ };
+ };

+ panel_te_pin: panel {
+ te {
+ pins = "gpio12";
+ function = "mdp_vsync";

- fuelgauge_pin: fuelgauge-int-pin {
- pins = "gpio21";
- function = "normal";
- bias-disable;
- input-enable;
- power-source = <PMA8084_GPIO_S4>;
- };
+ drive-strength = <2>;
+ bias-disable;
};
};
};
diff --git a/arch/arm/boot/dts/qcom-msm8974-sony-xperia-amami.dts b/arch/arm/boot/dts/qcom-msm8974-sony-xperia-amami.dts
index 79e2cfbbb1ba..68d5626bf491 100644
--- a/arch/arm/boot/dts/qcom-msm8974-sony-xperia-amami.dts
+++ b/arch/arm/boot/dts/qcom-msm8974-sony-xperia-amami.dts
@@ -1,435 +1,13 @@
// SPDX-License-Identifier: GPL-2.0
-#include "qcom-msm8974.dtsi"
-#include "qcom-pm8841.dtsi"
-#include "qcom-pm8941.dtsi"
-#include <dt-bindings/gpio/gpio.h>
-#include <dt-bindings/input/input.h>
-#include <dt-bindings/pinctrl/qcom,pmic-gpio.h>
+#include "qcom-msm8974-sony-xperia-rhine.dtsi"

/ {
model = "Sony Xperia Z1 Compact";
compatible = "sony,xperia-amami", "qcom,msm8974";
-
- aliases {
- serial0 = &blsp1_uart2;
- };
-
- chosen {
- stdout-path = "serial0:115200n8";
- };
-
- gpio-keys {
- compatible = "gpio-keys";
-
- pinctrl-names = "default";
- pinctrl-0 = <&gpio_keys_pin_a>;
-
- volume-down {
- label = "volume_down";
- gpios = <&pm8941_gpios 2 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_VOLUMEDOWN>;
- };
-
- camera-snapshot {
- label = "camera_snapshot";
- gpios = <&pm8941_gpios 3 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_CAMERA>;
- };
-
- camera-focus {
- label = "camera_focus";
- gpios = <&pm8941_gpios 4 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_CAMERA_FOCUS>;
- };
-
- volume-up {
- label = "volume_up";
- gpios = <&pm8941_gpios 5 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_VOLUMEUP>;
- };
- };
-
- memory@0 {
- reg = <0 0x40000000>, <0x40000000 0x40000000>;
- device_type = "memory";
- };
-
- smd {
- rpm {
- rpm_requests {
- pm8841-regulators {
- s1 {
- regulator-min-microvolt = <675000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s2 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s3 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s4 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
- };
-
- pm8941-regulators {
- vdd_l1_l3-supply = <&pm8941_s1>;
- vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
- vdd_l4_l11-supply = <&pm8941_s1>;
- vdd_l5_l7-supply = <&pm8941_s2>;
- vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
- vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
- vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
- vdd_l21-supply = <&vreg_boost>;
-
- s1 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
- regulator-always-on;
- regulator-boot-on;
- };
-
- s2 {
- regulator-min-microvolt = <2150000>;
- regulator-max-microvolt = <2150000>;
- regulator-boot-on;
- };
-
- s3 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- regulator-always-on;
- regulator-boot-on;
- };
-
- s4 {
- regulator-min-microvolt = <5000000>;
- regulator-max-microvolt = <5000000>;
- };
-
- l1 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l2 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l3 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l4 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l5 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l6 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l7 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l8 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l9 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- l11 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1350000>;
- };
-
- l12 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l13 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l14 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l15 {
- regulator-min-microvolt = <2050000>;
- regulator-max-microvolt = <2050000>;
- };
-
- l16 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l17 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l18 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l19 {
- regulator-min-microvolt = <3300000>;
- regulator-max-microvolt = <3300000>;
- };
-
- l20 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-allow-set-load;
- regulator-boot-on;
- regulator-system-load = <200000>;
- };
-
- l21 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l22 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3000000>;
- };
-
- l23 {
- regulator-min-microvolt = <2800000>;
- regulator-max-microvolt = <2800000>;
- };
-
- l24 {
- regulator-min-microvolt = <3075000>;
- regulator-max-microvolt = <3075000>;
-
- regulator-boot-on;
- };
- };
- };
- };
- };
};

-&soc {
- sdhci@f9824900 {
- status = "okay";
-
- vmmc-supply = <&pm8941_l20>;
- vqmmc-supply = <&pm8941_s3>;
-
- bus-width = <8>;
- non-removable;
-
- pinctrl-names = "default";
- pinctrl-0 = <&sdhc1_pin_a>;
- };
-
- sdhci@f98a4900 {
- status = "okay";
-
- bus-width = <4>;
-
- vmmc-supply = <&pm8941_l21>;
- vqmmc-supply = <&pm8941_l13>;
-
- cd-gpios = <&msmgpio 62 GPIO_ACTIVE_LOW>;
-
- pinctrl-names = "default";
- pinctrl-0 = <&sdhc2_pin_a>, <&sdhc2_cd_pin_a>;
- };
-
- serial@f991e000 {
- status = "okay";
-
- pinctrl-names = "default";
- pinctrl-0 = <&blsp1_uart2_pin_a>;
- };
-
-
- pinctrl@fd510000 {
- blsp1_uart2_pin_a: blsp1-uart2-pin-active {
- rx {
- pins = "gpio5";
- function = "blsp_uart2";
-
- drive-strength = <2>;
- bias-pull-up;
- };
-
- tx {
- pins = "gpio4";
- function = "blsp_uart2";
-
- drive-strength = <4>;
- bias-disable;
- };
- };
-
- i2c2_pins: i2c2 {
- mux {
- pins = "gpio6", "gpio7";
- function = "blsp_i2c2";
-
- drive-strength = <2>;
- bias-disable;
- };
- };
-
- sdhc1_pin_a: sdhc1-pin-active {
- clk {
- pins = "sdc1_clk";
- drive-strength = <16>;
- bias-disable;
- };
-
- cmd-data {
- pins = "sdc1_cmd", "sdc1_data";
- drive-strength = <10>;
- bias-pull-up;
- };
- };
-
- sdhc2_cd_pin_a: sdhc2-cd-pin-active {
- pins = "gpio62";
- function = "gpio";
-
- drive-strength = <2>;
- bias-disable;
- };
-
- sdhc2_pin_a: sdhc2-pin-active {
- clk {
- pins = "sdc2_clk";
- drive-strength = <10>;
- bias-disable;
- };
-
- cmd-data {
- pins = "sdc2_cmd", "sdc2_data";
- drive-strength = <6>;
- bias-pull-up;
- };
- };
- };
-
- dma-controller@f9944000 {
- qcom,controlled-remotely;
- };
-
- usb@f9a55000 {
- status = "okay";
-
- phys = <&usb_hs1_phy>;
- phy-select = <&tcsr 0xb000 0>;
- extcon = <&smbb>, <&usb_id>;
- vbus-supply = <&chg_otg>;
-
- hnp-disable;
- srp-disable;
- adp-disable;
-
- ulpi {
- phy@a {
- status = "okay";
-
- v1p8-supply = <&pm8941_l6>;
- v3p3-supply = <&pm8941_l24>;
-
- extcon = <&smbb>;
- qcom,init-seq = /bits/ 8 <0x1 0x64>;
- };
- };
- };
-};
-
-&spmi_bus {
- pm8941@0 {
- charger@1000 {
- qcom,fast-charge-safe-current = <1300000>;
- qcom,fast-charge-current-limit = <1300000>;
- qcom,dc-current-limit = <1300000>;
- qcom,fast-charge-safe-voltage = <4400000>;
- qcom,fast-charge-high-threshold-voltage = <4350000>;
- qcom,fast-charge-low-threshold-voltage = <3400000>;
- qcom,auto-recharge-threshold-voltage = <4200000>;
- qcom,minimum-input-voltage = <4300000>;
- };
-
- gpios@c000 {
- gpio_keys_pin_a: gpio-keys-active {
- pins = "gpio2", "gpio3", "gpio4", "gpio5";
- function = "normal";
-
- bias-pull-up;
- power-source = <PM8941_GPIO_S3>;
- };
- };
-
- coincell@2800 {
- status = "okay";
- qcom,rset-ohms = <2100>;
- qcom,vset-millivolts = <3000>;
- };
- };
-
- pm8941@1 {
- wled@d800 {
- status = "okay";
-
- qcom,cs-out;
- qcom,current-limit = <20>;
- qcom,current-boost-limit = <805>;
- qcom,switching-freq = <1600>;
- qcom,ovp = <29>;
- qcom,num-strings = <2>;
- };
- };
+&smbb {
+ qcom,fast-charge-safe-current = <1300000>;
+ qcom,fast-charge-current-limit = <1300000>;
+ qcom,dc-current-limit = <1300000>;
};
diff --git a/arch/arm/boot/dts/qcom-msm8974-sony-xperia-castor.dts b/arch/arm/boot/dts/qcom-msm8974-sony-xperia-castor.dts
index e66937e3f7dd..b8b6447b3217 100644
--- a/arch/arm/boot/dts/qcom-msm8974-sony-xperia-castor.dts
+++ b/arch/arm/boot/dts/qcom-msm8974-sony-xperia-castor.dts
@@ -11,7 +11,7 @@ / {

aliases {
serial0 = &blsp1_uart2;
- serial1 = &blsp2_uart7;
+ serial1 = &blsp2_uart1;
};

chosen {
@@ -53,193 +53,13 @@ volume-up {
};
};

- smd {
- rpm {
- rpm_requests {
- pm8941-regulators {
- vdd_l1_l3-supply = <&pm8941_s1>;
- vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
- vdd_l4_l11-supply = <&pm8941_s1>;
- vdd_l5_l7-supply = <&pm8941_s2>;
- vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
- vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
- vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
- vdd_l21-supply = <&vreg_boost>;
-
- s1 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
- regulator-always-on;
- regulator-boot-on;
- };
-
- s2 {
- regulator-min-microvolt = <2150000>;
- regulator-max-microvolt = <2150000>;
- regulator-boot-on;
- };
-
- s3 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- regulator-always-on;
- regulator-boot-on;
-
- regulator-system-load = <154000>;
- };
-
- s4 {
- regulator-min-microvolt = <5000000>;
- regulator-max-microvolt = <5000000>;
- };
-
- l1 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l2 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l3 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l4 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l5 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l6 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l7 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l8 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l9 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- l11 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1350000>;
- };
-
- l12 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l13 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l14 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l15 {
- regulator-min-microvolt = <2050000>;
- regulator-max-microvolt = <2050000>;
- };
-
- l16 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l17 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l18 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l19 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l20 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-allow-set-load;
- regulator-boot-on;
- regulator-allow-set-load;
- regulator-system-load = <500000>;
- };
-
- l21 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l22 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3000000>;
- };
-
- l23 {
- regulator-min-microvolt = <2800000>;
- regulator-max-microvolt = <2800000>;
- };
-
- l24 {
- regulator-min-microvolt = <3075000>;
- regulator-max-microvolt = <3075000>;
-
- regulator-boot-on;
- };
- };
- };
- };
- };
-
vreg_bl_vddio: lcd-backlight-vddio {
compatible = "regulator-fixed";
regulator-name = "vreg_bl_vddio";
regulator-min-microvolt = <3150000>;
regulator-max-microvolt = <3150000>;

- gpio = <&msmgpio 69 0>;
+ gpio = <&tlmm 69 0>;
enable-active-high;

vin-supply = <&pm8941_s3>;
@@ -277,447 +97,605 @@ vreg_wlan: wlan-regulator {
};
};

-&soc {
- sdhci@f9824900 {
- status = "okay";
+&blsp1_uart2 {
+ status = "okay";

- vmmc-supply = <&pm8941_l20>;
- vqmmc-supply = <&pm8941_s3>;
-
- bus-width = <8>;
- non-removable;
+ pinctrl-names = "default";
+ pinctrl-0 = <&blsp1_uart2_pin_a>;
+};

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc1_pin_a>;
- };
+&blsp2_i2c2 {
+ status = "okay";
+ clock-frequency = <355000>;

- sdhci@f9864900 {
- status = "okay";
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c8_pins>;

- max-frequency = <100000000>;
- non-removable;
- vmmc-supply = <&vreg_wlan>;
+ synaptics@2c {
+ compatible = "syna,rmi4-i2c";
+ reg = <0x2c>;

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc3_pin_a>;
+ interrupt-parent = <&tlmm>;
+ interrupts = <86 IRQ_TYPE_EDGE_FALLING>;

#address-cells = <1>;
#size-cells = <0>;

- bcrmf@1 {
- compatible = "brcm,bcm4339-fmac", "brcm,bcm4329-fmac";
- reg = <1>;
+ vdd-supply = <&pm8941_l22>;
+ vio-supply = <&pm8941_lvs3>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&ts_int_pin>;

- brcm,drive-strength = <10>;
+ syna,startup-delay-ms = <10>;

- pinctrl-names = "default";
- pinctrl-0 = <&wlan_sleep_clk_pin>;
+ rmi-f01@1 {
+ reg = <0x1>;
+ syna,nosleep = <1>;
};
- };

- sdhci@f98a4900 {
- status = "okay";
+ rmi-f11@11 {
+ reg = <0x11>;
+ syna,f11-flip-x = <1>;
+ syna,sensor-type = <1>;
+ };
+ };
+};

- bus-width = <4>;
+&blsp2_i2c5 {
+ status = "okay";
+ clock-frequency = <355000>;

- vmmc-supply = <&pm8941_l21>;
- vqmmc-supply = <&pm8941_l13>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c11_pins>;

- cd-gpios = <&msmgpio 62 GPIO_ACTIVE_LOW>;
+ lp8566_wled: backlight@2c {
+ compatible = "ti,lp8556";
+ reg = <0x2c>;
+ power-supply = <&vreg_bl_vddio>;

- pinctrl-names = "default";
- pinctrl-0 = <&sdhc2_pin_a>, <&sdhc2_cd_pin_a>;
+ bl-name = "backlight";
+ dev-ctrl = /bits/ 8 <0x05>;
+ init-brt = /bits/ 8 <0x3f>;
+ rom_a0h {
+ rom-addr = /bits/ 8 <0xa0>;
+ rom-val = /bits/ 8 <0xff>;
+ };
+ rom_a1h {
+ rom-addr = /bits/ 8 <0xa1>;
+ rom-val = /bits/ 8 <0x3f>;
+ };
+ rom_a2h {
+ rom-addr = /bits/ 8 <0xa2>;
+ rom-val = /bits/ 8 <0x20>;
+ };
+ rom_a3h {
+ rom-addr = /bits/ 8 <0xa3>;
+ rom-val = /bits/ 8 <0x5e>;
+ };
+ rom_a4h {
+ rom-addr = /bits/ 8 <0xa4>;
+ rom-val = /bits/ 8 <0x02>;
+ };
+ rom_a5h {
+ rom-addr = /bits/ 8 <0xa5>;
+ rom-val = /bits/ 8 <0x04>;
+ };
+ rom_a6h {
+ rom-addr = /bits/ 8 <0xa6>;
+ rom-val = /bits/ 8 <0x80>;
+ };
+ rom_a7h {
+ rom-addr = /bits/ 8 <0xa7>;
+ rom-val = /bits/ 8 <0xf7>;
+ };
+ rom_a9h {
+ rom-addr = /bits/ 8 <0xa9>;
+ rom-val = /bits/ 8 <0x80>;
+ };
+ rom_aah {
+ rom-addr = /bits/ 8 <0xaa>;
+ rom-val = /bits/ 8 <0x0f>;
+ };
+ rom_aeh {
+ rom-addr = /bits/ 8 <0xae>;
+ rom-val = /bits/ 8 <0x0f>;
+ };
};
+};
+
+&blsp2_uart1 {
+ status = "okay";

- serial@f991e000 {
- status = "okay";
+ pinctrl-names = "default";
+ pinctrl-0 = <&blsp2_uart7_pin_a>;
+
+ bluetooth {
+ compatible = "brcm,bcm43438-bt";
+ max-speed = <3000000>;

pinctrl-names = "default";
- pinctrl-0 = <&blsp1_uart2_pin_a>;
+ pinctrl-0 = <&bt_host_wake_pin>,
+ <&bt_dev_wake_pin>,
+ <&bt_reg_on_pin>;
+
+ host-wakeup-gpios = <&tlmm 95 GPIO_ACTIVE_HIGH>;
+ device-wakeup-gpios = <&tlmm 96 GPIO_ACTIVE_HIGH>;
+ shutdown-gpios = <&pm8941_gpios 16 GPIO_ACTIVE_HIGH>;
};
+};

- serial@f995d000 {
- status = "ok";
+&otg {
+ status = "okay";

- pinctrl-names = "default";
- pinctrl-0 = <&blsp2_uart7_pin_a>;
+ phys = <&usb_hs1_phy>;
+ phy-select = <&tcsr 0xb000 0>;
+ extcon = <&smbb>, <&usb_id>;
+ vbus-supply = <&chg_otg>;

- bluetooth {
- compatible = "brcm,bcm43438-bt";
- max-speed = <3000000>;
+ hnp-disable;
+ srp-disable;
+ adp-disable;

- pinctrl-names = "default";
- pinctrl-0 = <&bt_host_wake_pin>,
- <&bt_dev_wake_pin>,
- <&bt_reg_on_pin>;
+ ulpi {
+ phy@a {
+ status = "okay";

- host-wakeup-gpios = <&msmgpio 95 GPIO_ACTIVE_HIGH>;
- device-wakeup-gpios = <&msmgpio 96 GPIO_ACTIVE_HIGH>;
- shutdown-gpios = <&pm8941_gpios 16 GPIO_ACTIVE_HIGH>;
+ v1p8-supply = <&pm8941_l6>;
+ v3p3-supply = <&pm8941_l24>;
+
+ extcon = <&smbb>;
+ qcom,init-seq = /bits/ 8 <0x1 0x64>;
};
};
+};

- usb@f9a55000 {
- status = "okay";
+&pm8941_coincell {
+ status = "okay";

- phys = <&usb_hs1_phy>;
- phy-select = <&tcsr 0xb000 0>;
- extcon = <&smbb>, <&usb_id>;
- vbus-supply = <&chg_otg>;
+ qcom,rset-ohms = <2100>;
+ qcom,vset-millivolts = <3000>;
+};

- hnp-disable;
- srp-disable;
- adp-disable;
+&pm8941_gpios {
+ gpio_keys_pin_a: gpio-keys-active {
+ pins = "gpio2", "gpio5";
+ function = "normal";

- ulpi {
- phy@a {
- status = "okay";
+ bias-pull-up;
+ power-source = <PM8941_GPIO_S3>;
+ };

- v1p8-supply = <&pm8941_l6>;
- v3p3-supply = <&pm8941_l24>;
+ bt_reg_on_pin: bt-reg-on {
+ pins = "gpio16";
+ function = "normal";

- extcon = <&smbb>;
- qcom,init-seq = /bits/ 8 <0x1 0x64>;
- };
- };
+ output-low;
+ power-source = <PM8941_GPIO_S3>;
};

- pinctrl@fd510000 {
- blsp1_uart2_pin_a: blsp1-uart2-pin-active {
- rx {
- pins = "gpio5";
- function = "blsp_uart2";
+ wlan_sleep_clk_pin: wl-sleep-clk {
+ pins = "gpio17";
+ function = "func2";

- drive-strength = <2>;
- bias-pull-up;
- };
+ output-high;
+ power-source = <PM8941_GPIO_S3>;
+ };

- tx {
- pins = "gpio4";
- function = "blsp_uart2";
+ wlan_regulator_pin: wl-reg-active {
+ pins = "gpio18";
+ function = "normal";

- drive-strength = <4>;
- bias-disable;
- };
- };
+ bias-disable;
+ power-source = <PM8941_GPIO_S3>;
+ };

- blsp2_uart7_pin_a: blsp2-uart7-pin-active {
- tx {
- pins = "gpio41";
- function = "blsp_uart7";
+ lcd_dcdc_en_pin_a: lcd-dcdc-en-active {
+ pins = "gpio20";
+ function = "normal";

- drive-strength = <2>;
- bias-disable;
- };
+ bias-disable;
+ power-source = <PM8941_GPIO_S3>;
+ input-disable;
+ output-low;
+ };

- rx {
- pins = "gpio42";
- function = "blsp_uart7";
+};

- drive-strength = <2>;
- bias-pull-up;
- };
+&rpm_requests {
+ pm8941-regulators {
+ compatible = "qcom,rpm-pm8941-regulators";
+
+ vdd_l1_l3-supply = <&pm8941_s1>;
+ vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
+ vdd_l4_l11-supply = <&pm8941_s1>;
+ vdd_l5_l7-supply = <&pm8941_s2>;
+ vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
+ vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
+ vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
+ vdd_l21-supply = <&vreg_boost>;
+
+ pm8941_s1: s1 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };

- cts {
- pins = "gpio43";
- function = "blsp_uart7";
+ pm8941_s2: s2 {
+ regulator-min-microvolt = <2150000>;
+ regulator-max-microvolt = <2150000>;
+ regulator-boot-on;
+ };

- drive-strength = <2>;
- bias-pull-up;
- };
+ pm8941_s3: s3 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-system-load = <154000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };

- rts {
- pins = "gpio44";
- function = "blsp_uart7";
+ pm8941_s4: s4 {
+ regulator-min-microvolt = <5000000>;
+ regulator-max-microvolt = <5000000>;
+ };

- drive-strength = <2>;
- bias-disable;
- };
+ pm8941_l1: l1 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ regulator-always-on;
+ regulator-boot-on;
};

- i2c8_pins: i2c8 {
- mux {
- pins = "gpio47", "gpio48";
- function = "blsp_i2c8";
+ pm8941_l2: l2 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ };

- drive-strength = <2>;
- bias-disable;
- };
+ pm8941_l3: l3 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
};

- i2c11_pins: i2c11 {
- mux {
- pins = "gpio83", "gpio84";
- function = "blsp_i2c11";
+ pm8941_l4: l4 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ };

- drive-strength = <2>;
- bias-disable;
- };
+ pm8941_l5: l5 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
};

- lcd_backlight_en_pin_a: lcd-backlight-vddio {
- pins = "gpio69";
- drive-strength = <10>;
- output-low;
- bias-disable;
+ pm8941_l6: l6 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
};

- sdhc1_pin_a: sdhc1-pin-active {
- clk {
- pins = "sdc1_clk";
- drive-strength = <16>;
- bias-disable;
- };
+ pm8941_l7: l7 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ };

- cmd-data {
- pins = "sdc1_cmd", "sdc1_data";
- drive-strength = <10>;
- bias-pull-up;
- };
+ pm8941_l8: l8 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
};

- sdhc2_cd_pin_a: sdhc2-cd-pin-active {
- pins = "gpio62";
- function = "gpio";
+ pm8941_l9: l9 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };

- drive-strength = <2>;
- bias-disable;
- };
+ pm8941_l11: l11 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1350000>;
+ };

- sdhc2_pin_a: sdhc2-pin-active {
- clk {
- pins = "sdc2_clk";
- drive-strength = <6>;
- bias-disable;
- };
+ pm8941_l12: l12 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };

- cmd-data {
- pins = "sdc2_cmd", "sdc2_data";
- drive-strength = <6>;
- bias-pull-up;
- };
+ pm8941_l13: l13 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
};

- sdhc3_pin_a: sdhc3-pin-active {
- clk {
- pins = "gpio40";
- function = "sdc3";
+ pm8941_l14: l14 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };

- drive-strength = <10>;
- bias-disable;
- };
+ pm8941_l15: l15 {
+ regulator-min-microvolt = <2050000>;
+ regulator-max-microvolt = <2050000>;
+ };

- cmd {
- pins = "gpio39";
- function = "sdc3";
+ pm8941_l16: l16 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
+ };

- drive-strength = <10>;
- bias-pull-up;
- };
+ pm8941_l17: l17 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
+ };

- data {
- pins = "gpio35", "gpio36", "gpio37", "gpio38";
- function = "sdc3";
+ pm8941_l18: l18 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
+ };

- drive-strength = <10>;
- bias-pull-up;
- };
+ pm8941_l19: l19 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
};

- ts_int_pin: synaptics {
- pin {
- pins = "gpio86";
- function = "gpio";
+ pm8941_l20: l20 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-system-load = <500000>;
+ regulator-allow-set-load;
+ regulator-boot-on;
+ };

- drive-strength = <2>;
- bias-disable;
- input-enable;
- };
+ pm8941_l21: l21 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
};

- bt_host_wake_pin: bt-host-wake {
- pins = "gpio95";
- function = "gpio";
+ pm8941_l22: l22 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3000000>;
+ };

- drive-strength = <2>;
- bias-disable;
- output-low;
+ pm8941_l23: l23 {
+ regulator-min-microvolt = <2800000>;
+ regulator-max-microvolt = <2800000>;
};

- bt_dev_wake_pin: bt-dev-wake {
- pins = "gpio96";
- function = "gpio";
+ pm8941_l24: l24 {
+ regulator-min-microvolt = <3075000>;
+ regulator-max-microvolt = <3075000>;
+ regulator-boot-on;
+ };
+
+ pm8941_lvs3: lvs3 {};
+ };
+};
+
+&sdhc_1 {
+ status = "okay";
+
+ vmmc-supply = <&pm8941_l20>;
+ vqmmc-supply = <&pm8941_s3>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc1_pin_a>;
+};
+
+&sdhc_2 {
+ status = "okay";
+
+ vmmc-supply = <&pm8941_l21>;
+ vqmmc-supply = <&pm8941_l13>;
+
+ cd-gpios = <&tlmm 62 GPIO_ACTIVE_LOW>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc2_pin_a>, <&sdhc2_cd_pin_a>;
+};
+
+&sdhc_3 {
+ status = "okay";
+
+ max-frequency = <100000000>;
+ vmmc-supply = <&vreg_wlan>;
+ non-removable;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc3_pin_a>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ bcrmf@1 {
+ compatible = "brcm,bcm4339-fmac", "brcm,bcm4329-fmac";
+ reg = <1>;
+
+ brcm,drive-strength = <10>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&wlan_sleep_clk_pin>;
+ };
+};
+
+&smbb {
+ qcom,fast-charge-safe-current = <1500000>;
+ qcom,fast-charge-current-limit = <1500000>;
+ qcom,dc-current-limit = <1800000>;
+ qcom,fast-charge-safe-voltage = <4400000>;
+ qcom,fast-charge-high-threshold-voltage = <4350000>;
+ qcom,fast-charge-low-threshold-voltage = <3400000>;
+ qcom,auto-recharge-threshold-voltage = <4200000>;
+ qcom,minimum-input-voltage = <4300000>;
+};
+
+&tlmm {
+ blsp1_uart2_pin_a: blsp1-uart2-pin-active {
+ rx {
+ pins = "gpio5";
+ function = "blsp_uart2";

drive-strength = <2>;
+ bias-pull-up;
+ };
+
+ tx {
+ pins = "gpio4";
+ function = "blsp_uart2";
+
+ drive-strength = <4>;
bias-disable;
};
};

- i2c@f9964000 {
- status = "okay";
+ blsp2_uart7_pin_a: blsp2-uart7-pin-active {
+ tx {
+ pins = "gpio41";
+ function = "blsp_uart7";

- clock-frequency = <355000>;
- qcom,src-freq = <50000000>;
-
- pinctrl-names = "default";
- pinctrl-0 = <&i2c8_pins>;
+ drive-strength = <2>;
+ bias-disable;
+ };

- synaptics@2c {
- compatible = "syna,rmi4-i2c";
- reg = <0x2c>;
+ rx {
+ pins = "gpio42";
+ function = "blsp_uart7";

- interrupt-parent = <&msmgpio>;
- interrupts = <86 IRQ_TYPE_EDGE_FALLING>;
+ drive-strength = <2>;
+ bias-pull-up;
+ };

- #address-cells = <1>;
- #size-cells = <0>;
+ cts {
+ pins = "gpio43";
+ function = "blsp_uart7";

- vdd-supply = <&pm8941_l22>;
- vio-supply = <&pm8941_lvs3>;
+ drive-strength = <2>;
+ bias-pull-up;
+ };

- pinctrl-names = "default";
- pinctrl-0 = <&ts_int_pin>;
+ rts {
+ pins = "gpio44";
+ function = "blsp_uart7";

- syna,startup-delay-ms = <10>;
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };

- rmi-f01@1 {
- reg = <0x1>;
- syna,nosleep = <1>;
- };
+ i2c8_pins: i2c8 {
+ mux {
+ pins = "gpio47", "gpio48";
+ function = "blsp_i2c8";

- rmi-f11@11 {
- reg = <0x11>;
- syna,f11-flip-x = <1>;
- syna,sensor-type = <1>;
- };
+ drive-strength = <2>;
+ bias-disable;
};
};

- i2c@f9967000 {
- status = "okay";
- pinctrl-names = "default";
- pinctrl-0 = <&i2c11_pins>;
- clock-frequency = <355000>;
- qcom,src-freq = <50000000>;
-
- lp8566_wled: backlight@2c {
- compatible = "ti,lp8556";
- reg = <0x2c>;
- power-supply = <&vreg_bl_vddio>;
-
- bl-name = "backlight";
- dev-ctrl = /bits/ 8 <0x05>;
- init-brt = /bits/ 8 <0x3f>;
- rom_a0h {
- rom-addr = /bits/ 8 <0xa0>;
- rom-val = /bits/ 8 <0xff>;
- };
- rom_a1h {
- rom-addr = /bits/ 8 <0xa1>;
- rom-val = /bits/ 8 <0x3f>;
- };
- rom_a2h {
- rom-addr = /bits/ 8 <0xa2>;
- rom-val = /bits/ 8 <0x20>;
- };
- rom_a3h {
- rom-addr = /bits/ 8 <0xa3>;
- rom-val = /bits/ 8 <0x5e>;
- };
- rom_a4h {
- rom-addr = /bits/ 8 <0xa4>;
- rom-val = /bits/ 8 <0x02>;
- };
- rom_a5h {
- rom-addr = /bits/ 8 <0xa5>;
- rom-val = /bits/ 8 <0x04>;
- };
- rom_a6h {
- rom-addr = /bits/ 8 <0xa6>;
- rom-val = /bits/ 8 <0x80>;
- };
- rom_a7h {
- rom-addr = /bits/ 8 <0xa7>;
- rom-val = /bits/ 8 <0xf7>;
- };
- rom_a9h {
- rom-addr = /bits/ 8 <0xa9>;
- rom-val = /bits/ 8 <0x80>;
- };
- rom_aah {
- rom-addr = /bits/ 8 <0xaa>;
- rom-val = /bits/ 8 <0x0f>;
- };
- rom_aeh {
- rom-addr = /bits/ 8 <0xae>;
- rom-val = /bits/ 8 <0x0f>;
- };
+ i2c11_pins: i2c11 {
+ mux {
+ pins = "gpio83", "gpio84";
+ function = "blsp_i2c11";
+
+ drive-strength = <2>;
+ bias-disable;
};
};
-};

-&spmi_bus {
- pm8941@0 {
- charger@1000 {
- qcom,fast-charge-safe-current = <1500000>;
- qcom,fast-charge-current-limit = <1500000>;
- qcom,dc-current-limit = <1800000>;
- qcom,fast-charge-safe-voltage = <4400000>;
- qcom,fast-charge-high-threshold-voltage = <4350000>;
- qcom,fast-charge-low-threshold-voltage = <3400000>;
- qcom,auto-recharge-threshold-voltage = <4200000>;
- qcom,minimum-input-voltage = <4300000>;
+ lcd_backlight_en_pin_a: lcd-backlight-vddio {
+ pins = "gpio69";
+ drive-strength = <10>;
+ output-low;
+ bias-disable;
+ };
+
+ sdhc1_pin_a: sdhc1-pin-active {
+ clk {
+ pins = "sdc1_clk";
+ drive-strength = <16>;
+ bias-disable;
};

- gpios@c000 {
- gpio_keys_pin_a: gpio-keys-active {
- pins = "gpio2", "gpio5";
- function = "normal";
+ cmd-data {
+ pins = "sdc1_cmd", "sdc1_data";
+ drive-strength = <10>;
+ bias-pull-up;
+ };
+ };

- bias-pull-up;
- power-source = <PM8941_GPIO_S3>;
- };
+ sdhc2_cd_pin_a: sdhc2-cd-pin-active {
+ pins = "gpio62";
+ function = "gpio";

- bt_reg_on_pin: bt-reg-on {
- pins = "gpio16";
- function = "normal";
+ drive-strength = <2>;
+ bias-disable;
+ };

- output-low;
- power-source = <PM8941_GPIO_S3>;
- };
+ sdhc2_pin_a: sdhc2-pin-active {
+ clk {
+ pins = "sdc2_clk";
+ drive-strength = <6>;
+ bias-disable;
+ };

- wlan_sleep_clk_pin: wl-sleep-clk {
- pins = "gpio17";
- function = "func2";
+ cmd-data {
+ pins = "sdc2_cmd", "sdc2_data";
+ drive-strength = <6>;
+ bias-pull-up;
+ };
+ };

- output-high;
- power-source = <PM8941_GPIO_S3>;
- };
+ sdhc3_pin_a: sdhc3-pin-active {
+ clk {
+ pins = "gpio40";
+ function = "sdc3";

- wlan_regulator_pin: wl-reg-active {
- pins = "gpio18";
- function = "normal";
+ drive-strength = <10>;
+ bias-disable;
+ };

- bias-disable;
- power-source = <PM8941_GPIO_S3>;
- };
+ cmd {
+ pins = "gpio39";
+ function = "sdc3";

- lcd_dcdc_en_pin_a: lcd-dcdc-en-active {
- pins = "gpio20";
- function = "normal";
+ drive-strength = <10>;
+ bias-pull-up;
+ };

- bias-disable;
- power-source = <PM8941_GPIO_S3>;
- input-disable;
- output-low;
- };
+ data {
+ pins = "gpio35", "gpio36", "gpio37", "gpio38";
+ function = "sdc3";

+ drive-strength = <10>;
+ bias-pull-up;
};
+ };

- coincell@2800 {
- status = "okay";
- qcom,rset-ohms = <2100>;
- qcom,vset-millivolts = <3000>;
+ ts_int_pin: synaptics {
+ pin {
+ pins = "gpio86";
+ function = "gpio";
+
+ drive-strength = <2>;
+ bias-disable;
+ input-enable;
};
};
+
+ bt_host_wake_pin: bt-host-wake {
+ pins = "gpio95";
+ function = "gpio";
+
+ drive-strength = <2>;
+ bias-disable;
+ output-low;
+ };
+
+ bt_dev_wake_pin: bt-dev-wake {
+ pins = "gpio96";
+ function = "gpio";
+
+ drive-strength = <2>;
+ bias-disable;
+ };
};
diff --git a/arch/arm/boot/dts/qcom-msm8974-sony-xperia-honami.dts b/arch/arm/boot/dts/qcom-msm8974-sony-xperia-honami.dts
index a62e5c25b23c..ea6a941d8f8c 100644
--- a/arch/arm/boot/dts/qcom-msm8974-sony-xperia-honami.dts
+++ b/arch/arm/boot/dts/qcom-msm8974-sony-xperia-honami.dts
@@ -1,484 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
-#include "qcom-msm8974.dtsi"
-#include "qcom-pm8841.dtsi"
-#include "qcom-pm8941.dtsi"
-#include <dt-bindings/gpio/gpio.h>
-#include <dt-bindings/input/input.h>
-#include <dt-bindings/pinctrl/qcom,pmic-gpio.h>
+#include "qcom-msm8974-sony-xperia-rhine.dtsi"

/ {
model = "Sony Xperia Z1";
compatible = "sony,xperia-honami", "qcom,msm8974";
-
- aliases {
- serial0 = &blsp1_uart2;
- };
-
- chosen {
- stdout-path = "serial0:115200n8";
- };
-
- gpio-keys {
- compatible = "gpio-keys";
-
- pinctrl-names = "default";
- pinctrl-0 = <&gpio_keys_pin_a>;
-
- volume-down {
- label = "volume_down";
- gpios = <&pm8941_gpios 2 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_VOLUMEDOWN>;
- };
-
- camera-snapshot {
- label = "camera_snapshot";
- gpios = <&pm8941_gpios 3 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_CAMERA>;
- };
-
- camera-focus {
- label = "camera_focus";
- gpios = <&pm8941_gpios 4 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_CAMERA_FOCUS>;
- };
-
- volume-up {
- label = "volume_up";
- gpios = <&pm8941_gpios 5 GPIO_ACTIVE_LOW>;
- linux,input-type = <1>;
- linux,code = <KEY_VOLUMEUP>;
- };
- };
-
- memory@0 {
- reg = <0 0x40000000>, <0x40000000 0x40000000>;
- device_type = "memory";
- };
-
- smd {
- rpm {
- rpm_requests {
- pm8841-regulators {
- s1 {
- regulator-min-microvolt = <675000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s2 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s3 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
-
- s4 {
- regulator-min-microvolt = <500000>;
- regulator-max-microvolt = <1050000>;
- };
- };
-
- pm8941-regulators {
- vdd_l1_l3-supply = <&pm8941_s1>;
- vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
- vdd_l4_l11-supply = <&pm8941_s1>;
- vdd_l5_l7-supply = <&pm8941_s2>;
- vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
- vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
- vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
- vdd_l21-supply = <&vreg_boost>;
-
- s1 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1300000>;
- regulator-always-on;
- regulator-boot-on;
- };
-
- s2 {
- regulator-min-microvolt = <2150000>;
- regulator-max-microvolt = <2150000>;
- regulator-boot-on;
- };
-
- s3 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- regulator-always-on;
- regulator-boot-on;
- };
-
- s4 {
- regulator-min-microvolt = <5000000>;
- regulator-max-microvolt = <5000000>;
- };
-
- l1 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l2 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l3 {
- regulator-min-microvolt = <1200000>;
- regulator-max-microvolt = <1200000>;
- };
-
- l4 {
- regulator-min-microvolt = <1225000>;
- regulator-max-microvolt = <1225000>;
- };
-
- l5 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l6 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l7 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-boot-on;
- };
-
- l8 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l9 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
- };
-
- l11 {
- regulator-min-microvolt = <1300000>;
- regulator-max-microvolt = <1350000>;
- };
-
- l12 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
-
- regulator-always-on;
- regulator-boot-on;
- };
-
- l13 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l14 {
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- };
-
- l15 {
- regulator-min-microvolt = <2050000>;
- regulator-max-microvolt = <2050000>;
- };
-
- l16 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l17 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
- };
-
- l18 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
- };
-
- l19 {
- regulator-min-microvolt = <3300000>;
- regulator-max-microvolt = <3300000>;
- };
-
- l20 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-allow-set-load;
- regulator-boot-on;
- regulator-system-load = <200000>;
- };
-
- l21 {
- regulator-min-microvolt = <2950000>;
- regulator-max-microvolt = <2950000>;
-
- regulator-boot-on;
- };
-
- l22 {
- regulator-min-microvolt = <3000000>;
- regulator-max-microvolt = <3000000>;
- };
-
- l23 {
- regulator-min-microvolt = <2800000>;
- regulator-max-microvolt = <2800000>;
- };
-
- l24 {
- regulator-min-microvolt = <3075000>;
- regulator-max-microvolt = <3075000>;
-
- regulator-boot-on;
- };
- };
- };
- };
- };
-};
-
-&soc {
- usb@f9a55000 {
- status = "okay";
-
- phys = <&usb_hs1_phy>;
- phy-select = <&tcsr 0xb000 0>;
- extcon = <&smbb>, <&usb_id>;
- vbus-supply = <&chg_otg>;
-
- hnp-disable;
- srp-disable;
- adp-disable;
-
- ulpi {
- phy@a {
- status = "okay";
-
- v1p8-supply = <&pm8941_l6>;
- v3p3-supply = <&pm8941_l24>;
-
- extcon = <&smbb>;
- qcom,init-seq = /bits/ 8 <0x1 0x64>;
- };
- };
- };
-
- sdhci@f9824900 {
- status = "okay";
-
- vmmc-supply = <&pm8941_l20>;
- vqmmc-supply = <&pm8941_s3>;
-
- bus-width = <8>;
- non-removable;
-
- pinctrl-names = "default";
- pinctrl-0 = <&sdhc1_pin_a>;
- };
-
- sdhci@f98a4900 {
- status = "okay";
-
- bus-width = <4>;
-
- vmmc-supply = <&pm8941_l21>;
- vqmmc-supply = <&pm8941_l13>;
-
- cd-gpios = <&msmgpio 62 GPIO_ACTIVE_LOW>;
-
- pinctrl-names = "default";
- pinctrl-0 = <&sdhc2_pin_a>, <&sdhc2_cd_pin_a>;
- };
-
- serial@f991e000 {
- status = "okay";
-
- pinctrl-names = "default";
- pinctrl-0 = <&blsp1_uart2_pin_a>;
- };
-
- i2c@f9924000 {
- status = "okay";
-
- clock-frequency = <355000>;
- qcom,src-freq = <50000000>;
-
- pinctrl-names = "default";
- pinctrl-0 = <&i2c2_pins>;
-
- synaptics@2c {
- compatible = "syna,rmi4-i2c";
- reg = <0x2c>;
-
- interrupts-extended = <&msmgpio 61 IRQ_TYPE_EDGE_FALLING>;
-
- #address-cells = <1>;
- #size-cells = <0>;
-
- vdd-supply = <&pm8941_l22>;
- vio-supply = <&pm8941_lvs3>;
-
- pinctrl-names = "default";
- pinctrl-0 = <&ts_int_pin>;
-
- syna,startup-delay-ms = <10>;
-
- rmi4-f01@1 {
- reg = <0x1>;
- syna,nosleep-mode = <1>;
- };
-
- rmi4-f11@11 {
- reg = <0x11>;
- touchscreen-inverted-x;
- syna,sensor-type = <1>;
- };
- };
- };
-
- pinctrl@fd510000 {
- blsp1_uart2_pin_a: blsp1-uart2-pin-active {
- rx {
- pins = "gpio5";
- function = "blsp_uart2";
-
- drive-strength = <2>;
- bias-pull-up;
- };
-
- tx {
- pins = "gpio4";
- function = "blsp_uart2";
-
- drive-strength = <4>;
- bias-disable;
- };
- };
-
- i2c2_pins: i2c2 {
- mux {
- pins = "gpio6", "gpio7";
- function = "blsp_i2c2";
-
- drive-strength = <2>;
- bias-disable;
- };
- };
-
- sdhc1_pin_a: sdhc1-pin-active {
- clk {
- pins = "sdc1_clk";
- drive-strength = <16>;
- bias-disable;
- };
-
- cmd-data {
- pins = "sdc1_cmd", "sdc1_data";
- drive-strength = <10>;
- bias-pull-up;
- };
- };
-
- sdhc2_cd_pin_a: sdhc2-cd-pin-active {
- pins = "gpio62";
- function = "gpio";
-
- drive-strength = <2>;
- bias-disable;
- };
-
- sdhc2_pin_a: sdhc2-pin-active {
- clk {
- pins = "sdc2_clk";
- drive-strength = <10>;
- bias-disable;
- };
-
- cmd-data {
- pins = "sdc2_cmd", "sdc2_data";
- drive-strength = <6>;
- bias-pull-up;
- };
- };
-
- ts_int_pin: touch-int {
- pin {
- pins = "gpio61";
- function = "gpio";
-
- drive-strength = <2>;
- bias-disable;
- input-enable;
- };
- };
- };
-
- dma-controller@f9944000 {
- qcom,controlled-remotely;
- };
-};
-
-&spmi_bus {
- pm8941@0 {
- charger@1000 {
- qcom,fast-charge-safe-current = <1500000>;
- qcom,fast-charge-current-limit = <1500000>;
- qcom,dc-current-limit = <1800000>;
- qcom,fast-charge-safe-voltage = <4400000>;
- qcom,fast-charge-high-threshold-voltage = <4350000>;
- qcom,fast-charge-low-threshold-voltage = <3400000>;
- qcom,auto-recharge-threshold-voltage = <4200000>;
- qcom,minimum-input-voltage = <4300000>;
- };
-
- gpios@c000 {
- gpio_keys_pin_a: gpio-keys-active {
- pins = "gpio2", "gpio3", "gpio4", "gpio5";
- function = "normal";
-
- bias-pull-up;
- power-source = <PM8941_GPIO_S3>;
- };
- };
-
- coincell@2800 {
- status = "okay";
- qcom,rset-ohms = <2100>;
- qcom,vset-millivolts = <3000>;
- };
- };
-
- pm8941@1 {
- wled@d800 {
- status = "okay";
-
- qcom,cs-out;
- qcom,current-limit = <20>;
- qcom,current-boost-limit = <805>;
- qcom,switching-freq = <1600>;
- qcom,ovp = <29>;
- qcom,num-strings = <2>;
- };
- };
};
diff --git a/arch/arm/boot/dts/qcom-msm8974-sony-xperia-rhine.dtsi b/arch/arm/boot/dts/qcom-msm8974-sony-xperia-rhine.dtsi
new file mode 100644
index 000000000000..870e0aeb4d05
--- /dev/null
+++ b/arch/arm/boot/dts/qcom-msm8974-sony-xperia-rhine.dtsi
@@ -0,0 +1,455 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "qcom-msm8974.dtsi"
+#include "qcom-pm8841.dtsi"
+#include "qcom-pm8941.dtsi"
+#include <dt-bindings/gpio/gpio.h>
+#include <dt-bindings/input/input.h>
+#include <dt-bindings/pinctrl/qcom,pmic-gpio.h>
+
+/ {
+ aliases {
+ serial0 = &blsp1_uart2;
+ };
+
+ chosen {
+ stdout-path = "serial0:115200n8";
+ };
+
+ gpio-keys {
+ compatible = "gpio-keys";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&gpio_keys_pin_a>;
+
+ volume-down {
+ label = "volume_down";
+ gpios = <&pm8941_gpios 2 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_VOLUMEDOWN>;
+ };
+
+ camera-snapshot {
+ label = "camera_snapshot";
+ gpios = <&pm8941_gpios 3 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_CAMERA>;
+ };
+
+ camera-focus {
+ label = "camera_focus";
+ gpios = <&pm8941_gpios 4 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_CAMERA_FOCUS>;
+ };
+
+ volume-up {
+ label = "volume_up";
+ gpios = <&pm8941_gpios 5 GPIO_ACTIVE_LOW>;
+ linux,input-type = <1>;
+ linux,code = <KEY_VOLUMEUP>;
+ };
+ };
+};
+
+&blsp1_i2c2 {
+ status = "okay";
+ clock-frequency = <355000>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c2_pins>;
+
+ synaptics@2c {
+ compatible = "syna,rmi4-i2c";
+ reg = <0x2c>;
+
+ interrupts-extended = <&tlmm 61 IRQ_TYPE_EDGE_FALLING>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ vdd-supply = <&pm8941_l22>;
+ vio-supply = <&pm8941_lvs3>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&ts_int_pin>;
+
+ syna,startup-delay-ms = <10>;
+
+ rmi4-f01@1 {
+ reg = <0x1>;
+ syna,nosleep-mode = <1>;
+ };
+
+ rmi4-f11@11 {
+ reg = <0x11>;
+ touchscreen-inverted-x;
+ syna,sensor-type = <1>;
+ };
+ };
+};
+
+&blsp1_uart2 {
+ status = "okay";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&blsp1_uart2_pin_a>;
+};
+
+&blsp2_dma {
+ qcom,controlled-remotely;
+};
+
+&otg {
+ status = "okay";
+
+ phys = <&usb_hs1_phy>;
+ phy-select = <&tcsr 0xb000 0>;
+ extcon = <&smbb>, <&usb_id>;
+ vbus-supply = <&chg_otg>;
+
+ hnp-disable;
+ srp-disable;
+ adp-disable;
+
+ ulpi {
+ phy@a {
+ status = "okay";
+
+ v1p8-supply = <&pm8941_l6>;
+ v3p3-supply = <&pm8941_l24>;
+
+ extcon = <&smbb>;
+ qcom,init-seq = /bits/ 8 <0x1 0x64>;
+ };
+ };
+};
+
+&pm8941_coincell {
+ status = "okay";
+ qcom,rset-ohms = <2100>;
+ qcom,vset-millivolts = <3000>;
+};
+
+&pm8941_gpios {
+ gpio_keys_pin_a: gpio-keys-active {
+ pins = "gpio2", "gpio3", "gpio4", "gpio5";
+ function = "normal";
+
+ bias-pull-up;
+ power-source = <PM8941_GPIO_S3>;
+ };
+};
+
+&pm8941_wled {
+ status = "okay";
+
+ qcom,cs-out;
+ qcom,current-limit = <20>;
+ qcom,current-boost-limit = <805>;
+ qcom,switching-freq = <1600>;
+ qcom,ovp = <29>;
+ qcom,num-strings = <2>;
+};
+
+&rpm_requests {
+ pm8841-regulators {
+ compatible = "qcom,rpm-pm8841-regulators";
+
+ pm8841_s1: s1 {
+ regulator-min-microvolt = <675000>;
+ regulator-max-microvolt = <1050000>;
+ };
+
+ pm8841_s2: s2 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
+ };
+
+ pm8841_s3: s3 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
+ };
+
+ pm8841_s4: s4 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <1050000>;
+ };
+ };
+
+ pm8941-regulators {
+ compatible = "qcom,rpm-pm8941-regulators";
+
+ vdd_l1_l3-supply = <&pm8941_s1>;
+ vdd_l2_lvs1_2_3-supply = <&pm8941_s3>;
+ vdd_l4_l11-supply = <&pm8941_s1>;
+ vdd_l5_l7-supply = <&pm8941_s2>;
+ vdd_l6_l12_l14_l15-supply = <&pm8941_s2>;
+ vdd_l9_l10_l17_l22-supply = <&vreg_boost>;
+ vdd_l13_l20_l23_l24-supply = <&vreg_boost>;
+ vdd_l21-supply = <&vreg_boost>;
+
+ pm8941_s1: s1 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1300000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_s2: s2 {
+ regulator-min-microvolt = <2150000>;
+ regulator-max-microvolt = <2150000>;
+ regulator-boot-on;
+ };
+
+ pm8941_s3: s3 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_s4: s4 {
+ regulator-min-microvolt = <5000000>;
+ regulator-max-microvolt = <5000000>;
+ };
+
+ pm8941_l1: l1 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_l2: l2 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ };
+
+ pm8941_l3: l3 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ };
+
+ pm8941_l4: l4 {
+ regulator-min-microvolt = <1225000>;
+ regulator-max-microvolt = <1225000>;
+ };
+
+ pm8941_l5: l5 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
+ pm8941_l6: l6 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l7: l7 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l8: l8 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
+ pm8941_l9: l9 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ };
+
+ pm8941_l11: l11 {
+ regulator-min-microvolt = <1300000>;
+ regulator-max-microvolt = <1350000>;
+ };
+
+ pm8941_l12: l12 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+
+ pm8941_l13: l13 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l14: l14 {
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
+ pm8941_l15: l15 {
+ regulator-min-microvolt = <2050000>;
+ regulator-max-microvolt = <2050000>;
+ };
+
+ pm8941_l16: l16 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
+ };
+
+ pm8941_l17: l17 {
+ regulator-min-microvolt = <2700000>;
+ regulator-max-microvolt = <2700000>;
+ };
+
+ pm8941_l18: l18 {
+ regulator-min-microvolt = <2850000>;
+ regulator-max-microvolt = <2850000>;
+ };
+
+ pm8941_l19: l19 {
+ regulator-min-microvolt = <3300000>;
+ regulator-max-microvolt = <3300000>;
+ };
+
+ pm8941_l20: l20 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-system-load = <200000>;
+ regulator-allow-set-load;
+ regulator-boot-on;
+ };
+
+ pm8941_l21: l21 {
+ regulator-min-microvolt = <2950000>;
+ regulator-max-microvolt = <2950000>;
+ regulator-boot-on;
+ };
+
+ pm8941_l22: l22 {
+ regulator-min-microvolt = <3000000>;
+ regulator-max-microvolt = <3000000>;
+ };
+
+ pm8941_l23: l23 {
+ regulator-min-microvolt = <2800000>;
+ regulator-max-microvolt = <2800000>;
+ };
+
+ pm8941_l24: l24 {
+ regulator-min-microvolt = <3075000>;
+ regulator-max-microvolt = <3075000>;
+ regulator-boot-on;
+ };
+
+ pm8941_lvs3: lvs3 {};
+ };
+};
+
+&sdhc_1 {
+ status = "okay";
+
+ vmmc-supply = <&pm8941_l20>;
+ vqmmc-supply = <&pm8941_s3>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc1_pin_a>;
+};
+
+&sdhc_2 {
+ status = "okay";
+
+ vmmc-supply = <&pm8941_l21>;
+ vqmmc-supply = <&pm8941_l13>;
+
+ cd-gpios = <&tlmm 62 GPIO_ACTIVE_LOW>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&sdhc2_pin_a>, <&sdhc2_cd_pin_a>;
+};
+
+&smbb {
+ qcom,fast-charge-safe-current = <1500000>;
+ qcom,fast-charge-current-limit = <1500000>;
+ qcom,dc-current-limit = <1800000>;
+ qcom,fast-charge-safe-voltage = <4400000>;
+ qcom,fast-charge-high-threshold-voltage = <4350000>;
+ qcom,fast-charge-low-threshold-voltage = <3400000>;
+ qcom,auto-recharge-threshold-voltage = <4200000>;
+ qcom,minimum-input-voltage = <4300000>;
+};
+
+&tlmm {
+ ts_int_pin: touch-int {
+ pin {
+ pins = "gpio61";
+ function = "gpio";
+
+ drive-strength = <2>;
+ bias-disable;
+ input-enable;
+ };
+ };
+
+ blsp1_uart2_pin_a: blsp1-uart2-pin-active {
+ rx {
+ pins = "gpio5";
+ function = "blsp_uart2";
+
+ drive-strength = <2>;
+ bias-pull-up;
+ };
+
+ tx {
+ pins = "gpio4";
+ function = "blsp_uart2";
+
+ drive-strength = <4>;
+ bias-disable;
+ };
+ };
+
+ i2c2_pins: i2c2 {
+ mux {
+ pins = "gpio6", "gpio7";
+ function = "blsp_i2c2";
+
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };
+
+ sdhc1_pin_a: sdhc1-pin-active {
+ clk {
+ pins = "sdc1_clk";
+ drive-strength = <16>;
+ bias-disable;
+ };
+
+ cmd-data {
+ pins = "sdc1_cmd", "sdc1_data";
+ drive-strength = <10>;
+ bias-pull-up;
+ };
+ };
+
+ sdhc2_cd_pin_a: sdhc2-cd-pin-active {
+ pins = "gpio62";
+ function = "gpio";
+
+ drive-strength = <2>;
+ bias-disable;
+ };
+
+ sdhc2_pin_a: sdhc2-pin-active {
+ clk {
+ pins = "sdc2_clk";
+ drive-strength = <10>;
+ bias-disable;
+ };
+
+ cmd-data {
+ pins = "sdc2_cmd", "sdc2_data";
+ drive-strength = <6>;
+ bias-pull-up;
+ };
+ };
+};
diff --git a/arch/arm/boot/dts/qcom-msm8974.dtsi b/arch/arm/boot/dts/qcom-msm8974.dtsi
index 412d94736c35..05a36566bd52 100644
--- a/arch/arm/boot/dts/qcom-msm8974.dtsi
+++ b/arch/arm/boot/dts/qcom-msm8974.dtsi
@@ -16,57 +16,17 @@ / {
compatible = "qcom,msm8974";
interrupt-parent = <&intc>;

- reserved-memory {
- #address-cells = <1>;
- #size-cells = <1>;
- ranges;
-
- mpss_region: mpss@8000000 {
- reg = <0x08000000 0x5100000>;
- no-map;
- };
-
- mba_region: mba@d100000 {
- reg = <0x0d100000 0x100000>;
- no-map;
- };
-
- wcnss_region: wcnss@d200000 {
- reg = <0x0d200000 0xa00000>;
- no-map;
- };
-
- adsp_region: adsp@dc00000 {
- reg = <0x0dc00000 0x1900000>;
- no-map;
- };
-
- venus@f500000 {
- reg = <0x0f500000 0x500000>;
- no-map;
- };
-
- smem_region: smem@fa00000 {
- reg = <0xfa00000 0x200000>;
- no-map;
- };
-
- tz@fc00000 {
- reg = <0x0fc00000 0x160000>;
- no-map;
- };
-
- rfsa@fd60000 {
- reg = <0x0fd60000 0x20000>;
- no-map;
+ clocks {
+ xo_board: xo_board {
+ compatible = "fixed-clock";
+ #clock-cells = <0>;
+ clock-frequency = <19200000>;
};

- rmtfs@fd80000 {
- compatible = "qcom,rmtfs-mem";
- reg = <0x0fd80000 0x180000>;
- no-map;
-
- qcom,client-id = <1>;
+ sleep_clk: sleep_clk {
+ compatible = "fixed-clock";
+ #clock-cells = <0>;
+ clock-frequency = <32768>;
};
};

@@ -136,283 +96,120 @@ CPU_SPC: spc {
};
};

+ firmware {
+ scm {
+ compatible = "qcom,scm";
+ clocks = <&gcc GCC_CE1_CLK>, <&gcc GCC_CE1_AXI_CLK>, <&gcc GCC_CE1_AHB_CLK>;
+ clock-names = "core", "bus", "iface";
+ };
+ };
+
memory {
device_type = "memory";
reg = <0x0 0x0>;
};

- thermal-zones {
- cpu0-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
+ pmu {
+ compatible = "qcom,krait-pmu";
+ interrupts = <GIC_PPI 7 0xf04>;
+ };

- thermal-sensors = <&tsens 5>;
+ reserved-memory {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;

- trips {
- cpu_alert0: trip0 {
- temperature = <75000>;
- hysteresis = <2000>;
- type = "passive";
- };
- cpu_crit0: trip1 {
- temperature = <110000>;
- hysteresis = <2000>;
- type = "critical";
- };
- };
+ mpss_region: mpss@8000000 {
+ reg = <0x08000000 0x5100000>;
+ no-map;
};

- cpu1-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
+ mba_region: mba@d100000 {
+ reg = <0x0d100000 0x100000>;
+ no-map;
+ };

- thermal-sensors = <&tsens 6>;
+ wcnss_region: wcnss@d200000 {
+ reg = <0x0d200000 0xa00000>;
+ no-map;
+ };

- trips {
- cpu_alert1: trip0 {
- temperature = <75000>;
- hysteresis = <2000>;
- type = "passive";
- };
- cpu_crit1: trip1 {
- temperature = <110000>;
- hysteresis = <2000>;
- type = "critical";
- };
- };
+ adsp_region: adsp@dc00000 {
+ reg = <0x0dc00000 0x1900000>;
+ no-map;
};

- cpu2-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
+ venus_region: memory@f500000 {
+ reg = <0x0f500000 0x500000>;
+ no-map;
+ };

- thermal-sensors = <&tsens 7>;
+ smem_region: smem@fa00000 {
+ reg = <0xfa00000 0x200000>;
+ no-map;
+ };

- trips {
- cpu_alert2: trip0 {
- temperature = <75000>;
- hysteresis = <2000>;
- type = "passive";
- };
- cpu_crit2: trip1 {
- temperature = <110000>;
- hysteresis = <2000>;
- type = "critical";
- };
- };
+ tz_region: memory@fc00000 {
+ reg = <0x0fc00000 0x160000>;
+ no-map;
};

- cpu3-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
+ rfsa_mem: memory@fd60000 {
+ reg = <0x0fd60000 0x20000>;
+ no-map;
+ };

- thermal-sensors = <&tsens 8>;
+ rmtfs@fd80000 {
+ compatible = "qcom,rmtfs-mem";
+ reg = <0x0fd80000 0x180000>;
+ no-map;

- trips {
- cpu_alert3: trip0 {
- temperature = <75000>;
- hysteresis = <2000>;
- type = "passive";
- };
- cpu_crit3: trip1 {
- temperature = <110000>;
- hysteresis = <2000>;
- type = "critical";
- };
- };
+ qcom,client-id = <1>;
};
+ };

- q6-dsp-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
-
- thermal-sensors = <&tsens 1>;
+ smem {
+ compatible = "qcom,smem";

- trips {
- q6_dsp_alert0: trip-point0 {
- temperature = <90000>;
- hysteresis = <2000>;
- type = "hot";
- };
- };
- };
+ memory-region = <&smem_region>;
+ qcom,rpm-msg-ram = <&rpm_msg_ram>;

- modemtx-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
+ hwlocks = <&tcsr_mutex 3>;
+ };

- thermal-sensors = <&tsens 2>;
+ smp2p-adsp {
+ compatible = "qcom,smp2p";
+ qcom,smem = <443>, <429>;

- trips {
- modemtx_alert0: trip-point0 {
- temperature = <90000>;
- hysteresis = <2000>;
- type = "hot";
- };
- };
- };
+ interrupt-parent = <&intc>;
+ interrupts = <GIC_SPI 158 IRQ_TYPE_EDGE_RISING>;

- video-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
+ qcom,ipc = <&apcs 8 10>;

- thermal-sensors = <&tsens 3>;
+ qcom,local-pid = <0>;
+ qcom,remote-pid = <2>;

- trips {
- video_alert0: trip-point0 {
- temperature = <95000>;
- hysteresis = <2000>;
- type = "hot";
- };
- };
+ adsp_smp2p_out: master-kernel {
+ qcom,entry-name = "master-kernel";
+ #qcom,smem-state-cells = <1>;
};

- wlan-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
-
- thermal-sensors = <&tsens 4>;
+ adsp_smp2p_in: slave-kernel {
+ qcom,entry-name = "slave-kernel";

- trips {
- wlan_alert0: trip-point0 {
- temperature = <105000>;
- hysteresis = <2000>;
- type = "hot";
- };
- };
+ interrupt-controller;
+ #interrupt-cells = <2>;
};
+ };

- gpu-top-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
+ smp2p-modem {
+ compatible = "qcom,smp2p";
+ qcom,smem = <435>, <428>;

- thermal-sensors = <&tsens 9>;
+ interrupt-parent = <&intc>;
+ interrupts = <GIC_SPI 27 IRQ_TYPE_EDGE_RISING>;

- trips {
- gpu1_alert0: trip-point0 {
- temperature = <90000>;
- hysteresis = <2000>;
- type = "hot";
- };
- };
- };
-
- gpu-bottom-thermal {
- polling-delay-passive = <250>;
- polling-delay = <1000>;
-
- thermal-sensors = <&tsens 10>;
-
- trips {
- gpu2_alert0: trip-point0 {
- temperature = <90000>;
- hysteresis = <2000>;
- type = "hot";
- };
- };
- };
- };
-
- cpu-pmu {
- compatible = "qcom,krait-pmu";
- interrupts = <GIC_PPI 7 0xf04>;
- };
-
- clocks {
- xo_board: xo_board {
- compatible = "fixed-clock";
- #clock-cells = <0>;
- clock-frequency = <19200000>;
- };
-
- sleep_clk: sleep_clk {
- compatible = "fixed-clock";
- #clock-cells = <0>;
- clock-frequency = <32768>;
- };
- };
-
- timer {
- compatible = "arm,armv7-timer";
- interrupts = <GIC_PPI 2 0xf08>,
- <GIC_PPI 3 0xf08>,
- <GIC_PPI 4 0xf08>,
- <GIC_PPI 1 0xf08>;
- clock-frequency = <19200000>;
- };
-
- adsp-pil {
- compatible = "qcom,msm8974-adsp-pil";
-
- interrupts-extended = <&intc GIC_SPI 162 IRQ_TYPE_EDGE_RISING>,
- <&adsp_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
- <&adsp_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
- <&adsp_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
- <&adsp_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
- interrupt-names = "wdog", "fatal", "ready", "handover", "stop-ack";
-
- cx-supply = <&pm8841_s2>;
-
- clocks = <&xo_board>;
- clock-names = "xo";
-
- memory-region = <&adsp_region>;
-
- qcom,smem-states = <&adsp_smp2p_out 0>;
- qcom,smem-state-names = "stop";
-
- smd-edge {
- interrupts = <GIC_SPI 156 IRQ_TYPE_EDGE_RISING>;
-
- qcom,ipc = <&apcs 8 8>;
- qcom,smd-edge = <1>;
-
- label = "lpass";
- };
- };
-
- smem {
- compatible = "qcom,smem";
-
- memory-region = <&smem_region>;
- qcom,rpm-msg-ram = <&rpm_msg_ram>;
-
- hwlocks = <&tcsr_mutex 3>;
- };
-
- smp2p-adsp {
- compatible = "qcom,smp2p";
- qcom,smem = <443>, <429>;
-
- interrupt-parent = <&intc>;
- interrupts = <GIC_SPI 158 IRQ_TYPE_EDGE_RISING>;
-
- qcom,ipc = <&apcs 8 10>;
-
- qcom,local-pid = <0>;
- qcom,remote-pid = <2>;
-
- adsp_smp2p_out: master-kernel {
- qcom,entry-name = "master-kernel";
- #qcom,smem-state-cells = <1>;
- };
-
- adsp_smp2p_in: slave-kernel {
- qcom,entry-name = "slave-kernel";
-
- interrupt-controller;
- #interrupt-cells = <2>;
- };
- };
-
- smp2p-modem {
- compatible = "qcom,smp2p";
- qcom,smem = <435>, <428>;
-
- interrupt-parent = <&intc>;
- interrupts = <GIC_SPI 27 IRQ_TYPE_EDGE_RISING>;
-
- qcom,ipc = <&apcs 8 14>;
+ qcom,ipc = <&apcs 8 14>;

qcom,local-pid = <0>;
qcom,remote-pid = <1>;
@@ -497,11 +294,23 @@ wcnss_smsm: wcnss@7 {
};
};

- firmware {
- scm {
- compatible = "qcom,scm";
- clocks = <&gcc GCC_CE1_CLK>, <&gcc GCC_CE1_AXI_CLK>, <&gcc GCC_CE1_AHB_CLK>;
- clock-names = "core", "bus", "iface";
+ smd {
+ compatible = "qcom,smd";
+
+ rpm {
+ interrupts = <GIC_SPI 168 IRQ_TYPE_EDGE_RISING>;
+ qcom,ipc = <&apcs 8 0>;
+ qcom,smd-edge = <15>;
+
+ rpm_requests: rpm_requests {
+ compatible = "qcom,rpm-msm8974";
+ qcom,smd-channels = "rpm_requests";
+
+ rpmcc: clock-controller {
+ compatible = "qcom,rpmcc-msm8974", "qcom,rpmcc";
+ #clock-cells = <1>;
+ };
+ };
};
};

@@ -524,31 +333,6 @@ apcs: syscon@f9011000 {
reg = <0xf9011000 0x1000>;
};

- qfprom: qfprom@fc4bc000 {
- #address-cells = <1>;
- #size-cells = <1>;
- compatible = "qcom,qfprom";
- reg = <0xfc4bc000 0x1000>;
- tsens_calib: calib@d0 {
- reg = <0xd0 0x18>;
- };
- tsens_backup: backup@440 {
- reg = <0x440 0x10>;
- };
- };
-
- tsens: thermal-sensor@fc4a9000 {
- compatible = "qcom,msm8974-tsens";
- reg = <0xfc4a9000 0x1000>, /* TM */
- <0xfc4a8000 0x1000>; /* SROT */
- nvmem-cells = <&tsens_calib>, <&tsens_backup>;
- nvmem-cell-names = "calib", "calib_backup";
- #qcom,sensors = <11>;
- interrupts = <GIC_SPI 184 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "uplow";
- #thermal-sensor-cells = <1>;
- };
-
timer@f9020000 {
#address-cells = <1>;
#size-cells = <1>;
@@ -654,47 +438,53 @@ acc3: clock-controller@f90b8000 {
reg = <0xf90b8000 0x1000>, <0xf9008000 0x1000>;
};

- restart@fc4ab000 {
- compatible = "qcom,pshold";
- reg = <0xfc4ab000 0x4>;
- };
-
- gcc: clock-controller@fc400000 {
- compatible = "qcom,gcc-msm8974";
- #clock-cells = <1>;
- #reset-cells = <1>;
- #power-domain-cells = <1>;
- reg = <0xfc400000 0x4000>;
- };
+ sdhc_1: sdhci@f9824900 {
+ compatible = "qcom,msm8974-sdhci", "qcom,sdhci-msm-v4";
+ reg = <0xf9824900 0x11c>, <0xf9824000 0x800>;
+ reg-names = "hc_mem", "core_mem";
+ interrupts = <GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 138 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "hc_irq", "pwr_irq";
+ clocks = <&gcc GCC_SDCC1_APPS_CLK>,
+ <&gcc GCC_SDCC1_AHB_CLK>,
+ <&xo_board>;
+ clock-names = "core", "iface", "xo";
+ bus-width = <8>;
+ non-removable;

- tcsr: syscon@fd4a0000 {
- compatible = "syscon";
- reg = <0xfd4a0000 0x10000>;
+ status = "disabled";
};

- tcsr_mutex_block: syscon@fd484000 {
- compatible = "syscon";
- reg = <0xfd484000 0x2000>;
- };
+ sdhc_3: sdhci@f9864900 {
+ compatible = "qcom,msm8974-sdhci", "qcom,sdhci-msm-v4";
+ reg = <0xf9864900 0x11c>, <0xf9864000 0x800>;
+ reg-names = "hc_mem", "core_mem";
+ interrupts = <GIC_SPI 127 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 224 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "hc_irq", "pwr_irq";
+ clocks = <&gcc GCC_SDCC3_APPS_CLK>,
+ <&gcc GCC_SDCC3_AHB_CLK>,
+ <&xo_board>;
+ clock-names = "core", "iface", "xo";
+ bus-width = <4>;

- mmcc: clock-controller@fd8c0000 {
- compatible = "qcom,mmcc-msm8974";
- #clock-cells = <1>;
- #reset-cells = <1>;
- #power-domain-cells = <1>;
- reg = <0xfd8c0000 0x6000>;
+ status = "disabled";
};

- tcsr_mutex: tcsr-mutex {
- compatible = "qcom,tcsr-mutex";
- syscon = <&tcsr_mutex_block 0 0x80>;
-
- #hwlock-cells = <1>;
- };
+ sdhc_2: sdhci@f98a4900 {
+ compatible = "qcom,msm8974-sdhci", "qcom,sdhci-msm-v4";
+ reg = <0xf98a4900 0x11c>, <0xf98a4000 0x800>;
+ reg-names = "hc_mem", "core_mem";
+ interrupts = <GIC_SPI 125 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 221 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "hc_irq", "pwr_irq";
+ clocks = <&gcc GCC_SDCC2_APPS_CLK>,
+ <&gcc GCC_SDCC2_AHB_CLK>,
+ <&xo_board>;
+ clock-names = "core", "iface", "xo";
+ bus-width = <4>;

- rpm_msg_ram: memory@fc428000 {
- compatible = "qcom,rpm-msg-ram";
- reg = <0xfc428000 0x4000>;
+ status = "disabled";
};

blsp1_uart1: serial@f991d000 {
@@ -715,16 +505,70 @@ blsp1_uart2: serial@f991e000 {
status = "disabled";
};

- blsp2_uart7: serial@f995d000 {
+ blsp1_i2c1: i2c@f9923000 {
+ status = "disabled";
+ compatible = "qcom,i2c-qup-v2.1.1";
+ reg = <0xf9923000 0x1000>;
+ interrupts = <0 95 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP1_QUP1_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
+ clock-names = "core", "iface";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ blsp1_i2c2: i2c@f9924000 {
+ status = "disabled";
+ compatible = "qcom,i2c-qup-v2.1.1";
+ reg = <0xf9924000 0x1000>;
+ interrupts = <GIC_SPI 96 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP1_QUP2_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
+ clock-names = "core", "iface";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ blsp1_i2c3: i2c@f9925000 {
+ status = "disabled";
+ compatible = "qcom,i2c-qup-v2.1.1";
+ reg = <0xf9925000 0x1000>;
+ interrupts = <0 97 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP1_QUP3_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
+ clock-names = "core", "iface";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ blsp1_i2c6: i2c@f9928000 {
+ status = "disabled";
+ compatible = "qcom,i2c-qup-v2.1.1";
+ reg = <0xf9928000 0x1000>;
+ interrupts = <GIC_SPI 100 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP1_QUP6_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
+ clock-names = "core", "iface";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ blsp2_dma: dma-controller@f9944000 {
+ compatible = "qcom,bam-v1.4.0";
+ reg = <0xf9944000 0x19000>;
+ interrupts = <GIC_SPI 239 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP2_AHB_CLK>;
+ clock-names = "bam_clk";
+ #dma-cells = <1>;
+ qcom,ee = <0>;
+ };
+
+ blsp2_uart1: serial@f995d000 {
compatible = "qcom,msm-uartdm-v1.4", "qcom,msm-uartdm";
reg = <0xf995d000 0x1000>;
- interrupts = <GIC_SPI 113 IRQ_TYPE_NONE>;
+ interrupts = <GIC_SPI 113 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&gcc GCC_BLSP2_UART1_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
clock-names = "core", "iface";
status = "disabled";
};

- blsp2_uart8: serial@f995e000 {
+ blsp2_uart2: serial@f995e000 {
compatible = "qcom,msm-uartdm-v1.4", "qcom,msm-uartdm";
reg = <0xf995e000 0x1000>;
interrupts = <GIC_SPI 114 IRQ_TYPE_LEVEL_HIGH>;
@@ -733,7 +577,7 @@ blsp2_uart8: serial@f995e000 {
status = "disabled";
};

- blsp2_uart10: serial@f9960000 {
+ blsp2_uart4: serial@f9960000 {
compatible = "qcom,msm-uartdm-v1.4", "qcom,msm-uartdm";
reg = <0xf9960000 0x1000>;
interrupts = <GIC_SPI 116 IRQ_TYPE_LEVEL_HIGH>;
@@ -742,46 +586,39 @@ blsp2_uart10: serial@f9960000 {
status = "disabled";
};

- sdhci@f9824900 {
- compatible = "qcom,msm8974-sdhci", "qcom,sdhci-msm-v4";
- reg = <0xf9824900 0x11c>, <0xf9824000 0x800>;
- reg-names = "hc_mem", "core_mem";
- interrupts = <GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>,
- <GIC_SPI 138 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "hc_irq", "pwr_irq";
- clocks = <&gcc GCC_SDCC1_APPS_CLK>,
- <&gcc GCC_SDCC1_AHB_CLK>,
- <&xo_board>;
- clock-names = "core", "iface", "xo";
+ blsp2_i2c2: i2c@f9964000 {
status = "disabled";
+ compatible = "qcom,i2c-qup-v2.1.1";
+ reg = <0xf9964000 0x1000>;
+ interrupts = <GIC_SPI 102 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP2_QUP2_I2C_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
+ clock-names = "core", "iface";
+ #address-cells = <1>;
+ #size-cells = <0>;
};

- sdhci@f9864900 {
- compatible = "qcom,msm8974-sdhci", "qcom,sdhci-msm-v4";
- reg = <0xf9864900 0x11c>, <0xf9864000 0x800>;
- reg-names = "hc_mem", "core_mem";
- interrupts = <GIC_SPI 127 IRQ_TYPE_LEVEL_HIGH>,
- <GIC_SPI 224 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "hc_irq", "pwr_irq";
- clocks = <&gcc GCC_SDCC3_APPS_CLK>,
- <&gcc GCC_SDCC3_AHB_CLK>,
- <&xo_board>;
- clock-names = "core", "iface", "xo";
+ blsp2_i2c5: i2c@f9967000 {
status = "disabled";
+ compatible = "qcom,i2c-qup-v2.1.1";
+ reg = <0xf9967000 0x1000>;
+ interrupts = <GIC_SPI 105 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP2_QUP5_I2C_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
+ clock-names = "core", "iface";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ dmas = <&blsp2_dma 20>, <&blsp2_dma 21>;
+ dma-names = "tx", "rx";
};

- sdhci@f98a4900 {
- compatible = "qcom,msm8974-sdhci", "qcom,sdhci-msm-v4";
- reg = <0xf98a4900 0x11c>, <0xf98a4000 0x800>;
- reg-names = "hc_mem", "core_mem";
- interrupts = <GIC_SPI 125 IRQ_TYPE_LEVEL_HIGH>,
- <GIC_SPI 221 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "hc_irq", "pwr_irq";
- clocks = <&gcc GCC_SDCC2_APPS_CLK>,
- <&gcc GCC_SDCC2_AHB_CLK>,
- <&xo_board>;
- clock-names = "core", "iface", "xo";
+ blsp2_i2c6: i2c@f9968000 {
status = "disabled";
+ compatible = "qcom,i2c-qup-v2.1.1";
+ reg = <0xf9968000 0x1000>;
+ interrupts = <0 106 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP2_QUP6_I2C_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
+ clock-names = "core", "iface";
+ #address-cells = <1>;
+ #size-cells = <0>;
};

otg: usb@f9a55000 {
@@ -835,55 +672,6 @@ rng@f9bff000 {
clock-names = "core";
};

- remoteproc@fc880000 {
- compatible = "qcom,msm8974-mss-pil";
- reg = <0xfc880000 0x100>, <0xfc820000 0x020>;
- reg-names = "qdsp6", "rmb";
-
- interrupts-extended = <&intc GIC_SPI 24 IRQ_TYPE_EDGE_RISING>,
- <&modem_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
- <&modem_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
- <&modem_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
- <&modem_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
- interrupt-names = "wdog", "fatal", "ready", "handover", "stop-ack";
-
- clocks = <&gcc GCC_MSS_Q6_BIMC_AXI_CLK>,
- <&gcc GCC_MSS_CFG_AHB_CLK>,
- <&gcc GCC_BOOT_ROM_AHB_CLK>,
- <&xo_board>;
- clock-names = "iface", "bus", "mem", "xo";
-
- resets = <&gcc GCC_MSS_RESTART>;
- reset-names = "mss_restart";
-
- cx-supply = <&pm8841_s2>;
- mss-supply = <&pm8841_s3>;
- mx-supply = <&pm8841_s1>;
- pll-supply = <&pm8941_l12>;
-
- qcom,halt-regs = <&tcsr_mutex_block 0x1180 0x1200 0x1280>;
-
- qcom,smem-states = <&modem_smp2p_out 0>;
- qcom,smem-state-names = "stop";
-
- mba {
- memory-region = <&mba_region>;
- };
-
- mpss {
- memory-region = <&mpss_region>;
- };
-
- smd-edge {
- interrupts = <GIC_SPI 25 IRQ_TYPE_EDGE_RISING>;
-
- qcom,ipc = <&apcs 8 12>;
- qcom,smd-edge = <0>;
-
- label = "modem";
- };
- };
-
pronto: remoteproc@fb21b000 {
compatible = "qcom,pronto-v2-pil", "qcom,pronto";
reg = <0xfb204000 0x2000>, <0xfb202000 0x1000>, <0xfb21b000 0x3000>;
@@ -898,8 +686,6 @@ pronto: remoteproc@fb21b000 {
<&wcnss_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
interrupt-names = "wdog", "fatal", "ready", "handover", "stop-ack";

- vddpx-supply = <&pm8941_s3>;
-
qcom,smem-states = <&wcnss_smp2p_out 0>;
qcom,smem-state-names = "stop";

@@ -910,11 +696,6 @@ iris {

clocks = <&rpmcc RPM_SMD_CXO_A2>;
clock-names = "xo";
-
- vddxo-supply = <&pm8941_l6>;
- vddrfa-supply = <&pm8941_l11>;
- vddpa-supply = <&pm8941_l19>;
- vdddig-supply = <&pm8941_s3>;
};

smd-edge {
@@ -942,139 +723,32 @@ wifi {
interrupt-names = "tx", "rx";

qcom,smem-states = <&apps_smsm 10>, <&apps_smsm 9>;
- qcom,smem-state-names = "tx-enable", "tx-rings-empty";
+ qcom,smem-state-names = "tx-enable",
+ "tx-rings-empty";
};
};
};
};

- msmgpio: pinctrl@fd510000 {
- compatible = "qcom,msm8974-pinctrl";
- reg = <0xfd510000 0x4000>;
- gpio-controller;
- gpio-ranges = <&msmgpio 0 0 146>;
- #gpio-cells = <2>;
- interrupt-controller;
- #interrupt-cells = <2>;
- interrupts = <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>;
- };
-
- i2c@f9923000 {
- status = "disabled";
- compatible = "qcom,i2c-qup-v2.1.1";
- reg = <0xf9923000 0x1000>;
- interrupts = <0 95 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&gcc GCC_BLSP1_QUP1_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
- clock-names = "core", "iface";
- #address-cells = <1>;
- #size-cells = <0>;
- };
-
- i2c@f9924000 {
- status = "disabled";
- compatible = "qcom,i2c-qup-v2.1.1";
- reg = <0xf9924000 0x1000>;
- interrupts = <GIC_SPI 96 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&gcc GCC_BLSP1_QUP2_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
- clock-names = "core", "iface";
- #address-cells = <1>;
- #size-cells = <0>;
- };
-
- blsp_i2c3: i2c@f9925000 {
- status = "disabled";
- compatible = "qcom,i2c-qup-v2.1.1";
- reg = <0xf9925000 0x1000>;
- interrupts = <0 97 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&gcc GCC_BLSP1_QUP3_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
- clock-names = "core", "iface";
- #address-cells = <1>;
- #size-cells = <0>;
- };
-
- blsp_i2c6: i2c@f9928000 {
- status = "disabled";
- compatible = "qcom,i2c-qup-v2.1.1";
- reg = <0xf9928000 0x1000>;
- interrupts = <GIC_SPI 100 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&gcc GCC_BLSP1_QUP6_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
- clock-names = "core", "iface";
- #address-cells = <1>;
- #size-cells = <0>;
- };
-
- blsp_i2c8: i2c@f9964000 {
- status = "disabled";
- compatible = "qcom,i2c-qup-v2.1.1";
- reg = <0xf9964000 0x1000>;
- interrupts = <GIC_SPI 102 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&gcc GCC_BLSP2_QUP2_I2C_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
- clock-names = "core", "iface";
- #address-cells = <1>;
- #size-cells = <0>;
- };
-
- blsp_i2c11: i2c@f9967000 {
- status = "disabled";
- compatible = "qcom,i2c-qup-v2.1.1";
- reg = <0xf9967000 0x1000>;
- interrupts = <GIC_SPI 105 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&gcc GCC_BLSP2_QUP5_I2C_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
- clock-names = "core", "iface";
- #address-cells = <1>;
- #size-cells = <0>;
- dmas = <&blsp2_dma 20>, <&blsp2_dma 21>;
- dma-names = "tx", "rx";
- };
-
- blsp_i2c12: i2c@f9968000 {
- status = "disabled";
- compatible = "qcom,i2c-qup-v2.1.1";
- reg = <0xf9968000 0x1000>;
- interrupts = <0 106 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&gcc GCC_BLSP2_QUP6_I2C_APPS_CLK>, <&gcc GCC_BLSP2_AHB_CLK>;
- clock-names = "core", "iface";
- #address-cells = <1>;
- #size-cells = <0>;
- };
-
- spmi_bus: spmi@fc4cf000 {
- compatible = "qcom,spmi-pmic-arb";
- reg-names = "core", "intr", "cnfg";
- reg = <0xfc4cf000 0x1000>,
- <0xfc4cb000 0x1000>,
- <0xfc4ca000 0x1000>;
- interrupt-names = "periph_irq";
- interrupts = <GIC_SPI 190 IRQ_TYPE_LEVEL_HIGH>;
- qcom,ee = <0>;
- qcom,channel = <0>;
- #address-cells = <2>;
- #size-cells = <0>;
- interrupt-controller;
- #interrupt-cells = <4>;
- };
-
- blsp2_dma: dma-controller@f9944000 {
- compatible = "qcom,bam-v1.4.0";
- reg = <0xf9944000 0x19000>;
- interrupts = <GIC_SPI 239 IRQ_TYPE_LEVEL_HIGH>;
- clocks = <&gcc GCC_BLSP2_AHB_CLK>;
- clock-names = "bam_clk";
- #dma-cells = <1>;
- qcom,ee = <0>;
- };
-
- etr@fc322000 {
+ etf@fc307000 {
compatible = "arm,coresight-tmc", "arm,primecell";
- reg = <0xfc322000 0x1000>;
+ reg = <0xfc307000 0x1000>;

clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";

+ out-ports {
+ port {
+ etf_out: endpoint {
+ remote-endpoint = <&replicator_in>;
+ };
+ };
+ };
+
in-ports {
port {
- etr_in: endpoint {
- remote-endpoint = <&replicator_out0>;
+ etf_in: endpoint {
+ remote-endpoint = <&merger_out>;
};
};
};
@@ -1096,59 +770,39 @@ tpiu_in: endpoint {
};
};

- replicator@fc31c000 {
- compatible = "arm,coresight-dynamic-replicator", "arm,primecell";
- reg = <0xfc31c000 0x1000>;
+ funnel@fc31a000 {
+ compatible = "arm,coresight-dynamic-funnel", "arm,primecell";
+ reg = <0xfc31a000 0x1000>;

clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";

- out-ports {
+ in-ports {
#address-cells = <1>;
#size-cells = <0>;

- port@0 {
- reg = <0>;
- replicator_out0: endpoint {
- remote-endpoint = <&etr_in>;
- };
- };
- port@1 {
- reg = <1>;
- replicator_out1: endpoint {
- remote-endpoint = <&tpiu_in>;
+ /*
+ * Not described input ports:
+ * 0 - not-connected
+	 * 1 - connected through funnel to Multimedia CPU
+ * 2 - connected to Wireless CPU
+ * 3 - not-connected
+ * 4 - not-connected
+ * 6 - not-connected
+ * 7 - connected to STM
+ */
+ port@5 {
+ reg = <5>;
+ funnel1_in5: endpoint {
+ remote-endpoint = <&kpss_out>;
};
};
};

- in-ports {
+ out-ports {
port {
- replicator_in: endpoint {
- remote-endpoint = <&etf_out>;
- };
- };
- };
- };
-
- etf@fc307000 {
- compatible = "arm,coresight-tmc", "arm,primecell";
- reg = <0xfc307000 0x1000>;
-
- clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc RPM_SMD_QDSS_A_CLK>;
- clock-names = "apb_pclk", "atclk";
-
- out-ports {
- port {
- etf_out: endpoint {
- remote-endpoint = <&replicator_in>;
- };
- };
- };
-
- in-ports {
- port {
- etf_in: endpoint {
- remote-endpoint = <&merger_out>;
+ funnel1_out: endpoint {
+ remote-endpoint = <&merger_in1>;
};
};
};
@@ -1188,85 +842,51 @@ merger_out: endpoint {
};
};

- funnel@fc31a000 {
- compatible = "arm,coresight-dynamic-funnel", "arm,primecell";
- reg = <0xfc31a000 0x1000>;
+ replicator@fc31c000 {
+ compatible = "arm,coresight-dynamic-replicator", "arm,primecell";
+ reg = <0xfc31c000 0x1000>;

clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";

- in-ports {
+ out-ports {
#address-cells = <1>;
#size-cells = <0>;

- /*
- * Not described input ports:
- * 0 - not-connected
- * 1 - connected trought funnel to Multimedia CPU
- * 2 - connected to Wireless CPU
- * 3 - not-connected
- * 4 - not-connected
- * 6 - not-connected
- * 7 - connected to STM
- */
- port@5 {
- reg = <5>;
- funnel1_in5: endpoint {
- remote-endpoint = <&kpss_out>;
+ port@0 {
+ reg = <0>;
+ replicator_out0: endpoint {
+ remote-endpoint = <&etr_in>;
+ };
+ };
+ port@1 {
+ reg = <1>;
+ replicator_out1: endpoint {
+ remote-endpoint = <&tpiu_in>;
};
};
};

- out-ports {
+ in-ports {
port {
- funnel1_out: endpoint {
- remote-endpoint = <&merger_in1>;
+ replicator_in: endpoint {
+ remote-endpoint = <&etf_out>;
};
};
};
};

- funnel@fc345000 { /* KPSS funnel only 4 inputs are used */
- compatible = "arm,coresight-dynamic-funnel", "arm,primecell";
- reg = <0xfc345000 0x1000>;
+ etr@fc322000 {
+ compatible = "arm,coresight-tmc", "arm,primecell";
+ reg = <0xfc322000 0x1000>;

clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc RPM_SMD_QDSS_A_CLK>;
clock-names = "apb_pclk", "atclk";

in-ports {
- #address-cells = <1>;
- #size-cells = <0>;
-
- port@0 {
- reg = <0>;
- kpss_in0: endpoint {
- remote-endpoint = <&etm0_out>;
- };
- };
- port@1 {
- reg = <1>;
- kpss_in1: endpoint {
- remote-endpoint = <&etm1_out>;
- };
- };
- port@2 {
- reg = <2>;
- kpss_in2: endpoint {
- remote-endpoint = <&etm2_out>;
- };
- };
- port@3 {
- reg = <3>;
- kpss_in3: endpoint {
- remote-endpoint = <&etm3_out>;
- };
- };
- };
-
- out-ports {
port {
- kpss_out: endpoint {
- remote-endpoint = <&funnel1_in5>;
+ etr_in: endpoint {
+ remote-endpoint = <&replicator_out0>;
};
};
};
@@ -1344,25 +964,66 @@ etm3_out: endpoint {
};
};

- ocmem@fdd00000 {
- compatible = "qcom,msm8974-ocmem";
- reg = <0xfdd00000 0x2000>,
- <0xfec00000 0x180000>;
- reg-names = "ctrl",
- "mem";
- clocks = <&rpmcc RPM_SMD_OCMEMGX_CLK>,
- <&mmcc OCMEMCX_OCMEMNOC_CLK>;
- clock-names = "core",
- "iface";
+ /* KPSS funnel, only 4 inputs are used */
+ funnel@fc345000 {
+ compatible = "arm,coresight-dynamic-funnel", "arm,primecell";
+ reg = <0xfc345000 0x1000>;

- #address-cells = <1>;
- #size-cells = <1>;
+ clocks = <&rpmcc RPM_SMD_QDSS_CLK>, <&rpmcc RPM_SMD_QDSS_A_CLK>;
+ clock-names = "apb_pclk", "atclk";

- gmu_sram: gmu-sram@0 {
- reg = <0x0 0x100000>;
+ in-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ kpss_in0: endpoint {
+ remote-endpoint = <&etm0_out>;
+ };
+ };
+ port@1 {
+ reg = <1>;
+ kpss_in1: endpoint {
+ remote-endpoint = <&etm1_out>;
+ };
+ };
+ port@2 {
+ reg = <2>;
+ kpss_in2: endpoint {
+ remote-endpoint = <&etm2_out>;
+ };
+ };
+ port@3 {
+ reg = <3>;
+ kpss_in3: endpoint {
+ remote-endpoint = <&etm3_out>;
+ };
+ };
+ };
+
+ out-ports {
+ port {
+ kpss_out: endpoint {
+ remote-endpoint = <&funnel1_in5>;
+ };
+ };
};
};

+ gcc: clock-controller@fc400000 {
+ compatible = "qcom,gcc-msm8974";
+ #clock-cells = <1>;
+ #reset-cells = <1>;
+ #power-domain-cells = <1>;
+ reg = <0xfc400000 0x4000>;
+ };
+
+ rpm_msg_ram: memory@fc428000 {
+ compatible = "qcom,rpm-msg-ram";
+ reg = <0xfc428000 0x4000>;
+ };
+
bimc: interconnect@fc380000 {
reg = <0xfc380000 0x6a000>;
compatible = "qcom,msm8974-bimc";
@@ -1417,79 +1078,151 @@ cnoc: interconnect@fc480000 {
<&rpmcc RPM_SMD_CNOC_A_CLK>;
};

- gpu: adreno@fdb00000 {
- status = "disabled";
-
- compatible = "qcom,adreno-330.1",
- "qcom,adreno";
- reg = <0xfdb00000 0x10000>;
- reg-names = "kgsl_3d0_reg_memory";
- interrupts = <GIC_SPI 33 IRQ_TYPE_LEVEL_HIGH>;
- interrupt-names = "kgsl_3d0_irq";
- clock-names = "core",
- "iface",
- "mem_iface";
- clocks = <&mmcc OXILI_GFX3D_CLK>,
- <&mmcc OXILICX_AHB_CLK>,
- <&mmcc OXILICX_AXI_CLK>;
- sram = <&gmu_sram>;
- power-domains = <&mmcc OXILICX_GDSC>;
- operating-points-v2 = <&gpu_opp_table>;
-
- interconnects = <&mmssnoc MNOC_MAS_GRAPHICS_3D &bimc BIMC_SLV_EBI_CH0>,
- <&ocmemnoc OCMEM_VNOC_MAS_GFX3D &ocmemnoc OCMEM_SLV_OCMEM>;
- interconnect-names = "gfx-mem",
- "ocmem";
-
- // iommus = <&gpu_iommu 0>;
-
- gpu_opp_table: opp_table {
- compatible = "operating-points-v2";
-
- opp-320000000 {
- opp-hz = /bits/ 64 <320000000>;
- };
+ tsens: thermal-sensor@fc4a9000 {
+ compatible = "qcom,msm8974-tsens";
+ reg = <0xfc4a9000 0x1000>, /* TM */
+ <0xfc4a8000 0x1000>; /* SROT */
+ nvmem-cells = <&tsens_calib>, <&tsens_backup>;
+ nvmem-cell-names = "calib", "calib_backup";
+ #qcom,sensors = <11>;
+ interrupts = <GIC_SPI 184 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "uplow";
+ #thermal-sensor-cells = <1>;
+ };

- opp-200000000 {
- opp-hz = /bits/ 64 <200000000>;
- };
+ restart@fc4ab000 {
+ compatible = "qcom,pshold";
+ reg = <0xfc4ab000 0x4>;
+ };

- opp-27000000 {
- opp-hz = /bits/ 64 <27000000>;
- };
+ qfprom: qfprom@fc4bc000 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "qcom,qfprom";
+ reg = <0xfc4bc000 0x1000>;
+ tsens_calib: calib@d0 {
+ reg = <0xd0 0x18>;
+ };
+ tsens_backup: backup@440 {
+ reg = <0x440 0x10>;
};
};

- mdss: mdss@fd900000 {
- status = "disabled";
+ spmi_bus: spmi@fc4cf000 {
+ compatible = "qcom,spmi-pmic-arb";
+ reg-names = "core", "intr", "cnfg";
+ reg = <0xfc4cf000 0x1000>,
+ <0xfc4cb000 0x1000>,
+ <0xfc4ca000 0x1000>;
+ interrupt-names = "periph_irq";
+ interrupts = <GIC_SPI 190 IRQ_TYPE_LEVEL_HIGH>;
+ qcom,ee = <0>;
+ qcom,channel = <0>;
+ #address-cells = <2>;
+ #size-cells = <0>;
+ interrupt-controller;
+ #interrupt-cells = <4>;
+ };

- compatible = "qcom,mdss";
- reg = <0xfd900000 0x100>,
- <0xfd924000 0x1000>;
- reg-names = "mdss_phys",
- "vbif_phys";
+ remoteproc_mss: remoteproc@fc880000 {
+ compatible = "qcom,msm8974-mss-pil";
+ reg = <0xfc880000 0x100>, <0xfc820000 0x020>;
+ reg-names = "qdsp6", "rmb";

- power-domains = <&mmcc MDSS_GDSC>;
+ interrupts-extended = <&intc GIC_SPI 24 IRQ_TYPE_EDGE_RISING>,
+ <&modem_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+ <&modem_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+ <&modem_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+ <&modem_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+ interrupt-names = "wdog", "fatal", "ready", "handover", "stop-ack";

- clocks = <&mmcc MDSS_AHB_CLK>,
- <&mmcc MDSS_AXI_CLK>,
- <&mmcc MDSS_VSYNC_CLK>;
- clock-names = "iface",
- "bus",
- "vsync";
+ clocks = <&gcc GCC_MSS_Q6_BIMC_AXI_CLK>,
+ <&gcc GCC_MSS_CFG_AHB_CLK>,
+ <&gcc GCC_BOOT_ROM_AHB_CLK>,
+ <&xo_board>;
+ clock-names = "iface", "bus", "mem", "xo";
+
+ resets = <&gcc GCC_MSS_RESTART>;
+ reset-names = "mss_restart";
+
+ qcom,halt-regs = <&tcsr_mutex_block 0x1180 0x1200 0x1280>;
+
+ qcom,smem-states = <&modem_smp2p_out 0>;
+ qcom,smem-state-names = "stop";
+
+ status = "disabled";
+
+ mba {
+ memory-region = <&mba_region>;
+ };
+
+ mpss {
+ memory-region = <&mpss_region>;
+ };
+
+ smd-edge {
+ interrupts = <GIC_SPI 25 IRQ_TYPE_EDGE_RISING>;
+
+ qcom,ipc = <&apcs 8 12>;
+ qcom,smd-edge = <0>;
+
+ label = "modem";
+ };
+ };
+
+ tcsr_mutex_block: syscon@fd484000 {
+ compatible = "syscon";
+ reg = <0xfd484000 0x2000>;
+ };
+
+ tcsr: syscon@fd4a0000 {
+ compatible = "syscon";
+ reg = <0xfd4a0000 0x10000>;
+ };
+
+ tlmm: pinctrl@fd510000 {
+ compatible = "qcom,msm8974-pinctrl";
+ reg = <0xfd510000 0x4000>;
+ gpio-controller;
+ gpio-ranges = <&tlmm 0 0 146>;
+ #gpio-cells = <2>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ interrupts = <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>;
+ };
+
+ mmcc: clock-controller@fd8c0000 {
+ compatible = "qcom,mmcc-msm8974";
+ #clock-cells = <1>;
+ #reset-cells = <1>;
+ #power-domain-cells = <1>;
+ reg = <0xfd8c0000 0x6000>;
+ };
+
+ mdss: mdss@fd900000 {
+ compatible = "qcom,mdss";
+ reg = <0xfd900000 0x100>, <0xfd924000 0x1000>;
+ reg-names = "mdss_phys", "vbif_phys";
+
+ power-domains = <&mmcc MDSS_GDSC>;
+
+ clocks = <&mmcc MDSS_AHB_CLK>,
+ <&mmcc MDSS_AXI_CLK>,
+ <&mmcc MDSS_VSYNC_CLK>;
+ clock-names = "iface", "bus", "vsync";

interrupts = <GIC_SPI 72 IRQ_TYPE_LEVEL_HIGH>;

interrupt-controller;
#interrupt-cells = <1>;

+ status = "disabled";
+
#address-cells = <1>;
#size-cells = <1>;
ranges;

mdp: mdp@fd900000 {
- status = "disabled";
-
compatible = "qcom,mdp5";
reg = <0xfd900100 0x22000>;
reg-names = "mdp_phys";
@@ -1501,10 +1234,7 @@ mdp: mdp@fd900000 {
<&mmcc MDSS_AXI_CLK>,
<&mmcc MDSS_MDP_CLK>,
<&mmcc MDSS_VSYNC_CLK>;
- clock-names = "iface",
- "bus",
- "core",
- "vsync";
+ clock-names = "iface", "bus", "core", "vsync";

interconnects = <&mmssnoc MNOC_MAS_MDP_PORT0 &bimc BIMC_SLV_EBI_CH0>;
interconnect-names = "mdp0-mem";
@@ -1523,8 +1253,6 @@ mdp5_intf1_out: endpoint {
};

dsi0: dsi@fd922800 {
- status = "disabled";
-
compatible = "qcom,mdss-dsi-ctrl";
reg = <0xfd922800 0x1f8>;
reg-names = "dsi_ctrl";
@@ -1532,29 +1260,32 @@ dsi0: dsi@fd922800 {
interrupt-parent = <&mdss>;
interrupts = <4 IRQ_TYPE_LEVEL_HIGH>;

- assigned-clocks = <&mmcc BYTE0_CLK_SRC>,
- <&mmcc PCLK0_CLK_SRC>;
- assigned-clock-parents = <&dsi_phy0 0>,
- <&dsi_phy0 1>;
+ assigned-clocks = <&mmcc BYTE0_CLK_SRC>, <&mmcc PCLK0_CLK_SRC>;
+ assigned-clock-parents = <&dsi0_phy 0>, <&dsi0_phy 1>;

clocks = <&mmcc MDSS_MDP_CLK>,
- <&mmcc MDSS_AHB_CLK>,
- <&mmcc MDSS_AXI_CLK>,
- <&mmcc MDSS_BYTE0_CLK>,
- <&mmcc MDSS_PCLK0_CLK>,
- <&mmcc MDSS_ESC0_CLK>,
- <&mmcc MMSS_MISC_AHB_CLK>;
+ <&mmcc MDSS_AHB_CLK>,
+ <&mmcc MDSS_AXI_CLK>,
+ <&mmcc MDSS_BYTE0_CLK>,
+ <&mmcc MDSS_PCLK0_CLK>,
+ <&mmcc MDSS_ESC0_CLK>,
+ <&mmcc MMSS_MISC_AHB_CLK>;
clock-names = "mdp_core",
- "iface",
- "bus",
- "byte",
- "pixel",
- "core",
- "core_mmss";
-
- phys = <&dsi_phy0>;
+ "iface",
+ "bus",
+ "byte",
+ "pixel",
+ "core",
+ "core_mmss";
+
+ phys = <&dsi0_phy>;
phy-names = "dsi-phy";

+ status = "disabled";
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
ports {
#address-cells = <1>;
#size-cells = <0>;
@@ -1574,27 +1305,118 @@ dsi0_out: endpoint {
};
};

- dsi_phy0: dsi-phy@fd922a00 {
- status = "disabled";
-
+ dsi0_phy: dsi-phy@fd922a00 {
compatible = "qcom,dsi-phy-28nm-hpm";
reg = <0xfd922a00 0xd4>,
<0xfd922b00 0x280>,
<0xfd922d80 0x30>;
reg-names = "dsi_pll",
- "dsi_phy",
- "dsi_phy_regulator";
+ "dsi_phy",
+ "dsi_phy_regulator";

#clock-cells = <1>;
#phy-cells = <0>;
- qcom,dsi-phy-index = <0>;

clocks = <&mmcc MDSS_AHB_CLK>, <&xo_board>;
clock-names = "iface", "ref";
+
+ status = "disabled";
+ };
+ };
+
+ gpu: adreno@fdb00000 {
+ compatible = "qcom,adreno-330.1", "qcom,adreno";
+ reg = <0xfdb00000 0x10000>;
+ reg-names = "kgsl_3d0_reg_memory";
+
+ interrupts = <GIC_SPI 33 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "kgsl_3d0_irq";
+
+ clocks = <&mmcc OXILI_GFX3D_CLK>,
+ <&mmcc OXILICX_AHB_CLK>,
+ <&mmcc OXILICX_AXI_CLK>;
+ clock-names = "core", "iface", "mem_iface";
+
+ sram = <&gmu_sram>;
+ power-domains = <&mmcc OXILICX_GDSC>;
+ operating-points-v2 = <&gpu_opp_table>;
+
+ interconnects = <&mmssnoc MNOC_MAS_GRAPHICS_3D &bimc BIMC_SLV_EBI_CH0>,
+ <&ocmemnoc OCMEM_VNOC_MAS_GFX3D &ocmemnoc OCMEM_SLV_OCMEM>;
+ interconnect-names = "gfx-mem", "ocmem";
+
+ // iommus = <&gpu_iommu 0>;
+
+ status = "disabled";
+
+ gpu_opp_table: opp_table {
+ compatible = "operating-points-v2";
+
+ opp-320000000 {
+ opp-hz = /bits/ 64 <320000000>;
+ };
+
+ opp-200000000 {
+ opp-hz = /bits/ 64 <200000000>;
+ };
+
+ opp-27000000 {
+ opp-hz = /bits/ 64 <27000000>;
+ };
};
};

- imem@fe805000 {
+ ocmem@fdd00000 {
+ compatible = "qcom,msm8974-ocmem";
+ reg = <0xfdd00000 0x2000>,
+ <0xfec00000 0x180000>;
+ reg-names = "ctrl", "mem";
+ ranges = <0 0xfec00000 0x180000>;
+ clocks = <&rpmcc RPM_SMD_OCMEMGX_CLK>,
+ <&mmcc OCMEMCX_OCMEMNOC_CLK>;
+ clock-names = "core", "iface";
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ gmu_sram: gmu-sram@0 {
+ reg = <0x0 0x100000>;
+ };
+ };
+
+ remoteproc_adsp: remoteproc@fe200000 {
+ compatible = "qcom,msm8974-adsp-pil";
+ reg = <0xfe200000 0x100>;
+
+ interrupts-extended = <&intc GIC_SPI 162 IRQ_TYPE_EDGE_RISING>,
+ <&adsp_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+ <&adsp_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+ <&adsp_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+ <&adsp_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+ interrupt-names = "wdog", "fatal", "ready", "handover", "stop-ack";
+
+ clocks = <&xo_board>;
+ clock-names = "xo";
+
+ memory-region = <&adsp_region>;
+
+ qcom,smem-states = <&adsp_smp2p_out 0>;
+ qcom,smem-state-names = "stop";
+
+ status = "disabled";
+
+ smd-edge {
+ interrupts = <GIC_SPI 156 IRQ_TYPE_EDGE_RISING>;
+
+ qcom,ipc = <&apcs 8 8>;
+ qcom,smd-edge = <1>;
+ label = "lpass";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ };
+
+ imem: imem@fe805000 {
status = "disabled";
compatible = "syscon", "simple-mfd";
reg = <0xfe805000 0x1000>;
@@ -1606,76 +1428,194 @@ reboot-mode {
};
};

- smd {
- compatible = "qcom,smd";
+ tcsr_mutex: tcsr-mutex {
+ compatible = "qcom,tcsr-mutex";
+ syscon = <&tcsr_mutex_block 0 0x80>;

- rpm {
- interrupts = <GIC_SPI 168 IRQ_TYPE_EDGE_RISING>;
- qcom,ipc = <&apcs 8 0>;
- qcom,smd-edge = <15>;
+ #hwlock-cells = <1>;
+ };

- rpm_requests {
- compatible = "qcom,rpm-msm8974";
- qcom,smd-channels = "rpm_requests";
+ thermal-zones {
+ cpu0-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;

- rpmcc: clock-controller {
- compatible = "qcom,rpmcc-msm8974", "qcom,rpmcc";
- #clock-cells = <1>;
+ thermal-sensors = <&tsens 5>;
+
+ trips {
+ cpu_alert0: trip0 {
+ temperature = <75000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ cpu_crit0: trip1 {
+ temperature = <110000>;
+ hysteresis = <2000>;
+ type = "critical";
+ };
+ };
+ };
+
+ cpu1-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;
+
+ thermal-sensors = <&tsens 6>;
+
+ trips {
+ cpu_alert1: trip0 {
+ temperature = <75000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ cpu_crit1: trip1 {
+ temperature = <110000>;
+ hysteresis = <2000>;
+ type = "critical";
+ };
+ };
+ };
+
+ cpu2-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;
+
+ thermal-sensors = <&tsens 7>;
+
+ trips {
+ cpu_alert2: trip0 {
+ temperature = <75000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ cpu_crit2: trip1 {
+ temperature = <110000>;
+ hysteresis = <2000>;
+ type = "critical";
+ };
+ };
+ };
+
+ cpu3-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;
+
+ thermal-sensors = <&tsens 8>;
+
+ trips {
+ cpu_alert3: trip0 {
+ temperature = <75000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ cpu_crit3: trip1 {
+ temperature = <110000>;
+ hysteresis = <2000>;
+ type = "critical";
+ };
+ };
+ };
+
+ q6-dsp-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;
+
+ thermal-sensors = <&tsens 1>;
+
+ trips {
+ q6_dsp_alert0: trip-point0 {
+ temperature = <90000>;
+ hysteresis = <2000>;
+ type = "hot";
};
+ };
+ };

- pm8841-regulators {
- compatible = "qcom,rpm-pm8841-regulators";
-
- pm8841_s1: s1 {};
- pm8841_s2: s2 {};
- pm8841_s3: s3 {};
- pm8841_s4: s4 {};
- pm8841_s5: s5 {};
- pm8841_s6: s6 {};
- pm8841_s7: s7 {};
- pm8841_s8: s8 {};
+ modemtx-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;
+
+ thermal-sensors = <&tsens 2>;
+
+ trips {
+ modemtx_alert0: trip-point0 {
+ temperature = <90000>;
+ hysteresis = <2000>;
+ type = "hot";
};
+ };
+ };
+
+ video-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;

- pm8941-regulators {
- compatible = "qcom,rpm-pm8941-regulators";
-
- pm8941_s1: s1 {};
- pm8941_s2: s2 {};
- pm8941_s3: s3 {};
-
- pm8941_l1: l1 {};
- pm8941_l2: l2 {};
- pm8941_l3: l3 {};
- pm8941_l4: l4 {};
- pm8941_l5: l5 {};
- pm8941_l6: l6 {};
- pm8941_l7: l7 {};
- pm8941_l8: l8 {};
- pm8941_l9: l9 {};
- pm8941_l10: l10 {};
- pm8941_l11: l11 {};
- pm8941_l12: l12 {};
- pm8941_l13: l13 {};
- pm8941_l14: l14 {};
- pm8941_l15: l15 {};
- pm8941_l16: l16 {};
- pm8941_l17: l17 {};
- pm8941_l18: l18 {};
- pm8941_l19: l19 {};
- pm8941_l20: l20 {};
- pm8941_l21: l21 {};
- pm8941_l22: l22 {};
- pm8941_l23: l23 {};
- pm8941_l24: l24 {};
-
- pm8941_lvs1: lvs1 {};
- pm8941_lvs2: lvs2 {};
- pm8941_lvs3: lvs3 {};
+ thermal-sensors = <&tsens 3>;
+
+ trips {
+ video_alert0: trip-point0 {
+ temperature = <95000>;
+ hysteresis = <2000>;
+ type = "hot";
+ };
+ };
+ };
+
+ wlan-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;
+
+ thermal-sensors = <&tsens 4>;
+
+ trips {
+ wlan_alert0: trip-point0 {
+ temperature = <105000>;
+ hysteresis = <2000>;
+ type = "hot";
+ };
+ };
+ };
+
+ gpu-top-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;
+
+ thermal-sensors = <&tsens 9>;
+
+ trips {
+ gpu1_alert0: trip-point0 {
+ temperature = <90000>;
+ hysteresis = <2000>;
+ type = "hot";
+ };
+ };
+ };
+
+ gpu-bottom-thermal {
+ polling-delay-passive = <250>;
+ polling-delay = <1000>;
+
+ thermal-sensors = <&tsens 10>;
+
+ trips {
+ gpu2_alert0: trip-point0 {
+ temperature = <90000>;
+ hysteresis = <2000>;
+ type = "hot";
};
};
};
};

+ timer {
+ compatible = "arm,armv7-timer";
+ interrupts = <GIC_PPI 2 0xf08>,
+ <GIC_PPI 3 0xf08>,
+ <GIC_PPI 4 0xf08>,
+ <GIC_PPI 1 0xf08>;
+ clock-frequency = <19200000>;
+ };
+
vreg_boost: vreg-boost {
compatible = "regulator-fixed";

@@ -1692,6 +1632,7 @@ vreg_boost: vreg-boost {
pinctrl-names = "default";
pinctrl-0 = <&boost_bypass_n_pin>;
};
+
vreg_vph_pwr: vreg-vph-pwr {
compatible = "regulator-fixed";
regulator-name = "vph-pwr";
diff --git a/arch/arm/boot/dts/qcom-pm8841.dtsi b/arch/arm/boot/dts/qcom-pm8841.dtsi
index 2caf71eacb52..b5cdde034d18 100644
--- a/arch/arm/boot/dts/qcom-pm8841.dtsi
+++ b/arch/arm/boot/dts/qcom-pm8841.dtsi
@@ -24,6 +24,7 @@ temp-alarm@2400 {
compatible = "qcom,spmi-temp-alarm";
reg = <0x2400>;
interrupts = <4 0x24 0 IRQ_TYPE_EDGE_RISING>;
+ #thermal-sensor-cells = <0>;
};
};

diff --git a/arch/arm/boot/dts/qcom-pm8941.dtsi b/arch/arm/boot/dts/qcom-pm8941.dtsi
index da00b8f5eecd..cdd2bdb77b32 100644
--- a/arch/arm/boot/dts/qcom-pm8941.dtsi
+++ b/arch/arm/boot/dts/qcom-pm8941.dtsi
@@ -131,7 +131,7 @@ pm8941_iadc: iadc@3600 {
qcom,external-resistor-micro-ohms = <10000>;
};

- coincell@2800 {
+ pm8941_coincell: coincell@2800 {
compatible = "qcom,pm8941-coincell";
reg = <0x2800>;
status = "disabled";
diff --git a/arch/arm/boot/dts/qcom-sdx55.dtsi b/arch/arm/boot/dts/qcom-sdx55.dtsi
index d455795da44c..b75e672c239d 100644
--- a/arch/arm/boot/dts/qcom-sdx55.dtsi
+++ b/arch/arm/boot/dts/qcom-sdx55.dtsi
@@ -206,7 +206,7 @@ gcc: clock-controller@100000 {
blsp1_uart3: serial@831000 {
compatible = "qcom,msm-uartdm-v1.4", "qcom,msm-uartdm";
reg = <0x00831000 0x200>;
- interrupts = <GIC_SPI 26 IRQ_TYPE_LEVEL_LOW>;
+ interrupts = <GIC_SPI 26 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&gcc 30>,
<&gcc 9>;
clock-names = "core", "iface";
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-codina.dts b/arch/arm/boot/dts/ste-ux500-samsung-codina.dts
index 1c1725d31c7c..0a10fb0a3741 100644
--- a/arch/arm/boot/dts/ste-ux500-samsung-codina.dts
+++ b/arch/arm/boot/dts/ste-ux500-samsung-codina.dts
@@ -566,8 +566,8 @@ lisd3dh@19 {
reg = <0x19>;
vdd-supply = <&ab8500_ldo_aux1_reg>; // 3V
vddio-supply = <&ab8500_ldo_aux2_reg>; // 1.8V
- mount-matrix = "0", "-1", "0",
- "1", "0", "0",
+ mount-matrix = "0", "1", "0",
+ "-1", "0", "0",
"0", "0", "1";
};
};
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-gavini.dts b/arch/arm/boot/dts/ste-ux500-samsung-gavini.dts
index fd170974765f..149838b5f9c1 100644
--- a/arch/arm/boot/dts/ste-ux500-samsung-gavini.dts
+++ b/arch/arm/boot/dts/ste-ux500-samsung-gavini.dts
@@ -523,8 +523,8 @@ i2c-gate {
accelerometer@18 {
compatible = "bosch,bma222e";
reg = <0x18>;
- mount-matrix = "0", "1", "0",
- "-1", "0", "0",
+ mount-matrix = "0", "-1", "0",
+ "1", "0", "0",
"0", "0", "1";
vddio-supply = <&ab8500_ldo_aux2_reg>; // 1.8V
vdd-supply = <&ab8500_ldo_aux1_reg>; // 3V
diff --git a/arch/arm/boot/dts/ste-ux500-samsung-janice.dts b/arch/arm/boot/dts/ste-ux500-samsung-janice.dts
index 42762bfcd878..2069efb252a7 100644
--- a/arch/arm/boot/dts/ste-ux500-samsung-janice.dts
+++ b/arch/arm/boot/dts/ste-ux500-samsung-janice.dts
@@ -610,8 +610,8 @@ i2c-gate {
accelerometer@8 {
compatible = "bosch,bma222";
reg = <0x08>;
- mount-matrix = "0", "1", "0",
- "-1", "0", "0",
+ mount-matrix = "0", "-1", "0",
+ "1", "0", "0",
"0", "0", "1";
vddio-supply = <&ab8500_ldo_aux2_reg>; // 1.8V
vdd-supply = <&ab8500_ldo_aux1_reg>; // 3V
diff --git a/arch/arm/boot/dts/uniphier-pxs2.dtsi b/arch/arm/boot/dts/uniphier-pxs2.dtsi
index e81e5937a60a..03301ddb3403 100644
--- a/arch/arm/boot/dts/uniphier-pxs2.dtsi
+++ b/arch/arm/boot/dts/uniphier-pxs2.dtsi
@@ -597,8 +597,8 @@ usb0: usb@65a00000 {
compatible = "socionext,uniphier-dwc3", "snps,dwc3";
status = "disabled";
reg = <0x65a00000 0xcd00>;
- interrupt-names = "host", "peripheral";
- interrupts = <0 134 4>, <0 135 4>;
+ interrupt-names = "dwc_usb3";
+ interrupts = <0 134 4>;
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_usb0>, <&pinctrl_usb2>;
clock-names = "ref", "bus_early", "suspend";
@@ -693,8 +693,8 @@ usb1: usb@65c00000 {
compatible = "socionext,uniphier-dwc3", "snps,dwc3";
status = "disabled";
reg = <0x65c00000 0xcd00>;
- interrupt-names = "host", "peripheral";
- interrupts = <0 137 4>, <0 138 4>;
+ interrupt-names = "dwc_usb3";
+ interrupts = <0 137 4>;
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_usb1>, <&pinctrl_usb3>;
clock-names = "ref", "bus_early", "suspend";
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index e4dba5461cb3..149a5bd6b88c 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -63,7 +63,7 @@ config CRYPTO_SHA512_ARM
using optimized ARM assembler and NEON, when available.

config CRYPTO_BLAKE2S_ARM
- tristate "BLAKE2s digest algorithm (ARM)"
+ bool "BLAKE2s digest algorithm (ARM)"
select CRYPTO_ARCH_HAVE_LIB_BLAKE2S
help
BLAKE2s digest algorithm optimized with ARM scalar instructions. This
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 0274f81cc8ea..971e74546fb1 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -9,8 +9,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
-obj-$(CONFIG_CRYPTO_BLAKE2S_ARM) += blake2s-arm.o
-obj-$(if $(CONFIG_CRYPTO_BLAKE2S_ARM),y) += libblake2s-arm.o
+obj-$(CONFIG_CRYPTO_BLAKE2S_ARM) += libblake2s-arm.o
obj-$(CONFIG_CRYPTO_BLAKE2B_NEON) += blake2b-neon.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o
obj-$(CONFIG_CRYPTO_POLY1305_ARM) += poly1305-arm.o
@@ -32,7 +31,6 @@ sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o
sha256-arm-y := sha256-core.o sha256_glue.o $(sha256-arm-neon-y)
sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o
sha512-arm-y := sha512-core.o sha512-glue.o $(sha512-arm-neon-y)
-blake2s-arm-y := blake2s-shash.o
libblake2s-arm-y:= blake2s-core.o blake2s-glue.o
blake2b-neon-y := blake2b-neon-core.o blake2b-neon-glue.o
sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o
diff --git a/arch/arm/crypto/blake2s-shash.c b/arch/arm/crypto/blake2s-shash.c
deleted file mode 100644
index 763c73beea2d..000000000000
--- a/arch/arm/crypto/blake2s-shash.c
+++ /dev/null
@@ -1,75 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * BLAKE2s digest algorithm, ARM scalar implementation
- *
- * Copyright 2020 Google LLC
- */
-
-#include <crypto/internal/blake2s.h>
-#include <crypto/internal/hash.h>
-
-#include <linux/module.h>
-
-static int crypto_blake2s_update_arm(struct shash_desc *desc,
- const u8 *in, unsigned int inlen)
-{
- return crypto_blake2s_update(desc, in, inlen, false);
-}
-
-static int crypto_blake2s_final_arm(struct shash_desc *desc, u8 *out)
-{
- return crypto_blake2s_final(desc, out, false);
-}
-
-#define BLAKE2S_ALG(name, driver_name, digest_size) \
- { \
- .base.cra_name = name, \
- .base.cra_driver_name = driver_name, \
- .base.cra_priority = 200, \
- .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, \
- .base.cra_blocksize = BLAKE2S_BLOCK_SIZE, \
- .base.cra_ctxsize = sizeof(struct blake2s_tfm_ctx), \
- .base.cra_module = THIS_MODULE, \
- .digestsize = digest_size, \
- .setkey = crypto_blake2s_setkey, \
- .init = crypto_blake2s_init, \
- .update = crypto_blake2s_update_arm, \
- .final = crypto_blake2s_final_arm, \
- .descsize = sizeof(struct blake2s_state), \
- }
-
-static struct shash_alg blake2s_arm_algs[] = {
- BLAKE2S_ALG("blake2s-128", "blake2s-128-arm", BLAKE2S_128_HASH_SIZE),
- BLAKE2S_ALG("blake2s-160", "blake2s-160-arm", BLAKE2S_160_HASH_SIZE),
- BLAKE2S_ALG("blake2s-224", "blake2s-224-arm", BLAKE2S_224_HASH_SIZE),
- BLAKE2S_ALG("blake2s-256", "blake2s-256-arm", BLAKE2S_256_HASH_SIZE),
-};
-
-static int __init blake2s_arm_mod_init(void)
-{
- return IS_REACHABLE(CONFIG_CRYPTO_HASH) ?
- crypto_register_shashes(blake2s_arm_algs,
- ARRAY_SIZE(blake2s_arm_algs)) : 0;
-}
-
-static void __exit blake2s_arm_mod_exit(void)
-{
- if (IS_REACHABLE(CONFIG_CRYPTO_HASH))
- crypto_unregister_shashes(blake2s_arm_algs,
- ARRAY_SIZE(blake2s_arm_algs));
-}
-
-module_init(blake2s_arm_mod_init);
-module_exit(blake2s_arm_mod_exit);
-
-MODULE_DESCRIPTION("BLAKE2s digest algorithm, ARM scalar implementation");
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Eric Biggers <ebiggers@xxxxxxxxxx>");
-MODULE_ALIAS_CRYPTO("blake2s-128");
-MODULE_ALIAS_CRYPTO("blake2s-128-arm");
-MODULE_ALIAS_CRYPTO("blake2s-160");
-MODULE_ALIAS_CRYPTO("blake2s-160-arm");
-MODULE_ALIAS_CRYPTO("blake2s-224");
-MODULE_ALIAS_CRYPTO("blake2s-224-arm");
-MODULE_ALIAS_CRYPTO("blake2s-256");
-MODULE_ALIAS_CRYPTO("blake2s-256-arm");
diff --git a/arch/arm/lib/findbit.S b/arch/arm/lib/findbit.S
index b5e8b9ae4c7d..7fd3600db8ef 100644
--- a/arch/arm/lib/findbit.S
+++ b/arch/arm/lib/findbit.S
@@ -40,8 +40,8 @@ ENDPROC(_find_first_zero_bit_le)
* Prototype: int find_next_zero_bit(void *addr, unsigned int maxbit, int offset)
*/
ENTRY(_find_next_zero_bit_le)
- teq r1, #0
- beq 3b
+ cmp r2, r1
+ bhs 3b
ands ip, r2, #7
beq 1b @ If new byte, goto old routine
ARM( ldrb r3, [r0, r2, lsr #3] )
@@ -81,8 +81,8 @@ ENDPROC(_find_first_bit_le)
* Prototype: int find_next_zero_bit(void *addr, unsigned int maxbit, int offset)
*/
ENTRY(_find_next_bit_le)
- teq r1, #0
- beq 3b
+ cmp r2, r1
+ bhs 3b
ands ip, r2, #7
beq 1b @ If new byte, goto old routine
ARM( ldrb r3, [r0, r2, lsr #3] )
@@ -115,8 +115,8 @@ ENTRY(_find_first_zero_bit_be)
ENDPROC(_find_first_zero_bit_be)

ENTRY(_find_next_zero_bit_be)
- teq r1, #0
- beq 3b
+ cmp r2, r1
+ bhs 3b
ands ip, r2, #7
beq 1b @ If new byte, goto old routine
eor r3, r2, #0x18 @ big endian byte ordering
@@ -149,8 +149,8 @@ ENTRY(_find_first_bit_be)
ENDPROC(_find_first_bit_be)

ENTRY(_find_next_bit_be)
- teq r1, #0
- beq 3b
+ cmp r2, r1
+ bhs 3b
ands ip, r2, #7
beq 1b @ If new byte, goto old routine
eor r3, r2, #0x18 @ big endian byte ordering
diff --git a/arch/arm/mach-bcm/bcm_kona_smc.c b/arch/arm/mach-bcm/bcm_kona_smc.c
index 43829e49ad93..347bfb7f03e2 100644
--- a/arch/arm/mach-bcm/bcm_kona_smc.c
+++ b/arch/arm/mach-bcm/bcm_kona_smc.c
@@ -52,6 +52,7 @@ int __init bcm_kona_smc_init(void)
return -ENODEV;

prop_val = of_get_address(node, 0, &prop_size, NULL);
+ of_node_put(node);
if (!prop_val)
return -EINVAL;

diff --git a/arch/arm/mach-omap2/display.c b/arch/arm/mach-omap2/display.c
index 21413a9b7b6c..8d829f3dafe7 100644
--- a/arch/arm/mach-omap2/display.c
+++ b/arch/arm/mach-omap2/display.c
@@ -211,6 +211,7 @@ static int __init omapdss_init_fbdev(void)
node = of_find_node_by_name(NULL, "omap4_padconf_global");
if (node)
omap4_dsi_mux_syscon = syscon_node_to_regmap(node);
+ of_node_put(node);

return 0;
}
@@ -259,11 +260,13 @@ static int __init omapdss_init_of(void)

if (!pdev) {
pr_err("Unable to find DSS platform device\n");
+ of_node_put(node);
return -ENODEV;
}

r = of_platform_populate(node, NULL, NULL, &pdev->dev);
put_device(&pdev->dev);
+ of_node_put(node);
if (r) {
pr_err("Unable to populate DSS submodule devices\n");
return r;
diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c
index e7fd29a502a0..af9d8c432af0 100644
--- a/arch/arm/mach-omap2/pdata-quirks.c
+++ b/arch/arm/mach-omap2/pdata-quirks.c
@@ -551,6 +551,8 @@ pdata_quirks_init_clocks(const struct of_device_id *omap_dt_match_table)

of_platform_populate(np, omap_dt_match_table,
omap_auxdata_lookup, NULL);
+
+ of_node_put(np);
}
}

diff --git a/arch/arm/mach-omap2/prm3xxx.c b/arch/arm/mach-omap2/prm3xxx.c
index 1b442b128569..63e73e9b82bc 100644
--- a/arch/arm/mach-omap2/prm3xxx.c
+++ b/arch/arm/mach-omap2/prm3xxx.c
@@ -708,6 +708,7 @@ static int omap3xxx_prm_late_init(void)
}

irq_num = of_irq_get(np, 0);
+ of_node_put(np);
if (irq_num == -EPROBE_DEFER)
return irq_num;

diff --git a/arch/arm/mach-shmobile/regulator-quirk-rcar-gen2.c b/arch/arm/mach-shmobile/regulator-quirk-rcar-gen2.c
index 09ef73b99dd8..ba44cec5e59a 100644
--- a/arch/arm/mach-shmobile/regulator-quirk-rcar-gen2.c
+++ b/arch/arm/mach-shmobile/regulator-quirk-rcar-gen2.c
@@ -125,6 +125,7 @@ static int regulator_quirk_notify(struct notifier_block *nb,

list_for_each_entry_safe(pos, tmp, &quirk_list, list) {
list_del(&pos->list);
+ of_node_put(pos->np);
kfree(pos);
}

@@ -174,11 +175,12 @@ static int __init rcar_gen2_regulator_quirk(void)
memcpy(&quirk->i2c_msg, id->data, sizeof(quirk->i2c_msg));

quirk->id = id;
- quirk->np = np;
+ quirk->np = of_node_get(np);
quirk->i2c_msg.addr = addr;

ret = of_irq_parse_one(np, 0, argsa);
if (ret) { /* Skip invalid entry and continue */
+ of_node_put(np);
kfree(quirk);
continue;
}
@@ -225,6 +227,7 @@ static int __init rcar_gen2_regulator_quirk(void)
err_mem:
list_for_each_entry_safe(pos, tmp, &quirk_list, list) {
list_del(&pos->list);
+ of_node_put(pos->np);
kfree(pos);
}

diff --git a/arch/arm/mach-zynq/common.c b/arch/arm/mach-zynq/common.c
index e1ca6a5732d2..15e8a321a713 100644
--- a/arch/arm/mach-zynq/common.c
+++ b/arch/arm/mach-zynq/common.c
@@ -77,6 +77,7 @@ static int __init zynq_get_revision(void)
}

zynq_devcfg_base = of_iomap(np, 0);
+ of_node_put(np);
if (!zynq_devcfg_base) {
pr_err("%s: Unable to map I/O memory\n", __func__);
return -1;
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 20ea89d9ac2f..54cf6faf339c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -223,6 +223,7 @@ config ARM64
select THREAD_INFO_IN_TASK
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
select TRACE_IRQFLAGS_SUPPORT
+ select TRACE_IRQFLAGS_NMI_SUPPORT
help
ARM 64-bit (AArch64) Linux support.

@@ -500,6 +501,22 @@ config ARM64_ERRATUM_834220

If unsure, say Y.

+config ARM64_ERRATUM_1742098
+ bool "Cortex-A57/A72: 1742098: ELR recorded incorrectly on interrupt taken between cryptographic instructions in a sequence"
+ depends on COMPAT
+ default y
+ help
+ This option removes the AES hwcap for aarch32 user-space to
+	  work around erratum 1742098 on Cortex-A57 and Cortex-A72.
+
+ Affected parts may corrupt the AES state if an interrupt is
+ taken between a pair of AES instructions. These instructions
+ are only present if the cryptography extensions are present.
+ All software should have a fallback implementation for CPUs
+ that don't implement the cryptography extensions.
+
+ If unsure, say Y.
+
config ARM64_ERRATUM_845719
bool "Cortex-A53: 845719: a load might read incorrect data"
depends on COMPAT
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts
index c519d9fa6967..3d2c68d58f49 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts
@@ -40,7 +40,7 @@ hdmi_con_in: endpoint {
leds {
compatible = "gpio-leds";

- status {
+ led-0 {
label = "orangepi:green:status";
gpios = <&pio 7 11 GPIO_ACTIVE_HIGH>; /* PH11 */
};
diff --git a/arch/arm64/boot/dts/exynos/exynosautov9-pinctrl.dtsi b/arch/arm64/boot/dts/exynos/exynosautov9-pinctrl.dtsi
index ef0349d1c3d0..68f4a0fae7cf 100644
--- a/arch/arm64/boot/dts/exynos/exynosautov9-pinctrl.dtsi
+++ b/arch/arm64/boot/dts/exynos/exynosautov9-pinctrl.dtsi
@@ -1089,21 +1089,21 @@ spi10_cs_func: spi10-cs-func-pins {

/* PERIC1 USI11_SPI */
spi11_bus: spi11-pins {
- samsung,pins = "gpp3-6", "gpp3-5", "gpp3-4";
+ samsung,pins = "gpp5-6", "gpp5-5", "gpp5-4";
samsung,pin-function = <EXYNOS_PIN_FUNC_2>;
samsung,pin-pud = <EXYNOS_PIN_PULL_NONE>;
samsung,pin-drv = <EXYNOS5420_PIN_DRV_LV1>;
};

spi11_cs: spi11-cs-pins {
- samsung,pins = "gpp3-7";
+ samsung,pins = "gpp5-7";
samsung,pin-function = <EXYNOS_PIN_FUNC_OUTPUT>;
samsung,pin-pud = <EXYNOS_PIN_PULL_NONE>;
samsung,pin-drv = <EXYNOS5420_PIN_DRV_LV1>;
};

spi11_cs_func: spi11-cs-func-pins {
- samsung,pins = "gpp3-7";
+ samsung,pins = "gpp5-7";
samsung,pin-function = <EXYNOS_PIN_FUNC_2>;
samsung,pin-pud = <EXYNOS_PIN_PULL_NONE>;
samsung,pin-drv = <EXYNOS5420_PIN_DRV_LV1>;
diff --git a/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts b/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
index 2b9bf8dd14ec..7538918c7a82 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
+++ b/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
@@ -49,7 +49,7 @@ factory {
wps {
label = "wps";
linux,code = <KEY_WPS_BUTTON>;
- gpios = <&pio 102 GPIO_ACTIVE_HIGH>;
+ gpios = <&pio 102 GPIO_ACTIVE_LOW>;
};
};

diff --git a/arch/arm64/boot/dts/mediatek/mt8192.dtsi b/arch/arm64/boot/dts/mediatek/mt8192.dtsi
index bcecc7484453..96439a4bce09 100644
--- a/arch/arm64/boot/dts/mediatek/mt8192.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8192.dtsi
@@ -41,7 +41,7 @@ cpu0: cpu@0 {
reg = <0x000>;
enable-method = "psci";
clock-frequency = <1701000000>;
- cpu-idle-states = <&cpuoff_l &clusteroff_l>;
+ cpu-idle-states = <&cpu_sleep_l &cluster_sleep_l>;
next-level-cache = <&l2_0>;
capacity-dmips-mhz = <530>;
};
@@ -52,7 +52,7 @@ cpu1: cpu@100 {
reg = <0x100>;
enable-method = "psci";
clock-frequency = <1701000000>;
- cpu-idle-states = <&cpuoff_l &clusteroff_l>;
+ cpu-idle-states = <&cpu_sleep_l &cluster_sleep_l>;
next-level-cache = <&l2_0>;
capacity-dmips-mhz = <530>;
};
@@ -63,7 +63,7 @@ cpu2: cpu@200 {
reg = <0x200>;
enable-method = "psci";
clock-frequency = <1701000000>;
- cpu-idle-states = <&cpuoff_l &clusteroff_l>;
+ cpu-idle-states = <&cpu_sleep_l &cluster_sleep_l>;
next-level-cache = <&l2_0>;
capacity-dmips-mhz = <530>;
};
@@ -74,7 +74,7 @@ cpu3: cpu@300 {
reg = <0x300>;
enable-method = "psci";
clock-frequency = <1701000000>;
- cpu-idle-states = <&cpuoff_l &clusteroff_l>;
+ cpu-idle-states = <&cpu_sleep_l &cluster_sleep_l>;
next-level-cache = <&l2_0>;
capacity-dmips-mhz = <530>;
};
@@ -85,7 +85,7 @@ cpu4: cpu@400 {
reg = <0x400>;
enable-method = "psci";
clock-frequency = <2171000000>;
- cpu-idle-states = <&cpuoff_b &clusteroff_b>;
+ cpu-idle-states = <&cpu_sleep_b &cluster_sleep_b>;
next-level-cache = <&l2_1>;
capacity-dmips-mhz = <1024>;
};
@@ -96,7 +96,7 @@ cpu5: cpu@500 {
reg = <0x500>;
enable-method = "psci";
clock-frequency = <2171000000>;
- cpu-idle-states = <&cpuoff_b &clusteroff_b>;
+ cpu-idle-states = <&cpu_sleep_b &cluster_sleep_b>;
next-level-cache = <&l2_1>;
capacity-dmips-mhz = <1024>;
};
@@ -107,7 +107,7 @@ cpu6: cpu@600 {
reg = <0x600>;
enable-method = "psci";
clock-frequency = <2171000000>;
- cpu-idle-states = <&cpuoff_b &clusteroff_b>;
+ cpu-idle-states = <&cpu_sleep_b &cluster_sleep_b>;
next-level-cache = <&l2_1>;
capacity-dmips-mhz = <1024>;
};
@@ -118,7 +118,7 @@ cpu7: cpu@700 {
reg = <0x700>;
enable-method = "psci";
clock-frequency = <2171000000>;
- cpu-idle-states = <&cpuoff_b &clusteroff_b>;
+ cpu-idle-states = <&cpu_sleep_b &cluster_sleep_b>;
next-level-cache = <&l2_1>;
capacity-dmips-mhz = <1024>;
};
@@ -170,8 +170,8 @@ l3_0: l3-cache {
};

idle-states {
- entry-method = "arm,psci";
- cpuoff_l: cpuoff_l {
+ entry-method = "psci";
+ cpu_sleep_l: cpu-sleep-l {
compatible = "arm,idle-state";
arm,psci-suspend-param = <0x00010001>;
local-timer-stop;
@@ -179,7 +179,7 @@ cpuoff_l: cpuoff_l {
exit-latency-us = <140>;
min-residency-us = <780>;
};
- cpuoff_b: cpuoff_b {
+ cpu_sleep_b: cpu-sleep-b {
compatible = "arm,idle-state";
arm,psci-suspend-param = <0x00010001>;
local-timer-stop;
@@ -187,7 +187,7 @@ cpuoff_b: cpuoff_b {
exit-latency-us = <145>;
min-residency-us = <720>;
};
- clusteroff_l: clusteroff_l {
+ cluster_sleep_l: cluster-sleep-l {
compatible = "arm,idle-state";
arm,psci-suspend-param = <0x01010002>;
local-timer-stop;
@@ -195,7 +195,7 @@ clusteroff_l: clusteroff_l {
exit-latency-us = <155>;
min-residency-us = <860>;
};
- clusteroff_b: clusteroff_b {
+ cluster_sleep_b: cluster-sleep-b {
compatible = "arm,idle-state";
arm,psci-suspend-param = <0x01010002>;
local-timer-stop;
diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index e9b40f5d79ec..77c597a6386f 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -1807,6 +1807,7 @@ sram@30000000 {
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x0 0x30000000 0x50000>;
+ no-memory-wc;

cpu_bpmp_tx: sram@4e000 {
reg = <0x4e000 0x1000>;
diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index a7d7cfd66379..b0f9393dd39c 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -75,7 +75,7 @@ eeprom@50 {

/* SDMMC1 (SD/MMC) */
mmc@3400000 {
- cd-gpios = <&gpio TEGRA194_MAIN_GPIO(A, 0) GPIO_ACTIVE_LOW>;
+ cd-gpios = <&gpio TEGRA194_MAIN_GPIO(G, 7) GPIO_ACTIVE_LOW>;
};

/* SDMMC4 (eMMC) */
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 751ebe5e9506..61465eb6cccd 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -2648,6 +2648,7 @@ sram@40000000 {
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x0 0x40000000 0x50000>;
+ no-memory-wc;

cpu_bpmp_tx: sram@4e000 {
reg = <0x4e000 0x1000>;
diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index aaace605bdaa..9916b87fa83f 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -1264,6 +1264,7 @@ sram@40000000 {
#address-cells = <1>;
#size-cells = <1>;
ranges = <0x0 0x0 0x40000000 0x80000>;
+ no-memory-wc;

cpu_bpmp_tx: sram@70000 {
reg = <0x70000 0x1000>;
diff --git a/arch/arm64/boot/dts/qcom/ipq6018.dtsi b/arch/arm64/boot/dts/qcom/ipq6018.dtsi
index aac56575e30d..5cdec104d899 100644
--- a/arch/arm64/boot/dts/qcom/ipq6018.dtsi
+++ b/arch/arm64/boot/dts/qcom/ipq6018.dtsi
@@ -525,9 +525,9 @@ timer {
};

timer@b120000 {
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x10000000>;
compatible = "arm,armv7-timer-mem";
reg = <0x0 0x0b120000 0x0 0x1000>;

@@ -535,49 +535,49 @@ frame@b120000 {
frame-number = <0>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x0b121000 0x0 0x1000>,
- <0x0 0x0b122000 0x0 0x1000>;
+ reg = <0x0b121000 0x1000>,
+ <0x0b122000 0x1000>;
};

frame@b123000 {
frame-number = <1>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0xb123000 0x0 0x1000>;
+ reg = <0x0b123000 0x1000>;
status = "disabled";
};

frame@b124000 {
frame-number = <2>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x0b124000 0x0 0x1000>;
+ reg = <0x0b124000 0x1000>;
status = "disabled";
};

frame@b125000 {
frame-number = <3>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x0b125000 0x0 0x1000>;
+ reg = <0x0b125000 0x1000>;
status = "disabled";
};

frame@b126000 {
frame-number = <4>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x0b126000 0x0 0x1000>;
+ reg = <0x0b126000 0x1000>;
status = "disabled";
};

frame@b127000 {
frame-number = <5>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x0b127000 0x0 0x1000>;
+ reg = <0x0b127000 0x1000>;
status = "disabled";
};

frame@b128000 {
frame-number = <6>;
interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x0b128000 0x0 0x1000>;
+ reg = <0x0b128000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/qcom/ipq8074.dtsi b/arch/arm64/boot/dts/qcom/ipq8074.dtsi
index 8d4e0e193439..664fba3632b1 100644
--- a/arch/arm64/boot/dts/qcom/ipq8074.dtsi
+++ b/arch/arm64/boot/dts/qcom/ipq8074.dtsi
@@ -534,7 +534,7 @@ qpic_bam: dma-controller@7984000 {
status = "disabled";
};

- qpic_nand: nand@79b0000 {
+ qpic_nand: nand-controller@79b0000 {
compatible = "qcom,ipq8074-nand";
reg = <0x079b0000 0x10000>;
#address-cells = <1>;
diff --git a/arch/arm64/boot/dts/qcom/msm8916.dtsi b/arch/arm64/boot/dts/qcom/msm8916.dtsi
index e34963505e07..7ecd747dc624 100644
--- a/arch/arm64/boot/dts/qcom/msm8916.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi
@@ -1758,8 +1758,8 @@ pronto: remoteproc@a21b000 {
<&rpmpd MSM8916_VDDMX>;
power-domain-names = "cx", "mx";

- qcom,state = <&wcnss_smp2p_out 0>;
- qcom,state-names = "stop";
+ qcom,smem-states = <&wcnss_smp2p_out 0>;
+ qcom,smem-state-names = "stop";

pinctrl-names = "default";
pinctrl-0 = <&wcnss_pin_a>;
diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index b9a48cfd760f..078edf46de2b 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -603,7 +603,7 @@ pciephy_0: phy@35000 {
<0x00035400 0x1dc>;
#phy-cells = <0>;

- #clock-cells = <1>;
+ #clock-cells = <0>;
clock-output-names = "pcie_0_pipe_clk_src";
clocks = <&gcc GCC_PCIE_0_PIPE_CLK>;
clock-names = "pipe0";
@@ -617,6 +617,7 @@ pciephy_1: phy@36000 {
<0x00036400 0x1dc>;
#phy-cells = <0>;

+ #clock-cells = <0>;
clock-output-names = "pcie_1_pipe_clk_src";
clocks = <&gcc GCC_PCIE_1_PIPE_CLK>;
clock-names = "pipe1";
@@ -630,6 +631,7 @@ pciephy_2: phy@37000 {
<0x00037400 0x1dc>;
#phy-cells = <0>;

+ #clock-cells = <0>;
clock-output-names = "pcie_2_pipe_clk_src";
clocks = <&gcc GCC_PCIE_2_PIPE_CLK>;
clock-names = "pipe2";
@@ -2659,7 +2661,7 @@ ssusb_phy_0: phy@7410200 {
<0x07410600 0x1a8>;
#phy-cells = <0>;

- #clock-cells = <1>;
+ #clock-cells = <0>;
clock-output-names = "usb3_phy_pipe_clk_src";
clocks = <&gcc GCC_USB3_PHY_PIPE_CLK>;
clock-names = "pipe0";
diff --git a/arch/arm64/boot/dts/qcom/msm8998-sony-xperia-yoshino-poplar.dts b/arch/arm64/boot/dts/qcom/msm8998-sony-xperia-yoshino-poplar.dts
index 4a1f98a21031..c21333aa73c2 100644
--- a/arch/arm64/boot/dts/qcom/msm8998-sony-xperia-yoshino-poplar.dts
+++ b/arch/arm64/boot/dts/qcom/msm8998-sony-xperia-yoshino-poplar.dts
@@ -26,11 +26,13 @@ &lab {
};

&vreg_l18a_2p85 {
- regulator-min-microvolt = <2850000>;
- regulator-max-microvolt = <2850000>;
+	/* Note: Rounded down from 2850000 to a multiple of the PLDO step size (8000) */
+ regulator-min-microvolt = <2848000>;
+ regulator-max-microvolt = <2848000>;
};

&vreg_l22a_2p85 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
+	/* Note: Rounded down from 2700000 to a multiple of the PLDO step size (8000) */
+ regulator-min-microvolt = <2696000>;
+ regulator-max-microvolt = <2696000>;
};
diff --git a/arch/arm64/boot/dts/qcom/qcs404.dtsi b/arch/arm64/boot/dts/qcom/qcs404.dtsi
index 3f06f7cd3cf2..65d71adee345 100644
--- a/arch/arm64/boot/dts/qcom/qcs404.dtsi
+++ b/arch/arm64/boot/dts/qcom/qcs404.dtsi
@@ -548,7 +548,7 @@ dwc3@7580000 {
compatible = "snps,dwc3";
reg = <0x07580000 0xcd00>;
interrupts = <GIC_SPI 26 IRQ_TYPE_LEVEL_HIGH>;
- phys = <&usb2_phy_sec>, <&usb3_phy>;
+ phys = <&usb2_phy_prim>, <&usb3_phy>;
phy-names = "usb2-phy", "usb3-phy";
snps,has-lpm-erratum;
snps,hird-threshold = /bits/ 8 <0x10>;
@@ -577,7 +577,7 @@ dwc3@78c0000 {
compatible = "snps,dwc3";
reg = <0x078c0000 0xcc00>;
interrupts = <GIC_SPI 44 IRQ_TYPE_LEVEL_HIGH>;
- phys = <&usb2_phy_prim>;
+ phys = <&usb2_phy_sec>;
phy-names = "usb2-phy";
snps,has-lpm-erratum;
snps,hird-threshold = /bits/ 8 <0x10>;
diff --git a/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi b/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi
index 732e1181af48..262224504921 100644
--- a/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi
@@ -42,6 +42,7 @@ charger-crit {
*/

/delete-node/ &hyp_mem;
+/delete-node/ &ipa_fw_mem;
/delete-node/ &xbl_mem;
/delete-node/ &aop_mem;
/delete-node/ &sec_apps_mem;
diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index e1c46b80f14a..120db2d0a309 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -3219,7 +3219,7 @@ aoss_reset: reset-controller@c2a0000 {
};

aoss_qmp: power-controller@c300000 {
- compatible = "qcom,sc7180-aoss-qmp";
+ compatible = "qcom,sc7180-aoss-qmp", "qcom,aoss-qmp";
reg = <0 0x0c300000 0 0x400>;
interrupts = <GIC_SPI 389 IRQ_TYPE_EDGE_RISING>;
mboxes = <&apss_shared 0>;
@@ -3388,9 +3388,9 @@ watchdog@17c10000 {
};

timer@17c20000{
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x20000000>;
compatible = "arm,armv7-timer-mem";
reg = <0 0x17c20000 0 0x1000>;

@@ -3398,49 +3398,49 @@ frame@17c21000 {
frame-number = <0>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c21000 0 0x1000>,
- <0 0x17c22000 0 0x1000>;
+ reg = <0x17c21000 0x1000>,
+ <0x17c22000 0x1000>;
};

frame@17c23000 {
frame-number = <1>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c23000 0 0x1000>;
+ reg = <0x17c23000 0x1000>;
status = "disabled";
};

frame@17c25000 {
frame-number = <2>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c25000 0 0x1000>;
+ reg = <0x17c25000 0x1000>;
status = "disabled";
};

frame@17c27000 {
frame-number = <3>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c27000 0 0x1000>;
+ reg = <0x17c27000 0x1000>;
status = "disabled";
};

frame@17c29000 {
frame-number = <4>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c29000 0 0x1000>;
+ reg = <0x17c29000 0x1000>;
status = "disabled";
};

frame@17c2b000 {
frame-number = <5>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c2b000 0 0x1000>;
+ reg = <0x17c2b000 0x1000>;
status = "disabled";
};

frame@17c2d000 {
frame-number = <6>;
interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c2d000 0 0x1000>;
+ reg = <0x17c2d000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index f0b64be63c21..793ac9715da2 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -810,7 +810,7 @@ gcc: clock-controller@100000 {
reg = <0 0x00100000 0 0x1f0000>;
clocks = <&rpmhcc RPMH_CXO_CLK>,
<&rpmhcc RPMH_CXO_CLK_A>, <&sleep_clk>,
- <0>, <&pcie1_lane 0>,
+ <0>, <&pcie1_lane>,
<0>, <0>, <0>, <0>;
clock-names = "bi_tcxo", "bi_tcxo_ao", "sleep_clk",
"pcie_0_pipe_clk", "pcie_1_pipe_clk",
@@ -1839,7 +1839,7 @@ pcie1: pci@1c08000 {

clocks = <&gcc GCC_PCIE_1_PIPE_CLK>,
<&gcc GCC_PCIE_1_PIPE_CLK_SRC>,
- <&pcie1_lane 0>,
+ <&pcie1_lane>,
<&rpmhcc RPMH_CXO_CLK>,
<&gcc GCC_PCIE_1_AUX_CLK>,
<&gcc GCC_PCIE_1_CFG_AHB_CLK>,
@@ -1914,7 +1914,7 @@ pcie1_lane: lanes@1c0e200 {
clock-names = "pipe0";

#phy-cells = <0>;
- #clock-cells = <1>;
+ #clock-cells = <0>;
clock-output-names = "pcie_1_pipe_clk";
};
};
@@ -3467,7 +3467,7 @@ aoss_reset: reset-controller@c2a0000 {
};

aoss_qmp: power-controller@c300000 {
- compatible = "qcom,sc7280-aoss-qmp";
+ compatible = "qcom,sc7280-aoss-qmp", "qcom,aoss-qmp";
reg = <0 0x0c300000 0 0x400>;
interrupts-extended = <&ipcc IPCC_CLIENT_AOP
IPCC_MPROC_SIGNAL_GLINK_QMP
@@ -4395,9 +4395,9 @@ watchdog@17c10000 {
};

timer@17c20000 {
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x20000000>;
compatible = "arm,armv7-timer-mem";
reg = <0 0x17c20000 0 0x1000>;

@@ -4405,49 +4405,49 @@ frame@17c21000 {
frame-number = <0>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c21000 0 0x1000>,
- <0 0x17c22000 0 0x1000>;
+ reg = <0x17c21000 0x1000>,
+ <0x17c22000 0x1000>;
};

frame@17c23000 {
frame-number = <1>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c23000 0 0x1000>;
+ reg = <0x17c23000 0x1000>;
status = "disabled";
};

frame@17c25000 {
frame-number = <2>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c25000 0 0x1000>;
+ reg = <0x17c25000 0x1000>;
status = "disabled";
};

frame@17c27000 {
frame-number = <3>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c27000 0 0x1000>;
+ reg = <0x17c27000 0x1000>;
status = "disabled";
};

frame@17c29000 {
frame-number = <4>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c29000 0 0x1000>;
+ reg = <0x17c29000 0x1000>;
status = "disabled";
};

frame@17c2b000 {
frame-number = <5>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c2b000 0 0x1000>;
+ reg = <0x17c2b000 0x1000>;
status = "disabled";
};

frame@17c2d000 {
frame-number = <6>;
interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17c2d000 0 0x1000>;
+ reg = <0x17c2d000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/qcom/sdm630.dtsi b/arch/arm64/boot/dts/qcom/sdm630.dtsi
index 240293592ef9..a33638a1cb44 100644
--- a/arch/arm64/boot/dts/qcom/sdm630.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm630.dtsi
@@ -8,6 +8,7 @@
#include <dt-bindings/clock/qcom,gpucc-sdm660.h>
#include <dt-bindings/clock/qcom,mmcc-sdm660.h>
#include <dt-bindings/clock/qcom,rpmcc.h>
+#include <dt-bindings/interconnect/qcom,sdm660.h>
#include <dt-bindings/power/qcom-rpmpd.h>
#include <dt-bindings/gpio/gpio.h>
#include <dt-bindings/interrupt-controller/arm-gic.h>
@@ -1045,11 +1046,13 @@ adreno_gpu: gpu@5000000 {
nvmem-cells = <&gpu_speed_bin>;
nvmem-cell-names = "speed_bin";

- interconnects = <&gnoc 1 &bimc 5>;
+ interconnects = <&bimc MASTER_OXILI &bimc SLAVE_EBI>;
interconnect-names = "gfx-mem";

operating-points-v2 = <&gpu_sdm630_opp_table>;

+ status = "disabled";
+
gpu_sdm630_opp_table: opp-table {
compatible = "operating-points-v2";
opp-775000000 {
@@ -1260,7 +1263,7 @@ qusb2phy: phy@c012000 {
#phy-cells = <0>;

clocks = <&gcc GCC_USB_PHY_CFG_AHB2PHY_CLK>,
- <&gcc GCC_RX1_USB2_CLKREF_CLK>;
+ <&gcc GCC_RX0_USB2_CLKREF_CLK>;
clock-names = "cfg_ahb", "ref";

resets = <&gcc GCC_QUSB2PHY_PRIM_BCR>;
diff --git a/arch/arm64/boot/dts/qcom/sdm636-sony-xperia-ganges-mermaid.dts b/arch/arm64/boot/dts/qcom/sdm636-sony-xperia-ganges-mermaid.dts
index b96da53f2f1e..58f687fc49e0 100644
--- a/arch/arm64/boot/dts/qcom/sdm636-sony-xperia-ganges-mermaid.dts
+++ b/arch/arm64/boot/dts/qcom/sdm636-sony-xperia-ganges-mermaid.dts
@@ -19,7 +19,7 @@ / {
};

&sdc2_state_on {
- pinconf-clk {
+ clk {
drive-strength = <14>;
};
};
diff --git a/arch/arm64/boot/dts/qcom/sdm845-sony-xperia-tama-akatsuki.dts b/arch/arm64/boot/dts/qcom/sdm845-sony-xperia-tama-akatsuki.dts
index 8a0d94e7f598..2f5e12deaada 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-sony-xperia-tama-akatsuki.dts
+++ b/arch/arm64/boot/dts/qcom/sdm845-sony-xperia-tama-akatsuki.dts
@@ -19,8 +19,9 @@ &vreg_l14a_1p8 {
};

&vreg_l22a_2p8 {
- regulator-min-microvolt = <2700000>;
- regulator-max-microvolt = <2700000>;
+ /* Note: Round-down from 2700000 to be a multiple of PLDO step-size 8000 */
+ regulator-min-microvolt = <2696000>;
+ regulator-max-microvolt = <2696000>;
};

&vreg_l28a_2p8 {
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index ad21cf465c98..cd0029ff8246 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -4942,9 +4942,9 @@ slimbam: dma-controller@17184000 {
};

timer@17c90000 {
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x20000000>;
compatible = "arm,armv7-timer-mem";
reg = <0 0x17c90000 0 0x1000>;

@@ -4952,49 +4952,49 @@ frame@17ca0000 {
frame-number = <0>;
interrupts = <GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17ca0000 0 0x1000>,
- <0 0x17cb0000 0 0x1000>;
+ reg = <0x17ca0000 0x1000>,
+ <0x17cb0000 0x1000>;
};

frame@17cc0000 {
frame-number = <1>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17cc0000 0 0x1000>;
+ reg = <0x17cc0000 0x1000>;
status = "disabled";
};

frame@17cd0000 {
frame-number = <2>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17cd0000 0 0x1000>;
+ reg = <0x17cd0000 0x1000>;
status = "disabled";
};

frame@17ce0000 {
frame-number = <3>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17ce0000 0 0x1000>;
+ reg = <0x17ce0000 0x1000>;
status = "disabled";
};

frame@17cf0000 {
frame-number = <4>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17cf0000 0 0x1000>;
+ reg = <0x17cf0000 0x1000>;
status = "disabled";
};

frame@17d00000 {
frame-number = <5>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17d00000 0 0x1000>;
+ reg = <0x17d00000 0x1000>;
status = "disabled";
};

frame@17d10000 {
frame-number = <6>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0 0x17d10000 0 0x1000>;
+ reg = <0x17d10000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/qcom/sm6125-sony-xperia-seine-pdx201.dts b/arch/arm64/boot/dts/qcom/sm6125-sony-xperia-seine-pdx201.dts
index 871ccbba445b..038970c0b68e 100644
--- a/arch/arm64/boot/dts/qcom/sm6125-sony-xperia-seine-pdx201.dts
+++ b/arch/arm64/boot/dts/qcom/sm6125-sony-xperia-seine-pdx201.dts
@@ -88,11 +88,19 @@ &hsusb_phy1 {
status = "okay";
};

-&sdc2_state_off {
+&sdc2_off_state {
sd-cd {
pins = "gpio98";
+ drive-strength = <2>;
bias-disable;
+ };
+};
+
+&sdc2_on_state {
+ sd-cd {
+ pins = "gpio98";
drive-strength = <2>;
+ bias-pull-up;
};
};

@@ -102,32 +110,6 @@ &sdhc_1 {

&tlmm {
gpio-reserved-ranges = <22 2>, <28 6>;
-
- sdc2_state_on: sdc2-on {
- clk {
- pins = "sdc2_clk";
- bias-disable;
- drive-strength = <16>;
- };
-
- cmd {
- pins = "sdc2_cmd";
- bias-pull-up;
- drive-strength = <10>;
- };
-
- data {
- pins = "sdc2_data";
- bias-pull-up;
- drive-strength = <10>;
- };
-
- sd-cd {
- pins = "gpio98";
- bias-pull-up;
- drive-strength = <2>;
- };
- };
};

&usb3 {
diff --git a/arch/arm64/boot/dts/qcom/sm6125.dtsi b/arch/arm64/boot/dts/qcom/sm6125.dtsi
index e81b2a7794fb..e601b9bfdc04 100644
--- a/arch/arm64/boot/dts/qcom/sm6125.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm6125.dtsi
@@ -386,23 +386,43 @@ tlmm: pinctrl@500000 {
interrupt-controller;
#interrupt-cells = <2>;

- sdc2_state_off: sdc2-off {
+ sdc2_off_state: sdc2-off-state {
clk {
pins = "sdc2_clk";
- bias-disable;
drive-strength = <2>;
+ bias-disable;
};

cmd {
pins = "sdc2_cmd";
+ drive-strength = <2>;
bias-pull-up;
+ };
+
+ data {
+ pins = "sdc2_data";
drive-strength = <2>;
+ bias-pull-up;
+ };
+ };
+
+ sdc2_on_state: sdc2-on-state {
+ clk {
+ pins = "sdc2_clk";
+ drive-strength = <16>;
+ bias-disable;
+ };
+
+ cmd {
+ pins = "sdc2_cmd";
+ drive-strength = <10>;
+ bias-pull-up;
};

data {
pins = "sdc2_data";
+ drive-strength = <10>;
bias-pull-up;
- drive-strength = <2>;
};
};
};
@@ -470,8 +490,8 @@ sdhc_2: sdhci@4784000 {
<&xo_board>;
clock-names = "iface", "core", "xo";

- pinctrl-0 = <&sdc2_state_on>;
- pinctrl-1 = <&sdc2_state_off>;
+ pinctrl-0 = <&sdc2_on_state>;
+ pinctrl-1 = <&sdc2_off_state>;
pinctrl-names = "default", "sleep";

power-domains = <&rpmpd SM6125_VDDCX>;
diff --git a/arch/arm64/boot/dts/qcom/sm6350.dtsi b/arch/arm64/boot/dts/qcom/sm6350.dtsi
index d7c9edff19f7..c31fe27a46f2 100644
--- a/arch/arm64/boot/dts/qcom/sm6350.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm6350.dtsi
@@ -1090,57 +1090,57 @@ timer@17c20000 {
compatible = "arm,armv7-timer-mem";
reg = <0x0 0x17c20000 0x0 0x1000>;
clock-frequency = <19200000>;
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x20000000>;

frame@17c21000 {
frame-number = <0>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c21000 0x0 0x1000>,
- <0x0 0x17c22000 0x0 0x1000>;
+ reg = <0x17c21000 0x1000>,
+ <0x17c22000 0x1000>;
};

frame@17c23000 {
frame-number = <1>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c23000 0x0 0x1000>;
+ reg = <0x17c23000 0x1000>;
status = "disabled";
};

frame@17c25000 {
frame-number = <2>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c25000 0x0 0x1000>;
+ reg = <0x17c25000 0x1000>;
status = "disabled";
};

frame@17c27000 {
frame-number = <3>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c27000 0x0 0x1000>;
+ reg = <0x17c27000 0x1000>;
status = "disabled";
};

frame@17c29000 {
frame-number = <4>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c29000 0x0 0x1000>;
+ reg = <0x17c29000 0x1000>;
status = "disabled";
};

frame@17c2b000 {
frame-number = <5>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c2b000 0x0 0x1000>;
+ reg = <0x17c2b000 0x1000>;
status = "disabled";
};

frame@17c2d000 {
frame-number = <6>;
interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c2d000 0x0 0x1000>;
+ reg = <0x17c2d000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/qcom/sm8150.dtsi b/arch/arm64/boot/dts/qcom/sm8150.dtsi
index 15f3bf2e7ea0..6edb3986a473 100644
--- a/arch/arm64/boot/dts/qcom/sm8150.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8150.dtsi
@@ -3382,7 +3382,7 @@ camnoc_virt: interconnect@ac00000 {
};

aoss_qmp: power-controller@c300000 {
- compatible = "qcom,sm8150-aoss-qmp";
+ compatible = "qcom,sm8150-aoss-qmp", "qcom,aoss-qmp";
reg = <0x0 0x0c300000 0x0 0x400>;
interrupts = <GIC_SPI 389 IRQ_TYPE_EDGE_RISING>;
mboxes = <&apss_shared 0>;
@@ -3608,9 +3608,9 @@ watchdog@17c10000 {
};

timer@17c20000 {
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x20000000>;
compatible = "arm,armv7-timer-mem";
reg = <0x0 0x17c20000 0x0 0x1000>;
clock-frequency = <19200000>;
@@ -3619,49 +3619,49 @@ frame@17c21000{
frame-number = <0>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c21000 0x0 0x1000>,
- <0x0 0x17c22000 0x0 0x1000>;
+ reg = <0x17c21000 0x1000>,
+ <0x17c22000 0x1000>;
};

frame@17c23000 {
frame-number = <1>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c23000 0x0 0x1000>;
+ reg = <0x17c23000 0x1000>;
status = "disabled";
};

frame@17c25000 {
frame-number = <2>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c25000 0x0 0x1000>;
+ reg = <0x17c25000 0x1000>;
status = "disabled";
};

frame@17c27000 {
frame-number = <3>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c26000 0x0 0x1000>;
+ reg = <0x17c26000 0x1000>;
status = "disabled";
};

frame@17c29000 {
frame-number = <4>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c29000 0x0 0x1000>;
+ reg = <0x17c29000 0x1000>;
status = "disabled";
};

frame@17c2b000 {
frame-number = <5>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c2b000 0x0 0x1000>;
+ reg = <0x17c2b000 0x1000>;
status = "disabled";
};

frame@17c2d000 {
frame-number = <6>;
interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c2d000 0x0 0x1000>;
+ reg = <0x17c2d000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/qcom/sm8250.dtsi b/arch/arm64/boot/dts/qcom/sm8250.dtsi
index 1304b86af1a0..9afd8b3da3a1 100644
--- a/arch/arm64/boot/dts/qcom/sm8250.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8250.dtsi
@@ -1883,6 +1883,8 @@ pcie0_lane: phy@1c06200 {
clock-names = "pipe0";

#phy-cells = <0>;
+
+ #clock-cells = <0>;
clock-output-names = "pcie_0_pipe_clk";
};
};
@@ -1989,6 +1991,8 @@ pcie1_lane: phy@1c0e200 {
clock-names = "pipe0";

#phy-cells = <0>;
+
+ #clock-cells = <0>;
clock-output-names = "pcie_1_pipe_clk";
};
};
@@ -2095,6 +2099,8 @@ pcie2_lane: phy@1c16200 {
clock-names = "pipe0";

#phy-cells = <0>;
+
+ #clock-cells = <0>;
clock-output-names = "pcie_2_pipe_clk";
};
};
@@ -3475,7 +3481,7 @@ tsens1: thermal-sensor@c265000 {
};

aoss_qmp: power-controller@c300000 {
- compatible = "qcom,sm8250-aoss-qmp";
+ compatible = "qcom,sm8250-aoss-qmp", "qcom,aoss-qmp";
reg = <0 0x0c300000 0 0x400>;
interrupts-extended = <&ipcc IPCC_CLIENT_AOP
IPCC_MPROC_SIGNAL_GLINK_QMP
@@ -4528,9 +4534,9 @@ watchdog@17c10000 {
};

timer@17c20000 {
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x20000000>;
compatible = "arm,armv7-timer-mem";
reg = <0x0 0x17c20000 0x0 0x1000>;
clock-frequency = <19200000>;
@@ -4539,49 +4545,49 @@ frame@17c21000 {
frame-number = <0>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c21000 0x0 0x1000>,
- <0x0 0x17c22000 0x0 0x1000>;
+ reg = <0x17c21000 0x1000>,
+ <0x17c22000 0x1000>;
};

frame@17c23000 {
frame-number = <1>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c23000 0x0 0x1000>;
+ reg = <0x17c23000 0x1000>;
status = "disabled";
};

frame@17c25000 {
frame-number = <2>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c25000 0x0 0x1000>;
+ reg = <0x17c25000 0x1000>;
status = "disabled";
};

frame@17c27000 {
frame-number = <3>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c27000 0x0 0x1000>;
+ reg = <0x17c27000 0x1000>;
status = "disabled";
};

frame@17c29000 {
frame-number = <4>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c29000 0x0 0x1000>;
+ reg = <0x17c29000 0x1000>;
status = "disabled";
};

frame@17c2b000 {
frame-number = <5>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c2b000 0x0 0x1000>;
+ reg = <0x17c2b000 0x1000>;
status = "disabled";
};

frame@17c2d000 {
frame-number = <6>;
interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c2d000 0x0 0x1000>;
+ reg = <0x17c2d000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi b/arch/arm64/boot/dts/qcom/sm8350.dtsi
index 20f850b94158..ee6c202ab68c 100644
--- a/arch/arm64/boot/dts/qcom/sm8350.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi
@@ -1537,7 +1537,7 @@ tsens1: thermal-sensor@c265000 {
};

aoss_qmp: power-controller@c300000 {
- compatible = "qcom,sm8350-aoss-qmp";
+ compatible = "qcom,sm8350-aoss-qmp", "qcom,aoss-qmp";
reg = <0 0x0c300000 0 0x400>;
interrupts-extended = <&ipcc IPCC_CLIENT_AOP IPCC_MPROC_SIGNAL_GLINK_QMP
IRQ_TYPE_EDGE_RISING>;
@@ -1752,9 +1752,9 @@ intc: interrupt-controller@17a00000 {

timer@17c20000 {
compatible = "arm,armv7-timer-mem";
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x20000000>;
reg = <0x0 0x17c20000 0x0 0x1000>;
clock-frequency = <19200000>;

@@ -1762,49 +1762,49 @@ frame@17c21000 {
frame-number = <0>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c21000 0x0 0x1000>,
- <0x0 0x17c22000 0x0 0x1000>;
+ reg = <0x17c21000 0x1000>,
+ <0x17c22000 0x1000>;
};

frame@17c23000 {
frame-number = <1>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c23000 0x0 0x1000>;
+ reg = <0x17c23000 0x1000>;
status = "disabled";
};

frame@17c25000 {
frame-number = <2>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c25000 0x0 0x1000>;
+ reg = <0x17c25000 0x1000>;
status = "disabled";
};

frame@17c27000 {
frame-number = <3>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c27000 0x0 0x1000>;
+ reg = <0x17c27000 0x1000>;
status = "disabled";
};

frame@17c29000 {
frame-number = <4>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c29000 0x0 0x1000>;
+ reg = <0x17c29000 0x1000>;
status = "disabled";
};

frame@17c2b000 {
frame-number = <5>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c2b000 0x0 0x1000>;
+ reg = <0x17c2b000 0x1000>;
status = "disabled";
};

frame@17c2d000 {
frame-number = <6>;
interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17c2d000 0x0 0x1000>;
+ reg = <0x17c2d000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/qcom/sm8450.dtsi b/arch/arm64/boot/dts/qcom/sm8450.dtsi
index 7a14eb89e4ca..e5fe694b7be6 100644
--- a/arch/arm64/boot/dts/qcom/sm8450.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8450.dtsi
@@ -1203,9 +1203,9 @@ intc: interrupt-controller@17100000 {

timer@17420000 {
compatible = "arm,armv7-timer-mem";
- #address-cells = <2>;
- #size-cells = <2>;
- ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges = <0 0 0 0x20000000>;
reg = <0x0 0x17420000 0x0 0x1000>;
clock-frequency = <19200000>;

@@ -1213,49 +1213,49 @@ frame@17421000 {
frame-number = <0>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 6 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17421000 0x0 0x1000>,
- <0x0 0x17422000 0x0 0x1000>;
+ reg = <0x17421000 0x1000>,
+ <0x17422000 0x1000>;
};

frame@17423000 {
frame-number = <1>;
interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17423000 0x0 0x1000>;
+ reg = <0x17423000 0x1000>;
status = "disabled";
};

frame@17425000 {
frame-number = <2>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17425000 0x0 0x1000>;
+ reg = <0x17425000 0x1000>;
status = "disabled";
};

frame@17427000 {
frame-number = <3>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17427000 0x0 0x1000>;
+ reg = <0x17427000 0x1000>;
status = "disabled";
};

frame@17429000 {
frame-number = <4>;
interrupts = <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x17429000 0x0 0x1000>;
+ reg = <0x17429000 0x1000>;
status = "disabled";
};

frame@1742b000 {
frame-number = <5>;
interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x1742b000 0x0 0x1000>;
+ reg = <0x1742b000 0x1000>;
status = "disabled";
};

frame@1742d000 {
frame-number = <6>;
interrupts = <GIC_SPI 14 IRQ_TYPE_LEVEL_HIGH>;
- reg = <0x0 0x1742d000 0x0 0x1000>;
+ reg = <0x1742d000 0x1000>;
status = "disabled";
};
};
diff --git a/arch/arm64/boot/dts/renesas/beacon-renesom-baseboard.dtsi b/arch/arm64/boot/dts/renesas/beacon-renesom-baseboard.dtsi
index 5ad6cd1864c1..c289d2bb5122 100644
--- a/arch/arm64/boot/dts/renesas/beacon-renesom-baseboard.dtsi
+++ b/arch/arm64/boot/dts/renesas/beacon-renesom-baseboard.dtsi
@@ -146,7 +146,7 @@ rgb_panel: endpoint {
};
};

- reg_audio: regulator_audio {
+ reg_audio: regulator-audio {
compatible = "regulator-fixed";
regulator-name = "audio-1.8V";
regulator-min-microvolt = <1800000>;
@@ -174,7 +174,7 @@ reg_lcd_reset: regulator-lcd-reset {
vin-supply = <&reg_lcd>;
};

- reg_cam0: regulator_camera {
+ reg_cam0: regulator-cam0 {
compatible = "regulator-fixed";
regulator-name = "reg_cam0";
regulator-min-microvolt = <1800000>;
@@ -183,7 +183,7 @@ reg_cam0: regulator_camera {
enable-active-high;
};

- reg_cam1: regulator_camera {
+ reg_cam1: regulator-cam1 {
compatible = "regulator-fixed";
regulator-name = "reg_cam1";
regulator-min-microvolt = <1800000>;
diff --git a/arch/arm64/boot/dts/renesas/r8a774c0.dtsi b/arch/arm64/boot/dts/renesas/r8a774c0.dtsi
index e123c8d1bab9..337efbeda5a9 100644
--- a/arch/arm64/boot/dts/renesas/r8a774c0.dtsi
+++ b/arch/arm64/boot/dts/renesas/r8a774c0.dtsi
@@ -1956,7 +1956,7 @@ thermal-zones {
cpu-thermal {
polling-delay-passive = <250>;
polling-delay = <0>;
- thermal-sensors = <&thermal 0>;
+ thermal-sensors = <&thermal>;
sustainable-power = <717>;

cooling-maps {
diff --git a/arch/arm64/boot/dts/renesas/r8a77990.dtsi b/arch/arm64/boot/dts/renesas/r8a77990.dtsi
index 7e0f1aab2135..155589f67e1b 100644
--- a/arch/arm64/boot/dts/renesas/r8a77990.dtsi
+++ b/arch/arm64/boot/dts/renesas/r8a77990.dtsi
@@ -2117,7 +2117,7 @@ thermal-zones {
cpu-thermal {
polling-delay-passive = <250>;
polling-delay = <0>;
- thermal-sensors = <&thermal 0>;
+ thermal-sensors = <&thermal>;
sustainable-power = <717>;

cooling-maps {
diff --git a/arch/arm64/boot/dts/renesas/r8a779m8.dtsi b/arch/arm64/boot/dts/renesas/r8a779m8.dtsi
index 752440b0c40f..750bd8ccdb7f 100644
--- a/arch/arm64/boot/dts/renesas/r8a779m8.dtsi
+++ b/arch/arm64/boot/dts/renesas/r8a779m8.dtsi
@@ -10,3 +10,8 @@
/ {
compatible = "renesas,r8a779m8", "renesas,r8a7795";
};
+
+&cluster0_opp {
+ /delete-node/ opp-1600000000;
+ /delete-node/ opp-1700000000;
+};
diff --git a/arch/arm64/boot/dts/renesas/r9a07g054l2-smarc.dts b/arch/arm64/boot/dts/renesas/r9a07g054l2-smarc.dts
index fc334b4c2aa4..d2848cbfc6c0 100644
--- a/arch/arm64/boot/dts/renesas/r9a07g054l2-smarc.dts
+++ b/arch/arm64/boot/dts/renesas/r9a07g054l2-smarc.dts
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
/*
- * Device Tree Source for the RZ/G2L SMARC EVK board
+ * Device Tree Source for the RZ/V2L SMARC EVK board
*
* Copyright (C) 2021 Renesas Electronics Corp.
*/
diff --git a/arch/arm64/boot/dts/socionext/uniphier-pxs3.dtsi b/arch/arm64/boot/dts/socionext/uniphier-pxs3.dtsi
index be97da132258..ba75adedbf79 100644
--- a/arch/arm64/boot/dts/socionext/uniphier-pxs3.dtsi
+++ b/arch/arm64/boot/dts/socionext/uniphier-pxs3.dtsi
@@ -599,8 +599,8 @@ usb0: usb@65a00000 {
compatible = "socionext,uniphier-dwc3", "snps,dwc3";
status = "disabled";
reg = <0x65a00000 0xcd00>;
- interrupt-names = "host", "peripheral";
- interrupts = <0 134 4>, <0 135 4>;
+ interrupt-names = "dwc_usb3";
+ interrupts = <0 134 4>;
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_usb0>, <&pinctrl_usb2>;
clock-names = "ref", "bus_early", "suspend";
@@ -701,8 +701,8 @@ usb1: usb@65c00000 {
compatible = "socionext,uniphier-dwc3", "snps,dwc3";
status = "disabled";
reg = <0x65c00000 0xcd00>;
- interrupt-names = "host", "peripheral";
- interrupts = <0 137 4>, <0 138 4>;
+ interrupt-names = "dwc_usb3";
+ interrupts = <0 137 4>;
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_usb1>, <&pinctrl_usb3>;
clock-names = "ref", "bus_early", "suspend";
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2a965aa0188d..bdca34283c06 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -59,6 +59,7 @@ config CRYPTO_GHASH_ARM64_CE
select CRYPTO_HASH
select CRYPTO_GF128MUL
select CRYPTO_LIB_AES
+ select CRYPTO_AEAD

config CRYPTO_CRCT10DIF_ARM64_CE
tristate "CRCT10DIF digest algorithm using PMULL instructions"
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d52a0b269ee8..65c2201b11b2 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -133,7 +133,8 @@
#define ESR_ELx_CV (UL(1) << 24)
#define ESR_ELx_COND_SHIFT (20)
#define ESR_ELx_COND_MASK (UL(0xF) << ESR_ELx_COND_SHIFT)
-#define ESR_ELx_WFx_ISS_TI (UL(1) << 0)
+#define ESR_ELx_WFx_ISS_TI (UL(3) << 0)
+#define ESR_ELx_WFx_ISS_WFxT (UL(2) << 0)
#define ESR_ELx_WFx_ISS_WFI (UL(0) << 0)
#define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
#define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)
@@ -146,7 +147,8 @@
#define DISR_EL1_ESR_MASK (ESR_ELx_AET | ESR_ELx_EA | ESR_ELx_FSC)

/* ESR value templates for specific events */
-#define ESR_ELx_WFx_MASK (ESR_ELx_EC_MASK | ESR_ELx_WFx_ISS_TI)
+#define ESR_ELx_WFx_MASK (ESR_ELx_EC_MASK | \
+ (ESR_ELx_WFx_ISS_TI & ~ESR_ELx_WFx_ISS_WFxT))
#define ESR_ELx_WFx_WFI_VAL ((ESR_ELx_EC_WFx << ESR_ELx_EC_SHIFT) | \
ESR_ELx_WFx_ISS_WFI)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7..78d272b26ebd 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -115,7 +115,9 @@ extern const struct kexec_file_ops kexec_image_ops;

struct kimage;

-extern int arch_kimage_file_post_load_cleanup(struct kimage *image);
+int arch_kimage_file_post_load_cleanup(struct kimage *image);
+#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
+
extern int load_other_segments(struct kimage *image,
unsigned long kernel_load_addr, unsigned long kernel_size,
char *initrd, unsigned long initrd_len,
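
The added "#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup"
line follows the usual override idiom that replaced the older __weak hooks:
the architecture defines a macro with the same name as the function, and the
generic kexec code compiles its fallback only when that macro is absent. A
minimal standalone illustration of the idiom (kernel types reduced to an
opaque struct, names reused purely for the example):

#include <stdio.h>

struct kimage;	/* opaque here; the real type lives in <linux/kexec.h> */

/* "arch" header: declare the override and define the same-named macro. */
int arch_kimage_file_post_load_cleanup(struct kimage *image);
#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup

/* "generic" side: the default is only compiled when no override exists. */
#ifndef arch_kimage_file_post_load_cleanup
static int arch_kimage_file_post_load_cleanup(struct kimage *image)
{
	(void)image;
	return 0;
}
#endif

/* arch-specific implementation wins because the macro is defined above. */
int arch_kimage_file_post_load_cleanup(struct kimage *image)
{
	(void)image;
	printf("arch-specific cleanup ran\n");
	return 0;
}

int main(void)
{
	return arch_kimage_file_post_load_cleanup(NULL);
}
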
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 6b1a12c23fe7..7058bf047fa5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -250,8 +250,9 @@ void tls_preserve_current_state(void);

static inline void start_thread_common(struct pt_regs *regs, unsigned long pc)
{
+ s32 previous_syscall = regs->syscallno;
memset(regs, 0, sizeof(*regs));
- forget_syscall(regs);
+ regs->syscallno = previous_syscall;
regs->pc = pc;

if (system_uses_irq_prio_masking())
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 6875a16b09d2..fb0e7c7b2e20 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -59,6 +59,7 @@ struct insn_emulation {
static LIST_HEAD(insn_emulation);
static int nr_insn_emulated __initdata;
static DEFINE_RAW_SPINLOCK(insn_emulation_lock);
+static DEFINE_MUTEX(insn_emulation_mutex);

static void register_emulation_hooks(struct insn_emulation_ops *ops)
{
@@ -207,10 +208,10 @@ static int emulation_proc_handler(struct ctl_table *table, int write,
loff_t *ppos)
{
int ret = 0;
- struct insn_emulation *insn = (struct insn_emulation *) table->data;
+ struct insn_emulation *insn = container_of(table->data, struct insn_emulation, current_mode);
enum insn_emulation_mode prev_mode = insn->current_mode;

- table->data = &insn->current_mode;
+ mutex_lock(&insn_emulation_mutex);
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);

if (ret || !write || prev_mode == insn->current_mode)
@@ -223,7 +224,7 @@ static int emulation_proc_handler(struct ctl_table *table, int write,
update_insn_emulation_mode(insn, INSN_UNDEF);
}
ret:
- table->data = insn;
+ mutex_unlock(&insn_emulation_mutex);
return ret;
}

@@ -247,7 +248,7 @@ static void __init register_insn_emulation_sysctl(void)
sysctl->maxlen = sizeof(int);

sysctl->procname = insn->ops->name;
- sysctl->data = insn;
+ sysctl->data = &insn->current_mode;
sysctl->extra1 = &insn->min;
sysctl->extra2 = &insn->max;
sysctl->proc_handler = emulation_proc_handler;
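
The armv8_deprecated.c rework above stops the handler from temporarily
swapping table->data around the proc_dointvec_minmax() call, which could race
with a concurrent writer and hand the generic code a struct pointer where an
int was expected. Instead, ->data now points permanently at ->current_mode,
the owning struct is recovered with container_of(), and a mutex serializes
mode changes. A small userspace sketch of the container_of() recovery, with
hypothetical stand-ins for the proc machinery:

#include <stddef.h>
#include <stdio.h>
#include <pthread.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct insn_emulation {
	const char *name;
	int current_mode;
};

static pthread_mutex_t insn_emulation_mutex = PTHREAD_MUTEX_INITIALIZER;

/* table_data plays the role of table->data, i.e. &insn->current_mode. */
static void emulation_proc_handler_sketch(void *table_data, int new_mode)
{
	struct insn_emulation *insn =
		container_of((int *)table_data, struct insn_emulation, current_mode);

	pthread_mutex_lock(&insn_emulation_mutex);
	insn->current_mode = new_mode;	/* stands in for proc_dointvec_minmax() */
	pthread_mutex_unlock(&insn_emulation_mutex);
}

int main(void)
{
	struct insn_emulation setend = { .name = "setend", .current_mode = 0 };

	emulation_proc_handler_sketch(&setend.current_mode, 1);
	printf("%s mode = %d\n", setend.name, setend.current_mode);
	return 0;
}
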
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index a0f3d0aaa3c5..27baa41c46ca 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -395,6 +395,14 @@ static struct midr_range trbe_write_out_of_range_cpus[] = {
};
#endif /* CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE */

+#ifdef CONFIG_ARM64_ERRATUM_1742098
+static struct midr_range broken_aarch32_aes[] = {
+ MIDR_RANGE(MIDR_CORTEX_A57, 0, 1, 0xf, 0xf),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
+ {},
+};
+#endif /* CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE */
+
const struct arm64_cpu_capabilities arm64_errata[] = {
#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
{
@@ -657,6 +665,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
/* Cortex-A510 r0p0 - r0p1 */
ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A510, 0, 0, 1)
},
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_1742098
+ {
+ .desc = "ARM erratum 1742098",
+ .capability = ARM64_WORKAROUND_1742098,
+ CAP_MIDR_RANGE_LIST(broken_aarch32_aes),
+ .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
+ },
#endif
{
}
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 2cb9cc9e0eff..a54f34aa36e0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -79,6 +79,7 @@
#include <asm/cpufeature.h>
#include <asm/cpu_ops.h>
#include <asm/fpsimd.h>
+#include <asm/hwcap.h>
#include <asm/insn.h>
#include <asm/kvm_host.h>
#include <asm/mmu_context.h>
@@ -540,7 +541,7 @@ static const struct arm64_ftr_bits ftr_id_pfr2[] = {

static const struct arm64_ftr_bits ftr_id_dfr0[] = {
/* [31:28] TraceFilt */
- S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_PERFMON_SHIFT, 4, 0xf),
+ S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_DFR0_PERFMON_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_MPROFDBG_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_MMAPTRC_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_COPTRC_SHIFT, 4, 0),
@@ -1921,6 +1922,14 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
}
#endif /* CONFIG_ARM64_MTE */

+static void elf_hwcap_fixup(void)
+{
+#ifdef CONFIG_ARM64_ERRATUM_1742098
+ if (cpus_have_const_cap(ARM64_WORKAROUND_1742098))
+ compat_elf_hwcap2 &= ~COMPAT_HWCAP2_AES;
+#endif /* ARM64_ERRATUM_1742098 */
+}
+
#ifdef CONFIG_KVM
static bool is_kvm_protected_mode(const struct arm64_cpu_capabilities *entry, int __unused)
{
@@ -3033,8 +3042,10 @@ void __init setup_cpu_features(void)
setup_system_capabilities();
setup_elf_hwcaps(arm64_elf_hwcaps);

- if (system_supports_32bit_el0())
+ if (system_supports_32bit_el0()) {
setup_elf_hwcaps(compat_elf_hwcaps);
+ elf_hwcap_fixup();
+ }

if (system_uses_ttbr0_pan())
pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
@@ -3086,6 +3097,7 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
cpu_active_mask);
get_cpu_device(lucky_winner)->offline_disabled = true;
setup_elf_hwcaps(compat_elf_hwcaps);
+ elf_hwcap_fixup();
pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
cpu, lucky_winner);
return 0;
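
The new elf_hwcap_fixup() above clears the AArch32 AES hwcap when erratum
1742098 applies, so 32-bit userspace stops selecting the hardware AES
instructions on affected cores. From a compat task the effect is visible in
the auxiliary vector; a small check, with the HWCAP2 bit value assumed from
the AArch32 uapi hwcap header:

/* Build as a 32-bit Arm binary to see the compat view of AT_HWCAP2. */
#include <stdio.h>
#include <sys/auxv.h>

#ifndef HWCAP2_AES
#define HWCAP2_AES	(1 << 0)	/* assumed AArch32 HWCAP2 bit */
#endif

int main(void)
{
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	printf("AES via HWCAP2: %s\n",
	       (hwcap2 & HWCAP2_AES) ? "advertised" : "hidden");
	return 0;
}
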
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 6328308be272..7754ef328657 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -300,11 +300,6 @@ static void swsusp_mte_restore_tags(void)
unsigned long pfn = xa_state.xa_index;
struct page *page = pfn_to_online_page(pfn);

- /*
- * It is not required to invoke page_kasan_tag_reset(page)
- * at this point since the tags stored in page->flags are
- * already restored.
- */
mte_restore_page_tags(page_address(page), tags);

mte_free_tag_storage(tags);
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index d502703e8373..d565ae25e48f 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -47,15 +47,6 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
if (!pte_is_tagged)
return;

- page_kasan_tag_reset(page);
- /*
- * We need smp_wmb() in between setting the flags and clearing the
- * tags because if another thread reads page->flags and builds a
- * tagged address out of it, there is an actual dependency to the
- * memory access, but on the current thread we do not guarantee that
- * the new page->flags are visible before the tags were updated.
- */
- smp_wmb();
mte_clear_page_tags(page_address(page));
}

diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6410d21d8695..858e5be48791 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -371,5 +371,5 @@ void __noreturn hyp_panic(void)

asmlinkage void kvm_unexpected_el2_exception(void)
{
- return __kvm_unexpected_el2_exception();
+ __kvm_unexpected_el2_exception();
}
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 262dfe03134d..51ea9ac71210 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -240,5 +240,5 @@ void __noreturn hyp_panic(void)

asmlinkage void kvm_unexpected_el2_exception(void)
{
- return __kvm_unexpected_el2_exception();
+ __kvm_unexpected_el2_exception();
}
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 0dea80bf6de4..24913271e898 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -23,15 +23,6 @@ void copy_highpage(struct page *to, struct page *from)

if (system_supports_mte() && test_bit(PG_mte_tagged, &from->flags)) {
set_bit(PG_mte_tagged, &to->flags);
- page_kasan_tag_reset(to);
- /*
- * We need smp_wmb() in between setting the flags and clearing the
- * tags because if another thread reads page->flags and builds a
- * tagged address out of it, there is an actual dependency to the
- * memory access, but on the current thread we do not guarantee that
- * the new page->flags are visible before the tags were updated.
- */
- smp_wmb();
mte_copy_page_tags(kto, kfrom);
}
}
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index a9e50e930484..4334dec93bd4 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -53,15 +53,6 @@ bool mte_restore_tags(swp_entry_t entry, struct page *page)
if (!tags)
return false;

- page_kasan_tag_reset(page);
- /*
- * We need smp_wmb() in between setting the flags and clearing the
- * tags because if another thread reads page->flags and builds a
- * tagged address out of it, there is an actual dependency to the
- * memory access, but on the current thread we do not guarantee that
- * the new page->flags are visible before the tags were updated.
- */
- smp_wmb();
mte_restore_page_tags(page_address(page), tags);

return true;
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 3ed418f70e3b..8cd6088f8875 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -58,6 +58,7 @@ WORKAROUND_1418040
WORKAROUND_1463225
WORKAROUND_1508412
WORKAROUND_1542419
+WORKAROUND_1742098
WORKAROUND_1902691
WORKAROUND_2038923
WORKAROUND_2064142
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 7cbce290f4e5..757c2f6d8d4b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
{
unsigned int reg = vector / 64;
unsigned int bit = vector % 64;
- u64 irr;
+ unsigned long irr;

switch (reg) {
case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
diff --git a/arch/ia64/kernel/palinfo.c b/arch/ia64/kernel/palinfo.c
index 64189f04c1a4..b9ae093bfe37 100644
--- a/arch/ia64/kernel/palinfo.c
+++ b/arch/ia64/kernel/palinfo.c
@@ -120,7 +120,7 @@ static const char *mem_attrib[]={
* Input:
* - a pointer to a buffer to hold the string
* - a 64-bit vector
- * Ouput:
+ * Output:
* - a pointer to the end of the buffer
*
*/
diff --git a/arch/ia64/kernel/traps.c b/arch/ia64/kernel/traps.c
index 753642366e12..53735b1d1be3 100644
--- a/arch/ia64/kernel/traps.c
+++ b/arch/ia64/kernel/traps.c
@@ -309,7 +309,7 @@ handle_fpu_swa (int fp_fault, struct pt_regs *regs, unsigned long isr)
/*
* Lower 4 bits are used as a count. Upper bits are a sequence
* number that is updated when count is reset. The cmpxchg will
- * fail is seqno has changed. This minimizes mutiple cpus
+ * fail is seqno has changed. This minimizes multiple cpus
* resetting the count.
*/
if (current_jiffies > last.time)
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 5d165607bf35..7ae1244ed8ec 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -451,7 +451,7 @@ mem_init (void)
memblock_free_all();

/*
- * For fsyscall entrpoints with no light-weight handler, use the ordinary
+ * For fsyscall entrypoints with no light-weight handler, use the ordinary
* (heavy-weight) handler, but mark it by setting bit 0, so the fsyscall entry
* code can tell them apart.
*/
diff --git a/arch/ia64/mm/tlb.c b/arch/ia64/mm/tlb.c
index 135b5135cace..ca060e7a2a46 100644
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -174,7 +174,7 @@ __setup("nptcg=", set_nptcg);
* override table (in which case we should ignore the value from
* PAL_VM_SUMMARY).
*
- * Kernel parameter "nptcg=" overrides maximum number of simultanesous ptc.g
+ * Kernel parameter "nptcg=" overrides maximum number of simultaneous ptc.g
* purges defined in either PAL_VM_SUMMARY or PAL override table. In this case,
* we should ignore the value from either PAL_VM_SUMMARY or PAL override table.
*
@@ -516,7 +516,7 @@ int ia64_itr_entry(u64 target_mask, u64 va, u64 pte, u64 log_size)
if (i >= per_cpu(ia64_tr_num, cpu))
return -EBUSY;

- /*Record tr info for mca hander use!*/
+ /*Record tr info for mca handler use!*/
if (i > per_cpu(ia64_tr_used, cpu))
per_cpu(ia64_tr_used, cpu) = i;

diff --git a/arch/mips/kernel/proc.c b/arch/mips/kernel/proc.c
index bb43bf850314..8eba5a1ed664 100644
--- a/arch/mips/kernel/proc.c
+++ b/arch/mips/kernel/proc.c
@@ -311,7 +311,7 @@ static void *c_start(struct seq_file *m, loff_t *pos)
{
unsigned long i = *pos;

- return i < NR_CPUS ? (void *) (i + 1) : NULL;
+ return i < nr_cpu_ids ? (void *) (i + 1) : NULL;
}

static void *c_next(struct seq_file *m, void *v, loff_t *pos)
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 3d0cf471f2fe..b2cc2c2dd4bf 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -159,7 +159,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
/* Map GIC user page. */
if (gic_size) {
gic_base = (unsigned long)mips_gic_base + MIPS_GIC_USER_OFS;
- gic_pfn = virt_to_phys((void *)gic_base) >> PAGE_SHIFT;
+ gic_pfn = PFN_DOWN(__pa(gic_base));

ret = io_remap_pfn_range(vma, base, gic_pfn, gic_size,
pgprot_noncached(vma->vm_page_prot));
diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c
index 69a533148efd..8f61e93c0c5b 100644
--- a/arch/mips/loongson64/numa.c
+++ b/arch/mips/loongson64/numa.c
@@ -196,7 +196,6 @@ void __init prom_init_numa_memory(void)
pr_info("CP0_PageGrain: CP0 5.1 (0x%x)\n", read_c0_pagegrain());
prom_meminit();
}
-EXPORT_SYMBOL(prom_init_numa_memory);

pg_data_t * __init arch_alloc_nodedata(int nid)
{
diff --git a/arch/mips/mm/physaddr.c b/arch/mips/mm/physaddr.c
index a1ced5e44951..f9b8c85e9843 100644
--- a/arch/mips/mm/physaddr.c
+++ b/arch/mips/mm/physaddr.c
@@ -5,6 +5,7 @@
#include <linux/mmdebug.h>
#include <linux/mm.h>

+#include <asm/addrspace.h>
#include <asm/sections.h>
#include <asm/io.h>
#include <asm/page.h>
@@ -12,15 +13,6 @@

static inline bool __debug_virt_addr_valid(unsigned long x)
{
- /* high_memory does not get immediately defined, and there
- * are early callers of __pa() against PAGE_OFFSET
- */
- if (!high_memory && x >= PAGE_OFFSET)
- return true;
-
- if (high_memory && x >= PAGE_OFFSET && x < (unsigned long)high_memory)
- return true;
-
/*
* MAX_DMA_ADDRESS is a virtual address that may not correspond to an
* actual physical address. Enough code relies on
@@ -30,7 +22,9 @@ static inline bool __debug_virt_addr_valid(unsigned long x)
if (x == MAX_DMA_ADDRESS)
return true;

- return false;
+ return x >= PAGE_OFFSET && (KSEGX(x) < KSEG2 ||
+ IS_ENABLED(CONFIG_EVA) ||
+ !IS_ENABLED(CONFIG_HIGHMEM));
}

phys_addr_t __virt_to_phys(volatile const void *x)
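
The rewritten __debug_virt_addr_valid() above drops the high_memory special
cases in favour of a segment check: addresses below PAGE_OFFSET never pass,
KSEG0/KSEG1 addresses always do, and anything in KSEG2 or above passes only
with EVA enabled or HIGHMEM disabled. A standalone sketch of that predicate
with typical MIPS32 segment constants (values assumed, not taken from this
tree; the MAX_DMA_ADDRESS special case kept by the kernel is omitted):

#include <stdbool.h>
#include <stdio.h>

#define PAGE_OFFSET	0x80000000UL	/* typical MIPS32 KSEG0 base */
#define KSEG2		0xc0000000UL
#define KSEGX(a)	((a) & 0xe0000000UL)

static bool debug_virt_addr_valid(unsigned long x, bool eva, bool highmem)
{
	/* Linear KSEG0/KSEG1 always valid; higher segments only without
	 * HIGHMEM or with EVA. */
	return x >= PAGE_OFFSET && (KSEGX(x) < KSEG2 || eva || !highmem);
}

int main(void)
{
	printf("user addr:          %d\n",
	       debug_virt_addr_valid(0x00400000UL, false, true));
	printf("KSEG0 addr:         %d\n",
	       debug_virt_addr_valid(0x81000000UL, false, true));
	printf("KSEG2 addr, HIGHMEM:%d\n",
	       debug_virt_addr_valid(0xc1000000UL, false, true));
	return 0;
}
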
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index a20c1c47b780..5930ad5a3ad3 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -50,9 +50,6 @@ void flush_instruction_cache_local(void); /* flushes local code-cache only */
*/
DEFINE_SPINLOCK(pa_tlb_flush_lock);

-/* Swapper page setup lock. */
-DEFINE_SPINLOCK(pa_swapper_pg_lock);
-
#if defined(CONFIG_64BIT) && defined(CONFIG_SMP)
int pa_serialize_tlb_flushes __ro_after_init;
#endif
diff --git a/arch/parisc/kernel/drivers.c b/arch/parisc/kernel/drivers.c
index 776d624a7207..d126e78e101a 100644
--- a/arch/parisc/kernel/drivers.c
+++ b/arch/parisc/kernel/drivers.c
@@ -520,7 +520,6 @@ alloc_pa_dev(unsigned long hpa, struct hardware_path *mod_path)
dev->id.hversion_rev = iodc_data[1] & 0x0f;
dev->id.sversion = ((iodc_data[4] & 0x0f) << 16) |
(iodc_data[5] << 8) | iodc_data[6];
- dev->hpa.name = parisc_pathname(dev);
dev->hpa.start = hpa;
/* This is awkward. The STI spec says that gfx devices may occupy
* 32MB or 64MB. Unfortunately, we don't know how to tell whether
@@ -534,10 +533,10 @@ alloc_pa_dev(unsigned long hpa, struct hardware_path *mod_path)
dev->hpa.end = hpa + 0xfff;
}
dev->hpa.flags = IORESOURCE_MEM;
- name = parisc_hardware_description(&dev->id);
- if (name) {
- strlcpy(dev->name, name, sizeof(dev->name));
- }
+ dev->hpa.name = dev->name;
+ name = parisc_hardware_description(&dev->id) ? : "unknown";
+ snprintf(dev->name, sizeof(dev->name), "%s [%s]",
+ name, parisc_pathname(dev));

/* Silently fail things like mouse ports which are subsumed within
* the keyboard controller
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 68b46fe2f17c..8a99c998da9b 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -413,7 +413,7 @@
412 32 utimensat_time64 sys_utimensat sys_utimensat
413 32 pselect6_time64 sys_pselect6 compat_sys_pselect6_time64
414 32 ppoll_time64 sys_ppoll compat_sys_ppoll_time64
-416 32 io_pgetevents_time64 sys_io_pgetevents sys_io_pgetevents
+416 32 io_pgetevents_time64 sys_io_pgetevents compat_sys_io_pgetevents_time64
417 32 recvmmsg_time64 sys_recvmmsg compat_sys_recvmmsg_time64
418 32 mq_timedsend_time64 sys_mq_timedsend sys_mq_timedsend
419 32 mq_timedreceive_time64 sys_mq_timedreceive sys_mq_timedreceive
diff --git a/arch/powerpc/boot/cuboot-hotfoot.c b/arch/powerpc/boot/cuboot-hotfoot.c
index 888a6b9bfead..0e5532f855d6 100644
--- a/arch/powerpc/boot/cuboot-hotfoot.c
+++ b/arch/powerpc/boot/cuboot-hotfoot.c
@@ -70,7 +70,7 @@ static void hotfoot_fixups(void)

printf("Fixing devtree for 4M Flash\n");

- /* First fix up the base addresse */
+ /* First fix up the base address */
getprop(devp, "reg", regs, sizeof(regs));
regs[0] = 0;
regs[1] = 0xffc00000;
diff --git a/arch/powerpc/configs/44x/akebono_defconfig b/arch/powerpc/configs/44x/akebono_defconfig
index 4bc549c6edc5..fde4824f235e 100644
--- a/arch/powerpc/configs/44x/akebono_defconfig
+++ b/arch/powerpc/configs/44x/akebono_defconfig
@@ -118,7 +118,7 @@ CONFIG_CRAMFS=y
CONFIG_NLS_DEFAULT="n"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_XMON=y
diff --git a/arch/powerpc/configs/44x/currituck_defconfig b/arch/powerpc/configs/44x/currituck_defconfig
index 717827219921..7283b7d4a1a5 100644
--- a/arch/powerpc/configs/44x/currituck_defconfig
+++ b/arch/powerpc/configs/44x/currituck_defconfig
@@ -73,7 +73,7 @@ CONFIG_NFS_FS=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NLS_DEFAULT="n"
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_XMON=y
diff --git a/arch/powerpc/configs/44x/fsp2_defconfig b/arch/powerpc/configs/44x/fsp2_defconfig
index 8da316e61a08..3fdfbb29b854 100644
--- a/arch/powerpc/configs/44x/fsp2_defconfig
+++ b/arch/powerpc/configs/44x/fsp2_defconfig
@@ -110,7 +110,7 @@ CONFIG_XZ_DEC=y
CONFIG_PRINTK_TIME=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=3
CONFIG_DYNAMIC_DEBUG=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_CRYPTO_CBC=y
diff --git a/arch/powerpc/configs/44x/iss476-smp_defconfig b/arch/powerpc/configs/44x/iss476-smp_defconfig
index c11e777b2f3d..0f6380e1e612 100644
--- a/arch/powerpc/configs/44x/iss476-smp_defconfig
+++ b/arch/powerpc/configs/44x/iss476-smp_defconfig
@@ -56,7 +56,7 @@ CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CRAMFS=y
# CONFIG_NETWORK_FILESYSTEMS is not set
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_PPC_EARLY_DEBUG=y
diff --git a/arch/powerpc/configs/44x/warp_defconfig b/arch/powerpc/configs/44x/warp_defconfig
index 47252c2d7669..20891c413149 100644
--- a/arch/powerpc/configs/44x/warp_defconfig
+++ b/arch/powerpc/configs/44x/warp_defconfig
@@ -88,7 +88,7 @@ CONFIG_NLS_UTF8=y
CONFIG_CRC_CCITT=y
CONFIG_CRC_T10DIF=y
CONFIG_PRINTK_TIME=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_FS=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/powerpc/configs/52xx/lite5200b_defconfig b/arch/powerpc/configs/52xx/lite5200b_defconfig
index 63368e677506..7db479dcbc0c 100644
--- a/arch/powerpc/configs/52xx/lite5200b_defconfig
+++ b/arch/powerpc/configs/52xx/lite5200b_defconfig
@@ -58,6 +58,6 @@ CONFIG_NFS_FS=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
CONFIG_PRINTK_TIME=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_DEBUG_BUGVERBOSE is not set
diff --git a/arch/powerpc/configs/52xx/motionpro_defconfig b/arch/powerpc/configs/52xx/motionpro_defconfig
index 72762da94846..6186ead1e105 100644
--- a/arch/powerpc/configs/52xx/motionpro_defconfig
+++ b/arch/powerpc/configs/52xx/motionpro_defconfig
@@ -84,7 +84,7 @@ CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_PRINTK_TIME=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_CRYPTO_ECB=y
diff --git a/arch/powerpc/configs/52xx/tqm5200_defconfig b/arch/powerpc/configs/52xx/tqm5200_defconfig
index a3c8ca74032c..e6735b945327 100644
--- a/arch/powerpc/configs/52xx/tqm5200_defconfig
+++ b/arch/powerpc/configs/52xx/tqm5200_defconfig
@@ -85,7 +85,7 @@ CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_PRINTK_TIME=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_CRYPTO_ECB=y
diff --git a/arch/powerpc/configs/adder875_defconfig b/arch/powerpc/configs/adder875_defconfig
index 5326bc739279..7f35d5bc1229 100644
--- a/arch/powerpc/configs/adder875_defconfig
+++ b/arch/powerpc/configs/adder875_defconfig
@@ -45,7 +45,7 @@ CONFIG_CRAMFS=y
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_CRC32_SLICEBY4=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_FS=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/powerpc/configs/ep8248e_defconfig b/arch/powerpc/configs/ep8248e_defconfig
index 00d69965f898..8df6d3a293e3 100644
--- a/arch/powerpc/configs/ep8248e_defconfig
+++ b/arch/powerpc/configs/ep8248e_defconfig
@@ -59,7 +59,7 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_SCHED_DEBUG is not set
CONFIG_BDI_SWITCH=y
diff --git a/arch/powerpc/configs/ep88xc_defconfig b/arch/powerpc/configs/ep88xc_defconfig
index f5c3e72da719..a98ef6a4abef 100644
--- a/arch/powerpc/configs/ep88xc_defconfig
+++ b/arch/powerpc/configs/ep88xc_defconfig
@@ -48,6 +48,6 @@ CONFIG_CRAMFS=y
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_CRC32_SLICEBY4=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config
index df37efed0aec..f14c6dbd7346 100644
--- a/arch/powerpc/configs/fsl-emb-nonhw.config
+++ b/arch/powerpc/configs/fsl-emb-nonhw.config
@@ -24,7 +24,7 @@ CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_DEBUG_FS=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/powerpc/configs/mgcoge_defconfig b/arch/powerpc/configs/mgcoge_defconfig
index dcc8dccf54f3..498d35db7833 100644
--- a/arch/powerpc/configs/mgcoge_defconfig
+++ b/arch/powerpc/configs/mgcoge_defconfig
@@ -73,7 +73,7 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_FS=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_SCHED_DEBUG is not set
diff --git a/arch/powerpc/configs/mpc5200_defconfig b/arch/powerpc/configs/mpc5200_defconfig
index 83d801307178..c0fe5e76604a 100644
--- a/arch/powerpc/configs/mpc5200_defconfig
+++ b/arch/powerpc/configs/mpc5200_defconfig
@@ -122,6 +122,6 @@ CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_PRINTK_TIME=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/powerpc/configs/mpc8272_ads_defconfig b/arch/powerpc/configs/mpc8272_ads_defconfig
index 00a4d2bf43b2..4145ef5689ca 100644
--- a/arch/powerpc/configs/mpc8272_ads_defconfig
+++ b/arch/powerpc/configs/mpc8272_ads_defconfig
@@ -67,7 +67,7 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_BDI_SWITCH=y
diff --git a/arch/powerpc/configs/mpc885_ads_defconfig b/arch/powerpc/configs/mpc885_ads_defconfig
index c74dc76b1d0d..700115d85d6f 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -71,7 +71,7 @@ CONFIG_ROOT_NFS=y
CONFIG_CRYPTO=y
CONFIG_CRYPTO_DEV_TALITOS=y
CONFIG_CRC32_SLICEBY4=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_VM_PGTABLE=y
diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig
index bb549cb1c3e3..6d7ae407648f 100644
--- a/arch/powerpc/configs/ppc6xx_defconfig
+++ b/arch/powerpc/configs/ppc6xx_defconfig
@@ -1066,7 +1066,7 @@ CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_HEADERS_INSTALL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
diff --git a/arch/powerpc/configs/pq2fads_defconfig b/arch/powerpc/configs/pq2fads_defconfig
index 9d8a76857c6f..9d63e2e65211 100644
--- a/arch/powerpc/configs/pq2fads_defconfig
+++ b/arch/powerpc/configs/pq2fads_defconfig
@@ -68,7 +68,7 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_SCHED_DEBUG is not set
diff --git a/arch/powerpc/configs/ps3_defconfig b/arch/powerpc/configs/ps3_defconfig
index 7c95fab4b920..2d9ac233da68 100644
--- a/arch/powerpc/configs/ps3_defconfig
+++ b/arch/powerpc/configs/ps3_defconfig
@@ -153,7 +153,7 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_CRC_CCITT=m
CONFIG_CRC_T10DIF=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_STACKOVERFLOW=y
diff --git a/arch/powerpc/configs/tqm8xx_defconfig b/arch/powerpc/configs/tqm8xx_defconfig
index 77857d513022..083c2e57520a 100644
--- a/arch/powerpc/configs/tqm8xx_defconfig
+++ b/arch/powerpc/configs/tqm8xx_defconfig
@@ -55,6 +55,6 @@ CONFIG_CRAMFS=y
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_CRC32_SLICEBY4=y
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/powerpc/crypto/aes-spe-glue.c b/arch/powerpc/crypto/aes-spe-glue.c
index c2b23b69d7b1..e8dfe9fb0266 100644
--- a/arch/powerpc/crypto/aes-spe-glue.c
+++ b/arch/powerpc/crypto/aes-spe-glue.c
@@ -404,7 +404,7 @@ static int ppc_xts_decrypt(struct skcipher_request *req)

/*
* Algorithm definitions. Disabling alignment (cra_alignmask=0) was chosen
- * because the e500 platform can handle unaligned reads/writes very efficently.
+ * because the e500 platform can handle unaligned reads/writes very efficiently.
* This improves IPsec thoughput by another few percent. Additionally we assume
* that AES context is always aligned to at least 8 bytes because it is created
* with kmalloc() in the crypto infrastructure
diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h
index 9a53e29680f4..258174304904 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -38,12 +38,7 @@ static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
#endif /* CONFIG_ARCH_RANDOM */

#ifdef CONFIG_PPC_POWERNV
-int powernv_hwrng_present(void);
int powernv_get_random_long(unsigned long *v);
-int powernv_get_random_real_mode(unsigned long *v);
-#else
-static inline int powernv_hwrng_present(void) { return 0; }
-static inline int powernv_get_random_real_mode(unsigned long *v) { return 0; }
#endif

#endif /* _ASM_POWERPC_ARCHRANDOM_H */
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 2aefe14e1442..1e5e9b6ec78d 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -120,6 +120,15 @@ int setup_purgatory(struct kimage *image, const void *slave_code,
#ifdef CONFIG_PPC64
struct kexec_buf;

+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long buf_len);
+#define arch_kexec_kernel_image_probe arch_kexec_kernel_image_probe
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image);
+#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
+
+int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
+#define arch_kexec_locate_mem_hole arch_kexec_locate_mem_hole
+
int load_crashdump_segments_ppc64(struct kimage *image,
struct kexec_buf *kbuf);
int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
diff --git a/arch/powerpc/include/asm/simple_spinlock.h b/arch/powerpc/include/asm/simple_spinlock.h
index 7ae6aeef8464..9dcc7e9993b9 100644
--- a/arch/powerpc/include/asm/simple_spinlock.h
+++ b/arch/powerpc/include/asm/simple_spinlock.h
@@ -48,10 +48,11 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock)
{
unsigned long tmp, token;
+ unsigned int eh = IS_ENABLED(CONFIG_PPC64);

token = LOCK_TOKEN;
__asm__ __volatile__(
-"1: lwarx %0,0,%2,1\n\
+"1: lwarx %0,0,%2,%[eh]\n\
cmpwi 0,%0,0\n\
bne- 2f\n\
stwcx. %1,0,%2\n\
@@ -59,7 +60,7 @@ static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock)
PPC_ACQUIRE_BARRIER
"2:"
: "=&r" (tmp)
- : "r" (token), "r" (&lock->slock)
+ : "r" (token), "r" (&lock->slock), [eh] "n" (eh)
: "cr0", "memory");

return tmp;
@@ -156,9 +157,10 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
static inline long __arch_read_trylock(arch_rwlock_t *rw)
{
long tmp;
+ unsigned int eh = IS_ENABLED(CONFIG_PPC64);

__asm__ __volatile__(
-"1: lwarx %0,0,%1,1\n"
+"1: lwarx %0,0,%1,%[eh]\n"
__DO_SIGN_EXTEND
" addic. %0,%0,1\n\
ble- 2f\n"
@@ -166,7 +168,7 @@ static inline long __arch_read_trylock(arch_rwlock_t *rw)
bne- 1b\n"
PPC_ACQUIRE_BARRIER
"2:" : "=&r" (tmp)
- : "r" (&rw->lock)
+ : "r" (&rw->lock), [eh] "n" (eh)
: "cr0", "xer", "memory");

return tmp;
@@ -179,17 +181,18 @@ static inline long __arch_read_trylock(arch_rwlock_t *rw)
static inline long __arch_write_trylock(arch_rwlock_t *rw)
{
long tmp, token;
+ unsigned int eh = IS_ENABLED(CONFIG_PPC64);

token = WRLOCK_TOKEN;
__asm__ __volatile__(
-"1: lwarx %0,0,%2,1\n\
+"1: lwarx %0,0,%2,%[eh]\n\
cmpwi 0,%0,0\n\
bne- 2f\n"
" stwcx. %1,0,%2\n\
bne- 1b\n"
PPC_ACQUIRE_BARRIER
"2:" : "=&r" (tmp)
- : "r" (token), "r" (&rw->lock)
+ : "r" (token), "r" (&rw->lock), [eh] "n" (eh)
: "cr0", "memory");

return tmp;
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 4ddd161aef32..63c384c3e6d4 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -20,6 +20,7 @@ CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_prom_init.o += -fno-stack-protector
CFLAGS_prom_init.o += -DDISABLE_BRANCH_PROFILING
CFLAGS_prom_init.o += -ffreestanding
+CFLAGS_prom_init.o += $(call cc-option, -ftrivial-auto-var-init=uninitialized)

ifdef CONFIG_FUNCTION_TRACER
# Do not trace early boot code
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index ae0fdef0ac11..2a271a6d6924 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2025,7 +2025,7 @@ static struct cpu_spec * __init setup_cpu_spec(unsigned long offset,
* oprofile_cpu_type already has a value, then we are
* possibly overriding a real PVR with a logical one,
* and, in that case, keep the current value for
- * oprofile_cpu_type. Futhermore, let's ensure that the
+ * oprofile_cpu_type. Furthermore, let's ensure that the
* fix for the PMAO bug is enabled on compatibility mode.
*/
if (old.oprofile_cpu_type != NULL) {
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
index 64e423d2fe0f..30d4eca88d17 100644
--- a/arch/powerpc/kernel/dawr.c
+++ b/arch/powerpc/kernel/dawr.c
@@ -27,7 +27,7 @@ int set_dawr(int nr, struct arch_hw_breakpoint *brk)
dawrx |= (brk->type & (HW_BRK_TYPE_PRIV_ALL)) >> 3;
/*
* DAWR length is stored in field MDR bits 48:53. Matches range in
- * doublewords (64 bits) baised by -1 eg. 0b000000=1DW and
+ * doublewords (64 bits) biased by -1 eg. 0b000000=1DW and
* 0b111111=64DW.
* brk->hw_len is in bytes.
* This aligns up to double word size, shifts and does the bias.
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 28bb1e7263a6..ab316e155ea9 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1329,7 +1329,7 @@ int eeh_pe_set_option(struct eeh_pe *pe, int option)

/*
* EEH functionality could possibly be disabled, just
- * return error for the case. And the EEH functinality
+ * return error for the case. And the EEH functionality
* isn't expected to be disabled on one specific PE.
*/
switch (option) {
@@ -1804,7 +1804,7 @@ static int eeh_debugfs_break_device(struct pci_dev *pdev)
* PE freeze. Using the in_8() accessor skips the eeh detection hook
* so the freeze hook so the EEH Detection machinery won't be
* triggered here. This is to match the usual behaviour of EEH
- * where the HW will asyncronously freeze a PE and it's up to
+ * where the HW will asynchronously freeze a PE and it's up to
* the kernel to notice and deal with it.
*
* 3. Turn Memory space back on. This is more important for VFs
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index a7a8dc182efb..c23a454af08a 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -143,7 +143,7 @@ int __eeh_send_failure_event(struct eeh_pe *pe)
int eeh_send_failure_event(struct eeh_pe *pe)
{
/*
- * If we've manually supressed recovery events via debugfs
+ * If we've manually suppressed recovery events via debugfs
* then just drop it on the floor.
*/
if (eeh_debugfs_no_recover) {
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index dc2350b288cf..aa29b9b7920f 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1665,8 +1665,8 @@ int __init setup_fadump(void)
}
/*
* Use subsys_initcall_sync() here because there is dependency with
- * crash_save_vmcoreinfo_init(), which mush run first to ensure vmcoreinfo initialization
- * is done before regisering with f/w.
+ * crash_save_vmcoreinfo_init(), which must run first to ensure vmcoreinfo initialization
+ * is done before registering with f/w.
*/
subsys_initcall_sync(setup_fadump);
#else /* !CONFIG_PRESERVE_FA_DUMP */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 07093b7cdcb9..a67fd54ccc57 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -776,6 +776,11 @@ bool iommu_table_in_use(struct iommu_table *tbl)
/* ignore reserved bit0 */
if (tbl->it_offset == 0)
start = 1;
+
+ /* Simple case with no reserved MMIO32 region */
+ if (!tbl->it_reserved_start && !tbl->it_reserved_end)
+ return find_next_bit(tbl->it_map, tbl->it_size, start) != tbl->it_size;
+
end = tbl->it_reserved_start - tbl->it_offset;
if (find_next_bit(tbl->it_map, end, start) != end)
return true;
diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index a0432ef46967..e25b796682cc 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -99,7 +99,7 @@ static unsigned long get_plt_size(const Elf32_Ehdr *hdr,

/* Sort the relocation information based on a symbol and
* addend key. This is a stable O(n*log n) complexity
- * alogrithm but it will reduce the complexity of
+ * algorithm but it will reduce the complexity of
* count_relocs() to linear complexity O(n)
*/
sort((void *)hdr + sechdrs[i].sh_offset,
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 794720530442..2cce576edbc5 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -194,7 +194,7 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,

/* Sort the relocation information based on a symbol and
* addend key. This is a stable O(n*log n) complexity
- * alogrithm but it will reduce the complexity of
+ * algorithm but it will reduce the complexity of
* count_relocs() to linear complexity O(n)
*/
sort((void *)sechdrs[i].sh_addr,
@@ -361,7 +361,7 @@ static inline int create_ftrace_stub(struct ppc64_stub_entry *entry,
entry->jump[1] |= PPC_HA(reladdr);
entry->jump[2] |= PPC_LO(reladdr);

- /* Eventhough we don't use funcdata in the stub, it's needed elsewhere. */
+ /* Even though we don't use funcdata in the stub, it's needed elsewhere. */
entry->funcdata = func_desc(addr);
entry->magic = STUB_MAGIC;

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8bc9cf62cd93..aa37faeb80ee 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -74,16 +74,32 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
static int get_phb_number(struct device_node *dn)
{
int ret, phb_id = -1;
- u32 prop_32;
u64 prop;

/*
* Try fixed PHB numbering first, by checking archs and reading
- * the respective device-tree properties. Firstly, try powernv by
- * reading "ibm,opal-phbid", only present in OPAL environment.
+ * the respective device-tree properties. Firstly, try reading
+ * standard "linux,pci-domain", then try reading "ibm,opal-phbid"
+ * (only present in powernv OPAL environment), then try device-tree
+ * alias and as the last try to use lower bits of "reg" property.
*/
- ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
+ ret = of_get_pci_domain_nr(dn);
+ if (ret >= 0) {
+ prop = ret;
+ ret = 0;
+ }
+ if (ret)
+ ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
+
if (ret) {
+ ret = of_alias_get_id(dn, "pci");
+ if (ret >= 0) {
+ prop = ret;
+ ret = 0;
+ }
+ }
+ if (ret) {
+ u32 prop_32;
ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
prop = prop_32;
}
@@ -95,10 +111,7 @@ static int get_phb_number(struct device_node *dn)
if ((phb_id >= 0) && !test_and_set_bit(phb_id, phb_bitmap))
return phb_id;

- /*
- * If not pseries nor powernv, or if fixed PHB numbering tried to add
- * the same PHB number twice, then fallback to dynamic PHB numbering.
- */
+ /* If everything fails then fallback to dynamic PHB numbering. */
phb_id = find_first_zero_bit(phb_bitmap, MAX_PHBS);
BUG_ON(phb_id >= MAX_PHBS);
set_bit(phb_id, phb_bitmap);
@@ -1688,7 +1701,7 @@ EXPORT_SYMBOL_GPL(pcibios_scan_phb);
static void fixup_hide_host_resource_fsl(struct pci_dev *dev)
{
int i, class = dev->class >> 8;
- /* When configured as agent, programing interface = 1 */
+ /* When configured as agent, programming interface = 1 */
int prog_if = dev->class & 0xf;

if ((class == PCI_CLASS_PROCESSOR_POWERPC ||
diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index c3024f104765..6f2b0cc1ddd6 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -244,7 +244,7 @@ EXPORT_SYMBOL(of_create_pci_dev);
* @dev: pci_dev structure for the bridge
*
* of_scan_bus() calls this routine for each PCI bridge that it finds, and
- * this routine in turn call of_scan_bus() recusively to scan for more child
+ * this routine in turn call of_scan_bus() recursively to scan for more child
* devices.
*/
void of_scan_pci_bridge(struct pci_dev *dev)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9be279469a85..3940db48db77 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -307,7 +307,7 @@ static void __giveup_vsx(struct task_struct *tsk)
unsigned long msr = tsk->thread.regs->msr;

/*
- * We should never be ssetting MSR_VSX without also setting
+ * We should never be setting MSR_VSX without also setting
* MSR_FP and MSR_VEC
*/
WARN_ON((msr & MSR_VSX) && !((msr & MSR_FP) && (msr & MSR_VEC)));
@@ -645,7 +645,7 @@ static void do_break_handler(struct pt_regs *regs)
return;
}

- /* Otherwise findout which DAWR caused exception and disable it. */
+ /* Otherwise find out which DAWR caused exception and disable it. */
wp_get_instr_detail(regs, &instr, &type, &size, &ea);

for (i = 0; i < nr_wp_slots(); i++) {
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 0ac5faacc909..ace861ec4c4c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -3416,7 +3416,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
*
* PowerMacs use a different mechanism to spin CPUs
*
- * (This must be done after instanciating RTAS)
+ * (This must be done after instantiating RTAS)
*/
if (of_platform != PLATFORM_POWERMAC)
prom_hold_cpus();
diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c b/arch/powerpc/kernel/ptrace/ptrace-view.c
index f15bc78caf71..076d867412c7 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-view.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
@@ -174,7 +174,7 @@ int ptrace_get_reg(struct task_struct *task, int regno, unsigned long *data)

/*
* softe copies paca->irq_soft_mask variable state. Since irq_soft_mask is
- * no more used as a flag, lets force usr to alway see the softe value as 1
+ * no more used as a flag, lets force usr to always see the softe value as 1
* which means interrupts are not soft disabled.
*/
if (IS_ENABLED(CONFIG_PPC64) && regno == PT_SOFTE) {
diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
index a99179d83538..bc817a5619d6 100644
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@ -120,7 +120,7 @@ static struct kmem_cache *flash_block_cache = NULL;
/*
* Local copy of the flash block list.
*
- * The rtas_firmware_flash_list varable will be
+ * The rtas_firmware_flash_list variable will be
* set once the data is fully read.
*
* For convenience as we build the list we use virtual addrs,
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 518ae5aa9410..3acf2782acdf 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -279,7 +279,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
proc_freq / 1000000, proc_freq % 1000000);

/* If we are a Freescale core do a simple check so
- * we dont have to keep adding cases in the future */
+ * we don't have to keep adding cases in the future */
if (PVR_VER(pvr) & 0x8000) {
switch (PVR_VER(pvr)) {
case 0x8000: /* 7441/7450/7451, Voyager */
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 73d483b07ff3..858fc13b8c51 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -123,7 +123,7 @@ static long notrace __unsafe_setup_sigcontext(struct sigcontext __user *sc,
#endif
struct pt_regs *regs = tsk->thread.regs;
unsigned long msr = regs->msr;
- /* Force usr to alway see softe as 1 (interrupts enabled) */
+ /* Force usr to always see softe as 1 (interrupts enabled) */
unsigned long softe = 0x1;

BUG_ON(tsk != current);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index de0f6f09a5dd..a69df557e2b7 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1102,7 +1102,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
DBG("smp_prepare_cpus\n");

/*
- * setup_cpu may need to be called on the boot cpu. We havent
+ * setup_cpu may need to be called on the boot cpu. We haven't
* spun any cpus up but lets be paranoid.
*/
BUG_ON(boot_cpuid != smp_processor_id());
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index f80cce0e3899..4bf757ebe13d 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -829,7 +829,7 @@ static void __read_persistent_clock(struct timespec64 *ts)
static int first = 1;

ts->tv_nsec = 0;
- /* XXX this is a litle fragile but will work okay in the short term */
+ /* XXX this is a little fragile but will work okay in the short term */
if (first) {
first = 0;
if (ppc_md.time_init)
@@ -974,7 +974,7 @@ void secondary_cpu_time_init(void)
*/
start_cpu_decrementer();

- /* FIME: Should make unrelatred change to move snapshot_timebase
+ /* FIME: Should make unrelated change to move snapshot_timebase
* call here ! */
register_decrementer_clockevent(smp_processor_id());
}
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bfc27496fe7e..7d28b9553654 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -56,7 +56,7 @@
* solved by also having a SMP watchdog where all CPUs check all other
* CPUs heartbeat.
*
- * The SMP checker can detect lockups on other CPUs. A gobal "pending"
+ * The SMP checker can detect lockups on other CPUs. A global "pending"
* cpumask is kept, containing all CPUs which enable the watchdog. Each
* CPU clears their pending bit in their heartbeat timer. When the bitmask
* becomes empty, the last CPU to clear its pending bit updates a global
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 6cc7793b8420..c29c639551fe 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -406,7 +406,7 @@ static int __init export_htab_values(void)
if (!node)
return -ENODEV;

- /* remove any stale propertys so ours can be found */
+ /* remove any stale properties so ours can be found */
of_remove_property(node, of_find_property(node, htab_base_prop.name, NULL));
of_remove_property(node, of_find_property(node, htab_size_prop.name, NULL));

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index b4981b651d9a..349a781cea0b 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -23,6 +23,7 @@
#include <linux/vmalloc.h>
#include <asm/setup.h>
#include <asm/drmem.h>
+#include <asm/firmware.h>
#include <asm/kexec_ranges.h>
#include <asm/crashdump-ppc64.h>

@@ -1038,6 +1039,48 @@ static int update_cpus_node(void *fdt)
return ret;
}

+static int copy_property(void *fdt, int node_offset, const struct device_node *dn,
+ const char *propname)
+{
+ const void *prop, *fdtprop;
+ int len = 0, fdtlen = 0;
+
+ prop = of_get_property(dn, propname, &len);
+ fdtprop = fdt_getprop(fdt, node_offset, propname, &fdtlen);
+
+ if (fdtprop && !prop)
+ return fdt_delprop(fdt, node_offset, propname);
+ else if (prop)
+ return fdt_setprop(fdt, node_offset, propname, prop, len);
+ else
+ return -FDT_ERR_NOTFOUND;
+}
+
+static int update_pci_dma_nodes(void *fdt, const char *dmapropname)
+{
+ struct device_node *dn;
+ int pci_offset, root_offset, ret = 0;
+
+ if (!firmware_has_feature(FW_FEATURE_LPAR))
+ return 0;
+
+ root_offset = fdt_path_offset(fdt, "/");
+ for_each_node_with_property(dn, dmapropname) {
+ pci_offset = fdt_subnode_offset(fdt, root_offset, of_node_full_name(dn));
+ if (pci_offset < 0)
+ continue;
+
+ ret = copy_property(fdt, pci_offset, dn, "ibm,dma-window");
+ if (ret < 0)
+ break;
+ ret = copy_property(fdt, pci_offset, dn, dmapropname);
+ if (ret < 0)
+ break;
+ }
+
+ return ret;
+}
+
/**
* setup_new_fdt_ppc64 - Update the flattend device-tree of the kernel
* being loaded.
@@ -1099,6 +1142,18 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
if (ret < 0)
goto out;

+#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
+ ret = update_pci_dma_nodes(fdt, DIRECT64_PROPNAME);
+ if (ret < 0)
+ goto out;
+
+ ret = update_pci_dma_nodes(fdt, DMA64_PROPNAME);
+ if (ret < 0)
+ goto out;
+#undef DMA64_PROPNAME
+#undef DIRECT64_PROPNAME
+
/* Update memory reserve map */
ret = get_reserved_memory_ranges(&rmem);
if (ret)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 0aeb51738ca9..1137c4df726c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -58,7 +58,7 @@ struct kvm_resize_hpt {
/* Possible values and their usage:
* <0 an error occurred during allocation,
* -EBUSY allocation is in the progress,
- * 0 allocation made successfuly.
+ * 0 allocation made successfully.
*/
int error;

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index fdeda6a9cff4..fdcc7b287dd8 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -453,7 +453,7 @@ static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu, unsigned long mmu_seq,
* we are doing this on secondary cpus and current task there
* is not the hypervisor. Also this is safe against THP in the
* host, because an IPI to primary thread will wait for the secondary
- * to exit which will agains result in the below page table walk
+ * to exit which will again result in the below page table walk
* to finish.
*/
/* an rmap lock won't make it safe. because that just ensure hash
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index fdb57be71aa6..5bbfb2eed127 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -268,7 +268,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,

/*
* add rules to fit in ISA specification regarding TM
- * state transistion in TM disable/Suspended state,
+ * state transition in TM disable/Suspended state,
* and target TM state is TM inactive(00) state. (the
* change should be suppressed).
*/
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7e52d0beee77..5e4251b76e75 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -19,7 +19,7 @@
#include <asm/interrupt.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
-#include <asm/archrandom.h>
+#include <asm/machdep.h>
#include <asm/xics.h>
#include <asm/xive.h>
#include <asm/dbell.h>
@@ -176,13 +176,14 @@ EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode);

int kvmppc_hwrng_present(void)
{
- return powernv_hwrng_present();
+ return ppc_md.get_random_seed != NULL;
}
EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);

long kvmppc_rm_h_random(struct kvm_vcpu *vcpu)
{
- if (powernv_get_random_real_mode(&vcpu->arch.regs.gpr[4]))
+ if (ppc_md.get_random_seed &&
+ ppc_md.get_random_seed(&vcpu->arch.regs.gpr[4]))
return H_SUCCESS;

return H_HARDWARE;
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index a28e5b3daabd..ac38c1cad378 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -379,7 +379,7 @@ void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu,
{
/*
* current->thread.xxx registers must all be restored to host
- * values before a potential context switch, othrewise the context
+ * values before a potential context switch, otherwise the context
* switch itself will overwrite current->thread.xxx with the values
* from the guest SPRs.
*/
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 36f2314c58e5..598006301620 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -120,7 +120,7 @@ static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
* content is un-encrypted.
*
* (c) Normal - The GFN is a normal. The GFN is associated with
- * a normal VM. The contents of the GFN is accesible to
+ * a normal VM. The contents of the GFN is accessible to
* the Hypervisor. Its content is never encrypted.
*
* States of a VM.
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 7bf9e6ca5c2d..d6abed6e51e6 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1287,7 +1287,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)

/* Get last sc for papr */
if (vcpu->arch.papr_enabled) {
- /* The sc instuction points SRR0 to the next inst */
+ /* The sc instruction points SRR0 to the next inst */
emul = kvmppc_get_last_inst(vcpu, INST_SC, &last_sc);
if (emul != EMULATE_DONE) {
kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) - 4);
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index ab6d37d78c62..589a8f257120 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -462,7 +462,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
* new guy. We cannot assume that the rejected interrupt is less
* favored than the new one, and thus doesn't need to be delivered,
* because by the time we exit icp_try_to_deliver() the target
- * processor may well have alrady consumed & completed it, and thus
+ * processor may well have already consumed & completed it, and thus
* the rejected interrupt might actually be already acceptable.
*/
if (icp_try_to_deliver(icp, new_irq, state->priority, &reject)) {
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index c0ce5531d9bc..24d434f1f012 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -124,7 +124,7 @@ void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu)
* interrupt might have fired and be on its way to the
* host queue while we mask it, and if we unmask it
* early enough (re-cede right away), there is a
- * theorical possibility that it fires again, thus
+ * theoretical possibility that it fires again, thus
* landing in the target queue more than once which is
* a big no-no.
*
@@ -622,7 +622,7 @@ static int xive_target_interrupt(struct kvm *kvm,

/*
* Targetting rules: In order to avoid losing track of
- * pending interrupts accross mask and unmask, which would
+ * pending interrupts across mask and unmask, which would
* allow queue overflows, we implement the following rules:
*
* - Unless it was never enabled (or we run out of capacity)
@@ -1073,7 +1073,7 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
/*
* If old_p is set, the interrupt is pending, we switch it to
* PQ=11. This will force a resend in the host so the interrupt
- * isn't lost to whatver host driver may pick it up
+ * isn't lost to whatever host driver may pick it up
*/
if (state->old_p)
xive_vm_esb_load(state->pt_data, XIVE_ESB_SET_PQ_11);
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index fa0d8dbbe484..4ff1372e48d4 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -309,7 +309,7 @@ static int kvmppc_core_vcpu_create_e500mc(struct kvm_vcpu *vcpu)
BUILD_BUG_ON(offsetof(struct kvmppc_vcpu_e500, vcpu) != 0);
vcpu_e500 = to_e500(vcpu);

- /* Invalid PIR value -- this LPID dosn't have valid state on any cpu */
+ /* Invalid PIR value -- this LPID doesn't have valid state on any cpu */
vcpu->arch.oldpir = 0xffffffff;

err = kvmppc_e500_tlb_init(vcpu_e500);
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 7ce8914992e3..2e0cad5817ba 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -377,7 +377,7 @@ int hash__has_transparent_hugepage(void)
if (mmu_psize_defs[MMU_PAGE_16M].shift != PMD_SHIFT)
return 0;
/*
- * We need to make sure that we support 16MB hugepage in a segement
+ * We need to make sure that we support 16MB hugepage in a segment
* with base page size 64K or 4K. We only enable THP with a PAGE_SIZE
* of 64K.
*/
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 985cabdd7f67..5b69b271707e 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1343,7 +1343,7 @@ static int subpage_protection(struct mm_struct *mm, unsigned long ea)
spp >>= 30 - 2 * ((ea >> 12) & 0xf);

/*
- * 0 -> full premission
+ * 0 -> full permission
* 1 -> Read only
* 2 -> no access.
* We return the flag that need to be cleared.
@@ -1664,7 +1664,7 @@ DEFINE_INTERRUPT_HANDLER(do_hash_fault)

err = hash_page_mm(mm, ea, access, TRAP(regs), flags);
if (unlikely(err < 0)) {
- // failed to instert a hash PTE due to an hypervisor error
+ // failed to insert a hash PTE due to an hypervisor error
if (user_mode(regs)) {
if (IS_ENABLED(CONFIG_PPC_SUBPAGE_PROT) && err == -2)
_exception(SIGSEGV, regs, SEGV_ACCERR, ea);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 052e6590f84f..071bb66c3ad9 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -331,7 +331,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
spin_lock(&mm->page_table_lock);
/*
* If we find pgtable_page set, we return
- * the allocated page with single fragement
+ * the allocated page with single fragment
* count.
*/
if (likely(!mm->context.pmd_frag)) {
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index def04631a74d..db2f3d193448 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -359,7 +359,7 @@ static void __init radix_init_pgtable(void)
if (!cpu_has_feature(CPU_FTR_HVMODE) &&
cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG)) {
/*
- * Older versions of KVM on these machines perfer if the
+ * Older versions of KVM on these machines prefer if the
* guest only uses the low 19 PID bits.
*/
mmu_pid_bits = 19;
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 7724af19ed7e..dda51fef2d2e 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -397,7 +397,7 @@ static inline void _tlbie_pid(unsigned long pid, unsigned long ric)

/*
* Workaround the fact that the "ric" argument to __tlbie_pid
- * must be a compile-time contraint to match the "i" constraint
+ * must be a compile-time constraint to match the "i" constraint
* in the asm statement.
*/
switch (ric) {
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 81091b9587f6..6956f637a38c 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -347,7 +347,7 @@ void slb_setup_new_exec(void)
/*
* We have no good place to clear the slb preload cache on exec,
* flush_thread is about the earliest arch hook but that happens
- * after we switch to the mm and have aleady preloaded the SLBEs.
+ * after we switch to the mm and have already preloaded the SLBEs.
*
* For the most part that's probably okay to use entries from the
* previous exec, they will age out if unused. It may turn out to
@@ -615,7 +615,7 @@ static void slb_cache_update(unsigned long esid_data)
} else {
/*
* Our cache is full and the current cache content strictly
- * doesn't indicate the active SLB conents. Bump the ptr
+ * doesn't indicate the active SLB contents. Bump the ptr
* so that switch_slb() will ignore the cache.
*/
local_paca->slb_cache_ptr = SLB_CACHE_ENTRIES + 1;
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 83c0ee9fbf05..2e11952057f8 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -111,7 +111,7 @@ static int __meminit vmemmap_populated(unsigned long vmemmap_addr, int vmemmap_m
}

/*
- * vmemmap virtual address space management does not have a traditonal page
+ * vmemmap virtual address space management does not have a traditional page
* table to track which virtual struct pages are backed by physical mapping.
* The virtual to physical mappings are tracked in a simple linked list
* format. 'vmemmap_list' maintains the entire vmemmap physical mapping at
@@ -128,7 +128,7 @@ static struct vmemmap_backing *next;

/*
* The same pointer 'next' tracks individual chunks inside the allocated
- * full page during the boot time and again tracks the freeed nodes during
+ * full page during the boot time and again tracks the freed nodes during
* runtime. It is racy but it does not happen as they are separated by the
* boot process. Will create problem if some how we have memory hotplug
* operation during boot !!
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c
index f3e4d069e0ba..a70828a6d935 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -25,7 +25,7 @@ static void __init kasan_populate_pte(pte_t *ptep, pgprot_t prot)
int i;

for (i = 0; i < PTRS_PER_PTE; i++, ptep++)
- __set_pte_at(&init_mm, va, ptep, pfn_pte(PHYS_PFN(pa), prot), 0);
+ __set_pte_at(&init_mm, va, ptep, pfn_pte(PHYS_PFN(pa), prot), 1);
}

int __init kasan_init_shadow_page_tables(unsigned long k_start, unsigned long k_end)
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 27f9186ae374..1ee08c3efe5b 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -179,8 +179,8 @@ void mmu_mark_initmem_nx(void)
unsigned long boundary = strict_kernel_rwx_enabled() ? sinittext : etext8;
unsigned long einittext8 = ALIGN(__pa(_einittext), SZ_8M);

- mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, false);
- mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
+ if (!debug_pagealloc_enabled_or_kfence())
+ mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);

mmu_pin_tlb(block_mapped_ram, false);
}
diff --git a/arch/powerpc/mm/nohash/book3e_hugetlbpage.c b/arch/powerpc/mm/nohash/book3e_hugetlbpage.c
index 8b88be91b622..307ca919d393 100644
--- a/arch/powerpc/mm/nohash/book3e_hugetlbpage.c
+++ b/arch/powerpc/mm/nohash/book3e_hugetlbpage.c
@@ -142,7 +142,7 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
tsize = shift - 10;
/*
* We can't be interrupted while we're setting up the MAS
- * regusters or after we've confirmed that no tlb exists.
+ * registers or after we've confirmed that no tlb exists.
*/
local_irq_save(flags);

diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c
index 5f81c076621f..37eb8d80bd53 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -311,7 +311,7 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size
ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true, true);
linear_sz = min_t(unsigned long, ram, SZ_512M);

- /* If the linear size is smaller than 64M, do not randmize */
+ /* If the linear size is smaller than 64M, do not randomize */
if (linear_sz < SZ_64M)
return 0;

diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S b/arch/powerpc/mm/nohash/tlb_low_64e.S
index 8b97c4acfebf..9e9ab3803fb2 100644
--- a/arch/powerpc/mm/nohash/tlb_low_64e.S
+++ b/arch/powerpc/mm/nohash/tlb_low_64e.S
@@ -583,7 +583,7 @@ itlb_miss_fault_e6500:
*/
rlwimi r11,r14,32-19,27,27
rlwimi r11,r14,32-16,19,19
- beq normal_tlb_miss
+ beq normal_tlb_miss_user
/* XXX replace the RMW cycles with immediate loads + writes */
1: mfspr r10,SPRN_MAS1
cmpldi cr0,r15,8 /* Check for vmalloc region */
@@ -626,7 +626,7 @@ itlb_miss_fault_e6500:

cmpldi cr0,r15,0 /* Check for user region */
std r14,EX_TLB_ESR(r12) /* write crazy -1 to frame */
- beq normal_tlb_miss
+ beq normal_tlb_miss_user

li r11,_PAGE_PRESENT|_PAGE_BAP_SX /* Base perm */
oris r11,r11,_PAGE_ACCESSED@h
@@ -653,6 +653,12 @@ itlb_miss_fault_e6500:
* r11 = PTE permission mask
* r10 = crap (free to use)
*/
+normal_tlb_miss_user:
+#ifdef CONFIG_PPC_KUAP
+ mfspr r14,SPRN_MAS1
+ rlwinm. r14,r14,0,0x3fff0000
+ beq- normal_tlb_miss_access_fault /* KUAP fault */
+#endif
normal_tlb_miss:
/* So we first construct the page table address. We do that by
* shifting the bottom of the address (not the region ID) by
@@ -683,11 +689,6 @@ finish_normal_tlb_miss:
/* Check if required permissions are met */
andc. r15,r11,r14
bne- normal_tlb_miss_access_fault
-#ifdef CONFIG_PPC_KUAP
- mfspr r11,SPRN_MAS1
- rlwinm. r10,r11,0,0x3fff0000
- beq- normal_tlb_miss_access_fault /* KUAP fault */
-#endif

/* Now we build the MAS:
*
@@ -709,9 +710,7 @@ finish_normal_tlb_miss:
rldicl r10,r14,64-8,64-8
cmpldi cr0,r10,BOOK3E_PAGESZ_4K
beq- 1f
-#ifndef CONFIG_PPC_KUAP
mfspr r11,SPRN_MAS1
-#endif
rlwimi r11,r14,31,21,24
rlwinm r11,r11,0,21,19
mtspr SPRN_MAS1,r11
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 97ae4935da79..20652daa1d7e 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -83,7 +83,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
spin_lock(&mm->page_table_lock);
/*
* If we find pgtable_page set, we return
- * the allocated page with single fragement
+ * the allocated page with single fragment
* count.
*/
if (likely(!pte_frag_get(&mm->context))) {
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index a56ade39dc68..3ac73f9fb5d5 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -135,9 +135,9 @@ void mark_initmem_nx(void)
unsigned long numpages = PFN_UP((unsigned long)_einittext) -
PFN_DOWN((unsigned long)_sinittext);

- if (v_block_mapped((unsigned long)_sinittext)) {
- mmu_mark_initmem_nx();
- } else {
+ mmu_mark_initmem_nx();
+
+ if (!v_block_mapped((unsigned long)_sinittext)) {
set_memory_nx((unsigned long)_sinittext, numpages);
set_memory_rw((unsigned long)_sinittext, numpages);
}
diff --git a/arch/powerpc/mm/ptdump/shared.c b/arch/powerpc/mm/ptdump/shared.c
index 03607ab90c66..f884760ca5cf 100644
--- a/arch/powerpc/mm/ptdump/shared.c
+++ b/arch/powerpc/mm/ptdump/shared.c
@@ -17,9 +17,9 @@ static const struct flag_info flag_array[] = {
.clear = " ",
}, {
.mask = _PAGE_RW,
- .val = _PAGE_RW,
- .set = "rw",
- .clear = "r ",
+ .val = 0,
+ .set = "r ",
+ .clear = "rw",
}, {
.mask = _PAGE_EXEC,
.val = _PAGE_EXEC,
diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index 4738c4dbf567..308a2e40d7be 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -157,7 +157,7 @@ static void mpc8xx_pmu_del(struct perf_event *event, int flags)

mpc8xx_pmu_read(event);

- /* If it was the last user, stop counting to avoid useles overhead */
+ /* If it was the last user, stop counting to avoid useless overhead */
switch (event_type(event)) {
case PERF_8xx_ID_CPU_CYCLES:
break;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index b5b42cf0a703..03c64a0195df 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1142,7 +1142,7 @@ static u64 check_and_compute_delta(u64 prev, u64 val)
/*
* POWER7 can roll back counter values, if the new value is smaller
* than the previous value it will cause the delta and the counter to
- * have bogus values unless we rolled a counter over. If a coutner is
+ * have bogus values unless we rolled a counter over. If a counter is
* rolled back, it will be smaller, but within 256, which is the maximum
* number of events to rollback at once. If we detect a rollback
* return 0. This can lead to a small lack of precision in the
@@ -1349,27 +1349,22 @@ static void power_pmu_disable(struct pmu *pmu)
* a PMI happens during interrupt replay and perf counter
* values are cleared by PMU callbacks before replay.
*
- * If any PMC corresponding to the active PMU events are
- * overflown, disable the interrupt by clearing the paca
- * bit for PMI since we are disabling the PMU now.
- * Otherwise provide a warning if there is PMI pending, but
- * no counter is found overflown.
+ * Disable the interrupt by clearing the paca bit for PMI
+ * since we are disabling the PMU now. Otherwise provide a
+ * warning if there is PMI pending, but no counter is found
+ * overflown.
+ *
+ * Since power_pmu_disable runs under local_irq_save, it
+ * could happen that code hits a PMC overflow without PMI
+ * pending in paca. Hence only clear PMI pending if it was
+ * set.
+ *
+ * If a PMI is pending, then MSR[EE] must be disabled (because
+ * the masked PMI handler disabling EE). So it is safe to
+ * call clear_pmi_irq_pending().
*/
- if (any_pmc_overflown(cpuhw)) {
- /*
- * Since power_pmu_disable runs under local_irq_save, it
- * could happen that code hits a PMC overflow without PMI
- * pending in paca. Hence only clear PMI pending if it was
- * set.
- *
- * If a PMI is pending, then MSR[EE] must be disabled (because
- * the masked PMI handler disabling EE). So it is safe to
- * call clear_pmi_irq_pending().
- */
- if (pmi_irq_pending())
- clear_pmi_irq_pending();
- } else
- WARN_ON(pmi_irq_pending());
+ if (pmi_irq_pending())
+ clear_pmi_irq_pending();

val = mmcra = cpuhw->mmcr.mmcra;

@@ -2057,7 +2052,7 @@ static int power_pmu_event_init(struct perf_event *event)
/*
* PMU config registers have fields that are
* reserved and some specific values for bit fields are reserved.
- * For ex., MMCRA[61:62] is Randome Sampling Mode (SM)
+ * For ex., MMCRA[61:62] is Random Sampling Mode (SM)
* and value of 0b11 to this field is reserved.
* Check for invalid values in attr.config.
*/
@@ -2447,7 +2442,7 @@ static void __perf_event_interrupt(struct pt_regs *regs)
}

/*
- * During system wide profling or while specific CPU is monitored for an
+ * During system wide profiling or while specific CPU is monitored for an
* event, some corner cases could cause PMC to overflow in idle path. This
* will trigger a PMI after waking up from idle. Since counter values are _not_
* saved/restored in idle path, can lead to below "Can't find PMC" message.
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 526d4b767534..498f1a2f7658 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -521,7 +521,7 @@ static int nest_imc_event_init(struct perf_event *event)

/*
* Nest HW counter memory resides in a per-chip reserve-memory (HOMER).
- * Get the base memory addresss for this cpu.
+ * Get the base memory address for this cpu.
*/
chip_id = cpu_to_chip_id(event->cpu);

@@ -674,7 +674,7 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
/*
* Check whether core_imc is registered. We could end up here
* if the cpuhotplug callback registration fails. i.e, callback
- * invokes the offline path for all sucessfully registered cpus.
+ * invokes the offline path for all successfully registered cpus.
* At this stage, core_imc pmu will not be registered and we
* should return here.
*
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index bb5d64862bc9..42abbcfc73da 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -82,11 +82,11 @@ static unsigned long sdar_mod_val(u64 event)
static void mmcra_sdar_mode(u64 event, unsigned long *mmcra)
{
/*
- * MMCRA[SDAR_MODE] specifices how the SDAR should be updated in
- * continous sampling mode.
+ * MMCRA[SDAR_MODE] specifies how the SDAR should be updated in
+ * continuous sampling mode.
*
* Incase of Power8:
- * MMCRA[SDAR_MODE] will be programmed as "0b01" for continous sampling
+ * MMCRA[SDAR_MODE] will be programmed as "0b01" for continuous sampling
* mode and will be un-changed when setting MMCRA[63] (Marked events).
*
* Incase of Power9/power10:
diff --git a/arch/powerpc/platforms/512x/clock-commonclk.c b/arch/powerpc/platforms/512x/clock-commonclk.c
index 0b03d812baae..0652c7e69225 100644
--- a/arch/powerpc/platforms/512x/clock-commonclk.c
+++ b/arch/powerpc/platforms/512x/clock-commonclk.c
@@ -663,7 +663,7 @@ static void __init mpc512x_clk_setup_mclk(struct mclk_setup_data *entry, size_t
* the PSC/MSCAN/SPDIF (serial drivers et al) need the MCLK
* for their bitrate
* - in the absence of "aliases" for clocks we need to create
- * individial 'struct clk' items for whatever might get
+ * individual 'struct clk' items for whatever might get
* referenced or looked up, even if several of those items are
* identical from the logical POV (their rate value)
* - for easier future maintenance and for better reflection of
diff --git a/arch/powerpc/platforms/512x/mpc512x_shared.c b/arch/powerpc/platforms/512x/mpc512x_shared.c
index e3411663edad..96d9cf49560d 100644
--- a/arch/powerpc/platforms/512x/mpc512x_shared.c
+++ b/arch/powerpc/platforms/512x/mpc512x_shared.c
@@ -289,7 +289,7 @@ static void __init mpc512x_setup_diu(void)

/*
* We do not allocate and configure new area for bitmap buffer
- * because it would requere copying bitmap data (splash image)
+ * because it would require copying bitmap data (splash image)
* and so negatively affect boot time. Instead we reserve the
* already configured frame buffer area so that it won't be
* destroyed. The starting address of the area to reserve and
diff --git a/arch/powerpc/platforms/52xx/mpc52xx_common.c b/arch/powerpc/platforms/52xx/mpc52xx_common.c
index 565e3a83dc9e..60aa6015e284 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_common.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_common.c
@@ -308,7 +308,7 @@ int mpc5200_psc_ac97_gpio_reset(int psc_number)

spin_lock_irqsave(&gpio_lock, flags);

- /* Reconfiure pin-muxing to gpio */
+ /* Reconfigure pin-muxing to gpio */
mux = in_be32(&simple_gpio->port_config);
out_be32(&simple_gpio->port_config, mux & (~gpio));

diff --git a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
index f862b48b4824..7252d992ca9f 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
@@ -398,7 +398,7 @@ static int mpc52xx_gpt_do_start(struct mpc52xx_gpt_priv *gpt, u64 period,
set |= MPC52xx_GPT_MODE_CONTINUOUS;

/* Determine the number of clocks in the requested period. 64 bit
- * arithmatic is done here to preserve the precision until the value
+ * arithmetic is done here to preserve the precision until the value
* is scaled back down into the u32 range. Period is in 'ns', bus
* frequency is in Hz. */
clocks = period * (u64)gpt->ipb_freq;
diff --git a/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c b/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
index b91ebebd9ff2..54dfc63809d3 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
@@ -104,7 +104,7 @@ static void mpc52xx_lpbfifo_kick(struct mpc52xx_lpbfifo_request *req)
*
* Configure the watermarks so DMA will always complete correctly.
* It may be worth experimenting with the ALARM value to see if
- * there is a performance impacit. However, if it is wrong there
+ * there is a performance impact. However, if it is wrong there
* is a risk of DMA not transferring the last chunk of data
*/
if (write) {
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_cds.c b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
index 5bd487030256..ad0b88ecfd0c 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_cds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
@@ -151,7 +151,7 @@ static void __init mpc85xx_cds_pci_irq_fixup(struct pci_dev *dev)
*/
case PCI_DEVICE_ID_VIA_82C586_2:
/* There are two USB controllers.
- * Identify them by functon number
+ * Identify them by function number
*/
if (PCI_FUNC(dev->devfn) == 3)
dev->irq = 11;
diff --git a/arch/powerpc/platforms/86xx/gef_ppc9a.c b/arch/powerpc/platforms/86xx/gef_ppc9a.c
index 44bbbc535e1d..884da08806ce 100644
--- a/arch/powerpc/platforms/86xx/gef_ppc9a.c
+++ b/arch/powerpc/platforms/86xx/gef_ppc9a.c
@@ -180,7 +180,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_USB,
*
* This function is called to determine whether the BSP is compatible with the
* supplied device-tree, which is assumed to be the correct one for the actual
- * board. It is expected thati, in the future, a kernel may support multiple
+ * board. It is expected that, in the future, a kernel may support multiple
* boards.
*/
static int __init gef_ppc9a_probe(void)
diff --git a/arch/powerpc/platforms/86xx/gef_sbc310.c b/arch/powerpc/platforms/86xx/gef_sbc310.c
index 46d6d3d4957a..baaf1ab07016 100644
--- a/arch/powerpc/platforms/86xx/gef_sbc310.c
+++ b/arch/powerpc/platforms/86xx/gef_sbc310.c
@@ -167,7 +167,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_USB,
*
* This function is called to determine whether the BSP is compatible with the
* supplied device-tree, which is assumed to be the correct one for the actual
- * board. It is expected thati, in the future, a kernel may support multiple
+ * board. It is expected that, in the future, a kernel may support multiple
* boards.
*/
static int __init gef_sbc310_probe(void)
diff --git a/arch/powerpc/platforms/86xx/gef_sbc610.c b/arch/powerpc/platforms/86xx/gef_sbc610.c
index acf2c6c3c1eb..120caf6af71d 100644
--- a/arch/powerpc/platforms/86xx/gef_sbc610.c
+++ b/arch/powerpc/platforms/86xx/gef_sbc610.c
@@ -157,7 +157,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_USB,
*
* This function is called to determine whether the BSP is compatible with the
* supplied device-tree, which is assumed to be the correct one for the actual
- * board. It is expected thati, in the future, a kernel may support multiple
+ * board. It is expected that, in the future, a kernel may support multiple
* boards.
*/
static int __init gef_sbc610_probe(void)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index e2e1fec91c6e..660309559bd8 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -173,11 +173,11 @@ config POWER9_CPU

config E5500_CPU
bool "Freescale e5500"
- depends on E500
+ depends on PPC64 && E500

config E6500_CPU
bool "Freescale e6500"
- depends on E500
+ depends on PPC64 && E500

config 860_CPU
bool "8xx family"
diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c
index f9a1615b74da..c0799fb26b6d 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -30,7 +30,7 @@
*
* where "vas_copy" and "vas_paste" are defined in copy-paste.h.
* copy/paste returns to the user space directly. So refer NX hardware
- * documententation for exact copy/paste usage and completion / error
+ * documentation for exact copy/paste usage and completion / error
* conditions.
*/

diff --git a/arch/powerpc/platforms/cell/axon_msi.c b/arch/powerpc/platforms/cell/axon_msi.c
index 354a58c1e6f2..dba34f4d5162 100644
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -223,6 +223,7 @@ static int setup_msi_msg_address(struct pci_dev *dev, struct msi_msg *msg)
if (!prop) {
dev_dbg(&dev->dev,
"axon_msi: no msi-address-(32|64) properties found\n");
+ of_node_put(dn);
return -ENOENT;
}

diff --git a/arch/powerpc/platforms/cell/cbe_regs.c b/arch/powerpc/platforms/cell/cbe_regs.c
index 1c4c53bec66c..03512a41bd7e 100644
--- a/arch/powerpc/platforms/cell/cbe_regs.c
+++ b/arch/powerpc/platforms/cell/cbe_regs.c
@@ -23,7 +23,7 @@
* Current implementation uses "cpu" nodes. We build our own mapping
* array of cpu numbers to cpu nodes locally for now to allow interrupt
* time code to have a fast path rather than call of_get_cpu_node(). If
- * we implement cpu hotplug, we'll have to install an appropriate norifier
+ * we implement cpu hotplug, we'll have to install an appropriate notifier
* in order to release references to the cpu going away
*/
static struct cbe_regs_map
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 25e726bf0172..3f141cf5e580 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -582,7 +582,7 @@ static int cell_of_bus_notify(struct notifier_block *nb, unsigned long action,
{
struct device *dev = data;

- /* We are only intereted in device addition */
+ /* We are only interested in device addition */
if (action != BUS_NOTIFY_ADD_DEVICE)
return 0;

diff --git a/arch/powerpc/platforms/cell/spider-pci.c b/arch/powerpc/platforms/cell/spider-pci.c
index a1c293f42a1f..3a2ea8376e32 100644
--- a/arch/powerpc/platforms/cell/spider-pci.c
+++ b/arch/powerpc/platforms/cell/spider-pci.c
@@ -81,7 +81,7 @@ static int __init spiderpci_pci_setup_chip(struct pci_controller *phb,
/*
* On CellBlade, we can't know that which XDR memory is used by
* kmalloc() to allocate dummy_page_va.
- * In order to imporve the performance, the XDR which is used to
+ * In order to improve the performance, the XDR which is used to
* allocate dummy_page_va is the nearest the spider-pci.
* We have to select the CBE which is the nearest the spider-pci
* to allocate memory from the best XDR, but I don't know that
diff --git a/arch/powerpc/platforms/cell/spu_manage.c b/arch/powerpc/platforms/cell/spu_manage.c
index ddf8742f09a3..080ed2d2c682 100644
--- a/arch/powerpc/platforms/cell/spu_manage.c
+++ b/arch/powerpc/platforms/cell/spu_manage.c
@@ -457,7 +457,7 @@ static void __init init_affinity_node(int cbe)

/*
* Walk through each phandle in vicinity property of the spu
- * (tipically two vicinity phandles per spe node)
+ * (typically two vicinity phandles per spe node)
*/
for (i = 0; i < (lenp / sizeof(phandle)); i++) {
if (vic_handles[i] == avoid_ph)
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 4c702192412f..96f5e2e43018 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -660,6 +660,7 @@ spufs_init_isolated_loader(void)
return;

loader = of_get_property(dn, "loader", &size);
+ of_node_put(dn);
if (!loader)
return;

diff --git a/arch/powerpc/platforms/powermac/low_i2c.c b/arch/powerpc/platforms/powermac/low_i2c.c
index df89d916236d..9aded0188ce8 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -1472,7 +1472,7 @@ int __init pmac_i2c_init(void)
smu_i2c_probe();
#endif

- /* Now add plaform functions for some known devices */
+ /* Now add platform functions for some known devices */
pmac_i2c_devscan(pmac_i2c_dev_create);

return 0;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 89e22c460ebf..33f7b959c810 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -390,7 +390,7 @@ static struct eeh_dev *pnv_eeh_probe(struct pci_dev *pdev)
* should be blocked until PE reset. MMIO access is dropped
* by hardware certainly. In order to drop PCI config requests,
* one more flag (EEH_PE_CFG_RESTRICTED) is introduced, which
- * will be checked in the backend for PE state retrival. If
+ * will be checked in the backend for PE state retrieval. If
* the PE becomes frozen for the first time and the flag has
* been set for the PE, we will set EEH_PE_CFG_BLOCKED for
* that PE to block its config space.
@@ -981,7 +981,7 @@ static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
case EEH_RESET_FUNDAMENTAL:
/*
* Wait for Transaction Pending bit to clear. A word-aligned
- * test is used, so we use the conrol offset rather than status
+ * test is used, so we use the control offset rather than status
* and shift the test bit to match.
*/
pnv_eeh_wait_for_pending(pdn, "AF",
@@ -1048,7 +1048,7 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
* frozen state during PE reset. However, the good idea here from
* benh is to keep frozen state before we get PE reset done completely
* (until BAR restore). With the frozen state, HW drops illegal IO
- * or MMIO access, which can incur recrusive frozen PE during PE
+ * or MMIO access, which can incur recursive frozen PE during PE
* reset. The side effect is that EEH core has to clear the frozen
* state explicitly after BAR restore.
*/
@@ -1095,8 +1095,8 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
* bus is behind a hotplug slot and it will use the slot provided
* reset methods to prevent spurious hotplug events during the reset.
*
- * Fundemental resets need to be handled internally to EEH since the
- * PCI core doesn't really have a concept of a fundemental reset,
+ * Fundamental resets need to be handled internally to EEH since the
+ * PCI core doesn't really have a concept of a fundamental reset,
* mainly because there's no standard way to generate one. Only a
* few devices require an FRESET so it should be fine.
*/
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index a6677a111aca..6f94b808dd39 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -112,7 +112,7 @@ static int __init pnv_save_sprs_for_deep_states(void)
if (rc != 0)
return rc;

- /* Only p8 needs to set extra HID regiters */
+ /* Only p8 needs to set extra HID registers */
if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
uint64_t hid1_val = mfspr(SPRN_HID1);
uint64_t hid4_val = mfspr(SPRN_HID4);
@@ -1204,7 +1204,7 @@ static void __init pnv_arch300_idle_init(void)
* The idle code does not deal with TB loss occurring
* in a shallower state than SPR loss, so force it to
* behave like SPRs are lost if TB is lost. POWER9 would
- * never encouter this, but a POWER8 core would if it
+ * never encounter this, but a POWER8 core would if it
* implemented the stop instruction. So this is for forward
* compatibility.
*/
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 28b009b46464..27c936075031 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -289,7 +289,7 @@ int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count)
* be used by a function depends on how many functions exist
* on the device. The NPU needs to be configured to know how
* many bits are available to PASIDs and how many are to be
- * used by the function BDF indentifier.
+ * used by the function BDF identifier.
*
* We only support one AFU-carrying function for now.
*/
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 9d74d3950a52..c1bcd2d4826e 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -206,7 +206,7 @@ static u64 opal_fadump_init_mem_struct(struct fw_dump *fadump_conf)
opal_fdm->region_cnt = cpu_to_be16(reg_cnt);

/*
- * Kernel metadata is passed to f/w and retrieved in capture kerenl.
+ * Kernel metadata is passed to f/w and retrieved in capture kernel.
* So, use it to save fadump header address instead of calculating it.
*/
opal_fdm->fadumphdr_addr = cpu_to_be64(be64_to_cpu(opal_fdm->rgn[0].dest) +
diff --git a/arch/powerpc/platforms/powernv/opal-lpc.c b/arch/powerpc/platforms/powernv/opal-lpc.c
index 5390c888db16..d129d6d45a50 100644
--- a/arch/powerpc/platforms/powernv/opal-lpc.c
+++ b/arch/powerpc/platforms/powernv/opal-lpc.c
@@ -197,7 +197,7 @@ static ssize_t lpc_debug_read(struct file *filp, char __user *ubuf,

/*
* Select access size based on count and alignment and
- * access type. IO and MEM only support byte acceses,
+ * access type. IO and MEM only support byte accesses,
* FW supports all 3.
*/
len = 1;
diff --git a/arch/powerpc/platforms/powernv/opal-memory-errors.c b/arch/powerpc/platforms/powernv/opal-memory-errors.c
index 1e8e17df9ce8..a1754a28265d 100644
--- a/arch/powerpc/platforms/powernv/opal-memory-errors.c
+++ b/arch/powerpc/platforms/powernv/opal-memory-errors.c
@@ -82,7 +82,7 @@ static DECLARE_WORK(mem_error_work, mem_error_handler);

/*
* opal_memory_err_event - notifier handler that queues up the opal message
- * to be preocessed later.
+ * to be processed later.
*/
static int opal_memory_err_event(struct notifier_block *nb,
unsigned long msg_type, void *msg)
diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
index 04155aaaadb1..fe3d111b881c 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -699,7 +699,7 @@ static int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
return -ENOSPC;
}

- /* allocate a contigious block of PEs for our VFs */
+ /* allocate a contiguous block of PEs for our VFs */
base_pe = pnv_ioda_alloc_pe(phb, num_vfs);
if (!base_pe) {
pci_err(pdev, "Unable to allocate PEs for %d VFs\n", num_vfs);
diff --git a/arch/powerpc/platforms/powernv/rng.c b/arch/powerpc/platforms/powernv/rng.c
index 3805ad13b8f3..d19305292e1e 100644
--- a/arch/powerpc/platforms/powernv/rng.c
+++ b/arch/powerpc/platforms/powernv/rng.c
@@ -29,15 +29,6 @@ struct powernv_rng {

static DEFINE_PER_CPU(struct powernv_rng *, powernv_rng);

-int powernv_hwrng_present(void)
-{
- struct powernv_rng *rng;
-
- rng = get_cpu_var(powernv_rng);
- put_cpu_var(rng);
- return rng != NULL;
-}
-
static unsigned long rng_whiten(struct powernv_rng *rng, unsigned long val)
{
unsigned long parity;
@@ -58,17 +49,6 @@ static unsigned long rng_whiten(struct powernv_rng *rng, unsigned long val)
return val;
}

-int powernv_get_random_real_mode(unsigned long *v)
-{
- struct powernv_rng *rng;
-
- rng = raw_cpu_read(powernv_rng);
-
- *v = rng_whiten(rng, __raw_rm_readq(rng->regs_real));
-
- return 1;
-}
-
static int powernv_get_random_darn(unsigned long *v)
{
unsigned long val;
@@ -105,12 +85,14 @@ int powernv_get_random_long(unsigned long *v)
{
struct powernv_rng *rng;

- rng = get_cpu_var(powernv_rng);
-
- *v = rng_whiten(rng, in_be64(rng->regs));
-
- put_cpu_var(rng);
-
+ if (mfmsr() & MSR_DR) {
+ rng = get_cpu_var(powernv_rng);
+ *v = rng_whiten(rng, in_be64(rng->regs));
+ put_cpu_var(rng);
+ } else {
+ rng = raw_cpu_read(powernv_rng);
+ *v = rng_whiten(rng, __raw_rm_readq(rng->regs_real));
+ }
return 1;
}
EXPORT_SYMBOL_GPL(powernv_get_random_long);
diff --git a/arch/powerpc/platforms/ps3/mm.c b/arch/powerpc/platforms/ps3/mm.c
index 5ce924611b94..63ef61ed7597 100644
--- a/arch/powerpc/platforms/ps3/mm.c
+++ b/arch/powerpc/platforms/ps3/mm.c
@@ -364,7 +364,7 @@ static void __maybe_unused _dma_dump_region(const struct ps3_dma_region *r,
* @bus_addr: Starting ioc bus address of the area to map.
* @len: Length in bytes of the area to map.
* @link: A struct list_head used with struct ps3_dma_region.chunk_list, the
- * list of all chuncks owned by the region.
+ * list of all chunks owned by the region.
*
* This implementation uses a very simple dma page manager
* based on the dma_chunk structure. This scheme assumes
diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c
index b637bf292047..2502e9b17df4 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -601,7 +601,7 @@ static dma_addr_t ps3_ioc0_map_page(struct device *_dev, struct page *page,
iopte_flag |= CBE_IOPTE_PP_W | CBE_IOPTE_SO_RW;
break;
default:
- /* not happned */
+ /* not happened */
BUG();
}
result = ps3_dma_map(dev->d_region, (unsigned long)ptr, size,
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 09fafcf2d3a0..f0b2bca750da 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -510,7 +510,7 @@ static int pseries_eeh_set_option(struct eeh_pe *pe, int option)
int ret = 0;

/*
- * When we're enabling or disabling EEH functioality on
+ * When we're enabling or disabling EEH functionality on
* the particular PE, the PE config address is possibly
* unavailable. Therefore, we have to figure it out from
* the FDT node.
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 4d991cf840d9..a90024697c11 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -701,6 +701,33 @@ struct iommu_table_ops iommu_table_lpar_multi_ops = {
.get = tce_get_pSeriesLP
};

+/*
+ * Find nearest ibm,dma-window (default DMA window) or direct DMA window or
+ * dynamic 64bit DMA window, walking up the device tree.
+ */
+static struct device_node *pci_dma_find(struct device_node *dn,
+ const __be32 **dma_window)
+{
+ const __be32 *dw = NULL;
+
+ for ( ; dn && PCI_DN(dn); dn = dn->parent) {
+ dw = of_get_property(dn, "ibm,dma-window", NULL);
+ if (dw) {
+ if (dma_window)
+ *dma_window = dw;
+ return dn;
+ }
+ dw = of_get_property(dn, DIRECT64_PROPNAME, NULL);
+ if (dw)
+ return dn;
+ dw = of_get_property(dn, DMA64_PROPNAME, NULL);
+ if (dw)
+ return dn;
+ }
+
+ return NULL;
+}
+
static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
{
struct iommu_table *tbl;
@@ -713,20 +740,10 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
dn);

- /*
- * Find nearest ibm,dma-window (default DMA window), walking up the
- * device tree
- */
- for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
- dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
- if (dma_window != NULL)
- break;
- }
+ pdn = pci_dma_find(dn, &dma_window);

- if (dma_window == NULL) {
+ if (dma_window == NULL)
pr_debug(" no ibm,dma-window property !\n");
- return;
- }

ppci = PCI_DN(pdn);

@@ -736,11 +753,13 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
if (!ppci->table_group) {
ppci->table_group = iommu_pseries_alloc_group(ppci->phb->node);
tbl = ppci->table_group->tables[0];
- iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
- ppci->table_group, dma_window);
+ if (dma_window) {
+ iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
+ ppci->table_group, dma_window);

- if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
- panic("Failed to initialize iommu table");
+ if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
+ panic("Failed to initialize iommu table");
+ }
iommu_register_group(ppci->table_group,
pci_domain_nr(bus), 0);
pr_debug(" created table: %p\n", ppci->table_group);
@@ -1233,7 +1252,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
bool default_win_removed = false, direct_mapping = false;
bool pmem_present;
struct pci_dn *pci = PCI_DN(pdn);
- struct iommu_table *tbl = pci->table_group->tables[0];
+ struct property *default_win = NULL;

dn = of_find_node_by_type(NULL, "ibm,pmemory");
pmem_present = dn != NULL;
@@ -1290,11 +1309,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
* for extensions presence.
*/
if (query.windows_available == 0) {
- struct property *default_win;
int reset_win_ext;

/* DDW + IOMMU on single window may fail if there is any allocation */
- if (iommu_table_in_use(tbl)) {
+ if (iommu_table_in_use(pci->table_group->tables[0])) {
dev_warn(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
goto out_failed;
}
@@ -1430,16 +1448,18 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)

pci->table_group->tables[1] = newtbl;

- /* Keep default DMA window stuct if removed */
- if (default_win_removed) {
- tbl->it_size = 0;
- vfree(tbl->it_map);
- tbl->it_map = NULL;
- }
-
set_iommu_table_base(&dev->dev, newtbl);
}

+ if (default_win_removed) {
+ iommu_tce_table_put(pci->table_group->tables[0]);
+ pci->table_group->tables[0] = NULL;
+
+ /* default_win is valid here because default_win_removed == true */
+ of_remove_property(pdn, default_win);
+ dev_info(&dev->dev, "Removed default DMA window for %pOF\n", pdn);
+ }
+
spin_lock(&dma_win_list_lock);
list_add(&window->list, &dma_win_list);
spin_unlock(&dma_win_list_lock);
@@ -1504,13 +1524,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
dn = pci_device_to_OF_node(dev);
pr_debug(" node is %pOF\n", dn);

- for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
- pdn = pdn->parent) {
- dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
- if (dma_window)
- break;
- }
-
+ pdn = pci_dma_find(dn, &dma_window);
if (!pdn || !PCI_DN(pdn)) {
printk(KERN_WARNING "pci_dma_dev_setup_pSeriesLP: "
"no DMA window found for pci dev=%s dn=%pOF\n",
@@ -1541,7 +1555,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
{
struct device_node *dn = pci_device_to_OF_node(pdev), *pdn;
- const __be32 *dma_window = NULL;

/* only attempt to use a new window if 64-bit DMA is requested */
if (dma_mask < DMA_BIT_MASK(64))
@@ -1555,13 +1568,7 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
* search upwards in the tree until we either hit a dma-window
* property, OR find a parent with a table already allocated.
*/
- for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
- pdn = pdn->parent) {
- dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
- if (dma_window)
- break;
- }
-
+ pdn = pci_dma_find(dn, NULL);
if (pdn && PCI_DN(pdn))
return enable_ddw(pdev, pdn);

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index f27735f623ba..27bed0dd866e 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -658,7 +658,7 @@ static resource_size_t pseries_get_iov_fw_value(struct pci_dev *dev, int resno,
*/
num_res = of_read_number(&indexes[NUM_RES_PROPERTY], 1);
if (resno >= num_res)
- return 0; /* or an errror */
+ return 0; /* or an error */

i = START_OF_ENTRIES + NEXT_ENTRY * resno;
switch (value) {
@@ -762,7 +762,7 @@ static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev)

if (!pdev->is_physfn)
return;
- /*Firmware must support open sriov otherwise dont configure*/
+ /*Firmware must support open sriov otherwise don't configure*/
indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
if (indexes)
of_pci_parse_iov_addrs(pdev, indexes);
diff --git a/arch/powerpc/platforms/pseries/vas-sysfs.c b/arch/powerpc/platforms/pseries/vas-sysfs.c
index ec65586cbeb3..241c84374045 100644
--- a/arch/powerpc/platforms/pseries/vas-sysfs.c
+++ b/arch/powerpc/platforms/pseries/vas-sysfs.c
@@ -76,7 +76,7 @@ struct vas_sysfs_entry {
* Create sysfs interface:
* /sys/devices/vas/vas0/gzip/default_capabilities
* This directory contains the following VAS GZIP capabilities
- * for the defaule credit type.
+ * for the default credit type.
* /sys/devices/vas/vas0/gzip/default_capabilities/nr_total_credits
* Total number of default credits assigned to the LPAR which
* can be changed with DLPAR operation.
diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index ec643bbdb67f..500a1fc4a1d7 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -801,7 +801,7 @@ int vas_reconfig_capabilties(u8 type, int new_nr_creds)
atomic_set(&caps->nr_total_credits, new_nr_creds);
/*
* The total number of available credits may be decreased or
- * inceased with DLPAR operation. Means some windows have to be
+ * increased with DLPAR operation. Means some windows have to be
* closed / reopened. Hold the vas_pseries_mutex so that the
* the user space can not open new windows.
*/
diff --git a/arch/powerpc/sysdev/fsl_lbc.c b/arch/powerpc/sysdev/fsl_lbc.c
index 1985e067e952..18acfb4e82af 100644
--- a/arch/powerpc/sysdev/fsl_lbc.c
+++ b/arch/powerpc/sysdev/fsl_lbc.c
@@ -37,7 +37,7 @@ EXPORT_SYMBOL(fsl_lbc_ctrl_dev);
*
* This function converts a base address of lbc into the right format for the
* BR register. If the SOC has eLBC then it returns 32bit physical address
- * else it convers a 34bit local bus physical address to correct format of
+ * else it converts a 34bit local bus physical address to correct format of
* 32bit address for BR register (Example: MPC8641).
*/
u32 fsl_lbc_addr(phys_addr_t addr_base)
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index a97ce602394e..3c430a6a6a4e 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -218,7 +218,7 @@ static void setup_pci_atmu(struct pci_controller *hose)
* windows have implemented the default target value as 0xf
* for CCSR space.In all Freescale legacy devices the target
* of 0xf is reserved for local memory space. 9132 Rev1.0
- * now has local mempry space mapped to target 0x0 instead of
+ * now has local memory space mapped to target 0x0 instead of
* 0xf. Hence adding a workaround to remove the target 0xf
* defined for memory space from Inbound window attributes.
*/
@@ -520,6 +520,7 @@ int fsl_add_bridge(struct platform_device *pdev, int is_primary)
struct resource rsrc;
const int *bus_range;
u8 hdr_type, progif;
+ u32 class_code;
struct device_node *dev;
struct ccsr_pci __iomem *pci;
u16 temp;
@@ -593,6 +594,13 @@ int fsl_add_bridge(struct platform_device *pdev, int is_primary)
PPC_INDIRECT_TYPE_SURPRESS_PRIMARY_BUS;
if (fsl_pcie_check_link(hose))
hose->indirect_type |= PPC_INDIRECT_TYPE_NO_PCIE_LINK;
+ /* Fix Class Code to PCI_CLASS_BRIDGE_PCI_NORMAL for pre-3.0 controller */
+ if (in_be32(&pci->block_rev1) < PCIE_IP_REV_3_0) {
+ early_read_config_dword(hose, 0, 0, PCIE_FSL_CSR_CLASSCODE, &class_code);
+ class_code &= 0xff;
+ class_code |= PCI_CLASS_BRIDGE_PCI_NORMAL << 8;
+ early_write_config_dword(hose, 0, 0, PCIE_FSL_CSR_CLASSCODE, class_code);
+ }
} else {
/*
* Set PBFR(PCI Bus Function Register)[10] = 1 to
diff --git a/arch/powerpc/sysdev/fsl_pci.h b/arch/powerpc/sysdev/fsl_pci.h
index cdbde2e0c96e..093a875d7d1e 100644
--- a/arch/powerpc/sysdev/fsl_pci.h
+++ b/arch/powerpc/sysdev/fsl_pci.h
@@ -18,6 +18,7 @@ struct platform_device;

#define PCIE_LTSSM 0x0404 /* PCIE Link Training and Status */
#define PCIE_LTSSM_L0 0x16 /* L0 state */
+#define PCIE_FSL_CSR_CLASSCODE 0x474 /* FSL GPEX CSR */
#define PCIE_IP_REV_2_2 0x02080202 /* PCIE IP block version Rev2.2 */
#define PCIE_IP_REV_3_0 0x02080300 /* PCIE IP block version Rev3.0 */
#define PIWAR_EN 0x80000000 /* Enable */
diff --git a/arch/powerpc/sysdev/ge/ge_pic.c b/arch/powerpc/sysdev/ge/ge_pic.c
index 02553a8ce191..413b375c4d28 100644
--- a/arch/powerpc/sysdev/ge/ge_pic.c
+++ b/arch/powerpc/sysdev/ge/ge_pic.c
@@ -150,7 +150,7 @@ static struct irq_chip gef_pic_chip = {
};


-/* When an interrupt is being configured, this call allows some flexibilty
+/* When an interrupt is being configured, this call allows some flexibility
* in deciding which irq_chip structure is used
*/
static int gef_pic_host_map(struct irq_domain *h, unsigned int virq,
diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
index 36ec0bdd8b63..a25413826b63 100644
--- a/arch/powerpc/sysdev/mpic_msgr.c
+++ b/arch/powerpc/sysdev/mpic_msgr.c
@@ -99,7 +99,7 @@ void mpic_msgr_disable(struct mpic_msgr *msgr)
EXPORT_SYMBOL_GPL(mpic_msgr_disable);

/* The following three functions are used to compute the order and number of
- * the message register blocks. They are clearly very inefficent. However,
+ * the message register blocks. They are clearly very inefficient. However,
* they are called *only* a few times during device initialization.
*/
static unsigned int mpic_msgr_number_of_blocks(void)
diff --git a/arch/powerpc/sysdev/mpic_msi.c b/arch/powerpc/sysdev/mpic_msi.c
index f412d6ad0b66..9936c014ac7d 100644
--- a/arch/powerpc/sysdev/mpic_msi.c
+++ b/arch/powerpc/sysdev/mpic_msi.c
@@ -37,7 +37,7 @@ static int __init mpic_msi_reserve_u3_hwirqs(struct mpic *mpic)
/* Reserve source numbers we know are reserved in the HW.
*
* This is a bit of a mix of U3 and U4 reserves but that's going
- * to work fine, we have plenty enugh numbers left so let's just
+ * to work fine, we have plenty enough numbers left so let's just
* mark anything we don't like reserved.
*/
for (i = 0; i < 8; i++)
diff --git a/arch/powerpc/sysdev/mpic_timer.c b/arch/powerpc/sysdev/mpic_timer.c
index 444e9ce42d0a..b2f0a73e8f93 100644
--- a/arch/powerpc/sysdev/mpic_timer.c
+++ b/arch/powerpc/sysdev/mpic_timer.c
@@ -255,7 +255,7 @@ EXPORT_SYMBOL(mpic_start_timer);

/**
* mpic_stop_timer - stop hardware timer
- * @handle: the timer to be stoped
+ * @handle: the timer to be stopped
*
* The timer periodically generates an interrupt. Unless user stops the timer.
*/
diff --git a/arch/powerpc/sysdev/mpic_u3msi.c b/arch/powerpc/sysdev/mpic_u3msi.c
index 3f4841dfefb5..73d129594078 100644
--- a/arch/powerpc/sysdev/mpic_u3msi.c
+++ b/arch/powerpc/sysdev/mpic_u3msi.c
@@ -78,7 +78,7 @@ static u64 find_u4_magic_addr(struct pci_dev *pdev, unsigned int hwirq)

/* U4 PCIe MSIs need to write to the special register in
* the bridge that generates interrupts. There should be
- * theorically a register at 0xf8005000 where you just write
+ * theoretically a register at 0xf8005000 where you just write
* the MSI number and that triggers the right interrupt, but
* unfortunately, this is busted in HW, the bridge endian swaps
* the value and hits the wrong nibble in the register.
diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c
index f940428ad13f..45f72fc715fc 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -617,7 +617,7 @@ bool __init xive_native_init(void)

xive_tima_os = r.start;

- /* Grab size of provisionning pages */
+ /* Grab size of provisioning pages */
xive_parse_provisioning(np);

/* Switch the XIVE to exploitation mode */
diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index b0d36e430dbc..013446f5cdbb 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -716,6 +716,7 @@ static bool __init xive_get_max_prio(u8 *max_prio)
}

reg = of_get_property(rootdn, "ibm,plat-res-int-priorities", &len);
+ of_node_put(rootdn);
if (!reg) {
pr_err("Failed to read 'ibm,plat-res-int-priorities' property\n");
return false;
diff --git a/arch/powerpc/xmon/ppc-opc.c b/arch/powerpc/xmon/ppc-opc.c
index dfb80810b16c..0774d711453e 100644
--- a/arch/powerpc/xmon/ppc-opc.c
+++ b/arch/powerpc/xmon/ppc-opc.c
@@ -408,7 +408,7 @@ const struct powerpc_operand powerpc_operands[] =
#define FXM4 FXM + 1
{ 0xff, 12, insert_fxm, extract_fxm,
PPC_OPERAND_OPTIONAL | PPC_OPERAND_OPTIONAL_VALUE},
- /* If the FXM4 operand is ommitted, use the sentinel value -1. */
+ /* If the FXM4 operand is omitted, use the sentinel value -1. */
{ -1, -1, NULL, NULL, 0},

/* The IMM20 field in an LI instruction. */
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fd72753e8ad5..27da7d5c2024 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2024,7 +2024,7 @@ static void dump_206_sprs(void)
if (!cpu_has_feature(CPU_FTR_ARCH_206))
return;

- /* Actually some of these pre-date 2.06, but whatevs */
+ /* Actually some of these pre-date 2.06, but whatever */

printf("srr0 = %.16lx srr1 = %.16lx dsisr = %.8lx\n",
mfspr(SPRN_SRR0), mfspr(SPRN_SRR1), mfspr(SPRN_DSISR));
diff --git a/arch/riscv/boot/dts/starfive/jh7100.dtsi b/arch/riscv/boot/dts/starfive/jh7100.dtsi
index 69f22f9aad9d..f48e232a72a7 100644
--- a/arch/riscv/boot/dts/starfive/jh7100.dtsi
+++ b/arch/riscv/boot/dts/starfive/jh7100.dtsi
@@ -118,7 +118,7 @@ plic: interrupt-controller@c000000 {
interrupt-controller;
#address-cells = <0>;
#interrupt-cells = <1>;
- riscv,ndev = <127>;
+ riscv,ndev = <133>;
};

clkgen: clock-controller@11800000 {
diff --git a/arch/riscv/include/asm/cpu_ops.h b/arch/riscv/include/asm/cpu_ops.h
index 134590f1b843..aa128466c4d4 100644
--- a/arch/riscv/include/asm/cpu_ops.h
+++ b/arch/riscv/include/asm/cpu_ops.h
@@ -38,6 +38,7 @@ struct cpu_operations {
#endif
};

+extern const struct cpu_operations cpu_ops_spinwait;
extern const struct cpu_operations *cpu_ops[NR_CPUS];
void __init cpu_set_ops(int cpu);

diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
index 170d07e57721..f92c0e6eddb1 100644
--- a/arch/riscv/kernel/cpu_ops.c
+++ b/arch/riscv/kernel/cpu_ops.c
@@ -15,9 +15,7 @@
const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;

extern const struct cpu_operations cpu_ops_sbi;
-#ifdef CONFIG_RISCV_BOOT_SPINWAIT
-extern const struct cpu_operations cpu_ops_spinwait;
-#else
+#ifndef CONFIG_RISCV_BOOT_SPINWAIT
const struct cpu_operations cpu_ops_spinwait = {
.name = "",
.cpu_prepare = NULL,
diff --git a/arch/riscv/kernel/cpu_ops_spinwait.c b/arch/riscv/kernel/cpu_ops_spinwait.c
index 346847f6c41c..d98d19226b5f 100644
--- a/arch/riscv/kernel/cpu_ops_spinwait.c
+++ b/arch/riscv/kernel/cpu_ops_spinwait.c
@@ -11,6 +11,8 @@
#include <asm/sbi.h>
#include <asm/smp.h>

+#include "head.h"
+
const struct cpu_operations cpu_ops_spinwait;
void *__cpu_spinwait_stack_pointer[NR_CPUS] __section(".data");
void *__cpu_spinwait_task_pointer[NR_CPUS] __section(".data");
@@ -18,7 +20,7 @@ void *__cpu_spinwait_task_pointer[NR_CPUS] __section(".data");
static void cpu_update_secondary_bootdata(unsigned int cpuid,
struct task_struct *tidle)
{
- int hartid = cpuid_to_hartid_map(cpuid);
+ unsigned long hartid = cpuid_to_hartid_map(cpuid);

/*
* The hartid must be less than NR_CPUS to avoid out-of-bound access
@@ -27,7 +29,7 @@ static void cpu_update_secondary_bootdata(unsigned int cpuid,
* spinwait booting is not the recommended approach for any platforms
* booting Linux in S-mode and can be disabled in the future.
*/
- if (hartid == INVALID_HARTID || hartid >= NR_CPUS)
+ if (hartid == INVALID_HARTID || hartid >= (unsigned long) NR_CPUS)
return;

/* Make sure tidle is updated */
diff --git a/arch/riscv/kernel/crash_save_regs.S b/arch/riscv/kernel/crash_save_regs.S
index 7832fb763aba..b2a1908c0463 100644
--- a/arch/riscv/kernel/crash_save_regs.S
+++ b/arch/riscv/kernel/crash_save_regs.S
@@ -44,7 +44,7 @@ SYM_CODE_START(riscv_crash_save_regs)
REG_S t6, PT_T6(a0) /* x31 */

csrr t1, CSR_STATUS
- csrr t2, CSR_EPC
+ auipc t2, 0x0
csrr t3, CSR_TVAL
csrr t4, CSR_CAUSE

diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index df8e24559035..ee79e6839b86 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -138,19 +138,37 @@ void machine_shutdown(void)
#endif
}

+/* Override the weak function in kernel/panic.c */
+void crash_smp_send_stop(void)
+{
+ static int cpus_stopped;
+
+ /*
+ * This function can be called twice in panic path, but obviously
+ * we execute this only once.
+ */
+ if (cpus_stopped)
+ return;
+
+ smp_send_stop();
+ cpus_stopped = 1;
+}
+
/*
* machine_crash_shutdown - Prepare to kexec after a kernel crash
*
* This function is called by crash_kexec just before machine_kexec
- * below and its goal is similar to machine_shutdown, but in case of
- * a kernel crash. Since we don't handle such cases yet, this function
- * is empty.
+ * and its goal is to shutdown non-crashing cpus and save registers.
*/
void
machine_crash_shutdown(struct pt_regs *regs)
{
+ local_irq_disable();
+
+ /* shutdown non-crashing cpus */
+ crash_smp_send_stop();
+
crash_save_cpu(regs, smp_processor_id());
- machine_shutdown();
pr_info("Starting crashdump kernel...\n");
}

@@ -171,7 +189,7 @@ machine_kexec(struct kimage *image)
struct kimage_arch *internal = &image->arch;
unsigned long jump_addr = (unsigned long) image->start;
unsigned long first_ind_entry = (unsigned long) &image->head;
- unsigned long this_cpu_id = smp_processor_id();
+ unsigned long this_cpu_id = __smp_processor_id();
unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
unsigned long fdt_addr = internal->fdt_addr;
void *control_code_buffer = page_address(image->control_code_page);
diff --git a/arch/riscv/kernel/probes/uprobes.c b/arch/riscv/kernel/probes/uprobes.c
index 7a057b5f0adc..c976a21cd4bd 100644
--- a/arch/riscv/kernel/probes/uprobes.c
+++ b/arch/riscv/kernel/probes/uprobes.c
@@ -59,8 +59,6 @@ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)

instruction_pointer_set(regs, utask->xol_vaddr);

- regs->status &= ~SR_SPIE;
-
return 0;
}

@@ -72,8 +70,6 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)

instruction_pointer_set(regs, utask->vaddr + auprobe->insn_size);

- regs->status |= SR_SPIE;
-
return 0;
}

@@ -111,8 +107,6 @@ void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
* address.
*/
instruction_pointer_set(regs, utask->vaddr);
-
- regs->status &= ~SR_SPIE;
}

bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx,
diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index 8c475f4da308..ec486e5369d9 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -175,7 +175,7 @@ ENTRY(__asm_copy_from_user)
/* Exception fixup code */
10:
/* Disable access to user memory */
- csrs CSR_STATUS, t6
+ csrc CSR_STATUS, t6
mv a0, t5
ret
ENDPROC(__asm_copy_to_user)
@@ -227,7 +227,7 @@ ENTRY(__clear_user)
/* Exception fixup code */
11:
/* Disable access to user memory */
- csrs CSR_STATUS, t6
+ csrc CSR_STATUS, t6
mv a0, a1
ret
ENDPROC(__clear_user)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 39e2e1d0e94f..80b4a407b3f1 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -99,6 +99,10 @@ static void __init print_vm_layout(void)
(unsigned long)VMEMMAP_END);
print_mlm("vmalloc", (unsigned long)VMALLOC_START,
(unsigned long)VMALLOC_END);
+#ifdef CONFIG_64BIT
+ print_mlm("modules", (unsigned long)MODULES_VADDR,
+ (unsigned long)MODULES_END);
+#endif
print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
(unsigned long)high_memory);
if (IS_ENABLED(CONFIG_64BIT)) {
diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 40264f60b0da..f4073106e1f3 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -148,4 +148,6 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
unsigned long gaddr, unsigned long vmaddr);
int gmap_mark_unmergeable(void);
void s390_reset_acc(struct mm_struct *mm);
+void s390_unlist_old_asce(struct gmap *gmap);
+int s390_replace_asce(struct gmap *gmap);
#endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 63098df81c9f..d13bd221cd37 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -92,5 +92,8 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
const Elf_Shdr *relsec,
const Elf_Shdr *symtab);
#define arch_kexec_apply_relocations_add arch_kexec_apply_relocations_add
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image);
+#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
#endif
#endif /*_S390_KEXEC_H */
diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
index 0bf06f1682d8..02462e7100c1 100644
--- a/arch/s390/include/asm/unwind.h
+++ b/arch/s390/include/asm/unwind.h
@@ -47,7 +47,7 @@ struct unwind_state {
static inline unsigned long unwind_recover_ret_addr(struct unwind_state *state,
unsigned long ip)
{
- ip = ftrace_graph_ret_addr(state->task, &state->graph_idx, ip, NULL);
+ ip = ftrace_graph_ret_addr(state->task, &state->graph_idx, ip, (void *)state->sp);
if (is_kretprobe_trampoline(ip))
ip = kretprobe_find_ret_addr(state->task, (void *)state->sp, &state->kr_cur);
return ip;
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 28124d0fa1d5..f8ebdd70dd31 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -199,7 +199,7 @@ static int copy_oldmem_user(void __user *dst, unsigned long src, size_t count)
} else {
len = count;
}
- rc = copy_to_user_real(dst, src, count);
+ rc = copy_to_user_real(dst, src, len);
if (rc)
return rc;
}
diff --git a/arch/s390/kernel/machine_kexec_file.c b/arch/s390/kernel/machine_kexec_file.c
index 8f43575a4dd3..fc6d5f58debe 100644
--- a/arch/s390/kernel/machine_kexec_file.c
+++ b/arch/s390/kernel/machine_kexec_file.c
@@ -31,6 +31,7 @@ int s390_verify_sig(const char *kernel, unsigned long kernel_len)
const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
struct module_signature *ms;
unsigned long sig_len;
+ int ret;

/* Skip signature verification when not secure IPLed. */
if (!ipl_secure_flag)
@@ -65,11 +66,18 @@ int s390_verify_sig(const char *kernel, unsigned long kernel_len)
return -EBADMSG;
}

- return verify_pkcs7_signature(kernel, kernel_len,
- kernel + kernel_len, sig_len,
- VERIFY_USE_PLATFORM_KEYRING,
- VERIFYING_MODULE_SIGNATURE,
- NULL, NULL);
+ ret = verify_pkcs7_signature(kernel, kernel_len,
+ kernel + kernel_len, sig_len,
+ VERIFY_USE_SECONDARY_KEYRING,
+ VERIFYING_MODULE_SIGNATURE,
+ NULL, NULL);
+ if (ret == -ENOKEY && IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING))
+ ret = verify_pkcs7_signature(kernel, kernel_len,
+ kernel + kernel_len, sig_len,
+ VERIFY_USE_PLATFORM_KEYRING,
+ VERIFYING_MODULE_SIGNATURE,
+ NULL, NULL);
+ return ret;
}
#endif /* CONFIG_KEXEC_SIG */

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 8bd42a20d924..88112065d941 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -528,12 +528,27 @@ static int handle_pv_uvc(struct kvm_vcpu *vcpu)

static int handle_pv_notification(struct kvm_vcpu *vcpu)
{
+ int ret;
+
if (vcpu->arch.sie_block->ipa == 0xb210)
return handle_pv_spx(vcpu);
if (vcpu->arch.sie_block->ipa == 0xb220)
return handle_pv_sclp(vcpu);
if (vcpu->arch.sie_block->ipa == 0xb9a4)
return handle_pv_uvc(vcpu);
+ if (vcpu->arch.sie_block->ipa >> 8 == 0xae) {
+ /*
+ * Besides external call, other SIGP orders also cause a
+ * 108 (pv notify) intercept. In contrast to external call,
+ * these orders need to be emulated and hence the appropriate
+ * place to handle them is in handle_instruction().
+ * So first try kvm_s390_handle_sigp_pei() and if that isn't
+ * successful, go on with handle_instruction().
+ */
+ ret = kvm_s390_handle_sigp_pei(vcpu);
+ if (!ret)
+ return ret;
+ }

return handle_instruction(vcpu);
}
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index cc7c9599f43e..8eee3fc414e5 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -161,10 +161,13 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
atomic_set(&kvm->mm->context.is_protected, 0);
KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
- /* Inteded memory leak on "impossible" error */
- if (!cc)
+ /* Intended memory leak on "impossible" error */
+ if (!cc) {
kvm_s390_pv_dealloc_vm(kvm);
- return cc ? -EIO : 0;
+ return 0;
+ }
+ s390_replace_asce(kvm->arch.gmap);
+ return -EIO;
}

int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index 8aaee2892ec3..cb747bf6c798 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -480,9 +480,9 @@ int kvm_s390_handle_sigp_pei(struct kvm_vcpu *vcpu)
struct kvm_vcpu *dest_vcpu;
u8 order_code = kvm_s390_get_base_disp_rs(vcpu, NULL);

- trace_kvm_s390_handle_sigp_pei(vcpu, order_code, cpu_addr);
-
if (order_code == SIGP_EXTERNAL_CALL) {
+ trace_kvm_s390_handle_sigp_pei(vcpu, order_code, cpu_addr);
+
dest_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, cpu_addr);
BUG_ON(dest_vcpu == NULL);

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index b8ae4a4aa2ba..85cab61d87a9 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2735,3 +2735,89 @@ void s390_reset_acc(struct mm_struct *mm)
mmput(mm);
}
EXPORT_SYMBOL_GPL(s390_reset_acc);
+
+/**
+ * s390_unlist_old_asce - Remove the topmost level of page tables from the
+ * list of page tables of the gmap.
+ * @gmap: the gmap whose table is to be removed
+ *
+ * On s390x, KVM keeps a list of all pages containing the page tables of the
+ * gmap (the CRST list). This list is used at tear down time to free all
+ * pages that are now not needed anymore.
+ *
+ * This function removes the topmost page of the tree (the one pointed to by
+ * the ASCE) from the CRST list.
+ *
+ * This means that it will not be freed when the VM is torn down, and needs
+ * to be handled separately by the caller, unless a leak is actually
+ * intended. Notice that this function will only remove the page from the
+ * list, the page will still be used as a top level page table (and ASCE).
+ */
+void s390_unlist_old_asce(struct gmap *gmap)
+{
+ struct page *old;
+
+ old = virt_to_page(gmap->table);
+ spin_lock(&gmap->guest_table_lock);
+ list_del(&old->lru);
+ /*
+ * Sometimes the topmost page might need to be "removed" multiple
+ * times, for example if the VM is rebooted into secure mode several
+ * times concurrently, or if s390_replace_asce fails after calling
+ * s390_remove_old_asce and is attempted again later. In that case
+ * the old asce has been removed from the list, and therefore it
+ * will not be freed when the VM terminates, but the ASCE is still
+ * in use and still pointed to.
+ * A subsequent call to replace_asce will follow the pointer and try
+ * to remove the same page from the list again.
+ * Therefore it's necessary that the page of the ASCE has valid
+ * pointers, so list_del can work (and do nothing) without
+ * dereferencing stale or invalid pointers.
+ */
+ INIT_LIST_HEAD(&old->lru);
+ spin_unlock(&gmap->guest_table_lock);
+}
+EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
+
+/**
+ * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
+ * @gmap: the gmap whose ASCE needs to be replaced
+ *
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is always removed from the gmap CRST list.
+ * Therefore the caller has to make sure to save a pointer to it
+ * beforehand, unless a leak is actually intended.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+ unsigned long asce;
+ struct page *page;
+ void *table;
+
+ s390_unlist_old_asce(gmap);
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+ if (!page)
+ return -ENOMEM;
+ table = page_to_virt(page);
+ memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
+
+ /*
+ * The caller has to deal with the old ASCE, but here we make sure
+ * the new one is properly added to the CRST list, so that
+ * it will be freed when the VM is torn down.
+ */
+ spin_lock(&gmap->guest_table_lock);
+ list_add(&page->lru, &gmap->crst_list);
+ spin_unlock(&gmap->guest_table_lock);
+
+ /* Set new table origin while preserving existing ASCE control bits */
+ asce = (gmap->asce & ~_ASCE_ORIGIN) | __pa(table);
+ WRITE_ONCE(gmap->asce, asce);
+ WRITE_ONCE(gmap->mm->context.gmap_asce, asce);
+ WRITE_ONCE(gmap->table, table);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(s390_replace_asce);
diff --git a/arch/um/drivers/random.c b/arch/um/drivers/random.c
index 433a3f8f2ef3..32b3341fe970 100644
--- a/arch/um/drivers/random.c
+++ b/arch/um/drivers/random.c
@@ -28,7 +28,7 @@
* protects against a module being loaded twice at the same time.
*/
static int random_fd = -1;
-static struct hwrng hwrng = { 0, };
+static struct hwrng hwrng;
static DECLARE_COMPLETION(have_data);

static int rng_dev_read(struct hwrng *rng, void *buf, size_t max, bool block)
diff --git a/arch/um/include/asm/archrandom.h b/arch/um/include/asm/archrandom.h
new file mode 100644
index 000000000000..2f24cb96391d
--- /dev/null
+++ b/arch/um/include/asm/archrandom.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_UM_ARCHRANDOM_H__
+#define __ASM_UM_ARCHRANDOM_H__
+
+#include <linux/types.h>
+
+/* This is from <os.h>, but better not to #include that in a global header here. */
+ssize_t os_getrandom(void *buf, size_t len, unsigned int flags);
+
+static inline bool __must_check arch_get_random_long(unsigned long *v)
+{
+ return os_getrandom(v, sizeof(*v), 0) == sizeof(*v);
+}
+
+static inline bool __must_check arch_get_random_int(unsigned int *v)
+{
+ return os_getrandom(v, sizeof(*v), 0) == sizeof(*v);
+}
+
+static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
+{
+ return false;
+}
+
+static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
+{
+ return false;
+}
+
+#endif
diff --git a/arch/um/include/asm/xor.h b/arch/um/include/asm/xor.h
index 22b39de73c24..647fae200c5d 100644
--- a/arch/um/include/asm/xor.h
+++ b/arch/um/include/asm/xor.h
@@ -18,7 +18,7 @@
#undef XOR_SELECT_TEMPLATE
/* pick an arbitrary one - measuring isn't possible with inf-cpu */
#define XOR_SELECT_TEMPLATE(x) \
- (time_travel_mode == TT_MODE_INFCPU ? TT_CPU_INF_XOR_DEFAULT : x))
+ (time_travel_mode == TT_MODE_INFCPU ? TT_CPU_INF_XOR_DEFAULT : x)
#endif

#endif
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index fafde1d5416e..0df646c6651e 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -11,6 +11,12 @@
#include <irq_user.h>
#include <longjmp.h>
#include <mm_id.h>
+/* This is to get size_t */
+#ifndef __UM_HOST__
+#include <linux/types.h>
+#else
+#include <sys/types.h>
+#endif

#define CATCH_EINTR(expr) while ((errno = 0, ((expr) < 0)) && (errno == EINTR))

@@ -243,6 +249,7 @@ extern void stack_protections(unsigned long address);
extern int raw(int fd);
extern void setup_machinename(char *machine_out);
extern void setup_hostinfo(char *buf, int len);
+extern ssize_t os_getrandom(void *buf, size_t len, unsigned int flags);
extern void os_dump_core(void) __attribute__ ((noreturn));
extern void um_early_printk(const char *s, unsigned int n);
extern void os_fix_helper_signals(void);
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 9838967d0b2f..e0de60e503b9 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -16,6 +16,7 @@
#include <linux/sched/task.h>
#include <linux/kmsg_dump.h>
#include <linux/suspend.h>
+#include <linux/random.h>

#include <asm/processor.h>
#include <asm/cpufeature.h>
@@ -406,6 +407,8 @@ int __init __weak read_initrd(void)

void __init setup_arch(char **cmdline_p)
{
+ u8 rng_seed[32];
+
stack_protections((unsigned long) &init_thread_info);
setup_physmem(uml_physmem, uml_reserved, physmem_size, highmem);
mem_total_pages(physmem_size, iomem_size, highmem);
@@ -416,6 +419,11 @@ void __init setup_arch(char **cmdline_p)
strlcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
setup_hostinfo(host_info, sizeof host_info);
+
+ if (os_getrandom(rng_seed, sizeof(rng_seed), 0) == sizeof(rng_seed)) {
+ add_bootloader_randomness(rng_seed, sizeof(rng_seed));
+ memzero_explicit(rng_seed, sizeof(rng_seed));
+ }
}

void __init check_bugs(void)
diff --git a/arch/um/os-Linux/util.c b/arch/um/os-Linux/util.c
index 41297ec404bf..fc0f2a9dee5a 100644
--- a/arch/um/os-Linux/util.c
+++ b/arch/um/os-Linux/util.c
@@ -14,6 +14,7 @@
#include <sys/wait.h>
#include <sys/mman.h>
#include <sys/utsname.h>
+#include <sys/random.h>
#include <init.h>
#include <os.h>

@@ -96,6 +97,11 @@ static inline void __attribute__ ((noreturn)) uml_abort(void)
exit(127);
}

+ssize_t os_getrandom(void *buf, size_t len, unsigned int flags)
+{
+ return getrandom(buf, len, flags);
+}
+
/*
* UML helper threads must not handle SIGWINCH/INT/TERM
*/
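
[Editor's note] The os_getrandom() wrapper added above is a thin shim over the host's getrandom(2). A minimal, self-contained host-side illustration (ordinary userspace C, assuming glibc 2.25 or later for <sys/random.h>; not kernel code):

	#include <sys/random.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned char seed[32];

		/* same call the UML wrapper forwards to; 0 = default flags */
		if (getrandom(seed, sizeof(seed), 0) != (ssize_t)sizeof(seed)) {
			perror("getrandom");
			return 1;
		}
		printf("read %zu bytes of host entropy\n", sizeof(seed));
		return 0;
	}
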
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ce1f5a876cfe..04a9865dbdbf 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -272,6 +272,7 @@ config X86
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select TRACE_IRQFLAGS_SUPPORT
+ select TRACE_IRQFLAGS_NMI_SUPPORT
select USER_STACKTRACE_SUPPORT
select VIRT_TO_BUS
select HAVE_ARCH_KCSAN if X86_64
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d3a6f74a94bd..d4d6db4dde22 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -1,8 +1,5 @@
# SPDX-License-Identifier: GPL-2.0

-config TRACE_IRQFLAGS_NMI_SUPPORT
- def_bool y
-
config EARLY_PRINTK_USB
bool

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index b5aecb524a8a..ffec8bb01ba8 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -103,7 +103,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
AFLAGS_header.o += -I$(objtree)/$(obj)
$(obj)/header.o: $(obj)/zoffset.h

-LDFLAGS_setup.elf := -m elf_i386 -T
+LDFLAGS_setup.elf := -m elf_i386 -z noexecstack -T
$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
$(call if_changed,ld)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 6115274fe10f..c66ea96936de 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -69,6 +69,10 @@ LDFLAGS_vmlinux := -pie $(call ld-option, --no-dynamic-linker)
ifdef CONFIG_LD_ORPHAN_WARN
LDFLAGS_vmlinux += --orphan-handling=warn
endif
+LDFLAGS_vmlinux += -z noexecstack
+ifeq ($(CONFIG_LD_IS_BFD),y)
+LDFLAGS_vmlinux += $(call ld-option,--no-warn-rwx-segments)
+endif
LDFLAGS_vmlinux += -T

hostprogs := mkpiggy
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 2831685adf6f..8ed4597fdf6a 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -61,9 +61,7 @@ sha256-ssse3-$(CONFIG_AS_SHA256_NI) += sha256_ni_asm.o
obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
sha512-ssse3-y := sha512-ssse3-asm.o sha512-avx-asm.o sha512-avx2-asm.o sha512_ssse3_glue.o

-obj-$(CONFIG_CRYPTO_BLAKE2S_X86) += blake2s-x86_64.o
-blake2s-x86_64-y := blake2s-shash.o
-obj-$(if $(CONFIG_CRYPTO_BLAKE2S_X86),y) += libblake2s-x86_64.o
+obj-$(CONFIG_CRYPTO_BLAKE2S_X86) += libblake2s-x86_64.o
libblake2s-x86_64-y := blake2s-core.o blake2s-glue.o

obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index 69853c13e8fb..aaba21230528 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -4,7 +4,6 @@
*/

#include <crypto/internal/blake2s.h>
-#include <crypto/internal/simd.h>

#include <linux/types.h>
#include <linux/jump_label.h>
@@ -33,7 +32,7 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,
/* SIMD disables preemption, so relax after processing each page. */
BUILD_BUG_ON(SZ_4K / BLAKE2S_BLOCK_SIZE < 8);

- if (!static_branch_likely(&blake2s_use_ssse3) || !crypto_simd_usable()) {
+ if (!static_branch_likely(&blake2s_use_ssse3) || !may_use_simd()) {
blake2s_compress_generic(state, block, nblocks, inc);
return;
}
diff --git a/arch/x86/crypto/blake2s-shash.c b/arch/x86/crypto/blake2s-shash.c
deleted file mode 100644
index 59ae28abe35c..000000000000
--- a/arch/x86/crypto/blake2s-shash.c
+++ /dev/null
@@ -1,77 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0 OR MIT
-/*
- * Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@xxxxxxxxx>. All Rights Reserved.
- */
-
-#include <crypto/internal/blake2s.h>
-#include <crypto/internal/simd.h>
-#include <crypto/internal/hash.h>
-
-#include <linux/types.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/sizes.h>
-
-#include <asm/cpufeature.h>
-#include <asm/processor.h>
-
-static int crypto_blake2s_update_x86(struct shash_desc *desc,
- const u8 *in, unsigned int inlen)
-{
- return crypto_blake2s_update(desc, in, inlen, false);
-}
-
-static int crypto_blake2s_final_x86(struct shash_desc *desc, u8 *out)
-{
- return crypto_blake2s_final(desc, out, false);
-}
-
-#define BLAKE2S_ALG(name, driver_name, digest_size) \
- { \
- .base.cra_name = name, \
- .base.cra_driver_name = driver_name, \
- .base.cra_priority = 200, \
- .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, \
- .base.cra_blocksize = BLAKE2S_BLOCK_SIZE, \
- .base.cra_ctxsize = sizeof(struct blake2s_tfm_ctx), \
- .base.cra_module = THIS_MODULE, \
- .digestsize = digest_size, \
- .setkey = crypto_blake2s_setkey, \
- .init = crypto_blake2s_init, \
- .update = crypto_blake2s_update_x86, \
- .final = crypto_blake2s_final_x86, \
- .descsize = sizeof(struct blake2s_state), \
- }
-
-static struct shash_alg blake2s_algs[] = {
- BLAKE2S_ALG("blake2s-128", "blake2s-128-x86", BLAKE2S_128_HASH_SIZE),
- BLAKE2S_ALG("blake2s-160", "blake2s-160-x86", BLAKE2S_160_HASH_SIZE),
- BLAKE2S_ALG("blake2s-224", "blake2s-224-x86", BLAKE2S_224_HASH_SIZE),
- BLAKE2S_ALG("blake2s-256", "blake2s-256-x86", BLAKE2S_256_HASH_SIZE),
-};
-
-static int __init blake2s_mod_init(void)
-{
- if (IS_REACHABLE(CONFIG_CRYPTO_HASH) && boot_cpu_has(X86_FEATURE_SSSE3))
- return crypto_register_shashes(blake2s_algs, ARRAY_SIZE(blake2s_algs));
- return 0;
-}
-
-static void __exit blake2s_mod_exit(void)
-{
- if (IS_REACHABLE(CONFIG_CRYPTO_HASH) && boot_cpu_has(X86_FEATURE_SSSE3))
- crypto_unregister_shashes(blake2s_algs, ARRAY_SIZE(blake2s_algs));
-}
-
-module_init(blake2s_mod_init);
-module_exit(blake2s_mod_exit);
-
-MODULE_ALIAS_CRYPTO("blake2s-128");
-MODULE_ALIAS_CRYPTO("blake2s-128-x86");
-MODULE_ALIAS_CRYPTO("blake2s-160");
-MODULE_ALIAS_CRYPTO("blake2s-160-x86");
-MODULE_ALIAS_CRYPTO("blake2s-224");
-MODULE_ALIAS_CRYPTO("blake2s-224-x86");
-MODULE_ALIAS_CRYPTO("blake2s-256");
-MODULE_ALIAS_CRYPTO("blake2s-256-x86");
-MODULE_LICENSE("GPL v2");
diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index eeadbd7d92cc..ca2fe186994b 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -11,12 +11,13 @@ CFLAGS_REMOVE_common.o = $(CC_FLAGS_FTRACE)

CFLAGS_common.o += -fno-stack-protector

-obj-y := entry.o entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
+obj-y := entry.o entry_$(BITS).o syscall_$(BITS).o
obj-y += common.o

obj-y += vdso/
obj-y += vsyscall/

+obj-$(CONFIG_PREEMPTION) += thunk_$(BITS).o
obj-$(CONFIG_IA32_EMULATION) += entry_64_compat.o syscall_32.o
obj-$(CONFIG_X86_X32_ABI) += syscall_x32.o

diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
index 7591bab060f7..ff6e7003da97 100644
--- a/arch/x86/entry/thunk_32.S
+++ b/arch/x86/entry/thunk_32.S
@@ -29,10 +29,8 @@ SYM_CODE_START_NOALIGN(\name)
SYM_CODE_END(\name)
.endm

-#ifdef CONFIG_PREEMPTION
THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
EXPORT_SYMBOL(preempt_schedule_thunk)
EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
-#endif

diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index 505b488fcc65..f38b07d2768b 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -31,14 +31,11 @@ SYM_FUNC_END(\name)
_ASM_NOKPROBE(\name)
.endm

-#ifdef CONFIG_PREEMPTION
THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
EXPORT_SYMBOL(preempt_schedule_thunk)
EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
-#endif

-#ifdef CONFIG_PREEMPTION
SYM_CODE_START_LOCAL_NOALIGN(__thunk_restore)
popq %r11
popq %r10
@@ -53,4 +50,3 @@ SYM_CODE_START_LOCAL_NOALIGN(__thunk_restore)
RET
_ASM_NOKPROBE(__thunk_restore)
SYM_CODE_END(__thunk_restore)
-#endif
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index e893af5aa8f5..d80a133093a2 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,7 +179,7 @@ quiet_cmd_vdso = VDSO $@
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'

VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
- $(call ld-option, --eh-frame-hdr) -Bsymbolic
+ $(call ld-option, --eh-frame-hdr) -Bsymbolic -z noexecstack
GCOV_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 1f156098a5bf..4f70fb6c2c1e 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -769,6 +769,7 @@ void intel_pmu_lbr_disable_all(void)
void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
{
unsigned long mask = x86_pmu.lbr_nr - 1;
+ struct perf_branch_entry *br = cpuc->lbr_entries;
u64 tos = intel_pmu_lbr_tos();
int i;

@@ -784,15 +785,11 @@ void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)

rdmsrl(x86_pmu.lbr_from + lbr_idx, msr_lastbranch.lbr);

- cpuc->lbr_entries[i].from = msr_lastbranch.from;
- cpuc->lbr_entries[i].to = msr_lastbranch.to;
- cpuc->lbr_entries[i].mispred = 0;
- cpuc->lbr_entries[i].predicted = 0;
- cpuc->lbr_entries[i].in_tx = 0;
- cpuc->lbr_entries[i].abort = 0;
- cpuc->lbr_entries[i].cycles = 0;
- cpuc->lbr_entries[i].type = 0;
- cpuc->lbr_entries[i].reserved = 0;
+ perf_clear_branch_entry_bitfields(br);
+
+ br->from = msr_lastbranch.from;
+ br->to = msr_lastbranch.to;
+ br++;
}
cpuc->lbr_stack.nr = i;
cpuc->lbr_stack.hw_idx = tos;
@@ -807,6 +804,7 @@ void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
{
bool need_info = false, call_stack = false;
unsigned long mask = x86_pmu.lbr_nr - 1;
+ struct perf_branch_entry *br = cpuc->lbr_entries;
u64 tos = intel_pmu_lbr_tos();
int i;
int out = 0;
@@ -878,15 +876,14 @@ void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
if (abort && x86_pmu.lbr_double_abort && out > 0)
out--;

- cpuc->lbr_entries[out].from = from;
- cpuc->lbr_entries[out].to = to;
- cpuc->lbr_entries[out].mispred = mis;
- cpuc->lbr_entries[out].predicted = pred;
- cpuc->lbr_entries[out].in_tx = in_tx;
- cpuc->lbr_entries[out].abort = abort;
- cpuc->lbr_entries[out].cycles = cycles;
- cpuc->lbr_entries[out].type = 0;
- cpuc->lbr_entries[out].reserved = 0;
+ perf_clear_branch_entry_bitfields(br+out);
+ br[out].from = from;
+ br[out].to = to;
+ br[out].mispred = mis;
+ br[out].predicted = pred;
+ br[out].in_tx = in_tx;
+ br[out].abort = abort;
+ br[out].cycles = cycles;
out++;
}
cpuc->lbr_stack.nr = out;
@@ -951,6 +948,8 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
to = rdlbr_to(i, lbr);
info = rdlbr_info(i, lbr);

+ perf_clear_branch_entry_bitfields(e);
+
e->from = from;
e->to = to;
e->mispred = get_lbr_mispred(info);
@@ -959,7 +958,6 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
e->abort = !!(info & LBR_INFO_ABORT);
e->cycles = get_lbr_cycles(info);
e->type = get_lbr_br_type(info);
- e->reserved = 0;
}

cpuc->lbr_stack.nr = i;
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 6ad8d946cd3e..5ec359c1b50c 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -193,6 +193,12 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
const Elf_Shdr *relsec,
const Elf_Shdr *symtab);
#define arch_kexec_apply_relocations_add arch_kexec_apply_relocations_add
+
+void *arch_kexec_kernel_image_load(struct kimage *image);
+#define arch_kexec_kernel_image_load arch_kexec_kernel_image_load
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image);
+#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
#endif
#endif

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9fdaa847d4b6..35c7a1fce8ea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -508,6 +508,7 @@ struct kvm_pmu {
unsigned nr_arch_fixed_counters;
unsigned available_event_types;
u64 fixed_ctr_ctrl;
+ u64 fixed_ctr_ctrl_mask;
u64 global_ctrl;
u64 global_status;
u64 counter_bitmask[2];
@@ -1588,7 +1589,7 @@ static inline int kvm_arch_flush_remote_tlb(struct kvm *kvm)
#define kvm_arch_pmi_in_guest(vcpu) \
((vcpu) && (vcpu)->arch.handling_intr_from_guest)

-void kvm_mmu_x86_module_init(void);
+void __init kvm_mmu_x86_module_init(void);
int kvm_mmu_vendor_module_init(void);
void kvm_mmu_vendor_module_exit(void);

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index fa625b2a8a93..a80329bd0004 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -152,7 +152,7 @@ void __init check_bugs(void)
/*
* spectre_v2_user_select_mitigation() relies on the state set by
* retbleed_select_mitigation(); specifically the STIBP selection is
- * forced for UNRET.
+ * forced for UNRET or IBPB.
*/
spectre_v2_user_select_mitigation();
ssb_select_mitigation();
@@ -1172,7 +1172,8 @@ spectre_v2_user_select_mitigation(void)
boot_cpu_has(X86_FEATURE_AMD_STIBP_ALWAYS_ON))
mode = SPECTRE_V2_USER_STRICT_PREFERRED;

- if (retbleed_mitigation == RETBLEED_MITIGATION_UNRET) {
+ if (retbleed_mitigation == RETBLEED_MITIGATION_UNRET ||
+ retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
if (mode != SPECTRE_V2_USER_STRICT &&
mode != SPECTRE_V2_USER_STRICT_PREFERRED)
pr_info("Selecting STIBP always-on mode to complement retbleed mitigation\n");
@@ -2353,10 +2354,11 @@ static ssize_t srbds_show_state(char *buf)

static ssize_t retbleed_show_state(char *buf)
{
- if (retbleed_mitigation == RETBLEED_MITIGATION_UNRET) {
+ if (retbleed_mitigation == RETBLEED_MITIGATION_UNRET ||
+ retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
- return sprintf(buf, "Vulnerable: untrained return thunk on non-Zen uarch\n");
+ return sprintf(buf, "Vulnerable: untrained return thunk / IBPB on non-AMD based uarch\n");

return sprintf(buf, "%s; SMT %s\n",
retbleed_strings[retbleed_mitigation],
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2c87d62f191e..ae7d4c85f4f4 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1145,22 +1145,23 @@ static void bus_lock_init(void)
{
u64 val;

- /*
- * Warn and fatal are handled by #AC for split lock if #AC for
- * split lock is supported.
- */
- if (!boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) ||
- (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
- (sld_state == sld_warn || sld_state == sld_fatal)) ||
- sld_state == sld_off)
+ if (!boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
return;

- /*
- * Enable #DB for bus lock. All bus locks are handled in #DB except
- * split locks are handled in #AC in the fatal case.
- */
rdmsrl(MSR_IA32_DEBUGCTLMSR, val);
- val |= DEBUGCTLMSR_BUS_LOCK_DETECT;
+
+ if ((boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+ (sld_state == sld_warn || sld_state == sld_fatal)) ||
+ sld_state == sld_off) {
+ /*
+ * Warn and fatal are handled by #AC for split lock if #AC for
+ * split lock is supported.
+ */
+ val &= ~DEBUGCTLMSR_BUS_LOCK_DETECT;
+ } else {
+ val |= DEBUGCTLMSR_BUS_LOCK_DETECT;
+ }
+
wrmsrl(MSR_IA32_DEBUGCTLMSR, val);
}

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 6892ca67d9c6..b6d7ece7bf51 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -93,6 +93,7 @@ static int ftrace_verify_code(unsigned long ip, const char *old_code)

/* Make sure it is what we expect it to be */
if (memcmp(cur_code, old_code, MCOUNT_INSN_SIZE) != 0) {
+ ftrace_expected = old_code;
WARN_ON(1);
return -EINVAL;
}
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 7c4ab8870da4..74167dc5f55e 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -814,16 +814,20 @@ set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
static void kprobe_post_process(struct kprobe *cur, struct pt_regs *regs,
struct kprobe_ctlblk *kcb)
{
- if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
-
/* Restore back the original saved kprobes variables and continue. */
- if (kcb->kprobe_status == KPROBE_REENTER)
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ /* This will restore both kcb and current_kprobe */
restore_previous_kprobe(kcb);
- else
+ } else {
+ /*
+ * Always update the kcb status because
+ * reset_curent_kprobe() doesn't update kcb.
+ */
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ if (cur->post_handler)
+ cur->post_handler(cur, regs, 0);
reset_current_kprobe();
+ }
}
NOKPROBE_SYMBOL(kprobe_post_process);

diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 6b07faaa1579..23154d24b117 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -27,6 +27,11 @@ static __init int register_e820_pmem(void)
* simply here to trigger the module to load on demand.
*/
pdev = platform_device_alloc("e820_pmem", -1);
- return platform_device_add(pdev);
+
+ rc = platform_device_add(pdev);
+ if (rc)
+ platform_device_put(pdev);
+
+ return rc;
}
device_initcall(register_e820_pmem);
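
[Editor's note] The pmem fix above follows the usual platform-device error pattern: a device obtained from platform_device_alloc() still holds a reference that must be dropped with platform_device_put() if platform_device_add() fails, because only a successful add transfers ownership to the driver core. A sketch of that pattern with a hypothetical function and device name:

	#include <linux/platform_device.h>

	static int register_example_dev(void)
	{
		struct platform_device *pdev;
		int rc;

		pdev = platform_device_alloc("example-dev", -1);
		if (!pdev)
			return -ENOMEM;

		rc = platform_device_add(pdev);
		if (rc)
			platform_device_put(pdev);	/* drop the alloc reference on failure */

		return rc;
	}
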
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 622dc3673c37..8011536ba5c4 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -824,6 +824,10 @@ static void amd_e400_idle(void)
*/
static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
{
+ /* User has disallowed the use of MWAIT. Fallback to HALT */
+ if (boot_option_idle_override == IDLE_NOMWAIT)
+ return 0;
+
if (c->x86_vendor != X86_VENDOR_INTEL)
return 0;

@@ -932,9 +936,8 @@ static int __init idle_setup(char *str)
} else if (!strcmp(str, "nomwait")) {
/*
* If the boot option of "idle=nomwait" is added,
- * it means that mwait will be disabled for CPU C2/C3
- * states. In such case it won't touch the variable
- * of boot_option_idle_override.
+ * it means that mwait will be disabled for CPU C1/C2/C3
+ * states.
*/
boot_option_idle_override = IDLE_NOMWAIT;
} else
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f8382abe22ff..aa907cec0918 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1687,16 +1687,6 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
case VCPU_SREG_TR:
if (seg_desc.s || (seg_desc.type != 1 && seg_desc.type != 9))
goto exception;
- if (!seg_desc.p) {
- err_vec = NP_VECTOR;
- goto exception;
- }
- old_desc = seg_desc;
- seg_desc.type |= 2; /* busy */
- ret = ctxt->ops->cmpxchg_emulated(ctxt, desc_addr, &old_desc, &seg_desc,
- sizeof(seg_desc), &ctxt->exception);
- if (ret != X86EMUL_CONTINUE)
- return ret;
break;
case VCPU_SREG_LDTR:
if (seg_desc.s || seg_desc.type != 2)
@@ -1734,8 +1724,17 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
if (ret != X86EMUL_CONTINUE)
return ret;
if (emul_is_noncanonical_address(get_desc_base(&seg_desc) |
- ((u64)base3 << 32), ctxt))
- return emulate_gp(ctxt, 0);
+ ((u64)base3 << 32), ctxt))
+ return emulate_gp(ctxt, err_code);
+ }
+
+ if (seg == VCPU_SREG_TR) {
+ old_desc = seg_desc;
+ seg_desc.type |= 2; /* busy */
+ ret = ctxt->ops->cmpxchg_emulated(ctxt, desc_addr, &old_desc, &seg_desc,
+ sizeof(seg_desc), &ctxt->exception);
+ if (ret != X86EMUL_CONTINUE)
+ return ret;
}
load:
ctxt->ops->set_segment(ctxt, selector, &seg_desc, base3, seg);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7f9c3b0fcd0b..da7a25d41a97 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6264,7 +6264,7 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
* nx_huge_pages needs to be resolved to true/false when kvm.ko is loaded, as
* its default value of -1 is technically undefined behavior for a boolean.
*/
-void kvm_mmu_x86_module_init(void)
+void __init kvm_mmu_x86_module_init(void)
{
if (nx_huge_pages == -1)
__set_nx_huge_pages(get_nx_auto_mode());
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index beb3ce8d94eb..43f6a882615f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1044,7 +1044,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
continue;

- if (gfn != sp->gfns[i]) {
+ /*
+ * Drop the SPTE if the new protections would result in a RWX=0
+ * SPTE or if the gfn is changing. The RWX=0 case only affects
+ * EPT with execute-only support, i.e. EPT without an effective
+ * "present" bit, as all other paging modes will create a
+ * read-only SPTE if pte_access is zero.
+ */
+ if ((!pte_access && !shadow_present_mask) || gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i]);
flush = true;
continue;
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index e5c0b6db6f2c..8223a80802e7 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -128,6 +128,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 spte = SPTE_MMU_PRESENT_MASK;
bool wrprot = false;

+ WARN_ON_ONCE(!pte_access && !shadow_present_mask);
+
if (sp->role.ad_disabled)
spte |= SPTE_TDP_AD_DISABLED_MASK;
else if (kvm_mmu_page_ad_need_write_protect(sp))
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9a5963bff0f9..83e7325f7df4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -292,7 +292,8 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
return false;
}

- if (CC(!kvm_is_valid_cr4(vcpu, save->cr4)))
+ /* Note, SVM doesn't have any additional restrictions on CR4. */
+ if (CC(!__kvm_is_valid_cr4(vcpu, save->cr4)))
return false;

if (CC(!kvm_valid_efer(vcpu, save->efer)))
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c667214c630b..e5a154f0d2ab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -390,6 +390,10 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu)
*/
(void)svm_skip_emulated_instruction(vcpu);
rip = kvm_rip_read(vcpu);
+
+ if (boot_cpu_has(X86_FEATURE_NRIPS))
+ svm->vmcb->control.next_rip = rip;
+
svm->int3_rip = rip + svm->vmcb->save.cs.base;
svm->int3_injected = rip - old_rip;
}
@@ -3305,8 +3309,6 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

- BUG_ON(!(gif_set(svm)));
-
trace_kvm_inj_virq(vcpu->arch.interrupt.nr);
++vcpu->stat.irq_injections;

@@ -3615,6 +3617,18 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK;
type = exitintinfo & SVM_EXITINTINFO_TYPE_MASK;

+ /*
+ * If NextRIP isn't enabled, KVM must manually advance RIP prior to
+ * injecting the soft exception/interrupt. That advancement needs to
+ * be unwound if vectoring didn't complete. Note, the new event may
+ * not be the injected event, e.g. if KVM injected an INTn, the INTn
+ * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will
+ * be the reported vectored event, but RIP still needs to be unwound.
+ */
+ if (int3_injected && type == SVM_EXITINTINFO_TYPE_EXEPT &&
+ kvm_is_linear_rip(vcpu, svm->int3_rip))
+ kvm_rip_write(vcpu, kvm_rip_read(vcpu) - int3_injected);
+
switch (type) {
case SVM_EXITINTINFO_TYPE_NMI:
vcpu->arch.nmi_injected = true;
@@ -3628,16 +3642,11 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)

/*
* In case of software exceptions, do not reinject the vector,
- * but re-execute the instruction instead. Rewind RIP first
- * if we emulated INT3 before.
+ * but re-execute the instruction instead.
*/
- if (kvm_exception_is_soft(vector)) {
- if (vector == BP_VECTOR && int3_injected &&
- kvm_is_linear_rip(vcpu, svm->int3_rip))
- kvm_rip_write(vcpu,
- kvm_rip_read(vcpu) - int3_injected);
+ if (kvm_exception_is_soft(vector))
break;
- }
+
if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
u32 err = svm->vmcb->control.exit_int_info_err;
kvm_requeue_exception_e(vcpu, vector, err);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 28ccf25c4124..5c62e552082a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1224,7 +1224,7 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
/* reserved */
BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
- u64 vmx_basic = vmx->nested.msrs.basic;
+ u64 vmx_basic = vmcs_config.nested.basic;

if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
return -EINVAL;
@@ -1247,36 +1247,42 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
return 0;
}

-static int
-vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+static void vmx_get_control_msr(struct nested_vmx_msrs *msrs, u32 msr_index,
+ u32 **low, u32 **high)
{
- u64 supported;
- u32 *lowp, *highp;
-
switch (msr_index) {
case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
- lowp = &vmx->nested.msrs.pinbased_ctls_low;
- highp = &vmx->nested.msrs.pinbased_ctls_high;
+ *low = &msrs->pinbased_ctls_low;
+ *high = &msrs->pinbased_ctls_high;
break;
case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
- lowp = &vmx->nested.msrs.procbased_ctls_low;
- highp = &vmx->nested.msrs.procbased_ctls_high;
+ *low = &msrs->procbased_ctls_low;
+ *high = &msrs->procbased_ctls_high;
break;
case MSR_IA32_VMX_TRUE_EXIT_CTLS:
- lowp = &vmx->nested.msrs.exit_ctls_low;
- highp = &vmx->nested.msrs.exit_ctls_high;
+ *low = &msrs->exit_ctls_low;
+ *high = &msrs->exit_ctls_high;
break;
case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
- lowp = &vmx->nested.msrs.entry_ctls_low;
- highp = &vmx->nested.msrs.entry_ctls_high;
+ *low = &msrs->entry_ctls_low;
+ *high = &msrs->entry_ctls_high;
break;
case MSR_IA32_VMX_PROCBASED_CTLS2:
- lowp = &vmx->nested.msrs.secondary_ctls_low;
- highp = &vmx->nested.msrs.secondary_ctls_high;
+ *low = &msrs->secondary_ctls_low;
+ *high = &msrs->secondary_ctls_high;
break;
default:
BUG();
}
+}
+
+static int
+vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+{
+ u32 *lowp, *highp;
+ u64 supported;
+
+ vmx_get_control_msr(&vmcs_config.nested, msr_index, &lowp, &highp);

supported = vmx_control_msr(*lowp, *highp);

@@ -1288,6 +1294,7 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
return -EINVAL;

+ vmx_get_control_msr(&vmx->nested.msrs, msr_index, &lowp, &highp);
*lowp = data;
*highp = data >> 32;
return 0;
@@ -1301,10 +1308,8 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
BIT_ULL(28) | BIT_ULL(29) | BIT_ULL(30) |
/* reserved */
GENMASK_ULL(13, 9) | BIT_ULL(31);
- u64 vmx_misc;
-
- vmx_misc = vmx_control_msr(vmx->nested.msrs.misc_low,
- vmx->nested.msrs.misc_high);
+ u64 vmx_misc = vmx_control_msr(vmcs_config.nested.misc_low,
+ vmcs_config.nested.misc_high);

if (!is_bitwise_subset(vmx_misc, data, feature_and_reserved_bits))
return -EINVAL;
@@ -1332,10 +1337,8 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)

static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
{
- u64 vmx_ept_vpid_cap;
-
- vmx_ept_vpid_cap = vmx_control_msr(vmx->nested.msrs.ept_caps,
- vmx->nested.msrs.vpid_caps);
+ u64 vmx_ept_vpid_cap = vmx_control_msr(vmcs_config.nested.ept_caps,
+ vmcs_config.nested.vpid_caps);

/* Every bit is either reserved or a feature bit. */
if (!is_bitwise_subset(vmx_ept_vpid_cap, data, -1ULL))
@@ -1346,20 +1349,21 @@ static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
return 0;
}

-static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+static u64 *vmx_get_fixed0_msr(struct nested_vmx_msrs *msrs, u32 msr_index)
{
- u64 *msr;
-
switch (msr_index) {
case MSR_IA32_VMX_CR0_FIXED0:
- msr = &vmx->nested.msrs.cr0_fixed0;
- break;
+ return &msrs->cr0_fixed0;
case MSR_IA32_VMX_CR4_FIXED0:
- msr = &vmx->nested.msrs.cr4_fixed0;
- break;
+ return &msrs->cr4_fixed0;
default:
BUG();
}
+}
+
+static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+{
+ const u64 *msr = vmx_get_fixed0_msr(&vmcs_config.nested, msr_index);

/*
* 1 bits (which indicates bits which "must-be-1" during VMX operation)
@@ -1368,7 +1372,7 @@ static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
if (!is_bitwise_subset(data, *msr, -1ULL))
return -EINVAL;

- *msr = data;
+ *vmx_get_fixed0_msr(&vmx->nested.msrs, msr_index) = data;
return 0;
}

@@ -1429,7 +1433,7 @@ int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
vmx->nested.msrs.vmcs_enum = data;
return 0;
case MSR_IA32_VMX_VMFUNC:
- if (data & ~vmx->nested.msrs.vmfunc_controls)
+ if (data & ~vmcs_config.nested.vmfunc_controls)
return -EINVAL;
vmx->nested.msrs.vmfunc_controls = data;
return 0;
@@ -2279,7 +2283,6 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
SECONDARY_EXEC_APIC_REGISTER_VIRT |
SECONDARY_EXEC_ENABLE_VMFUNC |
- SECONDARY_EXEC_TSC_SCALING |
SECONDARY_EXEC_DESC);

if (nested_cpu_has(vmcs12,
@@ -2618,6 +2621,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;

if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
+ intel_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)) &&
WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
vmcs12->guest_ia32_perf_global_ctrl))) {
*entry_failure_code = ENTRY_FAIL_DEFAULT;
@@ -3378,10 +3382,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu))
evaluate_pending_interrupts |= vmx_has_apicv_interrupt(vcpu);

- if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
+ if (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
if (kvm_mpx_supported() &&
- !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
+ (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);

/*
@@ -4341,7 +4347,8 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
vcpu->arch.pat = vmcs12->host_ia32_pat;
}
- if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
+ if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) &&
+ intel_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)))
WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
vmcs12->host_ia32_perf_global_ctrl));

@@ -4967,20 +4974,25 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
| FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;

/*
- * The Intel VMX Instruction Reference lists a bunch of bits that are
- * prerequisite to running VMXON, most notably cr4.VMXE must be set to
- * 1 (see vmx_is_valid_cr4() for when we allow the guest to set this).
- * Otherwise, we should fail with #UD. But most faulting conditions
- * have already been checked by hardware, prior to the VM-exit for
- * VMXON. We do test guest cr4.VMXE because processor CR4 always has
- * that bit set to 1 in non-root mode.
+ * Note, KVM cannot rely on hardware to perform the CR0/CR4 #UD checks
+ * that have higher priority than VM-Exit (see Intel SDM's pseudocode
+ * for VMXON), as KVM must load valid CR0/CR4 values into hardware while
+ * running the guest, i.e. KVM needs to check the _guest_ values.
+ *
+ * Rely on hardware for the other two pre-VM-Exit checks, !VM86 and
+ * !COMPATIBILITY modes. KVM may run the guest in VM86 to emulate Real
+ * Mode, but KVM will never take the guest out of those modes.
*/
- if (!kvm_read_cr4_bits(vcpu, X86_CR4_VMXE)) {
+ if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) ||
+ !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) {
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}

- /* CPL=0 must be checked manually. */
+ /*
+ * CPL=0 and all other checks that are lower priority than VM-Exit must
+ * be checked manually.
+ */
if (vmx_get_cpl(vcpu)) {
kvm_inject_gp(vcpu, 0);
return 1;
@@ -6780,6 +6792,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
rdmsrl(MSR_IA32_VMX_CR0_FIXED1, msrs->cr0_fixed1);
rdmsrl(MSR_IA32_VMX_CR4_FIXED1, msrs->cr4_fixed1);

+ if (vmx_umip_emulated())
+ msrs->cr4_fixed1 |= X86_CR4_UMIP;
+
msrs->vmcs_enum = nested_vmx_calc_vmcs_enum_msr();
}
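
[Aside, not part of the patch: the vmx_get_control_msr()/vmx_get_fixed0_msr() rework earlier in this file validates a userspace-supplied VMX capability MSR value against KVM's supported configuration (vmcs_config.nested) and only then stores it into the per-vCPU copy. A minimal userspace sketch of that "select the words, check one struct, write the other" shape; the names and the deliberately simplified subset test below are illustrative, not KVM symbols.]

#include <stdint.h>

struct ctl_msrs {
	uint32_t pinbased_lo, pinbased_hi;
	uint32_t procbased_lo, procbased_hi;
};

enum ctl_id { CTL_PINBASED, CTL_PROCBASED };

/* Map an identifier onto the low/high words of whichever struct is passed
 * in, so one lookup serves both the "supported" and the per-vCPU copy.
 */
static void get_ctl_words(struct ctl_msrs *m, enum ctl_id id,
			  uint32_t **lo, uint32_t **hi)
{
	switch (id) {
	case CTL_PINBASED:
		*lo = &m->pinbased_lo;
		*hi = &m->pinbased_hi;
		break;
	case CTL_PROCBASED:
		*lo = &m->procbased_lo;
		*hi = &m->procbased_hi;
		break;
	}
}

/* Check the new value against the supported configuration, then store it
 * in the per-vCPU copy.  The "is it a subset" test is simplified here.
 */
static int restore_ctl(struct ctl_msrs *supported, struct ctl_msrs *vcpu,
		       enum ctl_id id, uint64_t data)
{
	uint32_t *lo, *hi;
	uint64_t allowed;

	get_ctl_words(supported, id, &lo, &hi);
	allowed = ((uint64_t)*hi << 32) | *lo;
	if (data & ~allowed)
		return -1;

	get_ctl_words(vcpu, id, &lo, &hi);
	*lo = (uint32_t)data;
	*hi = (uint32_t)(data >> 32);
	return 0;
}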

diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index c92cea0b8ccc..129ae4e01f7c 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -281,7 +281,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
u64 fixed0 = to_vmx(vcpu)->nested.msrs.cr4_fixed0;
u64 fixed1 = to_vmx(vcpu)->nested.msrs.cr4_fixed1;

- return fixed_bits_valid(val, fixed0, fixed1);
+ return fixed_bits_valid(val, fixed0, fixed1) &&
+ __kvm_is_valid_cr4(vcpu, val);
}

/* No difference in the restrictions on guest and host CR4 in VMX operation. */
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index b82b6709d7a8..8bd154f8c966 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -98,6 +98,9 @@ static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);

+ if (!intel_pmu_has_perf_global_ctrl(pmu))
+ return true;
+
return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
}

@@ -212,7 +215,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
case MSR_CORE_PERF_GLOBAL_STATUS:
case MSR_CORE_PERF_GLOBAL_CTRL:
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
- ret = pmu->version > 1;
+ return intel_pmu_has_perf_global_ctrl(pmu);
break;
default:
ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
@@ -395,7 +398,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_CORE_PERF_FIXED_CTR_CTRL:
if (pmu->fixed_ctr_ctrl == data)
return 0;
- if (!(data & 0xfffffffffffff444ull)) {
+ if (!(data & pmu->fixed_ctr_ctrl_mask)) {
reprogram_fixed_counters(pmu, data);
return 0;
}
@@ -479,6 +482,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *entry;
union cpuid10_eax eax;
union cpuid10_edx edx;
+ int i;

pmu->nr_arch_gp_counters = 0;
pmu->nr_arch_fixed_counters = 0;
@@ -487,6 +491,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->version = 0;
pmu->reserved_bits = 0xffffffff00200000ull;
pmu->raw_event_mask = X86_RAW_EVENT_MASK;
+ pmu->global_ctrl_mask = ~0ull;
+ pmu->global_ovf_ctrl_mask = ~0ull;
+ pmu->fixed_ctr_ctrl_mask = ~0ull;

entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
if (!entry || !vcpu->kvm->arch.enable_pmu)
@@ -522,6 +529,8 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
setup_fixed_pmc_eventsel(pmu);
}

+ for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
+ pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));
pmu->global_ctrl = ((1ull << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
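
[Aside, not part of the patch: the fixed_ctr_ctrl_mask loop above replaces the hard-coded 0xfffffffffffff444 constant, which assumed exactly three fixed counters, with a mask built from the actual counter count. Each fixed counter owns a 4-bit field in IA32_FIXED_CTR_CTRL, and only bits 0, 1 and 3 of each field (OS enable, USR enable, PMI enable) are treated as writable, so 0xb is cleared per counter. A standalone arithmetic check of the loop:]

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t mask = ~0ULL;	/* start with every bit reserved */
	int nr_fixed = 3;	/* fixed counters exposed to the guest */
	int i;

	/* Bit 2 of each field (AnyThread) stays reserved, hence 0xb. */
	for (i = 0; i < nr_fixed; i++)
		mask &= ~(0xbULL << (i * 4));

	/* Prints fffffffffffff444 for three counters, i.e. exactly the
	 * constant the patch removes; ffffffffffffff44 for two, and so on.
	 */
	printf("%016llx\n", (unsigned long long)mask);
	return 0;
}
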
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 597c3c08da50..3cfbb0413cd8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3230,8 +3230,8 @@ static bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
/*
* We operate under the default treatment of SMM, so VMX cannot be
- * enabled under SMM. Note, whether or not VMXE is allowed at all is
- * handled by kvm_is_valid_cr4().
+ * enabled under SMM. Note, whether or not VMXE is allowed at all,
+ * i.e. is a reserved bit, is handled by common x86 code.
*/
if ((cr4 & X86_CR4_VMXE) && is_smm(vcpu))
return false;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 1e7f9453894b..93aa1f3ea01e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -92,6 +92,18 @@ union vmx_exit_reason {
u32 full;
};

+static inline bool intel_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu)
+{
+ /*
+ * Architecturally, Intel's SDM states that IA32_PERF_GLOBAL_CTRL is
+ * supported if "CPUID.0AH: EAX[7:0] > 0", i.e. if the PMU version is
+ * greater than zero. However, KVM only exposes and emulates the MSR
+ * to/for the guest if the guest PMU supports at least "Architectural
+ * Performance Monitoring Version 2".
+ */
+ return pmu->version > 1;
+}
+
#define vcpu_to_lbr_desc(vcpu) (&to_vmx(vcpu)->lbr_desc)
#define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 65b0ec28bd52..0d6cea0d33a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1066,7 +1066,7 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);

-bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
if (cr4 & cr4_reserved_bits)
return false;
@@ -1074,9 +1074,15 @@ bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
return false;

- return static_call(kvm_x86_is_valid_cr4)(vcpu, cr4);
+ return true;
+}
+EXPORT_SYMBOL_GPL(__kvm_is_valid_cr4);
+
+static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+ return __kvm_is_valid_cr4(vcpu, cr4) &&
+ static_call(kvm_x86_is_valid_cr4)(vcpu, cr4);
}
-EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);

void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
{
@@ -3220,17 +3226,20 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
/* only 0 or all 1s can be written to IA32_MCi_CTL
* some Linux kernels though clear bit 10 in bank 4 to
* workaround a BIOS/GART TBL issue on AMD K8s, ignore
- * this to avoid an uncatched #GP in the guest
+ * this to avoid an uncatched #GP in the guest.
+ *
+ * UNIXWARE clears bit 0 of MC1_CTL to ignore
+ * correctable, single-bit ECC data errors.
*/
if ((offset & 0x3) == 0 &&
- data != 0 && (data | (1 << 10)) != ~(u64)0)
- return -1;
+ data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
+ return 1;

/* MCi_STATUS */
if (!msr_info->host_initiated &&
(offset & 0x3) == 1 && data != 0) {
if (!can_set_mci_status(vcpu))
- return -1;
+ return 1;
}

vcpu->arch.mce_banks[offset] = data;
@@ -3361,6 +3370,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
struct kvm_steal_time __user *st;
struct kvm_memslots *slots;
+ gpa_t gpa = vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS;
u64 steal;
u32 version;

@@ -3378,13 +3388,12 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
slots = kvm_memslots(vcpu->kvm);

if (unlikely(slots->generation != ghc->generation ||
+ gpa != ghc->gpa ||
kvm_is_error_hva(ghc->hva) || !ghc->memslot)) {
- gfn_t gfn = vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS;
-
/* We rely on the fact that it fits in a single page. */
BUILD_BUG_ON((sizeof(*st) - 1) & KVM_STEAL_VALID_BITS);

- if (kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, gfn, sizeof(*st)) ||
+ if (kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, gpa, sizeof(*st)) ||
kvm_is_error_hva(ghc->hva) || !ghc->memslot)
return;
}
@@ -4371,10 +4380,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
if (r < sizeof(struct kvm_xsave))
r = sizeof(struct kvm_xsave);
break;
+ }
case KVM_CAP_PMU_CAPABILITY:
r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
break;
- }
case KVM_CAP_DISABLE_QUIRKS2:
r = KVM_X86_VALID_QUIRKS;
break;
@@ -4608,6 +4617,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
struct kvm_steal_time __user *st;
struct kvm_memslots *slots;
static const u8 preempted = KVM_VCPU_PREEMPTED;
+ gpa_t gpa = vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS;

/*
* The vCPU can be marked preempted if and only if the VM-Exit was on
@@ -4635,6 +4645,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
slots = kvm_memslots(vcpu->kvm);

if (unlikely(slots->generation != ghc->generation ||
+ gpa != ghc->gpa ||
kvm_is_error_hva(ghc->hva) || !ghc->memslot))
return;
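
[Aside, not part of the patch: earlier in this file the common reserved-bit check is exported as __kvm_is_valid_cr4() while the vendor hook stays behind a now-static kvm_is_valid_cr4(), so nested code (see the svm/nested.c and vmx/nested.h hunks above) can validate guest-supplied CR4 values with only the architectural checks. A compilable sketch of the split, with placeholder names and a placeholder reserved-bit mask:]

#include <stdbool.h>
#include <stdint.h>

#define CR4_RESERVED_BITS	0xffffffff00000000ULL	/* placeholder value */

/* Common, vendor-agnostic part: reject any reserved bit.  This is what the
 * double-underscore helper exports for callers with no extra restrictions.
 */
static bool example__is_valid_cr4(uint64_t cr4)
{
	return !(cr4 & CR4_RESERVED_BITS);
}

/* Stand-in for the per-vendor hook behind the static call. */
static bool example_vendor_is_valid_cr4(uint64_t cr4)
{
	(void)cr4;
	return true;
}

/* Full check = common check plus vendor-specific restrictions; only this
 * wrapper needs to stay private to the core code.
 */
static bool example_is_valid_cr4(uint64_t cr4)
{
	return example__is_valid_cr4(cr4) && example_vendor_is_valid_cr4(cr4);
}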

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 588792f00334..80417761fe4a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -407,7 +407,7 @@ static inline void kvm_machine_check(void)
void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
int kvm_spec_ctrl_test_value(u64 value);
-bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
struct x86_exception *e);
int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index dba2197c05c3..331310c29349 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -94,16 +94,18 @@ static bool ex_handler_copy(const struct exception_table_entry *fixup,
static bool ex_handler_msr(const struct exception_table_entry *fixup,
struct pt_regs *regs, bool wrmsr, bool safe, int reg)
{
- if (!safe && wrmsr &&
- pr_warn_once("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
- (unsigned int)regs->cx, (unsigned int)regs->dx,
- (unsigned int)regs->ax, regs->ip, (void *)regs->ip))
+ if (__ONCE_LITE_IF(!safe && wrmsr)) {
+ pr_warn("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
+ (unsigned int)regs->cx, (unsigned int)regs->dx,
+ (unsigned int)regs->ax, regs->ip, (void *)regs->ip);
show_stack_regs(regs);
+ }

- if (!safe && !wrmsr &&
- pr_warn_once("unchecked MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
- (unsigned int)regs->cx, regs->ip, (void *)regs->ip))
+ if (__ONCE_LITE_IF(!safe && !wrmsr)) {
+ pr_warn("unchecked MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
+ (unsigned int)regs->cx, regs->ip, (void *)regs->ip);
show_stack_regs(regs);
+ }

if (!wrmsr) {
/* Pretend that the read succeeded and returned 0. */
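
[Aside, not part of the patch: the extable.c change above stops keying show_stack_regs() off the return value of pr_warn_once() and instead wraps the whole condition in __ONCE_LITE_IF(), so the warning text and the stack dump are emitted together, and only the first time the condition holds. A single-threaded userspace sketch of that "run this block only once" guard; the kernel helper is more careful than this:]

#include <stdbool.h>
#include <stdio.h>

/* Evaluate to true only the first time 'cond' is true.
 * (Uses the GCC/Clang statement-expression extension.)
 */
#define ONCE_IF(cond) ({				\
	static bool __already_done;			\
	bool __fire = (cond) && !__already_done;	\
	if (__fire)					\
		__already_done = true;			\
	__fire;						\
})

static void report_bad_wrmsr(bool unsafe_write)
{
	if (ONCE_IF(unsafe_write)) {
		fprintf(stderr, "unchecked MSR access error: WRMSR ...\n");
		/* dump register/stack state here, together with the text */
	}
}

int main(void)
{
	report_bad_wrmsr(true);	/* warns and "dumps" */
	report_bad_wrmsr(true);	/* silent from now on */
	return 0;
}
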
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e8b061557887..2aadb2019b4f 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -867,7 +867,7 @@ void debug_cpumask_set_cpu(int cpu, int node, bool enable)
return;
}
mask = node_to_cpumask_map[node];
- if (!mask) {
+ if (!cpumask_available(mask)) {
pr_err("node_to_cpumask_map[%i] NULL\n", node);
dump_stack();
return;
@@ -913,7 +913,7 @@ const struct cpumask *cpumask_of_node(int node)
dump_stack();
return cpu_none_mask;
}
- if (node_to_cpumask_map[node] == NULL) {
+ if (!cpumask_available(node_to_cpumask_map[node])) {
printk(KERN_WARNING
"cpumask_of_node(%d): no node_to_cpumask_map!\n",
node);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 2dab2816b3f7..400117f630b8 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1777,10 +1777,12 @@ static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_args,
}

static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
- struct bpf_prog *p, int stack_size, bool save_ret)
+ struct bpf_tramp_link *l, int stack_size,
+ bool save_ret)
{
u8 *prog = *pprog;
u8 *jmp_insn;
+ struct bpf_prog *p = l->link.prog;

/* arg1: mov rdi, progs[i] */
emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
@@ -1865,14 +1867,14 @@ static int emit_cond_near_jump(u8 **pprog, void *func, void *ip, u8 jmp_cond)
}

static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
- struct bpf_tramp_progs *tp, int stack_size,
+ struct bpf_tramp_links *tl, int stack_size,
bool save_ret)
{
int i;
u8 *prog = *pprog;

- for (i = 0; i < tp->nr_progs; i++) {
- if (invoke_bpf_prog(m, &prog, tp->progs[i], stack_size,
+ for (i = 0; i < tl->nr_links; i++) {
+ if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size,
save_ret))
return -EINVAL;
}
@@ -1881,7 +1883,7 @@ static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
}

static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
- struct bpf_tramp_progs *tp, int stack_size,
+ struct bpf_tramp_links *tl, int stack_size,
u8 **branches)
{
u8 *prog = *pprog;
@@ -1892,8 +1894,8 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
*/
emit_mov_imm32(&prog, false, BPF_REG_0, 0);
emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
- for (i = 0; i < tp->nr_progs; i++) {
- if (invoke_bpf_prog(m, &prog, tp->progs[i], stack_size, true))
+ for (i = 0; i < tl->nr_links; i++) {
+ if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, true))
return -EINVAL;

/* mod_ret prog stored return value into [rbp - 8]. Emit:
@@ -1995,14 +1997,14 @@ static bool is_valid_bpf_tramp_flags(unsigned int flags)
*/
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_progs *tprogs,
+ struct bpf_tramp_links *tlinks,
void *orig_call)
{
int ret, i, nr_args = m->nr_args;
int regs_off, ip_off, args_off, stack_size = nr_args * 8;
- struct bpf_tramp_progs *fentry = &tprogs[BPF_TRAMP_FENTRY];
- struct bpf_tramp_progs *fexit = &tprogs[BPF_TRAMP_FEXIT];
- struct bpf_tramp_progs *fmod_ret = &tprogs[BPF_TRAMP_MODIFY_RETURN];
+ struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+ struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
u8 **branches = NULL;
u8 *prog;
bool save_ret;
@@ -2093,13 +2095,13 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
}
}

- if (fentry->nr_progs)
+ if (fentry->nr_links)
if (invoke_bpf(m, &prog, fentry, regs_off,
flags & BPF_TRAMP_F_RET_FENTRY_RET))
return -EINVAL;

- if (fmod_ret->nr_progs) {
- branches = kcalloc(fmod_ret->nr_progs, sizeof(u8 *),
+ if (fmod_ret->nr_links) {
+ branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *),
GFP_KERNEL);
if (!branches)
return -ENOMEM;
@@ -2126,7 +2128,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
prog += X86_PATCH_SIZE;
}

- if (fmod_ret->nr_progs) {
+ if (fmod_ret->nr_links) {
/* From Intel 64 and IA-32 Architectures Optimization
* Reference Manual, 3.4.1.4 Code Alignment, Assembly/Compiler
* Coding Rule 11: All branch targets should be 16-byte
@@ -2136,12 +2138,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
/* Update the branches saved in invoke_bpf_mod_ret with the
* aligned address of do_fexit.
*/
- for (i = 0; i < fmod_ret->nr_progs; i++)
+ for (i = 0; i < fmod_ret->nr_links; i++)
emit_cond_near_jump(&branches[i], prog, branches[i],
X86_JNE);
}

- if (fexit->nr_progs)
+ if (fexit->nr_links)
if (invoke_bpf(m, &prog, fexit, regs_off, false)) {
ret = -EINVAL;
goto cleanup;
@@ -2475,3 +2477,34 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
return ERR_PTR(-EINVAL);
return dst;
}
+
+/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
+bool bpf_jit_supports_subprog_tailcalls(void)
+{
+ return true;
+}
+
+void bpf_jit_free(struct bpf_prog *prog)
+{
+ if (prog->jited) {
+ struct x64_jit_data *jit_data = prog->aux->jit_data;
+ struct bpf_binary_header *hdr;
+
+ /*
+ * If we fail the final pass of JIT (from jit_subprogs),
+ * the program may not be finalized yet. Call finalize here
+ * before freeing it.
+ */
+ if (jit_data) {
+ bpf_jit_binary_pack_finalize(prog, jit_data->header,
+ jit_data->rw_header);
+ kvfree(jit_data->addrs);
+ kfree(jit_data);
+ }
+ hdr = bpf_jit_binary_pack_hdr(prog);
+ bpf_jit_binary_pack_free(hdr, NULL);
+ WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
+ }
+
+ bpf_prog_unlock_free(prog);
+}
diff --git a/arch/x86/platform/olpc/olpc-xo1-sci.c b/arch/x86/platform/olpc/olpc-xo1-sci.c
index f03a6883dcc6..89f25af4b3c3 100644
--- a/arch/x86/platform/olpc/olpc-xo1-sci.c
+++ b/arch/x86/platform/olpc/olpc-xo1-sci.c
@@ -80,7 +80,7 @@ static void send_ebook_state(void)
return;
}

- if (!!test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == state)
+ if (test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == !!state)
return; /* Nothing new to report. */

input_report_switch(ebook_switch_idev, SW_TABLET_MODE, state);
diff --git a/arch/x86/um/Makefile b/arch/x86/um/Makefile
index ba5789c35809..a8cde4e8ab11 100644
--- a/arch/x86/um/Makefile
+++ b/arch/x86/um/Makefile
@@ -28,7 +28,8 @@ else

obj-y += syscalls_64.o vdso/

-subarch-y = ../lib/csum-partial_64.o ../lib/memcpy_64.o ../entry/thunk_64.o
+subarch-y = ../lib/csum-partial_64.o ../lib/memcpy_64.o
+subarch-$(CONFIG_PREEMPTION) += ../entry/thunk_64.o

endif

diff --git a/arch/xtensa/platforms/iss/network.c b/arch/xtensa/platforms/iss/network.c
index be3aaaad8bee..eee389187807 100644
--- a/arch/xtensa/platforms/iss/network.c
+++ b/arch/xtensa/platforms/iss/network.c
@@ -503,16 +503,24 @@ static const struct net_device_ops iss_netdev_ops = {
.ndo_set_rx_mode = iss_net_set_multicast_list,
};

-static int iss_net_configure(int index, char *init)
+static void iss_net_pdev_release(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+ struct iss_net_private *lp =
+ container_of(pdev, struct iss_net_private, pdev);
+
+ free_netdev(lp->dev);
+}
+
+static void iss_net_configure(int index, char *init)
{
struct net_device *dev;
struct iss_net_private *lp;
- int err;

dev = alloc_etherdev(sizeof(*lp));
if (dev == NULL) {
pr_err("eth_configure: failed to allocate device\n");
- return 1;
+ return;
}

/* Initialize private element. */
@@ -541,7 +549,7 @@ static int iss_net_configure(int index, char *init)
if (!tuntap_probe(lp, index, init)) {
pr_err("%s: invalid arguments. Skipping device!\n",
dev->name);
- goto errout;
+ goto err_free_netdev;
}

pr_info("Netdevice %d (%pM)\n", index, dev->dev_addr);
@@ -549,7 +557,8 @@ static int iss_net_configure(int index, char *init)
/* sysfs register */

if (!driver_registered) {
- platform_driver_register(&iss_net_driver);
+ if (platform_driver_register(&iss_net_driver))
+ goto err_free_netdev;
driver_registered = 1;
}

@@ -559,7 +568,9 @@ static int iss_net_configure(int index, char *init)

lp->pdev.id = index;
lp->pdev.name = DRIVER_NAME;
- platform_device_register(&lp->pdev);
+ lp->pdev.dev.release = iss_net_pdev_release;
+ if (platform_device_register(&lp->pdev))
+ goto err_free_netdev;
SET_NETDEV_DEV(dev, &lp->pdev.dev);

dev->netdev_ops = &iss_netdev_ops;
@@ -568,23 +579,20 @@ static int iss_net_configure(int index, char *init)
dev->irq = -1;

rtnl_lock();
- err = register_netdevice(dev);
- rtnl_unlock();
-
- if (err) {
+ if (register_netdevice(dev)) {
+ rtnl_unlock();
pr_err("%s: error registering net device!\n", dev->name);
- /* XXX: should we call ->remove() here? */
- free_netdev(dev);
- return 1;
+ platform_device_unregister(&lp->pdev);
+ return;
}
+ rtnl_unlock();

timer_setup(&lp->tl, iss_net_user_timer_expire, 0);

- return 0;
+ return;

-errout:
- /* FIXME: unregister; free, etc.. */
- return -EIO;
+err_free_netdev:
+ free_netdev(dev);
}

/* ------------------------------------------------------------------------- */
diff --git a/block/bio.c b/block/bio.c
index d3ca79c3ebdf..7d4d5723350b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1129,6 +1129,37 @@ static void bio_put_pages(struct page **pages, size_t size, size_t off)
put_page(pages[i]);
}

+static int bio_iov_add_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int offset)
+{
+ bool same_page = false;
+
+ if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
+ if (WARN_ON_ONCE(bio_full(bio, len)))
+ return -EINVAL;
+ __bio_add_page(bio, page, len, offset);
+ return 0;
+ }
+
+ if (same_page)
+ put_page(page);
+ return 0;
+}
+
+static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int offset)
+{
+ struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+ bool same_page = false;
+
+ if (bio_add_hw_page(q, bio, page, len, offset,
+ queue_max_zone_append_sectors(q), &same_page) != len)
+ return -EINVAL;
+ if (same_page)
+ put_page(page);
+ return 0;
+}
+
#define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page *))

/**
@@ -1147,61 +1178,11 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
struct page **pages = (struct page **)bv;
- bool same_page = false;
- ssize_t size, left;
- unsigned len, i;
- size_t offset;
-
- /*
- * Move page array up in the allocated memory for the bio vecs as far as
- * possible so that we can start filling biovecs from the beginning
- * without overwriting the temporary page array.
- */
- BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
- pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
-
- size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
- if (unlikely(size <= 0))
- return size ? size : -EFAULT;
-
- for (left = size, i = 0; left > 0; left -= len, i++) {
- struct page *page = pages[i];
-
- len = min_t(size_t, PAGE_SIZE - offset, left);
-
- if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
- if (same_page)
- put_page(page);
- } else {
- if (WARN_ON_ONCE(bio_full(bio, len))) {
- bio_put_pages(pages + i, left, offset);
- return -EINVAL;
- }
- __bio_add_page(bio, page, len, offset);
- }
- offset = 0;
- }
-
- iov_iter_advance(iter, size);
- return 0;
-}
-
-static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
-{
- unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
- unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
- struct request_queue *q = bdev_get_queue(bio->bi_bdev);
- unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
- struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
- struct page **pages = (struct page **)bv;
ssize_t size, left;
unsigned len, i;
size_t offset;
int ret = 0;

- if (WARN_ON_ONCE(!max_append_sectors))
- return 0;
-
/*
* Move page array up in the allocated memory for the bio vecs as far as
* possible so that we can start filling biovecs from the beginning
@@ -1216,17 +1197,18 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)

for (left = size, i = 0; left > 0; left -= len, i++) {
struct page *page = pages[i];
- bool same_page = false;

len = min_t(size_t, PAGE_SIZE - offset, left);
- if (bio_add_hw_page(q, bio, page, len, offset,
- max_append_sectors, &same_page) != len) {
+ if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+ ret = bio_iov_add_zone_append_page(bio, page, len,
+ offset);
+ else
+ ret = bio_iov_add_page(bio, page, len, offset);
+
+ if (ret) {
bio_put_pages(pages + i, left, offset);
- ret = -EINVAL;
break;
}
- if (same_page)
- put_page(page);
offset = 0;
}

@@ -1268,10 +1250,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
}

do {
- if (bio_op(bio) == REQ_OP_ZONE_APPEND)
- ret = __bio_iov_append_get_pages(bio, iter);
- else
- ret = __bio_iov_iter_get_pages(bio, iter);
+ ret = __bio_iov_iter_get_pages(bio, iter);
} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));

/* don't account direct I/O as memory stall */
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 16705fbd0699..a19f2db4eeb2 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2893,15 +2893,21 @@ static int blk_iocost_init(struct request_queue *q)
* called before policy activation completion, can't assume that the
* target bio has an iocg associated and need to test for NULL iocg.
*/
- rq_qos_add(q, rqos);
+ ret = rq_qos_add(q, rqos);
+ if (ret)
+ goto err_free_ioc;
+
ret = blkcg_activate_policy(q, &blkcg_policy_iocost);
- if (ret) {
- rq_qos_del(q, rqos);
- free_percpu(ioc->pcpu_stat);
- kfree(ioc);
- return ret;
- }
+ if (ret)
+ goto err_del_qos;
return 0;
+
+err_del_qos:
+ rq_qos_del(q, rqos);
+err_free_ioc:
+ free_percpu(ioc->pcpu_stat);
+ kfree(ioc);
+ return ret;
}

static struct blkcg_policy_data *ioc_cpd_alloc(gfp_t gfp)
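
[Aside, not part of the patch: now that rq_qos_add() can fail, blk_iocost_init() above is reshaped into the conventional goto-unwind ladder, one label per setup step, with errors falling through the teardown steps in reverse order. The same shape reduced to a compilable sketch with made-up step names:]

#include <errno.h>
#include <stdlib.h>

static int add_qos(void *q, void *rqos)	{ (void)q; (void)rqos; return 0; }
static void del_qos(void *q, void *rqos)	{ (void)q; (void)rqos; }
static int activate_policy(void *q)		{ (void)q; return -ENOMEM; }

static int init_example(void *q)
{
	void *state = malloc(64);
	int ret;

	if (!state)
		return -ENOMEM;

	ret = add_qos(q, state);
	if (ret)
		goto err_free;

	ret = activate_policy(q);
	if (ret)
		goto err_del_qos;

	return 0;

err_del_qos:
	del_qos(q, state);	/* undo steps in reverse order of setup */
err_free:
	free(state);
	return ret;
}
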
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 9568bf8dfe82..7845dca5fcfd 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -773,19 +773,23 @@ int blk_iolatency_init(struct request_queue *q)
rqos->ops = &blkcg_iolatency_ops;
rqos->q = q;

- rq_qos_add(q, rqos);
-
+ ret = rq_qos_add(q, rqos);
+ if (ret)
+ goto err_free;
ret = blkcg_activate_policy(q, &blkcg_policy_iolatency);
- if (ret) {
- rq_qos_del(q, rqos);
- kfree(blkiolat);
- return ret;
- }
+ if (ret)
+ goto err_qos_del;

timer_setup(&blkiolat->timer, blkiolatency_timer_fn, 0);
INIT_WORK(&blkiolat->enable_work, blkiolatency_enable_work_fn);

return 0;
+
+err_qos_del:
+ rq_qos_del(q, rqos);
+err_free:
+ kfree(blkiolat);
+ return ret;
}

static void iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index aa0349e9f083..d491b6eb0ab9 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -713,11 +713,6 @@ void blk_mq_debugfs_register(struct request_queue *q)
}
}

-void blk_mq_debugfs_unregister(struct request_queue *q)
-{
- q->sched_debugfs_dir = NULL;
-}
-
static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx)
{
@@ -737,6 +732,9 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q,
char name[20];
int i;

+ if (!q->debugfs_dir)
+ return;
+
snprintf(name, sizeof(name), "hctx%u", hctx->queue_num);
hctx->debugfs_dir = debugfs_create_dir(name, q->debugfs_dir);

@@ -748,6 +746,8 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q,

void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx)
{
+ if (!hctx->queue->debugfs_dir)
+ return;
debugfs_remove_recursive(hctx->debugfs_dir);
hctx->sched_debugfs_dir = NULL;
hctx->debugfs_dir = NULL;
@@ -775,6 +775,8 @@ void blk_mq_debugfs_register_sched(struct request_queue *q)
{
struct elevator_type *e = q->elevator->type;

+ lockdep_assert_held(&q->debugfs_mutex);
+
/*
* If the parent directory has not been created yet, return, we will be
* called again later on and the directory/files will be created then.
@@ -792,6 +794,8 @@ void blk_mq_debugfs_register_sched(struct request_queue *q)

void blk_mq_debugfs_unregister_sched(struct request_queue *q)
{
+ lockdep_assert_held(&q->debugfs_mutex);
+
debugfs_remove_recursive(q->sched_debugfs_dir);
q->sched_debugfs_dir = NULL;
}
@@ -813,6 +817,10 @@ static const char *rq_qos_id_to_name(enum rq_qos_id id)

void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
{
+ lockdep_assert_held(&rqos->q->debugfs_mutex);
+
+ if (!rqos->q->debugfs_dir)
+ return;
debugfs_remove_recursive(rqos->debugfs_dir);
rqos->debugfs_dir = NULL;
}
@@ -822,6 +830,8 @@ void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
struct request_queue *q = rqos->q;
const char *dir_name = rq_qos_id_to_name(rqos->id);

+ lockdep_assert_held(&q->debugfs_mutex);
+
if (rqos->debugfs_dir || !rqos->ops->debugfs_attrs)
return;

@@ -837,6 +847,8 @@ void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)

void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q)
{
+ lockdep_assert_held(&q->debugfs_mutex);
+
debugfs_remove_recursive(q->rqos_debugfs_dir);
q->rqos_debugfs_dir = NULL;
}
@@ -846,6 +858,8 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
{
struct elevator_type *e = q->elevator->type;

+ lockdep_assert_held(&q->debugfs_mutex);
+
/*
* If the parent debugfs directory has not been created yet, return;
* We will be called again later on with appropriate parent debugfs
@@ -865,6 +879,10 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,

void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx)
{
+ lockdep_assert_held(&hctx->queue->debugfs_mutex);
+
+ if (!hctx->queue->debugfs_dir)
+ return;
debugfs_remove_recursive(hctx->sched_debugfs_dir);
hctx->sched_debugfs_dir = NULL;
}
diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index 69918f4170d6..771d45832878 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -21,7 +21,6 @@ int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);

void blk_mq_debugfs_register(struct request_queue *q);
-void blk_mq_debugfs_unregister(struct request_queue *q);
void blk_mq_debugfs_register_hctx(struct request_queue *q,
struct blk_mq_hw_ctx *hctx);
void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx);
@@ -42,10 +41,6 @@ static inline void blk_mq_debugfs_register(struct request_queue *q)
{
}

-static inline void blk_mq_debugfs_unregister(struct request_queue *q)
-{
-}
-
static inline void blk_mq_debugfs_register_hctx(struct request_queue *q,
struct blk_mq_hw_ctx *hctx)
{
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 9e56a69422b6..e84bec39fd3a 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -593,7 +593,9 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
if (ret)
goto err_free_map_and_rqs;

+ mutex_lock(&q->debugfs_mutex);
blk_mq_debugfs_register_sched(q);
+ mutex_unlock(&q->debugfs_mutex);

queue_for_each_hw_ctx(q, hctx, i) {
if (e->ops.init_hctx) {
@@ -606,7 +608,9 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
return ret;
}
}
+ mutex_lock(&q->debugfs_mutex);
blk_mq_debugfs_register_sched_hctx(q, hctx);
+ mutex_unlock(&q->debugfs_mutex);
}

return 0;
@@ -647,14 +651,21 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
unsigned int flags = 0;

queue_for_each_hw_ctx(q, hctx, i) {
+ mutex_lock(&q->debugfs_mutex);
blk_mq_debugfs_unregister_sched_hctx(hctx);
+ mutex_unlock(&q->debugfs_mutex);
+
if (e->type->ops.exit_hctx && hctx->sched_data) {
e->type->ops.exit_hctx(hctx, i);
hctx->sched_data = NULL;
}
flags = hctx->flags;
}
+
+ mutex_lock(&q->debugfs_mutex);
blk_mq_debugfs_unregister_sched(q);
+ mutex_unlock(&q->debugfs_mutex);
+
if (e->type->ops.exit_sched)
e->type->ops.exit_sched(e);
blk_mq_sched_tags_teardown(q, flags);
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index e83af7bc7591..249a6f05dd3b 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -294,7 +294,9 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,

void rq_qos_exit(struct request_queue *q)
{
+ mutex_lock(&q->debugfs_mutex);
blk_mq_debugfs_unregister_queue_rqos(q);
+ mutex_unlock(&q->debugfs_mutex);

while (q->rq_qos) {
struct rq_qos *rqos = q->rq_qos;
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 68267007da1c..08b856570ad1 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -86,7 +86,7 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
init_waitqueue_head(&rq_wait->wait);
}

-static inline void rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
+static inline int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
{
/*
* No IO can be in-flight when adding rqos, so freeze queue, which
@@ -98,14 +98,26 @@ static inline void rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
blk_mq_freeze_queue(q);

spin_lock_irq(&q->queue_lock);
+ if (rq_qos_id(q, rqos->id))
+ goto ebusy;
rqos->next = q->rq_qos;
q->rq_qos = rqos;
spin_unlock_irq(&q->queue_lock);

blk_mq_unfreeze_queue(q);

- if (rqos->ops->debugfs_attrs)
+ if (rqos->ops->debugfs_attrs) {
+ mutex_lock(&q->debugfs_mutex);
blk_mq_debugfs_register_rqos(rqos);
+ mutex_unlock(&q->debugfs_mutex);
+ }
+
+ return 0;
+ebusy:
+ spin_unlock_irq(&q->queue_lock);
+ blk_mq_unfreeze_queue(q);
+ return -EBUSY;
+
}

static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
@@ -129,7 +141,9 @@ static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)

blk_mq_unfreeze_queue(q);

+ mutex_lock(&q->debugfs_mutex);
blk_mq_debugfs_unregister_rqos(rqos);
+ mutex_unlock(&q->debugfs_mutex);
}

typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
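
[Aside, not part of the patch: rq_qos_add() above now detects that a policy with the same id is already linked on the queue and returns -EBUSY instead of silently adding a duplicate, which is why the iocost, iolatency and wbt call sites gained error handling. The core idea as a small pthread-based sketch; the real code freezes the queue and holds queue_lock, and every name below is illustrative:]

#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct qos { int id; struct qos *next; };

static struct qos *qos_list;
static pthread_mutex_t qos_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reject a duplicate id while holding the lock, mirroring the -EBUSY path
 * the patch adds, instead of double-linking the same policy.
 */
static int qos_add(struct qos *new)
{
	struct qos *q;

	pthread_mutex_lock(&qos_lock);
	for (q = qos_list; q; q = q->next) {
		if (q->id == new->id) {
			pthread_mutex_unlock(&qos_lock);
			return -EBUSY;
		}
	}
	new->next = qos_list;
	qos_list = new;
	pthread_mutex_unlock(&qos_lock);
	return 0;
}
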
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 88bd41d4cb59..6e4801b217a7 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -779,14 +779,13 @@ static void blk_release_queue(struct kobject *kobj)
if (queue_is_mq(q))
blk_mq_release(q);

- blk_trace_shutdown(q);
mutex_lock(&q->debugfs_mutex);
+ blk_trace_shutdown(q);
debugfs_remove_recursive(q->debugfs_dir);
+ q->debugfs_dir = NULL;
+ q->sched_debugfs_dir = NULL;
mutex_unlock(&q->debugfs_mutex);

- if (queue_is_mq(q))
- blk_mq_debugfs_unregister(q);
-
bioset_exit(&q->bio_split);

if (blk_queue_has_srcu(q))
@@ -836,17 +835,16 @@ int blk_register_queue(struct gendisk *disk)
goto unlock;
}

+ if (queue_is_mq(q))
+ __blk_mq_register_dev(dev, q);
+ mutex_lock(&q->sysfs_lock);
+
mutex_lock(&q->debugfs_mutex);
q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
blk_debugfs_root);
- mutex_unlock(&q->debugfs_mutex);
-
- if (queue_is_mq(q)) {
- __blk_mq_register_dev(dev, q);
+ if (queue_is_mq(q))
blk_mq_debugfs_register(q);
- }
-
- mutex_lock(&q->sysfs_lock);
+ mutex_unlock(&q->debugfs_mutex);

ret = disk_register_independent_access_ranges(disk, NULL);
if (ret)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 0c119be0e813..ae6ea0b54579 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -820,6 +820,7 @@ int wbt_init(struct request_queue *q)
{
struct rq_wb *rwb;
int i;
+ int ret;

rwb = kzalloc(sizeof(*rwb), GFP_KERNEL);
if (!rwb)
@@ -846,7 +847,10 @@ int wbt_init(struct request_queue *q)
/*
* Assign rwb and add the stats callback.
*/
- rq_qos_add(q, &rwb->rqos);
+ ret = rq_qos_add(q, &rwb->rqos);
+ if (ret)
+ goto err_free;
+
blk_stat_add_callback(q, rwb->cb);

rwb->min_lat_nsec = wbt_default_latency_nsec(q);
@@ -855,4 +859,10 @@ int wbt_init(struct request_queue *q)
wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));

return 0;
+
+err_free:
+ blk_stat_free_callback(rwb->cb);
+ kfree(rwb);
+ return ret;
+
}
diff --git a/crypto/Kconfig b/crypto/Kconfig
index b4e00a7a046b..38601a072b99 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -692,26 +692,8 @@ config CRYPTO_BLAKE2B

See https://blake2.net for further information.

-config CRYPTO_BLAKE2S
- tristate "BLAKE2s digest algorithm"
- select CRYPTO_LIB_BLAKE2S_GENERIC
- select CRYPTO_HASH
- help
- Implementation of cryptographic hash function BLAKE2s
- optimized for 8-32bit platforms and can produce digests of any size
- between 1 to 32. The keyed hash is also implemented.
-
- This module provides the following algorithms:
-
- - blake2s-128
- - blake2s-160
- - blake2s-224
- - blake2s-256
-
- See https://blake2.net for further information.
-
config CRYPTO_BLAKE2S_X86
- tristate "BLAKE2s digest algorithm (x86 accelerated version)"
+ bool "BLAKE2s digest algorithm (x86 accelerated version)"
depends on X86 && 64BIT
select CRYPTO_LIB_BLAKE2S_GENERIC
select CRYPTO_ARCH_HAVE_LIB_BLAKE2S
diff --git a/crypto/Makefile b/crypto/Makefile
index a40e6d5fb2c8..dbfa53567c92 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -83,7 +83,6 @@ obj-$(CONFIG_CRYPTO_STREEBOG) += streebog_generic.o
obj-$(CONFIG_CRYPTO_WP512) += wp512.o
CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
obj-$(CONFIG_CRYPTO_BLAKE2B) += blake2b_generic.o
-obj-$(CONFIG_CRYPTO_BLAKE2S) += blake2s_generic.o
obj-$(CONFIG_CRYPTO_GF128MUL) += gf128mul.o
obj-$(CONFIG_CRYPTO_ECB) += ecb.o
obj-$(CONFIG_CRYPTO_CBC) += cbc.o
diff --git a/crypto/asymmetric_keys/public_key.c b/crypto/asymmetric_keys/public_key.c
index 7c9e6be35c30..2f8352e88860 100644
--- a/crypto/asymmetric_keys/public_key.c
+++ b/crypto/asymmetric_keys/public_key.c
@@ -304,6 +304,10 @@ static int cert_sig_digest_update(const struct public_key_signature *sig,

BUG_ON(!sig->data);

+ /* SM2 signatures always use the SM3 hash algorithm */
+ if (!sig->hash_algo || strcmp(sig->hash_algo, "sm3") != 0)
+ return -EINVAL;
+
ret = sm2_compute_z_digest(tfm_pkey, SM2_DEFAULT_USERID,
SM2_DEFAULT_USERID_LEN, dgst);
if (ret)
@@ -414,8 +418,7 @@ int public_key_verify_signature(const struct public_key *pkey,
if (ret)
goto error_free_key;

- if (sig->pkey_algo && strcmp(sig->pkey_algo, "sm2") == 0 &&
- sig->data_size) {
+ if (strcmp(pkey->pkey_algo, "sm2") == 0 && sig->data_size) {
ret = cert_sig_digest_update(sig, tfm);
if (ret)
goto error_free_key;
diff --git a/crypto/blake2s_generic.c b/crypto/blake2s_generic.c
deleted file mode 100644
index 5f96a21f8788..000000000000
--- a/crypto/blake2s_generic.c
+++ /dev/null
@@ -1,75 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0 OR MIT
-/*
- * shash interface to the generic implementation of BLAKE2s
- *
- * Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@xxxxxxxxx>. All Rights Reserved.
- */
-
-#include <crypto/internal/blake2s.h>
-#include <crypto/internal/hash.h>
-
-#include <linux/types.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-
-static int crypto_blake2s_update_generic(struct shash_desc *desc,
- const u8 *in, unsigned int inlen)
-{
- return crypto_blake2s_update(desc, in, inlen, true);
-}
-
-static int crypto_blake2s_final_generic(struct shash_desc *desc, u8 *out)
-{
- return crypto_blake2s_final(desc, out, true);
-}
-
-#define BLAKE2S_ALG(name, driver_name, digest_size) \
- { \
- .base.cra_name = name, \
- .base.cra_driver_name = driver_name, \
- .base.cra_priority = 100, \
- .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY, \
- .base.cra_blocksize = BLAKE2S_BLOCK_SIZE, \
- .base.cra_ctxsize = sizeof(struct blake2s_tfm_ctx), \
- .base.cra_module = THIS_MODULE, \
- .digestsize = digest_size, \
- .setkey = crypto_blake2s_setkey, \
- .init = crypto_blake2s_init, \
- .update = crypto_blake2s_update_generic, \
- .final = crypto_blake2s_final_generic, \
- .descsize = sizeof(struct blake2s_state), \
- }
-
-static struct shash_alg blake2s_algs[] = {
- BLAKE2S_ALG("blake2s-128", "blake2s-128-generic",
- BLAKE2S_128_HASH_SIZE),
- BLAKE2S_ALG("blake2s-160", "blake2s-160-generic",
- BLAKE2S_160_HASH_SIZE),
- BLAKE2S_ALG("blake2s-224", "blake2s-224-generic",
- BLAKE2S_224_HASH_SIZE),
- BLAKE2S_ALG("blake2s-256", "blake2s-256-generic",
- BLAKE2S_256_HASH_SIZE),
-};
-
-static int __init blake2s_mod_init(void)
-{
- return crypto_register_shashes(blake2s_algs, ARRAY_SIZE(blake2s_algs));
-}
-
-static void __exit blake2s_mod_exit(void)
-{
- crypto_unregister_shashes(blake2s_algs, ARRAY_SIZE(blake2s_algs));
-}
-
-subsys_initcall(blake2s_mod_init);
-module_exit(blake2s_mod_exit);
-
-MODULE_ALIAS_CRYPTO("blake2s-128");
-MODULE_ALIAS_CRYPTO("blake2s-128-generic");
-MODULE_ALIAS_CRYPTO("blake2s-160");
-MODULE_ALIAS_CRYPTO("blake2s-160-generic");
-MODULE_ALIAS_CRYPTO("blake2s-224");
-MODULE_ALIAS_CRYPTO("blake2s-224-generic");
-MODULE_ALIAS_CRYPTO("blake2s-256");
-MODULE_ALIAS_CRYPTO("blake2s-256-generic");
-MODULE_LICENSE("GPL v2");
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 2bacf8384f59..66b7ca1ccb23 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1669,10 +1669,6 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("rmd160");
break;

- case 41:
- ret += tcrypt_test("blake2s-256");
- break;
-
case 42:
ret += tcrypt_test("blake2b-512");
break;
@@ -2240,10 +2236,6 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
test_hash_speed("rmd160", sec, generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
fallthrough;
- case 316:
- test_hash_speed("blake2s-256", sec, generic_hash_speed_template);
- if (mode > 300 && mode < 400) break;
- fallthrough;
case 317:
test_hash_speed("blake2b-512", sec, generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
@@ -2352,10 +2344,6 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
test_ahash_speed("rmd160", sec, generic_hash_speed_template);
if (mode > 400 && mode < 500) break;
fallthrough;
- case 416:
- test_ahash_speed("blake2s-256", sec, generic_hash_speed_template);
- if (mode > 400 && mode < 500) break;
- fallthrough;
case 417:
test_ahash_speed("blake2b-512", sec, generic_hash_speed_template);
if (mode > 400 && mode < 500) break;
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4948201065cc..56facdb63843 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -4324,30 +4324,6 @@ static const struct alg_test_desc alg_test_descs[] = {
.suite = {
.hash = __VECS(blake2b_512_tv_template)
}
- }, {
- .alg = "blake2s-128",
- .test = alg_test_hash,
- .suite = {
- .hash = __VECS(blakes2s_128_tv_template)
- }
- }, {
- .alg = "blake2s-160",
- .test = alg_test_hash,
- .suite = {
- .hash = __VECS(blakes2s_160_tv_template)
- }
- }, {
- .alg = "blake2s-224",
- .test = alg_test_hash,
- .suite = {
- .hash = __VECS(blakes2s_224_tv_template)
- }
- }, {
- .alg = "blake2s-256",
- .test = alg_test_hash,
- .suite = {
- .hash = __VECS(blakes2s_256_tv_template)
- }
}, {
.alg = "cbc(aes)",
.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 4d7449fc6a65..c29658337d96 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -34034,221 +34034,4 @@ static const struct hash_testvec blake2b_512_tv_template[] = {{
0xae, 0x15, 0x81, 0x15, 0xd0, 0x88, 0xa0, 0x3c, },
}};

-static const struct hash_testvec blakes2s_128_tv_template[] = {{
- .digest = (u8[]){ 0x64, 0x55, 0x0d, 0x6f, 0xfe, 0x2c, 0x0a, 0x01,
- 0xa1, 0x4a, 0xba, 0x1e, 0xad, 0xe0, 0x20, 0x0c, },
-}, {
- .plaintext = blake2_ordered_sequence,
- .psize = 64,
- .digest = (u8[]){ 0xdc, 0x66, 0xca, 0x8f, 0x03, 0x86, 0x58, 0x01,
- 0xb0, 0xff, 0xe0, 0x6e, 0xd8, 0xa1, 0xa9, 0x0e, },
-}, {
- .ksize = 16,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 1,
- .digest = (u8[]){ 0x88, 0x1e, 0x42, 0xe7, 0xbb, 0x35, 0x80, 0x82,
- 0x63, 0x7c, 0x0a, 0x0f, 0xd7, 0xec, 0x6c, 0x2f, },
-}, {
- .ksize = 32,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 7,
- .digest = (u8[]){ 0xcf, 0x9e, 0x07, 0x2a, 0xd5, 0x22, 0xf2, 0xcd,
- 0xa2, 0xd8, 0x25, 0x21, 0x80, 0x86, 0x73, 0x1c, },
-}, {
- .ksize = 1,
- .key = "B",
- .plaintext = blake2_ordered_sequence,
- .psize = 15,
- .digest = (u8[]){ 0xf6, 0x33, 0x5a, 0x2c, 0x22, 0xa0, 0x64, 0xb2,
- 0xb6, 0x3f, 0xeb, 0xbc, 0xd1, 0xc3, 0xe5, 0xb2, },
-}, {
- .ksize = 16,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 247,
- .digest = (u8[]){ 0x72, 0x66, 0x49, 0x60, 0xf9, 0x4a, 0xea, 0xbe,
- 0x1f, 0xf4, 0x60, 0xce, 0xb7, 0x81, 0xcb, 0x09, },
-}, {
- .ksize = 32,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 256,
- .digest = (u8[]){ 0xd5, 0xa4, 0x0e, 0xc3, 0x16, 0xc7, 0x51, 0xa6,
- 0x3c, 0xd0, 0xd9, 0x11, 0x57, 0xfa, 0x1e, 0xbb, },
-}};
-
-static const struct hash_testvec blakes2s_160_tv_template[] = {{
- .plaintext = blake2_ordered_sequence,
- .psize = 7,
- .digest = (u8[]){ 0xb4, 0xf2, 0x03, 0x49, 0x37, 0xed, 0xb1, 0x3e,
- 0x5b, 0x2a, 0xca, 0x64, 0x82, 0x74, 0xf6, 0x62,
- 0xe3, 0xf2, 0x84, 0xff, },
-}, {
- .plaintext = blake2_ordered_sequence,
- .psize = 256,
- .digest = (u8[]){ 0xaa, 0x56, 0x9b, 0xdc, 0x98, 0x17, 0x75, 0xf2,
- 0xb3, 0x68, 0x83, 0xb7, 0x9b, 0x8d, 0x48, 0xb1,
- 0x9b, 0x2d, 0x35, 0x05, },
-}, {
- .ksize = 1,
- .key = "B",
- .digest = (u8[]){ 0x50, 0x16, 0xe7, 0x0c, 0x01, 0xd0, 0xd3, 0xc3,
- 0xf4, 0x3e, 0xb1, 0x6e, 0x97, 0xa9, 0x4e, 0xd1,
- 0x79, 0x65, 0x32, 0x93, },
-}, {
- .ksize = 32,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 1,
- .digest = (u8[]){ 0x1c, 0x2b, 0xcd, 0x9a, 0x68, 0xca, 0x8c, 0x71,
- 0x90, 0x29, 0x6c, 0x54, 0xfa, 0x56, 0x4a, 0xef,
- 0xa2, 0x3a, 0x56, 0x9c, },
-}, {
- .ksize = 16,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 15,
- .digest = (u8[]){ 0x36, 0xc3, 0x5f, 0x9a, 0xdc, 0x7e, 0xbf, 0x19,
- 0x68, 0xaa, 0xca, 0xd8, 0x81, 0xbf, 0x09, 0x34,
- 0x83, 0x39, 0x0f, 0x30, },
-}, {
- .ksize = 1,
- .key = "B",
- .plaintext = blake2_ordered_sequence,
- .psize = 64,
- .digest = (u8[]){ 0x86, 0x80, 0x78, 0xa4, 0x14, 0xec, 0x03, 0xe5,
- 0xb6, 0x9a, 0x52, 0x0e, 0x42, 0xee, 0x39, 0x9d,
- 0xac, 0xa6, 0x81, 0x63, },
-}, {
- .ksize = 32,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 247,
- .digest = (u8[]){ 0x2d, 0xd8, 0xd2, 0x53, 0x66, 0xfa, 0xa9, 0x01,
- 0x1c, 0x9c, 0xaf, 0xa3, 0xe2, 0x9d, 0x9b, 0x10,
- 0x0a, 0xf6, 0x73, 0xe8, },
-}};
-
-static const struct hash_testvec blakes2s_224_tv_template[] = {{
- .plaintext = blake2_ordered_sequence,
- .psize = 1,
- .digest = (u8[]){ 0x61, 0xb9, 0x4e, 0xc9, 0x46, 0x22, 0xa3, 0x91,
- 0xd2, 0xae, 0x42, 0xe6, 0x45, 0x6c, 0x90, 0x12,
- 0xd5, 0x80, 0x07, 0x97, 0xb8, 0x86, 0x5a, 0xfc,
- 0x48, 0x21, 0x97, 0xbb, },
-}, {
- .plaintext = blake2_ordered_sequence,
- .psize = 247,
- .digest = (u8[]){ 0x9e, 0xda, 0xc7, 0x20, 0x2c, 0xd8, 0x48, 0x2e,
- 0x31, 0x94, 0xab, 0x46, 0x6d, 0x94, 0xd8, 0xb4,
- 0x69, 0xcd, 0xae, 0x19, 0x6d, 0x9e, 0x41, 0xcc,
- 0x2b, 0xa4, 0xd5, 0xf6, },
-}, {
- .ksize = 16,
- .key = blake2_ordered_sequence,
- .digest = (u8[]){ 0x32, 0xc0, 0xac, 0xf4, 0x3b, 0xd3, 0x07, 0x9f,
- 0xbe, 0xfb, 0xfa, 0x4d, 0x6b, 0x4e, 0x56, 0xb3,
- 0xaa, 0xd3, 0x27, 0xf6, 0x14, 0xbf, 0xb9, 0x32,
- 0xa7, 0x19, 0xfc, 0xb8, },
-}, {
- .ksize = 1,
- .key = "B",
- .plaintext = blake2_ordered_sequence,
- .psize = 7,
- .digest = (u8[]){ 0x73, 0xad, 0x5e, 0x6d, 0xb9, 0x02, 0x8e, 0x76,
- 0xf2, 0x66, 0x42, 0x4b, 0x4c, 0xfa, 0x1f, 0xe6,
- 0x2e, 0x56, 0x40, 0xe5, 0xa2, 0xb0, 0x3c, 0xe8,
- 0x7b, 0x45, 0xfe, 0x05, },
-}, {
- .ksize = 32,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 15,
- .digest = (u8[]){ 0x16, 0x60, 0xfb, 0x92, 0x54, 0xb3, 0x6e, 0x36,
- 0x81, 0xf4, 0x16, 0x41, 0xc3, 0x3d, 0xd3, 0x43,
- 0x84, 0xed, 0x10, 0x6f, 0x65, 0x80, 0x7a, 0x3e,
- 0x25, 0xab, 0xc5, 0x02, },
-}, {
- .ksize = 16,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 64,
- .digest = (u8[]){ 0xca, 0xaa, 0x39, 0x67, 0x9c, 0xf7, 0x6b, 0xc7,
- 0xb6, 0x82, 0xca, 0x0e, 0x65, 0x36, 0x5b, 0x7c,
- 0x24, 0x00, 0xfa, 0x5f, 0xda, 0x06, 0x91, 0x93,
- 0x6a, 0x31, 0x83, 0xb5, },
-}, {
- .ksize = 1,
- .key = "B",
- .plaintext = blake2_ordered_sequence,
- .psize = 256,
- .digest = (u8[]){ 0x90, 0x02, 0x26, 0xb5, 0x06, 0x9c, 0x36, 0x86,
- 0x94, 0x91, 0x90, 0x1e, 0x7d, 0x2a, 0x71, 0xb2,
- 0x48, 0xb5, 0xe8, 0x16, 0xfd, 0x64, 0x33, 0x45,
- 0xb3, 0xd7, 0xec, 0xcc, },
-}};
-
-static const struct hash_testvec blakes2s_256_tv_template[] = {{
- .plaintext = blake2_ordered_sequence,
- .psize = 15,
- .digest = (u8[]){ 0xd9, 0x7c, 0x82, 0x8d, 0x81, 0x82, 0xa7, 0x21,
- 0x80, 0xa0, 0x6a, 0x78, 0x26, 0x83, 0x30, 0x67,
- 0x3f, 0x7c, 0x4e, 0x06, 0x35, 0x94, 0x7c, 0x04,
- 0xc0, 0x23, 0x23, 0xfd, 0x45, 0xc0, 0xa5, 0x2d, },
-}, {
- .ksize = 32,
- .key = blake2_ordered_sequence,
- .digest = (u8[]){ 0x48, 0xa8, 0x99, 0x7d, 0xa4, 0x07, 0x87, 0x6b,
- 0x3d, 0x79, 0xc0, 0xd9, 0x23, 0x25, 0xad, 0x3b,
- 0x89, 0xcb, 0xb7, 0x54, 0xd8, 0x6a, 0xb7, 0x1a,
- 0xee, 0x04, 0x7a, 0xd3, 0x45, 0xfd, 0x2c, 0x49, },
-}, {
- .ksize = 1,
- .key = "B",
- .plaintext = blake2_ordered_sequence,
- .psize = 1,
- .digest = (u8[]){ 0x22, 0x27, 0xae, 0xaa, 0x6e, 0x81, 0x56, 0x03,
- 0xa7, 0xe3, 0xa1, 0x18, 0xa5, 0x9a, 0x2c, 0x18,
- 0xf4, 0x63, 0xbc, 0x16, 0x70, 0xf1, 0xe7, 0x4b,
- 0x00, 0x6d, 0x66, 0x16, 0xae, 0x9e, 0x74, 0x4e, },
-}, {
- .ksize = 16,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 7,
- .digest = (u8[]){ 0x58, 0x5d, 0xa8, 0x60, 0x1c, 0xa4, 0xd8, 0x03,
- 0x86, 0x86, 0x84, 0x64, 0xd7, 0xa0, 0x8e, 0x15,
- 0x2f, 0x05, 0xa2, 0x1b, 0xbc, 0xef, 0x7a, 0x34,
- 0xb3, 0xc5, 0xbc, 0x4b, 0xf0, 0x32, 0xeb, 0x12, },
-}, {
- .ksize = 32,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 64,
- .digest = (u8[]){ 0x89, 0x75, 0xb0, 0x57, 0x7f, 0xd3, 0x55, 0x66,
- 0xd7, 0x50, 0xb3, 0x62, 0xb0, 0x89, 0x7a, 0x26,
- 0xc3, 0x99, 0x13, 0x6d, 0xf0, 0x7b, 0xab, 0xab,
- 0xbd, 0xe6, 0x20, 0x3f, 0xf2, 0x95, 0x4e, 0xd4, },
-}, {
- .ksize = 1,
- .key = "B",
- .plaintext = blake2_ordered_sequence,
- .psize = 247,
- .digest = (u8[]){ 0x2e, 0x74, 0x1c, 0x1d, 0x03, 0xf4, 0x9d, 0x84,
- 0x6f, 0xfc, 0x86, 0x32, 0x92, 0x49, 0x7e, 0x66,
- 0xd7, 0xc3, 0x10, 0x88, 0xfe, 0x28, 0xb3, 0xe0,
- 0xbf, 0x50, 0x75, 0xad, 0x8e, 0xa4, 0xe6, 0xb2, },
-}, {
- .ksize = 16,
- .key = blake2_ordered_sequence,
- .plaintext = blake2_ordered_sequence,
- .psize = 256,
- .digest = (u8[]){ 0xb9, 0xd2, 0x81, 0x0e, 0x3a, 0xb1, 0x62, 0x9b,
- 0xad, 0x44, 0x05, 0xf4, 0x92, 0x2e, 0x99, 0xc1,
- 0x4a, 0x47, 0xbb, 0x5b, 0x6f, 0xb2, 0x96, 0xed,
- 0xd5, 0x06, 0xb5, 0x3a, 0x7c, 0x7a, 0x65, 0x1d, },
-}};
-
#endif /* _CRYPTO_TESTMGR_H */
diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
index fbe0756259c5..c4d4d21391d7 100644
--- a/drivers/acpi/acpi_lpss.c
+++ b/drivers/acpi/acpi_lpss.c
@@ -422,6 +422,9 @@ static int register_device_clock(struct acpi_device *adev,
if (!lpss_clk_dev)
lpt_register_clock_device();

+ if (IS_ERR(lpss_clk_dev))
+ return PTR_ERR(lpss_clk_dev);
+
clk_data = platform_get_drvdata(lpss_clk_dev);
if (!clk_data)
return -ENODEV;
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 95cc2a9f3e05..49b5e317e916 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -546,6 +546,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
!= REGION_INTERSECTS) &&
(region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
!= REGION_INTERSECTS) &&
+ (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_SOFT_RESERVED)
+ != REGION_INTERSECTS) &&
!arch_is_platform_page(base_addr)))
return -EINVAL;

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 6c735cfa7d43..ef7858393a3c 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1373,6 +1373,7 @@ static int __init acpi_init(void)

pci_mmcfg_late_init();
acpi_iort_init();
+ acpi_viot_early_init();
acpi_hest_init();
acpi_ghes_init();
acpi_scan_init();
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index b8e26b6b5523..35d894674eba 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -600,33 +600,6 @@ static int pcc_data_alloc(int pcc_ss_id)
return 0;
}

-/* Check if CPPC revision + num_ent combination is supported */
-static bool is_cppc_supported(int revision, int num_ent)
-{
- int expected_num_ent;
-
- switch (revision) {
- case CPPC_V2_REV:
- expected_num_ent = CPPC_V2_NUM_ENT;
- break;
- case CPPC_V3_REV:
- expected_num_ent = CPPC_V3_NUM_ENT;
- break;
- default:
- pr_debug("Firmware exports unsupported CPPC revision: %d\n",
- revision);
- return false;
- }
-
- if (expected_num_ent != num_ent) {
- pr_debug("Firmware exports %d entries. Expected: %d for CPPC rev:%d\n",
- num_ent, expected_num_ent, revision);
- return false;
- }
-
- return true;
-}
-
/*
* An example CPC table looks like the following.
*
@@ -715,7 +688,6 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
cpc_obj->type, pr->id);
goto out_free;
}
- cpc_ptr->num_entries = num_ent;

/* Second entry should be revision. */
cpc_obj = &out_obj->package.elements[1];
@@ -726,10 +698,32 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
cpc_obj->type, pr->id);
goto out_free;
}
- cpc_ptr->version = cpc_rev;

- if (!is_cppc_supported(cpc_rev, num_ent))
+ if (cpc_rev < CPPC_V2_REV) {
+ pr_debug("Unsupported _CPC Revision (%d) for CPU:%d\n", cpc_rev,
+ pr->id);
+ goto out_free;
+ }
+
+ /*
+	 * Disregard _CPC if the number of entries in the return package is not
+ * as expected, but support future revisions being proper supersets of
+ * the v3 and only causing more entries to be returned by _CPC.
+ */
+ if ((cpc_rev == CPPC_V2_REV && num_ent != CPPC_V2_NUM_ENT) ||
+ (cpc_rev == CPPC_V3_REV && num_ent != CPPC_V3_NUM_ENT) ||
+ (cpc_rev > CPPC_V3_REV && num_ent <= CPPC_V3_NUM_ENT)) {
+ pr_debug("Unexpected number of _CPC return package entries (%d) for CPU:%d\n",
+ num_ent, pr->id);
goto out_free;
+ }
+ if (cpc_rev > CPPC_V3_REV) {
+ num_ent = CPPC_V3_NUM_ENT;
+ cpc_rev = CPPC_V3_REV;
+ }
+
+ cpc_ptr->num_entries = num_ent;
+ cpc_ptr->version = cpc_rev;

/* Iterate through remaining entries in _CPC */
for (i = 2; i < num_ent; i++) {
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index a1b871a418f8..488c9ec0da0b 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -180,7 +180,6 @@ static struct workqueue_struct *ec_wq;
static struct workqueue_struct *ec_query_wq;

static int EC_FLAGS_CORRECT_ECDT; /* Needs ECDT port address correction */
-static int EC_FLAGS_IGNORE_DSDT_GPE; /* Needs ECDT GPE as correction setting */
static int EC_FLAGS_TRUST_DSDT_GPE; /* Needs DSDT GPE as correction setting */
static int EC_FLAGS_CLEAR_ON_RESUME; /* Needs acpi_ec_clear() on boot/resume */

@@ -1407,24 +1406,16 @@ ec_parse_device(acpi_handle handle, u32 Level, void *context, void **retval)
if (ec->data_addr == 0 || ec->command_addr == 0)
return AE_OK;

- if (boot_ec && boot_ec_is_ecdt && EC_FLAGS_IGNORE_DSDT_GPE) {
- /*
- * Always inherit the GPE number setting from the ECDT
- * EC.
- */
- ec->gpe = boot_ec->gpe;
- } else {
- /* Get GPE bit assignment (EC events). */
- /* TODO: Add support for _GPE returning a package */
- status = acpi_evaluate_integer(handle, "_GPE", NULL, &tmp);
- if (ACPI_SUCCESS(status))
- ec->gpe = tmp;
+ /* Get GPE bit assignment (EC events). */
+ /* TODO: Add support for _GPE returning a package */
+ status = acpi_evaluate_integer(handle, "_GPE", NULL, &tmp);
+ if (ACPI_SUCCESS(status))
+ ec->gpe = tmp;
+ /*
+ * Errors are non-fatal, allowing for ACPI Reduced Hardware
+ * platforms which use GpioInt instead of GPE.
+ */

- /*
- * Errors are non-fatal, allowing for ACPI Reduced Hardware
- * platforms which use GpioInt instead of GPE.
- */
- }
/* Use the global lock for all EC transactions? */
tmp = 0;
acpi_evaluate_integer(handle, "_GLK", NULL, &tmp);
@@ -1862,60 +1853,12 @@ static int ec_honor_dsdt_gpe(const struct dmi_system_id *id)
return 0;
}

-/*
- * Some DSDTs contain wrong GPE setting.
- * Asus FX502VD/VE, GL702VMK, X550VXK, X580VD
- * https://bugzilla.kernel.org/show_bug.cgi?id=195651
- */
-static int ec_honor_ecdt_gpe(const struct dmi_system_id *id)
-{
- pr_debug("Detected system needing ignore DSDT GPE setting.\n");
- EC_FLAGS_IGNORE_DSDT_GPE = 1;
- return 0;
-}
-
static const struct dmi_system_id ec_dmi_table[] __initconst = {
{
ec_correct_ecdt, "MSI MS-171F", {
DMI_MATCH(DMI_SYS_VENDOR, "Micro-Star"),
DMI_MATCH(DMI_PRODUCT_NAME, "MS-171F"),}, NULL},
{
- ec_honor_ecdt_gpe, "ASUS FX502VD", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "FX502VD"),}, NULL},
- {
- ec_honor_ecdt_gpe, "ASUS FX502VE", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "FX502VE"),}, NULL},
- {
- ec_honor_ecdt_gpe, "ASUS GL702VMK", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "GL702VMK"),}, NULL},
- {
- ec_honor_ecdt_gpe, "ASUSTeK COMPUTER INC. X505BA", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "X505BA"),}, NULL},
- {
- ec_honor_ecdt_gpe, "ASUSTeK COMPUTER INC. X505BP", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "X505BP"),}, NULL},
- {
- ec_honor_ecdt_gpe, "ASUSTeK COMPUTER INC. X542BA", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "X542BA"),}, NULL},
- {
- ec_honor_ecdt_gpe, "ASUSTeK COMPUTER INC. X542BP", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "X542BP"),}, NULL},
- {
- ec_honor_ecdt_gpe, "ASUS X550VXK", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "X550VXK"),}, NULL},
- {
- ec_honor_ecdt_gpe, "ASUS X580VD", {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_PRODUCT_NAME, "X580VD"),}, NULL},
- {
/* https://bugzilla.kernel.org/show_bug.cgi?id=209989 */
ec_honor_dsdt_gpe, "HP Pavilion Gaming Laptop 15-cx0xxx", {
DMI_MATCH(DMI_SYS_VENDOR, "HP"),
@@ -2207,13 +2150,6 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
DMI_MATCH(DMI_PRODUCT_FAMILY, "Thinkpad X1 Carbon 6th"),
},
},
- {
- .ident = "ThinkPad X1 Carbon 6th",
- .matches = {
- DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_FAMILY, "ThinkPad X1 Carbon 6th"),
- },
- },
{
.ident = "ThinkPad X1 Yoga 3rd",
.matches = {
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index eb95e188d62b..573ba7f617e8 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -602,7 +602,7 @@ static DEFINE_RAW_SPINLOCK(c3_lock);
* @cx: Target state context
* @index: index of target state
*/
-static int acpi_idle_enter_bm(struct cpuidle_driver *drv,
+static int __cpuidle acpi_idle_enter_bm(struct cpuidle_driver *drv,
struct acpi_processor *pr,
struct acpi_processor_cx *cx,
int index)
@@ -659,7 +659,7 @@ static int acpi_idle_enter_bm(struct cpuidle_driver *drv,
return index;
}

-static int acpi_idle_enter(struct cpuidle_device *dev,
+static int __cpuidle acpi_idle_enter(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
@@ -688,7 +688,7 @@ static int acpi_idle_enter(struct cpuidle_device *dev,
return index;
}

-static int acpi_idle_enter_s2idle(struct cpuidle_device *dev,
+static int __cpuidle acpi_idle_enter_s2idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 3147702710af..1ec3238e2cdc 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -360,6 +360,14 @@ static const struct dmi_system_id acpisleep_dmi_table[] __initconst = {
DMI_MATCH(DMI_PRODUCT_NAME, "80E3"),
},
},
+ {
+ .callback = init_nvs_save_s3,
+ .ident = "Lenovo G40-45",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "80E1"),
+ },
+ },
/*
* ThinkPad X1 Tablet(2016) cannot do suspend-to-idle using
* the Low Power S0 Idle firmware interface (see
diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
index 6615f59ab7fd..5d7f38016a24 100644
--- a/drivers/acpi/video_detect.c
+++ b/drivers/acpi/video_detect.c
@@ -347,6 +347,14 @@ static const struct dmi_system_id video_detect_dmi_table[] = {
DMI_MATCH(DMI_PRODUCT_NAME, "MacBookPro12,1"),
},
},
+ {
+ .callback = video_detect_force_native,
+ /* Dell Inspiron N4010 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron N4010"),
+ },
+ },
{
.callback = video_detect_force_native,
/* Dell Vostro V131 */
diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
index d2256326c73a..647f11cf165d 100644
--- a/drivers/acpi/viot.c
+++ b/drivers/acpi/viot.c
@@ -248,6 +248,26 @@ static int __init viot_parse_node(const struct acpi_viot_header *hdr)
return ret;
}

+/**
+ * acpi_viot_early_init - Test the presence of VIOT and enable ACS
+ *
+ * If the VIOT does exist, ACS must be enabled. This cannot be
+ * done in acpi_viot_init(), which is called after the bus scan.
+ */
+void __init acpi_viot_early_init(void)
+{
+#ifdef CONFIG_PCI
+ acpi_status status;
+ struct acpi_table_header *hdr;
+
+ status = acpi_get_table(ACPI_SIG_VIOT, 0, &hdr);
+ if (ACPI_FAILURE(status))
+ return;
+ pci_request_acs();
+ acpi_put_table(hdr);
+#endif
+}
+
/**
* acpi_viot_init - Parse the VIOT table
*
@@ -319,12 +339,6 @@ static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data)
epid = ((domain_nr - ep->segment_start) << 16) +
dev_id - ep->bdf_start + ep->endpoint_id;

- /*
- * If we found a PCI range managed by the viommu, we're
- * the one that has to request ACS.
- */
- pci_request_acs();
-
return viot_dev_iommu_init(&pdev->dev, ep->viommu,
epid);
}
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index f3b639e89dd8..5243fe0eb402 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -170,8 +170,32 @@ static inline void binder_stats_created(enum binder_stat_types type)
atomic_inc(&binder_stats.obj_created[type]);
}

-struct binder_transaction_log binder_transaction_log;
-struct binder_transaction_log binder_transaction_log_failed;
+struct binder_transaction_log_entry {
+ int debug_id;
+ int debug_id_done;
+ int call_type;
+ int from_proc;
+ int from_thread;
+ int target_handle;
+ int to_proc;
+ int to_thread;
+ int to_node;
+ int data_size;
+ int offsets_size;
+ int return_error_line;
+ uint32_t return_error;
+ uint32_t return_error_param;
+ char context_name[BINDERFS_MAX_NAME + 1];
+};
+
+struct binder_transaction_log {
+ atomic_t cur;
+ bool full;
+ struct binder_transaction_log_entry entry[32];
+};
+
+static struct binder_transaction_log binder_transaction_log;
+static struct binder_transaction_log binder_transaction_log_failed;

static struct binder_transaction_log_entry *binder_transaction_log_add(
struct binder_transaction_log *log)
@@ -6084,8 +6108,7 @@ static void print_binder_proc_stats(struct seq_file *m,
print_binder_stats(m, " ", &proc->stats);
}

-
-int binder_state_show(struct seq_file *m, void *unused)
+static int state_show(struct seq_file *m, void *unused)
{
struct binder_proc *proc;
struct binder_node *node;
@@ -6124,7 +6147,7 @@ int binder_state_show(struct seq_file *m, void *unused)
return 0;
}

-int binder_stats_show(struct seq_file *m, void *unused)
+static int stats_show(struct seq_file *m, void *unused)
{
struct binder_proc *proc;

@@ -6140,7 +6163,7 @@ int binder_stats_show(struct seq_file *m, void *unused)
return 0;
}

-int binder_transactions_show(struct seq_file *m, void *unused)
+static int transactions_show(struct seq_file *m, void *unused)
{
struct binder_proc *proc;

@@ -6196,7 +6219,7 @@ static void print_binder_transaction_log_entry(struct seq_file *m,
"\n" : " (incomplete)\n");
}

-int binder_transaction_log_show(struct seq_file *m, void *unused)
+static int transaction_log_show(struct seq_file *m, void *unused)
{
struct binder_transaction_log *log = m->private;
unsigned int log_cur = atomic_read(&log->cur);
@@ -6228,6 +6251,45 @@ const struct file_operations binder_fops = {
.release = binder_release,
};

+DEFINE_SHOW_ATTRIBUTE(state);
+DEFINE_SHOW_ATTRIBUTE(stats);
+DEFINE_SHOW_ATTRIBUTE(transactions);
+DEFINE_SHOW_ATTRIBUTE(transaction_log);
+
+const struct binder_debugfs_entry binder_debugfs_entries[] = {
+ {
+ .name = "state",
+ .mode = 0444,
+ .fops = &state_fops,
+ .data = NULL,
+ },
+ {
+ .name = "stats",
+ .mode = 0444,
+ .fops = &stats_fops,
+ .data = NULL,
+ },
+ {
+ .name = "transactions",
+ .mode = 0444,
+ .fops = &transactions_fops,
+ .data = NULL,
+ },
+ {
+ .name = "transaction_log",
+ .mode = 0444,
+ .fops = &transaction_log_fops,
+ .data = &binder_transaction_log,
+ },
+ {
+ .name = "failed_transaction_log",
+ .mode = 0444,
+ .fops = &transaction_log_fops,
+ .data = &binder_transaction_log_failed,
+ },
+ {} /* terminator */
+};
+
static int __init init_binder_device(const char *name)
{
int ret;
@@ -6273,36 +6335,18 @@ static int __init binder_init(void)
atomic_set(&binder_transaction_log_failed.cur, ~0U);

binder_debugfs_dir_entry_root = debugfs_create_dir("binder", NULL);
- if (binder_debugfs_dir_entry_root)
+ if (binder_debugfs_dir_entry_root) {
+ const struct binder_debugfs_entry *db_entry;
+
+ binder_for_each_debugfs_entry(db_entry)
+ debugfs_create_file(db_entry->name,
+ db_entry->mode,
+ binder_debugfs_dir_entry_root,
+ db_entry->data,
+ db_entry->fops);
+
binder_debugfs_dir_entry_proc = debugfs_create_dir("proc",
binder_debugfs_dir_entry_root);
-
- if (binder_debugfs_dir_entry_root) {
- debugfs_create_file("state",
- 0444,
- binder_debugfs_dir_entry_root,
- NULL,
- &binder_state_fops);
- debugfs_create_file("stats",
- 0444,
- binder_debugfs_dir_entry_root,
- NULL,
- &binder_stats_fops);
- debugfs_create_file("transactions",
- 0444,
- binder_debugfs_dir_entry_root,
- NULL,
- &binder_transactions_fops);
- debugfs_create_file("transaction_log",
- 0444,
- binder_debugfs_dir_entry_root,
- &binder_transaction_log,
- &binder_transaction_log_fops);
- debugfs_create_file("failed_transaction_log",
- 0444,
- binder_debugfs_dir_entry_root,
- &binder_transaction_log_failed,
- &binder_transaction_log_fops);
}

if (!IS_ENABLED(CONFIG_ANDROID_BINDERFS) &&
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 2ac1008a5f39..a93a7d2a8caf 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -213,7 +213,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,

if (mm) {
mmap_read_lock(mm);
- vma = alloc->vma;
+ vma = vma_lookup(mm, alloc->vma_addr);
}

if (!vma && need_mm) {
@@ -313,16 +313,15 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
struct vm_area_struct *vma)
{
- if (vma)
+ unsigned long vm_start = 0;
+
+ if (vma) {
+ vm_start = vma->vm_start;
alloc->vma_vm_mm = vma->vm_mm;
- /*
- * If we see alloc->vma is not NULL, buffer data structures set up
- * completely. Look at smp_rmb side binder_alloc_get_vma.
- * We also want to guarantee new alloc->vma_vm_mm is always visible
- * if alloc->vma is set.
- */
- smp_wmb();
- alloc->vma = vma;
+ }
+
+ mmap_assert_write_locked(alloc->vma_vm_mm);
+ alloc->vma_addr = vm_start;
}

static inline struct vm_area_struct *binder_alloc_get_vma(
@@ -330,11 +329,9 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
{
struct vm_area_struct *vma = NULL;

- if (alloc->vma) {
- /* Look at description in binder_alloc_set_vma */
- smp_rmb();
- vma = alloc->vma;
- }
+ if (alloc->vma_addr)
+ vma = vma_lookup(alloc->vma_vm_mm, alloc->vma_addr);
+
return vma;
}

@@ -817,7 +814,8 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)

buffers = 0;
mutex_lock(&alloc->mutex);
- BUG_ON(alloc->vma);
+ BUG_ON(alloc->vma_addr &&
+ vma_lookup(alloc->vma_vm_mm, alloc->vma_addr));

while ((n = rb_first(&alloc->allocated_buffers))) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 7dea57a84c79..1e4fd37af5e0 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -100,7 +100,7 @@ struct binder_lru_page {
*/
struct binder_alloc {
struct mutex mutex;
- struct vm_area_struct *vma;
+ unsigned long vma_addr;
struct mm_struct *vma_vm_mm;
void __user *buffer;
struct list_head buffers;
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index c2b323bc3b3a..43a881073a42 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -287,7 +287,7 @@ void binder_selftest_alloc(struct binder_alloc *alloc)
if (!binder_selftest_run)
return;
mutex_lock(&binder_selftest_lock);
- if (!binder_selftest_run || !alloc->vma)
+ if (!binder_selftest_run || !alloc->vma_addr)
goto done;
pr_info("STARTED\n");
binder_selftest_alloc_offset(alloc, end_offset, 0);
diff --git a/drivers/android/binder_internal.h b/drivers/android/binder_internal.h
index d6b6b8cb7346..1ade9799c8d5 100644
--- a/drivers/android/binder_internal.h
+++ b/drivers/android/binder_internal.h
@@ -107,41 +107,19 @@ static inline int __init init_binderfs(void)
}
#endif

-int binder_stats_show(struct seq_file *m, void *unused);
-DEFINE_SHOW_ATTRIBUTE(binder_stats);
-
-int binder_state_show(struct seq_file *m, void *unused);
-DEFINE_SHOW_ATTRIBUTE(binder_state);
-
-int binder_transactions_show(struct seq_file *m, void *unused);
-DEFINE_SHOW_ATTRIBUTE(binder_transactions);
-
-int binder_transaction_log_show(struct seq_file *m, void *unused);
-DEFINE_SHOW_ATTRIBUTE(binder_transaction_log);
-
-struct binder_transaction_log_entry {
- int debug_id;
- int debug_id_done;
- int call_type;
- int from_proc;
- int from_thread;
- int target_handle;
- int to_proc;
- int to_thread;
- int to_node;
- int data_size;
- int offsets_size;
- int return_error_line;
- uint32_t return_error;
- uint32_t return_error_param;
- char context_name[BINDERFS_MAX_NAME + 1];
+struct binder_debugfs_entry {
+ const char *name;
+ umode_t mode;
+ const struct file_operations *fops;
+ void *data;
};

-struct binder_transaction_log {
- atomic_t cur;
- bool full;
- struct binder_transaction_log_entry entry[32];
-};
+extern const struct binder_debugfs_entry binder_debugfs_entries[];
+
+#define binder_for_each_debugfs_entry(entry) \
+ for ((entry) = binder_debugfs_entries; \
+ (entry)->name; \
+ (entry)++)

enum binder_stat_types {
BINDER_STAT_PROC,
@@ -575,6 +553,4 @@ struct binder_object {
};
};

-extern struct binder_transaction_log binder_transaction_log;
-extern struct binder_transaction_log binder_transaction_log_failed;
#endif /* _LINUX_BINDER_INTERNAL_H */
diff --git a/drivers/android/binderfs.c b/drivers/android/binderfs.c
index e3605cdd4335..6d717ed76766 100644
--- a/drivers/android/binderfs.c
+++ b/drivers/android/binderfs.c
@@ -621,6 +621,7 @@ static int init_binder_features(struct super_block *sb)
static int init_binder_logs(struct super_block *sb)
{
struct dentry *binder_logs_root_dir, *dentry, *proc_log_dir;
+ const struct binder_debugfs_entry *db_entry;
struct binderfs_info *info;
int ret = 0;

@@ -631,43 +632,15 @@ static int init_binder_logs(struct super_block *sb)
goto out;
}

- dentry = binderfs_create_file(binder_logs_root_dir, "stats",
- &binder_stats_fops, NULL);
- if (IS_ERR(dentry)) {
- ret = PTR_ERR(dentry);
- goto out;
- }
-
- dentry = binderfs_create_file(binder_logs_root_dir, "state",
- &binder_state_fops, NULL);
- if (IS_ERR(dentry)) {
- ret = PTR_ERR(dentry);
- goto out;
- }
-
- dentry = binderfs_create_file(binder_logs_root_dir, "transactions",
- &binder_transactions_fops, NULL);
- if (IS_ERR(dentry)) {
- ret = PTR_ERR(dentry);
- goto out;
- }
-
- dentry = binderfs_create_file(binder_logs_root_dir,
- "transaction_log",
- &binder_transaction_log_fops,
- &binder_transaction_log);
- if (IS_ERR(dentry)) {
- ret = PTR_ERR(dentry);
- goto out;
- }
-
- dentry = binderfs_create_file(binder_logs_root_dir,
- "failed_transaction_log",
- &binder_transaction_log_fops,
- &binder_transaction_log_failed);
- if (IS_ERR(dentry)) {
- ret = PTR_ERR(dentry);
- goto out;
+ binder_for_each_debugfs_entry(db_entry) {
+ dentry = binderfs_create_file(binder_logs_root_dir,
+ db_entry->name,
+ db_entry->fops,
+ db_entry->data);
+ if (IS_ERR(dentry)) {
+ ret = PTR_ERR(dentry);
+ goto out;
+ }
}

proc_log_dir = binderfs_create_dir(binder_logs_root_dir, "proc");
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index d6980f33afc4..822dfa6561c2 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -1091,6 +1091,7 @@ static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
static int __driver_attach(struct device *dev, void *data)
{
struct device_driver *drv = data;
+ bool async = false;
int ret;

/*
@@ -1129,9 +1130,11 @@ static int __driver_attach(struct device *dev, void *data)
if (!dev->driver) {
get_device(dev);
dev->p->async_driver = drv;
- async_schedule_dev(__driver_attach_async_helper, dev);
+ async = true;
}
device_unlock(dev);
+ if (async)
+ async_schedule_dev(__driver_attach_async_helper, dev);
return 0;
}

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ac6376ef7a1..eb0f43784c2b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -45,7 +45,7 @@ static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj,
return n;
}

-static BIN_ATTR_RO(cpumap, 0);
+static BIN_ATTR_RO(cpumap, CPUMAP_FILE_MAX_BYTES);

static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
@@ -66,7 +66,7 @@ static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
return n;
}

-static BIN_ATTR_RO(cpulist, 0);
+static BIN_ATTR_RO(cpulist, CPULIST_FILE_MAX_BYTES);

/**
* struct node_access_nodes - Access class device to hold user visible
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index f0e4b0ea93e8..59a706a126e5 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -219,6 +219,9 @@ static void genpd_debug_remove(struct generic_pm_domain *genpd)
{
struct dentry *d;

+ if (!genpd_debugfs_dir)
+ return;
+
d = debugfs_lookup(genpd->name, genpd_debugfs_dir);
debugfs_remove(d);
}
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index ac6ad9ab67f9..89f98be5c5b9 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -62,47 +62,47 @@ define_id_show_func(ppin, "0x%llx");
static DEVICE_ATTR_ADMIN_RO(ppin);

define_siblings_read_func(thread_siblings, sibling_cpumask);
-static BIN_ATTR_RO(thread_siblings, 0);
-static BIN_ATTR_RO(thread_siblings_list, 0);
+static BIN_ATTR_RO(thread_siblings, CPUMAP_FILE_MAX_BYTES);
+static BIN_ATTR_RO(thread_siblings_list, CPULIST_FILE_MAX_BYTES);

define_siblings_read_func(core_cpus, sibling_cpumask);
-static BIN_ATTR_RO(core_cpus, 0);
-static BIN_ATTR_RO(core_cpus_list, 0);
+static BIN_ATTR_RO(core_cpus, CPUMAP_FILE_MAX_BYTES);
+static BIN_ATTR_RO(core_cpus_list, CPULIST_FILE_MAX_BYTES);

define_siblings_read_func(core_siblings, core_cpumask);
-static BIN_ATTR_RO(core_siblings, 0);
-static BIN_ATTR_RO(core_siblings_list, 0);
+static BIN_ATTR_RO(core_siblings, CPUMAP_FILE_MAX_BYTES);
+static BIN_ATTR_RO(core_siblings_list, CPULIST_FILE_MAX_BYTES);

#ifdef TOPOLOGY_CLUSTER_SYSFS
define_siblings_read_func(cluster_cpus, cluster_cpumask);
-static BIN_ATTR_RO(cluster_cpus, 0);
-static BIN_ATTR_RO(cluster_cpus_list, 0);
+static BIN_ATTR_RO(cluster_cpus, CPUMAP_FILE_MAX_BYTES);
+static BIN_ATTR_RO(cluster_cpus_list, CPULIST_FILE_MAX_BYTES);
#endif

#ifdef TOPOLOGY_DIE_SYSFS
define_siblings_read_func(die_cpus, die_cpumask);
-static BIN_ATTR_RO(die_cpus, 0);
-static BIN_ATTR_RO(die_cpus_list, 0);
+static BIN_ATTR_RO(die_cpus, CPUMAP_FILE_MAX_BYTES);
+static BIN_ATTR_RO(die_cpus_list, CPULIST_FILE_MAX_BYTES);
#endif

define_siblings_read_func(package_cpus, core_cpumask);
-static BIN_ATTR_RO(package_cpus, 0);
-static BIN_ATTR_RO(package_cpus_list, 0);
+static BIN_ATTR_RO(package_cpus, CPUMAP_FILE_MAX_BYTES);
+static BIN_ATTR_RO(package_cpus_list, CPULIST_FILE_MAX_BYTES);

#ifdef TOPOLOGY_BOOK_SYSFS
define_id_show_func(book_id, "%d");
static DEVICE_ATTR_RO(book_id);
define_siblings_read_func(book_siblings, book_cpumask);
-static BIN_ATTR_RO(book_siblings, 0);
-static BIN_ATTR_RO(book_siblings_list, 0);
+static BIN_ATTR_RO(book_siblings, CPUMAP_FILE_MAX_BYTES);
+static BIN_ATTR_RO(book_siblings_list, CPULIST_FILE_MAX_BYTES);
#endif

#ifdef TOPOLOGY_DRAWER_SYSFS
define_id_show_func(drawer_id, "%d");
static DEVICE_ATTR_RO(drawer_id);
define_siblings_read_func(drawer_siblings, drawer_cpumask);
-static BIN_ATTR_RO(drawer_siblings, 0);
-static BIN_ATTR_RO(drawer_siblings_list, 0);
+static BIN_ATTR_RO(drawer_siblings, CPUMAP_FILE_MAX_BYTES);
+static BIN_ATTR_RO(drawer_siblings_list, CPULIST_FILE_MAX_BYTES);
#endif

static struct bin_attribute *bin_attrs[] = {
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index c441a4972064..3e59e824ed1f 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -2044,8 +2044,13 @@ static int null_add_dev(struct nullb_device *dev)
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);

mutex_lock(&lock);
- nullb->index = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
- dev->index = nullb->index;
+ rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
+ if (rv < 0) {
+ mutex_unlock(&lock);
+ goto out_cleanup_zone;
+ }
+ nullb->index = rv;
+ dev->index = rv;
mutex_unlock(&lock);

blk_queue_logical_block_size(nullb->q, dev->blocksize);
@@ -2065,13 +2070,16 @@ static int null_add_dev(struct nullb_device *dev)

rv = null_gendisk_register(nullb);
if (rv)
- goto out_cleanup_zone;
+ goto out_ida_free;

mutex_lock(&lock);
list_add_tail(&nullb->list, &nullb_list);
mutex_unlock(&lock);

return 0;
+
+out_ida_free:
+ ida_free(&nullb_indexes, nullb->index);
out_cleanup_zone:
null_free_zoned_dev(dev);
out_cleanup_disk:
diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index f04df6294650..963df70ea026 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -323,10 +323,11 @@ void rnbd_srv_sess_dev_force_close(struct rnbd_srv_sess_dev *sess_dev,
{
struct rnbd_srv_session *sess = sess_dev->sess;

- sess_dev->keep_id = true;
/* It is already started to close by client's close message. */
if (!mutex_trylock(&sess->lock))
return;
+
+ sess_dev->keep_id = true;
/* first remove sysfs itself to avoid deadlock */
sysfs_remove_file_self(&sess_dev->kobj, &attr->attr);
rnbd_srv_destroy_dev_session_sysfs(sess_dev);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index f09040435e2e..7e0d1eb29d8f 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -157,6 +157,11 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
return 0;
}

+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent, "Enables the persistent grants feature");
+
static struct xen_blkif *xen_blkif_alloc(domid_t domid)
{
struct xen_blkif *blkif;
@@ -472,12 +477,6 @@ static void xen_vbd_free(struct xen_vbd *vbd)
vbd->bdev = NULL;
}

-/* Enable the persistent grants feature. */
-static bool feature_persistent = true;
-module_param(feature_persistent, bool, 0644);
-MODULE_PARM_DESC(feature_persistent,
- "Enables the persistent grants feature");
-
static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
unsigned major, unsigned minor, int readonly,
int cdrom)
@@ -523,8 +522,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
if (q && blk_queue_secure_erase(q))
vbd->discard_secure = true;

- vbd->feature_gnt_persistent = feature_persistent;
-
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
@@ -1091,10 +1088,9 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
- if (blkif->vbd.feature_gnt_persistent)
- blkif->vbd.feature_gnt_persistent =
- xenbus_read_unsigned(dev->otherend,
- "feature-persistent", 0);
+
+ blkif->vbd.feature_gnt_persistent = feature_persistent &&
+ xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);

blkif->vbd.overflow_max_grants = 0;

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index cf9cfc40a028..6aa1bcca8645 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2011,8 +2011,6 @@ static int blkfront_probe(struct xenbus_device *dev,
info->vdevice = vdevice;
info->connected = BLKIF_STATE_DISCONNECTED;

- info->feature_persistent = feature_persistent;
-
/* Front end dir is a number, which is used as the id. */
info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
dev_set_drvdata(&dev->dev, info);
@@ -2306,7 +2304,7 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
blkfront_setup_discard(info);

- if (info->feature_persistent)
+ if (feature_persistent)
info->feature_persistent =
!!xenbus_read_unsigned(info->xbdev->otherend,
"feature-persistent", 0);
diff --git a/drivers/bluetooth/hci_intel.c b/drivers/bluetooth/hci_intel.c
index 7249b91d9b91..78afb9a348e7 100644
--- a/drivers/bluetooth/hci_intel.c
+++ b/drivers/bluetooth/hci_intel.c
@@ -1217,7 +1217,11 @@ static struct platform_driver intel_driver = {

int __init intel_init(void)
{
- platform_driver_register(&intel_driver);
+ int err;
+
+ err = platform_driver_register(&intel_driver);
+ if (err)
+ return err;

return hci_uart_register_proto(&intel_proto);
}
diff --git a/drivers/bluetooth/hci_serdev.c b/drivers/bluetooth/hci_serdev.c
index 4cda890ce647..c0e5f42ec6b7 100644
--- a/drivers/bluetooth/hci_serdev.c
+++ b/drivers/bluetooth/hci_serdev.c
@@ -231,6 +231,15 @@ static int hci_uart_setup(struct hci_dev *hdev)
return 0;
}

+/* Check if the device is wakeable */
+static bool hci_uart_wakeup(struct hci_dev *hdev)
+{
+ /* HCI UART devices are assumed to be wakeable by default.
+ * Implement wakeup callback to override this behavior.
+ */
+ return true;
+}
+
/** hci_uart_write_wakeup - transmit buffer wakeup
* @serdev: serial device
*
@@ -342,6 +351,8 @@ int hci_uart_register_device(struct hci_uart *hu,
hdev->flush = hci_uart_flush;
hdev->send = hci_uart_send_frame;
hdev->setup = hci_uart_setup;
+ if (!hdev->wakeup)
+ hdev->wakeup = hci_uart_wakeup;
SET_HCIDEV_DEV(hdev, &hu->serdev->dev);

if (test_bit(HCI_UART_NO_SUSPEND_NOTIFIER, &hu->flags))
diff --git a/drivers/bus/hisi_lpc.c b/drivers/bus/hisi_lpc.c
index 378f5d62a991..e7eaa8784fee 100644
--- a/drivers/bus/hisi_lpc.c
+++ b/drivers/bus/hisi_lpc.c
@@ -503,13 +503,13 @@ static int hisi_lpc_acpi_probe(struct device *hostdev)
{
struct acpi_device *adev = ACPI_COMPANION(hostdev);
struct acpi_device *child;
+ struct platform_device *pdev;
int ret;

/* Only consider the children of the host */
list_for_each_entry(child, &adev->children, node) {
const char *hid = acpi_device_hid(child);
const struct hisi_lpc_acpi_cell *cell;
- struct platform_device *pdev;
const struct resource *res;
bool found = false;
int num_res;
@@ -571,22 +571,24 @@ static int hisi_lpc_acpi_probe(struct device *hostdev)

ret = platform_device_add_resources(pdev, res, num_res);
if (ret)
- goto fail;
+ goto fail_put_device;

ret = platform_device_add_data(pdev, cell->pdata,
cell->pdata_size);
if (ret)
- goto fail;
+ goto fail_put_device;

ret = platform_device_add(pdev);
if (ret)
- goto fail;
+ goto fail_put_device;

acpi_device_set_enumerated(child);
}

return 0;

+fail_put_device:
+ platform_device_put(pdev);
fail:
hisi_lpc_acpi_remove(hostdev);
return ret;
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index 04a3e23a4afc..4419593d9531 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -752,6 +752,12 @@ int tpm2_auto_startup(struct tpm_chip *chip)
}

rc = tpm2_get_cc_attrs_tbl(chip);
+ if (rc == TPM2_RC_FAILURE || (rc < 0 && rc != -ENOMEM)) {
+ dev_info(&chip->dev,
+ "TPM in field failure mode, requires firmware upgrade\n");
+ chip->flags |= TPM_CHIP_FLAG_FIRMWARE_UPGRADE;
+ rc = 0;
+ }

out:
if (rc == TPM2_RC_UPGRADE) {
diff --git a/drivers/clk/imx/clk-fracn-gppll.c b/drivers/clk/imx/clk-fracn-gppll.c
index 71c102d950ab..025b73229cdd 100644
--- a/drivers/clk/imx/clk-fracn-gppll.c
+++ b/drivers/clk/imx/clk-fracn-gppll.c
@@ -64,10 +64,10 @@ struct clk_fracn_gppll {
* Fout = Fvco / (rdiv * odiv)
*/
static const struct imx_fracn_gppll_rate_table fracn_tbl[] = {
- PLL_FRACN_GP(650000000U, 81, 0, 0, 0, 3),
- PLL_FRACN_GP(594000000U, 198, 0, 0, 0, 8),
- PLL_FRACN_GP(560000000U, 70, 0, 0, 0, 3),
- PLL_FRACN_GP(400000000U, 50, 0, 0, 0, 3),
+ PLL_FRACN_GP(650000000U, 81, 0, 1, 0, 3),
+ PLL_FRACN_GP(594000000U, 198, 0, 1, 0, 8),
+ PLL_FRACN_GP(560000000U, 70, 0, 1, 0, 3),
+ PLL_FRACN_GP(400000000U, 50, 0, 1, 0, 3),
PLL_FRACN_GP(393216000U, 81, 92, 100, 0, 5)
};

@@ -131,18 +131,7 @@ static unsigned long clk_fracn_gppll_recalc_rate(struct clk_hw *hw, unsigned lon
mfi = FIELD_GET(PLL_MFI_MASK, pll_div);

rdiv = FIELD_GET(PLL_RDIV_MASK, pll_div);
- rdiv = rdiv + 1;
odiv = FIELD_GET(PLL_ODIV_MASK, pll_div);
- switch (odiv) {
- case 0:
- odiv = 2;
- break;
- case 1:
- odiv = 3;
- break;
- default:
- break;
- }

/*
* Sometimes, the recalculated rate has deviation due to
@@ -160,6 +149,20 @@ static unsigned long clk_fracn_gppll_recalc_rate(struct clk_hw *hw, unsigned lon
if (rate)
return (unsigned long)rate;

+ if (!rdiv)
+ rdiv = rdiv + 1;
+
+ switch (odiv) {
+ case 0:
+ odiv = 2;
+ break;
+ case 1:
+ odiv = 3;
+ break;
+ default:
+ break;
+ }
+
/* Fvco = Fref * (MFI + MFN / MFD) */
fvco = fvco * mfi * mfd + fvco * mfn;
do_div(fvco, mfd * rdiv * odiv);
diff --git a/drivers/clk/imx/clk-imx93.c b/drivers/clk/imx/clk-imx93.c
index edcc87661d1f..26885bd3971c 100644
--- a/drivers/clk/imx/clk-imx93.c
+++ b/drivers/clk/imx/clk-imx93.c
@@ -150,7 +150,7 @@ static const struct imx93_clk_ccgr {
{ IMX93_CLK_A55_GATE, "a55", "a55_root", 0x8000, },
/* M33 critical clk for system run */
{ IMX93_CLK_CM33_GATE, "cm33", "m33_root", 0x8040, CLK_IS_CRITICAL },
- { IMX93_CLK_ADC1_GATE, "adc1", "osc_24m", 0x82c0, },
+ { IMX93_CLK_ADC1_GATE, "adc1", "adc_root", 0x82c0, },
{ IMX93_CLK_WDOG1_GATE, "wdog1", "osc_24m", 0x8300, },
{ IMX93_CLK_WDOG2_GATE, "wdog2", "osc_24m", 0x8340, },
{ IMX93_CLK_WDOG3_GATE, "wdog3", "osc_24m", 0x8380, },
@@ -219,7 +219,7 @@ static const struct imx93_clk_ccgr {
{ IMX93_CLK_LCDIF_GATE, "lcdif", "media_apb_root", 0x9640, },
{ IMX93_CLK_PXP_GATE, "pxp", "media_apb_root", 0x9680, },
{ IMX93_CLK_ISI_GATE, "isi", "media_apb_root", 0x96c0, },
- { IMX93_CLK_NIC_MEDIA_GATE, "nic_media", "media_apb_root", 0x9700, },
+ { IMX93_CLK_NIC_MEDIA_GATE, "nic_media", "media_axi_root", 0x9700, },
{ IMX93_CLK_USB_CONTROLLER_GATE, "usb_controller", "hsio_root", 0x9a00, },
{ IMX93_CLK_USB_TEST_60M_GATE, "usb_test_60m", "hsio_usb_test_60m_root", 0x9a40, },
{ IMX93_CLK_HSIO_TROUT_24M_GATE, "hsio_trout_24m", "osc_24m", 0x9a80, },
diff --git a/drivers/clk/mediatek/reset.c b/drivers/clk/mediatek/reset.c
index bcec4b89f449..834d26e9bdfd 100644
--- a/drivers/clk/mediatek/reset.c
+++ b/drivers/clk/mediatek/reset.c
@@ -25,7 +25,7 @@ static int mtk_reset_assert_set_clr(struct reset_controller_dev *rcdev,
struct mtk_reset *data = container_of(rcdev, struct mtk_reset, rcdev);
unsigned int reg = data->regofs + ((id / 32) << 4);

- return regmap_write(data->regmap, reg, 1);
+ return regmap_write(data->regmap, reg, BIT(id % 32));
}

static int mtk_reset_deassert_set_clr(struct reset_controller_dev *rcdev,
@@ -34,7 +34,7 @@ static int mtk_reset_deassert_set_clr(struct reset_controller_dev *rcdev,
struct mtk_reset *data = container_of(rcdev, struct mtk_reset, rcdev);
unsigned int reg = data->regofs + ((id / 32) << 4) + 0x4;

- return regmap_write(data->regmap, reg, 1);
+ return regmap_write(data->regmap, reg, BIT(id % 32));
}

static int mtk_reset_assert(struct reset_controller_dev *rcdev,
diff --git a/drivers/clk/qcom/camcc-sdm845.c b/drivers/clk/qcom/camcc-sdm845.c
index be3f95326965..27d44188a7ab 100644
--- a/drivers/clk/qcom/camcc-sdm845.c
+++ b/drivers/clk/qcom/camcc-sdm845.c
@@ -1534,6 +1534,8 @@ static struct clk_branch cam_cc_sys_tmr_clk = {
},
};

+static struct gdsc titan_top_gdsc;
+
static struct gdsc bps_gdsc = {
.gdscr = 0x6004,
.pd = {
@@ -1567,6 +1569,7 @@ static struct gdsc ife_0_gdsc = {
.name = "ife_0_gdsc",
},
.flags = POLL_CFG_GDSCR,
+ .parent = &titan_top_gdsc.pd,
.pwrsts = PWRSTS_OFF_ON,
};

@@ -1576,6 +1579,7 @@ static struct gdsc ife_1_gdsc = {
.name = "ife_1_gdsc",
},
.flags = POLL_CFG_GDSCR,
+ .parent = &titan_top_gdsc.pd,
.pwrsts = PWRSTS_OFF_ON,
};

diff --git a/drivers/clk/qcom/camcc-sm8250.c b/drivers/clk/qcom/camcc-sm8250.c
index 439eaafdcc86..9b32c56a5bc5 100644
--- a/drivers/clk/qcom/camcc-sm8250.c
+++ b/drivers/clk/qcom/camcc-sm8250.c
@@ -2205,6 +2205,8 @@ static struct clk_branch cam_cc_sleep_clk = {
},
};

+static struct gdsc titan_top_gdsc;
+
static struct gdsc bps_gdsc = {
.gdscr = 0x7004,
.pd = {
@@ -2238,6 +2240,7 @@ static struct gdsc ife_0_gdsc = {
.name = "ife_0_gdsc",
},
.flags = POLL_CFG_GDSCR,
+ .parent = &titan_top_gdsc.pd,
.pwrsts = PWRSTS_OFF_ON,
};

@@ -2247,6 +2250,7 @@ static struct gdsc ife_1_gdsc = {
.name = "ife_1_gdsc",
},
.flags = POLL_CFG_GDSCR,
+ .parent = &titan_top_gdsc.pd,
.pwrsts = PWRSTS_OFF_ON,
};

@@ -2440,17 +2444,7 @@ static struct platform_driver cam_cc_sm8250_driver = {
},
};

-static int __init cam_cc_sm8250_init(void)
-{
- return platform_driver_register(&cam_cc_sm8250_driver);
-}
-subsys_initcall(cam_cc_sm8250_init);
-
-static void __exit cam_cc_sm8250_exit(void)
-{
- platform_driver_unregister(&cam_cc_sm8250_driver);
-}
-module_exit(cam_cc_sm8250_exit);
+module_platform_driver(cam_cc_sm8250_driver);

MODULE_DESCRIPTION("QTI CAMCC SM8250 Driver");
MODULE_LICENSE("GPL v2");
diff --git a/drivers/clk/qcom/clk-krait.c b/drivers/clk/qcom/clk-krait.c
index 59f1af415b58..90046428693c 100644
--- a/drivers/clk/qcom/clk-krait.c
+++ b/drivers/clk/qcom/clk-krait.c
@@ -32,11 +32,16 @@ static void __krait_mux_set_sel(struct krait_mux_clk *mux, int sel)
regval |= (sel & mux->mask) << (mux->shift + LPL_SHIFT);
}
krait_set_l2_indirect_reg(mux->offset, regval);
- spin_unlock_irqrestore(&krait_clock_reg_lock, flags);

/* Wait for switch to complete. */
mb();
udelay(1);
+
+ /*
+ * Unlock now to make sure the mux register is not
+ * modified while switching to the new parent.
+ */
+ spin_unlock_irqrestore(&krait_clock_reg_lock, flags);
}

static int krait_mux_set_parent(struct clk_hw *hw, u8 index)
diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
index e9c357309fd9..caafc35eecfd 100644
--- a/drivers/clk/qcom/clk-rcg2.c
+++ b/drivers/clk/qcom/clk-rcg2.c
@@ -13,6 +13,7 @@
#include <linux/rational.h>
#include <linux/regmap.h>
#include <linux/math64.h>
+#include <linux/minmax.h>
#include <linux/slab.h>

#include <asm/div64.h>
@@ -405,7 +406,7 @@ static int clk_rcg2_get_duty_cycle(struct clk_hw *hw, struct clk_duty *duty)
static int clk_rcg2_set_duty_cycle(struct clk_hw *hw, struct clk_duty *duty)
{
struct clk_rcg2 *rcg = to_clk_rcg2(hw);
- u32 notn_m, n, m, d, not2d, mask, duty_per;
+ u32 notn_m, n, m, d, not2d, mask, duty_per, cfg;
int ret;

/* Duty-cycle cannot be modified for non-MND RCGs */
@@ -416,6 +417,11 @@ static int clk_rcg2_set_duty_cycle(struct clk_hw *hw, struct clk_duty *duty)

regmap_read(rcg->clkr.regmap, RCG_N_OFFSET(rcg), &notn_m);
regmap_read(rcg->clkr.regmap, RCG_M_OFFSET(rcg), &m);
+ regmap_read(rcg->clkr.regmap, RCG_CFG_OFFSET(rcg), &cfg);
+
+ /* Duty-cycle cannot be modified if MND divider is in bypass mode. */
+ if (!(cfg & CFG_MODE_MASK))
+ return -EINVAL;

n = (~(notn_m) + m) & mask;

@@ -424,9 +430,11 @@ static int clk_rcg2_set_duty_cycle(struct clk_hw *hw, struct clk_duty *duty)
/* Calculate 2d value */
d = DIV_ROUND_CLOSEST(n * duty_per * 2, 100);

- /* Check bit widths of 2d. If D is too big reduce duty cycle. */
- if (d > mask)
- d = mask;
+ /*
+	 * Check bit widths of 2d. If D is too big, reduce the duty cycle.
+ * Also make sure it is never zero.
+ */
+ d = clamp_val(d, 1, mask);

if ((d / 2) > (n - m))
d = (n - m) * 2;
diff --git a/drivers/clk/qcom/dispcc-sm8250.c b/drivers/clk/qcom/dispcc-sm8250.c
index db9379634fb2..f646fdfe6f15 100644
--- a/drivers/clk/qcom/dispcc-sm8250.c
+++ b/drivers/clk/qcom/dispcc-sm8250.c
@@ -1134,7 +1134,6 @@ static struct gdsc mdss_gdsc = {
},
.pwrsts = PWRSTS_OFF_ON,
.flags = HW_CTRL,
- .supply = "mmcx",
};

static struct clk_regmap *disp_cc_sm8250_clocks[] = {
diff --git a/drivers/clk/qcom/gcc-ipq8074.c b/drivers/clk/qcom/gcc-ipq8074.c
index 541016db3c4b..2c2ecfc5e61f 100644
--- a/drivers/clk/qcom/gcc-ipq8074.c
+++ b/drivers/clk/qcom/gcc-ipq8074.c
@@ -1788,8 +1788,10 @@ static struct clk_regmap_div nss_port4_tx_div_clk_src = {
static const struct freq_tbl ftbl_nss_port5_rx_clk_src[] = {
F(19200000, P_XO, 1, 0, 0),
F(25000000, P_UNIPHY1_RX, 12.5, 0, 0),
+ F(25000000, P_UNIPHY0_RX, 5, 0, 0),
F(78125000, P_UNIPHY1_RX, 4, 0, 0),
F(125000000, P_UNIPHY1_RX, 2.5, 0, 0),
+ F(125000000, P_UNIPHY0_RX, 1, 0, 0),
F(156250000, P_UNIPHY1_RX, 2, 0, 0),
F(312500000, P_UNIPHY1_RX, 1, 0, 0),
{ }
@@ -1828,8 +1830,10 @@ static struct clk_regmap_div nss_port5_rx_div_clk_src = {
static const struct freq_tbl ftbl_nss_port5_tx_clk_src[] = {
F(19200000, P_XO, 1, 0, 0),
F(25000000, P_UNIPHY1_TX, 12.5, 0, 0),
+ F(25000000, P_UNIPHY0_TX, 5, 0, 0),
F(78125000, P_UNIPHY1_TX, 4, 0, 0),
F(125000000, P_UNIPHY1_TX, 2.5, 0, 0),
+ F(125000000, P_UNIPHY0_TX, 1, 0, 0),
F(156250000, P_UNIPHY1_TX, 2, 0, 0),
F(312500000, P_UNIPHY1_TX, 1, 0, 0),
{ }
@@ -1867,8 +1871,10 @@ static struct clk_regmap_div nss_port5_tx_div_clk_src = {

static const struct freq_tbl ftbl_nss_port6_rx_clk_src[] = {
F(19200000, P_XO, 1, 0, 0),
+ F(25000000, P_UNIPHY2_RX, 5, 0, 0),
F(25000000, P_UNIPHY2_RX, 12.5, 0, 0),
F(78125000, P_UNIPHY2_RX, 4, 0, 0),
+ F(125000000, P_UNIPHY2_RX, 1, 0, 0),
F(125000000, P_UNIPHY2_RX, 2.5, 0, 0),
F(156250000, P_UNIPHY2_RX, 2, 0, 0),
F(312500000, P_UNIPHY2_RX, 1, 0, 0),
@@ -1907,8 +1913,10 @@ static struct clk_regmap_div nss_port6_rx_div_clk_src = {

static const struct freq_tbl ftbl_nss_port6_tx_clk_src[] = {
F(19200000, P_XO, 1, 0, 0),
+ F(25000000, P_UNIPHY2_TX, 5, 0, 0),
F(25000000, P_UNIPHY2_TX, 12.5, 0, 0),
F(78125000, P_UNIPHY2_TX, 4, 0, 0),
+ F(125000000, P_UNIPHY2_TX, 1, 0, 0),
F(125000000, P_UNIPHY2_TX, 2.5, 0, 0),
F(156250000, P_UNIPHY2_TX, 2, 0, 0),
F(312500000, P_UNIPHY2_TX, 1, 0, 0),
@@ -3346,6 +3354,7 @@ static struct clk_branch gcc_nssnoc_ubi1_ahb_clk = {

static struct clk_branch gcc_ubi0_ahb_clk = {
.halt_reg = 0x6820c,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x6820c,
.enable_mask = BIT(0),
@@ -3363,6 +3372,7 @@ static struct clk_branch gcc_ubi0_ahb_clk = {

static struct clk_branch gcc_ubi0_axi_clk = {
.halt_reg = 0x68200,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x68200,
.enable_mask = BIT(0),
@@ -3380,6 +3390,7 @@ static struct clk_branch gcc_ubi0_axi_clk = {

static struct clk_branch gcc_ubi0_nc_axi_clk = {
.halt_reg = 0x68204,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x68204,
.enable_mask = BIT(0),
@@ -3397,6 +3408,7 @@ static struct clk_branch gcc_ubi0_nc_axi_clk = {

static struct clk_branch gcc_ubi0_core_clk = {
.halt_reg = 0x68210,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x68210,
.enable_mask = BIT(0),
@@ -3414,6 +3426,7 @@ static struct clk_branch gcc_ubi0_core_clk = {

static struct clk_branch gcc_ubi0_mpt_clk = {
.halt_reg = 0x68208,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x68208,
.enable_mask = BIT(0),
@@ -3431,6 +3444,7 @@ static struct clk_branch gcc_ubi0_mpt_clk = {

static struct clk_branch gcc_ubi1_ahb_clk = {
.halt_reg = 0x6822c,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x6822c,
.enable_mask = BIT(0),
@@ -3448,6 +3462,7 @@ static struct clk_branch gcc_ubi1_ahb_clk = {

static struct clk_branch gcc_ubi1_axi_clk = {
.halt_reg = 0x68220,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x68220,
.enable_mask = BIT(0),
@@ -3465,6 +3480,7 @@ static struct clk_branch gcc_ubi1_axi_clk = {

static struct clk_branch gcc_ubi1_nc_axi_clk = {
.halt_reg = 0x68224,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x68224,
.enable_mask = BIT(0),
@@ -3482,6 +3498,7 @@ static struct clk_branch gcc_ubi1_nc_axi_clk = {

static struct clk_branch gcc_ubi1_core_clk = {
.halt_reg = 0x68230,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x68230,
.enable_mask = BIT(0),
@@ -3499,6 +3516,7 @@ static struct clk_branch gcc_ubi1_core_clk = {

static struct clk_branch gcc_ubi1_mpt_clk = {
.halt_reg = 0x68228,
+ .halt_check = BRANCH_HALT_DELAY,
.clkr = {
.enable_reg = 0x68228,
.enable_mask = BIT(0),
@@ -4371,6 +4389,33 @@ static struct clk_branch gcc_pcie0_axi_s_bridge_clk = {
},
};

+static const struct alpha_pll_config ubi32_pll_config = {
+ .l = 0x4e,
+ .config_ctl_val = 0x200d4aa8,
+ .config_ctl_hi_val = 0x3c2,
+ .main_output_mask = BIT(0),
+ .aux_output_mask = BIT(1),
+ .pre_div_val = 0x0,
+ .pre_div_mask = BIT(12),
+ .post_div_val = 0x0,
+ .post_div_mask = GENMASK(9, 8),
+};
+
+static const struct alpha_pll_config nss_crypto_pll_config = {
+ .l = 0x3e,
+ .alpha = 0x0,
+ .alpha_hi = 0x80,
+ .config_ctl_val = 0x4001055b,
+ .main_output_mask = BIT(0),
+ .pre_div_val = 0x0,
+ .pre_div_mask = GENMASK(14, 12),
+ .post_div_val = 0x1 << 8,
+ .post_div_mask = GENMASK(11, 8),
+ .vco_mask = GENMASK(21, 20),
+ .vco_val = 0x0,
+ .alpha_en_mask = BIT(24),
+};
+
static struct clk_hw *gcc_ipq8074_hws[] = {
&gpll0_out_main_div2.hw,
&gpll6_out_main_div2.hw,
@@ -4772,7 +4817,20 @@ static const struct qcom_cc_desc gcc_ipq8074_desc = {

static int gcc_ipq8074_probe(struct platform_device *pdev)
{
- return qcom_cc_probe(pdev, &gcc_ipq8074_desc);
+ struct regmap *regmap;
+
+ regmap = qcom_cc_map(pdev, &gcc_ipq8074_desc);
+ if (IS_ERR(regmap))
+ return PTR_ERR(regmap);
+
+ /* SW Workaround for UBI32 Huayra PLL */
+ regmap_update_bits(regmap, 0x2501c, BIT(26), BIT(26));
+
+ clk_alpha_pll_configure(&ubi32_pll_main, regmap, &ubi32_pll_config);
+ clk_alpha_pll_configure(&nss_crypto_pll_main, regmap,
+ &nss_crypto_pll_config);
+
+ return qcom_cc_really_probe(pdev, &gcc_ipq8074_desc, regmap);
}

static struct platform_driver gcc_ipq8074_driver = {
diff --git a/drivers/clk/qcom/gcc-msm8939.c b/drivers/clk/qcom/gcc-msm8939.c
index 39ebb443ae3d..de0022e5450d 100644
--- a/drivers/clk/qcom/gcc-msm8939.c
+++ b/drivers/clk/qcom/gcc-msm8939.c
@@ -632,7 +632,7 @@ static struct clk_rcg2 system_noc_bfdcd_clk_src = {
};

static struct clk_rcg2 bimc_ddr_clk_src = {
- .cmd_rcgr = 0x32004,
+ .cmd_rcgr = 0x32024,
.hid_width = 5,
.parent_map = gcc_xo_gpll0_bimc_map,
.clkr.hw.init = &(struct clk_init_data){
@@ -644,6 +644,18 @@ static struct clk_rcg2 bimc_ddr_clk_src = {
},
};

+static struct clk_rcg2 system_mm_noc_bfdcd_clk_src = {
+ .cmd_rcgr = 0x2600c,
+ .hid_width = 5,
+ .parent_map = gcc_xo_gpll0_gpll6a_map,
+ .clkr.hw.init = &(struct clk_init_data){
+ .name = "system_mm_noc_bfdcd_clk_src",
+ .parent_data = gcc_xo_gpll0_gpll6a_parent_data,
+ .num_parents = 3,
+ .ops = &clk_rcg2_ops,
+ },
+};
+
static const struct freq_tbl ftbl_gcc_camss_ahb_clk[] = {
F(40000000, P_GPLL0, 10, 1, 2),
F(80000000, P_GPLL0, 10, 0, 0),
@@ -1002,7 +1014,7 @@ static struct clk_rcg2 blsp1_uart2_apps_clk_src = {
};

static const struct freq_tbl ftbl_gcc_camss_cci_clk[] = {
- F(19200000, P_XO, 1, 0, 0),
+ F(19200000, P_XO, 1, 0, 0),
{ }
};

@@ -2441,7 +2453,7 @@ static struct clk_branch gcc_camss_jpeg_axi_clk = {
.hw.init = &(struct clk_init_data){
.name = "gcc_camss_jpeg_axi_clk",
.parent_data = &(const struct clk_parent_data){
- .hw = &system_noc_bfdcd_clk_src.clkr.hw,
+ .hw = &system_mm_noc_bfdcd_clk_src.clkr.hw,
},
.num_parents = 1,
.flags = CLK_SET_RATE_PARENT,
@@ -2645,7 +2657,7 @@ static struct clk_branch gcc_camss_vfe_axi_clk = {
.hw.init = &(struct clk_init_data){
.name = "gcc_camss_vfe_axi_clk",
.parent_data = &(const struct clk_parent_data){
- .hw = &system_noc_bfdcd_clk_src.clkr.hw,
+ .hw = &system_mm_noc_bfdcd_clk_src.clkr.hw,
},
.num_parents = 1,
.flags = CLK_SET_RATE_PARENT,
@@ -2801,7 +2813,7 @@ static struct clk_branch gcc_mdss_axi_clk = {
.hw.init = &(struct clk_init_data){
.name = "gcc_mdss_axi_clk",
.parent_data = &(const struct clk_parent_data){
- .hw = &system_noc_bfdcd_clk_src.clkr.hw,
+ .hw = &system_mm_noc_bfdcd_clk_src.clkr.hw,
},
.num_parents = 1,
.flags = CLK_SET_RATE_PARENT,
@@ -3193,7 +3205,7 @@ static struct clk_branch gcc_mdp_tbu_clk = {
.hw.init = &(struct clk_init_data){
.name = "gcc_mdp_tbu_clk",
.parent_data = &(const struct clk_parent_data){
- .hw = &system_noc_bfdcd_clk_src.clkr.hw,
+ .hw = &system_mm_noc_bfdcd_clk_src.clkr.hw,
},
.num_parents = 1,
.flags = CLK_SET_RATE_PARENT,
@@ -3211,7 +3223,7 @@ static struct clk_branch gcc_venus_tbu_clk = {
.hw.init = &(struct clk_init_data){
.name = "gcc_venus_tbu_clk",
.parent_data = &(const struct clk_parent_data){
- .hw = &system_noc_bfdcd_clk_src.clkr.hw,
+ .hw = &system_mm_noc_bfdcd_clk_src.clkr.hw,
},
.num_parents = 1,
.flags = CLK_SET_RATE_PARENT,
@@ -3229,7 +3241,7 @@ static struct clk_branch gcc_vfe_tbu_clk = {
.hw.init = &(struct clk_init_data){
.name = "gcc_vfe_tbu_clk",
.parent_data = &(const struct clk_parent_data){
- .hw = &system_noc_bfdcd_clk_src.clkr.hw,
+ .hw = &system_mm_noc_bfdcd_clk_src.clkr.hw,
},
.num_parents = 1,
.flags = CLK_SET_RATE_PARENT,
@@ -3247,7 +3259,7 @@ static struct clk_branch gcc_jpeg_tbu_clk = {
.hw.init = &(struct clk_init_data){
.name = "gcc_jpeg_tbu_clk",
.parent_data = &(const struct clk_parent_data){
- .hw = &system_noc_bfdcd_clk_src.clkr.hw,
+ .hw = &system_mm_noc_bfdcd_clk_src.clkr.hw,
},
.num_parents = 1,
.flags = CLK_SET_RATE_PARENT,
@@ -3484,7 +3496,7 @@ static struct clk_branch gcc_venus0_axi_clk = {
.hw.init = &(struct clk_init_data){
.name = "gcc_venus0_axi_clk",
.parent_data = &(const struct clk_parent_data){
- .hw = &system_noc_bfdcd_clk_src.clkr.hw,
+ .hw = &system_mm_noc_bfdcd_clk_src.clkr.hw,
},
.num_parents = 1,
.flags = CLK_SET_RATE_PARENT,
@@ -3623,6 +3635,7 @@ static struct clk_regmap *gcc_msm8939_clocks[] = {
[GPLL2_VOTE] = &gpll2_vote,
[PCNOC_BFDCD_CLK_SRC] = &pcnoc_bfdcd_clk_src.clkr,
[SYSTEM_NOC_BFDCD_CLK_SRC] = &system_noc_bfdcd_clk_src.clkr,
+ [SYSTEM_MM_NOC_BFDCD_CLK_SRC] = &system_mm_noc_bfdcd_clk_src.clkr,
[CAMSS_AHB_CLK_SRC] = &camss_ahb_clk_src.clkr,
[APSS_AHB_CLK_SRC] = &apss_ahb_clk_src.clkr,
[CSI0_CLK_SRC] = &csi0_clk_src.clkr,
diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
index 44520efc6c72..2db0938f8dd3 100644
--- a/drivers/clk/qcom/gdsc.c
+++ b/drivers/clk/qcom/gdsc.c
@@ -420,6 +420,14 @@ static int gdsc_init(struct gdsc *sc)
return ret;
}

+ /* ...and the power-domain */
+ ret = gdsc_pm_runtime_get(sc);
+ if (ret) {
+ if (sc->rsupply)
+ regulator_disable(sc->rsupply);
+ return ret;
+ }
+
/*
* Votable GDSCs can be ON due to Vote from other masters.
* If a Votable GDSC is ON, make sure we have a Vote.
diff --git a/drivers/clk/qcom/videocc-sm8250.c b/drivers/clk/qcom/videocc-sm8250.c
index 8617454e4a77..f28f2cb051d7 100644
--- a/drivers/clk/qcom/videocc-sm8250.c
+++ b/drivers/clk/qcom/videocc-sm8250.c
@@ -277,7 +277,6 @@ static struct gdsc mvs0c_gdsc = {
},
.flags = 0,
.pwrsts = PWRSTS_OFF_ON,
- .supply = "mmcx",
};

static struct gdsc mvs1c_gdsc = {
@@ -287,7 +286,6 @@ static struct gdsc mvs1c_gdsc = {
},
.flags = 0,
.pwrsts = PWRSTS_OFF_ON,
- .supply = "mmcx",
};

static struct gdsc mvs0_gdsc = {
@@ -297,7 +295,6 @@ static struct gdsc mvs0_gdsc = {
},
.flags = HW_CTRL,
.pwrsts = PWRSTS_OFF_ON,
- .supply = "mmcx",
};

static struct gdsc mvs1_gdsc = {
@@ -307,7 +304,6 @@ static struct gdsc mvs1_gdsc = {
},
.flags = HW_CTRL,
.pwrsts = PWRSTS_OFF_ON,
- .supply = "mmcx",
};

static struct clk_regmap *video_cc_sm8250_clocks[] = {
diff --git a/drivers/clk/renesas/r9a06g032-clocks.c b/drivers/clk/renesas/r9a06g032-clocks.c
index c99942f0e4d4..abc0891fd96d 100644
--- a/drivers/clk/renesas/r9a06g032-clocks.c
+++ b/drivers/clk/renesas/r9a06g032-clocks.c
@@ -286,8 +286,8 @@ static const struct r9a06g032_clkdesc r9a06g032_clocks[] = {
.name = "uart_group_012",
.type = K_BITSEL,
.source = 1 + R9A06G032_DIV_UART,
- /* R9A06G032_SYSCTRL_REG_PWRCTRL_PG1_PR2 */
- .dual.sel = ((0xec / 4) << 5) | 24,
+ /* R9A06G032_SYSCTRL_REG_PWRCTRL_PG0_0 */
+ .dual.sel = ((0x34 / 4) << 5) | 30,
.dual.group = 0,
},
{
@@ -295,8 +295,8 @@ static const struct r9a06g032_clkdesc r9a06g032_clocks[] = {
.name = "uart_group_34567",
.type = K_BITSEL,
.source = 1 + R9A06G032_DIV_P2_PG,
- /* R9A06G032_SYSCTRL_REG_PWRCTRL_PG0_0 */
- .dual.sel = ((0x34 / 4) << 5) | 30,
+ /* R9A06G032_SYSCTRL_REG_PWRCTRL_PG1_PR2 */
+ .dual.sel = ((0xec / 4) << 5) | 24,
.dual.group = 1,
},
D_UGATE(CLK_UART0, "clk_uart0", UART_GROUP_012, 0, 0, 0x1b2, 0x1b3, 0x1b4, 0x1b5),
diff --git a/drivers/clk/renesas/rzg2l-cpg.c b/drivers/clk/renesas/rzg2l-cpg.c
index 486d0656c58a..1068058a3865 100644
--- a/drivers/clk/renesas/rzg2l-cpg.c
+++ b/drivers/clk/renesas/rzg2l-cpg.c
@@ -744,7 +744,7 @@ static int rzg2l_cpg_status(struct reset_controller_dev *rcdev,
unsigned int reg = info->resets[id].off;
u32 bitmask = BIT(info->resets[id].bit);

- return !(readl(priv->base + CLK_MRST_R(reg)) & bitmask);
+ return !!(readl(priv->base + CLK_MRST_R(reg)) & bitmask);
}

static const struct reset_control_ops rzg2l_cpg_reset_ops = {
diff --git a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c b/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c
index 70e2e6e37389..3c46ad8c3a1c 100644
--- a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c
+++ b/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c
@@ -151,6 +151,7 @@ static int sun8i_ss_setup_ivs(struct skcipher_request *areq)
while (i >= 0) {
dma_unmap_single(ss->dev, rctx->p_iv[i], ivsize, DMA_TO_DEVICE);
memzero_explicit(sf->iv[i], ivsize);
+ i--;
}
return err;
}
diff --git a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-core.c b/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-core.c
index 657530578643..47b5828e35c3 100644
--- a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-core.c
+++ b/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-core.c
@@ -476,14 +476,32 @@ static int allocate_flows(struct sun8i_ss_dev *ss)

ss->flows[i].biv = devm_kmalloc(ss->dev, AES_BLOCK_SIZE,
GFP_KERNEL | GFP_DMA);
- if (!ss->flows[i].biv)
+ if (!ss->flows[i].biv) {
+ err = -ENOMEM;
goto error_engine;
+ }

for (j = 0; j < MAX_SG; j++) {
ss->flows[i].iv[j] = devm_kmalloc(ss->dev, AES_BLOCK_SIZE,
GFP_KERNEL | GFP_DMA);
- if (!ss->flows[i].iv[j])
+ if (!ss->flows[i].iv[j]) {
+ err = -ENOMEM;
goto error_engine;
+ }
+ }
+
+ /* the padding could be up to two blocks. */
+ ss->flows[i].pad = devm_kmalloc(ss->dev, SHA256_BLOCK_SIZE * 2,
+ GFP_KERNEL | GFP_DMA);
+ if (!ss->flows[i].pad) {
+ err = -ENOMEM;
+ goto error_engine;
+ }
+ ss->flows[i].result = devm_kmalloc(ss->dev, SHA256_DIGEST_SIZE,
+ GFP_KERNEL | GFP_DMA);
+ if (!ss->flows[i].result) {
+ err = -ENOMEM;
+ goto error_engine;
}

ss->flows[i].engine = crypto_engine_alloc_init(ss->dev, true);
diff --git a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-hash.c b/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-hash.c
index ca4f280af35d..f89a580618aa 100644
--- a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-hash.c
+++ b/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-hash.c
@@ -342,18 +342,11 @@ int sun8i_ss_hash_run(struct crypto_engine *engine, void *breq)
if (digestsize == SHA224_DIGEST_SIZE)
digestsize = SHA256_DIGEST_SIZE;

- /* the padding could be up to two block. */
- pad = kzalloc(algt->alg.hash.halg.base.cra_blocksize * 2, GFP_KERNEL | GFP_DMA);
- if (!pad)
- return -ENOMEM;
+ result = ss->flows[rctx->flow].result;
+ pad = ss->flows[rctx->flow].pad;
+ memset(pad, 0, algt->alg.hash.halg.base.cra_blocksize * 2);
bf = (__le32 *)pad;

- result = kzalloc(digestsize, GFP_KERNEL | GFP_DMA);
- if (!result) {
- kfree(pad);
- return -ENOMEM;
- }
-
for (i = 0; i < MAX_SG; i++) {
rctx->t_dst[i].addr = 0;
rctx->t_dst[i].len = 0;
@@ -449,8 +442,6 @@ int sun8i_ss_hash_run(struct crypto_engine *engine, void *breq)

memcpy(areq->result, result, algt->alg.hash.halg.digestsize);
theend:
- kfree(pad);
- kfree(result);
local_bh_disable();
crypto_finalize_hash_request(engine, breq, err);
local_bh_enable();
diff --git a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss.h b/drivers/crypto/allwinner/sun8i-ss/sun8i-ss.h
index 57ada8653855..eb82ee5345ae 100644
--- a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss.h
+++ b/drivers/crypto/allwinner/sun8i-ss/sun8i-ss.h
@@ -123,6 +123,8 @@ struct sginfo {
* @stat_req: number of request done by this flow
* @iv: list of IV to use for each step
* @biv: buffer which contain the backuped IV
+ * @pad: padding buffer for hash operations
+ * @result: buffer for storing the result of hash operations
*/
struct sun8i_ss_flow {
struct crypto_engine *engine;
@@ -130,6 +132,8 @@ struct sun8i_ss_flow {
int status;
u8 *iv[MAX_SG];
u8 *biv;
+ void *pad;
+ void *result;
#ifdef CONFIG_CRYPTO_DEV_SUN8I_SS_DEBUG
unsigned long stat_req;
#endif
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 3aefb177715e..95dbab0ffab2 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -503,7 +503,7 @@ static int __sev_platform_shutdown_locked(int *error)
struct sev_device *sev = psp_master->sev_data;
int ret;

- if (sev->state == SEV_STATE_UNINIT)
+ if (!sev || sev->state == SEV_STATE_UNINIT)
return 0;

ret = __sev_do_cmd_locked(SEV_CMD_SHUTDOWN, NULL, error);
@@ -577,6 +577,8 @@ static int sev_ioctl_do_platform_status(struct sev_issue_cmd *argp)
struct sev_user_data_status data;
int ret;

+ memset(&data, 0, sizeof(data));
+
ret = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS, &data, &argp->error);
if (ret)
return ret;
@@ -630,7 +632,7 @@ static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp, bool writable)
if (input.length > SEV_FW_BLOB_MAX_SIZE)
return -EFAULT;

- blob = kmalloc(input.length, GFP_KERNEL);
+ blob = kzalloc(input.length, GFP_KERNEL);
if (!blob)
return -ENOMEM;

@@ -854,7 +856,7 @@ static int sev_ioctl_do_get_id2(struct sev_issue_cmd *argp)
input_address = (void __user *)input.address;

if (input.address && input.length) {
- id_blob = kmalloc(input.length, GFP_KERNEL);
+ id_blob = kzalloc(input.length, GFP_KERNEL);
if (!id_blob)
return -ENOMEM;

@@ -973,14 +975,14 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
if (input.cert_chain_len > SEV_FW_BLOB_MAX_SIZE)
return -EFAULT;

- pdh_blob = kmalloc(input.pdh_cert_len, GFP_KERNEL);
+ pdh_blob = kzalloc(input.pdh_cert_len, GFP_KERNEL);
if (!pdh_blob)
return -ENOMEM;

data.pdh_cert_address = __psp_pa(pdh_blob);
data.pdh_cert_len = input.pdh_cert_len;

- cert_blob = kmalloc(input.cert_chain_len, GFP_KERNEL);
+ cert_blob = kzalloc(input.cert_chain_len, GFP_KERNEL);
if (!cert_blob) {
ret = -ENOMEM;
goto e_free_pdh;
diff --git a/drivers/crypto/hisilicon/hpre/hpre_crypto.c b/drivers/crypto/hisilicon/hpre/hpre_crypto.c
index 97d54c1465c2..3ba6f15deafc 100644
--- a/drivers/crypto/hisilicon/hpre/hpre_crypto.c
+++ b/drivers/crypto/hisilicon/hpre/hpre_crypto.c
@@ -252,7 +252,7 @@ static int hpre_prepare_dma_buf(struct hpre_asym_request *hpre_req,
if (unlikely(shift < 0))
return -EINVAL;

- ptr = dma_alloc_coherent(dev, ctx->key_sz, tmp, GFP_KERNEL);
+ ptr = dma_alloc_coherent(dev, ctx->key_sz, tmp, GFP_ATOMIC);
if (unlikely(!ptr))
return -ENOMEM;

diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c b/drivers/crypto/hisilicon/sec/sec_algs.c
index 0a3c8f019b02..490e1542305e 100644
--- a/drivers/crypto/hisilicon/sec/sec_algs.c
+++ b/drivers/crypto/hisilicon/sec/sec_algs.c
@@ -449,7 +449,7 @@ static void sec_skcipher_alg_callback(struct sec_bd_info *sec_resp,
*/
}

- mutex_lock(&ctx->queue->queuelock);
+ spin_lock_bh(&ctx->queue->queuelock);
/* Put the IV in place for chained cases */
switch (ctx->cipher_alg) {
case SEC_C_AES_CBC_128:
@@ -509,7 +509,7 @@ static void sec_skcipher_alg_callback(struct sec_bd_info *sec_resp,
list_del(&backlog_req->backlog_head);
}
}
- mutex_unlock(&ctx->queue->queuelock);
+ spin_unlock_bh(&ctx->queue->queuelock);

mutex_lock(&sec_req->lock);
list_del(&sec_req_el->head);
@@ -798,7 +798,7 @@ static int sec_alg_skcipher_crypto(struct skcipher_request *skreq,
*/

/* Grab a big lock for a long time to avoid concurrency issues */
- mutex_lock(&queue->queuelock);
+ spin_lock_bh(&queue->queuelock);

/*
* Can go on to queue if we have space in either:
@@ -814,15 +814,15 @@ static int sec_alg_skcipher_crypto(struct skcipher_request *skreq,
ret = -EBUSY;
if ((skreq->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG)) {
list_add_tail(&sec_req->backlog_head, &ctx->backlog);
- mutex_unlock(&queue->queuelock);
+ spin_unlock_bh(&queue->queuelock);
goto out;
}

- mutex_unlock(&queue->queuelock);
+ spin_unlock_bh(&queue->queuelock);
goto err_free_elements;
}
ret = sec_send_request(sec_req, queue);
- mutex_unlock(&queue->queuelock);
+ spin_unlock_bh(&queue->queuelock);
if (ret)
goto err_free_elements;

@@ -881,7 +881,7 @@ static int sec_alg_skcipher_init(struct crypto_skcipher *tfm)
if (IS_ERR(ctx->queue))
return PTR_ERR(ctx->queue);

- mutex_init(&ctx->queue->queuelock);
+ spin_lock_init(&ctx->queue->queuelock);
ctx->queue->havesoftqueue = false;

return 0;
diff --git a/drivers/crypto/hisilicon/sec/sec_drv.h b/drivers/crypto/hisilicon/sec/sec_drv.h
index 179a8250d691..e2a50bf2234b 100644
--- a/drivers/crypto/hisilicon/sec/sec_drv.h
+++ b/drivers/crypto/hisilicon/sec/sec_drv.h
@@ -347,7 +347,7 @@ struct sec_queue {
DECLARE_BITMAP(unprocessed, SEC_QUEUE_LEN);
DECLARE_KFIFO_PTR(softqueue, typeof(struct sec_request_el *));
bool havesoftqueue;
- struct mutex queuelock;
+ spinlock_t queuelock;
void *shadow[SEC_QUEUE_LEN];
};

diff --git a/drivers/crypto/hisilicon/sec2/sec.h b/drivers/crypto/hisilicon/sec2/sec.h
index c2e9b01187a7..a44c8dba3cda 100644
--- a/drivers/crypto/hisilicon/sec2/sec.h
+++ b/drivers/crypto/hisilicon/sec2/sec.h
@@ -119,7 +119,7 @@ struct sec_qp_ctx {
struct idr req_idr;
struct sec_alg_res res[QM_Q_DEPTH];
struct sec_ctx *ctx;
- struct mutex req_lock;
+ spinlock_t req_lock;
struct list_head backlog;
struct hisi_acc_sgl_pool *c_in_pool;
struct hisi_acc_sgl_pool *c_out_pool;
diff --git a/drivers/crypto/hisilicon/sec2/sec_crypto.c b/drivers/crypto/hisilicon/sec2/sec_crypto.c
index a91635c348b5..dbaa6c918cfd 100644
--- a/drivers/crypto/hisilicon/sec2/sec_crypto.c
+++ b/drivers/crypto/hisilicon/sec2/sec_crypto.c
@@ -127,11 +127,11 @@ static int sec_alloc_req_id(struct sec_req *req, struct sec_qp_ctx *qp_ctx)
{
int req_id;

- mutex_lock(&qp_ctx->req_lock);
+ spin_lock_bh(&qp_ctx->req_lock);

req_id = idr_alloc_cyclic(&qp_ctx->req_idr, NULL,
0, QM_Q_DEPTH, GFP_ATOMIC);
- mutex_unlock(&qp_ctx->req_lock);
+ spin_unlock_bh(&qp_ctx->req_lock);
if (unlikely(req_id < 0)) {
dev_err(req->ctx->dev, "alloc req id fail!\n");
return req_id;
@@ -156,9 +156,9 @@ static void sec_free_req_id(struct sec_req *req)
qp_ctx->req_list[req_id] = NULL;
req->qp_ctx = NULL;

- mutex_lock(&qp_ctx->req_lock);
+ spin_lock_bh(&qp_ctx->req_lock);
idr_remove(&qp_ctx->req_idr, req_id);
- mutex_unlock(&qp_ctx->req_lock);
+ spin_unlock_bh(&qp_ctx->req_lock);
}

static u8 pre_parse_finished_bd(struct bd_status *status, void *resp)
@@ -273,7 +273,7 @@ static int sec_bd_send(struct sec_ctx *ctx, struct sec_req *req)
!(req->flag & CRYPTO_TFM_REQ_MAY_BACKLOG))
return -EBUSY;

- mutex_lock(&qp_ctx->req_lock);
+ spin_lock_bh(&qp_ctx->req_lock);
ret = hisi_qp_send(qp_ctx->qp, &req->sec_sqe);

if (ctx->fake_req_limit <=
@@ -281,10 +281,10 @@ static int sec_bd_send(struct sec_ctx *ctx, struct sec_req *req)
list_add_tail(&req->backlog_head, &qp_ctx->backlog);
atomic64_inc(&ctx->sec->debug.dfx.send_cnt);
atomic64_inc(&ctx->sec->debug.dfx.send_busy_cnt);
- mutex_unlock(&qp_ctx->req_lock);
+ spin_unlock_bh(&qp_ctx->req_lock);
return -EBUSY;
}
- mutex_unlock(&qp_ctx->req_lock);
+ spin_unlock_bh(&qp_ctx->req_lock);

if (unlikely(ret == -EBUSY))
return -ENOBUFS;
@@ -487,7 +487,7 @@ static int sec_create_qp_ctx(struct hisi_qm *qm, struct sec_ctx *ctx,

qp->req_cb = sec_req_cb;

- mutex_init(&qp_ctx->req_lock);
+ spin_lock_init(&qp_ctx->req_lock);
idr_init(&qp_ctx->req_idr);
INIT_LIST_HEAD(&qp_ctx->backlog);

@@ -620,7 +620,7 @@ static int sec_auth_init(struct sec_ctx *ctx)
{
struct sec_auth_ctx *a_ctx = &ctx->a_ctx;

- a_ctx->a_key = dma_alloc_coherent(ctx->dev, SEC_MAX_KEY_SIZE,
+ a_ctx->a_key = dma_alloc_coherent(ctx->dev, SEC_MAX_AKEY_SIZE,
&a_ctx->a_key_dma, GFP_KERNEL);
if (!a_ctx->a_key)
return -ENOMEM;
@@ -632,8 +632,8 @@ static void sec_auth_uninit(struct sec_ctx *ctx)
{
struct sec_auth_ctx *a_ctx = &ctx->a_ctx;

- memzero_explicit(a_ctx->a_key, SEC_MAX_KEY_SIZE);
- dma_free_coherent(ctx->dev, SEC_MAX_KEY_SIZE,
+ memzero_explicit(a_ctx->a_key, SEC_MAX_AKEY_SIZE);
+ dma_free_coherent(ctx->dev, SEC_MAX_AKEY_SIZE,
a_ctx->a_key, a_ctx->a_key_dma);
}

@@ -1382,7 +1382,7 @@ static struct sec_req *sec_back_req_clear(struct sec_ctx *ctx,
{
struct sec_req *backlog_req = NULL;

- mutex_lock(&qp_ctx->req_lock);
+ spin_lock_bh(&qp_ctx->req_lock);
if (ctx->fake_req_limit >=
atomic_read(&qp_ctx->qp->qp_status.used) &&
!list_empty(&qp_ctx->backlog)) {
@@ -1390,7 +1390,7 @@ static struct sec_req *sec_back_req_clear(struct sec_ctx *ctx,
typeof(*backlog_req), backlog_head);
list_del(&backlog_req->backlog_head);
}
- mutex_unlock(&qp_ctx->req_lock);
+ spin_unlock_bh(&qp_ctx->req_lock);

return backlog_req;
}
diff --git a/drivers/crypto/hisilicon/sec2/sec_crypto.h b/drivers/crypto/hisilicon/sec2/sec_crypto.h
index 5e039b50e9d4..d033f63b583f 100644
--- a/drivers/crypto/hisilicon/sec2/sec_crypto.h
+++ b/drivers/crypto/hisilicon/sec2/sec_crypto.h
@@ -7,6 +7,7 @@
#define SEC_AIV_SIZE 12
#define SEC_IV_SIZE 24
#define SEC_MAX_KEY_SIZE 64
+#define SEC_MAX_AKEY_SIZE 128
#define SEC_COMM_SCENE 0
#define SEC_MIN_BLOCK_SZ 1

diff --git a/drivers/crypto/inside-secure/safexcel.c b/drivers/crypto/inside-secure/safexcel.c
index 9ff885d50edf..389a7b51f1f3 100644
--- a/drivers/crypto/inside-secure/safexcel.c
+++ b/drivers/crypto/inside-secure/safexcel.c
@@ -1831,6 +1831,8 @@ static const struct of_device_id safexcel_of_match_table[] = {
{},
};

+MODULE_DEVICE_TABLE(of, safexcel_of_match_table);
+
static struct platform_driver crypto_safexcel = {
.probe = safexcel_probe,
.remove = safexcel_remove,
diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 468d1097a1ec..f23569e4b0bd 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -423,7 +423,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
chunk->ll_region.sz += burst->sz;
desc->alloc_sz += burst->sz;

- if (chan->dir == EDMA_DIR_WRITE) {
+ if (dir == DMA_DEV_TO_MEM) {
burst->sar = src_addr;
if (xfer->type == EDMA_XFER_CYCLIC) {
burst->dar = xfer->xfer.cyclic.paddr;
diff --git a/drivers/dma/imx-dma.c b/drivers/dma/imx-dma.c
index 2ddc31e64db0..da31e73d24d4 100644
--- a/drivers/dma/imx-dma.c
+++ b/drivers/dma/imx-dma.c
@@ -1047,7 +1047,7 @@ static int __init imxdma_probe(struct platform_device *pdev)
return -ENOMEM;

imxdma->dev = &pdev->dev;
- imxdma->devtype = (enum imx_dma_type)of_device_get_match_data(&pdev->dev);
+ imxdma->devtype = (uintptr_t)of_device_get_match_data(&pdev->dev);

res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
imxdma->base = devm_ioremap_resource(&pdev->dev, res);
diff --git a/drivers/dma/sf-pdma/sf-pdma.c b/drivers/dma/sf-pdma/sf-pdma.c
index f12606aeff87..ab0ad7a2f201 100644
--- a/drivers/dma/sf-pdma/sf-pdma.c
+++ b/drivers/dma/sf-pdma/sf-pdma.c
@@ -52,16 +52,6 @@ static inline struct sf_pdma_desc *to_sf_pdma_desc(struct virt_dma_desc *vd)
static struct sf_pdma_desc *sf_pdma_alloc_desc(struct sf_pdma_chan *chan)
{
struct sf_pdma_desc *desc;
- unsigned long flags;
-
- spin_lock_irqsave(&chan->lock, flags);
-
- if (chan->desc && !chan->desc->in_use) {
- spin_unlock_irqrestore(&chan->lock, flags);
- return chan->desc;
- }
-
- spin_unlock_irqrestore(&chan->lock, flags);

desc = kzalloc(sizeof(*desc), GFP_NOWAIT);
if (!desc)
@@ -111,7 +101,6 @@ sf_pdma_prep_dma_memcpy(struct dma_chan *dchan, dma_addr_t dest, dma_addr_t src,
desc->async_tx = vchan_tx_prep(&chan->vchan, &desc->vdesc, flags);

spin_lock_irqsave(&chan->vchan.lock, iflags);
- chan->desc = desc;
sf_pdma_fill_desc(desc, dest, src, len);
spin_unlock_irqrestore(&chan->vchan.lock, iflags);

@@ -170,11 +159,17 @@ static size_t sf_pdma_desc_residue(struct sf_pdma_chan *chan,
unsigned long flags;
u64 residue = 0;
struct sf_pdma_desc *desc;
- struct dma_async_tx_descriptor *tx;
+ struct dma_async_tx_descriptor *tx = NULL;

spin_lock_irqsave(&chan->vchan.lock, flags);

- tx = &chan->desc->vdesc.tx;
+ list_for_each_entry(vd, &chan->vchan.desc_submitted, node)
+ if (vd->tx.cookie == cookie)
+ tx = &vd->tx;
+
+ if (!tx)
+ goto out;
+
if (cookie == tx->chan->completed_cookie)
goto out;

@@ -241,6 +236,19 @@ static void sf_pdma_enable_request(struct sf_pdma_chan *chan)
writel(v, regs->ctrl);
}

+static struct sf_pdma_desc *sf_pdma_get_first_pending_desc(struct sf_pdma_chan *chan)
+{
+ struct virt_dma_chan *vchan = &chan->vchan;
+ struct virt_dma_desc *vdesc;
+
+ if (list_empty(&vchan->desc_issued))
+ return NULL;
+
+ vdesc = list_first_entry(&vchan->desc_issued, struct virt_dma_desc, node);
+
+ return container_of(vdesc, struct sf_pdma_desc, vdesc);
+}
+
static void sf_pdma_xfer_desc(struct sf_pdma_chan *chan)
{
struct sf_pdma_desc *desc = chan->desc;
@@ -268,8 +276,11 @@ static void sf_pdma_issue_pending(struct dma_chan *dchan)

spin_lock_irqsave(&chan->vchan.lock, flags);

- if (vchan_issue_pending(&chan->vchan) && chan->desc)
+ if (!chan->desc && vchan_issue_pending(&chan->vchan)) {
+ /* vchan_issue_pending has made a check that desc is not NULL */
+ chan->desc = sf_pdma_get_first_pending_desc(chan);
sf_pdma_xfer_desc(chan);
+ }

spin_unlock_irqrestore(&chan->vchan.lock, flags);
}
@@ -298,6 +309,11 @@ static void sf_pdma_donebh_tasklet(struct tasklet_struct *t)
spin_lock_irqsave(&chan->vchan.lock, flags);
list_del(&chan->desc->vdesc.node);
vchan_cookie_complete(&chan->desc->vdesc);
+
+ chan->desc = sf_pdma_get_first_pending_desc(chan);
+ if (chan->desc)
+ sf_pdma_xfer_desc(chan);
+
spin_unlock_irqrestore(&chan->vchan.lock, flags);
}

diff --git a/drivers/firmware/arm_scpi.c b/drivers/firmware/arm_scpi.c
index ddf0b9ff9e15..435d0e2658a4 100644
--- a/drivers/firmware/arm_scpi.c
+++ b/drivers/firmware/arm_scpi.c
@@ -815,7 +815,7 @@ static int scpi_init_versions(struct scpi_drvinfo *info)
info->firmware_version = le32_to_cpu(caps.platform_version);
}
/* Ignore error if not implemented */
- if (scpi_info->is_legacy && ret == -EOPNOTSUPP)
+ if (info->is_legacy && ret == -EOPNOTSUPP)
return 0;

return ret;
@@ -913,13 +913,14 @@ static int scpi_probe(struct platform_device *pdev)
struct resource res;
struct device *dev = &pdev->dev;
struct device_node *np = dev->of_node;
+ struct scpi_drvinfo *scpi_drvinfo;

- scpi_info = devm_kzalloc(dev, sizeof(*scpi_info), GFP_KERNEL);
- if (!scpi_info)
+ scpi_drvinfo = devm_kzalloc(dev, sizeof(*scpi_drvinfo), GFP_KERNEL);
+ if (!scpi_drvinfo)
return -ENOMEM;

if (of_match_device(legacy_scpi_of_match, &pdev->dev))
- scpi_info->is_legacy = true;
+ scpi_drvinfo->is_legacy = true;

count = of_count_phandle_with_args(np, "mboxes", "#mbox-cells");
if (count < 0) {
@@ -927,19 +928,19 @@ static int scpi_probe(struct platform_device *pdev)
return -ENODEV;
}

- scpi_info->channels = devm_kcalloc(dev, count, sizeof(struct scpi_chan),
- GFP_KERNEL);
- if (!scpi_info->channels)
+ scpi_drvinfo->channels =
+ devm_kcalloc(dev, count, sizeof(struct scpi_chan), GFP_KERNEL);
+ if (!scpi_drvinfo->channels)
return -ENOMEM;

- ret = devm_add_action(dev, scpi_free_channels, scpi_info);
+ ret = devm_add_action(dev, scpi_free_channels, scpi_drvinfo);
if (ret)
return ret;

- for (; scpi_info->num_chans < count; scpi_info->num_chans++) {
+ for (; scpi_drvinfo->num_chans < count; scpi_drvinfo->num_chans++) {
resource_size_t size;
- int idx = scpi_info->num_chans;
- struct scpi_chan *pchan = scpi_info->channels + idx;
+ int idx = scpi_drvinfo->num_chans;
+ struct scpi_chan *pchan = scpi_drvinfo->channels + idx;
struct mbox_client *cl = &pchan->cl;
struct device_node *shmem = of_parse_phandle(np, "shmem", idx);

@@ -986,45 +987,53 @@ static int scpi_probe(struct platform_device *pdev)
return ret;
}

- scpi_info->commands = scpi_std_commands;
+ scpi_drvinfo->commands = scpi_std_commands;

- platform_set_drvdata(pdev, scpi_info);
+ platform_set_drvdata(pdev, scpi_drvinfo);

- if (scpi_info->is_legacy) {
+ if (scpi_drvinfo->is_legacy) {
/* Replace with legacy variants */
scpi_ops.clk_set_val = legacy_scpi_clk_set_val;
- scpi_info->commands = scpi_legacy_commands;
+ scpi_drvinfo->commands = scpi_legacy_commands;

/* Fill priority bitmap */
for (idx = 0; idx < ARRAY_SIZE(legacy_hpriority_cmds); idx++)
set_bit(legacy_hpriority_cmds[idx],
- scpi_info->cmd_priority);
+ scpi_drvinfo->cmd_priority);
}

- ret = scpi_init_versions(scpi_info);
+ scpi_info = scpi_drvinfo;
+
+ ret = scpi_init_versions(scpi_drvinfo);
if (ret) {
dev_err(dev, "incorrect or no SCP firmware found\n");
+ scpi_info = NULL;
return ret;
}

- if (scpi_info->is_legacy && !scpi_info->protocol_version &&
- !scpi_info->firmware_version)
+ if (scpi_drvinfo->is_legacy && !scpi_drvinfo->protocol_version &&
+ !scpi_drvinfo->firmware_version)
dev_info(dev, "SCP Protocol legacy pre-1.0 firmware\n");
else
dev_info(dev, "SCP Protocol %lu.%lu Firmware %lu.%lu.%lu version\n",
FIELD_GET(PROTO_REV_MAJOR_MASK,
- scpi_info->protocol_version),
+ scpi_drvinfo->protocol_version),
FIELD_GET(PROTO_REV_MINOR_MASK,
- scpi_info->protocol_version),
+ scpi_drvinfo->protocol_version),
FIELD_GET(FW_REV_MAJOR_MASK,
- scpi_info->firmware_version),
+ scpi_drvinfo->firmware_version),
FIELD_GET(FW_REV_MINOR_MASK,
- scpi_info->firmware_version),
+ scpi_drvinfo->firmware_version),
FIELD_GET(FW_REV_PATCH_MASK,
- scpi_info->firmware_version));
- scpi_info->scpi_ops = &scpi_ops;
+ scpi_drvinfo->firmware_version));
+
+ scpi_drvinfo->scpi_ops = &scpi_ops;

- return devm_of_platform_populate(dev);
+ ret = devm_of_platform_populate(dev);
+ if (ret)
+ scpi_info = NULL;
+
+ return ret;
}

static const struct of_device_id scpi_of_match[] = {
diff --git a/drivers/firmware/tegra/bpmp-debugfs.c b/drivers/firmware/tegra/bpmp-debugfs.c
index fd89899aeeed..0c440afd5224 100644
--- a/drivers/firmware/tegra/bpmp-debugfs.c
+++ b/drivers/firmware/tegra/bpmp-debugfs.c
@@ -474,7 +474,7 @@ static int bpmp_populate_debugfs_inband(struct tegra_bpmp *bpmp,
mode |= attrs & DEBUGFS_S_IWUSR ? 0200 : 0;
dentry = debugfs_create_file(name, mode, parent, bpmp,
&bpmp_debug_fops);
- if (!dentry) {
+ if (IS_ERR(dentry)) {
err = -ENOMEM;
goto out;
}
@@ -725,7 +725,7 @@ static int bpmp_populate_dir(struct tegra_bpmp *bpmp, struct seqbuf *seqbuf,

if (t & DEBUGFS_S_ISDIR) {
dentry = debugfs_create_dir(name, parent);
- if (!dentry)
+ if (IS_ERR(dentry))
return -ENOMEM;
err = bpmp_populate_dir(bpmp, seqbuf, dentry, depth+1);
if (err < 0)
@@ -738,7 +738,7 @@ static int bpmp_populate_dir(struct tegra_bpmp *bpmp, struct seqbuf *seqbuf,
dentry = debugfs_create_file(name, mode,
parent, bpmp,
&debugfs_fops);
- if (!dentry)
+ if (IS_ERR(dentry))
return -ENOMEM;
}
}
@@ -788,11 +788,11 @@ int tegra_bpmp_init_debugfs(struct tegra_bpmp *bpmp)
return 0;

root = debugfs_create_dir("bpmp", NULL);
- if (!root)
+ if (IS_ERR(root))
return -ENOMEM;

bpmp->debugfs_mirror = debugfs_create_dir("debug", root);
- if (!bpmp->debugfs_mirror) {
+ if (IS_ERR(bpmp->debugfs_mirror)) {
err = -ENOMEM;
goto out;
}
diff --git a/drivers/fpga/altera-pr-ip-core.c b/drivers/fpga/altera-pr-ip-core.c
index be0667968d33..df8671af4a92 100644
--- a/drivers/fpga/altera-pr-ip-core.c
+++ b/drivers/fpga/altera-pr-ip-core.c
@@ -108,7 +108,7 @@ static int alt_pr_fpga_write(struct fpga_manager *mgr, const char *buf,
u32 *buffer_32 = (u32 *)buf;
size_t i = 0;

- if (count <= 0)
+ if (!count)
return -EINVAL;

/* Write out the complete 32-bit chunks */
diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c
index 6dec81b1f24b..5cabb9a9c272 100644
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -861,7 +861,8 @@ int of_mm_gpiochip_add_data(struct device_node *np,
if (mm_gc->save_regs)
mm_gc->save_regs(mm_gc);

- mm_gc->gc.of_node = np;
+ of_node_put(mm_gc->gc.of_node);
+ mm_gc->gc.of_node = of_node_get(np);

ret = gpiochip_add_data(gc, data);
if (ret)
@@ -869,6 +870,7 @@ int of_mm_gpiochip_add_data(struct device_node *np,

return 0;
err2:
+ of_node_put(np);
iounmap(mm_gc->regs);
err1:
kfree(gc->label);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index f4509656ea8c..e9a8eb070766 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1401,16 +1401,10 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
struct amdgpu_vm *vm)
{
struct amdkfd_process_info *process_info = vm->process_info;
- struct amdgpu_bo *pd = vm->root.bo;

if (!process_info)
return;

- /* Release eviction fence from PD */
- amdgpu_bo_reserve(pd, false);
- amdgpu_bo_fence(pd, NULL, false);
- amdgpu_bo_unreserve(pd);
-
/* Update process info */
mutex_lock(&process_info->lock);
process_info->n_vms--;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 2019622191b5..ee0cbc6ccbfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1260,7 +1260,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,

p->fence = dma_fence_get(&job->base.s_fence->finished);

- amdgpu_ctx_add_fence(p->ctx, entity, p->fence, &seq);
+ seq = amdgpu_ctx_add_fence(p->ctx, entity, p->fence);
amdgpu_cs_post_dependencies(p);

if ((job->preamble_status & AMDGPU_PREAMBLE_IB_PRESENT) &&
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index c317078d1afd..95ed528e6ec5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -135,9 +135,9 @@ static enum amdgpu_ring_priority_level amdgpu_ctx_sched_prio_to_ring_prio(int32_

static unsigned int amdgpu_ctx_get_hw_prio(struct amdgpu_ctx *ctx, u32 hw_ip)
{
- struct amdgpu_device *adev = ctx->adev;
- int32_t ctx_prio;
+ struct amdgpu_device *adev = ctx->mgr->adev;
unsigned int hw_prio;
+ int32_t ctx_prio;

ctx_prio = (ctx->override_priority == AMDGPU_CTX_PRIORITY_UNSET) ?
ctx->init_priority : ctx->override_priority;
@@ -166,7 +166,7 @@ static unsigned int amdgpu_ctx_get_hw_prio(struct amdgpu_ctx *ctx, u32 hw_ip)
static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
const u32 ring)
{
- struct amdgpu_device *adev = ctx->adev;
+ struct amdgpu_device *adev = ctx->mgr->adev;
struct amdgpu_ctx_entity *entity;
struct drm_gpu_scheduler **scheds = NULL, *sched = NULL;
unsigned num_scheds = 0;
@@ -220,35 +220,6 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
return r;
}

-static int amdgpu_ctx_init(struct amdgpu_device *adev,
- int32_t priority,
- struct drm_file *filp,
- struct amdgpu_ctx *ctx)
-{
- int r;
-
- r = amdgpu_ctx_priority_permit(filp, priority);
- if (r)
- return r;
-
- memset(ctx, 0, sizeof(*ctx));
-
- ctx->adev = adev;
-
- kref_init(&ctx->refcount);
- spin_lock_init(&ctx->ring_lock);
- mutex_init(&ctx->lock);
-
- ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
- ctx->reset_counter_query = ctx->reset_counter;
- ctx->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
- ctx->init_priority = priority;
- ctx->override_priority = AMDGPU_CTX_PRIORITY_UNSET;
- ctx->stable_pstate = AMDGPU_CTX_STABLE_PSTATE_NONE;
-
- return 0;
-}
-
static void amdgpu_ctx_fini_entity(struct amdgpu_ctx_entity *entity)
{

@@ -266,7 +237,7 @@ static void amdgpu_ctx_fini_entity(struct amdgpu_ctx_entity *entity)
static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx *ctx,
u32 *stable_pstate)
{
- struct amdgpu_device *adev = ctx->adev;
+ struct amdgpu_device *adev = ctx->mgr->adev;
enum amd_dpm_forced_level current_level;

current_level = amdgpu_dpm_get_performance_level(adev);
@@ -291,10 +262,42 @@ static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx *ctx,
return 0;
}

+static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
+ struct drm_file *filp, struct amdgpu_ctx *ctx)
+{
+ u32 current_stable_pstate;
+ int r;
+
+ r = amdgpu_ctx_priority_permit(filp, priority);
+ if (r)
+ return r;
+
+ memset(ctx, 0, sizeof(*ctx));
+
+ kref_init(&ctx->refcount);
+ ctx->mgr = mgr;
+ spin_lock_init(&ctx->ring_lock);
+ mutex_init(&ctx->lock);
+
+ ctx->reset_counter = atomic_read(&mgr->adev->gpu_reset_counter);
+ ctx->reset_counter_query = ctx->reset_counter;
+ ctx->vram_lost_counter = atomic_read(&mgr->adev->vram_lost_counter);
+ ctx->init_priority = priority;
+ ctx->override_priority = AMDGPU_CTX_PRIORITY_UNSET;
+
+ r = amdgpu_ctx_get_stable_pstate(ctx, &current_stable_pstate);
+ if (r)
+ return r;
+
+ ctx->stable_pstate = current_stable_pstate;
+
+ return 0;
+}
+
static int amdgpu_ctx_set_stable_pstate(struct amdgpu_ctx *ctx,
u32 stable_pstate)
{
- struct amdgpu_device *adev = ctx->adev;
+ struct amdgpu_device *adev = ctx->mgr->adev;
enum amd_dpm_forced_level level;
u32 current_stable_pstate;
int r;
@@ -345,7 +348,8 @@ static int amdgpu_ctx_set_stable_pstate(struct amdgpu_ctx *ctx,
static void amdgpu_ctx_fini(struct kref *ref)
{
struct amdgpu_ctx *ctx = container_of(ref, struct amdgpu_ctx, refcount);
- struct amdgpu_device *adev = ctx->adev;
+ struct amdgpu_ctx_mgr *mgr = ctx->mgr;
+ struct amdgpu_device *adev = mgr->adev;
unsigned i, j, idx;

if (!adev)
@@ -359,7 +363,7 @@ static void amdgpu_ctx_fini(struct kref *ref)
}

if (drm_dev_enter(&adev->ddev, &idx)) {
- amdgpu_ctx_set_stable_pstate(ctx, AMDGPU_CTX_STABLE_PSTATE_NONE);
+ amdgpu_ctx_set_stable_pstate(ctx, ctx->stable_pstate);
drm_dev_exit(idx);
}

@@ -421,7 +425,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
}

*id = (uint32_t)r;
- r = amdgpu_ctx_init(adev, priority, filp, ctx);
+ r = amdgpu_ctx_init(mgr, priority, filp, ctx);
if (r) {
idr_remove(&mgr->ctx_handles, *id);
*id = 0;
@@ -671,9 +675,9 @@ int amdgpu_ctx_put(struct amdgpu_ctx *ctx)
return 0;
}

-void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
- struct drm_sched_entity *entity,
- struct dma_fence *fence, uint64_t *handle)
+uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
+ struct drm_sched_entity *entity,
+ struct dma_fence *fence)
{
struct amdgpu_ctx_entity *centity = to_amdgpu_ctx_entity(entity);
uint64_t seq = centity->sequence;
@@ -682,8 +686,7 @@ void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,

idx = seq & (amdgpu_sched_jobs - 1);
other = centity->fences[idx];
- if (other)
- BUG_ON(!dma_fence_is_signaled(other));
+ WARN_ON(other && !dma_fence_is_signaled(other));

dma_fence_get(fence);

@@ -693,8 +696,7 @@ void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
spin_unlock(&ctx->ring_lock);

dma_fence_put(other);
- if (handle)
- *handle = seq;
+ return seq;
}

struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
@@ -731,7 +733,7 @@ static void amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
int hw_ip,
int32_t priority)
{
- struct amdgpu_device *adev = ctx->adev;
+ struct amdgpu_device *adev = ctx->mgr->adev;
unsigned int hw_prio;
struct drm_gpu_scheduler **scheds = NULL;
unsigned num_scheds;
@@ -796,8 +798,10 @@ int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx,
return r;
}

-void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr)
+void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr,
+ struct amdgpu_device *adev)
{
+ mgr->adev = adev;
mutex_init(&mgr->lock);
idr_init(&mgr->ctx_handles);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index 142f2f87d44c..681050bc828c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -40,7 +40,7 @@ struct amdgpu_ctx_entity {

struct amdgpu_ctx {
struct kref refcount;
- struct amdgpu_device *adev;
+ struct amdgpu_ctx_mgr *mgr;
unsigned reset_counter;
unsigned reset_counter_query;
uint32_t vram_lost_counter;
@@ -70,9 +70,9 @@ int amdgpu_ctx_put(struct amdgpu_ctx *ctx);

int amdgpu_ctx_get_entity(struct amdgpu_ctx *ctx, u32 hw_ip, u32 instance,
u32 ring, struct drm_sched_entity **entity);
-void amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
- struct drm_sched_entity *entity,
- struct dma_fence *fence, uint64_t *seq);
+uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
+ struct drm_sched_entity *entity,
+ struct dma_fence *fence);
struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
struct drm_sched_entity *entity,
uint64_t seq);
@@ -85,7 +85,8 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx,
struct drm_sched_entity *entity);

-void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr);
+void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr,
+ struct amdgpu_device *adev);
void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr);
long amdgpu_ctx_mgr_entity_flush(struct amdgpu_ctx_mgr *mgr, long timeout);
void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index e4fcbb385a62..d68db66d969b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1891,12 +1891,9 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
break;
case IP_VERSION(7, 4, 0):
case IP_VERSION(7, 4, 1):
- adev->nbio.funcs = &nbio_v7_4_funcs;
- adev->nbio.hdp_flush_reg = &nbio_v7_4_hdp_flush_reg;
- break;
case IP_VERSION(7, 4, 4):
adev->nbio.funcs = &nbio_v7_4_funcs;
- adev->nbio.hdp_flush_reg = &nbio_v7_4_hdp_flush_reg_ald;
+ adev->nbio.hdp_flush_reg = &nbio_v7_4_hdp_flush_reg;
break;
case IP_VERSION(7, 2, 0):
case IP_VERSION(7, 2, 1):
@@ -1910,15 +1907,12 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
case IP_VERSION(2, 3, 0):
case IP_VERSION(2, 3, 1):
case IP_VERSION(2, 3, 2):
- adev->nbio.funcs = &nbio_v2_3_funcs;
- adev->nbio.hdp_flush_reg = &nbio_v2_3_hdp_flush_reg;
- break;
case IP_VERSION(3, 3, 0):
case IP_VERSION(3, 3, 1):
case IP_VERSION(3, 3, 2):
case IP_VERSION(3, 3, 3):
adev->nbio.funcs = &nbio_v2_3_funcs;
- adev->nbio.hdp_flush_reg = &nbio_v2_3_hdp_flush_reg_sc;
+ adev->nbio.hdp_flush_reg = &nbio_v2_3_hdp_flush_reg;
break;
default:
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 49c55d82cba8..20a432a774c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1142,7 +1142,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
mutex_init(&fpriv->bo_list_lock);
idr_init(&fpriv->bo_list_handles);

- amdgpu_ctx_mgr_init(&fpriv->ctx_mgr);
+ amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);

file_priv->driver_priv = fpriv;
goto out_suspend;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 940752488330..916e2396d263 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -882,6 +882,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
if (WARN_ON_ONCE(min_offset > max_offset))
return -EINVAL;

+ /* Check domain to be pinned to against preferred domains */
+ if (bo->preferred_domains & domain)
+ domain = bo->preferred_domains & domain;
+
/* A shared bo cannot be migrated to VRAM */
if (bo->tbo.base.import_attach) {
if (domain & AMDGPU_GEM_DOMAIN_GTT)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
index ee7cab37dfd5..dae78a1e88e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
@@ -328,27 +328,6 @@ const struct nbio_hdp_flush_reg nbio_v2_3_hdp_flush_reg = {
.ref_and_mask_sdma1 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__SDMA1_MASK,
};

-const struct nbio_hdp_flush_reg nbio_v2_3_hdp_flush_reg_sc = {
- .ref_and_mask_cp0 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP0_MASK,
- .ref_and_mask_cp1 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP1_MASK,
- .ref_and_mask_cp2 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP2_MASK,
- .ref_and_mask_cp3 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP3_MASK,
- .ref_and_mask_cp4 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP4_MASK,
- .ref_and_mask_cp5 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP5_MASK,
- .ref_and_mask_cp6 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP6_MASK,
- .ref_and_mask_cp7 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP7_MASK,
- .ref_and_mask_cp8 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP8_MASK,
- .ref_and_mask_cp9 = BIF_BX_PF_GPU_HDP_FLUSH_DONE__CP9_MASK,
- .ref_and_mask_sdma0 = GPU_HDP_FLUSH_DONE__RSVD_ENG1_MASK,
- .ref_and_mask_sdma1 = GPU_HDP_FLUSH_DONE__RSVD_ENG2_MASK,
- .ref_and_mask_sdma2 = GPU_HDP_FLUSH_DONE__RSVD_ENG3_MASK,
- .ref_and_mask_sdma3 = GPU_HDP_FLUSH_DONE__RSVD_ENG4_MASK,
- .ref_and_mask_sdma4 = GPU_HDP_FLUSH_DONE__RSVD_ENG5_MASK,
- .ref_and_mask_sdma5 = GPU_HDP_FLUSH_DONE__RSVD_ENG6_MASK,
- .ref_and_mask_sdma6 = GPU_HDP_FLUSH_DONE__RSVD_ENG7_MASK,
- .ref_and_mask_sdma7 = GPU_HDP_FLUSH_DONE__RSVD_ENG8_MASK,
-};
-
static void nbio_v2_3_init_registers(struct amdgpu_device *adev)
{
uint32_t def, data;
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.h b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.h
index 6074dd3a1ed8..a43b60acf7f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.h
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.h
@@ -27,7 +27,6 @@
#include "soc15_common.h"

extern const struct nbio_hdp_flush_reg nbio_v2_3_hdp_flush_reg;
-extern const struct nbio_hdp_flush_reg nbio_v2_3_hdp_flush_reg_sc;
extern const struct amdgpu_nbio_funcs nbio_v2_3_funcs;

#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index c2357e83a8c4..09c0def356a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -339,27 +339,6 @@ const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg = {
.ref_and_mask_sdma1 = GPU_HDP_FLUSH_DONE__SDMA1_MASK,
};

-const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg_ald = {
- .ref_and_mask_cp0 = GPU_HDP_FLUSH_DONE__CP0_MASK,
- .ref_and_mask_cp1 = GPU_HDP_FLUSH_DONE__CP1_MASK,
- .ref_and_mask_cp2 = GPU_HDP_FLUSH_DONE__CP2_MASK,
- .ref_and_mask_cp3 = GPU_HDP_FLUSH_DONE__CP3_MASK,
- .ref_and_mask_cp4 = GPU_HDP_FLUSH_DONE__CP4_MASK,
- .ref_and_mask_cp5 = GPU_HDP_FLUSH_DONE__CP5_MASK,
- .ref_and_mask_cp6 = GPU_HDP_FLUSH_DONE__CP6_MASK,
- .ref_and_mask_cp7 = GPU_HDP_FLUSH_DONE__CP7_MASK,
- .ref_and_mask_cp8 = GPU_HDP_FLUSH_DONE__CP8_MASK,
- .ref_and_mask_cp9 = GPU_HDP_FLUSH_DONE__CP9_MASK,
- .ref_and_mask_sdma0 = GPU_HDP_FLUSH_DONE__RSVD_ENG1_MASK,
- .ref_and_mask_sdma1 = GPU_HDP_FLUSH_DONE__RSVD_ENG2_MASK,
- .ref_and_mask_sdma2 = GPU_HDP_FLUSH_DONE__RSVD_ENG3_MASK,
- .ref_and_mask_sdma3 = GPU_HDP_FLUSH_DONE__RSVD_ENG4_MASK,
- .ref_and_mask_sdma4 = GPU_HDP_FLUSH_DONE__RSVD_ENG5_MASK,
- .ref_and_mask_sdma5 = GPU_HDP_FLUSH_DONE__RSVD_ENG6_MASK,
- .ref_and_mask_sdma6 = GPU_HDP_FLUSH_DONE__RSVD_ENG7_MASK,
- .ref_and_mask_sdma7 = GPU_HDP_FLUSH_DONE__RSVD_ENG8_MASK,
-};
-
static void nbio_v7_4_init_registers(struct amdgpu_device *adev)
{
uint32_t baco_cntl;
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h
index 7490022d79d4..f27c41728822 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h
@@ -27,7 +27,6 @@
#include "soc15_common.h"

extern const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg;
-extern const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg_ald;
extern const struct amdgpu_nbio_funcs nbio_v7_4_funcs;
extern struct amdgpu_nbio_ras nbio_v7_4_ras;

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index f5f39984702f..a3282ddbff86 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -571,7 +571,7 @@ static bool execute_synaptics_rc_command(struct drm_dp_aux *aux,
unsigned char rc_cmd = 0;
unsigned char rc_result = 0xFF;
unsigned char i = 0;
- uint8_t ret = 0;
+ int ret;

if (is_write_cmd) {
// write rc data
diff --git a/drivers/gpu/drm/bridge/Kconfig b/drivers/gpu/drm/bridge/Kconfig
index be2fc4791c1d..7d152cc6f837 100644
--- a/drivers/gpu/drm/bridge/Kconfig
+++ b/drivers/gpu/drm/bridge/Kconfig
@@ -82,6 +82,8 @@ config DRM_ITE_IT6505
select DRM_KMS_HELPER
select DRM_DP_HELPER
select EXTCON
+ select CRYPTO
+ select CRYPTO_HASH
help
ITE IT6505 DisplayPort bridge chip driver.

diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
index 668dcefbae17..f4d6359f6219 100644
--- a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
+++ b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
@@ -1060,6 +1060,10 @@ static int adv7511_init_cec_regmap(struct adv7511 *adv)
ADV7511_CEC_I2C_ADDR_DEFAULT);
if (IS_ERR(adv->i2c_cec))
return PTR_ERR(adv->i2c_cec);
+
+ regmap_write(adv->regmap, ADV7511_REG_CEC_I2C_ADDR,
+ adv->i2c_cec->addr << 1);
+
i2c_set_clientdata(adv->i2c_cec, adv);

adv->regmap_cec = devm_regmap_init_i2c(adv->i2c_cec,
@@ -1264,9 +1268,6 @@ static int adv7511_probe(struct i2c_client *i2c, const struct i2c_device_id *id)
if (ret)
goto err_i2c_unregister_packet;

- regmap_write(adv7511->regmap, ADV7511_REG_CEC_I2C_ADDR,
- adv7511->i2c_cec->addr << 1);
-
INIT_WORK(&adv7511->hpd_work, adv7511_hpd_work);

if (i2c->irq) {
@@ -1383,10 +1384,21 @@ static struct i2c_driver adv7511_driver = {

static int __init adv7511_init(void)
{
- if (IS_ENABLED(CONFIG_DRM_MIPI_DSI))
- mipi_dsi_driver_register(&adv7533_dsi_driver);
+ int ret;

- return i2c_add_driver(&adv7511_driver);
+ if (IS_ENABLED(CONFIG_DRM_MIPI_DSI)) {
+ ret = mipi_dsi_driver_register(&adv7533_dsi_driver);
+ if (ret)
+ return ret;
+ }
+
+ ret = i2c_add_driver(&adv7511_driver);
+ if (ret) {
+ if (IS_ENABLED(CONFIG_DRM_MIPI_DSI))
+ mipi_dsi_driver_unregister(&adv7533_dsi_driver);
+ }
+
+ return ret;
}
module_init(adv7511_init);

diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c
index 060849f8ad8b..c527a3a02eac 100644
--- a/drivers/gpu/drm/bridge/analogix/anx7625.c
+++ b/drivers/gpu/drm/bridge/analogix/anx7625.c
@@ -2651,14 +2651,6 @@ static int anx7625_i2c_probe(struct i2c_client *client,
platform->aux.dev = dev;
platform->aux.transfer = anx7625_aux_transfer;
drm_dp_aux_init(&platform->aux);
- devm_of_dp_aux_populate_ep_devices(&platform->aux);
-
- ret = anx7625_parse_dt(dev, pdata);
- if (ret) {
- if (ret != -EPROBE_DEFER)
- DRM_DEV_ERROR(dev, "fail to parse DT : %d\n", ret);
- goto free_wq;
- }

if (anx7625_register_i2c_dummy_clients(platform, client) != 0) {
ret = -ENOMEM;
@@ -2674,6 +2666,15 @@ static int anx7625_i2c_probe(struct i2c_client *client,
if (ret)
goto free_wq;

+ devm_of_dp_aux_populate_ep_devices(&platform->aux);
+
+ ret = anx7625_parse_dt(dev, pdata);
+ if (ret) {
+ if (ret != -EPROBE_DEFER)
+ DRM_DEV_ERROR(dev, "fail to parse DT : %d\n", ret);
+ goto free_wq;
+ }
+
if (!platform->pdata.low_power_mode) {
anx7625_disable_pd_protocol(platform);
pm_runtime_get_sync(dev);
diff --git a/drivers/gpu/drm/bridge/lontium-lt9611.c b/drivers/gpu/drm/bridge/lontium-lt9611.c
index 63df2e8a8abc..21d1e1465981 100644
--- a/drivers/gpu/drm/bridge/lontium-lt9611.c
+++ b/drivers/gpu/drm/bridge/lontium-lt9611.c
@@ -586,7 +586,7 @@ lt9611_connector_detect(struct drm_connector *connector, bool force)
int connected = 0;

regmap_read(lt9611->regmap, 0x825e, &reg_val);
- connected = (reg_val & BIT(0));
+ connected = (reg_val & (BIT(2) | BIT(0)));

lt9611->status = connected ? connector_status_connected :
connector_status_disconnected;
diff --git a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c
index 3d62e6bf6892..310b3b194491 100644
--- a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c
+++ b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c
@@ -982,7 +982,7 @@ static int lt9611uxc_remove(struct i2c_client *client)
struct lt9611uxc *lt9611uxc = i2c_get_clientdata(client);

disable_irq(client->irq);
- flush_scheduled_work();
+ cancel_work_sync(&lt9611uxc->work);
lt9611uxc_audio_exit(lt9611uxc);
drm_bridge_remove(&lt9611uxc->bridge);

diff --git a/drivers/gpu/drm/bridge/sil-sii8620.c b/drivers/gpu/drm/bridge/sil-sii8620.c
index ec7745c31da0..ab0bce4a988c 100644
--- a/drivers/gpu/drm/bridge/sil-sii8620.c
+++ b/drivers/gpu/drm/bridge/sil-sii8620.c
@@ -605,7 +605,7 @@ static void *sii8620_burst_get_tx_buf(struct sii8620 *ctx, int len)
u8 *buf = &ctx->burst.tx_buf[ctx->burst.tx_count];
int size = len + 2;

- if (ctx->burst.tx_count + size > ARRAY_SIZE(ctx->burst.tx_buf)) {
+ if (ctx->burst.tx_count + size >= ARRAY_SIZE(ctx->burst.tx_buf)) {
dev_err(ctx->dev, "TX-BLK buffer exhausted\n");
ctx->error = -EINVAL;
return NULL;
@@ -622,7 +622,7 @@ static u8 *sii8620_burst_get_rx_buf(struct sii8620 *ctx, int len)
u8 *buf = &ctx->burst.rx_buf[ctx->burst.rx_count];
int size = len + 1;

- if (ctx->burst.tx_count + size > ARRAY_SIZE(ctx->burst.tx_buf)) {
+ if (ctx->burst.rx_count + size >= ARRAY_SIZE(ctx->burst.rx_buf)) {
dev_err(ctx->dev, "RX-BLK buffer exhausted\n");
ctx->error = -EINVAL;
return NULL;
diff --git a/drivers/gpu/drm/bridge/tc358767.c b/drivers/gpu/drm/bridge/tc358767.c
index c23e0abc65e8..da59b041fec5 100644
--- a/drivers/gpu/drm/bridge/tc358767.c
+++ b/drivers/gpu/drm/bridge/tc358767.c
@@ -1549,19 +1549,12 @@ static irqreturn_t tc_irq_handler(int irq, void *arg)
return IRQ_HANDLED;
}

-static int tc_probe(struct i2c_client *client, const struct i2c_device_id *id)
+static int tc_probe_edp_bridge_endpoint(struct tc_data *tc)
{
- struct device *dev = &client->dev;
+ struct device *dev = tc->dev;
struct drm_panel *panel;
- struct tc_data *tc;
int ret;

- tc = devm_kzalloc(dev, sizeof(*tc), GFP_KERNEL);
- if (!tc)
- return -ENOMEM;
-
- tc->dev = dev;
-
/* port@2 is the output port */
ret = drm_of_find_panel_or_bridge(dev->of_node, 2, 0, &panel, NULL);
if (ret && ret != -ENODEV)
@@ -1580,6 +1573,50 @@ static int tc_probe(struct i2c_client *client, const struct i2c_device_id *id)
tc->bridge.type = DRM_MODE_CONNECTOR_DisplayPort;
}

+ return 0;
+}
+
+static void tc_clk_disable(void *data)
+{
+ struct clk *refclk = data;
+
+ clk_disable_unprepare(refclk);
+}
+
+static int tc_probe(struct i2c_client *client, const struct i2c_device_id *id)
+{
+ struct device *dev = &client->dev;
+ struct tc_data *tc;
+ int ret;
+
+ tc = devm_kzalloc(dev, sizeof(*tc), GFP_KERNEL);
+ if (!tc)
+ return -ENOMEM;
+
+ tc->dev = dev;
+
+ ret = tc_probe_edp_bridge_endpoint(tc);
+ if (ret)
+ return ret;
+
+ tc->refclk = devm_clk_get(dev, "ref");
+ if (IS_ERR(tc->refclk)) {
+ ret = PTR_ERR(tc->refclk);
+ dev_err(dev, "Failed to get refclk: %d\n", ret);
+ return ret;
+ }
+
+ ret = clk_prepare_enable(tc->refclk);
+ if (ret)
+ return ret;
+
+ ret = devm_add_action_or_reset(dev, tc_clk_disable, tc->refclk);
+ if (ret)
+ return ret;
+
+ /* tRSTW = 100 cycles, at 13 MHz that is ~7.69 us */
+ usleep_range(10, 15);
+
/* Shut down GPIO is optional */
tc->sd_gpio = devm_gpiod_get_optional(dev, "shutdown", GPIOD_OUT_HIGH);
if (IS_ERR(tc->sd_gpio))
@@ -1600,13 +1637,6 @@ static int tc_probe(struct i2c_client *client, const struct i2c_device_id *id)
usleep_range(5000, 10000);
}

- tc->refclk = devm_clk_get(dev, "ref");
- if (IS_ERR(tc->refclk)) {
- ret = PTR_ERR(tc->refclk);
- dev_err(dev, "Failed to get refclk: %d\n", ret);
- return ret;
- }
-
tc->regmap = devm_regmap_init_i2c(client, &tc_regmap_config);
if (IS_ERR(tc->regmap)) {
ret = PTR_ERR(tc->regmap);
diff --git a/drivers/gpu/drm/dp/drm_dp_aux_bus.c b/drivers/gpu/drm/dp/drm_dp_aux_bus.c
index 415afce3cf96..5f06bc0a34ac 100644
--- a/drivers/gpu/drm/dp/drm_dp_aux_bus.c
+++ b/drivers/gpu/drm/dp/drm_dp_aux_bus.c
@@ -66,7 +66,6 @@ static int dp_aux_ep_probe(struct device *dev)
* @dev: The device to remove.
*
* Calls through to the endpoint driver remove.
- *
*/
static void dp_aux_ep_remove(struct device *dev)
{
@@ -120,8 +119,6 @@ ATTRIBUTE_GROUPS(dp_aux_ep_dev);
/**
* dp_aux_ep_dev_release() - Free memory for the dp_aux_ep device
* @dev: The device to free.
- *
- * Return: 0 if no error or negative error code.
*/
static void dp_aux_ep_dev_release(struct device *dev)
{
@@ -256,6 +253,7 @@ int of_dp_aux_populate_ep_devices(struct drm_dp_aux *aux)

return 0;
}
+EXPORT_SYMBOL_GPL(of_dp_aux_populate_ep_devices);

static void of_dp_aux_depopulate_ep_devices_void(void *data)
{
diff --git a/drivers/gpu/drm/dp/drm_dp_mst_topology.c b/drivers/gpu/drm/dp/drm_dp_mst_topology.c
index 7a7cc44686f9..96869875390f 100644
--- a/drivers/gpu/drm/dp/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/dp/drm_dp_mst_topology.c
@@ -3861,9 +3861,7 @@ int drm_dp_mst_topology_mgr_resume(struct drm_dp_mst_topology_mgr *mgr,
if (!mgr->mst_primary)
goto out_fail;

- ret = drm_dp_dpcd_read(mgr->aux, DP_DPCD_REV, mgr->dpcd,
- DP_RECEIVER_CAP_SIZE);
- if (ret != DP_RECEIVER_CAP_SIZE) {
+ if (drm_dp_read_dpcd_caps(mgr->aux, mgr->dpcd) < 0) {
drm_dbg_kms(mgr->dev, "dpcd read failed - undocked during suspend?\n");
goto out_fail;
}
@@ -4912,8 +4910,7 @@ void drm_dp_mst_dump_topology(struct seq_file *m,
u8 buf[DP_PAYLOAD_TABLE_SIZE];
int ret;

- ret = drm_dp_dpcd_read(mgr->aux, DP_DPCD_REV, buf, DP_RECEIVER_CAP_SIZE);
- if (ret) {
+ if (drm_dp_read_dpcd_caps(mgr->aux, buf) < 0) {
seq_printf(m, "dpcd read failed\n");
goto out;
}
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 56fb87885146..e2a810cfce45 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1225,7 +1225,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
ret = dma_resv_lock_slow_interruptible(obj->resv,
acquire_ctx);
if (ret) {
- ww_acquire_done(acquire_ctx);
+ ww_acquire_fini(acquire_ctx);
return ret;
}
}
@@ -1250,7 +1250,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
goto retry;
}

- ww_acquire_done(acquire_ctx);
+ ww_acquire_fini(acquire_ctx);
return ret;
}
}
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 8ad0e02991ca..904fc893c905 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -302,6 +302,7 @@ static int drm_gem_shmem_vmap_locked(struct drm_gem_shmem_object *shmem,
ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
if (!ret) {
if (WARN_ON(map->is_iomem)) {
+ dma_buf_vunmap(obj->import_attach->dmabuf, map);
ret = -EIO;
goto err_put_pages;
}
diff --git a/drivers/gpu/drm/drm_mipi_dbi.c b/drivers/gpu/drm/drm_mipi_dbi.c
index 9314f2ead79f..09e4edb5a992 100644
--- a/drivers/gpu/drm/drm_mipi_dbi.c
+++ b/drivers/gpu/drm/drm_mipi_dbi.c
@@ -1199,6 +1199,13 @@ int mipi_dbi_spi_transfer(struct spi_device *spi, u32 speed_hz,
size_t chunk;
int ret;

+ /* In __spi_validate, there's a validation that no partial transfers
+ * are accepted (xfer->len % w_size must be zero).
+ * Here we align max_chunk to a multiple of 2 (16 bits),
+ * to prevent transfers from being rejected.
+ */
+ max_chunk = ALIGN_DOWN(max_chunk, 2);
+
spi_message_init_with_transfers(&m, &tr, 1);

while (len) {
diff --git a/drivers/gpu/drm/exynos/exynos7_drm_decon.c b/drivers/gpu/drm/exynos/exynos7_drm_decon.c
index c04264f70ad1..3c31405600f0 100644
--- a/drivers/gpu/drm/exynos/exynos7_drm_decon.c
+++ b/drivers/gpu/drm/exynos/exynos7_drm_decon.c
@@ -800,31 +800,40 @@ static int exynos7_decon_resume(struct device *dev)
if (ret < 0) {
DRM_DEV_ERROR(dev, "Failed to prepare_enable the pclk [%d]\n",
ret);
- return ret;
+ goto err_pclk_enable;
}

ret = clk_prepare_enable(ctx->aclk);
if (ret < 0) {
DRM_DEV_ERROR(dev, "Failed to prepare_enable the aclk [%d]\n",
ret);
- return ret;
+ goto err_aclk_enable;
}

ret = clk_prepare_enable(ctx->eclk);
if (ret < 0) {
DRM_DEV_ERROR(dev, "Failed to prepare_enable the eclk [%d]\n",
ret);
- return ret;
+ goto err_eclk_enable;
}

ret = clk_prepare_enable(ctx->vclk);
if (ret < 0) {
DRM_DEV_ERROR(dev, "Failed to prepare_enable the vclk [%d]\n",
ret);
- return ret;
+ goto err_vclk_enable;
}

return 0;
+
+err_vclk_enable:
+ clk_disable_unprepare(ctx->eclk);
+err_eclk_enable:
+ clk_disable_unprepare(ctx->aclk);
+err_aclk_enable:
+ clk_disable_unprepare(ctx->pclk);
+err_pclk_enable:
+ return ret;
}
#endif

diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
index e82b815f83a6..5a9358dcd9e5 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
@@ -7,9 +7,11 @@

#include <drm/drm_damage_helper.h>
#include <drm/drm_drv.h>
+#include <drm/drm_edid.h>
#include <drm/drm_fb_helper.h>
#include <drm/drm_format_helper.h>
#include <drm/drm_fourcc.h>
+#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_atomic_helper.h>
#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_gem_shmem_helper.h>
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index ac52b49bf901..ca9068bac748 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -69,6 +69,7 @@ struct jz_soc_info {
bool map_noncoherent;
bool use_extended_hwdesc;
bool plane_f0_not_working;
+ u32 max_burst;
unsigned int max_width, max_height;
const u32 *formats_f0, *formats_f1;
unsigned int num_formats_f0, num_formats_f1;
@@ -308,8 +309,9 @@ static void ingenic_drm_crtc_update_timings(struct ingenic_drm *priv,
regmap_write(priv->map, JZ_REG_LCD_REV, mode->htotal << 16);
}

- regmap_set_bits(priv->map, JZ_REG_LCD_CTRL,
- JZ_LCD_CTRL_OFUP | JZ_LCD_CTRL_BURST_16);
+ regmap_update_bits(priv->map, JZ_REG_LCD_CTRL,
+ JZ_LCD_CTRL_OFUP | JZ_LCD_CTRL_BURST_MASK,
+ JZ_LCD_CTRL_OFUP | priv->soc_info->max_burst);

/*
* IPU restart - specify how much time the LCDC will wait before
@@ -1480,6 +1482,7 @@ static const struct jz_soc_info jz4740_soc_info = {
.map_noncoherent = false,
.max_width = 800,
.max_height = 600,
+ .max_burst = JZ_LCD_CTRL_BURST_16,
.formats_f1 = jz4740_formats,
.num_formats_f1 = ARRAY_SIZE(jz4740_formats),
/* JZ4740 has only one plane */
@@ -1491,6 +1494,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
.map_noncoherent = false,
.max_width = 800,
.max_height = 600,
+ .max_burst = JZ_LCD_CTRL_BURST_16,
.formats_f1 = jz4725b_formats_f1,
.num_formats_f1 = ARRAY_SIZE(jz4725b_formats_f1),
.formats_f0 = jz4725b_formats_f0,
@@ -1503,6 +1507,7 @@ static const struct jz_soc_info jz4770_soc_info = {
.map_noncoherent = true,
.max_width = 1280,
.max_height = 720,
+ .max_burst = JZ_LCD_CTRL_BURST_64,
.formats_f1 = jz4770_formats_f1,
.num_formats_f1 = ARRAY_SIZE(jz4770_formats_f1),
.formats_f0 = jz4770_formats_f0,
@@ -1517,6 +1522,7 @@ static const struct jz_soc_info jz4780_soc_info = {
.plane_f0_not_working = true, /* REVISIT */
.max_width = 4096,
.max_height = 2048,
+ .max_burst = JZ_LCD_CTRL_BURST_64,
.formats_f1 = jz4770_formats_f1,
.num_formats_f1 = ARRAY_SIZE(jz4770_formats_f1),
.formats_f0 = jz4770_formats_f0,
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h
index cb1d09b62588..e5bd007ea93d 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm.h
+++ b/drivers/gpu/drm/ingenic/ingenic-drm.h
@@ -106,6 +106,9 @@
#define JZ_LCD_CTRL_BURST_4 (0x0 << 28)
#define JZ_LCD_CTRL_BURST_8 (0x1 << 28)
#define JZ_LCD_CTRL_BURST_16 (0x2 << 28)
+#define JZ_LCD_CTRL_BURST_32 (0x3 << 28)
+#define JZ_LCD_CTRL_BURST_64 (0x4 << 28)
+#define JZ_LCD_CTRL_BURST_MASK (0x7 << 28)
#define JZ_LCD_CTRL_RGB555 BIT(27)
#define JZ_LCD_CTRL_OFUP BIT(26)
#define JZ_LCD_CTRL_FRC_GRAYSCALE_16 (0x0 << 24)
diff --git a/drivers/gpu/drm/mcde/mcde_dsi.c b/drivers/gpu/drm/mcde/mcde_dsi.c
index 5651734ce977..9f9ac8699310 100644
--- a/drivers/gpu/drm/mcde/mcde_dsi.c
+++ b/drivers/gpu/drm/mcde/mcde_dsi.c
@@ -1111,6 +1111,7 @@ static int mcde_dsi_bind(struct device *dev, struct device *master,
bridge = of_drm_find_bridge(child);
if (!bridge) {
dev_err(dev, "failed to find bridge\n");
+ of_node_put(child);
return -EINVAL;
}
}
diff --git a/drivers/gpu/drm/mediatek/mtk_dpi.c b/drivers/gpu/drm/mediatek/mtk_dpi.c
index e61cd67b978f..41c783349321 100644
--- a/drivers/gpu/drm/mediatek/mtk_dpi.c
+++ b/drivers/gpu/drm/mediatek/mtk_dpi.c
@@ -54,13 +54,7 @@ enum mtk_dpi_out_channel_swap {
};

enum mtk_dpi_out_color_format {
- MTK_DPI_COLOR_FORMAT_RGB,
- MTK_DPI_COLOR_FORMAT_RGB_FULL,
- MTK_DPI_COLOR_FORMAT_YCBCR_444,
- MTK_DPI_COLOR_FORMAT_YCBCR_422,
- MTK_DPI_COLOR_FORMAT_XV_YCC,
- MTK_DPI_COLOR_FORMAT_YCBCR_444_FULL,
- MTK_DPI_COLOR_FORMAT_YCBCR_422_FULL
+ MTK_DPI_COLOR_FORMAT_RGB
};

struct mtk_dpi {
@@ -364,24 +358,11 @@ static void mtk_dpi_config_disable_edge(struct mtk_dpi *dpi)
static void mtk_dpi_config_color_format(struct mtk_dpi *dpi,
enum mtk_dpi_out_color_format format)
{
- if ((format == MTK_DPI_COLOR_FORMAT_YCBCR_444) ||
- (format == MTK_DPI_COLOR_FORMAT_YCBCR_444_FULL)) {
- mtk_dpi_config_yuv422_enable(dpi, false);
- mtk_dpi_config_csc_enable(dpi, true);
- mtk_dpi_config_swap_input(dpi, false);
- mtk_dpi_config_channel_swap(dpi, MTK_DPI_OUT_CHANNEL_SWAP_BGR);
- } else if ((format == MTK_DPI_COLOR_FORMAT_YCBCR_422) ||
- (format == MTK_DPI_COLOR_FORMAT_YCBCR_422_FULL)) {
- mtk_dpi_config_yuv422_enable(dpi, true);
- mtk_dpi_config_csc_enable(dpi, true);
- mtk_dpi_config_swap_input(dpi, true);
- mtk_dpi_config_channel_swap(dpi, MTK_DPI_OUT_CHANNEL_SWAP_RGB);
- } else {
- mtk_dpi_config_yuv422_enable(dpi, false);
- mtk_dpi_config_csc_enable(dpi, false);
- mtk_dpi_config_swap_input(dpi, false);
- mtk_dpi_config_channel_swap(dpi, MTK_DPI_OUT_CHANNEL_SWAP_RGB);
- }
+ /* only support RGB888 */
+ mtk_dpi_config_yuv422_enable(dpi, false);
+ mtk_dpi_config_csc_enable(dpi, false);
+ mtk_dpi_config_swap_input(dpi, false);
+ mtk_dpi_config_channel_swap(dpi, MTK_DPI_OUT_CHANNEL_SWAP_RGB);
}

static void mtk_dpi_dual_edge(struct mtk_dpi *dpi)
@@ -436,7 +417,6 @@ static int mtk_dpi_power_on(struct mtk_dpi *dpi)
if (dpi->pinctrl && dpi->pins_dpi)
pinctrl_select_state(dpi->pinctrl, dpi->pins_dpi);

- mtk_dpi_enable(dpi);
return 0;

err_pixel:
@@ -658,6 +638,7 @@ static void mtk_dpi_bridge_enable(struct drm_bridge *bridge)

mtk_dpi_power_on(dpi);
mtk_dpi_set_display_mode(dpi, &dpi->mode);
+ mtk_dpi_enable(dpi);
}

static enum drm_mode_status
diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c b/drivers/gpu/drm/mediatek/mtk_dsi.c
index ccb0511b9cd5..e0a2d5ea40af 100644
--- a/drivers/gpu/drm/mediatek/mtk_dsi.c
+++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
@@ -203,6 +203,7 @@ struct mtk_dsi {
struct mtk_phy_timing phy_timing;
int refcount;
bool enabled;
+ bool lanes_ready;
u32 irq_data;
wait_queue_head_t irq_wait_queue;
const struct mtk_dsi_driver_data *driver_data;
@@ -649,18 +650,11 @@ static int mtk_dsi_poweron(struct mtk_dsi *dsi)
mtk_dsi_reset_engine(dsi);
mtk_dsi_phy_timconfig(dsi);

- mtk_dsi_rxtx_control(dsi);
- usleep_range(30, 100);
- mtk_dsi_reset_dphy(dsi);
mtk_dsi_ps_control_vact(dsi);
mtk_dsi_set_vm_cmd(dsi);
mtk_dsi_config_vdo_timing(dsi);
mtk_dsi_set_interrupt_enable(dsi);

- mtk_dsi_clk_ulp_mode_leave(dsi);
- mtk_dsi_lane0_ulp_mode_leave(dsi);
- mtk_dsi_clk_hs_mode(dsi, 0);
-
return 0;
err_disable_engine_clk:
clk_disable_unprepare(dsi->engine_clk);
@@ -679,19 +673,11 @@ static void mtk_dsi_poweroff(struct mtk_dsi *dsi)
if (--dsi->refcount != 0)
return;

- /*
- * mtk_dsi_stop() and mtk_dsi_start() is asymmetric, since
- * mtk_dsi_stop() should be called after mtk_drm_crtc_atomic_disable(),
- * which needs irq for vblank, and mtk_dsi_stop() will disable irq.
- * mtk_dsi_start() needs to be called in mtk_output_dsi_enable(),
- * after dsi is fully set.
- */
- mtk_dsi_stop(dsi);
-
- mtk_dsi_switch_to_cmd_mode(dsi, VM_DONE_INT_FLAG, 500);
mtk_dsi_reset_engine(dsi);
mtk_dsi_lane0_ulp_mode_enter(dsi);
mtk_dsi_clk_ulp_mode_enter(dsi);
+ /* Set the lane number to 0 to pull down the MIPI signals */
+ writel(0, dsi->regs + DSI_TXRX_CTRL);

mtk_dsi_disable(dsi);

@@ -699,21 +685,31 @@ static void mtk_dsi_poweroff(struct mtk_dsi *dsi)
clk_disable_unprepare(dsi->digital_clk);

phy_power_off(dsi->phy);
+
+ dsi->lanes_ready = false;
}

-static void mtk_output_dsi_enable(struct mtk_dsi *dsi)
+static void mtk_dsi_lane_ready(struct mtk_dsi *dsi)
{
- int ret;
+ if (!dsi->lanes_ready) {
+ dsi->lanes_ready = true;
+ mtk_dsi_rxtx_control(dsi);
+ usleep_range(30, 100);
+ mtk_dsi_reset_dphy(dsi);
+ mtk_dsi_clk_ulp_mode_leave(dsi);
+ mtk_dsi_lane0_ulp_mode_leave(dsi);
+ mtk_dsi_clk_hs_mode(dsi, 0);
+ msleep(20);
+ /* Give dsi_rx time to react after the MIPI signals are pulled up */
+ }
+}

+static void mtk_output_dsi_enable(struct mtk_dsi *dsi)
+{
if (dsi->enabled)
return;

- ret = mtk_dsi_poweron(dsi);
- if (ret < 0) {
- DRM_ERROR("failed to power on dsi\n");
- return;
- }
-
+ mtk_dsi_lane_ready(dsi);
mtk_dsi_set_mode(dsi);
mtk_dsi_clk_hs_mode(dsi, 1);

@@ -727,7 +723,16 @@ static void mtk_output_dsi_disable(struct mtk_dsi *dsi)
if (!dsi->enabled)
return;

- mtk_dsi_poweroff(dsi);
+ /*
+ * mtk_dsi_stop() and mtk_dsi_start() are asymmetric, since
+ * mtk_dsi_stop() should be called after mtk_drm_crtc_atomic_disable(),
+ * which needs irq for vblank, and mtk_dsi_stop() will disable irq.
+ * mtk_dsi_start() needs to be called in mtk_output_dsi_enable(),
+ * after dsi is fully set.
+ */
+ mtk_dsi_stop(dsi);
+
+ mtk_dsi_switch_to_cmd_mode(dsi, VM_DONE_INT_FLAG, 500);

dsi->enabled = false;
}
@@ -751,24 +756,50 @@ static void mtk_dsi_bridge_mode_set(struct drm_bridge *bridge,
drm_display_mode_to_videomode(adjusted, &dsi->vm);
}

-static void mtk_dsi_bridge_disable(struct drm_bridge *bridge)
+static void mtk_dsi_bridge_atomic_disable(struct drm_bridge *bridge,
+ struct drm_bridge_state *old_bridge_state)
{
struct mtk_dsi *dsi = bridge_to_dsi(bridge);

mtk_output_dsi_disable(dsi);
}

-static void mtk_dsi_bridge_enable(struct drm_bridge *bridge)
+static void mtk_dsi_bridge_atomic_enable(struct drm_bridge *bridge,
+ struct drm_bridge_state *old_bridge_state)
{
struct mtk_dsi *dsi = bridge_to_dsi(bridge);

+ if (dsi->refcount == 0)
+ return;
+
mtk_output_dsi_enable(dsi);
}

+static void mtk_dsi_bridge_atomic_pre_enable(struct drm_bridge *bridge,
+ struct drm_bridge_state *old_bridge_state)
+{
+ struct mtk_dsi *dsi = bridge_to_dsi(bridge);
+ int ret;
+
+ ret = mtk_dsi_poweron(dsi);
+ if (ret < 0)
+ DRM_ERROR("failed to power on dsi\n");
+}
+
+static void mtk_dsi_bridge_atomic_post_disable(struct drm_bridge *bridge,
+ struct drm_bridge_state *old_bridge_state)
+{
+ struct mtk_dsi *dsi = bridge_to_dsi(bridge);
+
+ mtk_dsi_poweroff(dsi);
+}
+
static const struct drm_bridge_funcs mtk_dsi_bridge_funcs = {
.attach = mtk_dsi_bridge_attach,
- .disable = mtk_dsi_bridge_disable,
- .enable = mtk_dsi_bridge_enable,
+ .atomic_disable = mtk_dsi_bridge_atomic_disable,
+ .atomic_enable = mtk_dsi_bridge_atomic_enable,
+ .atomic_pre_enable = mtk_dsi_bridge_atomic_pre_enable,
+ .atomic_post_disable = mtk_dsi_bridge_atomic_post_disable,
.mode_set = mtk_dsi_bridge_mode_set,
};

@@ -988,6 +1019,8 @@ static ssize_t mtk_dsi_host_transfer(struct mipi_dsi_host *host,
if (MTK_DSI_HOST_IS_READ(msg->type))
irq_flag |= LPRX_RD_RDY_INT_FLAG;

+ mtk_dsi_lane_ready(dsi);
+
ret = mtk_dsi_host_send_cmd(dsi, msg, irq_flag);
if (ret)
goto restore_dsi_mode;
diff --git a/drivers/gpu/drm/meson/meson_encoder_cvbs.c b/drivers/gpu/drm/meson/meson_encoder_cvbs.c
index fd8db97ba8ba..8110a6e39320 100644
--- a/drivers/gpu/drm/meson/meson_encoder_cvbs.c
+++ b/drivers/gpu/drm/meson/meson_encoder_cvbs.c
@@ -238,6 +238,7 @@ int meson_encoder_cvbs_init(struct meson_drm *priv)
}

meson_encoder_cvbs->next_bridge = of_drm_find_bridge(remote);
+ of_node_put(remote);
if (!meson_encoder_cvbs->next_bridge) {
dev_err(priv->dev, "Failed to find CVBS Connector bridge\n");
return -EPROBE_DEFER;
diff --git a/drivers/gpu/drm/meson/meson_encoder_hdmi.c b/drivers/gpu/drm/meson/meson_encoder_hdmi.c
index 5e306de6f485..a7692584487c 100644
--- a/drivers/gpu/drm/meson/meson_encoder_hdmi.c
+++ b/drivers/gpu/drm/meson/meson_encoder_hdmi.c
@@ -365,7 +365,8 @@ int meson_encoder_hdmi_init(struct meson_drm *priv)
meson_encoder_hdmi->next_bridge = of_drm_find_bridge(remote);
if (!meson_encoder_hdmi->next_bridge) {
dev_err(priv->dev, "Failed to find HDMI transceiver bridge\n");
- return -EPROBE_DEFER;
+ ret = -EPROBE_DEFER;
+ goto err_put_node;
}

/* HDMI Encoder Bridge */
@@ -383,7 +384,7 @@ int meson_encoder_hdmi_init(struct meson_drm *priv)
DRM_MODE_ENCODER_TMDS);
if (ret) {
dev_err(priv->dev, "Failed to init HDMI encoder: %d\n", ret);
- return ret;
+ goto err_put_node;
}

meson_encoder_hdmi->encoder.possible_crtcs = BIT(0);
@@ -393,7 +394,7 @@ int meson_encoder_hdmi_init(struct meson_drm *priv)
DRM_BRIDGE_ATTACH_NO_CONNECTOR);
if (ret) {
dev_err(priv->dev, "Failed to attach bridge: %d\n", ret);
- return ret;
+ goto err_put_node;
}

/* Initialize & attach Bridge Connector */
@@ -401,7 +402,8 @@ int meson_encoder_hdmi_init(struct meson_drm *priv)
&meson_encoder_hdmi->encoder);
if (IS_ERR(meson_encoder_hdmi->connector)) {
dev_err(priv->dev, "Unable to create HDMI bridge connector\n");
- return PTR_ERR(meson_encoder_hdmi->connector);
+ ret = PTR_ERR(meson_encoder_hdmi->connector);
+ goto err_put_node;
}
drm_connector_attach_encoder(meson_encoder_hdmi->connector,
&meson_encoder_hdmi->encoder);
@@ -428,6 +430,7 @@ int meson_encoder_hdmi_init(struct meson_drm *priv)
meson_encoder_hdmi->connector->ycbcr_420_allowed = true;

pdev = of_find_device_by_node(remote);
+ of_node_put(remote);
if (pdev) {
struct cec_connector_info conn_info;
struct cec_notifier *notifier;
@@ -435,8 +438,10 @@ int meson_encoder_hdmi_init(struct meson_drm *priv)
cec_fill_conn_info_from_drm(&conn_info, meson_encoder_hdmi->connector);

notifier = cec_notifier_conn_register(&pdev->dev, NULL, &conn_info);
- if (!notifier)
+ if (!notifier) {
+ put_device(&pdev->dev);
return -ENOMEM;
+ }

meson_encoder_hdmi->cec_notifier = notifier;
}
@@ -444,4 +449,8 @@ int meson_encoder_hdmi_init(struct meson_drm *priv)
dev_dbg(priv->dev, "HDMI encoder initialized\n");

return 0;
+
+err_put_node:
+ of_node_put(remote);
+ return ret;
}
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 217615e0e850..213f75fc98e8 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
{
u64 busy_cycles;

- /* Only read the gpu busy if the hardware is already active */
- if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
- *out_sample_rate = 1;
- return 0;
- }
-
busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
*out_sample_rate = clk_get_rate(gpu->core_clk);

- pm_runtime_put(&gpu->pdev->dev);
-
return busy_cycles;
}

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 3e325e2a2b1b..1863908694dc 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
}

-void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
+void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+ bool suspended)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)

/*
* This can get called from devfreq while the hardware is idle. Don't
- * bring up the power if it isn't already active
+ * bring up the power if it isn't already active. All we're doing here
+ * is updating the frequency so that when we come back online we're at
+ * the right rate.
*/
- if (pm_runtime_get_if_in_use(gmu->dev) == 0)
+ if (suspended)
return;

if (!gmu->legacy) {
a6xx_hfi_set_freq(gmu, perf_index);
dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
- pm_runtime_put(gmu->dev);
return;
}

@@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);

dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
- pm_runtime_put(gmu->dev);
}

unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
@@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
return;

gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
- a6xx_gmu_set_freq(gpu, gpu_opp);
+ a6xx_gmu_set_freq(gpu, gpu_opp, false);
dev_pm_opp_put(gpu_opp);
}

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 40fb92becc78..a8a0d798d17f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
/* 19.2MHz */
*out_sample_rate = 19200000;

- /* Only read the gpu busy if the hardware is already active */
- if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
- return 0;
-
busy_cycles = gmu_read64(&a6xx_gpu->gmu,
REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);

-
- pm_runtime_put(a6xx_gpu->gmu.dev);
-
return busy_cycles;
}

-static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
+static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+ bool suspended)
{
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);

mutex_lock(&a6xx_gpu->gmu.lock);
- a6xx_gmu_set_freq(gpu, opp);
+ a6xx_gmu_set_freq(gpu, opp, suspended);
mutex_unlock(&a6xx_gpu->gmu.lock);
}

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 86e0a7c3fe6d..ab853f61db63 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);

-void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
+void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+ bool suspended);
unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);

void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
index 16ba9f9b9a78..32715399a7c1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
@@ -361,6 +361,9 @@ static void _dpu_crtc_blend_setup_mixer(struct drm_crtc *crtc,
if (!state)
continue;

+ if (!state->visible)
+ continue;
+
pstate = to_dpu_plane_state(state);
fb = state->fb;

@@ -1125,6 +1128,9 @@ static int dpu_crtc_atomic_check(struct drm_crtc *crtc,
if (cnt >= DPU_STAGE_MAX * 4)
continue;

+ if (!pstate->visible)
+ continue;
+
pstates[cnt].dpu_pstate = dpu_pstate;
pstates[cnt].drm_pstate = pstate;
pstates[cnt].stage = pstate->normalized_zpos;
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_pipe.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_pipe.c
index a4f5cb90f3e8..e4b8a789835a 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_pipe.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_pipe.c
@@ -123,12 +123,13 @@ int mdp5_pipe_release(struct drm_atomic_state *s, struct mdp5_hw_pipe *hwpipe)
{
struct msm_drm_private *priv = s->dev->dev_private;
struct mdp5_kms *mdp5_kms = to_mdp5_kms(to_mdp_kms(priv->kms));
- struct mdp5_global_state *state = mdp5_get_global_state(s);
+ struct mdp5_global_state *state;
struct mdp5_hw_pipe_state *new_state;

if (!hwpipe)
return 0;

+ state = mdp5_get_global_state(s);
if (IS_ERR(state))
return PTR_ERR(state);

diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.c b/drivers/gpu/drm/msm/hdmi/hdmi.c
index f6229262dcb0..4ce0b4c41e49 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi.c
@@ -180,6 +180,9 @@ static struct hdmi *msm_hdmi_init(struct platform_device *pdev)
goto fail;
}

+ for (i = 0; i < config->pwr_reg_cnt; i++)
+ hdmi->pwr_regs[i].supply = config->pwr_reg_names[i];
+
ret = devm_regulator_bulk_get(&pdev->dev, config->pwr_reg_cnt, hdmi->pwr_regs);
if (ret) {
DRM_DEV_ERROR(&pdev->dev, "failed to get pwr regulator: %d\n", ret);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 143c56f5185b..a4d3abc4f3e1 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -63,11 +63,14 @@ struct msm_gpu_funcs {
/* for generation specific debugfs: */
void (*debugfs_init)(struct msm_gpu *gpu, struct drm_minor *minor);
#endif
+ /* note: gpu_busy() can assume that we have been pm_resumed */
u64 (*gpu_busy)(struct msm_gpu *gpu, unsigned long *out_sample_rate);
struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
int (*gpu_state_put)(struct msm_gpu_state *state);
unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
- void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
+ /* note: gpu_set_freq() can assume that we have been pm_resumed */
+ void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+ bool suspended);
struct msm_gem_address_space *(*create_address_space)
(struct msm_gpu *gpu, struct platform_device *pdev);
struct msm_gem_address_space *(*create_private_address_space)
@@ -91,6 +94,9 @@ struct msm_gpu_devfreq {
/** devfreq: devfreq instance */
struct devfreq *devfreq;

+ /** lock: lock for "suspended", "busy_cycles", and "time" */
+ struct mutex lock;
+
/**
* idle_constraint:
*
@@ -134,6 +140,9 @@ struct msm_gpu_devfreq {
* elapsed
*/
struct msm_hrtimer_work boost_work;
+
+ /** suspended: tracks if we're suspended */
+ bool suspended;
};

struct msm_gpu {
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index c7dbaa4b1926..01248d340124 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
u32 flags)
{
struct msm_gpu *gpu = dev_to_gpu(dev);
+ struct msm_gpu_devfreq *df = &gpu->devfreq;
struct dev_pm_opp *opp;

/*
@@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,

trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));

- if (gpu->funcs->gpu_set_freq)
- gpu->funcs->gpu_set_freq(gpu, opp);
- else
+ if (gpu->funcs->gpu_set_freq) {
+ mutex_lock(&df->lock);
+ gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
+ mutex_unlock(&df->lock);
+ } else {
clk_set_rate(gpu->core_clk, *freq);
+ }

dev_pm_opp_put(opp);

@@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
unsigned long sample_rate;
ktime_t time;

+ mutex_lock(&df->lock);
+
status->current_frequency = get_freq(gpu);
- busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
time = ktime_get();
-
- busy_time = busy_cycles - df->busy_cycles;
status->total_time = ktime_us_delta(time, df->time);
+ df->time = time;

+ if (df->suspended) {
+ mutex_unlock(&df->lock);
+ status->busy_time = 0;
+ return;
+ }
+
+ busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
+ busy_time = busy_cycles - df->busy_cycles;
df->busy_cycles = busy_cycles;
- df->time = time;
+
+ mutex_unlock(&df->lock);

busy_time *= USEC_PER_SEC;
do_div(busy_time, sample_rate);
@@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
if (!gpu->funcs->gpu_busy)
return;

+ mutex_init(&df->lock);
+
dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
DEV_PM_QOS_MAX_FREQUENCY,
PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
@@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
void msm_devfreq_resume(struct msm_gpu *gpu)
{
struct msm_gpu_devfreq *df = &gpu->devfreq;
+ unsigned long sample_rate;

if (!has_devfreq(gpu))
return;

- df->busy_cycles = 0;
+ mutex_lock(&df->lock);
+ df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
df->time = ktime_get();
+ df->suspended = false;
+ mutex_unlock(&df->lock);

devfreq_resume_device(df->devfreq);
}
@@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
if (!has_devfreq(gpu))
return;

+ mutex_lock(&df->lock);
+ df->suspended = true;
+ mutex_unlock(&df->lock);
+
devfreq_suspend_device(df->devfreq);

cancel_idle_work(df);
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 22b83a6577eb..df83c4654e26 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -1361,13 +1361,11 @@ nouveau_connector_create(struct drm_device *dev,
snprintf(aux_name, sizeof(aux_name), "sor-%04x-%04x",
dcbe->hasht, dcbe->hashm);
nv_connector->aux.name = kstrdup(aux_name, GFP_KERNEL);
- drm_dp_aux_init(&nv_connector->aux);
- if (ret) {
- NV_ERROR(drm, "Failed to init AUX adapter for sor-%04x-%04x: %d\n",
- dcbe->hasht, dcbe->hashm, ret);
+ if (!nv_connector->aux.name) {
kfree(nv_connector);
- return ERR_PTR(ret);
+ return ERR_PTR(-ENOMEM);
}
+ drm_dp_aux_init(&nv_connector->aux);
fallthrough;
default:
funcs = &nouveau_connector_funcs;
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 2cd0932b3d68..a2f5df568ca5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -515,7 +515,7 @@ nouveau_display_hpd_work(struct work_struct *work)

pm_runtime_mark_last_busy(drm->dev->dev);
noop:
- pm_runtime_put_sync(drm->dev->dev);
+ pm_runtime_put_autosuspend(dev->dev);
}

#ifdef CONFIG_ACPI
@@ -537,7 +537,7 @@ nouveau_display_acpi_ntfy(struct notifier_block *nb, unsigned long val,
* it's own hotplug events.
*/
pm_runtime_put_autosuspend(drm->dev->dev);
- } else if (ret == 0) {
+ } else if (ret == 0 || ret == -EINPROGRESS) {
/* We've started resuming the GPU already, so
* it will handle scheduling a full reprobe
* itself
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index 4f9b3aa5deda..20ac1ce2c0f1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -466,7 +466,7 @@ nouveau_fbcon_set_suspend_work(struct work_struct *work)
if (state == FBINFO_STATE_RUNNING) {
nouveau_fbcon_hotplug_resume(drm->fbcon);
pm_runtime_mark_last_busy(drm->dev->dev);
- pm_runtime_put_sync(drm->dev->dev);
+ pm_runtime_put_autosuspend(drm->dev->dev);
}
}

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/base.c
index 64e423dddd9e..6c318e41bde0 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/base.c
@@ -33,7 +33,7 @@ nvbios_addr(struct nvkm_bios *bios, u32 *addr, u8 size)
{
u32 p = *addr;

- if (*addr > bios->image0_size && bios->imaged_addr) {
+ if (*addr >= bios->image0_size && bios->imaged_addr) {
*addr -= bios->image0_size;
*addr += bios->imaged_addr;
}
diff --git a/drivers/gpu/drm/panel/Kconfig b/drivers/gpu/drm/panel/Kconfig
index ddf5f38e8731..abc5271af4eb 100644
--- a/drivers/gpu/drm/panel/Kconfig
+++ b/drivers/gpu/drm/panel/Kconfig
@@ -428,6 +428,8 @@ config DRM_PANEL_SAMSUNG_ATNA33XC20
depends on OF
depends on BACKLIGHT_CLASS_DEVICE
depends on PM
+ select DRM_DISPLAY_DP_HELPER
+ select DRM_DISPLAY_HELPER
select DRM_DP_AUX_BUS
help
DRM panel driver for the Samsung ATNA33XC20 panel. This panel can't
diff --git a/drivers/gpu/drm/radeon/.gitignore b/drivers/gpu/drm/radeon/.gitignore
index 9c1a94153983..d8777383a64a 100644
--- a/drivers/gpu/drm/radeon/.gitignore
+++ b/drivers/gpu/drm/radeon/.gitignore
@@ -1,4 +1,4 @@
-# SPDX-License-Identifier: GPL-2.0-only
+# SPDX-License-Identifier: MIT
mkregtable
*_reg_safe.h

diff --git a/drivers/gpu/drm/radeon/Kconfig b/drivers/gpu/drm/radeon/Kconfig
index 6f60f4840cc5..52819e7f1fca 100644
--- a/drivers/gpu/drm/radeon/Kconfig
+++ b/drivers/gpu/drm/radeon/Kconfig
@@ -1,4 +1,4 @@
-# SPDX-License-Identifier: GPL-2.0-only
+# SPDX-License-Identifier: MIT
config DRM_RADEON_USERPTR
bool "Always enable userptr support"
depends on DRM_RADEON
diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
index 11c97edde54d..3d502f1bbfcb 100644
--- a/drivers/gpu/drm/radeon/Makefile
+++ b/drivers/gpu/drm/radeon/Makefile
@@ -1,4 +1,4 @@
-# SPDX-License-Identifier: GPL-2.0
+# SPDX-License-Identifier: MIT
#
# Makefile for the drm device driver. This driver provides support for the
# Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
diff --git a/drivers/gpu/drm/radeon/ni_dpm.c b/drivers/gpu/drm/radeon/ni_dpm.c
index 769f666335ac..672d2239293e 100644
--- a/drivers/gpu/drm/radeon/ni_dpm.c
+++ b/drivers/gpu/drm/radeon/ni_dpm.c
@@ -2741,10 +2741,10 @@ static int ni_set_mc_special_registers(struct radeon_device *rdev,
table->mc_reg_table_entry[k].mc_data[j] |= 0x100;
}
j++;
- if (j > SMC_NISLANDS_MC_REGISTER_ARRAY_SIZE)
- return -EINVAL;
break;
case MC_SEQ_RESERVE_M >> 2:
+ if (j >= SMC_NISLANDS_MC_REGISTER_ARRAY_SIZE)
+ return -EINVAL;
temp_reg = RREG32(MC_PMG_CMD_MRS1);
table->mc_reg_address[j].s1 = MC_PMG_CMD_MRS1 >> 2;
table->mc_reg_address[j].s0 = MC_SEQ_PMG_CMD_MRS1_LP >> 2;
@@ -2753,8 +2753,6 @@ static int ni_set_mc_special_registers(struct radeon_device *rdev,
(temp_reg & 0xffff0000) |
(table->mc_reg_table_entry[k].mc_data[i] & 0x0000ffff);
j++;
- if (j > SMC_NISLANDS_MC_REGISTER_ARRAY_SIZE)
- return -EINVAL;
break;
default:
break;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 15692cb241fc..429644d5ddc6 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1113,7 +1113,7 @@ static int radeon_gart_size_auto(enum radeon_family family)
static void radeon_check_arguments(struct radeon_device *rdev)
{
/* vramlimit must be a power of two */
- if (!is_power_of_2(radeon_vram_limit)) {
+ if (radeon_vram_limit != 0 && !is_power_of_2(radeon_vram_limit)) {
dev_warn(rdev->dev, "vram limit (%d) must be a power of 2\n",
radeon_vram_limit);
radeon_vram_limit = 0;
diff --git a/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
index c82901d9a9cc..8a9bb618f872 100644
--- a/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
+++ b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
@@ -398,7 +398,15 @@ static int rockchip_dp_probe(struct platform_device *pdev)
if (IS_ERR(dp->adp))
return PTR_ERR(dp->adp);

- return component_add(dev, &rockchip_dp_component_ops);
+ ret = component_add(dev, &rockchip_dp_component_ops);
+ if (ret)
+ goto err_dp_remove;
+
+ return 0;
+
+err_dp_remove:
+ analogix_dp_remove(dp->adp);
+ return ret;
}

static int rockchip_dp_remove(struct platform_device *pdev)
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index d53037531f40..26e24f62f75f 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -1552,6 +1552,9 @@ static struct drm_crtc_state *vop_crtc_duplicate_state(struct drm_crtc *crtc)
{
struct rockchip_crtc_state *rockchip_state;

+ if (WARN_ON(!crtc->state))
+ return NULL;
+
rockchip_state = kzalloc(sizeof(*rockchip_state), GFP_KERNEL);
if (!rockchip_state)
return NULL;
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 7c7dd84e6db8..81991090adcc 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -704,14 +704,23 @@ static int tegra_gem_prime_vmap(struct dma_buf *buf, struct iosys_map *map)
{
struct drm_gem_object *gem = buf->priv;
struct tegra_bo *bo = to_tegra_bo(gem);
+ void *vaddr;

- iosys_map_set_vaddr(map, bo->vaddr);
+ vaddr = tegra_bo_mmap(&bo->base);
+ if (IS_ERR(vaddr))
+ return PTR_ERR(vaddr);
+
+ iosys_map_set_vaddr(map, vaddr);

return 0;
}

static void tegra_gem_prime_vunmap(struct dma_buf *buf, struct iosys_map *map)
{
+ struct drm_gem_object *gem = buf->priv;
+ struct tegra_bo *bo = to_tegra_bo(gem);
+
+ tegra_bo_munmap(&bo->base, map->vaddr);
}

static const struct dma_buf_ops tegra_gem_prime_dmabuf_ops = {
diff --git a/drivers/gpu/drm/tiny/st7735r.c b/drivers/gpu/drm/tiny/st7735r.c
index 29d618093e94..e0f02d367d88 100644
--- a/drivers/gpu/drm/tiny/st7735r.c
+++ b/drivers/gpu/drm/tiny/st7735r.c
@@ -174,6 +174,7 @@ MODULE_DEVICE_TABLE(of, st7735r_of_match);

static const struct spi_device_id st7735r_id[] = {
{ "jd-t18003-t01", (uintptr_t)&jd_t18003_t01_cfg },
+ { "rh128128t", (uintptr_t)&rh128128t_cfg },
{ },
};
MODULE_DEVICE_TABLE(spi, st7735r_id);
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index 477b3c5ad089..18e2a246f7c1 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -314,10 +314,13 @@ static void vc4_crtc_config_pv(struct drm_crtc *crtc, struct drm_encoder *encode
struct drm_crtc_state *crtc_state = crtc->state;
struct drm_display_mode *mode = &crtc_state->adjusted_mode;
bool interlace = mode->flags & DRM_MODE_FLAG_INTERLACE;
- u32 pixel_rep = (mode->flags & DRM_MODE_FLAG_DBLCLK) ? 2 : 1;
+ bool is_hdmi = vc4_encoder->type == VC4_ENCODER_TYPE_HDMI0 ||
+ vc4_encoder->type == VC4_ENCODER_TYPE_HDMI1;
+ u32 pixel_rep = ((mode->flags & DRM_MODE_FLAG_DBLCLK) && !is_hdmi) ? 2 : 1;
bool is_dsi = (vc4_encoder->type == VC4_ENCODER_TYPE_DSI0 ||
vc4_encoder->type == VC4_ENCODER_TYPE_DSI1);
- u32 format = is_dsi ? PV_CONTROL_FORMAT_DSIV_24 : PV_CONTROL_FORMAT_24;
+ bool is_dsi1 = vc4_encoder->type == VC4_ENCODER_TYPE_DSI1;
+ u32 format = is_dsi1 ? PV_CONTROL_FORMAT_DSIV_24 : PV_CONTROL_FORMAT_24;
u8 ppc = pv_data->pixels_per_clock;
bool debug_dump_regs = false;

@@ -343,7 +346,8 @@ static void vc4_crtc_config_pv(struct drm_crtc *crtc, struct drm_encoder *encode
PV_HORZB_HACTIVE));

CRTC_WRITE(PV_VERTA,
- VC4_SET_FIELD(mode->crtc_vtotal - mode->crtc_vsync_end,
+ VC4_SET_FIELD(mode->crtc_vtotal - mode->crtc_vsync_end +
+ interlace,
PV_VERTA_VBP) |
VC4_SET_FIELD(mode->crtc_vsync_end - mode->crtc_vsync_start,
PV_VERTA_VSYNC));
@@ -355,7 +359,7 @@ static void vc4_crtc_config_pv(struct drm_crtc *crtc, struct drm_encoder *encode
if (interlace) {
CRTC_WRITE(PV_VERTA_EVEN,
VC4_SET_FIELD(mode->crtc_vtotal -
- mode->crtc_vsync_end - 1,
+ mode->crtc_vsync_end,
PV_VERTA_VBP) |
VC4_SET_FIELD(mode->crtc_vsync_end -
mode->crtc_vsync_start,
@@ -375,7 +379,7 @@ static void vc4_crtc_config_pv(struct drm_crtc *crtc, struct drm_encoder *encode
PV_VCONTROL_CONTINUOUS |
(is_dsi ? PV_VCONTROL_DSI : 0) |
PV_VCONTROL_INTERLACE |
- VC4_SET_FIELD(mode->htotal * pixel_rep / 2,
+ VC4_SET_FIELD(mode->htotal * pixel_rep / (2 * ppc),
PV_VCONTROL_ODD_DELAY));
CRTC_WRITE(PV_VSYNCD_EVEN, 0);
} else {
diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c
index 162bc18e7497..680eaef7287f 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.c
+++ b/drivers/gpu/drm/vc4/vc4_drv.c
@@ -209,6 +209,15 @@ static void vc4_match_add_drivers(struct device *dev,
}
}

+static const struct of_device_id vc4_dma_range_matches[] = {
+ { .compatible = "brcm,bcm2711-hvs" },
+ { .compatible = "brcm,bcm2835-hvs" },
+ { .compatible = "brcm,bcm2835-v3d" },
+ { .compatible = "brcm,cygnus-v3d" },
+ { .compatible = "brcm,vc4-v3d" },
+ {}
+};
+
static int vc4_drm_bind(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
@@ -227,6 +236,16 @@ static int vc4_drm_bind(struct device *dev)
vc4_drm_driver.driver_features &= ~DRIVER_RENDER;
of_node_put(node);

+ node = of_find_matching_node_and_match(NULL, vc4_dma_range_matches,
+ NULL);
+ if (node) {
+ ret = of_dma_configure(dev, node, true);
+ of_node_put(node);
+
+ if (ret)
+ return ret;
+ }
+
vc4 = devm_drm_dev_alloc(dev, &vc4_drm_driver, struct vc4_dev, base);
if (IS_ERR(vc4))
return PTR_ERR(vc4);
diff --git a/drivers/gpu/drm/vc4/vc4_dsi.c b/drivers/gpu/drm/vc4/vc4_dsi.c
index 98308a17e4ed..b7b2c76770dc 100644
--- a/drivers/gpu/drm/vc4/vc4_dsi.c
+++ b/drivers/gpu/drm/vc4/vc4_dsi.c
@@ -181,8 +181,50 @@

#define DSI0_TXPKT_PIX_FIFO 0x20 /* AKA PIX_FIFO */

-#define DSI0_INT_STAT 0x24
-#define DSI0_INT_EN 0x28
+#define DSI0_INT_STAT 0x24
+#define DSI0_INT_EN 0x28
+# define DSI0_INT_FIFO_ERR BIT(25)
+# define DSI0_INT_CMDC_DONE_MASK VC4_MASK(24, 23)
+# define DSI0_INT_CMDC_DONE_SHIFT 23
+# define DSI0_INT_CMDC_DONE_NO_REPEAT 1
+# define DSI0_INT_CMDC_DONE_REPEAT 3
+# define DSI0_INT_PHY_DIR_RTF BIT(22)
+# define DSI0_INT_PHY_D1_ULPS BIT(21)
+# define DSI0_INT_PHY_D1_STOP BIT(20)
+# define DSI0_INT_PHY_RXLPDT BIT(19)
+# define DSI0_INT_PHY_RXTRIG BIT(18)
+# define DSI0_INT_PHY_D0_ULPS BIT(17)
+# define DSI0_INT_PHY_D0_LPDT BIT(16)
+# define DSI0_INT_PHY_D0_FTR BIT(15)
+# define DSI0_INT_PHY_D0_STOP BIT(14)
+/* Signaled when the clock lane enters the given state. */
+# define DSI0_INT_PHY_CLK_ULPS BIT(13)
+# define DSI0_INT_PHY_CLK_HS BIT(12)
+# define DSI0_INT_PHY_CLK_FTR BIT(11)
+/* Signaled on timeouts */
+# define DSI0_INT_PR_TO BIT(10)
+# define DSI0_INT_TA_TO BIT(9)
+# define DSI0_INT_LPRX_TO BIT(8)
+# define DSI0_INT_HSTX_TO BIT(7)
+/* Contention on a line when trying to drive the line low */
+# define DSI0_INT_ERR_CONT_LP1 BIT(6)
+# define DSI0_INT_ERR_CONT_LP0 BIT(5)
+/* Control error: incorrect line state sequence on data lane 0. */
+# define DSI0_INT_ERR_CONTROL BIT(4)
+# define DSI0_INT_ERR_SYNC_ESC BIT(3)
+# define DSI0_INT_RX2_PKT BIT(2)
+# define DSI0_INT_RX1_PKT BIT(1)
+# define DSI0_INT_CMD_PKT BIT(0)
+
+#define DSI0_INTERRUPTS_ALWAYS_ENABLED (DSI0_INT_ERR_SYNC_ESC | \
+ DSI0_INT_ERR_CONTROL | \
+ DSI0_INT_ERR_CONT_LP0 | \
+ DSI0_INT_ERR_CONT_LP1 | \
+ DSI0_INT_HSTX_TO | \
+ DSI0_INT_LPRX_TO | \
+ DSI0_INT_TA_TO | \
+ DSI0_INT_PR_TO)
+
# define DSI1_INT_PHY_D3_ULPS BIT(30)
# define DSI1_INT_PHY_D3_STOP BIT(29)
# define DSI1_INT_PHY_D2_ULPS BIT(28)
@@ -761,6 +803,9 @@ static void vc4_dsi_encoder_disable(struct drm_encoder *encoder)
list_for_each_entry_reverse(iter, &dsi->bridge_chain, chain_node) {
if (iter->funcs->disable)
iter->funcs->disable(iter);
+
+ if (iter == dsi->bridge)
+ break;
}

vc4_dsi_ulps(dsi, true);
@@ -805,11 +850,9 @@ static bool vc4_dsi_encoder_mode_fixup(struct drm_encoder *encoder,
/* Find what divider gets us a faster clock than the requested
* pixel clock.
*/
- for (divider = 1; divider < 8; divider++) {
- if (parent_rate / divider < pll_clock) {
- divider--;
+ for (divider = 1; divider < 255; divider++) {
+ if (parent_rate / (divider + 1) < pll_clock)
break;
- }
}

/* Now that we've picked a PLL divider, calculate back to its
@@ -894,6 +937,9 @@ static void vc4_dsi_encoder_enable(struct drm_encoder *encoder)

DSI_PORT_WRITE(PHY_AFEC0, afec0);

+ /* AFEC reset hold time */
+ mdelay(1);
+
DSI_PORT_WRITE(PHY_AFEC1,
VC4_SET_FIELD(6, DSI0_PHY_AFEC1_IDR_DLANE1) |
VC4_SET_FIELD(6, DSI0_PHY_AFEC1_IDR_DLANE0) |
@@ -1060,12 +1106,9 @@ static void vc4_dsi_encoder_enable(struct drm_encoder *encoder)
DSI_PORT_WRITE(CTRL, DSI_PORT_READ(CTRL) | DSI1_CTRL_EN);

/* Bring AFE out of reset. */
- if (dsi->variant->port == 0) {
- } else {
- DSI_PORT_WRITE(PHY_AFEC0,
- DSI_PORT_READ(PHY_AFEC0) &
- ~DSI1_PHY_AFEC0_RESET);
- }
+ DSI_PORT_WRITE(PHY_AFEC0,
+ DSI_PORT_READ(PHY_AFEC0) &
+ ~DSI_PORT_BIT(PHY_AFEC0_RESET));

vc4_dsi_ulps(dsi, false);

@@ -1184,13 +1227,28 @@ static ssize_t vc4_dsi_host_transfer(struct mipi_dsi_host *host,
/* Enable the appropriate interrupt for the transfer completion. */
dsi->xfer_result = 0;
reinit_completion(&dsi->xfer_completion);
- DSI_PORT_WRITE(INT_STAT, DSI1_INT_TXPKT1_DONE | DSI1_INT_PHY_DIR_RTF);
- if (msg->rx_len) {
- DSI_PORT_WRITE(INT_EN, (DSI1_INTERRUPTS_ALWAYS_ENABLED |
- DSI1_INT_PHY_DIR_RTF));
+ if (dsi->variant->port == 0) {
+ DSI_PORT_WRITE(INT_STAT,
+ DSI0_INT_CMDC_DONE_MASK | DSI1_INT_PHY_DIR_RTF);
+ if (msg->rx_len) {
+ DSI_PORT_WRITE(INT_EN, (DSI0_INTERRUPTS_ALWAYS_ENABLED |
+ DSI0_INT_PHY_DIR_RTF));
+ } else {
+ DSI_PORT_WRITE(INT_EN,
+ (DSI0_INTERRUPTS_ALWAYS_ENABLED |
+ VC4_SET_FIELD(DSI0_INT_CMDC_DONE_NO_REPEAT,
+ DSI0_INT_CMDC_DONE)));
+ }
} else {
- DSI_PORT_WRITE(INT_EN, (DSI1_INTERRUPTS_ALWAYS_ENABLED |
- DSI1_INT_TXPKT1_DONE));
+ DSI_PORT_WRITE(INT_STAT,
+ DSI1_INT_TXPKT1_DONE | DSI1_INT_PHY_DIR_RTF);
+ if (msg->rx_len) {
+ DSI_PORT_WRITE(INT_EN, (DSI1_INTERRUPTS_ALWAYS_ENABLED |
+ DSI1_INT_PHY_DIR_RTF));
+ } else {
+ DSI_PORT_WRITE(INT_EN, (DSI1_INTERRUPTS_ALWAYS_ENABLED |
+ DSI1_INT_TXPKT1_DONE));
+ }
}

/* Send the packet. */
@@ -1207,7 +1265,7 @@ static ssize_t vc4_dsi_host_transfer(struct mipi_dsi_host *host,
ret = dsi->xfer_result;
}

- DSI_PORT_WRITE(INT_EN, DSI1_INTERRUPTS_ALWAYS_ENABLED);
+ DSI_PORT_WRITE(INT_EN, DSI_PORT_BIT(INTERRUPTS_ALWAYS_ENABLED));

if (ret)
goto reset_fifo_and_return;
@@ -1253,7 +1311,7 @@ static ssize_t vc4_dsi_host_transfer(struct mipi_dsi_host *host,
DSI_PORT_BIT(CTRL_RESET_FIFOS));

DSI_PORT_WRITE(TXPKT1C, 0);
- DSI_PORT_WRITE(INT_EN, DSI1_INTERRUPTS_ALWAYS_ENABLED);
+ DSI_PORT_WRITE(INT_EN, DSI_PORT_BIT(INTERRUPTS_ALWAYS_ENABLED));
return ret;
}

@@ -1390,26 +1448,28 @@ static irqreturn_t vc4_dsi_irq_handler(int irq, void *data)
DSI_PORT_WRITE(INT_STAT, stat);

dsi_handle_error(dsi, &ret, stat,
- DSI1_INT_ERR_SYNC_ESC, "LPDT sync");
+ DSI_PORT_BIT(INT_ERR_SYNC_ESC), "LPDT sync");
dsi_handle_error(dsi, &ret, stat,
- DSI1_INT_ERR_CONTROL, "data lane 0 sequence");
+ DSI_PORT_BIT(INT_ERR_CONTROL), "data lane 0 sequence");
dsi_handle_error(dsi, &ret, stat,
- DSI1_INT_ERR_CONT_LP0, "LP0 contention");
+ DSI_PORT_BIT(INT_ERR_CONT_LP0), "LP0 contention");
dsi_handle_error(dsi, &ret, stat,
- DSI1_INT_ERR_CONT_LP1, "LP1 contention");
+ DSI_PORT_BIT(INT_ERR_CONT_LP1), "LP1 contention");
dsi_handle_error(dsi, &ret, stat,
- DSI1_INT_HSTX_TO, "HSTX timeout");
+ DSI_PORT_BIT(INT_HSTX_TO), "HSTX timeout");
dsi_handle_error(dsi, &ret, stat,
- DSI1_INT_LPRX_TO, "LPRX timeout");
+ DSI_PORT_BIT(INT_LPRX_TO), "LPRX timeout");
dsi_handle_error(dsi, &ret, stat,
- DSI1_INT_TA_TO, "turnaround timeout");
+ DSI_PORT_BIT(INT_TA_TO), "turnaround timeout");
dsi_handle_error(dsi, &ret, stat,
- DSI1_INT_PR_TO, "peripheral reset timeout");
+ DSI_PORT_BIT(INT_PR_TO), "peripheral reset timeout");

- if (stat & (DSI1_INT_TXPKT1_DONE | DSI1_INT_PHY_DIR_RTF)) {
+ if (stat & ((dsi->variant->port ? DSI1_INT_TXPKT1_DONE :
+ DSI0_INT_CMDC_DONE_MASK) |
+ DSI_PORT_BIT(INT_PHY_DIR_RTF))) {
complete(&dsi->xfer_completion);
ret = IRQ_HANDLED;
- } else if (stat & DSI1_INT_HSTX_TO) {
+ } else if (stat & DSI_PORT_BIT(INT_HSTX_TO)) {
complete(&dsi->xfer_completion);
dsi->xfer_result = -ETIMEDOUT;
ret = IRQ_HANDLED;
@@ -1487,13 +1547,29 @@ vc4_dsi_init_phy_clocks(struct vc4_dsi *dsi)
dsi->clk_onecell);
}

+static void vc4_dsi_dma_mem_release(void *ptr)
+{
+ struct vc4_dsi *dsi = ptr;
+ struct device *dev = &dsi->pdev->dev;
+
+ dma_free_coherent(dev, 4, dsi->reg_dma_mem, dsi->reg_dma_paddr);
+ dsi->reg_dma_mem = NULL;
+}
+
+static void vc4_dsi_dma_chan_release(void *ptr)
+{
+ struct vc4_dsi *dsi = ptr;
+
+ dma_release_channel(dsi->reg_dma_chan);
+ dsi->reg_dma_chan = NULL;
+}
+
static int vc4_dsi_bind(struct device *dev, struct device *master, void *data)
{
struct platform_device *pdev = to_platform_device(dev);
struct drm_device *drm = dev_get_drvdata(master);
struct vc4_dsi *dsi = dev_get_drvdata(dev);
struct vc4_dsi_encoder *vc4_dsi_encoder;
- dma_cap_mask_t dma_mask;
int ret;

dsi->variant = of_device_get_match_data(dev);
@@ -1504,7 +1580,8 @@ static int vc4_dsi_bind(struct device *dev, struct device *master, void *data)
return -ENOMEM;

INIT_LIST_HEAD(&dsi->bridge_chain);
- vc4_dsi_encoder->base.type = VC4_ENCODER_TYPE_DSI1;
+ vc4_dsi_encoder->base.type = dsi->variant->port ?
+ VC4_ENCODER_TYPE_DSI1 : VC4_ENCODER_TYPE_DSI0;
vc4_dsi_encoder->dsi = dsi;
dsi->encoder = &vc4_dsi_encoder->base.base;

@@ -1527,6 +1604,8 @@ static int vc4_dsi_bind(struct device *dev, struct device *master, void *data)
* so set up a channel for talking to it.
*/
if (dsi->variant->broken_axi_workaround) {
+ dma_cap_mask_t dma_mask;
+
dsi->reg_dma_mem = dma_alloc_coherent(dev, 4,
&dsi->reg_dma_paddr,
GFP_KERNEL);
@@ -1535,8 +1614,13 @@ static int vc4_dsi_bind(struct device *dev, struct device *master, void *data)
return -ENOMEM;
}

+ ret = devm_add_action_or_reset(dev, vc4_dsi_dma_mem_release, dsi);
+ if (ret)
+ return ret;
+
dma_cap_zero(dma_mask);
dma_cap_set(DMA_MEMCPY, dma_mask);
+
dsi->reg_dma_chan = dma_request_chan_by_mask(&dma_mask);
if (IS_ERR(dsi->reg_dma_chan)) {
ret = PTR_ERR(dsi->reg_dma_chan);
@@ -1546,6 +1630,10 @@ static int vc4_dsi_bind(struct device *dev, struct device *master, void *data)
return ret;
}

+ ret = devm_add_action_or_reset(dev, vc4_dsi_dma_chan_release, dsi);
+ if (ret)
+ return ret;
+
/* Get the physical address of the device's registers. The
* struct resource for the regs gives us the bus address
* instead.
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 98b78ec6b37d..8b1e52145082 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -79,6 +79,11 @@
#define VC5_HDMI_VERTB_VSPO_SHIFT 16
#define VC5_HDMI_VERTB_VSPO_MASK VC4_MASK(29, 16)

+#define VC4_HDMI_MISC_CONTROL_PIXEL_REP_SHIFT 0
+#define VC4_HDMI_MISC_CONTROL_PIXEL_REP_MASK VC4_MASK(3, 0)
+#define VC5_HDMI_MISC_CONTROL_PIXEL_REP_SHIFT 0
+#define VC5_HDMI_MISC_CONTROL_PIXEL_REP_MASK VC4_MASK(3, 0)
+
#define VC5_HDMI_SCRAMBLER_CTL_ENABLE BIT(0)

#define VC5_HDMI_DEEP_COLOR_CONFIG_1_INIT_PACK_PHASE_SHIFT 8
@@ -122,6 +127,12 @@ static int vc4_hdmi_debugfs_regs(struct seq_file *m, void *unused)

drm_print_regset32(&p, &vc4_hdmi->hdmi_regset);
drm_print_regset32(&p, &vc4_hdmi->hd_regset);
+ drm_print_regset32(&p, &vc4_hdmi->cec_regset);
+ drm_print_regset32(&p, &vc4_hdmi->csc_regset);
+ drm_print_regset32(&p, &vc4_hdmi->dvp_regset);
+ drm_print_regset32(&p, &vc4_hdmi->phy_regset);
+ drm_print_regset32(&p, &vc4_hdmi->ram_regset);
+ drm_print_regset32(&p, &vc4_hdmi->rm_regset);

return 0;
}
@@ -433,9 +444,11 @@ static void vc4_hdmi_write_infoframe(struct drm_encoder *encoder,
const struct vc4_hdmi_register *ram_packet_start =
&vc4_hdmi->variant->registers[HDMI_RAM_PACKET_START];
u32 packet_reg = ram_packet_start->offset + VC4_HDMI_PACKET_STRIDE * packet_id;
+ u32 packet_reg_next = ram_packet_start->offset +
+ VC4_HDMI_PACKET_STRIDE * (packet_id + 1);
void __iomem *base = __vc4_hdmi_get_field_base(vc4_hdmi,
ram_packet_start->reg);
- uint8_t buffer[VC4_HDMI_PACKET_STRIDE];
+ uint8_t buffer[VC4_HDMI_PACKET_STRIDE] = {};
unsigned long flags;
ssize_t len, i;
int ret;
@@ -471,6 +484,13 @@ static void vc4_hdmi_write_infoframe(struct drm_encoder *encoder,
packet_reg += 4;
}

+ /*
+ * Clear the remainder of the packet RAM, since it is included in the
+ * infoframe and otherwise triggers a checksum error on an HDMI analyser.
+ */
+ for (; packet_reg < packet_reg_next; packet_reg += 4)
+ writel(0, base + packet_reg);
+
HDMI_WRITE(HDMI_RAM_PACKET_CONFIG,
HDMI_READ(HDMI_RAM_PACKET_CONFIG) | BIT(packet_id));

@@ -852,14 +872,15 @@ static void vc4_hdmi_set_timings(struct vc4_hdmi *vc4_hdmi,
VC4_HDMI_VERTA_VFP) |
VC4_SET_FIELD(mode->crtc_vdisplay, VC4_HDMI_VERTA_VAL));
u32 vertb = (VC4_SET_FIELD(0, VC4_HDMI_VERTB_VSPO) |
- VC4_SET_FIELD(mode->crtc_vtotal - mode->crtc_vsync_end,
+ VC4_SET_FIELD(mode->crtc_vtotal - mode->crtc_vsync_end +
+ interlaced,
VC4_HDMI_VERTB_VBP));
u32 vertb_even = (VC4_SET_FIELD(0, VC4_HDMI_VERTB_VSPO) |
VC4_SET_FIELD(mode->crtc_vtotal -
- mode->crtc_vsync_end -
- interlaced,
+ mode->crtc_vsync_end,
VC4_HDMI_VERTB_VBP));
unsigned long flags;
+ u32 reg;

spin_lock_irqsave(&vc4_hdmi->hw_lock, flags);

@@ -886,6 +907,11 @@ static void vc4_hdmi_set_timings(struct vc4_hdmi *vc4_hdmi,
HDMI_WRITE(HDMI_VERTB0, vertb_even);
HDMI_WRITE(HDMI_VERTB1, vertb);

+ reg = HDMI_READ(HDMI_MISC_CONTROL);
+ reg &= ~VC4_HDMI_MISC_CONTROL_PIXEL_REP_MASK;
+ reg |= VC4_SET_FIELD(pixel_rep - 1, VC4_HDMI_MISC_CONTROL_PIXEL_REP);
+ HDMI_WRITE(HDMI_MISC_CONTROL, reg);
+
spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags);
}

@@ -902,13 +928,13 @@ static void vc5_hdmi_set_timings(struct vc4_hdmi *vc4_hdmi,
VC4_SET_FIELD(mode->crtc_vsync_start - mode->crtc_vdisplay,
VC5_HDMI_VERTA_VFP) |
VC4_SET_FIELD(mode->crtc_vdisplay, VC5_HDMI_VERTA_VAL));
- u32 vertb = (VC4_SET_FIELD(0, VC5_HDMI_VERTB_VSPO) |
+ u32 vertb = (VC4_SET_FIELD(mode->htotal >> (2 - pixel_rep),
+ VC5_HDMI_VERTB_VSPO) |
VC4_SET_FIELD(mode->crtc_vtotal - mode->crtc_vsync_end,
VC4_HDMI_VERTB_VBP));
u32 vertb_even = (VC4_SET_FIELD(0, VC5_HDMI_VERTB_VSPO) |
VC4_SET_FIELD(mode->crtc_vtotal -
- mode->crtc_vsync_end -
- interlaced,
+ mode->crtc_vsync_end - interlaced,
VC4_HDMI_VERTB_VBP));
unsigned long flags;
unsigned char gcp;
@@ -973,6 +999,11 @@ static void vc5_hdmi_set_timings(struct vc4_hdmi *vc4_hdmi,
reg |= gcp_en ? VC5_HDMI_GCP_CONFIG_GCP_ENABLE : 0;
HDMI_WRITE(HDMI_GCP_CONFIG, reg);

+ reg = HDMI_READ(HDMI_MISC_CONTROL);
+ reg &= ~VC5_HDMI_MISC_CONTROL_PIXEL_REP_MASK;
+ reg |= VC4_SET_FIELD(pixel_rep - 1, VC5_HDMI_MISC_CONTROL_PIXEL_REP);
+ HDMI_WRITE(HDMI_MISC_CONTROL, reg);
+
HDMI_WRITE(HDMI_CLOCK_STOP, 0);

spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags);
@@ -1253,11 +1284,25 @@ static int vc4_hdmi_encoder_atomic_check(struct drm_encoder *encoder,
unsigned long long pixel_rate = mode->clock * 1000;
unsigned long long tmds_rate;

- if (vc4_hdmi->variant->unsupported_odd_h_timings &&
- !(mode->flags & DRM_MODE_FLAG_DBLCLK) &&
- ((mode->hdisplay % 2) || (mode->hsync_start % 2) ||
- (mode->hsync_end % 2) || (mode->htotal % 2)))
- return -EINVAL;
+ if (vc4_hdmi->variant->unsupported_odd_h_timings) {
+ if (mode->flags & DRM_MODE_FLAG_DBLCLK) {
+ /* Only try to fix up DBLCLK modes to get 480i and 576i
+ * working.
+ * A generic solution for all modes with odd horizontal
+ * timing values seems impossible based on trying to
+ * solve it for 1366x768 monitors.
+ */
+ if ((mode->hsync_start - mode->hdisplay) & 1)
+ mode->hsync_start--;
+ if ((mode->hsync_end - mode->hsync_start) & 1)
+ mode->hsync_end--;
+ }
+
+ /* Now check whether we still have odd values remaining */
+ if ((mode->hdisplay % 2) || (mode->hsync_start % 2) ||
+ (mode->hsync_end % 2) || (mode->htotal % 2))
+ return -EINVAL;
+ }

/*
* The 1440p@60 pixel rate is in the same range than the first
@@ -1611,10 +1656,10 @@ static int vc4_hdmi_audio_prepare(struct device *dev, void *data,

/* Set the MAI threshold */
HDMI_WRITE(HDMI_MAI_THR,
- VC4_SET_FIELD(0x10, VC4_HD_MAI_THR_PANICHIGH) |
- VC4_SET_FIELD(0x10, VC4_HD_MAI_THR_PANICLOW) |
- VC4_SET_FIELD(0x10, VC4_HD_MAI_THR_DREQHIGH) |
- VC4_SET_FIELD(0x10, VC4_HD_MAI_THR_DREQLOW));
+ VC4_SET_FIELD(0x08, VC4_HD_MAI_THR_PANICHIGH) |
+ VC4_SET_FIELD(0x08, VC4_HD_MAI_THR_PANICLOW) |
+ VC4_SET_FIELD(0x06, VC4_HD_MAI_THR_DREQHIGH) |
+ VC4_SET_FIELD(0x08, VC4_HD_MAI_THR_DREQLOW));

HDMI_WRITE(HDMI_MAI_CONFIG,
VC4_HDMI_MAI_CONFIG_BIT_REVERSE |
@@ -1705,12 +1750,12 @@ static int vc4_hdmi_audio_init(struct vc4_hdmi *vc4_hdmi)
struct device *dev = &vc4_hdmi->pdev->dev;
struct platform_device *codec_pdev;
const __be32 *addr;
- int index;
+ int index, len;
int ret;

- if (!of_find_property(dev->of_node, "dmas", NULL)) {
+ if (!of_find_property(dev->of_node, "dmas", &len) || !len) {
dev_warn(dev,
- "'dmas' DT property is missing, no HDMI audio\n");
+ "'dmas' DT property is missing or empty, no HDMI audio\n");
return 0;
}

@@ -2191,8 +2236,6 @@ static int vc4_hdmi_cec_init(struct vc4_hdmi *vc4_hdmi)
struct cec_connector_info conn_info;
struct platform_device *pdev = vc4_hdmi->pdev;
struct device *dev = &pdev->dev;
- unsigned long flags;
- u32 value;
int ret;

if (!of_find_property(dev->of_node, "interrupts", NULL)) {
@@ -2211,15 +2254,6 @@ static int vc4_hdmi_cec_init(struct vc4_hdmi *vc4_hdmi)
cec_fill_conn_info_from_drm(&conn_info, &vc4_hdmi->connector);
cec_s_conn_info(vc4_hdmi->cec_adap, &conn_info);

- spin_lock_irqsave(&vc4_hdmi->hw_lock, flags);
- value = HDMI_READ(HDMI_CEC_CNTRL_1);
- /* Set the logical address to Unregistered */
- value |= VC4_HDMI_CEC_ADDR_MASK;
- HDMI_WRITE(HDMI_CEC_CNTRL_1, value);
- spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags);
-
- vc4_hdmi_cec_update_clk_div(vc4_hdmi);
-
if (vc4_hdmi->variant->external_irq_controller) {
ret = request_threaded_irq(platform_get_irq_byname(pdev, "cec-rx"),
vc4_cec_irq_handler_rx_bare,
@@ -2235,10 +2269,6 @@ static int vc4_hdmi_cec_init(struct vc4_hdmi *vc4_hdmi)
if (ret)
goto err_remove_cec_rx_handler;
} else {
- spin_lock_irqsave(&vc4_hdmi->hw_lock, flags);
- HDMI_WRITE(HDMI_CEC_CPU_MASK_SET, 0xffffffff);
- spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags);
-
ret = request_threaded_irq(platform_get_irq(pdev, 0),
vc4_cec_irq_handler,
vc4_cec_irq_handler_thread, 0,
@@ -2289,7 +2319,6 @@ static int vc4_hdmi_cec_init(struct vc4_hdmi *vc4_hdmi)
}

static void vc4_hdmi_cec_exit(struct vc4_hdmi *vc4_hdmi) {};
-
#endif

static int vc4_hdmi_build_regset(struct vc4_hdmi *vc4_hdmi,
@@ -2374,6 +2403,7 @@ static int vc5_hdmi_init_resources(struct vc4_hdmi *vc4_hdmi)
struct platform_device *pdev = vc4_hdmi->pdev;
struct device *dev = &pdev->dev;
struct resource *res;
+ int ret;

res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "hdmi");
if (!res)
@@ -2470,6 +2500,38 @@ static int vc5_hdmi_init_resources(struct vc4_hdmi *vc4_hdmi)
return PTR_ERR(vc4_hdmi->reset);
}

+ ret = vc4_hdmi_build_regset(vc4_hdmi, &vc4_hdmi->hdmi_regset, VC4_HDMI);
+ if (ret)
+ return ret;
+
+ ret = vc4_hdmi_build_regset(vc4_hdmi, &vc4_hdmi->hd_regset, VC4_HD);
+ if (ret)
+ return ret;
+
+ ret = vc4_hdmi_build_regset(vc4_hdmi, &vc4_hdmi->cec_regset, VC5_CEC);
+ if (ret)
+ return ret;
+
+ ret = vc4_hdmi_build_regset(vc4_hdmi, &vc4_hdmi->csc_regset, VC5_CSC);
+ if (ret)
+ return ret;
+
+ ret = vc4_hdmi_build_regset(vc4_hdmi, &vc4_hdmi->dvp_regset, VC5_DVP);
+ if (ret)
+ return ret;
+
+ ret = vc4_hdmi_build_regset(vc4_hdmi, &vc4_hdmi->phy_regset, VC5_PHY);
+ if (ret)
+ return ret;
+
+ ret = vc4_hdmi_build_regset(vc4_hdmi, &vc4_hdmi->ram_regset, VC5_RAM);
+ if (ret)
+ return ret;
+
+ ret = vc4_hdmi_build_regset(vc4_hdmi, &vc4_hdmi->rm_regset, VC5_RM);
+ if (ret)
+ return ret;
+
return 0;
}

@@ -2485,12 +2547,34 @@ static int __maybe_unused vc4_hdmi_runtime_suspend(struct device *dev)
static int vc4_hdmi_runtime_resume(struct device *dev)
{
struct vc4_hdmi *vc4_hdmi = dev_get_drvdata(dev);
+ unsigned long __maybe_unused flags;
+ u32 __maybe_unused value;
int ret;

ret = clk_prepare_enable(vc4_hdmi->hsm_clock);
if (ret)
return ret;

+ if (vc4_hdmi->variant->reset)
+ vc4_hdmi->variant->reset(vc4_hdmi);
+
+#ifdef CONFIG_DRM_VC4_HDMI_CEC
+ spin_lock_irqsave(&vc4_hdmi->hw_lock, flags);
+ value = HDMI_READ(HDMI_CEC_CNTRL_1);
+ /* Set the logical address to Unregistered */
+ value |= VC4_HDMI_CEC_ADDR_MASK;
+ HDMI_WRITE(HDMI_CEC_CNTRL_1, value);
+ spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags);
+
+ vc4_hdmi_cec_update_clk_div(vc4_hdmi);
+
+ if (!vc4_hdmi->variant->external_irq_controller) {
+ spin_lock_irqsave(&vc4_hdmi->hw_lock, flags);
+ HDMI_WRITE(HDMI_CEC_CPU_MASK_SET, 0xffffffff);
+ spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags);
+ }
+#endif
+
return 0;
}

@@ -2593,9 +2677,6 @@ static int vc4_hdmi_bind(struct device *dev, struct device *master, void *data)
pm_runtime_set_active(dev);
pm_runtime_enable(dev);

- if (vc4_hdmi->variant->reset)
- vc4_hdmi->variant->reset(vc4_hdmi);
-
if ((of_device_is_compatible(dev->of_node, "brcm,bcm2711-hdmi0") ||
of_device_is_compatible(dev->of_node, "brcm,bcm2711-hdmi1")) &&
HDMI_READ(HDMI_VID_CTL) & VC4_HD_VID_CTL_ENABLE) {
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.h b/drivers/gpu/drm/vc4/vc4_hdmi.h
index 1076faeab616..2b9f5ca15a40 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.h
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.h
@@ -184,6 +184,14 @@ struct vc4_hdmi {
struct debugfs_regset32 hdmi_regset;
struct debugfs_regset32 hd_regset;

+ /* VC5 only */
+ struct debugfs_regset32 cec_regset;
+ struct debugfs_regset32 csc_regset;
+ struct debugfs_regset32 dvp_regset;
+ struct debugfs_regset32 phy_regset;
+ struct debugfs_regset32 ram_regset;
+ struct debugfs_regset32 rm_regset;
+
/**
* @hw_lock: Spinlock protecting device register access.
*/
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi_regs.h b/drivers/gpu/drm/vc4/vc4_hdmi_regs.h
index fc971506bd4f..72b769412482 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi_regs.h
+++ b/drivers/gpu/drm/vc4/vc4_hdmi_regs.h
@@ -125,6 +125,7 @@ enum vc4_hdmi_field {
HDMI_VERTB0,
HDMI_VERTB1,
HDMI_VID_CTL,
+ HDMI_MISC_CONTROL,
};

struct vc4_hdmi_register {
@@ -235,6 +236,7 @@ static const struct vc4_hdmi_register __maybe_unused vc5_hdmi_hdmi0_fields[] = {
VC4_HDMI_REG(HDMI_VERTB0, 0x0f0),
VC4_HDMI_REG(HDMI_VERTA1, 0x0f4),
VC4_HDMI_REG(HDMI_VERTB1, 0x0f8),
+ VC4_HDMI_REG(HDMI_MISC_CONTROL, 0x100),
VC4_HDMI_REG(HDMI_MAI_CHANNEL_MAP, 0x09c),
VC4_HDMI_REG(HDMI_MAI_CONFIG, 0x0a0),
VC4_HDMI_REG(HDMI_DEEP_COLOR_CONFIG_1, 0x170),
@@ -315,6 +317,7 @@ static const struct vc4_hdmi_register __maybe_unused vc5_hdmi_hdmi1_fields[] = {
VC4_HDMI_REG(HDMI_VERTB0, 0x0f0),
VC4_HDMI_REG(HDMI_VERTA1, 0x0f4),
VC4_HDMI_REG(HDMI_VERTB1, 0x0f8),
+ VC4_HDMI_REG(HDMI_MISC_CONTROL, 0x100),
VC4_HDMI_REG(HDMI_MAI_CHANNEL_MAP, 0x09c),
VC4_HDMI_REG(HDMI_MAI_CONFIG, 0x0a0),
VC4_HDMI_REG(HDMI_DEEP_COLOR_CONFIG_1, 0x170),
@@ -414,7 +417,7 @@ static inline u32 vc4_hdmi_read(struct vc4_hdmi *hdmi,
const struct vc4_hdmi_variant *variant = hdmi->variant;
void __iomem *base;

- WARN_ON(!pm_runtime_active(&hdmi->pdev->dev));
+ WARN_ON(pm_runtime_status_suspended(&hdmi->pdev->dev));

if (reg >= variant->num_registers) {
dev_warn(&hdmi->pdev->dev,
@@ -444,7 +447,7 @@ static inline void vc4_hdmi_write(struct vc4_hdmi *hdmi,

lockdep_assert_held(&hdmi->hw_lock);

- WARN_ON(!pm_runtime_active(&hdmi->pdev->dev));
+ WARN_ON(pm_runtime_status_suspended(&hdmi->pdev->dev));

if (reg >= variant->num_registers) {
dev_warn(&hdmi->pdev->dev,
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index 992d6a240002..dba23ae2e65e 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -890,7 +890,9 @@ vc4_core_clock_atomic_check(struct drm_atomic_state *state)
continue;

num_outputs++;
- cob_rate += hvs_new_state->fifo_state[i].fifo_load;
+ cob_rate = max_t(unsigned long,
+ hvs_new_state->fifo_state[i].fifo_load,
+ cob_rate);
}

pixel_rate = load_state->hvs_load;
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 920a9eefe426..a82a0b1190eb 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -310,16 +310,16 @@ static int vc4_plane_margins_adj(struct drm_plane_state *pstate)
adjhdisplay,
crtc_state->mode.hdisplay);
vc4_pstate->crtc_x += left;
- if (vc4_pstate->crtc_x > crtc_state->mode.hdisplay - left)
- vc4_pstate->crtc_x = crtc_state->mode.hdisplay - left;
+ if (vc4_pstate->crtc_x > crtc_state->mode.hdisplay - right)
+ vc4_pstate->crtc_x = crtc_state->mode.hdisplay - right;

adjvdisplay = crtc_state->mode.vdisplay - (top + bottom);
vc4_pstate->crtc_y = DIV_ROUND_CLOSEST(vc4_pstate->crtc_y *
adjvdisplay,
crtc_state->mode.vdisplay);
vc4_pstate->crtc_y += top;
- if (vc4_pstate->crtc_y > crtc_state->mode.vdisplay - top)
- vc4_pstate->crtc_y = crtc_state->mode.vdisplay - top;
+ if (vc4_pstate->crtc_y > crtc_state->mode.vdisplay - bottom)
+ vc4_pstate->crtc_y = crtc_state->mode.vdisplay - bottom;

vc4_pstate->crtc_w = DIV_ROUND_CLOSEST(vc4_pstate->crtc_w *
adjhdisplay,
@@ -339,7 +339,6 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
struct vc4_plane_state *vc4_state = to_vc4_plane_state(state);
struct drm_framebuffer *fb = state->fb;
struct drm_gem_cma_object *bo = drm_fb_cma_get_gem_obj(fb, 0);
- u32 subpixel_src_mask = (1 << 16) - 1;
int num_planes = fb->format->num_planes;
struct drm_crtc_state *crtc_state;
u32 h_subsample = fb->format->hsub;
@@ -361,18 +360,15 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
for (i = 0; i < num_planes; i++)
vc4_state->offsets[i] = bo->paddr + fb->offsets[i];

- /* We don't support subpixel source positioning for scaling. */
- if ((state->src.x1 & subpixel_src_mask) ||
- (state->src.x2 & subpixel_src_mask) ||
- (state->src.y1 & subpixel_src_mask) ||
- (state->src.y2 & subpixel_src_mask)) {
- return -EINVAL;
- }
-
- vc4_state->src_x = state->src.x1 >> 16;
- vc4_state->src_y = state->src.y1 >> 16;
- vc4_state->src_w[0] = (state->src.x2 - state->src.x1) >> 16;
- vc4_state->src_h[0] = (state->src.y2 - state->src.y1) >> 16;
+ /*
+ * We don't support subpixel source positioning for scaling,
+ * but fractional coordinates can be generated by clipping
+ * so just round for now
+ */
+ vc4_state->src_x = DIV_ROUND_CLOSEST(state->src.x1, 1 << 16);
+ vc4_state->src_y = DIV_ROUND_CLOSEST(state->src.y1, 1 << 16);
+ vc4_state->src_w[0] = DIV_ROUND_CLOSEST(state->src.x2, 1 << 16) - vc4_state->src_x;
+ vc4_state->src_h[0] = DIV_ROUND_CLOSEST(state->src.y2, 1 << 16) - vc4_state->src_y;

vc4_state->crtc_x = state->dst.x1;
vc4_state->crtc_y = state->dst.y1;
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index c708bab555c6..10f080060923 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -579,8 +579,10 @@ static int virtio_gpu_get_caps_ioctl(struct drm_device *dev,
spin_unlock(&vgdev->display_info_lock);

/* not in cache - need to talk to hw */
- virtio_gpu_cmd_get_capset(vgdev, found_valid, args->cap_set_ver,
- &cache_ent);
+ ret = virtio_gpu_cmd_get_capset(vgdev, found_valid, args->cap_set_ver,
+ &cache_ent);
+ if (ret)
+ return ret;
virtio_gpu_notify(vgdev);

copy_exit:
diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index f293e6ad52da..1cc8f3fc8e4b 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -168,9 +168,9 @@ static int virtio_gpu_object_shmem_init(struct virtio_gpu_device *vgdev,
* since virtio_gpu doesn't support dma-buf import from other devices.
*/
shmem->pages = drm_gem_shmem_get_sg_table(&bo->base);
- if (!shmem->pages) {
+ if (IS_ERR(shmem->pages)) {
drm_gem_shmem_unpin(&bo->base);
- return -EINVAL;
+ return PTR_ERR(shmem->pages);
}

if (use_dma_api) {
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index c6a1036bf2ea..b47ac170108c 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -157,7 +157,7 @@ static void compose_plane(struct vkms_composer *primary_composer,
void *vaddr;
void (*pixel_blend)(const u8 *p_src, u8 *p_dst);

- if (WARN_ON(iosys_map_is_null(&primary_composer->map[0])))
+ if (WARN_ON(iosys_map_is_null(&plane_composer->map[0])))
return;

vaddr = plane_composer->map[0].vaddr;
diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
index 444acd9e2cd6..8be33cf3b6c2 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
@@ -155,6 +155,8 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
dev = &privdata->pdev->dev;

cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]);
+ if (cl_data->num_hid_devices == 0)
+ return -ENODEV;

INIT_DELAYED_WORK(&cl_data->work, amd_sfh_work);
INIT_DELAYED_WORK(&cl_data->work_buffer, amd_sfh_work_buffer);
diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_hid.c b/drivers/hid/amd-sfh-hid/amd_sfh_hid.c
index e2a9679e32be..54fe224050f9 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_hid.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_hid.c
@@ -100,11 +100,15 @@ static int amdtp_wait_for_response(struct hid_device *hid)

void amdtp_hid_wakeup(struct hid_device *hid)
{
- struct amdtp_hid_data *hid_data = hid->driver_data;
- struct amdtp_cl_data *cli_data = hid_data->cli_data;
+ struct amdtp_hid_data *hid_data;
+ struct amdtp_cl_data *cli_data;

- cli_data->request_done[cli_data->cur_hid_dev] = true;
- wake_up_interruptible(&hid_data->hid_wait);
+ if (hid) {
+ hid_data = hid->driver_data;
+ cli_data = hid_data->cli_data;
+ cli_data->request_done[cli_data->cur_hid_dev] = true;
+ wake_up_interruptible(&hid_data->hid_wait);
+ }
}

static struct hid_ll_driver amdtp_hid_ll_driver = {
diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
index e18a4efd8839..390298a6fc85 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
@@ -327,7 +327,8 @@ static int amd_mp2_pci_probe(struct pci_dev *pdev, const struct pci_device_id *i
rc = amd_sfh_hid_client_init(privdata);
if (rc) {
amd_sfh_clear_intr(privdata);
- dev_err(&pdev->dev, "amd_sfh_hid_client_init failed\n");
+ if (rc != -EOPNOTSUPP)
+ dev_err(&pdev->dev, "amd_sfh_hid_client_init failed\n");
return rc;
}

diff --git a/drivers/hid/hid-alps.c b/drivers/hid/hid-alps.c
index 2b986d0dbde4..db146d0f7937 100644
--- a/drivers/hid/hid-alps.c
+++ b/drivers/hid/hid-alps.c
@@ -830,6 +830,8 @@ static const struct hid_device_id alps_id[] = {
USB_VENDOR_ID_ALPS_JP, HID_DEVICE_ID_ALPS_U1_DUAL) },
{ HID_DEVICE(HID_BUS_ANY, HID_GROUP_ANY,
USB_VENDOR_ID_ALPS_JP, HID_DEVICE_ID_ALPS_U1) },
+ { HID_DEVICE(HID_BUS_ANY, HID_GROUP_ANY,
+ USB_VENDOR_ID_ALPS_JP, HID_DEVICE_ID_ALPS_U1_UNICORN_LEGACY) },
{ HID_DEVICE(HID_BUS_ANY, HID_GROUP_ANY,
USB_VENDOR_ID_ALPS_JP, HID_DEVICE_ID_ALPS_T4_BTNLESS) },
{ }
diff --git a/drivers/hid/hid-cp2112.c b/drivers/hid/hid-cp2112.c
index ece147d1a278..1e16b0fa310d 100644
--- a/drivers/hid/hid-cp2112.c
+++ b/drivers/hid/hid-cp2112.c
@@ -790,6 +790,11 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
data->word = le16_to_cpup((__le16 *)buf);
break;
case I2C_SMBUS_I2C_BLOCK_DATA:
+ if (read_length > I2C_SMBUS_BLOCK_MAX) {
+ ret = -EINVAL;
+ goto power_normal;
+ }
+
memcpy(data->block + 1, buf, read_length);
break;
case I2C_SMBUS_BLOCK_DATA:
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index c297c63f3ec5..e233726d9a74 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -413,6 +413,7 @@
#define USB_DEVICE_ID_ASUS_UX550VE_TOUCHSCREEN 0x2544
#define USB_DEVICE_ID_ASUS_UX550_TOUCHSCREEN 0x2706
#define I2C_DEVICE_ID_SURFACE_GO_TOUCHSCREEN 0x261A
+#define I2C_DEVICE_ID_SURFACE_GO2_TOUCHSCREEN 0x2A1C

#define USB_VENDOR_ID_ELECOM 0x056e
#define USB_DEVICE_ID_ELECOM_BM084 0x0061
diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c
index c6b27aab9041..48c1c02c69f4 100644
--- a/drivers/hid/hid-input.c
+++ b/drivers/hid/hid-input.c
@@ -381,6 +381,8 @@ static const struct hid_device_id hid_battery_quirks[] = {
HID_BATTERY_QUIRK_IGNORE },
{ HID_I2C_DEVICE(USB_VENDOR_ID_ELAN, I2C_DEVICE_ID_SURFACE_GO_TOUCHSCREEN),
HID_BATTERY_QUIRK_IGNORE },
+ { HID_I2C_DEVICE(USB_VENDOR_ID_ELAN, I2C_DEVICE_ID_SURFACE_GO2_TOUCHSCREEN),
+ HID_BATTERY_QUIRK_IGNORE },
{}
};

diff --git a/drivers/hid/hid-mcp2221.c b/drivers/hid/hid-mcp2221.c
index 4211b9839209..de52e9f7bb8c 100644
--- a/drivers/hid/hid-mcp2221.c
+++ b/drivers/hid/hid-mcp2221.c
@@ -385,6 +385,9 @@ static int mcp_smbus_write(struct mcp2221 *mcp, u16 addr,
data_len = 7;
break;
default:
+ if (len > I2C_SMBUS_BLOCK_MAX)
+ return -EINVAL;
+
memcpy(&mcp->txbuf[5], buf, len);
data_len = len + 5;
}
diff --git a/drivers/hid/hid-nintendo.c b/drivers/hid/hid-nintendo.c
index 2204de889739..4b1173957c17 100644
--- a/drivers/hid/hid-nintendo.c
+++ b/drivers/hid/hid-nintendo.c
@@ -1586,6 +1586,7 @@ static const unsigned int joycon_button_inputs_r[] = {
/* We report joy-con d-pad inputs as buttons and pro controller as a hat. */
static const unsigned int joycon_dpad_inputs_jc[] = {
BTN_DPAD_UP, BTN_DPAD_DOWN, BTN_DPAD_LEFT, BTN_DPAD_RIGHT,
+ 0 /* 0 signals end of array */
};

static int joycon_input_create(struct joycon_ctlr *ctlr)
diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index 066c567dbaa2..9a81e63c330e 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -2121,7 +2121,7 @@ static int wacom_register_inputs(struct wacom *wacom)

error = wacom_setup_pad_input_capabilities(pad_input_dev, wacom_wac);
if (error) {
- /* no pad in use on this interface */
+ /* no pad events using this interface */
input_free_device(pad_input_dev);
wacom_wac->pad_input = NULL;
pad_input_dev = NULL;
diff --git a/drivers/hid/wacom_wac.c b/drivers/hid/wacom_wac.c
index a7176fc0635d..c454231afec8 100644
--- a/drivers/hid/wacom_wac.c
+++ b/drivers/hid/wacom_wac.c
@@ -638,9 +638,26 @@ static int wacom_intuos_id_mangle(int tool_id)
return (tool_id & ~0xFFF) << 4 | (tool_id & 0xFFF);
}

+static bool wacom_is_art_pen(int tool_id)
+{
+ bool is_art_pen = false;
+
+ switch (tool_id) {
+ case 0x885: /* Intuos3 Marker Pen */
+ case 0x804: /* Intuos4/5 13HD/24HD Marker Pen */
+ case 0x10804: /* Intuos4/5 13HD/24HD Art Pen */
+ is_art_pen = true;
+ break;
+ }
+ return is_art_pen;
+}
+
static int wacom_intuos_get_tool_type(int tool_id)
{
- int tool_type;
+ int tool_type = BTN_TOOL_PEN;
+
+ if (wacom_is_art_pen(tool_id))
+ return tool_type;

switch (tool_id) {
case 0x812: /* Inking pen */
@@ -655,12 +672,9 @@ static int wacom_intuos_get_tool_type(int tool_id)
case 0x852:
case 0x823: /* Intuos3 Grip Pen */
case 0x813: /* Intuos3 Classic Pen */
- case 0x885: /* Intuos3 Marker Pen */
case 0x802: /* Intuos4/5 13HD/24HD General Pen */
- case 0x804: /* Intuos4/5 13HD/24HD Marker Pen */
case 0x8e2: /* IntuosHT2 pen */
case 0x022:
- case 0x10804: /* Intuos4/5 13HD/24HD Art Pen */
case 0x10842: /* MobileStudio Pro Pro Pen slim */
case 0x14802: /* Intuos4/5 13HD/24HD Classic Pen */
case 0x16802: /* Cintiq 13HD Pro Pen */
@@ -718,10 +732,6 @@ static int wacom_intuos_get_tool_type(int tool_id)
case 0x10902: /* Intuos4/5 13HD/24HD Airbrush */
tool_type = BTN_TOOL_AIRBRUSH;
break;
-
- default: /* Unknown tool */
- tool_type = BTN_TOOL_PEN;
- break;
}
return tool_type;
}
@@ -2007,7 +2017,6 @@ static void wacom_wac_pad_usage_mapping(struct hid_device *hdev,
wacom_wac->has_mute_touch_switch = true;
usage->type = EV_SW;
usage->code = SW_MUTE_DEVICE;
- features->device_type |= WACOM_DEVICETYPE_PAD;
break;
case WACOM_HID_WD_TOUCHSTRIP:
wacom_map_usage(input, usage, field, EV_ABS, ABS_RX, 0);
@@ -2087,6 +2096,30 @@ static void wacom_wac_pad_event(struct hid_device *hdev, struct hid_field *field
wacom_wac->hid_data.inrange_state |= value;
}

+ /* Process touch switch state first since it is reported through touch interface,
+ * which is independent of the pad interface. In the case where there are no other pad
+ * events, the pad interface will not even be created.
+ */
+ if ((equivalent_usage == WACOM_HID_WD_MUTE_DEVICE) ||
+ (equivalent_usage == WACOM_HID_WD_TOUCHONOFF)) {
+ if (wacom_wac->shared->touch_input) {
+ bool *is_touch_on = &wacom_wac->shared->is_touch_on;
+
+ if (equivalent_usage == WACOM_HID_WD_MUTE_DEVICE && value)
+ *is_touch_on = !(*is_touch_on);
+ else if (equivalent_usage == WACOM_HID_WD_TOUCHONOFF)
+ *is_touch_on = value;
+
+ input_report_switch(wacom_wac->shared->touch_input,
+ SW_MUTE_DEVICE, !(*is_touch_on));
+ input_sync(wacom_wac->shared->touch_input);
+ }
+ return;
+ }
+
+ if (!input)
+ return;
+
switch (equivalent_usage) {
case WACOM_HID_WD_TOUCHRING:
/*
@@ -2122,22 +2155,6 @@ static void wacom_wac_pad_event(struct hid_device *hdev, struct hid_field *field
input_event(input, usage->type, usage->code, 0);
break;

- case WACOM_HID_WD_MUTE_DEVICE:
- case WACOM_HID_WD_TOUCHONOFF:
- if (wacom_wac->shared->touch_input) {
- bool *is_touch_on = &wacom_wac->shared->is_touch_on;
-
- if (equivalent_usage == WACOM_HID_WD_MUTE_DEVICE && value)
- *is_touch_on = !(*is_touch_on);
- else if (equivalent_usage == WACOM_HID_WD_TOUCHONOFF)
- *is_touch_on = value;
-
- input_report_switch(wacom_wac->shared->touch_input,
- SW_MUTE_DEVICE, !(*is_touch_on));
- input_sync(wacom_wac->shared->touch_input);
- }
- break;
-
case WACOM_HID_WD_MODE_CHANGE:
if (wacom_wac->is_direct_mode != value) {
wacom_wac->is_direct_mode = value;
@@ -2323,6 +2340,9 @@ static void wacom_wac_pen_event(struct hid_device *hdev, struct hid_field *field
}
return;
case HID_DG_TWIST:
+ /* don't modify the value if the pen doesn't support the feature */
+ if (!wacom_is_art_pen(wacom_wac->id[0])) return;
+
/*
* Userspace expects pen twist to have its zero point when
* the buttons/finger is on the tablet's left. HID values
@@ -2795,7 +2815,7 @@ void wacom_wac_event(struct hid_device *hdev, struct hid_field *field,
/* usage tests must precede field tests */
if (WACOM_BATTERY_USAGE(usage))
wacom_wac_battery_event(hdev, field, usage, value);
- else if (WACOM_PAD_FIELD(field) && wacom->wacom_wac.pad_input)
+ else if (WACOM_PAD_FIELD(field))
wacom_wac_pad_event(hdev, field, usage, value);
else if (WACOM_PEN_FIELD(field) && wacom->wacom_wac.pen_input)
wacom_wac_pen_event(hdev, field, usage, value);
diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
index 84cb1ede7bc0..a0df9a0abeb9 100644
--- a/drivers/hwmon/dell-smm-hwmon.c
+++ b/drivers/hwmon/dell-smm-hwmon.c
@@ -1262,6 +1262,14 @@ static const struct dmi_system_id i8k_whitelist_fan_control[] __initconst = {
},
.driver_data = (void *)&i8k_fan_control_data[I8K_FAN_34A3_35A3],
},
+ {
+ .ident = "Dell XPS 13 7390",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "XPS 13 7390"),
+ },
+ .driver_data = (void *)&i8k_fan_control_data[I8K_FAN_34A3_35A3],
+ },
{ }
};

diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
index 1eb37106a220..5bac2b0fc7bb 100644
--- a/drivers/hwmon/drivetemp.c
+++ b/drivers/hwmon/drivetemp.c
@@ -621,3 +621,4 @@ module_exit(drivetemp_exit);
MODULE_AUTHOR("Guenter Roeck <linus@xxxxxxxxxxxx>");
MODULE_DESCRIPTION("Hard drive temperature monitor");
MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:drivetemp");
diff --git a/drivers/hwmon/sch56xx-common.c b/drivers/hwmon/sch56xx-common.c
index 3ece53adabd6..de3a0886c2f7 100644
--- a/drivers/hwmon/sch56xx-common.c
+++ b/drivers/hwmon/sch56xx-common.c
@@ -523,6 +523,28 @@ static int __init sch56xx_device_add(int address, const char *name)
return PTR_ERR_OR_ZERO(sch56xx_pdev);
}

+static const struct dmi_system_id sch56xx_dmi_override_table[] __initconst = {
+ {
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "CELSIUS W380"),
+ },
+ },
+ {
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "ESPRIMO P710"),
+ },
+ },
+ {
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "ESPRIMO E9900"),
+ },
+ },
+ { }
+};
+
/* For autoloading only */
static const struct dmi_system_id sch56xx_dmi_table[] __initconst = {
{
@@ -543,16 +565,18 @@ static int __init sch56xx_init(void)
if (!dmi_check_system(sch56xx_dmi_table))
return -ENODEV;

- /*
- * Some machines like the Esprimo P720 and Esprimo C700 have
- * onboard devices named " Antiope"/" Theseus" instead of
- * "Antiope"/"Theseus", so we need to check for both.
- */
- if (!dmi_find_device(DMI_DEV_TYPE_OTHER, "Antiope", NULL) &&
- !dmi_find_device(DMI_DEV_TYPE_OTHER, " Antiope", NULL) &&
- !dmi_find_device(DMI_DEV_TYPE_OTHER, "Theseus", NULL) &&
- !dmi_find_device(DMI_DEV_TYPE_OTHER, " Theseus", NULL))
- return -ENODEV;
+ if (!dmi_check_system(sch56xx_dmi_override_table)) {
+ /*
+ * Some machines like the Esprimo P720 and Esprimo C700 have
+ * onboard devices named " Antiope"/" Theseus" instead of
+ * "Antiope"/"Theseus", so we need to check for both.
+ */
+ if (!dmi_find_device(DMI_DEV_TYPE_OTHER, "Antiope", NULL) &&
+ !dmi_find_device(DMI_DEV_TYPE_OTHER, " Antiope", NULL) &&
+ !dmi_find_device(DMI_DEV_TYPE_OTHER, "Theseus", NULL) &&
+ !dmi_find_device(DMI_DEV_TYPE_OTHER, " Theseus", NULL))
+ return -ENODEV;
+ }
}

/*
diff --git a/drivers/hwmon/sht15.c b/drivers/hwmon/sht15.c
index 7f4a63959730..ae4d14257a11 100644
--- a/drivers/hwmon/sht15.c
+++ b/drivers/hwmon/sht15.c
@@ -1020,25 +1020,20 @@ static int sht15_probe(struct platform_device *pdev)
static int sht15_remove(struct platform_device *pdev)
{
struct sht15_data *data = platform_get_drvdata(pdev);
+ int ret;

- /*
- * Make sure any reads from the device are done and
- * prevent new ones beginning
- */
- mutex_lock(&data->read_lock);
- if (sht15_soft_reset(data)) {
- mutex_unlock(&data->read_lock);
- return -EFAULT;
- }
hwmon_device_unregister(data->hwmon_dev);
sysfs_remove_group(&pdev->dev.kobj, &sht15_attr_group);
+
+ ret = sht15_soft_reset(data);
+ if (ret)
+ dev_err(&pdev->dev, "Failed to reset device (%pe)\n", ERR_PTR(ret));
+
if (!IS_ERR(data->reg)) {
regulator_unregister_notifier(data->reg, &data->nb);
regulator_disable(data->reg);
}

- mutex_unlock(&data->read_lock);
-
return 0;
}

diff --git a/drivers/hwtracing/coresight/coresight-config.h b/drivers/hwtracing/coresight/coresight-config.h
index 2e1670523461..6ba013975741 100644
--- a/drivers/hwtracing/coresight/coresight-config.h
+++ b/drivers/hwtracing/coresight/coresight-config.h
@@ -134,6 +134,7 @@ struct cscfg_feature_desc {
* @active_cnt: ref count for activate on this configuration.
* @load_owner: handle to load owner for dynamic load and unload of configs.
* @fs_group: reference to configfs group for dynamic unload.
+ * @available: config can be activated - multi-stage load sets true on completion.
*/
struct cscfg_config_desc {
const char *name;
@@ -148,6 +149,7 @@ struct cscfg_config_desc {
atomic_t active_cnt;
void *load_owner;
struct config_group *fs_group;
+ bool available;
};

/**
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
index ee6ce92ab4c3..1edfec1e9d18 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -1424,6 +1424,7 @@ static int coresight_remove_match(struct device *dev, void *data)
* platform data.
*/
fwnode_handle_put(conn->child_fwnode);
+ conn->child_fwnode = NULL;
/* No need to continue */
break;
}
diff --git a/drivers/hwtracing/coresight/coresight-syscfg.c b/drivers/hwtracing/coresight/coresight-syscfg.c
index 11850fd8c3b5..11138a9762b0 100644
--- a/drivers/hwtracing/coresight/coresight-syscfg.c
+++ b/drivers/hwtracing/coresight/coresight-syscfg.c
@@ -414,6 +414,27 @@ static void cscfg_remove_owned_csdev_features(struct coresight_device *csdev, vo
}
}

+/*
+ * Unregister all configuration and features from configfs owned by load_owner.
+ * Although this is called without the list mutex being held, it is in the
+ * context of an unload operation which are strictly serialised,
+ * so the lists cannot change during this call.
+ */
+static void cscfg_fs_unregister_cfgs_feats(void *load_owner)
+{
+ struct cscfg_config_desc *config_desc;
+ struct cscfg_feature_desc *feat_desc;
+
+ list_for_each_entry(config_desc, &cscfg_mgr->config_desc_list, item) {
+ if (config_desc->load_owner == load_owner)
+ cscfg_configfs_del_config(config_desc);
+ }
+ list_for_each_entry(feat_desc, &cscfg_mgr->feat_desc_list, item) {
+ if (feat_desc->load_owner == load_owner)
+ cscfg_configfs_del_feature(feat_desc);
+ }
+}
+
/*
* removal is relatively easy - just remove from all lists, anything that
* matches the owner. Memory for the descriptors will be managed by the owner,
@@ -426,6 +447,8 @@ static void cscfg_unload_owned_cfgs_feats(void *load_owner)
struct cscfg_feature_desc *feat_desc, *feat_tmp;
struct cscfg_registered_csdev *csdev_item;

+ lockdep_assert_held(&cscfg_mutex);
+
/* remove from each csdev instance feature and config lists */
list_for_each_entry(csdev_item, &cscfg_mgr->csdev_desc_list, item) {
/*
@@ -439,7 +462,6 @@ static void cscfg_unload_owned_cfgs_feats(void *load_owner)
/* remove from the config descriptor lists */
list_for_each_entry_safe(config_desc, cfg_tmp, &cscfg_mgr->config_desc_list, item) {
if (config_desc->load_owner == load_owner) {
- cscfg_configfs_del_config(config_desc);
etm_perf_del_symlink_cscfg(config_desc);
list_del(&config_desc->item);
}
@@ -448,12 +470,90 @@ static void cscfg_unload_owned_cfgs_feats(void *load_owner)
/* remove from the feature descriptor lists */
list_for_each_entry_safe(feat_desc, feat_tmp, &cscfg_mgr->feat_desc_list, item) {
if (feat_desc->load_owner == load_owner) {
- cscfg_configfs_del_feature(feat_desc);
list_del(&feat_desc->item);
}
}
}

+/*
+ * load the features and configs to the lists - called with list mutex held
+ */
+static int cscfg_load_owned_cfgs_feats(struct cscfg_config_desc **config_descs,
+ struct cscfg_feature_desc **feat_descs,
+ struct cscfg_load_owner_info *owner_info)
+{
+ int i, err;
+
+ lockdep_assert_held(&cscfg_mutex);
+
+ /* load features first */
+ if (feat_descs) {
+ for (i = 0; feat_descs[i]; i++) {
+ err = cscfg_load_feat(feat_descs[i]);
+ if (err) {
+ pr_err("coresight-syscfg: Failed to load feature %s\n",
+ feat_descs[i]->name);
+ return err;
+ }
+ feat_descs[i]->load_owner = owner_info;
+ }
+ }
+
+ /* next any configurations to check feature dependencies */
+ if (config_descs) {
+ for (i = 0; config_descs[i]; i++) {
+ err = cscfg_load_config(config_descs[i]);
+ if (err) {
+ pr_err("coresight-syscfg: Failed to load configuration %s\n",
+ config_descs[i]->name);
+ return err;
+ }
+ config_descs[i]->load_owner = owner_info;
+ config_descs[i]->available = false;
+ }
+ }
+ return 0;
+}
+
+/* set configurations as available to activate at the end of the load process */
+static void cscfg_set_configs_available(struct cscfg_config_desc **config_descs)
+{
+ int i;
+
+ lockdep_assert_held(&cscfg_mutex);
+
+ if (config_descs) {
+ for (i = 0; config_descs[i]; i++)
+ config_descs[i]->available = true;
+ }
+}
+
+/*
+ * Create and register each of the configurations and features with configfs.
+ * Called without mutex being held.
+ */
+static int cscfg_fs_register_cfgs_feats(struct cscfg_config_desc **config_descs,
+ struct cscfg_feature_desc **feat_descs)
+{
+ int i, err;
+
+ if (feat_descs) {
+ for (i = 0; feat_descs[i]; i++) {
+ err = cscfg_configfs_add_feature(feat_descs[i]);
+ if (err)
+ return err;
+ }
+ }
+ if (config_descs) {
+ for (i = 0; config_descs[i]; i++) {
+ err = cscfg_configfs_add_config(config_descs[i]);
+ if (err)
+ return err;
+ }
+ }
+ return 0;
+}
+
/**
* cscfg_load_config_sets - API function to load feature and config sets.
*
@@ -476,57 +576,63 @@ int cscfg_load_config_sets(struct cscfg_config_desc **config_descs,
struct cscfg_feature_desc **feat_descs,
struct cscfg_load_owner_info *owner_info)
{
- int err = 0, i = 0;
+ int err = 0;

mutex_lock(&cscfg_mutex);
-
- /* load features first */
- if (feat_descs) {
- while (feat_descs[i]) {
- err = cscfg_load_feat(feat_descs[i]);
- if (!err)
- err = cscfg_configfs_add_feature(feat_descs[i]);
- if (err) {
- pr_err("coresight-syscfg: Failed to load feature %s\n",
- feat_descs[i]->name);
- cscfg_unload_owned_cfgs_feats(owner_info);
- goto exit_unlock;
- }
- feat_descs[i]->load_owner = owner_info;
- i++;
- }
+ if (cscfg_mgr->load_state != CSCFG_NONE) {
+ mutex_unlock(&cscfg_mutex);
+ return -EBUSY;
}
+ cscfg_mgr->load_state = CSCFG_LOAD;

- /* next any configurations to check feature dependencies */
- i = 0;
- if (config_descs) {
- while (config_descs[i]) {
- err = cscfg_load_config(config_descs[i]);
- if (!err)
- err = cscfg_configfs_add_config(config_descs[i]);
- if (err) {
- pr_err("coresight-syscfg: Failed to load configuration %s\n",
- config_descs[i]->name);
- cscfg_unload_owned_cfgs_feats(owner_info);
- goto exit_unlock;
- }
- config_descs[i]->load_owner = owner_info;
- i++;
- }
- }
+ /* first load and add to the lists */
+ err = cscfg_load_owned_cfgs_feats(config_descs, feat_descs, owner_info);
+ if (err)
+ goto err_clean_load;

/* add the load owner to the load order list */
list_add_tail(&owner_info->item, &cscfg_mgr->load_order_list);
if (!list_is_singular(&cscfg_mgr->load_order_list)) {
/* lock previous item in load order list */
err = cscfg_owner_get(list_prev_entry(owner_info, item));
- if (err) {
- cscfg_unload_owned_cfgs_feats(owner_info);
- list_del(&owner_info->item);
- }
+ if (err)
+ goto err_clean_owner_list;
}

+ /*
+ * make visible to configfs - configfs manipulation must occur outside
+ * the list mutex lock to avoid circular lockdep issues with configfs
+ * built in mutexes and semaphores. This is safe as it is not possible
+ * to start a new load/unload operation till the current one is done.
+ */
+ mutex_unlock(&cscfg_mutex);
+
+ /* create the configfs elements */
+ err = cscfg_fs_register_cfgs_feats(config_descs, feat_descs);
+ mutex_lock(&cscfg_mutex);
+
+ if (err)
+ goto err_clean_cfs;
+
+ /* mark any new configs as available for activation */
+ cscfg_set_configs_available(config_descs);
+ goto exit_unlock;
+
+err_clean_cfs:
+ /* cleanup after error registering with configfs */
+ cscfg_fs_unregister_cfgs_feats(owner_info);
+
+ if (!list_is_singular(&cscfg_mgr->load_order_list))
+ cscfg_owner_put(list_prev_entry(owner_info, item));
+
+err_clean_owner_list:
+ list_del(&owner_info->item);
+
+err_clean_load:
+ cscfg_unload_owned_cfgs_feats(owner_info);
+
exit_unlock:
+ cscfg_mgr->load_state = CSCFG_NONE;
mutex_unlock(&cscfg_mutex);
return err;
}
@@ -543,6 +649,9 @@ EXPORT_SYMBOL_GPL(cscfg_load_config_sets);
* 1) no configurations are active.
* 2) the set being unloaded was the last to be loaded to maintain dependencies.
*
+ * Once the unload operation commences, we disallow any configuration being
+ * made active until it is complete.
+ *
* @owner_info: Information on owner for set being unloaded.
*/
int cscfg_unload_config_sets(struct cscfg_load_owner_info *owner_info)
@@ -551,6 +660,13 @@ int cscfg_unload_config_sets(struct cscfg_load_owner_info *owner_info)
struct cscfg_load_owner_info *load_list_item = NULL;

mutex_lock(&cscfg_mutex);
+ if (cscfg_mgr->load_state != CSCFG_NONE) {
+ mutex_unlock(&cscfg_mutex);
+ return -EBUSY;
+ }
+
+ /* unload op in progress also prevents activation of any config */
+ cscfg_mgr->load_state = CSCFG_UNLOAD;

/* cannot unload if anything is active */
if (atomic_read(&cscfg_mgr->sys_active_cnt)) {
@@ -571,7 +687,12 @@ int cscfg_unload_config_sets(struct cscfg_load_owner_info *owner_info)
goto exit_unlock;
}

- /* unload all belonging to load_owner */
+ /* remove from configfs - again outside the scope of the list mutex */
+ mutex_unlock(&cscfg_mutex);
+ cscfg_fs_unregister_cfgs_feats(owner_info);
+ mutex_lock(&cscfg_mutex);
+
+ /* unload everything from lists belonging to load_owner */
cscfg_unload_owned_cfgs_feats(owner_info);

/* remove from load order list */
@@ -582,6 +703,7 @@ int cscfg_unload_config_sets(struct cscfg_load_owner_info *owner_info)
list_del(&owner_info->item);

exit_unlock:
+ cscfg_mgr->load_state = CSCFG_NONE;
mutex_unlock(&cscfg_mutex);
return err;
}
@@ -759,8 +881,15 @@ static int _cscfg_activate_config(unsigned long cfg_hash)
struct cscfg_config_desc *config_desc;
int err = -EINVAL;

+ if (cscfg_mgr->load_state == CSCFG_UNLOAD)
+ return -EBUSY;
+
list_for_each_entry(config_desc, &cscfg_mgr->config_desc_list, item) {
if ((unsigned long)config_desc->event_ea->var == cfg_hash) {
+ /* if we happen upon a partly loaded config, can't use it */
+ if (config_desc->available == false)
+ return -EBUSY;
+
/* must ensure that config cannot be unloaded in use */
err = cscfg_owner_get(config_desc->load_owner);
if (err)
@@ -1022,8 +1151,10 @@ struct device *cscfg_device(void)
/* Must have a release function or the kernel will complain on module unload */
static void cscfg_dev_release(struct device *dev)
{
+ mutex_lock(&cscfg_mutex);
kfree(cscfg_mgr);
cscfg_mgr = NULL;
+ mutex_unlock(&cscfg_mutex);
}

/* a device is needed to "own" some kernel elements such as sysfs entries. */
@@ -1042,6 +1173,14 @@ static int cscfg_create_device(void)
if (!cscfg_mgr)
goto create_dev_exit_unlock;

+ /* initialise the cscfg_mgr structure */
+ INIT_LIST_HEAD(&cscfg_mgr->csdev_desc_list);
+ INIT_LIST_HEAD(&cscfg_mgr->feat_desc_list);
+ INIT_LIST_HEAD(&cscfg_mgr->config_desc_list);
+ INIT_LIST_HEAD(&cscfg_mgr->load_order_list);
+ atomic_set(&cscfg_mgr->sys_active_cnt, 0);
+ cscfg_mgr->load_state = CSCFG_NONE;
+
/* setup the device */
dev = cscfg_device();
dev->release = cscfg_dev_release;
@@ -1056,17 +1195,73 @@ static int cscfg_create_device(void)
return err;
}

-static void cscfg_clear_device(void)
+/*
+ * Loading and unloading is generally at the user's discretion.
+ * If exiting due to coresight module unload, we need to unload any configurations that remain,
+ * before we unregister the configfs infrastructure.
+ *
+ * Do this by walking the load_owner list and taking appropriate action, depending on the load
+ * owner type.
+ */
+static void cscfg_unload_cfgs_on_exit(void)
{
- struct cscfg_config_desc *cfg_desc;
+ struct cscfg_load_owner_info *owner_info = NULL;

+ /*
+ * grab the mutex - even though we are exiting, some configfs files
+ * may still be live till we dump them, so ensure list data is
+ * protected from a race condition.
+ */
mutex_lock(&cscfg_mutex);
- list_for_each_entry(cfg_desc, &cscfg_mgr->config_desc_list, item) {
- etm_perf_del_symlink_cscfg(cfg_desc);
+ while (!list_empty(&cscfg_mgr->load_order_list)) {
+
+ /* remove in reverse order of loading */
+ owner_info = list_last_entry(&cscfg_mgr->load_order_list,
+ struct cscfg_load_owner_info, item);
+
+ /* action according to type */
+ switch (owner_info->type) {
+ case CSCFG_OWNER_PRELOAD:
+ /*
+ * preloaded descriptors are statically allocated in
+ * this module - just need to unload dynamic items from
+ * csdev lists, and remove from configfs directories.
+ */
+ pr_info("cscfg: unloading preloaded configurations\n");
+ break;
+
+ case CSCFG_OWNER_MODULE:
+ /*
+ * this is an error - the loadable module must have been unloaded prior
+ * to the coresight module unload. Therefore that module has not
+ * correctly unloaded configs in its own exit code.
+ * Nothing to do other than emit an error string as the static descriptor
+ * references we need to unload will have disappeared with the module.
+ */
+ pr_err("cscfg: ERROR: prior module failed to unload configuration\n");
+ goto list_remove;
+ }
+
+ /* remove from configfs - outside the scope of the list mutex */
+ mutex_unlock(&cscfg_mutex);
+ cscfg_fs_unregister_cfgs_feats(owner_info);
+ mutex_lock(&cscfg_mutex);
+
+ /* Next unload from csdev lists. */
+ cscfg_unload_owned_cfgs_feats(owner_info);
+
+list_remove:
+ /* remove from load order list */
+ list_del(&owner_info->item);
}
+ mutex_unlock(&cscfg_mutex);
+}
+
+static void cscfg_clear_device(void)
+{
+ cscfg_unload_cfgs_on_exit();
cscfg_configfs_release(cscfg_mgr);
device_unregister(cscfg_device());
- mutex_unlock(&cscfg_mutex);
}

/* Initialise system config management API device */
@@ -1074,20 +1269,16 @@ int __init cscfg_init(void)
{
int err = 0;

+ /* create the device and init cscfg_mgr */
err = cscfg_create_device();
if (err)
return err;

+ /* initialise configfs subsystem */
err = cscfg_configfs_init(cscfg_mgr);
if (err)
goto exit_err;

- INIT_LIST_HEAD(&cscfg_mgr->csdev_desc_list);
- INIT_LIST_HEAD(&cscfg_mgr->feat_desc_list);
- INIT_LIST_HEAD(&cscfg_mgr->config_desc_list);
- INIT_LIST_HEAD(&cscfg_mgr->load_order_list);
- atomic_set(&cscfg_mgr->sys_active_cnt, 0);
-
/* preload built-in configurations */
err = cscfg_preload(THIS_MODULE);
if (err)
diff --git a/drivers/hwtracing/coresight/coresight-syscfg.h b/drivers/hwtracing/coresight/coresight-syscfg.h
index 9106ffab4833..66e2db890d82 100644
--- a/drivers/hwtracing/coresight/coresight-syscfg.h
+++ b/drivers/hwtracing/coresight/coresight-syscfg.h
@@ -12,6 +12,17 @@

#include "coresight-config.h"

+/*
+ * Load operation types.
+ * When loading or unloading, another load operation cannot be run.
+ * When unloading configurations cannot be activated.
+ */
+enum cscfg_load_ops {
+ CSCFG_NONE,
+ CSCFG_LOAD,
+ CSCFG_UNLOAD
+};
+
/**
* System configuration manager device.
*
@@ -30,6 +41,7 @@
* @cfgfs_subsys: configfs subsystem used to manage configurations.
* @sysfs_active_config:Active config hash used if CoreSight controlled from sysfs.
* @sysfs_active_preset:Active preset index used if CoreSight controlled from sysfs.
+ * @load_state: A multi-stage load/unload operation is in progress.
*/
struct cscfg_manager {
struct device dev;
@@ -41,6 +53,7 @@ struct cscfg_manager {
struct configfs_subsystem cfgfs_subsys;
u32 sysfs_active_config;
int sysfs_active_preset;
+ enum cscfg_load_ops load_state;
};

/* get reference to dev in cscfg_manager */
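
The syscfg rework above serialises load and unload with a tri-state
load_state flag held under cscfg_mutex, and performs the configfs
registration/unregistration with that mutex dropped to avoid lock-ordering
problems against configfs' own locks. A minimal sketch of the same guard
pattern, assuming kernel context (<linux/mutex.h>, <linux/errno.h>); the
demo_* names are illustrative only, not part of the driver:

enum demo_load_ops { DEMO_NONE, DEMO_LOAD, DEMO_UNLOAD };

static DEFINE_MUTEX(demo_mutex);
static enum demo_load_ops demo_state = DEMO_NONE;

static int demo_load(void)
{
	int err;

	mutex_lock(&demo_mutex);
	if (demo_state != DEMO_NONE) {
		/* another load/unload already in flight */
		mutex_unlock(&demo_mutex);
		return -EBUSY;
	}
	demo_state = DEMO_LOAD;
	/* ... manipulate descriptor lists while the mutex is held ... */
	mutex_unlock(&demo_mutex);

	/* configfs registration would run here, without the list mutex;
	 * safe because the state flag blocks any concurrent operation */
	err = 0;

	mutex_lock(&demo_mutex);
	demo_state = DEMO_NONE;
	mutex_unlock(&demo_mutex);
	return err;
}
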
diff --git a/drivers/hwtracing/intel_th/msu-sink.c b/drivers/hwtracing/intel_th/msu-sink.c
index 2c7f5116be12..891b28ea25fe 100644
--- a/drivers/hwtracing/intel_th/msu-sink.c
+++ b/drivers/hwtracing/intel_th/msu-sink.c
@@ -71,6 +71,9 @@ static int msu_sink_alloc_window(void *data, struct sg_table **sgt, size_t size)
block = dma_alloc_coherent(priv->dev->parent->parent,
PAGE_SIZE, &sg_dma_address(sg_ptr),
GFP_KERNEL);
+ if (!block)
+ return -ENOMEM;
+
sg_set_buf(sg_ptr, block, PAGE_SIZE);
}

diff --git a/drivers/hwtracing/intel_th/msu.c b/drivers/hwtracing/intel_th/msu.c
index 70a07b4e9967..6c8215a47a60 100644
--- a/drivers/hwtracing/intel_th/msu.c
+++ b/drivers/hwtracing/intel_th/msu.c
@@ -1067,6 +1067,16 @@ msc_buffer_set_uc(struct msc *msc) {}
static inline void msc_buffer_set_wb(struct msc *msc) {}
#endif /* CONFIG_X86 */

+static struct page *msc_sg_page(struct scatterlist *sg)
+{
+ void *addr = sg_virt(sg);
+
+ if (is_vmalloc_addr(addr))
+ return vmalloc_to_page(addr);
+
+ return sg_page(sg);
+}
+
/**
* msc_buffer_win_alloc() - alloc a window for a multiblock mode
* @msc: MSC device
@@ -1137,7 +1147,7 @@ static void __msc_buffer_win_free(struct msc *msc, struct msc_window *win)
int i;

for_each_sg(win->sgt->sgl, sg, win->nr_segs, i) {
- struct page *page = sg_page(sg);
+ struct page *page = msc_sg_page(sg);

page->mapping = NULL;
dma_free_coherent(msc_dev(win->msc)->parent->parent, PAGE_SIZE,
@@ -1401,7 +1411,7 @@ static struct page *msc_buffer_get_page(struct msc *msc, unsigned long pgoff)
pgoff -= win->pgoff;

for_each_sg(win->sgt->sgl, sg, win->nr_segs, blk) {
- struct page *page = sg_page(sg);
+ struct page *page = msc_sg_page(sg);
size_t pgsz = PFN_DOWN(sg->length);

if (pgoff < pgsz)
diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 7da4f298ed01..147d338c191e 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -100,8 +100,10 @@ static int intel_th_pci_probe(struct pci_dev *pdev,
}

th = intel_th_alloc(&pdev->dev, drvdata, resource, r);
- if (IS_ERR(th))
- return PTR_ERR(th);
+ if (IS_ERR(th)) {
+ err = PTR_ERR(th);
+ goto err_free_irq;
+ }

th->activate = intel_th_pci_activate;
th->deactivate = intel_th_pci_deactivate;
@@ -109,6 +111,10 @@ static int intel_th_pci_probe(struct pci_dev *pdev,
pci_set_master(pdev);

return 0;
+
+err_free_irq:
+ pci_free_irq_vectors(pdev);
+ return err;
}

static void intel_th_pci_remove(struct pci_dev *pdev)
@@ -278,6 +284,21 @@ static const struct pci_device_id intel_th_pci_id_table[] = {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x54a6),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+ {
+ /* Meteor Lake-P */
+ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7e24),
+ .driver_data = (kernel_ulong_t)&intel_th_2x,
+ },
+ {
+ /* Raptor Lake-S */
+ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7a26),
+ .driver_data = (kernel_ulong_t)&intel_th_2x,
+ },
+ {
+ /* Raptor Lake-S CPU */
+ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa76f),
+ .driver_data = (kernel_ulong_t)&intel_th_2x,
+ },
{
/* Alder Lake CPU */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f),
diff --git a/drivers/i2c/busses/i2c-cadence.c b/drivers/i2c/busses/i2c-cadence.c
index 630cfa4ddd46..33f5588a50c0 100644
--- a/drivers/i2c/busses/i2c-cadence.c
+++ b/drivers/i2c/busses/i2c-cadence.c
@@ -573,8 +573,13 @@ static void cdns_i2c_mrecv(struct cdns_i2c *id)
ctrl_reg = cdns_i2c_readreg(CDNS_I2C_CR_OFFSET);
ctrl_reg |= CDNS_I2C_CR_RW | CDNS_I2C_CR_CLR_FIFO;

+ /*
+ * Receive up to I2C_SMBUS_BLOCK_MAX data bytes, plus one message length
+ * byte, plus one checksum byte if PEC is enabled. p_msg->len will be 2 if
+ * PEC is enabled, otherwise 1.
+ */
if (id->p_msg->flags & I2C_M_RECV_LEN)
- id->recv_count = I2C_SMBUS_BLOCK_MAX + 1;
+ id->recv_count = I2C_SMBUS_BLOCK_MAX + id->p_msg->len;

id->curr_recv_count = id->recv_count;

@@ -789,6 +794,9 @@ static int cdns_i2c_process_msg(struct cdns_i2c *id, struct i2c_msg *msg,
if (id->err_status & CDNS_I2C_IXR_ARB_LOST)
return -EAGAIN;

+ if (msg->flags & I2C_M_RECV_LEN)
+ msg->len += min_t(unsigned int, msg->buf[0], I2C_SMBUS_BLOCK_MAX);
+
return 0;
}
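
For context on the two hunks above: an SMBus block read returns a count byte
first, followed by that many data bytes and, with PEC, one extra checksum
byte, which is why recv_count starts at I2C_SMBUS_BLOCK_MAX plus p_msg->len
and the final message length is fixed up from buf[0]. A hedged, stand-alone
sketch of that length accounting (plain C; block_read_total is an
illustrative helper, not driver code):

#include <stdint.h>

#define I2C_SMBUS_BLOCK_MAX 32

/* msg_len on entry: 1 (count byte only) or 2 (count byte + PEC byte).
 * buf[0] is the byte count reported by the slave. Returns the total
 * number of bytes the transfer ends up carrying. */
static unsigned int block_read_total(const uint8_t *buf, unsigned int msg_len)
{
	unsigned int count = buf[0];

	if (count > I2C_SMBUS_BLOCK_MAX)	/* clamp bogus slave data */
		count = I2C_SMBUS_BLOCK_MAX;

	return msg_len + count;
}
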

diff --git a/drivers/i2c/busses/i2c-mxs.c b/drivers/i2c/busses/i2c-mxs.c
index 864a3f1bd4e1..68f67d084c63 100644
--- a/drivers/i2c/busses/i2c-mxs.c
+++ b/drivers/i2c/busses/i2c-mxs.c
@@ -799,7 +799,7 @@ static int mxs_i2c_probe(struct platform_device *pdev)
if (!i2c)
return -ENOMEM;

- i2c->dev_type = (enum mxs_i2c_devtype)of_device_get_match_data(&pdev->dev);
+ i2c->dev_type = (uintptr_t)of_device_get_match_data(&pdev->dev);

i2c->regs = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(i2c->regs))
diff --git a/drivers/i2c/busses/i2c-npcm7xx.c b/drivers/i2c/busses/i2c-npcm7xx.c
index 743ac20a405c..9f0462fb2504 100644
--- a/drivers/i2c/busses/i2c-npcm7xx.c
+++ b/drivers/i2c/busses/i2c-npcm7xx.c
@@ -123,11 +123,11 @@ enum i2c_addr {
* Since the addr regs are sprinkled all over the address space,
* use this array to get the address or each register.
*/
-#define I2C_NUM_OWN_ADDR 10
+#define I2C_NUM_OWN_ADDR 2
+#define I2C_NUM_OWN_ADDR_SUPPORTED 2
+
static const int npcm_i2caddr[I2C_NUM_OWN_ADDR] = {
- NPCM_I2CADDR1, NPCM_I2CADDR2, NPCM_I2CADDR3, NPCM_I2CADDR4,
- NPCM_I2CADDR5, NPCM_I2CADDR6, NPCM_I2CADDR7, NPCM_I2CADDR8,
- NPCM_I2CADDR9, NPCM_I2CADDR10,
+ NPCM_I2CADDR1, NPCM_I2CADDR2,
};
#endif

@@ -391,14 +391,10 @@ static void npcm_i2c_disable(struct npcm_i2c *bus)
#if IS_ENABLED(CONFIG_I2C_SLAVE)
int i;

- /* select bank 0 for I2C addresses */
- npcm_i2c_select_bank(bus, I2C_BANK_0);
-
/* Slave addresses removal */
- for (i = I2C_SLAVE_ADDR1; i < I2C_NUM_OWN_ADDR; i++)
+ for (i = I2C_SLAVE_ADDR1; i < I2C_NUM_OWN_ADDR_SUPPORTED; i++)
iowrite8(0, bus->reg + npcm_i2caddr[i]);

- npcm_i2c_select_bank(bus, I2C_BANK_1);
#endif
/* Disable module */
i2cctl2 = ioread8(bus->reg + NPCM_I2CCTL2);
@@ -603,8 +599,7 @@ static int npcm_i2c_slave_enable(struct npcm_i2c *bus, enum i2c_addr addr_type,
i2cctl1 &= ~NPCM_I2CCTL1_GCMEN;
iowrite8(i2cctl1, bus->reg + NPCM_I2CCTL1);
return 0;
- }
- if (addr_type == I2C_ARP_ADDR) {
+ } else if (addr_type == I2C_ARP_ADDR) {
i2cctl3 = ioread8(bus->reg + NPCM_I2CCTL3);
if (enable)
i2cctl3 |= I2CCTL3_ARPMEN;
@@ -613,16 +608,16 @@ static int npcm_i2c_slave_enable(struct npcm_i2c *bus, enum i2c_addr addr_type,
iowrite8(i2cctl3, bus->reg + NPCM_I2CCTL3);
return 0;
}
+ if (addr_type > I2C_SLAVE_ADDR2 && addr_type <= I2C_SLAVE_ADDR10)
+ dev_err(bus->dev, "try to enable more than 2 SA not supported\n");
+
if (addr_type >= I2C_ARP_ADDR)
return -EFAULT;
- /* select bank 0 for address 3 to 10 */
- if (addr_type > I2C_SLAVE_ADDR2)
- npcm_i2c_select_bank(bus, I2C_BANK_0);
+
/* Set and enable the address */
iowrite8(sa_reg, bus->reg + npcm_i2caddr[addr_type]);
npcm_i2c_slave_int_enable(bus, enable);
- if (addr_type > I2C_SLAVE_ADDR2)
- npcm_i2c_select_bank(bus, I2C_BANK_1);
+
return 0;
}
#endif
@@ -843,15 +838,11 @@ static u8 npcm_i2c_get_slave_addr(struct npcm_i2c *bus, enum i2c_addr addr_type)
{
u8 slave_add;

- /* select bank 0 for address 3 to 10 */
- if (addr_type > I2C_SLAVE_ADDR2)
- npcm_i2c_select_bank(bus, I2C_BANK_0);
+ if (addr_type > I2C_SLAVE_ADDR2 && addr_type <= I2C_SLAVE_ADDR10)
+ dev_err(bus->dev, "get slave: try to use more than 2 SA not supported\n");

slave_add = ioread8(bus->reg + npcm_i2caddr[(int)addr_type]);

- if (addr_type > I2C_SLAVE_ADDR2)
- npcm_i2c_select_bank(bus, I2C_BANK_1);
-
return slave_add;
}

@@ -861,12 +852,12 @@ static int npcm_i2c_remove_slave_addr(struct npcm_i2c *bus, u8 slave_add)

/* Set the enable bit */
slave_add |= 0x80;
- npcm_i2c_select_bank(bus, I2C_BANK_0);
- for (i = I2C_SLAVE_ADDR1; i < I2C_NUM_OWN_ADDR; i++) {
+
+ for (i = I2C_SLAVE_ADDR1; i < I2C_NUM_OWN_ADDR_SUPPORTED; i++) {
if (ioread8(bus->reg + npcm_i2caddr[i]) == slave_add)
iowrite8(0, bus->reg + npcm_i2caddr[i]);
}
- npcm_i2c_select_bank(bus, I2C_BANK_1);
+
return 0;
}

@@ -921,11 +912,15 @@ static int npcm_i2c_slave_get_wr_buf(struct npcm_i2c *bus)
for (i = 0; i < I2C_HW_FIFO_SIZE; i++) {
if (bus->slv_wr_size >= I2C_HW_FIFO_SIZE)
break;
- i2c_slave_event(bus->slave, I2C_SLAVE_READ_REQUESTED, &value);
+ if (bus->state == I2C_SLAVE_MATCH) {
+ i2c_slave_event(bus->slave, I2C_SLAVE_READ_REQUESTED, &value);
+ bus->state = I2C_OPER_STARTED;
+ } else {
+ i2c_slave_event(bus->slave, I2C_SLAVE_READ_PROCESSED, &value);
+ }
ind = (bus->slv_wr_ind + bus->slv_wr_size) % I2C_HW_FIFO_SIZE;
bus->slv_wr_buf[ind] = value;
bus->slv_wr_size++;
- i2c_slave_event(bus->slave, I2C_SLAVE_READ_PROCESSED, &value);
}
return I2C_HW_FIFO_SIZE - ret;
}
@@ -973,7 +968,6 @@ static void npcm_i2c_slave_xmit(struct npcm_i2c *bus, u16 nwrite,
if (nwrite == 0)
return;

- bus->state = I2C_OPER_STARTED;
bus->operation = I2C_WRITE_OPER;

/* get the next buffer */
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 5b920f0fc7dd..a4eef8de054c 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -688,7 +688,7 @@ static int geni_i2c_xfer(struct i2c_adapter *adap,
pm_runtime_put_autosuspend(gi2c->se.dev);
gi2c->cur = NULL;
gi2c->err = 0;
- return num;
+ return ret;
}

static u32 geni_i2c_func(struct i2c_adapter *adap)
diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
index d43db2c3876e..19a317fdcf5b 100644
--- a/drivers/i2c/i2c-core-base.c
+++ b/drivers/i2c/i2c-core-base.c
@@ -2467,8 +2467,9 @@ void i2c_put_adapter(struct i2c_adapter *adap)
if (!adap)
return;

- put_device(&adap->dev);
module_put(adap->owner);
+ /* Should be last, otherwise we risk use-after-free with 'adap' */
+ put_device(&adap->dev);
}
EXPORT_SYMBOL(i2c_put_adapter);

diff --git a/drivers/i2c/muxes/i2c-mux-gpmux.c b/drivers/i2c/muxes/i2c-mux-gpmux.c
index d3acd8d66c32..33024acaac02 100644
--- a/drivers/i2c/muxes/i2c-mux-gpmux.c
+++ b/drivers/i2c/muxes/i2c-mux-gpmux.c
@@ -134,6 +134,7 @@ static int i2c_mux_probe(struct platform_device *pdev)
return 0;

err_children:
+ of_node_put(child);
i2c_mux_del_adapters(muxc);
err_parent:
i2c_put_adapter(parent);
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 47b68c6071be..9515a3146dc9 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -812,15 +812,105 @@ static struct cpuidle_state icx_cstates[] __initdata = {
};

/*
- * On Sapphire Rapids Xeon C1 has to be disabled if C1E is enabled, and vice
- * versa. On SPR C1E is enabled only if "C1E promotion" bit is set in
- * MSR_IA32_POWER_CTL. But in this case there effectively no C1, because C1
- * requests are promoted to C1E. If the "C1E promotion" bit is cleared, then
- * both C1 and C1E requests end up with C1, so there is effectively no C1E.
+ * On AlderLake C1 has to be disabled if C1E is enabled, and vice versa.
+ * C1E is enabled only if "C1E promotion" bit is set in MSR_IA32_POWER_CTL.
+ * But in this case there is effectively no C1, because C1 requests are
+ * promoted to C1E. If the "C1E promotion" bit is cleared, then both C1
+ * and C1E requests end up with C1, so there is effectively no C1E.
*
- * By default we enable C1 and disable C1E by marking it with
+ * By default we enable C1E and disable C1 by marking it with
* 'CPUIDLE_FLAG_UNUSABLE'.
*/
+static struct cpuidle_state adl_cstates[] __initdata = {
+ {
+ .name = "C1",
+ .desc = "MWAIT 0x00",
+ .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_UNUSABLE,
+ .exit_latency = 1,
+ .target_residency = 1,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .name = "C1E",
+ .desc = "MWAIT 0x01",
+ .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE,
+ .exit_latency = 2,
+ .target_residency = 4,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .name = "C6",
+ .desc = "MWAIT 0x20",
+ .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 220,
+ .target_residency = 600,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .name = "C8",
+ .desc = "MWAIT 0x40",
+ .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 280,
+ .target_residency = 800,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .name = "C10",
+ .desc = "MWAIT 0x60",
+ .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 680,
+ .target_residency = 2000,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .enter = NULL }
+};
+
+static struct cpuidle_state adl_l_cstates[] __initdata = {
+ {
+ .name = "C1",
+ .desc = "MWAIT 0x00",
+ .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_UNUSABLE,
+ .exit_latency = 1,
+ .target_residency = 1,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .name = "C1E",
+ .desc = "MWAIT 0x01",
+ .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE,
+ .exit_latency = 2,
+ .target_residency = 4,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .name = "C6",
+ .desc = "MWAIT 0x20",
+ .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 170,
+ .target_residency = 500,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .name = "C8",
+ .desc = "MWAIT 0x40",
+ .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 200,
+ .target_residency = 600,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .name = "C10",
+ .desc = "MWAIT 0x60",
+ .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 230,
+ .target_residency = 700,
+ .enter = &intel_idle,
+ .enter_s2idle = intel_idle_s2idle, },
+ {
+ .enter = NULL }
+};
+
static struct cpuidle_state spr_cstates[] __initdata = {
{
.name = "C1",
@@ -833,8 +923,7 @@ static struct cpuidle_state spr_cstates[] __initdata = {
{
.name = "C1E",
.desc = "MWAIT 0x01",
- .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE |
- CPUIDLE_FLAG_UNUSABLE,
+ .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE,
.exit_latency = 2,
.target_residency = 4,
.enter = &intel_idle,
@@ -1194,6 +1283,14 @@ static const struct idle_cpu idle_cpu_icx __initconst = {
.use_acpi = true,
};

+static const struct idle_cpu idle_cpu_adl __initconst = {
+ .state_table = adl_cstates,
+};
+
+static const struct idle_cpu idle_cpu_adl_l __initconst = {
+ .state_table = adl_l_cstates,
+};
+
static const struct idle_cpu idle_cpu_spr __initconst = {
.state_table = spr_cstates,
.disable_promotion_to_c1e = true,
@@ -1262,6 +1359,8 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &idle_cpu_skx),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &idle_cpu_icx),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &idle_cpu_icx),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &idle_cpu_adl),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &idle_cpu_adl_l),
X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &idle_cpu_spr),
X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl),
X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl),
@@ -1620,6 +1719,25 @@ static void __init skx_idle_state_table_update(void)
}
}

+/**
+ * adl_idle_state_table_update - Adjust AlderLake idle states table.
+ */
+static void __init adl_idle_state_table_update(void)
+{
+ /* Check if user prefers C1 over C1E. */
+ if (preferred_states_mask & BIT(1) && !(preferred_states_mask & BIT(2))) {
+ cpuidle_state_table[0].flags &= ~CPUIDLE_FLAG_UNUSABLE;
+ cpuidle_state_table[1].flags |= CPUIDLE_FLAG_UNUSABLE;
+
+ /* Disable C1E by clearing the "C1E promotion" bit. */
+ c1e_promotion = C1E_PROMOTION_DISABLE;
+ return;
+ }
+
+ /* Make sure C1E is enabled by default */
+ c1e_promotion = C1E_PROMOTION_ENABLE;
+}
+
/**
* spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
*/
@@ -1627,17 +1745,6 @@ static void __init spr_idle_state_table_update(void)
{
unsigned long long msr;

- /* Check if user prefers C1E over C1. */
- if ((preferred_states_mask & BIT(2)) &&
- !(preferred_states_mask & BIT(1))) {
- /* Disable C1 and enable C1E. */
- spr_cstates[0].flags |= CPUIDLE_FLAG_UNUSABLE;
- spr_cstates[1].flags &= ~CPUIDLE_FLAG_UNUSABLE;
-
- /* Enable C1E using the "C1E promotion" bit. */
- c1e_promotion = C1E_PROMOTION_ENABLE;
- }
-
/*
* By default, the C6 state assumes the worst-case scenario of package
* C6. However, if PC6 is disabled, we update the numbers to match
@@ -1689,6 +1796,10 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
case INTEL_FAM6_SAPPHIRERAPIDS_X:
spr_idle_state_table_update();
break;
+ case INTEL_FAM6_ALDERLAKE:
+ case INTEL_FAM6_ALDERLAKE_L:
+ adl_idle_state_table_update();
+ break;
}

for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
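
The AlderLake table update above follows the existing preferred_states_mask
convention in this driver, where bit 1 stands for C1 and bit 2 for C1E. A
small sketch of the same test (plain C; prefer_c1_over_c1e is an
illustrative name, not the driver's):

#include <stdbool.h>

#define BIT(n)	(1U << (n))

/* True when the user asked for C1 but not C1E - the case in which the
 * update above clears CPUIDLE_FLAG_UNUSABLE on C1, marks C1E unusable
 * and disables C1E promotion. */
static bool prefer_c1_over_c1e(unsigned int preferred_states_mask)
{
	return (preferred_states_mask & BIT(1)) &&
	       !(preferred_states_mask & BIT(2));
}
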
diff --git a/drivers/iio/accel/adxl313_core.c b/drivers/iio/accel/adxl313_core.c
index 9e4193e64765..afeef779e1d0 100644
--- a/drivers/iio/accel/adxl313_core.c
+++ b/drivers/iio/accel/adxl313_core.c
@@ -46,7 +46,7 @@ EXPORT_SYMBOL_NS_GPL(adxl313_writable_regs_table, IIO_ADXL313);
struct adxl313_data {
struct regmap *regmap;
struct mutex lock; /* lock to protect transf_buf */
- __le16 transf_buf ____cacheline_aligned;
+ __le16 transf_buf __aligned(IIO_DMA_MINALIGN);
};

static const int adxl313_odr_freqs[][2] = {
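
The adxl313 change above is the first of a series of identical conversions
in the IIO hunks that follow: the first buffer that may be handed to SPI/DMA
is aligned to IIO_DMA_MINALIGN so it cannot share a cache line with the
CPU-only fields before it. A sketch of the layout pattern, assuming kernel
context (<linux/types.h>, <linux/mutex.h>, <linux/iio/iio.h> for
IIO_DMA_MINALIGN); demo_sensor_state is an illustrative struct, not taken
from any of these drivers:

struct demo_sensor_state {
	struct mutex lock;	/* CPU-only bookkeeping, may be written anytime */
	int odr;

	/*
	 * Everything from here on may be used for DMA, so it starts on
	 * its own DMA-safe boundary; later fields inherit that region
	 * and need no extra attribute.
	 */
	u8 tx_buf[2] __aligned(IIO_DMA_MINALIGN);
	u8 rx_buf[4];
};
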
diff --git a/drivers/iio/accel/adxl355_core.c b/drivers/iio/accel/adxl355_core.c
index e9c10c8c32f0..2dfe780d6144 100644
--- a/drivers/iio/accel/adxl355_core.c
+++ b/drivers/iio/accel/adxl355_core.c
@@ -177,7 +177,7 @@ struct adxl355_data {
u8 buf[14];
s64 ts;
} buffer;
- } ____cacheline_aligned;
+ } __aligned(IIO_DMA_MINALIGN);
};

static int adxl355_set_op_mode(struct adxl355_data *data,
diff --git a/drivers/iio/accel/adxl367.c b/drivers/iio/accel/adxl367.c
index 62960134ea19..d680bec05efc 100644
--- a/drivers/iio/accel/adxl367.c
+++ b/drivers/iio/accel/adxl367.c
@@ -179,7 +179,7 @@ struct adxl367_state {
unsigned int fifo_set_size;
unsigned int fifo_watermark;

- __be16 fifo_buf[ADXL367_FIFO_SIZE] ____cacheline_aligned;
+ __be16 fifo_buf[ADXL367_FIFO_SIZE] __aligned(IIO_DMA_MINALIGN);
__be16 sample_buf;
u8 act_threshold_buf[2];
u8 inact_time_buf[2];
diff --git a/drivers/iio/accel/adxl367_spi.c b/drivers/iio/accel/adxl367_spi.c
index 26dfc821ebbe..118c894015a5 100644
--- a/drivers/iio/accel/adxl367_spi.c
+++ b/drivers/iio/accel/adxl367_spi.c
@@ -9,6 +9,8 @@
#include <linux/regmap.h>
#include <linux/spi/spi.h>

+#include <linux/iio/iio.h>
+
#include "adxl367.h"

#define ADXL367_SPI_WRITE_COMMAND 0x0A
@@ -28,10 +30,10 @@ struct adxl367_spi_state {
struct spi_transfer fifo_xfer[2];

/*
- * DMA (thus cache coherency maintenance) requires the
- * transfer buffers to live in their own cache lines.
+ * DMA (thus cache coherency maintenance) may require the
+ * transfer buffers to live in their own cache lines.
*/
- u8 reg_write_tx_buf[1] ____cacheline_aligned;
+ u8 reg_write_tx_buf[1] __aligned(IIO_DMA_MINALIGN);
u8 reg_read_tx_buf[2];
u8 fifo_tx_buf[1];
};
diff --git a/drivers/iio/accel/bma220_spi.c b/drivers/iio/accel/bma220_spi.c
index 74024d7ce5ac..b6d9ab8e2054 100644
--- a/drivers/iio/accel/bma220_spi.c
+++ b/drivers/iio/accel/bma220_spi.c
@@ -67,7 +67,7 @@ struct bma220_data {
/* Ensure timestamp is naturally aligned. */
s64 timestamp __aligned(8);
} scan;
- u8 tx_buf[2] ____cacheline_aligned;
+ u8 tx_buf[2] __aligned(IIO_DMA_MINALIGN);
};

static const struct iio_chan_spec bma220_channels[] = {
diff --git a/drivers/iio/accel/bma400.h b/drivers/iio/accel/bma400.h
index c4c8d74155c2..1c8c47a9a317 100644
--- a/drivers/iio/accel/bma400.h
+++ b/drivers/iio/accel/bma400.h
@@ -83,8 +83,27 @@
#define BMA400_ACC_ODR_MIN_WHOLE_HZ 25
#define BMA400_ACC_ODR_MIN_HZ 12

-#define BMA400_SCALE_MIN 38357
-#define BMA400_SCALE_MAX 306864
+/*
+ * BMA400_SCALE_MIN macro value represents m/s^2 for 1 LSB before
+ * converting to micro values for +-2g range.
+ *
+ * For +-2g - 1 LSB = 0.976562 milli g = 0.009576 m/s^2
+ * For +-4g - 1 LSB = 1.953125 milli g = 0.019153 m/s^2
+ * For +-16g - 1 LSB = 7.8125 milli g = 0.076614 m/s^2
+ *
+ * The raw value used to select the different ranges is determined by the
+ * position of the first set bit in the scale value, so BMA400_SCALE_MIN
+ * should be odd.
+ *
+ * Scale values for +-2g, +-4g, +-8g and +-16g are populated into bma400_scales
+ * array by left shifting BMA400_SCALE_MIN.
+ * e.g.:
+ * To select +-2g = 9577 << 0 = raw value to write is 0.
+ * To select +-8g = 9577 << 2 = raw value to write is 2.
+ * To select +-16g = 9577 << 3 = raw value to write is 3.
+ */
+#define BMA400_SCALE_MIN 9577
+#define BMA400_SCALE_MAX 76617

#define BMA400_NUM_REGULATORS 2
#define BMA400_VDD_REGULATOR 0
@@ -94,6 +113,4 @@ extern const struct regmap_config bma400_regmap_config;

int bma400_probe(struct device *dev, struct regmap *regmap, const char *name);

-void bma400_remove(struct device *dev);
-
#endif
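
A quick worked example of the shift-based mapping described in the
BMA400_SCALE_MIN comment above (plain C; scale_min is just a local copy of
the value for illustration):

#include <stdio.h>

int main(void)
{
	const int scale_min = 9577;	/* +/-2g step in micro m/s^2 */

	/* raw register values 0..3 select +/-2g, 4g, 8g and 16g; the
	 * corresponding scale entries are scale_min shifted left by raw */
	for (int raw = 0; raw < 4; raw++)
		printf("raw=%d -> scale=%d\n", raw, scale_min << raw);

	return 0;
}
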
diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c
index 043002fe6f63..07674d89d978 100644
--- a/drivers/iio/accel/bma400_core.c
+++ b/drivers/iio/accel/bma400_core.c
@@ -13,14 +13,14 @@

#include <linux/bitops.h>
#include <linux/device.h>
-#include <linux/iio/iio.h>
-#include <linux/iio/sysfs.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/regmap.h>
#include <linux/regulator/consumer.h>

+#include <linux/iio/iio.h>
+
#include "bma400.h"

/*
@@ -560,6 +560,26 @@ static void bma400_init_tables(void)
}
}

+static void bma400_regulators_disable(void *data_ptr)
+{
+ struct bma400_data *data = data_ptr;
+
+ regulator_bulk_disable(ARRAY_SIZE(data->regulators), data->regulators);
+}
+
+static void bma400_power_disable(void *data_ptr)
+{
+ struct bma400_data *data = data_ptr;
+ int ret;
+
+ mutex_lock(&data->mutex);
+ ret = bma400_set_power_mode(data, POWER_MODE_SLEEP);
+ mutex_unlock(&data->mutex);
+ if (ret)
+ dev_warn(data->dev, "Failed to put device into sleep mode (%pe)\n",
+ ERR_PTR(ret));
+}
+
static int bma400_init(struct bma400_data *data)
{
unsigned int val;
@@ -569,13 +589,12 @@ static int bma400_init(struct bma400_data *data)
ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
if (ret) {
dev_err(data->dev, "Failed to read chip id register\n");
- goto out;
+ return ret;
}

if (val != BMA400_ID_REG_VAL) {
dev_err(data->dev, "Chip ID mismatch\n");
- ret = -ENODEV;
- goto out;
+ return -ENODEV;
}

data->regulators[BMA400_VDD_REGULATOR].supply = "vdd";
@@ -589,27 +608,31 @@ static int bma400_init(struct bma400_data *data)
"Failed to get regulators: %d\n",
ret);

- goto out;
+ return ret;
}
ret = regulator_bulk_enable(ARRAY_SIZE(data->regulators),
data->regulators);
if (ret) {
dev_err(data->dev, "Failed to enable regulators: %d\n",
ret);
- goto out;
+ return ret;
}

+ ret = devm_add_action_or_reset(data->dev, bma400_regulators_disable, data);
+ if (ret)
+ return ret;
+
ret = bma400_get_power_mode(data);
if (ret) {
dev_err(data->dev, "Failed to get the initial power-mode\n");
- goto err_reg_disable;
+ return ret;
}

if (data->power_mode != POWER_MODE_NORMAL) {
ret = bma400_set_power_mode(data, POWER_MODE_NORMAL);
if (ret) {
dev_err(data->dev, "Failed to wake up the device\n");
- goto err_reg_disable;
+ return ret;
}
/*
* TODO: The datasheet waits 1500us here in the example, but
@@ -618,19 +641,23 @@ static int bma400_init(struct bma400_data *data)
usleep_range(1500, 2000);
}

+ ret = devm_add_action_or_reset(data->dev, bma400_power_disable, data);
+ if (ret)
+ return ret;
+
bma400_init_tables();

ret = bma400_get_accel_output_data_rate(data);
if (ret)
- goto err_reg_disable;
+ return ret;

ret = bma400_get_accel_oversampling_ratio(data);
if (ret)
- goto err_reg_disable;
+ return ret;

ret = bma400_get_accel_scale(data);
if (ret)
- goto err_reg_disable;
+ return ret;

/*
* Once the interrupt engine is supported we might use the
@@ -639,12 +666,6 @@ static int bma400_init(struct bma400_data *data)
* channel.
*/
return regmap_write(data->regmap, BMA400_ACC_CONFIG2_REG, 0x00);
-
-err_reg_disable:
- regulator_bulk_disable(ARRAY_SIZE(data->regulators),
- data->regulators);
-out:
- return ret;
}

static int bma400_read_raw(struct iio_dev *indio_dev,
@@ -822,32 +843,10 @@ int bma400_probe(struct device *dev, struct regmap *regmap, const char *name)
indio_dev->num_channels = ARRAY_SIZE(bma400_channels);
indio_dev->modes = INDIO_DIRECT_MODE;

- dev_set_drvdata(dev, indio_dev);
-
- return iio_device_register(indio_dev);
+ return devm_iio_device_register(dev, indio_dev);
}
EXPORT_SYMBOL_NS(bma400_probe, IIO_BMA400);

-void bma400_remove(struct device *dev)
-{
- struct iio_dev *indio_dev = dev_get_drvdata(dev);
- struct bma400_data *data = iio_priv(indio_dev);
- int ret;
-
- mutex_lock(&data->mutex);
- ret = bma400_set_power_mode(data, POWER_MODE_SLEEP);
- mutex_unlock(&data->mutex);
-
- if (ret)
- dev_warn(dev, "Failed to put device into sleep mode (%pe)\n", ERR_PTR(ret));
-
- regulator_bulk_disable(ARRAY_SIZE(data->regulators),
- data->regulators);
-
- iio_device_unregister(indio_dev);
-}
-EXPORT_SYMBOL_NS(bma400_remove, IIO_BMA400);
-
MODULE_AUTHOR("Dan Robertson <dan@xxxxxxxxxxxxxxx>");
MODULE_DESCRIPTION("Bosch BMA400 triaxial acceleration sensor core");
MODULE_LICENSE("GPL");
diff --git a/drivers/iio/accel/bma400_i2c.c b/drivers/iio/accel/bma400_i2c.c
index da104ffd3fe0..4f6e01a3b3a1 100644
--- a/drivers/iio/accel/bma400_i2c.c
+++ b/drivers/iio/accel/bma400_i2c.c
@@ -27,13 +27,6 @@ static int bma400_i2c_probe(struct i2c_client *client,
return bma400_probe(&client->dev, regmap, id->name);
}

-static int bma400_i2c_remove(struct i2c_client *client)
-{
- bma400_remove(&client->dev);
-
- return 0;
-}
-
static const struct i2c_device_id bma400_i2c_ids[] = {
{ "bma400", 0 },
{ }
@@ -52,7 +45,6 @@ static struct i2c_driver bma400_i2c_driver = {
.of_match_table = bma400_of_i2c_match,
},
.probe = bma400_i2c_probe,
- .remove = bma400_i2c_remove,
.id_table = bma400_i2c_ids,
};

diff --git a/drivers/iio/accel/bma400_spi.c b/drivers/iio/accel/bma400_spi.c
index 51f23bdc0ea5..28e240400a3f 100644
--- a/drivers/iio/accel/bma400_spi.c
+++ b/drivers/iio/accel/bma400_spi.c
@@ -87,11 +87,6 @@ static int bma400_spi_probe(struct spi_device *spi)
return bma400_probe(&spi->dev, regmap, id->name);
}

-static void bma400_spi_remove(struct spi_device *spi)
-{
- bma400_remove(&spi->dev);
-}
-
static const struct spi_device_id bma400_spi_ids[] = {
{ "bma400", 0 },
{ }
@@ -110,7 +105,6 @@ static struct spi_driver bma400_spi_driver = {
.of_match_table = bma400_of_spi_match,
},
.probe = bma400_spi_probe,
- .remove = bma400_spi_remove,
.id_table = bma400_spi_ids,
};

diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c b/drivers/iio/accel/cros_ec_accel_legacy.c
index b6f3471b62dc..3b77fded2dc0 100644
--- a/drivers/iio/accel/cros_ec_accel_legacy.c
+++ b/drivers/iio/accel/cros_ec_accel_legacy.c
@@ -215,7 +215,7 @@ static int cros_ec_accel_legacy_probe(struct platform_device *pdev)
return -ENOMEM;

ret = cros_ec_sensors_core_init(pdev, indio_dev, true,
- cros_ec_sensors_capture, NULL);
+ cros_ec_sensors_capture);
if (ret)
return ret;

@@ -235,7 +235,7 @@ static int cros_ec_accel_legacy_probe(struct platform_device *pdev)
state->sign[CROS_EC_SENSOR_Z] = -1;
}

- return devm_iio_device_register(dev, indio_dev);
+ return cros_ec_sensors_core_register(dev, indio_dev, NULL);
}

static struct platform_driver cros_ec_accel_platform_driver = {
diff --git a/drivers/iio/accel/sca3000.c b/drivers/iio/accel/sca3000.c
index 83c81072511e..2eecd2ab72dd 100644
--- a/drivers/iio/accel/sca3000.c
+++ b/drivers/iio/accel/sca3000.c
@@ -167,8 +167,8 @@ struct sca3000_state {
int mo_det_use_count;
struct mutex lock;
/* Can these share a cacheline ? */
- u8 rx[384] ____cacheline_aligned;
- u8 tx[6] ____cacheline_aligned;
+ u8 rx[384] __aligned(IIO_DMA_MINALIGN);
+ u8 tx[6] __aligned(IIO_DMA_MINALIGN);
};

/**
diff --git a/drivers/iio/accel/sca3300.c b/drivers/iio/accel/sca3300.c
index f7ef8ecfd34a..39e0c24364ae 100644
--- a/drivers/iio/accel/sca3300.c
+++ b/drivers/iio/accel/sca3300.c
@@ -115,7 +115,7 @@ struct sca3300_data {
s16 channels[4];
s64 ts __aligned(sizeof(s64));
} scan;
- u8 txbuf[4] ____cacheline_aligned;
+ u8 txbuf[4] __aligned(IIO_DMA_MINALIGN);
u8 rxbuf[4];
};

diff --git a/drivers/iio/adc/ad7266.c b/drivers/iio/adc/ad7266.c
index c17d9b5fbaf6..53c83e04dde5 100644
--- a/drivers/iio/adc/ad7266.c
+++ b/drivers/iio/adc/ad7266.c
@@ -37,7 +37,7 @@ struct ad7266_state {
struct gpio_desc *gpios[3];

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
* The buffer needs to be large enough to hold two samples (4 bytes) and
* the naturally aligned timestamp (8 bytes).
@@ -45,7 +45,7 @@ struct ad7266_state {
struct {
__be16 sample[2];
s64 timestamp;
- } data ____cacheline_aligned;
+ } data __aligned(IIO_DMA_MINALIGN);
};

static int ad7266_wakeup(struct ad7266_state *st)
diff --git a/drivers/iio/adc/ad7280a.c b/drivers/iio/adc/ad7280a.c
index ec9acbf12b9a..047a9d0dbcf7 100644
--- a/drivers/iio/adc/ad7280a.c
+++ b/drivers/iio/adc/ad7280a.c
@@ -183,7 +183,7 @@ struct ad7280_state {
unsigned char cb_mask[AD7280A_MAX_CHAIN];
struct mutex lock; /* protect sensor state */

- __be32 tx ____cacheline_aligned;
+ __be32 tx __aligned(IIO_DMA_MINALIGN);
__be32 rx;
};

diff --git a/drivers/iio/adc/ad7292.c b/drivers/iio/adc/ad7292.c
index 3271a31afde1..92c68d467c50 100644
--- a/drivers/iio/adc/ad7292.c
+++ b/drivers/iio/adc/ad7292.c
@@ -80,7 +80,7 @@ struct ad7292_state {
struct regulator *reg;
unsigned short vref_mv;

- __be16 d16 ____cacheline_aligned;
+ __be16 d16 __aligned(IIO_DMA_MINALIGN);
u8 d8[2];
};

diff --git a/drivers/iio/adc/ad7298.c b/drivers/iio/adc/ad7298.c
index 3f4e73f7d35a..c0430f71f592 100644
--- a/drivers/iio/adc/ad7298.c
+++ b/drivers/iio/adc/ad7298.c
@@ -49,7 +49,7 @@ struct ad7298_state {
* DMA (thus cache coherency maintenance) requires the
* transfer buffers to live in their own cache lines.
*/
- __be16 rx_buf[12] ____cacheline_aligned;
+ __be16 rx_buf[12] __aligned(IIO_DMA_MINALIGN);
__be16 tx_buf[2];
};

diff --git a/drivers/iio/adc/ad7476.c b/drivers/iio/adc/ad7476.c
index a1e8b32671cf..94776f696290 100644
--- a/drivers/iio/adc/ad7476.c
+++ b/drivers/iio/adc/ad7476.c
@@ -44,13 +44,12 @@ struct ad7476_state {
struct spi_transfer xfer;
struct spi_message msg;
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
* Make the buffer large enough for one 16 bit sample and one 64 bit
* aligned 64 bit timestamp.
*/
- unsigned char data[ALIGN(2, sizeof(s64)) + sizeof(s64)]
- ____cacheline_aligned;
+ unsigned char data[ALIGN(2, sizeof(s64)) + sizeof(s64)] __aligned(IIO_DMA_MINALIGN);
};

enum ad7476_supported_device_ids {
diff --git a/drivers/iio/adc/ad7606.h b/drivers/iio/adc/ad7606.h
index 4f82d7c9acfd..2dc4f599f9df 100644
--- a/drivers/iio/adc/ad7606.h
+++ b/drivers/iio/adc/ad7606.h
@@ -116,11 +116,11 @@ struct ad7606_state {
struct completion completion;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
* 16 * 16-bit samples + 64-bit timestamp
*/
- unsigned short data[20] ____cacheline_aligned;
+ unsigned short data[20] __aligned(IIO_DMA_MINALIGN);
__be16 d16[2];
};

diff --git a/drivers/iio/adc/ad7766.c b/drivers/iio/adc/ad7766.c
index 51ee9482e0df..3079a0872947 100644
--- a/drivers/iio/adc/ad7766.c
+++ b/drivers/iio/adc/ad7766.c
@@ -45,13 +45,12 @@ struct ad7766 {
struct spi_message msg;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
* Make the buffer large enough for one 24 bit sample and one 64 bit
* aligned 64 bit timestamp.
*/
- unsigned char data[ALIGN(3, sizeof(s64)) + sizeof(s64)]
- ____cacheline_aligned;
+ unsigned char data[ALIGN(3, sizeof(s64)) + sizeof(s64)] __aligned(IIO_DMA_MINALIGN);
};

/*
diff --git a/drivers/iio/adc/ad7768-1.c b/drivers/iio/adc/ad7768-1.c
index aa42ba759fa1..60f394da4640 100644
--- a/drivers/iio/adc/ad7768-1.c
+++ b/drivers/iio/adc/ad7768-1.c
@@ -163,7 +163,7 @@ struct ad7768_state {
struct gpio_desc *gpio_sync_in;
const char *labels[ARRAY_SIZE(ad7768_channels)];
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
union {
@@ -173,7 +173,7 @@ struct ad7768_state {
} scan;
__be32 d32;
u8 d8[2];
- } data ____cacheline_aligned;
+ } data __aligned(IIO_DMA_MINALIGN);
};

static int ad7768_spi_reg_read(struct ad7768_state *st, unsigned int addr,
diff --git a/drivers/iio/adc/ad7887.c b/drivers/iio/adc/ad7887.c
index f64999714a4d..965bdc8aa696 100644
--- a/drivers/iio/adc/ad7887.c
+++ b/drivers/iio/adc/ad7887.c
@@ -66,13 +66,12 @@ struct ad7887_state {
unsigned char tx_cmd_buf[4];

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
* Buffer needs to be large enough to hold two 16 bit samples and a
* 64 bit aligned 64 bit timestamp.
*/
- unsigned char data[ALIGN(4, sizeof(s64)) + sizeof(s64)]
- ____cacheline_aligned;
+ unsigned char data[ALIGN(4, sizeof(s64)) + sizeof(s64)] __aligned(IIO_DMA_MINALIGN);
};

enum ad7887_supported_device_ids {
diff --git a/drivers/iio/adc/ad7923.c b/drivers/iio/adc/ad7923.c
index 069b561ee768..edad1f30121d 100644
--- a/drivers/iio/adc/ad7923.c
+++ b/drivers/iio/adc/ad7923.c
@@ -57,12 +57,12 @@ struct ad7923_state {
unsigned int settings;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
* Ensure rx_buf can be directly used in iio_push_to_buffers_with_timestamp
* Length = 8 channels + 4 extra for 8 byte timestamp
*/
- __be16 rx_buf[12] ____cacheline_aligned;
+ __be16 rx_buf[12] __aligned(IIO_DMA_MINALIGN);
__be16 tx_buf[4];
};

diff --git a/drivers/iio/adc/ad7949.c b/drivers/iio/adc/ad7949.c
index 44bb5fde83de..ed4c1656ca75 100644
--- a/drivers/iio/adc/ad7949.c
+++ b/drivers/iio/adc/ad7949.c
@@ -86,7 +86,7 @@ struct ad7949_adc_chip {
u8 resolution;
u16 cfg;
unsigned int current_channel;
- u16 buffer ____cacheline_aligned;
+ u16 buffer __aligned(IIO_DMA_MINALIGN);
__be16 buf8b;
};

diff --git a/drivers/iio/adc/adi-axi-adc.c b/drivers/iio/adc/adi-axi-adc.c
index a9e655e69eaa..8ffabdaf841e 100644
--- a/drivers/iio/adc/adi-axi-adc.c
+++ b/drivers/iio/adc/adi-axi-adc.c
@@ -84,7 +84,8 @@ void *adi_axi_adc_conv_priv(struct adi_axi_adc_conv *conv)
{
struct adi_axi_adc_client *cl = conv_to_client(conv);

- return (char *)cl + ALIGN(sizeof(struct adi_axi_adc_client), IIO_ALIGN);
+ return (char *)cl + ALIGN(sizeof(struct adi_axi_adc_client),
+ IIO_DMA_MINALIGN);
}
EXPORT_SYMBOL_GPL(adi_axi_adc_conv_priv);

@@ -169,9 +170,9 @@ static struct adi_axi_adc_conv *adi_axi_adc_conv_register(struct device *dev,
struct adi_axi_adc_client *cl;
size_t alloc_size;

- alloc_size = ALIGN(sizeof(struct adi_axi_adc_client), IIO_ALIGN);
+ alloc_size = ALIGN(sizeof(struct adi_axi_adc_client), IIO_DMA_MINALIGN);
if (sizeof_priv)
- alloc_size += ALIGN(sizeof_priv, IIO_ALIGN);
+ alloc_size += ALIGN(sizeof_priv, IIO_DMA_MINALIGN);

cl = kzalloc(alloc_size, GFP_KERNEL);
if (!cl)
diff --git a/drivers/iio/adc/hi8435.c b/drivers/iio/adc/hi8435.c
index 8eb0140df133..771fa12bdc02 100644
--- a/drivers/iio/adc/hi8435.c
+++ b/drivers/iio/adc/hi8435.c
@@ -49,7 +49,7 @@ struct hi8435_priv {

unsigned threshold_lo[2]; /* GND-Open and Supply-Open thresholds */
unsigned threshold_hi[2]; /* GND-Open and Supply-Open thresholds */
- u8 reg_buffer[3] ____cacheline_aligned;
+ u8 reg_buffer[3] __aligned(IIO_DMA_MINALIGN);
};

static int hi8435_readb(struct hi8435_priv *priv, u8 reg, u8 *val)
diff --git a/drivers/iio/adc/ltc2496.c b/drivers/iio/adc/ltc2496.c
index 5a55f79f2574..dfb3bb5997e5 100644
--- a/drivers/iio/adc/ltc2496.c
+++ b/drivers/iio/adc/ltc2496.c
@@ -24,10 +24,10 @@ struct ltc2496_driverdata {
struct spi_device *spi;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- unsigned char rxbuf[3] ____cacheline_aligned;
+ unsigned char rxbuf[3] __aligned(IIO_DMA_MINALIGN);
unsigned char txbuf[3];
};

diff --git a/drivers/iio/adc/ltc2497.c b/drivers/iio/adc/ltc2497.c
index 1adddf5a88a9..f7c786f37ceb 100644
--- a/drivers/iio/adc/ltc2497.c
+++ b/drivers/iio/adc/ltc2497.c
@@ -20,10 +20,10 @@ struct ltc2497_driverdata {
struct ltc2497core_driverdata common_ddata;
struct i2c_client *client;
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- __be32 buf ____cacheline_aligned;
+ __be32 buf __aligned(IIO_DMA_MINALIGN);
};

static int ltc2497_result_and_measure(struct ltc2497core_driverdata *ddata,
diff --git a/drivers/iio/adc/max1027.c b/drivers/iio/adc/max1027.c
index 4daf1d576c4e..136fcf753837 100644
--- a/drivers/iio/adc/max1027.c
+++ b/drivers/iio/adc/max1027.c
@@ -272,7 +272,7 @@ struct max1027_state {
struct mutex lock;
struct completion complete;

- u8 reg ____cacheline_aligned;
+ u8 reg __aligned(IIO_DMA_MINALIGN);
};

static int max1027_wait_eoc(struct iio_dev *indio_dev)
@@ -349,8 +349,7 @@ static int max1027_read_single_value(struct iio_dev *indio_dev,
if (ret < 0) {
dev_err(&indio_dev->dev,
"Failed to configure conversion register\n");
- iio_device_release_direct_mode(indio_dev);
- return ret;
+ goto release;
}

/*
@@ -360,11 +359,12 @@ static int max1027_read_single_value(struct iio_dev *indio_dev,
*/
ret = max1027_wait_eoc(indio_dev);
if (ret)
- return ret;
+ goto release;

/* Read result */
ret = spi_read(st->spi, st->buffer, (chan->type == IIO_TEMP) ? 4 : 2);

+release:
iio_device_release_direct_mode(indio_dev);

if (ret < 0)
diff --git a/drivers/iio/adc/max11100.c b/drivers/iio/adc/max11100.c
index eb1ce6a0315c..49e38dca8fe2 100644
--- a/drivers/iio/adc/max11100.c
+++ b/drivers/iio/adc/max11100.c
@@ -33,10 +33,10 @@ struct max11100_state {
struct spi_device *spi;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- u8 buffer[3] ____cacheline_aligned;
+ u8 buffer[3] __aligned(IIO_DMA_MINALIGN);
};

static const struct iio_chan_spec max11100_channels[] = {
diff --git a/drivers/iio/adc/max1118.c b/drivers/iio/adc/max1118.c
index a41bc570be21..75ab57d9aef7 100644
--- a/drivers/iio/adc/max1118.c
+++ b/drivers/iio/adc/max1118.c
@@ -42,7 +42,7 @@ struct max1118 {
s64 ts __aligned(8);
} scan;

- u8 data ____cacheline_aligned;
+ u8 data __aligned(IIO_DMA_MINALIGN);
};

#define MAX1118_CHANNEL(ch) \
diff --git a/drivers/iio/adc/max1241.c b/drivers/iio/adc/max1241.c
index a5afd84af58b..a815ad1f6913 100644
--- a/drivers/iio/adc/max1241.c
+++ b/drivers/iio/adc/max1241.c
@@ -26,7 +26,7 @@ struct max1241 {
struct regulator *vref;
struct gpio_desc *shutdown;

- __be16 data ____cacheline_aligned;
+ __be16 data __aligned(IIO_DMA_MINALIGN);
};

static const struct iio_chan_spec max1241_channels[] = {
diff --git a/drivers/iio/adc/mcp320x.c b/drivers/iio/adc/mcp320x.c
index b4c69acb33e3..f3b81798b3c9 100644
--- a/drivers/iio/adc/mcp320x.c
+++ b/drivers/iio/adc/mcp320x.c
@@ -92,7 +92,7 @@ struct mcp320x {
struct mutex lock;
const struct mcp320x_chip_info *chip_info;

- u8 tx_buf ____cacheline_aligned;
+ u8 tx_buf __aligned(IIO_DMA_MINALIGN);
u8 rx_buf[4];
};

diff --git a/drivers/iio/adc/ti-adc0832.c b/drivers/iio/adc/ti-adc0832.c
index fb5e72600b96..b11ce555ba3b 100644
--- a/drivers/iio/adc/ti-adc0832.c
+++ b/drivers/iio/adc/ti-adc0832.c
@@ -36,7 +36,7 @@ struct adc0832 {
*/
u8 data[24] __aligned(8);

- u8 tx_buf[2] ____cacheline_aligned;
+ u8 tx_buf[2] __aligned(IIO_DMA_MINALIGN);
u8 rx_buf[2];
};

diff --git a/drivers/iio/adc/ti-adc084s021.c b/drivers/iio/adc/ti-adc084s021.c
index c9b5d9aec3dc..1f6e53832e06 100644
--- a/drivers/iio/adc/ti-adc084s021.c
+++ b/drivers/iio/adc/ti-adc084s021.c
@@ -32,10 +32,10 @@ struct adc084s021 {
s64 ts __aligned(8);
} scan;
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache line.
*/
- u16 tx_buf[4] ____cacheline_aligned;
+ u16 tx_buf[4] __aligned(IIO_DMA_MINALIGN);
__be16 rx_buf[5]; /* First 16-bits are trash */
};

diff --git a/drivers/iio/adc/ti-adc108s102.c b/drivers/iio/adc/ti-adc108s102.c
index c8e48881c37f..c82a161630e1 100644
--- a/drivers/iio/adc/ti-adc108s102.c
+++ b/drivers/iio/adc/ti-adc108s102.c
@@ -77,8 +77,8 @@ struct adc108s102_state {
* tx_buf: 8 channel read commands, plus 1 dummy command
* rx_buf: 1 dummy response, 8 channel responses
*/
- __be16 rx_buf[9] ____cacheline_aligned;
- __be16 tx_buf[9] ____cacheline_aligned;
+ __be16 rx_buf[9] __aligned(IIO_DMA_MINALIGN);
+ __be16 tx_buf[9] __aligned(IIO_DMA_MINALIGN);
};

#define ADC108S102_V_CHAN(index) \
diff --git a/drivers/iio/adc/ti-adc12138.c b/drivers/iio/adc/ti-adc12138.c
index 59d75d09604f..c0a72d72f3a9 100644
--- a/drivers/iio/adc/ti-adc12138.c
+++ b/drivers/iio/adc/ti-adc12138.c
@@ -55,7 +55,7 @@ struct adc12138 {
*/
__be16 data[20] __aligned(8);

- u8 tx_buf[2] ____cacheline_aligned;
+ u8 tx_buf[2] __aligned(IIO_DMA_MINALIGN);
u8 rx_buf[2];
};

diff --git a/drivers/iio/adc/ti-adc128s052.c b/drivers/iio/adc/ti-adc128s052.c
index 8e7adec87755..622fd384983c 100644
--- a/drivers/iio/adc/ti-adc128s052.c
+++ b/drivers/iio/adc/ti-adc128s052.c
@@ -29,7 +29,7 @@ struct adc128 {
struct regulator *reg;
struct mutex lock;

- u8 buffer[2] ____cacheline_aligned;
+ u8 buffer[2] __aligned(IIO_DMA_MINALIGN);
};

static int adc128_adc_conversion(struct adc128 *adc, u8 channel)
diff --git a/drivers/iio/adc/ti-adc161s626.c b/drivers/iio/adc/ti-adc161s626.c
index 75ca7f1c8726..b789891dcf49 100644
--- a/drivers/iio/adc/ti-adc161s626.c
+++ b/drivers/iio/adc/ti-adc161s626.c
@@ -71,7 +71,7 @@ struct ti_adc_data {
u8 read_size;
u8 shift;

- u8 buffer[16] ____cacheline_aligned;
+ u8 buffer[16] __aligned(IIO_DMA_MINALIGN);
};

static int ti_adc_read_measurement(struct ti_adc_data *data,
diff --git a/drivers/iio/adc/ti-ads124s08.c b/drivers/iio/adc/ti-ads124s08.c
index 767b3b634809..64833156c199 100644
--- a/drivers/iio/adc/ti-ads124s08.c
+++ b/drivers/iio/adc/ti-ads124s08.c
@@ -106,7 +106,7 @@ struct ads124s_private {
* timestamp is maintained.
*/
u32 buffer[ADS124S08_MAX_CHANNELS + sizeof(s64)/sizeof(u32)] __aligned(8);
- u8 data[5] ____cacheline_aligned;
+ u8 data[5] __aligned(IIO_DMA_MINALIGN);
};

#define ADS124S08_CHAN(index) \
diff --git a/drivers/iio/adc/ti-ads131e08.c b/drivers/iio/adc/ti-ads131e08.c
index 80a09817c119..32237cacc9a3 100644
--- a/drivers/iio/adc/ti-ads131e08.c
+++ b/drivers/iio/adc/ti-ads131e08.c
@@ -105,7 +105,7 @@ struct ads131e08_state {
s64 ts __aligned(8);
} tmp_buf;

- u8 tx_buf[3] ____cacheline_aligned;
+ u8 tx_buf[3] __aligned(IIO_DMA_MINALIGN);
/*
* Add extra one padding byte to be able to access the last channel
* value using u32 pointer
diff --git a/drivers/iio/adc/ti-ads7950.c b/drivers/iio/adc/ti-ads7950.c
index e3658b969c5b..2cc9a9bd9db6 100644
--- a/drivers/iio/adc/ti-ads7950.c
+++ b/drivers/iio/adc/ti-ads7950.c
@@ -102,11 +102,11 @@ struct ti_ads7950_state {
unsigned int gpio_cmd_settings_bitmask;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
u16 rx_buf[TI_ADS7950_MAX_CHAN + 2 + TI_ADS7950_TIMESTAMP_SIZE]
- ____cacheline_aligned;
+ __aligned(IIO_DMA_MINALIGN);
u16 tx_buf[TI_ADS7950_MAX_CHAN + 2];
u16 single_tx;
u16 single_rx;
diff --git a/drivers/iio/adc/ti-ads8344.c b/drivers/iio/adc/ti-ads8344.c
index c96d2a9ba924..bbd85cb47f81 100644
--- a/drivers/iio/adc/ti-ads8344.c
+++ b/drivers/iio/adc/ti-ads8344.c
@@ -28,7 +28,7 @@ struct ads8344 {
*/
struct mutex lock;

- u8 tx_buf ____cacheline_aligned;
+ u8 tx_buf __aligned(IIO_DMA_MINALIGN);
u8 rx_buf[3];
};

diff --git a/drivers/iio/adc/ti-ads8688.c b/drivers/iio/adc/ti-ads8688.c
index 22c2583eedd0..7ee14c6d9f46 100644
--- a/drivers/iio/adc/ti-ads8688.c
+++ b/drivers/iio/adc/ti-ads8688.c
@@ -71,7 +71,7 @@ struct ads8688_state {
union {
__be32 d32;
u8 d8[4];
- } data[2] ____cacheline_aligned;
+ } data[2] __aligned(IIO_DMA_MINALIGN);
};

enum ads8688_id {
diff --git a/drivers/iio/adc/ti-tlc4541.c b/drivers/iio/adc/ti-tlc4541.c
index 2406eda9dfc6..30f629a553a1 100644
--- a/drivers/iio/adc/ti-tlc4541.c
+++ b/drivers/iio/adc/ti-tlc4541.c
@@ -37,12 +37,12 @@ struct tlc4541_state {
struct spi_message scan_single_msg;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
* 2 bytes data + 6 bytes padding + 8 bytes timestamp when
* call iio_push_to_buffers_with_timestamp.
*/
- __be16 rx_buf[8] ____cacheline_aligned;
+ __be16 rx_buf[8] __aligned(IIO_DMA_MINALIGN);
};

struct tlc4541_chip_info {
diff --git a/drivers/iio/addac/ad74413r.c b/drivers/iio/addac/ad74413r.c
index acd230a6af35..6a66d7a65db7 100644
--- a/drivers/iio/addac/ad74413r.c
+++ b/drivers/iio/addac/ad74413r.c
@@ -77,13 +77,13 @@ struct ad74413r_state {
struct spi_transfer adc_samples_xfer[AD74413R_CHANNEL_MAX + 1];

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
struct {
u8 rx_buf[AD74413R_FRAME_SIZE * AD74413R_CHANNEL_MAX];
s64 timestamp;
- } adc_samples_buf ____cacheline_aligned;
+ } adc_samples_buf __aligned(IIO_DMA_MINALIGN);

u8 adc_samples_tx_buf[AD74413R_FRAME_SIZE * AD74413R_CHANNEL_MAX];
u8 reg_tx_buf[AD74413R_FRAME_SIZE];
diff --git a/drivers/iio/amplifiers/ad8366.c b/drivers/iio/amplifiers/ad8366.c
index 1134ae12e531..f2c2ea79a07f 100644
--- a/drivers/iio/amplifiers/ad8366.c
+++ b/drivers/iio/amplifiers/ad8366.c
@@ -45,10 +45,10 @@ struct ad8366_state {
enum ad8366_type type;
struct ad8366_info *info;
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- unsigned char data[2] ____cacheline_aligned;
+ unsigned char data[2] __aligned(IIO_DMA_MINALIGN);
};

static struct ad8366_info ad8366_infos[] = {
diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_lid_angle.c b/drivers/iio/common/cros_ec_sensors/cros_ec_lid_angle.c
index af801e203623..02d3cf36acb0 100644
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_lid_angle.c
+++ b/drivers/iio/common/cros_ec_sensors/cros_ec_lid_angle.c
@@ -97,7 +97,7 @@ static int cros_ec_lid_angle_probe(struct platform_device *pdev)
if (!indio_dev)
return -ENOMEM;

- ret = cros_ec_sensors_core_init(pdev, indio_dev, false, NULL, NULL);
+ ret = cros_ec_sensors_core_init(pdev, indio_dev, false, NULL);
if (ret)
return ret;

@@ -113,7 +113,7 @@ static int cros_ec_lid_angle_probe(struct platform_device *pdev)
if (ret)
return ret;

- return devm_iio_device_register(dev, indio_dev);
+ return cros_ec_sensors_core_register(dev, indio_dev, NULL);
}

static const struct platform_device_id cros_ec_lid_angle_ids[] = {
diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c
index 376a5b30010a..5cce34fdff02 100644
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c
+++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c
@@ -235,8 +235,7 @@ static int cros_ec_sensors_probe(struct platform_device *pdev)
return -ENOMEM;

ret = cros_ec_sensors_core_init(pdev, indio_dev, true,
- cros_ec_sensors_capture,
- cros_ec_sensors_push_data);
+ cros_ec_sensors_capture);
if (ret)
return ret;

@@ -297,7 +296,8 @@ static int cros_ec_sensors_probe(struct platform_device *pdev)
else
state->core.read_ec_sensors_data = cros_ec_sensors_read_cmd;

- return devm_iio_device_register(dev, indio_dev);
+ return cros_ec_sensors_core_register(dev, indio_dev,
+ cros_ec_sensors_push_data);
}

static const struct platform_device_id cros_ec_sensors_ids[] = {
diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
index b2725c6adc7f..ed9716832161 100644
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
+++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
@@ -234,21 +234,18 @@ static void cros_ec_sensors_core_clean(void *arg)

/**
* cros_ec_sensors_core_init() - basic initialization of the core structure
- * @pdev: platform device created for the sensors
+ * @pdev: platform device created for the sensor
* @indio_dev: iio device structure of the device
* @physical_device: true if the device refers to a physical device
* @trigger_capture: function pointer to call buffer is triggered,
* for backward compatibility.
- * @push_data: function to call when cros_ec_sensorhub receives
- * a sample for that sensor.
*
* Return: 0 on success, -errno on failure.
*/
int cros_ec_sensors_core_init(struct platform_device *pdev,
struct iio_dev *indio_dev,
bool physical_device,
- cros_ec_sensors_capture_t trigger_capture,
- cros_ec_sensorhub_push_data_cb_t push_data)
+ cros_ec_sensors_capture_t trigger_capture)
{
struct device *dev = &pdev->dev;
struct cros_ec_sensors_core_state *state = iio_priv(indio_dev);
@@ -339,17 +336,6 @@ int cros_ec_sensors_core_init(struct platform_device *pdev,
if (ret)
return ret;

- ret = cros_ec_sensorhub_register_push_data(
- sensor_hub, sensor_platform->sensor_num,
- indio_dev, push_data);
- if (ret)
- return ret;
-
- ret = devm_add_action_or_reset(
- dev, cros_ec_sensors_core_clean, pdev);
- if (ret)
- return ret;
-
/* Timestamp coming from FIFO are in ns since boot. */
ret = iio_device_set_clock(indio_dev, CLOCK_BOOTTIME);
if (ret)
@@ -371,6 +357,46 @@ int cros_ec_sensors_core_init(struct platform_device *pdev,
}
EXPORT_SYMBOL_GPL(cros_ec_sensors_core_init);

+/**
+ * cros_ec_sensors_core_register() - Register callback to FIFO and IIO when
+ * sensor is ready.
+ * It must be called at the end of the sensor probe routine.
+ * @dev: device created for the sensor
+ * @indio_dev: iio device structure of the device
+ * @push_data: function to call when cros_ec_sensorhub receives
+ * a sample for that sensor.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int cros_ec_sensors_core_register(struct device *dev,
+ struct iio_dev *indio_dev,
+ cros_ec_sensorhub_push_data_cb_t push_data)
+{
+ struct cros_ec_sensor_platform *sensor_platform = dev_get_platdata(dev);
+ struct cros_ec_sensorhub *sensor_hub = dev_get_drvdata(dev->parent);
+ struct platform_device *pdev = to_platform_device(dev);
+ struct cros_ec_dev *ec = sensor_hub->ec;
+ int ret;
+
+ ret = devm_iio_device_register(dev, indio_dev);
+ if (ret)
+ return ret;
+
+ if (!push_data ||
+ !cros_ec_check_features(ec, EC_FEATURE_MOTION_SENSE_FIFO))
+ return 0;
+
+ ret = cros_ec_sensorhub_register_push_data(
+ sensor_hub, sensor_platform->sensor_num,
+ indio_dev, push_data);
+ if (ret)
+ return ret;
+
+ return devm_add_action_or_reset(
+ dev, cros_ec_sensors_core_clean, pdev);
+}
+EXPORT_SYMBOL_GPL(cros_ec_sensors_core_register);
+
/**
* cros_ec_motion_send_host_cmd() - send motion sense host command
* @state: pointer to state information for device
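
The cros_ec changes above split registration out of cros_ec_sensors_core_init(): drivers no longer hand the push_data callback to the init helper, and instead finish their probe with cros_ec_sensors_core_register(), which registers the IIO device and, when the EC exposes a motion-sense FIFO, wires the callback into the sensorhub. A hedged sketch of the resulting probe flow, where my_ec_sensor_probe() and the elided channel setup are illustrative only and the helper calls are the ones shown in this diff:

#include <linux/iio/common/cros_ec_sensors_core.h>
#include <linux/iio/iio.h>
#include <linux/platform_device.h>

static int my_ec_sensor_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct iio_dev *indio_dev;
	int ret;

	indio_dev = devm_iio_device_alloc(dev,
			sizeof(struct cros_ec_sensors_core_state));
	if (!indio_dev)
		return -ENOMEM;

	/* Core init no longer takes (or registers) the push_data callback. */
	ret = cros_ec_sensors_core_init(pdev, indio_dev, true,
					cros_ec_sensors_capture);
	if (ret)
		return ret;

	/* ... channel, trigger and read_ec_sensors_data setup ... */

	/* IIO registration and the FIFO callback now happen last, together. */
	return cros_ec_sensors_core_register(dev, indio_dev,
					     cros_ec_sensors_push_data);
}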
diff --git a/drivers/iio/common/ssp_sensors/ssp.h b/drivers/iio/common/ssp_sensors/ssp.h
index abb832795619..f649cdecc277 100644
--- a/drivers/iio/common/ssp_sensors/ssp.h
+++ b/drivers/iio/common/ssp_sensors/ssp.h
@@ -221,8 +221,7 @@ struct ssp_data {
struct iio_dev *sensor_devs[SSP_SENSOR_MAX];
atomic_t enable_refcount;

- __le16 header_buffer[SSP_HEADER_BUFFER_SIZE / sizeof(__le16)]
- ____cacheline_aligned;
+ __le16 header_buffer[SSP_HEADER_BUFFER_SIZE / sizeof(__le16)] __aligned(IIO_DMA_MINALIGN);
};

void ssp_clean_pending_list(struct ssp_data *data);
diff --git a/drivers/iio/dac/ad5064.c b/drivers/iio/dac/ad5064.c
index 27ee2c63c5d4..e3b1add34f46 100644
--- a/drivers/iio/dac/ad5064.c
+++ b/drivers/iio/dac/ad5064.c
@@ -115,13 +115,13 @@ struct ad5064_state {
struct mutex lock;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
union {
u8 i2c[3];
__be32 spi;
- } data ____cacheline_aligned;
+ } data __aligned(IIO_DMA_MINALIGN);
};

enum ad5064_type {
diff --git a/drivers/iio/dac/ad5360.c b/drivers/iio/dac/ad5360.c
index ecbc6a51d60f..1bde696a572c 100644
--- a/drivers/iio/dac/ad5360.c
+++ b/drivers/iio/dac/ad5360.c
@@ -79,13 +79,13 @@ struct ad5360_state {
struct mutex lock;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
union {
__be32 d32;
u8 d8[4];
- } data[2] ____cacheline_aligned;
+ } data[2] __aligned(IIO_DMA_MINALIGN);
};

enum ad5360_type {
diff --git a/drivers/iio/dac/ad5421.c b/drivers/iio/dac/ad5421.c
index eedf661d32b2..7644acfd879e 100644
--- a/drivers/iio/dac/ad5421.c
+++ b/drivers/iio/dac/ad5421.c
@@ -72,13 +72,13 @@ struct ad5421_state {
struct mutex lock;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
union {
__be32 d32;
u8 d8[4];
- } data[2] ____cacheline_aligned;
+ } data[2] __aligned(IIO_DMA_MINALIGN);
};

static const struct iio_event_spec ad5421_current_event[] = {
diff --git a/drivers/iio/dac/ad5449.c b/drivers/iio/dac/ad5449.c
index bad9bdaafa94..4572d6f49275 100644
--- a/drivers/iio/dac/ad5449.c
+++ b/drivers/iio/dac/ad5449.c
@@ -68,10 +68,10 @@ struct ad5449 {
uint16_t dac_cache[AD5449_MAX_CHANNELS];

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- __be16 data[2] ____cacheline_aligned;
+ __be16 data[2] __aligned(IIO_DMA_MINALIGN);
};

enum ad5449_type {
diff --git a/drivers/iio/dac/ad5504.c b/drivers/iio/dac/ad5504.c
index 8507573aa13e..e472c9564edf 100644
--- a/drivers/iio/dac/ad5504.c
+++ b/drivers/iio/dac/ad5504.c
@@ -54,7 +54,7 @@ struct ad5504_state {
unsigned pwr_down_mask;
unsigned pwr_down_mode;

- __be16 data[2] ____cacheline_aligned;
+ __be16 data[2] __aligned(IIO_DMA_MINALIGN);
};

/*
diff --git a/drivers/iio/dac/ad5592r-base.h b/drivers/iio/dac/ad5592r-base.h
index 2a22ef691996..cc7be426cbc8 100644
--- a/drivers/iio/dac/ad5592r-base.h
+++ b/drivers/iio/dac/ad5592r-base.h
@@ -14,6 +14,8 @@
#include <linux/mutex.h>
#include <linux/gpio/driver.h>

+#include <linux/iio/iio.h>
+
struct device;
struct ad5592r_state;

@@ -65,7 +67,7 @@ struct ad5592r_state {
u8 gpio_in;
u8 gpio_val;

- __be16 spi_msg ____cacheline_aligned;
+ __be16 spi_msg __aligned(IIO_DMA_MINALIGN);
__be16 spi_msg_nop;
};

diff --git a/drivers/iio/dac/ad5686.h b/drivers/iio/dac/ad5686.h
index cd5fff9e9d53..b7ade3a6b9b6 100644
--- a/drivers/iio/dac/ad5686.h
+++ b/drivers/iio/dac/ad5686.h
@@ -13,6 +13,8 @@
#include <linux/mutex.h>
#include <linux/kernel.h>

+#include <linux/iio/iio.h>
+
#define AD5310_CMD(x) ((x) << 12)

#define AD5683_DATA(x) ((x) << 4)
@@ -137,7 +139,7 @@ struct ad5686_state {
struct mutex lock;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/

@@ -145,7 +147,7 @@ struct ad5686_state {
__be32 d32;
__be16 d16;
u8 d8[4];
- } data[3] ____cacheline_aligned;
+ } data[3] __aligned(IIO_DMA_MINALIGN);
};


diff --git a/drivers/iio/dac/ad5755.c b/drivers/iio/dac/ad5755.c
index 7a62e6e1d5f1..bbb345323b69 100644
--- a/drivers/iio/dac/ad5755.c
+++ b/drivers/iio/dac/ad5755.c
@@ -189,14 +189,14 @@ struct ad5755_state {
struct mutex lock;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/

union {
__be32 d32;
u8 d8[4];
- } data[2] ____cacheline_aligned;
+ } data[2] __aligned(IIO_DMA_MINALIGN);
};

enum ad5755_type {
diff --git a/drivers/iio/dac/ad5761.c b/drivers/iio/dac/ad5761.c
index 4cb8471db81e..6aa1a068adb0 100644
--- a/drivers/iio/dac/ad5761.c
+++ b/drivers/iio/dac/ad5761.c
@@ -70,13 +70,13 @@ struct ad5761_state {
enum ad5761_voltage_range range;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
union {
__be32 d32;
u8 d8[4];
- } data[3] ____cacheline_aligned;
+ } data[3] __aligned(IIO_DMA_MINALIGN);
};

static const struct ad5761_range_params ad5761_range_params[] = {
diff --git a/drivers/iio/dac/ad5764.c b/drivers/iio/dac/ad5764.c
index d235a8047ba0..26c049d5b73a 100644
--- a/drivers/iio/dac/ad5764.c
+++ b/drivers/iio/dac/ad5764.c
@@ -56,13 +56,13 @@ struct ad5764_state {
struct mutex lock;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
union {
__be32 d32;
u8 d8[4];
- } data[2] ____cacheline_aligned;
+ } data[2] __aligned(IIO_DMA_MINALIGN);
};

enum ad5764_type {
diff --git a/drivers/iio/dac/ad5766.c b/drivers/iio/dac/ad5766.c
index 43189af2fb1f..899894523752 100644
--- a/drivers/iio/dac/ad5766.c
+++ b/drivers/iio/dac/ad5766.c
@@ -123,7 +123,7 @@ struct ad5766_state {
u32 d32;
u16 w16[2];
u8 b8[4];
- } data[3] ____cacheline_aligned;
+ } data[3] __aligned(IIO_DMA_MINALIGN);
};

struct ad5766_span_tbl {
diff --git a/drivers/iio/dac/ad5770r.c b/drivers/iio/dac/ad5770r.c
index 7e2fd32e993a..f66d67402e43 100644
--- a/drivers/iio/dac/ad5770r.c
+++ b/drivers/iio/dac/ad5770r.c
@@ -140,7 +140,7 @@ struct ad5770r_state {
bool ch_pwr_down[AD5770R_MAX_CHANNELS];
bool internal_ref;
bool external_res;
- u8 transf_buf[2] ____cacheline_aligned;
+ u8 transf_buf[2] __aligned(IIO_DMA_MINALIGN);
};

static const struct regmap_config ad5770r_spi_regmap_config = {
diff --git a/drivers/iio/dac/ad5791.c b/drivers/iio/dac/ad5791.c
index 2b14914b4050..ee7e89774d9d 100644
--- a/drivers/iio/dac/ad5791.c
+++ b/drivers/iio/dac/ad5791.c
@@ -95,7 +95,7 @@ struct ad5791_state {
union {
__be32 d32;
u8 d8[4];
- } data[3] ____cacheline_aligned;
+ } data[3] __aligned(IIO_DMA_MINALIGN);
};

enum ad5791_supported_device_ids {
diff --git a/drivers/iio/dac/ad7293.c b/drivers/iio/dac/ad7293.c
index 59a38ca4c3c7..06f05750d921 100644
--- a/drivers/iio/dac/ad7293.c
+++ b/drivers/iio/dac/ad7293.c
@@ -144,7 +144,7 @@ struct ad7293_state {
struct regulator *reg_avdd;
struct regulator *reg_vdrive;
u8 page_select;
- u8 data[3] ____cacheline_aligned;
+ u8 data[3] __aligned(IIO_DMA_MINALIGN);
};

static int ad7293_page_select(struct ad7293_state *st, unsigned int reg)
diff --git a/drivers/iio/dac/ad7303.c b/drivers/iio/dac/ad7303.c
index 91eaaf793b3e..558af7926a89 100644
--- a/drivers/iio/dac/ad7303.c
+++ b/drivers/iio/dac/ad7303.c
@@ -44,10 +44,10 @@ struct ad7303_state {

struct mutex lock;
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- __be16 data ____cacheline_aligned;
+ __be16 data __aligned(IIO_DMA_MINALIGN);
};

static int ad7303_write(struct ad7303_state *st, unsigned int chan,
diff --git a/drivers/iio/dac/ad8801.c b/drivers/iio/dac/ad8801.c
index 6be35c92d435..919e8c880697 100644
--- a/drivers/iio/dac/ad8801.c
+++ b/drivers/iio/dac/ad8801.c
@@ -26,7 +26,7 @@ struct ad8801_state {
struct regulator *vrefh_reg;
struct regulator *vrefl_reg;

- __be16 data ____cacheline_aligned;
+ __be16 data __aligned(IIO_DMA_MINALIGN);
};

static int ad8801_spi_write(struct ad8801_state *state,
diff --git a/drivers/iio/dac/ltc2688.c b/drivers/iio/dac/ltc2688.c
index 2f9c384885f4..04ec38b1cad1 100644
--- a/drivers/iio/dac/ltc2688.c
+++ b/drivers/iio/dac/ltc2688.c
@@ -91,10 +91,10 @@ struct ltc2688_state {
struct mutex lock;
int vref;
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- u8 tx_data[6] ____cacheline_aligned;
+ u8 tx_data[6] __aligned(IIO_DMA_MINALIGN);
u8 rx_data[3];
};

diff --git a/drivers/iio/dac/mcp4922.c b/drivers/iio/dac/mcp4922.c
index cb9e60e71b91..6c0e31032c57 100644
--- a/drivers/iio/dac/mcp4922.c
+++ b/drivers/iio/dac/mcp4922.c
@@ -29,7 +29,7 @@ struct mcp4922_state {
unsigned int value[MCP4922_NUM_CHANNELS];
unsigned int vref_mv;
struct regulator *vref_reg;
- u8 mosi[2] ____cacheline_aligned;
+ u8 mosi[2] __aligned(IIO_DMA_MINALIGN);
};

#define MCP4922_CHAN(chan, bits) { \
diff --git a/drivers/iio/dac/ti-dac082s085.c b/drivers/iio/dac/ti-dac082s085.c
index 4e1156e6deb2..f52ad8fa7747 100644
--- a/drivers/iio/dac/ti-dac082s085.c
+++ b/drivers/iio/dac/ti-dac082s085.c
@@ -55,7 +55,7 @@ struct ti_dac_chip {
bool powerdown;
u8 powerdown_mode;
u8 resolution;
- u8 buf[2] ____cacheline_aligned;
+ u8 buf[2] __aligned(IIO_DMA_MINALIGN);
};

#define WRITE_NOT_UPDATE(chan) (0x00 | (chan) << 6)
diff --git a/drivers/iio/dac/ti-dac5571.c b/drivers/iio/dac/ti-dac5571.c
index 0b775f943db3..ba68ea1e3455 100644
--- a/drivers/iio/dac/ti-dac5571.c
+++ b/drivers/iio/dac/ti-dac5571.c
@@ -52,7 +52,7 @@ struct dac5571_data {
struct dac5571_spec const *spec;
int (*dac5571_cmd)(struct dac5571_data *data, int channel, u16 val);
int (*dac5571_pwrdwn)(struct dac5571_data *data, int channel, u8 pwrdwn);
- u8 buf[3] ____cacheline_aligned;
+ u8 buf[3] __aligned(IIO_DMA_MINALIGN);
};

#define DAC5571_POWERDOWN(mode) ((mode) + 1)
diff --git a/drivers/iio/dac/ti-dac7311.c b/drivers/iio/dac/ti-dac7311.c
index e10d17e60ed3..dd0a7b77838d 100644
--- a/drivers/iio/dac/ti-dac7311.c
+++ b/drivers/iio/dac/ti-dac7311.c
@@ -52,7 +52,7 @@ struct ti_dac_chip {
bool powerdown;
u8 powerdown_mode;
u8 resolution;
- u8 buf[2] ____cacheline_aligned;
+ u8 buf[2] __aligned(IIO_DMA_MINALIGN);
};

static u8 ti_dac_get_power(struct ti_dac_chip *ti_dac, bool powerdown)
diff --git a/drivers/iio/dac/ti-dac7612.c b/drivers/iio/dac/ti-dac7612.c
index 4c0f4b5e9ff4..8195815de26f 100644
--- a/drivers/iio/dac/ti-dac7612.c
+++ b/drivers/iio/dac/ti-dac7612.c
@@ -31,10 +31,10 @@ struct dac7612 {
struct mutex lock;

/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- uint8_t data[2] ____cacheline_aligned;
+ uint8_t data[2] __aligned(IIO_DMA_MINALIGN);
};

static int dac7612_cmd_single(struct dac7612 *priv, int channel, u16 val)
diff --git a/drivers/iio/frequency/ad9523.c b/drivers/iio/frequency/ad9523.c
index a0f92c336fc4..31c97f9f2c1b 100644
--- a/drivers/iio/frequency/ad9523.c
+++ b/drivers/iio/frequency/ad9523.c
@@ -287,13 +287,13 @@ struct ad9523_state {
struct mutex lock;

/*
- * DMA (thus cache coherency maintenance) requires the
- * transfer buffers to live in their own cache lines.
+ * DMA (thus cache coherency maintenance) may require that
+ * transfer buffers live in their own cache lines.
*/
union {
__be32 d32;
u8 d8[4];
- } data[2] ____cacheline_aligned;
+ } data[2] __aligned(IIO_DMA_MINALIGN);
};

static int ad9523_read(struct iio_dev *indio_dev, unsigned int addr)
diff --git a/drivers/iio/frequency/adf4350.c b/drivers/iio/frequency/adf4350.c
index be1218d86291..85e289700c3c 100644
--- a/drivers/iio/frequency/adf4350.c
+++ b/drivers/iio/frequency/adf4350.c
@@ -56,10 +56,10 @@ struct adf4350_state {
*/
struct mutex lock;
/*
- * DMA (thus cache coherency maintenance) requires the
- * transfer buffers to live in their own cache lines.
+ * DMA (thus cache coherency maintenance) may require that
+ * transfer buffers live in their own cache lines.
*/
- __be32 val ____cacheline_aligned;
+ __be32 val __aligned(IIO_DMA_MINALIGN);
};

static struct adf4350_platform_data default_pdata = {
diff --git a/drivers/iio/frequency/adf4371.c b/drivers/iio/frequency/adf4371.c
index ecd5e18995ad..135c8cedc33d 100644
--- a/drivers/iio/frequency/adf4371.c
+++ b/drivers/iio/frequency/adf4371.c
@@ -175,7 +175,7 @@ struct adf4371_state {
unsigned int mod2;
unsigned int rf_div_sel;
unsigned int ref_div_factor;
- u8 buf[10] ____cacheline_aligned;
+ u8 buf[10] __aligned(IIO_DMA_MINALIGN);
};

static unsigned long long adf4371_pll_fract_n_get_rate(struct adf4371_state *st,
diff --git a/drivers/iio/frequency/admv1013.c b/drivers/iio/frequency/admv1013.c
index b0e1f6571afb..ed8167271358 100644
--- a/drivers/iio/frequency/admv1013.c
+++ b/drivers/iio/frequency/admv1013.c
@@ -100,7 +100,7 @@ struct admv1013_state {
unsigned int input_mode;
unsigned int quad_se_mode;
bool det_en;
- u8 data[3] ____cacheline_aligned;
+ u8 data[3] __aligned(IIO_DMA_MINALIGN);
};

static int __admv1013_spi_read(struct admv1013_state *st, unsigned int reg,
diff --git a/drivers/iio/frequency/admv1014.c b/drivers/iio/frequency/admv1014.c
index a7994f8e6b9b..d1ccaa7ed5fe 100644
--- a/drivers/iio/frequency/admv1014.c
+++ b/drivers/iio/frequency/admv1014.c
@@ -127,7 +127,7 @@ struct admv1014_state {
unsigned int quad_se_mode;
unsigned int p1db_comp;
bool det_en;
- u8 data[3] ____cacheline_aligned;
+ u8 data[3] __aligned(IIO_DMA_MINALIGN);
};

static const int mixer_vgate_table[] = {106, 107, 108, 110, 111, 112, 113, 114,
diff --git a/drivers/iio/frequency/admv4420.c b/drivers/iio/frequency/admv4420.c
index 51134aee8510..863ba8e98c95 100644
--- a/drivers/iio/frequency/admv4420.c
+++ b/drivers/iio/frequency/admv4420.c
@@ -113,7 +113,7 @@ struct admv4420_state {
struct admv4420_n_counter n_counter;
enum admv4420_mux_sel mux_sel;
struct mutex lock;
- u8 transf_buf[4] ____cacheline_aligned;
+ u8 transf_buf[4] __aligned(IIO_DMA_MINALIGN);
};

static const struct regmap_config admv4420_regmap_config = {
diff --git a/drivers/iio/frequency/adrf6780.c b/drivers/iio/frequency/adrf6780.c
index 8255ffd174f6..21878bad0909 100644
--- a/drivers/iio/frequency/adrf6780.c
+++ b/drivers/iio/frequency/adrf6780.c
@@ -86,7 +86,7 @@ struct adrf6780_state {
bool uc_bias_en;
bool lo_sideband;
bool vdet_out_en;
- u8 data[3] ____cacheline_aligned;
+ u8 data[3] __aligned(IIO_DMA_MINALIGN);
};

static int __adrf6780_spi_read(struct adrf6780_state *st, unsigned int reg,
diff --git a/drivers/iio/gyro/adis16080.c b/drivers/iio/gyro/adis16080.c
index acef59d822b1..14b3abf6dce9 100644
--- a/drivers/iio/gyro/adis16080.c
+++ b/drivers/iio/gyro/adis16080.c
@@ -45,7 +45,7 @@ struct adis16080_state {
const struct adis16080_chip_info *info;
struct mutex lock;

- __be16 buf ____cacheline_aligned;
+ __be16 buf __aligned(IIO_DMA_MINALIGN);
};

static int adis16080_read_sample(struct iio_dev *indio_dev,
diff --git a/drivers/iio/gyro/adis16130.c b/drivers/iio/gyro/adis16130.c
index b9c952e65b55..33cde9e6fca5 100644
--- a/drivers/iio/gyro/adis16130.c
+++ b/drivers/iio/gyro/adis16130.c
@@ -41,7 +41,7 @@
struct adis16130_state {
struct spi_device *us;
struct mutex buf_lock;
- u8 buf[4] ____cacheline_aligned;
+ u8 buf[4] __aligned(IIO_DMA_MINALIGN);
};

static int adis16130_spi_read(struct iio_dev *indio_dev, u8 reg_addr, u32 *val)
diff --git a/drivers/iio/gyro/adxrs450.c b/drivers/iio/gyro/adxrs450.c
index 04f350025215..f84438e0c42c 100644
--- a/drivers/iio/gyro/adxrs450.c
+++ b/drivers/iio/gyro/adxrs450.c
@@ -73,7 +73,7 @@ enum {
struct adxrs450_state {
struct spi_device *us;
struct mutex buf_lock;
- __be32 tx ____cacheline_aligned;
+ __be32 tx __aligned(IIO_DMA_MINALIGN);
__be32 rx;

};
diff --git a/drivers/iio/gyro/fxas21002c_core.c b/drivers/iio/gyro/fxas21002c_core.c
index 410e5e9f2672..7a459a823f6e 100644
--- a/drivers/iio/gyro/fxas21002c_core.c
+++ b/drivers/iio/gyro/fxas21002c_core.c
@@ -150,10 +150,10 @@ struct fxas21002c_data {
struct regulator *vddio;

/*
- * DMA (thus cache coherency maintenance) requires the
- * transfer buffers to live in their own cache lines.
+ * DMA (thus cache coherency maintenance) may require the
+ * transfer buffers to live in their own cache lines.
*/
- s16 buffer[8] ____cacheline_aligned;
+ s16 buffer[8] __aligned(IIO_DMA_MINALIGN);
};

enum fxas21002c_channel_index {
diff --git a/drivers/iio/imu/fxos8700_core.c b/drivers/iio/imu/fxos8700_core.c
index ab288186f36e..423cfe526f2a 100644
--- a/drivers/iio/imu/fxos8700_core.c
+++ b/drivers/iio/imu/fxos8700_core.c
@@ -167,7 +167,7 @@
struct fxos8700_data {
struct regmap *regmap;
struct iio_trigger *trig;
- __be16 buf[FXOS8700_DATA_BUF_SIZE] ____cacheline_aligned;
+ __be16 buf[FXOS8700_DATA_BUF_SIZE] __aligned(IIO_DMA_MINALIGN);
};

/* Regmap info */
diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600.h b/drivers/iio/imu/inv_icm42600/inv_icm42600.h
index 995a9dc06521..3d91469beccb 100644
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600.h
+++ b/drivers/iio/imu/inv_icm42600/inv_icm42600.h
@@ -141,7 +141,7 @@ struct inv_icm42600_state {
struct inv_icm42600_suspended suspended;
struct iio_dev *indio_gyro;
struct iio_dev *indio_accel;
- uint8_t buffer[2] ____cacheline_aligned;
+ uint8_t buffer[2] __aligned(IIO_DMA_MINALIGN);
struct inv_icm42600_fifo fifo;
struct {
int64_t gyro;
diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.h b/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.h
index de2a3949dcc7..8b85ee333bf8 100644
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.h
+++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_buffer.h
@@ -39,7 +39,7 @@ struct inv_icm42600_fifo {
size_t accel;
size_t total;
} nb;
- uint8_t data[2080] ____cacheline_aligned;
+ uint8_t data[2080] __aligned(IIO_DMA_MINALIGN);
};

/* FIFO data packet */
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
index c6aa36ee966a..32b58b797d57 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
@@ -203,7 +203,7 @@ struct inv_mpu6050_state {
s32 magn_raw_to_gauss[3];
struct iio_mount_matrix magn_orient;
unsigned int suspended_sensors;
- u8 data[INV_MPU6050_OUTPUT_DATA_SIZE] ____cacheline_aligned;
+ u8 data[INV_MPU6050_OUTPUT_DATA_SIZE] __aligned(IIO_DMA_MINALIGN);
};

/*register and associated bit definition*/
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index b078eb2f3c9d..01369973329a 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -915,7 +915,7 @@ static int iio_verify_update(struct iio_dev *indio_dev,
if (scan_mask == NULL)
return -EINVAL;
} else {
- scan_mask = compound_mask;
+ scan_mask = compound_mask;
}

config->scan_bytes = iio_compute_scan_bytes(indio_dev,
@@ -1649,7 +1649,7 @@ static int __iio_buffer_alloc_sysfs_and_mask(struct iio_buffer *buffer,
}

attrn = buffer_attrcount + scan_el_attrcount + ARRAY_SIZE(iio_buffer_attrs);
- attr = kcalloc(attrn + 1, sizeof(* attr), GFP_KERNEL);
+ attr = kcalloc(attrn + 1, sizeof(*attr), GFP_KERNEL);
if (!attr) {
ret = -ENOMEM;
goto error_free_scan_mask;
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index e1ed44dec2ab..e5aed96de8f3 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -821,7 +821,23 @@ static ssize_t iio_format_avail_list(char *buf, const int *vals,

static ssize_t iio_format_avail_range(char *buf, const int *vals, int type)
{
- return iio_format_list(buf, vals, type, 3, "[", "]");
+ int length;
+
+ /*
+ * length refers to the array size, not the number of elements.
+ * The purpose is to print the range [min, step, max] so length should
+ * be 3 in case of int, and 6 for other types.
+ */
+ switch (type) {
+ case IIO_VAL_INT:
+ length = 3;
+ break;
+ default:
+ length = 6;
+ break;
+ }
+
+ return iio_format_list(buf, vals, type, length, "[", "]");
}

static ssize_t iio_read_channel_info_avail(struct device *dev,
@@ -892,8 +908,7 @@ static int __iio_str_to_fixpoint(const char *str, int fract_mult,
} else if (*str == '\n') {
if (*(str + 1) == '\0')
break;
- else
- return -EINVAL;
+ return -EINVAL;
} else if (!strncmp(str, " dB", sizeof(" dB") - 1) && scale_db) {
/* Ignore the dB suffix */
str += sizeof(" dB") - 1;
@@ -1640,7 +1655,7 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)

alloc_size = sizeof(struct iio_dev_opaque);
if (sizeof_priv) {
- alloc_size = ALIGN(alloc_size, IIO_ALIGN);
+ alloc_size = ALIGN(alloc_size, IIO_DMA_MINALIGN);
alloc_size += sizeof_priv;
}

@@ -1650,7 +1665,7 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)

indio_dev = &iio_dev_opaque->indio_dev;
indio_dev->priv = (char *)iio_dev_opaque +
- ALIGN(sizeof(struct iio_dev_opaque), IIO_ALIGN);
+ ALIGN(sizeof(struct iio_dev_opaque), IIO_DMA_MINALIGN);

indio_dev->dev.parent = parent;
indio_dev->dev.type = &iio_device_type;
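
The iio_format_avail_range() change near the top of this file sizes the printed list by value type: fractional types such as IIO_VAL_INT_PLUS_MICRO store each of min/step/max as a pair of integers, so the array holds six entries rather than the three a plain IIO_VAL_INT range needs. A hedged sketch of a read_avail() callback returning such a range, with my_read_avail() and my_scale_range as illustrative names rather than code from this patch:

#include <linux/iio/iio.h>

static const int my_scale_range[] = {
	0, 0,		/* min  = 0.000000 */
	0, 1000,	/* step = 0.001000 */
	2, 0,		/* max  = 2.000000 */
};

static int my_read_avail(struct iio_dev *indio_dev,
			 struct iio_chan_spec const *chan,
			 const int **vals, int *type, int *length,
			 long mask)
{
	*vals = my_scale_range;
	*type = IIO_VAL_INT_PLUS_MICRO;
	*length = ARRAY_SIZE(my_scale_range);
	return IIO_AVAIL_RANGE;
}

With the fix, min, step and max are all formatted for sysfs for such fractional types.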
diff --git a/drivers/iio/light/cros_ec_light_prox.c b/drivers/iio/light/cros_ec_light_prox.c
index de472f23d1cb..16b893bae388 100644
--- a/drivers/iio/light/cros_ec_light_prox.c
+++ b/drivers/iio/light/cros_ec_light_prox.c
@@ -181,8 +181,7 @@ static int cros_ec_light_prox_probe(struct platform_device *pdev)
return -ENOMEM;

ret = cros_ec_sensors_core_init(pdev, indio_dev, true,
- cros_ec_sensors_capture,
- cros_ec_sensors_push_data);
+ cros_ec_sensors_capture);
if (ret)
return ret;

@@ -240,7 +239,8 @@ static int cros_ec_light_prox_probe(struct platform_device *pdev)

state->core.read_ec_sensors_data = cros_ec_sensors_read_cmd;

- return devm_iio_device_register(dev, indio_dev);
+ return cros_ec_sensors_core_register(dev, indio_dev,
+ cros_ec_sensors_push_data);
}

static const struct platform_device_id cros_ec_light_prox_ids[] = {
diff --git a/drivers/iio/light/isl29028.c b/drivers/iio/light/isl29028.c
index 9de3262aa688..a62787f5d5e7 100644
--- a/drivers/iio/light/isl29028.c
+++ b/drivers/iio/light/isl29028.c
@@ -625,7 +625,7 @@ static int isl29028_probe(struct i2c_client *client,
ISL29028_POWER_OFF_DELAY_MS);
pm_runtime_use_autosuspend(&client->dev);

- ret = devm_iio_device_register(indio_dev->dev.parent, indio_dev);
+ ret = iio_device_register(indio_dev);
if (ret < 0) {
dev_err(&client->dev,
"%s(): iio registration failed with error %d\n",
diff --git a/drivers/iio/potentiometer/ad5110.c b/drivers/iio/potentiometer/ad5110.c
index d4eeedae56e5..8fbcce482989 100644
--- a/drivers/iio/potentiometer/ad5110.c
+++ b/drivers/iio/potentiometer/ad5110.c
@@ -63,10 +63,10 @@ struct ad5110_data {
struct mutex lock;
const struct ad5110_cfg *cfg;
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
*/
- u8 buf[2] ____cacheline_aligned;
+ u8 buf[2] __aligned(IIO_DMA_MINALIGN);
};

static const struct iio_chan_spec ad5110_channels[] = {
diff --git a/drivers/iio/potentiometer/ad5272.c b/drivers/iio/potentiometer/ad5272.c
index d8cbd170262f..ed5fc0b50fe9 100644
--- a/drivers/iio/potentiometer/ad5272.c
+++ b/drivers/iio/potentiometer/ad5272.c
@@ -50,7 +50,7 @@ struct ad5272_data {
struct i2c_client *client;
struct mutex lock;
const struct ad5272_cfg *cfg;
- u8 buf[2] ____cacheline_aligned;
+ u8 buf[2] __aligned(IIO_DMA_MINALIGN);
};

static const struct iio_chan_spec ad5272_channel = {
diff --git a/drivers/iio/potentiometer/max5481.c b/drivers/iio/potentiometer/max5481.c
index 098d144a8fdd..b40e5ac218d7 100644
--- a/drivers/iio/potentiometer/max5481.c
+++ b/drivers/iio/potentiometer/max5481.c
@@ -44,7 +44,7 @@ static const struct max5481_cfg max5481_cfg[] = {
struct max5481_data {
struct spi_device *spi;
const struct max5481_cfg *cfg;
- u8 msg[3] ____cacheline_aligned;
+ u8 msg[3] __aligned(IIO_DMA_MINALIGN);
};

#define MAX5481_CHANNEL { \
diff --git a/drivers/iio/potentiometer/mcp41010.c b/drivers/iio/potentiometer/mcp41010.c
index 30a4594d4e11..2b73c7540209 100644
--- a/drivers/iio/potentiometer/mcp41010.c
+++ b/drivers/iio/potentiometer/mcp41010.c
@@ -60,7 +60,7 @@ struct mcp41010_data {
const struct mcp41010_cfg *cfg;
struct mutex lock; /* Protect write sequences */
unsigned int value[MCP41010_MAX_WIPERS]; /* Cache wiper values */
- u8 buf[2] ____cacheline_aligned;
+ u8 buf[2] __aligned(IIO_DMA_MINALIGN);
};

#define MCP41010_CHANNEL(ch) { \
diff --git a/drivers/iio/potentiometer/mcp4131.c b/drivers/iio/potentiometer/mcp4131.c
index 7c8c18ab8764..7890c0993ec4 100644
--- a/drivers/iio/potentiometer/mcp4131.c
+++ b/drivers/iio/potentiometer/mcp4131.c
@@ -129,7 +129,7 @@ struct mcp4131_data {
struct spi_device *spi;
const struct mcp4131_cfg *cfg;
struct mutex lock;
- u8 buf[2] ____cacheline_aligned;
+ u8 buf[2] __aligned(IIO_DMA_MINALIGN);
};

#define MCP4131_CHANNEL(ch) { \
diff --git a/drivers/iio/pressure/cros_ec_baro.c b/drivers/iio/pressure/cros_ec_baro.c
index 2f882e109423..0511edbf868d 100644
--- a/drivers/iio/pressure/cros_ec_baro.c
+++ b/drivers/iio/pressure/cros_ec_baro.c
@@ -138,8 +138,7 @@ static int cros_ec_baro_probe(struct platform_device *pdev)
return -ENOMEM;

ret = cros_ec_sensors_core_init(pdev, indio_dev, true,
- cros_ec_sensors_capture,
- cros_ec_sensors_push_data);
+ cros_ec_sensors_capture);
if (ret)
return ret;

@@ -186,7 +185,8 @@ static int cros_ec_baro_probe(struct platform_device *pdev)

state->core.read_ec_sensors_data = cros_ec_sensors_read_cmd;

- return devm_iio_device_register(dev, indio_dev);
+ return cros_ec_sensors_core_register(dev, indio_dev,
+ cros_ec_sensors_push_data);
}

static const struct platform_device_id cros_ec_baro_ids[] = {
diff --git a/drivers/iio/proximity/as3935.c b/drivers/iio/proximity/as3935.c
index 67891ce2bd09..ebc95cf8f5f4 100644
--- a/drivers/iio/proximity/as3935.c
+++ b/drivers/iio/proximity/as3935.c
@@ -65,7 +65,7 @@ struct as3935_state {
u8 chan;
s64 timestamp __aligned(8);
} scan;
- u8 buf[2] ____cacheline_aligned;
+ u8 buf[2] __aligned(IIO_DMA_MINALIGN);
};

static const struct iio_chan_spec as3935_channels[] = {
diff --git a/drivers/iio/proximity/sx9324.c b/drivers/iio/proximity/sx9324.c
index 63fbcaa4cac8..a30ac8007a3d 100644
--- a/drivers/iio/proximity/sx9324.c
+++ b/drivers/iio/proximity/sx9324.c
@@ -93,7 +93,7 @@
#define SX9324_REG_PROX_CTRL4_AVGNEGFILT_MASK GENMASK(5, 3)
#define SX9324_REG_PROX_CTRL4_AVGNEG_FILT_2 0x08
#define SX9324_REG_PROX_CTRL4_AVGPOSFILT_MASK GENMASK(2, 0)
-#define SX9324_REG_PROX_CTRL3_AVGPOS_FILT_256 0x04
+#define SX9324_REG_PROX_CTRL4_AVGPOS_FILT_256 0x04
#define SX9324_REG_PROX_CTRL5 0x35
#define SX9324_REG_PROX_CTRL5_HYST_MASK GENMASK(5, 4)
#define SX9324_REG_PROX_CTRL5_CLOSE_DEBOUNCE_MASK GENMASK(3, 2)
@@ -810,7 +810,7 @@ static const struct sx_common_reg_default sx9324_default_regs[] = {
{ SX9324_REG_PROX_CTRL3, SX9324_REG_PROX_CTRL3_AVGDEB_2SAMPLES |
SX9324_REG_PROX_CTRL3_AVGPOS_THRESH_16K },
{ SX9324_REG_PROX_CTRL4, SX9324_REG_PROX_CTRL4_AVGNEG_FILT_2 |
- SX9324_REG_PROX_CTRL3_AVGPOS_FILT_256 },
+ SX9324_REG_PROX_CTRL4_AVGPOS_FILT_256 },
{ SX9324_REG_PROX_CTRL5, 0x00 },
{ SX9324_REG_PROX_CTRL6, SX9324_REG_PROX_CTRL6_PROXTHRESH_32 },
{ SX9324_REG_PROX_CTRL7, SX9324_REG_PROX_CTRL6_PROXTHRESH_32 },
diff --git a/drivers/iio/resolver/ad2s1200.c b/drivers/iio/resolver/ad2s1200.c
index 9746bd935628..9d95241bdf8f 100644
--- a/drivers/iio/resolver/ad2s1200.c
+++ b/drivers/iio/resolver/ad2s1200.c
@@ -41,7 +41,7 @@ struct ad2s1200_state {
struct spi_device *sdev;
struct gpio_desc *sample;
struct gpio_desc *rdvel;
- __be16 rx ____cacheline_aligned;
+ __be16 rx __aligned(IIO_DMA_MINALIGN);
};

static int ad2s1200_read_raw(struct iio_dev *indio_dev,
diff --git a/drivers/iio/resolver/ad2s90.c b/drivers/iio/resolver/ad2s90.c
index d6a91f137e13..be6836e55376 100644
--- a/drivers/iio/resolver/ad2s90.c
+++ b/drivers/iio/resolver/ad2s90.c
@@ -24,7 +24,7 @@
struct ad2s90_state {
struct mutex lock; /* lock to protect rx buffer */
struct spi_device *sdev;
- u8 rx[2] ____cacheline_aligned;
+ u8 rx[2] __aligned(IIO_DMA_MINALIGN);
};

static int ad2s90_read_raw(struct iio_dev *indio_dev,
diff --git a/drivers/iio/temperature/ltc2983.c b/drivers/iio/temperature/ltc2983.c
index 301c3f13fb26..1b8252d86889 100644
--- a/drivers/iio/temperature/ltc2983.c
+++ b/drivers/iio/temperature/ltc2983.c
@@ -200,11 +200,11 @@ struct ltc2983_data {
u8 num_channels;
u8 iio_channels;
/*
- * DMA (thus cache coherency maintenance) requires the
+ * DMA (thus cache coherency maintenance) may require the
* transfer buffers to live in their own cache lines.
* Holds the converted temperature
*/
- __be32 temp ____cacheline_aligned;
+ __be32 temp __aligned(IIO_DMA_MINALIGN);
};

struct ltc2983_sensor {
diff --git a/drivers/iio/temperature/max31865.c b/drivers/iio/temperature/max31865.c
index 86c3f3509a26..f0079e51d8d2 100644
--- a/drivers/iio/temperature/max31865.c
+++ b/drivers/iio/temperature/max31865.c
@@ -53,7 +53,7 @@ struct max31865_data {
struct mutex lock;
bool filter_50hz;
bool three_wire;
- u8 buf[2] ____cacheline_aligned;
+ u8 buf[2] __aligned(IIO_DMA_MINALIGN);
};

static int max31865_read(struct max31865_data *data, u8 reg,
diff --git a/drivers/iio/temperature/maxim_thermocouple.c b/drivers/iio/temperature/maxim_thermocouple.c
index 98c41cddc6f0..c28a7a6dea5f 100644
--- a/drivers/iio/temperature/maxim_thermocouple.c
+++ b/drivers/iio/temperature/maxim_thermocouple.c
@@ -122,7 +122,7 @@ struct maxim_thermocouple_data {
struct spi_device *spi;
const struct maxim_thermocouple_chip *chip;

- u8 buffer[16] ____cacheline_aligned;
+ u8 buffer[16] __aligned(IIO_DMA_MINALIGN);
char tc_type;
};

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 3ebdd42fec36..686d170a5947 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1179,8 +1179,10 @@ static int setup_base_ctxt(struct hfi1_filedata *fd,
goto done;

ret = init_user_ctxt(fd, uctxt);
- if (ret)
+ if (ret) {
+ hfi1_free_ctxt_rcv_groups(uctxt);
goto done;
+ }

user_init(uctxt);

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 86f6a4aae1e5..6e7dadc3386c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -6125,8 +6125,8 @@ static irqreturn_t hns_roce_v2_msix_interrupt_abn(int irq, void *dev_id)

dev_err(dev, "AEQ overflow!\n");

- int_st |= 1 << HNS_ROCE_V2_VF_INT_ST_AEQ_OVERFLOW_S;
- roce_write(hr_dev, ROCEE_VF_ABN_INT_ST_REG, int_st);
+ roce_write(hr_dev, ROCEE_VF_ABN_INT_ST_REG,
+ 1 << HNS_ROCE_V2_VF_INT_ST_AEQ_OVERFLOW_S);

/* Set reset level for reset_event() */
if (ops->set_default_reset_request)
diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
index 646fa8677490..7b086fe63a24 100644
--- a/drivers/infiniband/hw/irdma/cm.c
+++ b/drivers/infiniband/hw/irdma/cm.c
@@ -1477,12 +1477,13 @@ irdma_find_listener(struct irdma_cm_core *cm_core, u32 *dst_addr, u16 dst_port,
list_for_each_entry (listen_node, &cm_core->listen_list, list) {
memcpy(listen_addr, listen_node->loc_addr, sizeof(listen_addr));
listen_port = listen_node->loc_port;
+ if (listen_port != dst_port ||
+ !(listener_state & listen_node->listener_state))
+ continue;
/* compare node pair, return node handle if a match */
- if ((!memcmp(listen_addr, dst_addr, sizeof(listen_addr)) ||
- !memcmp(listen_addr, ip_zero, sizeof(listen_addr))) &&
- listen_port == dst_port &&
- vlan_id == listen_node->vlan_id &&
- (listener_state & listen_node->listener_state)) {
+ if (!memcmp(listen_addr, ip_zero, sizeof(listen_addr)) ||
+ (!memcmp(listen_addr, dst_addr, sizeof(listen_addr)) &&
+ vlan_id == listen_node->vlan_id)) {
refcount_inc(&listen_node->refcnt);
spin_unlock_irqrestore(&cm_core->listen_list_lock,
flags);
diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
index 3dc9b5801da1..c1b402a06e17 100644
--- a/drivers/infiniband/hw/irdma/hw.c
+++ b/drivers/infiniband/hw/irdma/hw.c
@@ -257,10 +257,6 @@ static void irdma_process_aeq(struct irdma_pci_f *rf)
iwqp->last_aeq = info->ae_id;
spin_unlock_irqrestore(&iwqp->lock, flags);
ctx_info = &iwqp->ctx_info;
- if (rdma_protocol_roce(&iwqp->iwdev->ibdev, 1))
- ctx_info->roce_info->err_rq_idx_valid = true;
- else
- ctx_info->iwarp_info->err_rq_idx_valid = true;
} else {
if (info->ae_id != IRDMA_AE_CQ_OPERATION_ERROR)
continue;
@@ -370,16 +366,12 @@ static void irdma_process_aeq(struct irdma_pci_f *rf)
case IRDMA_AE_LCE_FUNCTION_CATASTROPHIC:
case IRDMA_AE_LCE_CQ_CATASTROPHIC:
case IRDMA_AE_UDA_XMIT_DGRAM_TOO_LONG:
- if (rdma_protocol_roce(&iwdev->ibdev, 1))
- ctx_info->roce_info->err_rq_idx_valid = false;
- else
- ctx_info->iwarp_info->err_rq_idx_valid = false;
- fallthrough;
default:
ibdev_err(&iwdev->ibdev, "abnormal ae_id = 0x%x bool qp=%d qp_id = %d\n",
info->ae_id, info->qp, info->qp_cq_id);
if (rdma_protocol_roce(&iwdev->ibdev, 1)) {
- if (!info->sq && ctx_info->roce_info->err_rq_idx_valid) {
+ ctx_info->roce_info->err_rq_idx_valid = info->rq;
+ if (info->rq) {
ctx_info->roce_info->err_rq_idx = info->wqe_idx;
irdma_sc_qp_setctx_roce(&iwqp->sc_qp, iwqp->host_ctx.va,
ctx_info);
@@ -388,7 +380,8 @@ static void irdma_process_aeq(struct irdma_pci_f *rf)
irdma_cm_disconn(iwqp);
break;
}
- if (!info->sq && ctx_info->iwarp_info->err_rq_idx_valid) {
+ ctx_info->iwarp_info->err_rq_idx_valid = info->rq;
+ if (info->rq) {
ctx_info->iwarp_info->err_rq_idx = info->wqe_idx;
ctx_info->tcp_info_valid = false;
ctx_info->iwarp_info_valid = true;
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index 6daa149dcbda..b29631f6659a 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -1760,11 +1760,11 @@ static int irdma_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata)
spin_unlock_irqrestore(&iwcq->lock, flags);

irdma_cq_wq_destroy(iwdev->rf, cq);
- irdma_cq_free_rsrc(iwdev->rf, iwcq);

spin_lock_irqsave(&iwceq->ce_lock, flags);
irdma_sc_cleanup_ceqes(cq, ceq);
spin_unlock_irqrestore(&iwceq->ce_lock, flags);
+ irdma_cq_free_rsrc(iwdev->rf, iwcq);

return 0;
}
diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c
index 661ed2b44508..6092118de672 100644
--- a/drivers/infiniband/hw/mlx5/fs.c
+++ b/drivers/infiniband/hw/mlx5/fs.c
@@ -2265,12 +2265,10 @@ static int mlx5_ib_matcher_ns(struct uverbs_attr_bundle *attrs,
if (err)
return err;

- if (flags) {
- mlx5_ib_ft_type_to_namespace(
+ if (flags)
+ return mlx5_ib_ft_type_to_namespace(
MLX5_IB_UAPI_FLOW_TABLE_TYPE_NIC_TX,
&obj->ns_type);
- return 0;
- }
}

obj->ns_type = MLX5_FLOW_NAMESPACE_BYPASS;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index df4d7970c1ad..cc99b293f0be 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -3083,7 +3083,7 @@ static struct qedr_mr *__qedr_alloc_mr(struct ib_pd *ibpd,
else
DP_ERR(dev, "roce alloc tid returned error %d\n", rc);

- goto err0;
+ goto err1;
}

/* Index only, 18 bit long, lkey = itid << 8 | key */
@@ -3107,7 +3107,7 @@ static struct qedr_mr *__qedr_alloc_mr(struct ib_pd *ibpd,
rc = dev->ops->rdma_register_tid(dev->rdma_ctx, &mr->hw_mr);
if (rc) {
DP_ERR(dev, "roce register tid returned an error %d\n", rc);
- goto err1;
+ goto err2;
}

mr->ibmr.lkey = mr->hw_mr.itid << 8 | mr->hw_mr.key;
@@ -3116,8 +3116,10 @@ static struct qedr_mr *__qedr_alloc_mr(struct ib_pd *ibpd,
DP_DEBUG(dev, QEDR_MSG_MR, "alloc frmr: %x\n", mr->ibmr.lkey);
return mr;

-err1:
+err2:
dev->ops->rdma_free_tid(dev->rdma_ctx, mr->hw_mr.itid);
+err1:
+ qedr_free_pbl(dev, &mr->info.pbl_info, mr->info.pbl_table);
err0:
kfree(mr);
return ERR_PTR(rc);
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 138b3e7d3a5f..ec671e171f13 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -114,6 +114,8 @@ void retransmit_timer(struct timer_list *t)
{
struct rxe_qp *qp = from_timer(qp, t, retrans_timer);

+ pr_debug("%s: fired for qp#%d\n", __func__, qp->elem.index);
+
if (qp->valid) {
qp->comp.timeout = 1;
rxe_run_task(&qp->comp.task, 1);
@@ -729,11 +731,15 @@ int rxe_completer(void *arg)
break;

case COMPST_RNR_RETRY:
+ /* we come here if we received an RNR NAK */
if (qp->comp.rnr_retry > 0) {
if (qp->comp.rnr_retry != 7)
qp->comp.rnr_retry--;

- qp->req.need_retry = 1;
+ /* don't start a retry flow until the
+ * rnr timer has fired
+ */
+ qp->req.wait_for_rnr_timer = 1;
pr_debug("qp#%d set rnr nak timer\n",
qp_num(qp));
mod_timer(&qp->rnr_nak_timer,
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 2ffbe3390668..2b0edf360474 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -77,7 +77,7 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
enum rxe_mr_lookup_type type);
int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length);
int advance_dma_data(struct rxe_dma_info *dma, unsigned int length);
-int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey);
+int rxe_invalidate_mr(struct rxe_qp *qp, u32 key);
int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr);
int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 60a31b718774..76d6498e83f9 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -576,22 +576,22 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
return mr;
}

-int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey)
+int rxe_invalidate_mr(struct rxe_qp *qp, u32 key)
{
struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
struct rxe_mr *mr;
int ret;

- mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
+ mr = rxe_pool_get_index(&rxe->mr_pool, key >> 8);
if (!mr) {
- pr_err("%s: No MR for rkey %#x\n", __func__, rkey);
+ pr_err("%s: No MR for key %#x\n", __func__, key);
ret = -EINVAL;
goto err;
}

- if (rkey != mr->rkey) {
- pr_err("%s: rkey (%#x) doesn't match mr->rkey (%#x)\n",
- __func__, rkey, mr->rkey);
+ if (mr->rkey ? (key != mr->rkey) : (key != mr->lkey)) {
+ pr_err("%s: wr key (%#x) doesn't match mr key (%#x)\n",
+ __func__, key, (mr->rkey ? mr->rkey : mr->lkey));
ret = -EINVAL;
goto err_drop_ref;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index c86b2efd58f2..fc1416fc7feb 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -69,8 +69,6 @@ int rxe_dealloc_mw(struct ib_mw *ibmw)
static int rxe_check_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
struct rxe_mw *mw, struct rxe_mr *mr)
{
- u32 key = wqe->wr.wr.mw.rkey & 0xff;
-
if (mw->ibmw.type == IB_MW_TYPE_1) {
if (unlikely(mw->state != RXE_MW_STATE_VALID)) {
pr_err_once(
@@ -108,11 +106,6 @@ static int rxe_check_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
}
}

- if (unlikely(key == (mw->rkey & 0xff))) {
- pr_err_once("attempt to bind MW with same key\n");
- return -EINVAL;
- }
-
/* remaining checks only apply to a nonzero MR */
if (!mr)
return 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 87066d04ed18..69db28944567 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -140,7 +140,7 @@ void *rxe_alloc(struct rxe_pool *pool)

err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
&pool->next, GFP_KERNEL);
- if (err)
+ if (err < 0)
goto err_free;

return obj;
@@ -168,7 +168,7 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)

err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
&pool->next, GFP_KERNEL);
- if (err)
+ if (err < 0)
goto err_cnt;

return 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 62acf890af6c..4889dcae0cc8 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -186,6 +186,14 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,

spin_lock_init(&qp->state_lock);

+ spin_lock_init(&qp->req.task.state_lock);
+ spin_lock_init(&qp->resp.task.state_lock);
+ spin_lock_init(&qp->comp.task.state_lock);
+
+ spin_lock_init(&qp->sq.sq_lock);
+ spin_lock_init(&qp->rq.producer_lock);
+ spin_lock_init(&qp->rq.consumer_lock);
+
atomic_set(&qp->ssn, 0);
atomic_set(&qp->skb_out, 0);
}
@@ -245,7 +253,6 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
qp->req.opcode = -1;
qp->comp.opcode = -1;

- spin_lock_init(&qp->sq.sq_lock);
skb_queue_head_init(&qp->req_pkts);

rxe_init_task(rxe, &qp->req.task, qp,
@@ -296,9 +303,6 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
}
}

- spin_lock_init(&qp->rq.producer_lock);
- spin_lock_init(&qp->rq.consumer_lock);
-
skb_queue_head_init(&qp->resp_pkts);

rxe_init_task(rxe, &qp->resp.task, qp,
@@ -513,6 +517,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
atomic_set(&qp->ssn, 0);
qp->req.opcode = -1;
qp->req.need_retry = 0;
+ qp->req.wait_for_rnr_timer = 0;
qp->req.noack_pkts = 0;
qp->resp.msn = 0;
qp->resp.opcode = -1;
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 8a1cff80a68e..90669b3c56af 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -103,7 +103,11 @@ void rnr_nak_timer(struct timer_list *t)
{
struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);

- pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
+ pr_debug("%s: fired for qp#%d\n", __func__, qp_num(qp));
+
+ /* request a send queue retry */
+ qp->req.need_retry = 1;
+ qp->req.wait_for_rnr_timer = 0;
rxe_run_task(&qp->req.task, 1);
}

@@ -586,9 +590,11 @@ static int rxe_do_local_ops(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
wqe->status = IB_WC_SUCCESS;
qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);

- if ((wqe->wr.send_flags & IB_SEND_SIGNALED) ||
- qp->sq_sig_type == IB_SIGNAL_ALL_WR)
- rxe_run_task(&qp->comp.task, 1);
+ /* There is no ack coming for local work requests,
+ * which can lead to a deadlock. So go ahead and complete
+ * it now.
+ */
+ rxe_run_task(&qp->comp.task, 1);

return 0;
}
@@ -624,10 +630,17 @@ int rxe_requester(void *arg)
qp->req.need_rd_atomic = 0;
qp->req.wait_psn = 0;
qp->req.need_retry = 0;
+ qp->req.wait_for_rnr_timer = 0;
goto exit;
}

- if (unlikely(qp->req.need_retry)) {
+ /* we come here if the retransmit timer has fired
+ * or if the rnr timer has fired. If the retransmit
+ * timer fires while we are processing an RNR NAK, wait
+ * until the rnr timer has fired before starting the
+ * retry flow
+ */
+ if (unlikely(qp->req.need_retry && !qp->req.wait_for_rnr_timer)) {
req_retry(qp);
qp->req.need_retry = 0;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index e7eff1ca75e9..33e8d0547553 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -123,6 +123,7 @@ struct rxe_req_info {
int need_rd_atomic;
int wait_psn;
int need_retry;
+ int wait_for_rnr_timer;
int noack_pkts;
struct rxe_task task;
};
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index 17f34d584cd9..f88d2971c2c6 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep)
enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR;

rv = siw_recv_mpa_rr(cep);
- if (rv != -EAGAIN)
- siw_cancel_mpatimer(cep);
if (rv)
goto out_err;

+ siw_cancel_mpatimer(cep);
+
rep = &cep->mpa.hdr;

if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) {
@@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep)
}

out_err:
- siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
+ if (rv != -EAGAIN)
+ siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);

return rv;
}
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index f8d0bab4424c..e36036b8f386 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -568,7 +568,7 @@ static void iscsi_iser_session_destroy(struct iscsi_cls_session *cls_session)
struct Scsi_Host *shost = iscsi_session_to_shost(cls_session);

iscsi_session_teardown(cls_session);
- iscsi_host_remove(shost);
+ iscsi_host_remove(shost, false);
iscsi_host_free(shost);
}

@@ -685,7 +685,7 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
return cls_session;

remove_host:
- iscsi_host_remove(shost);
+ iscsi_host_remove(shost, false);
free_host:
iscsi_host_free(shost);
return NULL;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index c2c860d0c56e..65bad1dd917b 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -740,25 +740,25 @@ struct path_it {
struct rtrs_clt_path *(*next_path)(struct path_it *it);
};

-/**
- * list_next_or_null_rr_rcu - get next list element in round-robin fashion.
+/*
+ * rtrs_clt_get_next_path_or_null - get clt path from the list or return NULL
* @head: the head for the list.
- * @ptr: the list head to take the next element from.
- * @type: the type of the struct this is embedded in.
- * @memb: the name of the list_head within the struct.
+ * @clt_path: The element to take the next clt_path from.
*
- * Next element returned in round-robin fashion, i.e. head will be skipped,
+ * Next clt path returned in round-robin fashion, i.e. head will be skipped,
* but if list is observed as empty, NULL will be returned.
*
- * This primitive may safely run concurrently with the _rcu list-mutation
+ * This function may safely run concurrently with the _rcu list-mutation
* primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
*/
-#define list_next_or_null_rr_rcu(head, ptr, type, memb) \
-({ \
- list_next_or_null_rcu(head, ptr, type, memb) ?: \
- list_next_or_null_rcu(head, READ_ONCE((ptr)->next), \
- type, memb); \
-})
+static inline struct rtrs_clt_path *
+rtrs_clt_get_next_path_or_null(struct list_head *head, struct rtrs_clt_path *clt_path)
+{
+ return list_next_or_null_rcu(head, &clt_path->s.entry, typeof(*clt_path), s.entry) ?:
+ list_next_or_null_rcu(head,
+ READ_ONCE((&clt_path->s.entry)->next),
+ typeof(*clt_path), s.entry);
+}

/**
* get_next_path_rr() - Returns path in round-robin fashion.
@@ -789,10 +789,8 @@ static struct rtrs_clt_path *get_next_path_rr(struct path_it *it)
path = list_first_or_null_rcu(&clt->paths_list,
typeof(*path), s.entry);
else
- path = list_next_or_null_rr_rcu(&clt->paths_list,
- &path->s.entry,
- typeof(*path),
- s.entry);
+ path = rtrs_clt_get_next_path_or_null(&clt->paths_list, path);
+
rcu_assign_pointer(*ppcpu_path, path);

return path;
@@ -2277,8 +2275,7 @@ static void rtrs_clt_remove_path_from_arr(struct rtrs_clt_path *clt_path)
* removed. If @sess is the last element, then @next is NULL.
*/
rcu_read_lock();
- next = list_next_or_null_rr_rcu(&clt->paths_list, &clt_path->s.entry,
- typeof(*next), s.entry);
+ next = rtrs_clt_get_next_path_or_null(&clt->paths_list, clt_path);
rcu_read_unlock();

/*
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-pri.h b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
index 9a1e5c2ae55c..ac0df734eba8 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-pri.h
+++ b/drivers/infiniband/ulp/rtrs/rtrs-pri.h
@@ -23,6 +23,17 @@
#define RTRS_PROTO_VER_STRING __stringify(RTRS_PROTO_VER_MAJOR) "." \
__stringify(RTRS_PROTO_VER_MINOR)

+/*
+ * Max IB immediate data size is 2^28 (MAX_IMM_PAYL_BITS)
+ * and the minimum chunk size is 4096 (2^12).
+ * So the maximum sess_queue_depth is 65536 (2^16) in theory.
+ * But mempool_create, create_qp and ib_post_send fail with
+ * "cannot allocate memory" error if sess_queue_depth is too big.
+ * Therefore the practical max value of sess_queue_depth is
+ * somewhere between 1 and 65534 and it depends on the system.
+ */
+#define MAX_SESS_QUEUE_DEPTH 65535
+
enum rtrs_imm_const {
MAX_IMM_TYPE_BITS = 4,
MAX_IMM_TYPE_MASK = ((1 << MAX_IMM_TYPE_BITS) - 1),
@@ -46,16 +57,6 @@ enum {

MAX_PATHS_NUM = 128,

- /*
- * Max IB immediate data size is 2^28 (MAX_IMM_PAYL_BITS)
- * and the minimum chunk size is 4096 (2^12).
- * So the maximum sess_queue_depth is 65536 (2^16) in theory.
- * But mempool_create, create_qp and ib_post_send fail with
- * "cannot allocate memory" error if sess_queue_depth is too big.
- * Therefore the pratical max value of sess_queue_depth is
- * somewhere between 1 and 65534 and it depends on the system.
- */
- MAX_SESS_QUEUE_DEPTH = 65535,
MIN_CHUNK_SIZE = 8192,

RTRS_HB_INTERVAL_MS = 5000,
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index f86ee1c4b970..c3036aeac89e 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -565,12 +565,9 @@ static int srpt_refresh_port(struct srpt_port *sport)
if (ret)
return ret;

- sport->port_guid_id.wwn.priv = sport;
- srpt_format_guid(sport->port_guid_id.name,
- sizeof(sport->port_guid_id.name),
+ srpt_format_guid(sport->guid_name, ARRAY_SIZE(sport->guid_name),
&sport->gid.global.interface_id);
- sport->port_gid_id.wwn.priv = sport;
- snprintf(sport->port_gid_id.name, sizeof(sport->port_gid_id.name),
+ snprintf(sport->gid_name, ARRAY_SIZE(sport->gid_name),
"0x%016llx%016llx",
be64_to_cpu(sport->gid.global.subnet_prefix),
be64_to_cpu(sport->gid.global.interface_id));
@@ -2314,31 +2311,35 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
tag_num = ch->rq_size;
tag_size = 1; /* ib_srpt does not use se_sess->sess_cmd_map */

- mutex_lock(&sport->port_guid_id.mutex);
- list_for_each_entry(stpg, &sport->port_guid_id.tpg_list, entry) {
- if (!IS_ERR_OR_NULL(ch->sess))
- break;
- ch->sess = target_setup_session(&stpg->tpg, tag_num,
+ if (sport->guid_id) {
+ mutex_lock(&sport->guid_id->mutex);
+ list_for_each_entry(stpg, &sport->guid_id->tpg_list, entry) {
+ if (!IS_ERR_OR_NULL(ch->sess))
+ break;
+ ch->sess = target_setup_session(&stpg->tpg, tag_num,
tag_size, TARGET_PROT_NORMAL,
ch->sess_name, ch, NULL);
+ }
+ mutex_unlock(&sport->guid_id->mutex);
}
- mutex_unlock(&sport->port_guid_id.mutex);

- mutex_lock(&sport->port_gid_id.mutex);
- list_for_each_entry(stpg, &sport->port_gid_id.tpg_list, entry) {
- if (!IS_ERR_OR_NULL(ch->sess))
- break;
- ch->sess = target_setup_session(&stpg->tpg, tag_num,
+ if (sport->gid_id) {
+ mutex_lock(&sport->gid_id->mutex);
+ list_for_each_entry(stpg, &sport->gid_id->tpg_list, entry) {
+ if (!IS_ERR_OR_NULL(ch->sess))
+ break;
+ ch->sess = target_setup_session(&stpg->tpg, tag_num,
tag_size, TARGET_PROT_NORMAL, i_port_id,
ch, NULL);
- if (!IS_ERR_OR_NULL(ch->sess))
- break;
- /* Retry without leading "0x" */
- ch->sess = target_setup_session(&stpg->tpg, tag_num,
+ if (!IS_ERR_OR_NULL(ch->sess))
+ break;
+ /* Retry without leading "0x" */
+ ch->sess = target_setup_session(&stpg->tpg, tag_num,
tag_size, TARGET_PROT_NORMAL,
i_port_id + 2, ch, NULL);
+ }
+ mutex_unlock(&sport->gid_id->mutex);
}
- mutex_unlock(&sport->port_gid_id.mutex);

if (IS_ERR_OR_NULL(ch->sess)) {
WARN_ON_ONCE(ch->sess == NULL);
@@ -2983,7 +2984,12 @@ static int srpt_release_sport(struct srpt_port *sport)
return 0;
}

-static struct se_wwn *__srpt_lookup_wwn(const char *name)
+struct port_and_port_id {
+ struct srpt_port *sport;
+ struct srpt_port_id **port_id;
+};
+
+static struct port_and_port_id __srpt_lookup_port(const char *name)
{
struct ib_device *dev;
struct srpt_device *sdev;
@@ -2998,25 +3004,38 @@ static struct se_wwn *__srpt_lookup_wwn(const char *name)
for (i = 0; i < dev->phys_port_cnt; i++) {
sport = &sdev->port[i];

- if (strcmp(sport->port_guid_id.name, name) == 0)
- return &sport->port_guid_id.wwn;
- if (strcmp(sport->port_gid_id.name, name) == 0)
- return &sport->port_gid_id.wwn;
+ if (strcmp(sport->guid_name, name) == 0) {
+ kref_get(&sdev->refcnt);
+ return (struct port_and_port_id){
+ sport, &sport->guid_id};
+ }
+ if (strcmp(sport->gid_name, name) == 0) {
+ kref_get(&sdev->refcnt);
+ return (struct port_and_port_id){
+ sport, &sport->gid_id};
+ }
}
}

- return NULL;
+ return (struct port_and_port_id){};
}

-static struct se_wwn *srpt_lookup_wwn(const char *name)
+/**
+ * srpt_lookup_port() - Look up an RDMA port by name
+ * @name: ASCII port name
+ *
+ * Increments the RDMA port reference count if an RDMA port pointer is returned.
+ * The caller must drop that reference count by calling srpt_sdev_put().
+ */
+static struct port_and_port_id srpt_lookup_port(const char *name)
{
- struct se_wwn *wwn;
+ struct port_and_port_id papi;

spin_lock(&srpt_dev_lock);
- wwn = __srpt_lookup_wwn(name);
+ papi = __srpt_lookup_port(name);
spin_unlock(&srpt_dev_lock);

- return wwn;
+ return papi;
}

static void srpt_free_srq(struct srpt_device *sdev)
@@ -3101,6 +3120,18 @@ static int srpt_use_srq(struct srpt_device *sdev, bool use_srq)
return ret;
}

+static void srpt_free_sdev(struct kref *refcnt)
+{
+ struct srpt_device *sdev = container_of(refcnt, typeof(*sdev), refcnt);
+
+ kfree(sdev);
+}
+
+static void srpt_sdev_put(struct srpt_device *sdev)
+{
+ kref_put(&sdev->refcnt, srpt_free_sdev);
+}
+
/**
* srpt_add_one - InfiniBand device addition callback function
* @device: Describes a HCA.
@@ -3119,6 +3150,7 @@ static int srpt_add_one(struct ib_device *device)
if (!sdev)
return -ENOMEM;

+ kref_init(&sdev->refcnt);
sdev->device = device;
mutex_init(&sdev->sdev_mutex);

@@ -3182,10 +3214,6 @@ static int srpt_add_one(struct ib_device *device)
sport->port_attrib.srp_sq_size = DEF_SRPT_SQ_SIZE;
sport->port_attrib.use_srq = false;
INIT_WORK(&sport->work, srpt_refresh_port_work);
- mutex_init(&sport->port_guid_id.mutex);
- INIT_LIST_HEAD(&sport->port_guid_id.tpg_list);
- mutex_init(&sport->port_gid_id.mutex);
- INIT_LIST_HEAD(&sport->port_gid_id.tpg_list);

ret = srpt_refresh_port(sport);
if (ret) {
@@ -3214,7 +3242,7 @@ static int srpt_add_one(struct ib_device *device)
srpt_free_srq(sdev);
ib_dealloc_pd(sdev->pd);
free_dev:
- kfree(sdev);
+ srpt_sdev_put(sdev);
pr_info("%s(%s) failed.\n", __func__, dev_name(&device->dev));
return ret;
}
@@ -3258,7 +3286,7 @@ static void srpt_remove_one(struct ib_device *device, void *client_data)

ib_dealloc_pd(sdev->pd);

- kfree(sdev);
+ srpt_sdev_put(sdev);
}

static struct ib_client srpt_client = {
@@ -3286,10 +3314,10 @@ static struct srpt_port_id *srpt_wwn_to_sport_id(struct se_wwn *wwn)
{
struct srpt_port *sport = wwn->priv;

- if (wwn == &sport->port_guid_id.wwn)
- return &sport->port_guid_id;
- if (wwn == &sport->port_gid_id.wwn)
- return &sport->port_gid_id;
+ if (sport->guid_id && &sport->guid_id->wwn == wwn)
+ return sport->guid_id;
+ if (sport->gid_id && &sport->gid_id->wwn == wwn)
+ return sport->gid_id;
WARN_ON_ONCE(true);
return NULL;
}
@@ -3774,7 +3802,31 @@ static struct se_wwn *srpt_make_tport(struct target_fabric_configfs *tf,
struct config_group *group,
const char *name)
{
- return srpt_lookup_wwn(name) ? : ERR_PTR(-EINVAL);
+ struct port_and_port_id papi = srpt_lookup_port(name);
+ struct srpt_port *sport = papi.sport;
+ struct srpt_port_id *port_id;
+
+ if (!papi.port_id)
+ return ERR_PTR(-EINVAL);
+ if (*papi.port_id) {
+ /* Attempt to create a directory that already exists. */
+ WARN_ON_ONCE(true);
+ return &(*papi.port_id)->wwn;
+ }
+ port_id = kzalloc(sizeof(*port_id), GFP_KERNEL);
+ if (!port_id) {
+ srpt_sdev_put(sport->sdev);
+ return ERR_PTR(-ENOMEM);
+ }
+ mutex_init(&port_id->mutex);
+ INIT_LIST_HEAD(&port_id->tpg_list);
+ port_id->wwn.priv = sport;
+ memcpy(port_id->name, port_id == sport->guid_id ? sport->guid_name :
+ sport->gid_name, ARRAY_SIZE(port_id->name));
+
+ *papi.port_id = port_id;
+
+ return &port_id->wwn;
}

/**
@@ -3783,6 +3835,18 @@ static struct se_wwn *srpt_make_tport(struct target_fabric_configfs *tf,
*/
static void srpt_drop_tport(struct se_wwn *wwn)
{
+ struct srpt_port_id *port_id = container_of(wwn, typeof(*port_id), wwn);
+ struct srpt_port *sport = wwn->priv;
+
+ if (sport->guid_id == port_id)
+ sport->guid_id = NULL;
+ else if (sport->gid_id == port_id)
+ sport->gid_id = NULL;
+ else
+ WARN_ON_ONCE(true);
+
+ srpt_sdev_put(sport->sdev);
+ kfree(port_id);
}

static ssize_t srpt_wwn_version_show(struct config_item *item, char *buf)
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
index 76e66f630c17..4c46b301eea1 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -376,7 +376,7 @@ struct srpt_tpg {
};

/**
- * struct srpt_port_id - information about an RDMA port name
+ * struct srpt_port_id - LIO RDMA port information
* @mutex: Protects @tpg_list changes.
* @tpg_list: TPGs associated with the RDMA port name.
* @wwn: WWN associated with the RDMA port name.
@@ -393,7 +393,7 @@ struct srpt_port_id {
};

/**
- * struct srpt_port - information associated by SRPT with a single IB port
+ * struct srpt_port - SRPT RDMA port information
* @sdev: backpointer to the HCA information.
* @mad_agent: per-port management datagram processing information.
* @enabled: Whether or not this target port is enabled.
@@ -402,8 +402,10 @@ struct srpt_port_id {
* @lid: cached value of the port's lid.
* @gid: cached value of the port's gid.
* @work: work structure for refreshing the aforementioned cached values.
- * @port_guid_id: target port GUID
- * @port_gid_id: target port GID
+ * @guid_name: port name in GUID format.
+ * @guid_id: LIO target port information for the port name in GUID format.
+ * @gid_name: port name in GID format.
+ * @gid_id: LIO target port information for the port name in GID format.
* @port_attrib: Port attributes that can be accessed through configfs.
* @refcount: Number of objects associated with this port.
* @freed_channels: Completion that will be signaled once @refcount becomes 0.
@@ -419,8 +421,10 @@ struct srpt_port {
u32 lid;
union ib_gid gid;
struct work_struct work;
- struct srpt_port_id port_guid_id;
- struct srpt_port_id port_gid_id;
+ char guid_name[64];
+ struct srpt_port_id *guid_id;
+ char gid_name[64];
+ struct srpt_port_id *gid_id;
struct srpt_port_attrib port_attrib;
atomic_t refcount;
struct completion *freed_channels;
@@ -430,6 +434,7 @@ struct srpt_port {

/**
* struct srpt_device - information associated by SRPT with a single HCA
+ * @refcnt: Reference count for this device.
* @device: Backpointer to the struct ib_device managed by the IB core.
* @pd: IB protection domain.
* @lkey: L_Key (local key) with write access to all local memory.
@@ -445,6 +450,7 @@ struct srpt_port {
* @port: Information about the ports owned by this HCA.
*/
struct srpt_device {
+ struct kref refcnt;
struct ib_device *device;
struct ib_pd *pd;
u32 lkey;
diff --git a/drivers/input/serio/gscps2.c b/drivers/input/serio/gscps2.c
index a9065c6ab550..da2c67cb8642 100644
--- a/drivers/input/serio/gscps2.c
+++ b/drivers/input/serio/gscps2.c
@@ -350,6 +350,10 @@ static int __init gscps2_probe(struct parisc_device *dev)
ps2port->port = serio;
ps2port->padev = dev;
ps2port->addr = ioremap(hpa, GSC_STATUS + 4);
+ if (!ps2port->addr) {
+ ret = -ENOMEM;
+ goto fail_nomem;
+ }
spin_lock_init(&ps2port->lock);

gscps2_reset(ps2port);
diff --git a/drivers/interconnect/imx/imx.c b/drivers/interconnect/imx/imx.c
index 249ca25d1d55..4406ec45fa90 100644
--- a/drivers/interconnect/imx/imx.c
+++ b/drivers/interconnect/imx/imx.c
@@ -234,16 +234,16 @@ int imx_icc_register(struct platform_device *pdev,
struct device *dev = &pdev->dev;
struct icc_onecell_data *data;
struct icc_provider *provider;
- int max_node_id;
+ int num_nodes;
int ret;

/* icc_onecell_data is indexed by node_id, unlike nodes param */
- max_node_id = get_max_node_id(nodes, nodes_count);
- data = devm_kzalloc(dev, struct_size(data, nodes, max_node_id),
+ num_nodes = get_max_node_id(nodes, nodes_count) + 1;
+ data = devm_kzalloc(dev, struct_size(data, nodes, num_nodes),
GFP_KERNEL);
if (!data)
return -ENOMEM;
- data->num_nodes = max_node_id;
+ data->num_nodes = num_nodes;

provider = devm_kzalloc(dev, sizeof(*provider), GFP_KERNEL);
if (!provider)
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 4c077c38fbd6..c11d2c2cbb62 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -750,9 +750,12 @@ static bool qcom_iommu_has_secure_context(struct qcom_iommu_dev *qcom_iommu)
{
struct device_node *child;

- for_each_child_of_node(qcom_iommu->dev->of_node, child)
- if (of_device_is_compatible(child, "qcom,msm-iommu-v1-sec"))
+ for_each_child_of_node(qcom_iommu->dev->of_node, child) {
+ if (of_device_is_compatible(child, "qcom,msm-iommu-v1-sec")) {
+ of_node_put(child);
return true;
+ }
+ }

return false;
}
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 71f2018e23fe..cd4b889d5537 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -630,7 +630,7 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)

ret = iommu_device_register(&data->iommu, &exynos_iommu_ops, dev);
if (ret)
- return ret;
+ goto err_iommu_register;

platform_set_drvdata(pdev, data);

@@ -657,6 +657,10 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
pm_runtime_enable(dev);

return 0;
+
+err_iommu_register:
+ iommu_device_sysfs_remove(&data->iommu);
+ return ret;
}

static int __maybe_unused exynos_sysmmu_suspend(struct device *dev)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 497c5bd95caf..2a10c9b54064 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -495,7 +495,7 @@ static int dmar_parse_one_rhsa(struct acpi_dmar_header *header, void *arg)
if (drhd->reg_base_addr == rhsa->base_address) {
int node = pxm_to_node(rhsa->proximity_domain);

- if (!node_online(node))
+ if (node != NUMA_NO_NODE && !node_online(node))
node = NUMA_NO_NODE;
drhd->iommu->node = node;
return 0;
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 15edb9a6fcae..1af26c409cb1 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -177,7 +177,7 @@ config MADERA_IRQ
config IRQ_MIPS_CPU
bool
select GENERIC_IRQ_CHIP
- select GENERIC_IRQ_IPI if SYS_SUPPORTS_MULTITHREADING
+ select GENERIC_IRQ_IPI if SMP && SYS_SUPPORTS_MULTITHREADING
select IRQ_DOMAIN
select GENERIC_IRQ_EFFECTIVE_AFF_MASK

@@ -310,7 +310,8 @@ config KEYSTONE_IRQ

config MIPS_GIC
bool
- select GENERIC_IRQ_IPI
+ select GENERIC_IRQ_IPI if SMP
+ select IRQ_DOMAIN_HIERARCHY
select MIPS_CM

config INGENIC_IRQ
diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c
index ff89b36267dd..1ba0f1555c80 100644
--- a/drivers/irqchip/irq-mips-gic.c
+++ b/drivers/irqchip/irq-mips-gic.c
@@ -52,13 +52,15 @@ static DEFINE_PER_CPU_READ_MOSTLY(unsigned long[GIC_MAX_LONGS], pcpu_masks);

static DEFINE_SPINLOCK(gic_lock);
static struct irq_domain *gic_irq_domain;
-static struct irq_domain *gic_ipi_domain;
static int gic_shared_intrs;
static unsigned int gic_cpu_pin;
static unsigned int timer_cpu_pin;
static struct irq_chip gic_level_irq_controller, gic_edge_irq_controller;
+
+#ifdef CONFIG_GENERIC_IRQ_IPI
static DECLARE_BITMAP(ipi_resrv, GIC_MAX_INTRS);
static DECLARE_BITMAP(ipi_available, GIC_MAX_INTRS);
+#endif /* CONFIG_GENERIC_IRQ_IPI */

static struct gic_all_vpes_chip_data {
u32 map;
@@ -472,9 +474,11 @@ static int gic_irq_domain_map(struct irq_domain *d, unsigned int virq,
u32 map;

if (hwirq >= GIC_SHARED_HWIRQ_BASE) {
+#ifdef CONFIG_GENERIC_IRQ_IPI
/* verify that shared irqs don't conflict with an IPI irq */
if (test_bit(GIC_HWIRQ_TO_SHARED(hwirq), ipi_resrv))
return -EBUSY;
+#endif /* CONFIG_GENERIC_IRQ_IPI */

err = irq_domain_set_hwirq_and_chip(d, virq, hwirq,
&gic_level_irq_controller,
@@ -567,6 +571,8 @@ static const struct irq_domain_ops gic_irq_domain_ops = {
.map = gic_irq_domain_map,
};

+#ifdef CONFIG_GENERIC_IRQ_IPI
+
static int gic_ipi_domain_xlate(struct irq_domain *d, struct device_node *ctrlr,
const u32 *intspec, unsigned int intsize,
irq_hw_number_t *out_hwirq,
@@ -670,6 +676,48 @@ static const struct irq_domain_ops gic_ipi_domain_ops = {
.match = gic_ipi_domain_match,
};

+static int gic_register_ipi_domain(struct device_node *node)
+{
+ struct irq_domain *gic_ipi_domain;
+ unsigned int v[2], num_ipis;
+
+ gic_ipi_domain = irq_domain_add_hierarchy(gic_irq_domain,
+ IRQ_DOMAIN_FLAG_IPI_PER_CPU,
+ GIC_NUM_LOCAL_INTRS + gic_shared_intrs,
+ node, &gic_ipi_domain_ops, NULL);
+ if (!gic_ipi_domain) {
+ pr_err("Failed to add IPI domain");
+ return -ENXIO;
+ }
+
+ irq_domain_update_bus_token(gic_ipi_domain, DOMAIN_BUS_IPI);
+
+ if (node &&
+ !of_property_read_u32_array(node, "mti,reserved-ipi-vectors", v, 2)) {
+ bitmap_set(ipi_resrv, v[0], v[1]);
+ } else {
+ /*
+ * Reserve 2 interrupts per possible CPU/VP for use as IPIs,
+ * meeting the requirements of arch/mips SMP.
+ */
+ num_ipis = 2 * num_possible_cpus();
+ bitmap_set(ipi_resrv, gic_shared_intrs - num_ipis, num_ipis);
+ }
+
+ bitmap_copy(ipi_available, ipi_resrv, GIC_MAX_INTRS);
+
+ return 0;
+}
+
+#else /* !CONFIG_GENERIC_IRQ_IPI */
+
+static inline int gic_register_ipi_domain(struct device_node *node)
+{
+ return 0;
+}
+
+#endif /* !CONFIG_GENERIC_IRQ_IPI */
+
static int gic_cpu_startup(unsigned int cpu)
{
/* Enable or disable EIC */
@@ -688,11 +736,12 @@ static int gic_cpu_startup(unsigned int cpu)
static int __init gic_of_init(struct device_node *node,
struct device_node *parent)
{
- unsigned int cpu_vec, i, gicconfig, v[2], num_ipis;
+ unsigned int cpu_vec, i, gicconfig;
unsigned long reserved;
phys_addr_t gic_base;
struct resource res;
size_t gic_len;
+ int ret;

/* Find the first available CPU vector. */
i = 0;
@@ -734,6 +783,10 @@ static int __init gic_of_init(struct device_node *node,
}

mips_gic_base = ioremap(gic_base, gic_len);
+ if (!mips_gic_base) {
+ pr_err("Failed to ioremap gic_base\n");
+ return -ENOMEM;
+ }

gicconfig = read_gic_config();
gic_shared_intrs = FIELD_GET(GIC_CONFIG_NUMINTERRUPTS, gicconfig);
@@ -780,30 +833,9 @@ static int __init gic_of_init(struct device_node *node,
return -ENXIO;
}

- gic_ipi_domain = irq_domain_add_hierarchy(gic_irq_domain,
- IRQ_DOMAIN_FLAG_IPI_PER_CPU,
- GIC_NUM_LOCAL_INTRS + gic_shared_intrs,
- node, &gic_ipi_domain_ops, NULL);
- if (!gic_ipi_domain) {
- pr_err("Failed to add IPI domain");
- return -ENXIO;
- }
-
- irq_domain_update_bus_token(gic_ipi_domain, DOMAIN_BUS_IPI);
-
- if (node &&
- !of_property_read_u32_array(node, "mti,reserved-ipi-vectors", v, 2)) {
- bitmap_set(ipi_resrv, v[0], v[1]);
- } else {
- /*
- * Reserve 2 interrupts per possible CPU/VP for use as IPIs,
- * meeting the requirements of arch/mips SMP.
- */
- num_ipis = 2 * num_possible_cpus();
- bitmap_set(ipi_resrv, gic_shared_intrs - num_ipis, num_ipis);
- }
-
- bitmap_copy(ipi_available, ipi_resrv, GIC_MAX_INTRS);
+ ret = gic_register_ipi_domain(node);
+ if (ret)
+ return ret;

board_bind_eic_interrupt = &gic_bind_eic_interrupt;

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index e362a7471512..a55d6f6f294b 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3514,7 +3514,7 @@ static void raid_status(struct dm_target *ti, status_type_t type,
{
struct raid_set *rs = ti->private;
struct mddev *mddev = &rs->md;
- struct r5conf *conf = mddev->private;
+ struct r5conf *conf = rs_is_raid456(rs) ? mddev->private : NULL;
int i, max_nr_stripes = conf ? conf->max_nr_stripes : 0;
unsigned long recovery;
unsigned int raid_param_cnt = 1; /* at least 1 for chunksize */
@@ -3824,7 +3824,7 @@ static void attempt_restore_of_faulty_devices(struct raid_set *rs)

memset(cleared_failed_devices, 0, sizeof(cleared_failed_devices));

- for (i = 0; i < mddev->raid_disks; i++) {
+ for (i = 0; i < rs->raid_disks; i++) {
r = &rs->dev[i].rdev;
/* HM FIXME: enhance journal device recovery processing */
if (test_bit(Journal, &r->flags))
diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index 2db7030aba00..a27395c8621f 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -2045,10 +2045,13 @@ int dm_pool_register_metadata_threshold(struct dm_pool_metadata *pmd,
dm_sm_threshold_fn fn,
void *context)
{
- int r;
+ int r = -EINVAL;

pmd_write_lock_in_core(pmd);
- r = dm_sm_register_threshold_callback(pmd->metadata_sm, threshold, fn, context);
+ if (!pmd->fail_io) {
+ r = dm_sm_register_threshold_callback(pmd->metadata_sm,
+ threshold, fn, context);
+ }
pmd_write_unlock(pmd);

return r;
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 4d25d0e27031..53ac6ae870ac 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -3382,8 +3382,10 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
calc_metadata_threshold(pt),
metadata_low_callback,
pool);
- if (r)
+ if (r) {
+ ti->error = "Error registering metadata threshold";
goto out_flags_changed;
+ }

dm_pool_register_pre_commit_callback(pool->pmd,
metadata_pre_commit_callback, pool);
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 5630b470ba42..27557b852c94 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -22,7 +22,7 @@

#define HIGH_WATERMARK 50
#define LOW_WATERMARK 45
-#define MAX_WRITEBACK_JOBS 0
+#define MAX_WRITEBACK_JOBS min(0x10000000 / PAGE_SIZE, totalram_pages() / 16)
#define ENDIO_LATENCY 16
#define WRITEBACK_LATENCY 64
#define AUTOCOMMIT_BLOCKS_SSD 65536
@@ -1328,8 +1328,8 @@ enum wc_map_op {
WC_MAP_ERROR,
};

-static enum wc_map_op writecache_map_remap_origin(struct dm_writecache *wc, struct bio *bio,
- struct wc_entry *e)
+static void writecache_map_remap_origin(struct dm_writecache *wc, struct bio *bio,
+ struct wc_entry *e)
{
if (e) {
sector_t next_boundary =
@@ -1337,8 +1337,6 @@ static enum wc_map_op writecache_map_remap_origin(struct dm_writecache *wc, stru
if (next_boundary < bio->bi_iter.bi_size >> SECTOR_SHIFT)
dm_accept_partial_bio(bio, next_boundary);
}
-
- return WC_MAP_REMAP_ORIGIN;
}

static enum wc_map_op writecache_map_read(struct dm_writecache *wc, struct bio *bio)
@@ -1365,14 +1363,16 @@ static enum wc_map_op writecache_map_read(struct dm_writecache *wc, struct bio *
map_op = WC_MAP_REMAP;
}
} else {
- map_op = writecache_map_remap_origin(wc, bio, e);
+ writecache_map_remap_origin(wc, bio, e);
+ wc->stats.reads += (bio->bi_iter.bi_size - wc->block_size) >> wc->block_size_bits;
+ map_op = WC_MAP_REMAP_ORIGIN;
}

return map_op;
}

-static enum wc_map_op writecache_bio_copy_ssd(struct dm_writecache *wc, struct bio *bio,
- struct wc_entry *e, bool search_used)
+static void writecache_bio_copy_ssd(struct dm_writecache *wc, struct bio *bio,
+ struct wc_entry *e, bool search_used)
{
unsigned bio_size = wc->block_size;
sector_t start_cache_sec = cache_sector(wc, e);
@@ -1412,14 +1412,15 @@ static enum wc_map_op writecache_bio_copy_ssd(struct dm_writecache *wc, struct b
bio->bi_iter.bi_sector = start_cache_sec;
dm_accept_partial_bio(bio, bio_size >> SECTOR_SHIFT);

+ wc->stats.writes += bio->bi_iter.bi_size >> wc->block_size_bits;
+ wc->stats.writes_allocate += (bio->bi_iter.bi_size - wc->block_size) >> wc->block_size_bits;
+
if (unlikely(wc->uncommitted_blocks >= wc->autocommit_blocks)) {
wc->uncommitted_blocks = 0;
queue_work(wc->writeback_wq, &wc->flush_work);
} else {
writecache_schedule_autocommit(wc);
}
-
- return WC_MAP_REMAP;
}

static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio *bio)
@@ -1429,9 +1430,10 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio
do {
bool found_entry = false;
bool search_used = false;
- wc->stats.writes++;
- if (writecache_has_error(wc))
+ if (writecache_has_error(wc)) {
+ wc->stats.writes += bio->bi_iter.bi_size >> wc->block_size_bits;
return WC_MAP_ERROR;
+ }
e = writecache_find_entry(wc, bio->bi_iter.bi_sector, 0);
if (e) {
if (!writecache_entry_is_committed(wc, e)) {
@@ -1455,9 +1457,11 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio
if (unlikely(!e)) {
if (!WC_MODE_PMEM(wc) && !found_entry) {
direct_write:
- wc->stats.writes_around++;
e = writecache_find_entry(wc, bio->bi_iter.bi_sector, WFE_RETURN_FOLLOWING);
- return writecache_map_remap_origin(wc, bio, e);
+ writecache_map_remap_origin(wc, bio, e);
+ wc->stats.writes_around += bio->bi_iter.bi_size >> wc->block_size_bits;
+ wc->stats.writes += bio->bi_iter.bi_size >> wc->block_size_bits;
+ return WC_MAP_REMAP_ORIGIN;
}
wc->stats.writes_blocked_on_freelist++;
writecache_wait_on_freelist(wc);
@@ -1468,10 +1472,13 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio
wc->uncommitted_blocks++;
wc->stats.writes_allocate++;
bio_copy:
- if (WC_MODE_PMEM(wc))
+ if (WC_MODE_PMEM(wc)) {
bio_copy_block(wc, bio, memory_data(wc, e));
- else
- return writecache_bio_copy_ssd(wc, bio, e, search_used);
+ wc->stats.writes++;
+ } else {
+ writecache_bio_copy_ssd(wc, bio, e, search_used);
+ return WC_MAP_REMAP;
+ }
} while (bio->bi_iter.bi_size);

if (unlikely(bio->bi_opf & REQ_FUA || wc->uncommitted_blocks >= wc->autocommit_blocks))
@@ -1506,7 +1513,7 @@ static enum wc_map_op writecache_map_flush(struct dm_writecache *wc, struct bio

static enum wc_map_op writecache_map_discard(struct dm_writecache *wc, struct bio *bio)
{
- wc->stats.discards++;
+ wc->stats.discards += bio->bi_iter.bi_size >> wc->block_size_bits;

if (writecache_has_error(wc))
return WC_MAP_ERROR;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f01d33bc3613..e0e28a3203f8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2955,6 +2955,11 @@ static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
goto out;
ti = dm_table_get_target(table, 0);

+ if (dm_suspended_md(md)) {
+ ret = -EAGAIN;
+ goto out;
+ }
+
ret = -EINVAL;
if (!ti->type->iterate_devices)
goto out;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f79cab8c7700..134c12e481a5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6259,11 +6259,11 @@ static void mddev_detach(struct mddev *mddev)
static void __md_stop(struct mddev *mddev)
{
struct md_personality *pers = mddev->pers;
- md_bitmap_destroy(mddev);
mddev_detach(mddev);
/* Ensure ->event_work is done */
if (mddev->event_work.func)
flush_workqueue(md_misc_wq);
+ md_bitmap_destroy(mddev);
spin_lock(&mddev->lock);
mddev->pers = NULL;
spin_unlock(&mddev->lock);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index dfe7d62d3fbd..e9648d643dd1 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2157,9 +2157,12 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
int err = 0;
int number = rdev->raid_disk;
struct md_rdev **rdevp;
- struct raid10_info *p = conf->mirrors + number;
+ struct raid10_info *p;

print_conf(conf);
+ if (unlikely(number >= mddev->raid_disks))
+ return 0;
+ p = conf->mirrors + number;
if (rdev == p->rdev)
rdevp = &p->rdev;
else if (rdev == p->replacement)
diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
index 2b20aa6c37b1..c926e5d43820 100644
--- a/drivers/media/i2c/Kconfig
+++ b/drivers/media/i2c/Kconfig
@@ -1178,6 +1178,7 @@ config VIDEO_ISL7998X
depends on OF_GPIO
select MEDIA_CONTROLLER
select VIDEO_V4L2_SUBDEV_API
+ select V4L2_FWNODE
help
Support for Intersil ISL7998x analog to MIPI-CSI2 or
BT.656 decoder.
diff --git a/drivers/media/pci/sta2x11/Kconfig b/drivers/media/pci/sta2x11/Kconfig
index a96e170ab04e..118b922c08c3 100644
--- a/drivers/media/pci/sta2x11/Kconfig
+++ b/drivers/media/pci/sta2x11/Kconfig
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
config STA2X11_VIP
tristate "STA2X11 VIP Video For Linux"
- depends on PCI && VIDEO_DEV && VIRT_TO_BUS && I2C
+ depends on PCI && VIDEO_DEV && I2C
depends on STA2X11 || COMPILE_TEST
select GPIOLIB if MEDIA_SUBDRV_AUTOSELECT
select VIDEO_ADV7180 if MEDIA_SUBDRV_AUTOSELECT
diff --git a/drivers/media/pci/tw686x/tw686x-core.c b/drivers/media/pci/tw686x/tw686x-core.c
index 6676e069b515..384d38754a4b 100644
--- a/drivers/media/pci/tw686x/tw686x-core.c
+++ b/drivers/media/pci/tw686x/tw686x-core.c
@@ -315,13 +315,6 @@ static int tw686x_probe(struct pci_dev *pci_dev,

spin_lock_init(&dev->lock);

- err = request_irq(pci_dev->irq, tw686x_irq, IRQF_SHARED,
- dev->name, dev);
- if (err < 0) {
- dev_err(&pci_dev->dev, "unable to request interrupt\n");
- goto iounmap;
- }
-
timer_setup(&dev->dma_delay_timer, tw686x_dma_delay, 0);

/*
@@ -333,18 +326,23 @@ static int tw686x_probe(struct pci_dev *pci_dev,
err = tw686x_video_init(dev);
if (err) {
dev_err(&pci_dev->dev, "can't register video\n");
- goto free_irq;
+ goto iounmap;
}

err = tw686x_audio_init(dev);
if (err)
dev_warn(&pci_dev->dev, "can't register audio\n");

+ err = request_irq(pci_dev->irq, tw686x_irq, IRQF_SHARED,
+ dev->name, dev);
+ if (err < 0) {
+ dev_err(&pci_dev->dev, "unable to request interrupt\n");
+ goto iounmap;
+ }
+
pci_set_drvdata(pci_dev, dev);
return 0;

-free_irq:
- free_irq(pci_dev->irq, dev);
iounmap:
pci_iounmap(pci_dev, dev->mmio);
free_region:
diff --git a/drivers/media/pci/tw686x/tw686x-video.c b/drivers/media/pci/tw686x/tw686x-video.c
index b227e9e78ebd..37a20fe24241 100644
--- a/drivers/media/pci/tw686x/tw686x-video.c
+++ b/drivers/media/pci/tw686x/tw686x-video.c
@@ -1282,8 +1282,10 @@ int tw686x_video_init(struct tw686x_dev *dev)
video_set_drvdata(vdev, vc);

err = video_register_device(vdev, VFL_TYPE_VIDEO, -1);
- if (err < 0)
+ if (err < 0) {
+ video_device_release(vdev);
goto error;
+ }
vc->num = vdev->num;
}

diff --git a/drivers/media/platform/amphion/vdec.c b/drivers/media/platform/amphion/vdec.c
index c0dfede11ab7..36450a2e85c9 100644
--- a/drivers/media/platform/amphion/vdec.c
+++ b/drivers/media/platform/amphion/vdec.c
@@ -26,8 +26,8 @@
#include "vpu_cmds.h"
#include "vpu_rpc.h"

-#define VDEC_FRAME_DEPTH 256
#define VDEC_MIN_BUFFER_CAP 8
+#define VDEC_MIN_BUFFER_OUT 8

struct vdec_fs_info {
char name[8];
@@ -63,8 +63,7 @@ struct vdec_t {
bool is_source_changed;
u32 source_change;
u32 drain;
- u32 ts_pre_count;
- u32 frame_depth;
+ bool aborting;
};

static const struct vpu_format vdec_formats[] = {
@@ -106,7 +105,6 @@ static const struct vpu_format vdec_formats[] = {
.pixfmt = V4L2_PIX_FMT_VC1_ANNEX_L,
.num_planes = 1,
.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
- .flags = V4L2_FMT_FLAG_DYN_RESOLUTION
},
{
.pixfmt = V4L2_PIX_FMT_MPEG2,
@@ -174,16 +172,6 @@ static int vdec_ctrl_init(struct vpu_inst *inst)
return 0;
}

-static void vdec_set_last_buffer_dequeued(struct vpu_inst *inst)
-{
- struct vdec_t *vdec = inst->priv;
-
- if (vdec->eos_received) {
- if (!vpu_set_last_buffer_dequeued(inst))
- vdec->eos_received--;
- }
-}
-
static void vdec_handle_resolution_change(struct vpu_inst *inst)
{
struct vdec_t *vdec = inst->priv;
@@ -230,6 +218,21 @@ static int vdec_update_state(struct vpu_inst *inst, enum vpu_codec_state state,
return 0;
}

+static void vdec_set_last_buffer_dequeued(struct vpu_inst *inst)
+{
+ struct vdec_t *vdec = inst->priv;
+
+ if (inst->state == VPU_CODEC_STATE_DYAMIC_RESOLUTION_CHANGE)
+ return;
+
+ if (vdec->eos_received) {
+ if (!vpu_set_last_buffer_dequeued(inst)) {
+ vdec->eos_received--;
+ vdec_update_state(inst, VPU_CODEC_STATE_DRAIN, 0);
+ }
+ }
+}
+
static int vdec_querycap(struct file *file, void *fh, struct v4l2_capability *cap)
{
strscpy(cap->driver, "amphion-vpu", sizeof(cap->driver));
@@ -470,7 +473,7 @@ static int vdec_drain(struct vpu_inst *inst)
if (!vdec->drain)
return 0;

- if (v4l2_m2m_num_src_bufs_ready(inst->fh.m2m_ctx))
+ if (!vpu_is_source_empty(inst))
return 0;

if (!vdec->params.frame_count) {
@@ -489,6 +492,8 @@ static int vdec_drain(struct vpu_inst *inst)

static int vdec_cmd_start(struct vpu_inst *inst)
{
+ struct vdec_t *vdec = inst->priv;
+
switch (inst->state) {
case VPU_CODEC_STATE_STARTED:
case VPU_CODEC_STATE_DRAIN:
@@ -499,6 +504,8 @@ static int vdec_cmd_start(struct vpu_inst *inst)
break;
}
vpu_process_capture_buffer(inst);
+ if (vdec->eos_received)
+ vdec_set_last_buffer_dequeued(inst);
return 0;
}

@@ -589,11 +596,8 @@ static bool vdec_check_ready(struct vpu_inst *inst, unsigned int type)
{
struct vdec_t *vdec = inst->priv;

- if (V4L2_TYPE_IS_OUTPUT(type)) {
- if (vdec->ts_pre_count >= vdec->frame_depth)
- return false;
+ if (V4L2_TYPE_IS_OUTPUT(type))
return true;
- }

if (vdec->req_frame_count)
return true;
@@ -601,12 +605,21 @@ static bool vdec_check_ready(struct vpu_inst *inst, unsigned int type)
return false;
}

+static struct vb2_v4l2_buffer *vdec_get_src_buffer(struct vpu_inst *inst, u32 count)
+{
+ if (count > 1)
+ vpu_skip_frame(inst, count - 1);
+
+ return vpu_next_src_buf(inst);
+}
+
static int vdec_frame_decoded(struct vpu_inst *inst, void *arg)
{
struct vdec_t *vdec = inst->priv;
struct vpu_dec_pic_info *info = arg;
struct vpu_vb2_buffer *vpu_buf;
struct vb2_v4l2_buffer *vbuf;
+ struct vb2_v4l2_buffer *src_buf;
int ret = 0;

if (!info || info->id >= ARRAY_SIZE(vdec->slots))
@@ -620,14 +633,21 @@ static int vdec_frame_decoded(struct vpu_inst *inst, void *arg)
goto exit;
}
vbuf = &vpu_buf->m2m_buf.vb;
+ src_buf = vdec_get_src_buffer(inst, info->consumed_count);
+ if (src_buf) {
+ v4l2_m2m_buf_copy_metadata(src_buf, vbuf, true);
+ if (info->consumed_count) {
+ v4l2_m2m_src_buf_remove(inst->fh.m2m_ctx);
+ vpu_set_buffer_state(src_buf, VPU_BUF_STATE_IDLE);
+ v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_DONE);
+ } else {
+ vpu_set_buffer_state(src_buf, VPU_BUF_STATE_DECODED);
+ }
+ }
if (vpu_get_buffer_state(vbuf) == VPU_BUF_STATE_DECODED)
dev_info(inst->dev, "[%d] buf[%d] has been decoded\n", inst->id, info->id);
vpu_set_buffer_state(vbuf, VPU_BUF_STATE_DECODED);
vdec->decoded_frame_count++;
- if (vdec->ts_pre_count >= info->consumed_count)
- vdec->ts_pre_count -= info->consumed_count;
- else
- vdec->ts_pre_count = 0;
exit:
vpu_inst_unlock(inst);

@@ -683,10 +703,9 @@ static void vdec_buf_done(struct vpu_inst *inst, struct vpu_frame_info *frame)
vpu_set_buffer_state(vbuf, VPU_BUF_STATE_READY);
vb2_set_plane_payload(&vbuf->vb2_buf, 0, inst->cap_format.sizeimage[0]);
vb2_set_plane_payload(&vbuf->vb2_buf, 1, inst->cap_format.sizeimage[1]);
- vbuf->vb2_buf.timestamp = frame->timestamp;
vbuf->field = inst->cap_format.field;
vbuf->sequence = sequence;
- dev_dbg(inst->dev, "[%d][OUTPUT TS]%32lld\n", inst->id, frame->timestamp);
+ dev_dbg(inst->dev, "[%d][OUTPUT TS]%32lld\n", inst->id, vbuf->vb2_buf.timestamp);

v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE);
vpu_inst_lock(inst);
@@ -708,7 +727,6 @@ static void vdec_stop_done(struct vpu_inst *inst)
vdec->fixed_fmt = false;
vdec->params.end_flag = 0;
vdec->drain = 0;
- vdec->ts_pre_count = 0;
vdec->params.frame_count = 0;
vdec->decoded_frame_count = 0;
vdec->display_frame_count = 0;
@@ -716,6 +734,7 @@ static void vdec_stop_done(struct vpu_inst *inst)
vdec->eos_received = 0;
vdec->is_source_changed = false;
vdec->source_change = 0;
+ inst->total_input_count = 0;
vpu_inst_unlock(inst);
}

@@ -924,6 +943,9 @@ static int vdec_response_frame(struct vpu_inst *inst, struct vb2_v4l2_buffer *vb
if (inst->state != VPU_CODEC_STATE_ACTIVE)
return -EINVAL;

+ if (vdec->aborting)
+ return -EINVAL;
+
if (!vdec->req_frame_count)
return -EINVAL;

@@ -1033,6 +1055,8 @@ static void vdec_clear_slots(struct vpu_inst *inst)
vpu_buf = vdec->slots[i];
vbuf = &vpu_buf->m2m_buf.vb;

+ vpu_trace(inst->dev, "clear slot %d\n", i);
+ vdec_response_fs_release(inst, i, vpu_buf->tag);
vdec_recycle_buffer(inst, vbuf);
vdec->slots[i]->state = VPU_BUF_STATE_IDLE;
vdec->slots[i] = NULL;
@@ -1188,7 +1212,6 @@ static void vdec_event_eos(struct vpu_inst *inst)
vdec->eos_received++;
vdec->fixed_fmt = false;
inst->min_buffer_cap = VDEC_MIN_BUFFER_CAP;
- vdec_update_state(inst, VPU_CODEC_STATE_DRAIN, 0);
vdec_set_last_buffer_dequeued(inst);
vpu_inst_unlock(inst);
}
@@ -1244,18 +1267,14 @@ static int vdec_process_output(struct vpu_inst *inst, struct vb2_buffer *vb)
if (free_space < vb2_get_plane_payload(vb, 0) + 0x40000)
return -ENOMEM;

+ vpu_set_buffer_state(vbuf, VPU_BUF_STATE_INUSE);
ret = vpu_iface_input_frame(inst, vb);
if (ret < 0)
return -ENOMEM;

dev_dbg(inst->dev, "[%d][INPUT TS]%32lld\n", inst->id, vb->timestamp);
- vdec->ts_pre_count++;
vdec->params.frame_count++;

- v4l2_m2m_src_buf_remove_by_buf(inst->fh.m2m_ctx, vbuf);
- vpu_set_buffer_state(vbuf, VPU_BUF_STATE_IDLE);
- v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE);
-
if (vdec->drain)
vdec_drain(inst);

@@ -1299,6 +1318,8 @@ static void vdec_abort(struct vpu_inst *inst)
int ret;

vpu_trace(inst->dev, "[%d] state = %d\n", inst->id, inst->state);
+
+ vdec->aborting = true;
vpu_iface_add_scode(inst, SCODE_PADDING_ABORT);
vdec->params.end_flag = 1;
vpu_iface_set_decode_params(inst, &vdec->params, 1);
@@ -1318,11 +1339,11 @@ static void vdec_abort(struct vpu_inst *inst)
vdec->sequence);
vdec->params.end_flag = 0;
vdec->drain = 0;
- vdec->ts_pre_count = 0;
vdec->params.frame_count = 0;
vdec->decoded_frame_count = 0;
vdec->display_frame_count = 0;
vdec->sequence = 0;
+ vdec->aborting = false;
}

static void vdec_stop(struct vpu_inst *inst, bool free)
@@ -1470,10 +1491,10 @@ static int vdec_stop_session(struct vpu_inst *inst, u32 type)
vdec_update_state(inst, VPU_CODEC_STATE_SEEK, 0);
vdec->drain = 0;
} else {
- if (inst->state != VPU_CODEC_STATE_DYAMIC_RESOLUTION_CHANGE)
+ if (inst->state != VPU_CODEC_STATE_DYAMIC_RESOLUTION_CHANGE) {
vdec_abort(inst);
-
- vdec->eos_received = 0;
+ vdec->eos_received = 0;
+ }
vdec_clear_slots(inst);
}

@@ -1525,10 +1546,6 @@ static int vdec_get_debug_info(struct vpu_inst *inst, char *str, u32 size, u32 i
vdec->drain, vdec->eos_received, vdec->source_change);
break;
case 8:
- num = scnprintf(str, size, "ts_pre_count = %d, frame_depth = %d\n",
- vdec->ts_pre_count, vdec->frame_depth);
- break;
- case 9:
num = scnprintf(str, size, "fps = %d/%d\n",
vdec->codec_info.frame_rate.numerator,
vdec->codec_info.frame_rate.denominator);
@@ -1562,12 +1579,8 @@ static struct vpu_inst_ops vdec_inst_ops = {
static void vdec_init(struct file *file)
{
struct vpu_inst *inst = to_inst(file);
- struct vdec_t *vdec;
struct v4l2_format f;

- vdec = inst->priv;
- vdec->frame_depth = VDEC_FRAME_DEPTH;
-
memset(&f, 0, sizeof(f));
f.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
f.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_H264;
@@ -1612,36 +1625,18 @@ static int vdec_open(struct file *file)

vdec->fixed_fmt = false;
inst->min_buffer_cap = VDEC_MIN_BUFFER_CAP;
+ inst->min_buffer_out = VDEC_MIN_BUFFER_OUT;
vdec_init(file);

return 0;
}

-static __poll_t vdec_poll(struct file *file, poll_table *wait)
-{
- struct vpu_inst *inst = to_inst(file);
- struct vb2_queue *src_q, *dst_q;
- __poll_t ret;
-
- ret = v4l2_m2m_fop_poll(file, wait);
- src_q = v4l2_m2m_get_src_vq(inst->fh.m2m_ctx);
- dst_q = v4l2_m2m_get_dst_vq(inst->fh.m2m_ctx);
- if (vb2_is_streaming(src_q) && !vb2_is_streaming(dst_q))
- ret &= (~EPOLLERR);
- if (!src_q->error && !dst_q->error &&
- (vb2_is_streaming(src_q) && list_empty(&src_q->queued_list)) &&
- (vb2_is_streaming(dst_q) && list_empty(&dst_q->queued_list)))
- ret &= (~EPOLLERR);
-
- return ret;
-}
-
static const struct v4l2_file_operations vdec_fops = {
.owner = THIS_MODULE,
.open = vdec_open,
.release = vpu_v4l2_close,
.unlocked_ioctl = video_ioctl2,
- .poll = vdec_poll,
+ .poll = v4l2_m2m_fop_poll,
.mmap = v4l2_m2m_fop_mmap,
};

diff --git a/drivers/media/platform/amphion/vpu.h b/drivers/media/platform/amphion/vpu.h
index e56b96a7e5d3..f914de6ed81e 100644
--- a/drivers/media/platform/amphion/vpu.h
+++ b/drivers/media/platform/amphion/vpu.h
@@ -258,6 +258,7 @@ struct vpu_inst {
struct vpu_format cap_format;
u32 min_buffer_cap;
u32 min_buffer_out;
+ u32 total_input_count;

struct v4l2_rect crop;
u32 colorspace;
diff --git a/drivers/media/platform/amphion/vpu_core.c b/drivers/media/platform/amphion/vpu_core.c
index 68ad183925fd..51a764713159 100644
--- a/drivers/media/platform/amphion/vpu_core.c
+++ b/drivers/media/platform/amphion/vpu_core.c
@@ -455,8 +455,13 @@ int vpu_inst_unregister(struct vpu_inst *inst)
}
vpu_core_check_hang(core);
if (core->state == VPU_CORE_HANG && !core->instance_mask) {
+ int err;
+
dev_info(core->dev, "reset hang core\n");
- if (!vpu_core_sw_reset(core)) {
+ mutex_unlock(&core->lock);
+ err = vpu_core_sw_reset(core);
+ mutex_lock(&core->lock);
+ if (!err) {
core->state = VPU_CORE_ACTIVE;
core->hang_mask = 0;
}
diff --git a/drivers/media/platform/amphion/vpu_malone.c b/drivers/media/platform/amphion/vpu_malone.c
index 446a9de0cc11..7f591805c48c 100644
--- a/drivers/media/platform/amphion/vpu_malone.c
+++ b/drivers/media/platform/amphion/vpu_malone.c
@@ -609,6 +609,8 @@ static int vpu_malone_set_params(struct vpu_shared_addr *shared,
enum vpu_malone_format malone_format;

malone_format = vpu_malone_format_remap(params->codec_format);
+ if (WARN_ON(malone_format == MALONE_FMT_NULL))
+ return -EINVAL;
iface->udata_buffer[instance].base = params->udata.base;
iface->udata_buffer[instance].slot_size = params->udata.size;

@@ -1294,6 +1296,8 @@ static int vpu_malone_insert_scode_vc1_l_seq(struct malone_scode_t *scode)
int size = 0;
u8 rcv_seqhdr[MALONE_VC1_RCV_SEQ_HEADER_LEN];

+ if (scode->inst->total_input_count)
+ return 0;
scode->need_data = 0;

ret = vpu_malone_insert_scode_seq(scode, MALONE_CODEC_ID_VC1_SIMPLE, sizeof(rcv_seqhdr));
@@ -1556,7 +1560,7 @@ int vpu_malone_input_frame(struct vpu_shared_addr *shared,
* merge the data to next frame
*/
vbuf = to_vb2_v4l2_buffer(vb);
- if (vpu_vb_is_codecconfig(vbuf) && (s64)vb->timestamp < 0) {
+ if (vpu_vb_is_codecconfig(vbuf)) {
inst->extra_size += size;
return 0;
}
diff --git a/drivers/media/platform/amphion/vpu_msgs.c b/drivers/media/platform/amphion/vpu_msgs.c
index 58502c51ddb3..077644bc1d7c 100644
--- a/drivers/media/platform/amphion/vpu_msgs.c
+++ b/drivers/media/platform/amphion/vpu_msgs.c
@@ -150,7 +150,12 @@ static void vpu_session_handle_eos(struct vpu_inst *inst, struct vpu_rpc_event *

static void vpu_session_handle_error(struct vpu_inst *inst, struct vpu_rpc_event *pkt)
{
- dev_err(inst->dev, "unsupported stream\n");
+ char *str = (char *)pkt->data;
+
+ if (strlen(str))
+ dev_err(inst->dev, "instance %d firmware error : %s\n", inst->id, str);
+ else
+ dev_err(inst->dev, "instance %d is unsupported stream\n", inst->id);
call_void_vop(inst, event_notify, VPU_MSG_ID_UNSUPPORTED, NULL);
vpu_v4l2_set_error(inst);
}
diff --git a/drivers/media/platform/amphion/vpu_rpc.h b/drivers/media/platform/amphion/vpu_rpc.h
index 25119e5e807e..7eb6f01e6ab5 100644
--- a/drivers/media/platform/amphion/vpu_rpc.h
+++ b/drivers/media/platform/amphion/vpu_rpc.h
@@ -312,11 +312,16 @@ static inline int vpu_iface_input_frame(struct vpu_inst *inst,
struct vb2_buffer *vb)
{
struct vpu_iface_ops *ops = vpu_core_get_iface(inst->core);
+ int ret;

if (!ops || !ops->input_frame)
return -EINVAL;

- return ops->input_frame(inst->core->iface, inst, vb);
+ ret = ops->input_frame(inst->core->iface, inst, vb);
+ if (ret < 0)
+ return ret;
+ inst->total_input_count++;
+ return ret;
}

static inline int vpu_iface_config_memory_resource(struct vpu_inst *inst,
diff --git a/drivers/media/platform/amphion/vpu_v4l2.c b/drivers/media/platform/amphion/vpu_v4l2.c
index 9c0704cd5766..0f3726c80967 100644
--- a/drivers/media/platform/amphion/vpu_v4l2.c
+++ b/drivers/media/platform/amphion/vpu_v4l2.c
@@ -127,6 +127,19 @@ int vpu_set_last_buffer_dequeued(struct vpu_inst *inst)
return 0;
}

+bool vpu_is_source_empty(struct vpu_inst *inst)
+{
+ struct v4l2_m2m_buffer *buf = NULL;
+
+ if (!inst->fh.m2m_ctx)
+ return true;
+ v4l2_m2m_for_each_src_buf(inst->fh.m2m_ctx, buf) {
+ if (vpu_get_buffer_state(&buf->vb) == VPU_BUF_STATE_IDLE)
+ return false;
+ }
+ return true;
+}
+
const struct vpu_format *vpu_try_fmt_common(struct vpu_inst *inst, struct v4l2_format *f)
{
struct v4l2_pix_format_mplane *pixmp = &f->fmt.pix_mp;
@@ -234,6 +247,49 @@ int vpu_process_capture_buffer(struct vpu_inst *inst)
return call_vop(inst, process_capture, &vbuf->vb2_buf);
}

+struct vb2_v4l2_buffer *vpu_next_src_buf(struct vpu_inst *inst)
+{
+ struct vb2_v4l2_buffer *src_buf = v4l2_m2m_next_src_buf(inst->fh.m2m_ctx);
+
+ if (!src_buf || vpu_get_buffer_state(src_buf) == VPU_BUF_STATE_IDLE)
+ return NULL;
+
+ while (vpu_vb_is_codecconfig(src_buf)) {
+ v4l2_m2m_src_buf_remove(inst->fh.m2m_ctx);
+ vpu_set_buffer_state(src_buf, VPU_BUF_STATE_IDLE);
+ v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_DONE);
+
+ src_buf = v4l2_m2m_next_src_buf(inst->fh.m2m_ctx);
+ if (!src_buf || vpu_get_buffer_state(src_buf) == VPU_BUF_STATE_IDLE)
+ return NULL;
+ }
+
+ return src_buf;
+}
+
+void vpu_skip_frame(struct vpu_inst *inst, int count)
+{
+ struct vb2_v4l2_buffer *src_buf;
+ enum vb2_buffer_state state;
+ int i = 0;
+
+ if (count <= 0)
+ return;
+
+ while (i < count) {
+ src_buf = v4l2_m2m_src_buf_remove(inst->fh.m2m_ctx);
+ if (!src_buf || vpu_get_buffer_state(src_buf) == VPU_BUF_STATE_IDLE)
+ return;
+ if (vpu_get_buffer_state(src_buf) == VPU_BUF_STATE_DECODED)
+ state = VB2_BUF_STATE_DONE;
+ else
+ state = VB2_BUF_STATE_ERROR;
+ i++;
+ vpu_set_buffer_state(src_buf, VPU_BUF_STATE_IDLE);
+ v4l2_m2m_buf_done(src_buf, state);
+ }
+}
+
struct vb2_v4l2_buffer *vpu_find_buf_by_sequence(struct vpu_inst *inst, u32 type, u32 sequence)
{
struct v4l2_m2m_buffer *buf = NULL;
@@ -440,10 +496,12 @@ static int vpu_vb2_start_streaming(struct vb2_queue *q, unsigned int count)
fmt->sizeimage[1], fmt->bytesperline[1],
fmt->sizeimage[2], fmt->bytesperline[2],
q->num_buffers);
- call_void_vop(inst, start, q->type);
vb2_clear_last_buffer_dequeued(q);
+ ret = call_vop(inst, start, q->type);
+ if (ret)
+ vpu_vb2_buffers_return(inst, q->type, VB2_BUF_STATE_QUEUED);

- return 0;
+ return ret;
}

static void vpu_vb2_stop_streaming(struct vb2_queue *q)
diff --git a/drivers/media/platform/amphion/vpu_v4l2.h b/drivers/media/platform/amphion/vpu_v4l2.h
index 90fa7ea67495..795ca33a6a50 100644
--- a/drivers/media/platform/amphion/vpu_v4l2.h
+++ b/drivers/media/platform/amphion/vpu_v4l2.h
@@ -19,6 +19,8 @@ int vpu_v4l2_close(struct file *file);
const struct vpu_format *vpu_try_fmt_common(struct vpu_inst *inst, struct v4l2_format *f);
int vpu_process_output_buffer(struct vpu_inst *inst);
int vpu_process_capture_buffer(struct vpu_inst *inst);
+struct vb2_v4l2_buffer *vpu_next_src_buf(struct vpu_inst *inst);
+void vpu_skip_frame(struct vpu_inst *inst, int count);
struct vb2_v4l2_buffer *vpu_find_buf_by_sequence(struct vpu_inst *inst, u32 type, u32 sequence);
struct vb2_v4l2_buffer *vpu_find_buf_by_idx(struct vpu_inst *inst, u32 type, u32 idx);
void vpu_v4l2_set_error(struct vpu_inst *inst);
@@ -27,6 +29,7 @@ int vpu_notify_source_change(struct vpu_inst *inst);
int vpu_set_last_buffer_dequeued(struct vpu_inst *inst);
void vpu_vb2_buffers_return(struct vpu_inst *inst, unsigned int type, enum vb2_buffer_state state);
int vpu_get_num_buffers(struct vpu_inst *inst, u32 type);
+bool vpu_is_source_empty(struct vpu_inst *inst);

dma_addr_t vpu_get_vb_phy_addr(struct vb2_buffer *vb, u32 plane_no);
unsigned int vpu_get_vb_length(struct vb2_buffer *vb, u32 plane_no);
diff --git a/drivers/media/platform/atmel/atmel-sama7g5-isc.c b/drivers/media/platform/atmel/atmel-sama7g5-isc.c
index 07a80b08bc54..e292a8d790a3 100644
--- a/drivers/media/platform/atmel/atmel-sama7g5-isc.c
+++ b/drivers/media/platform/atmel/atmel-sama7g5-isc.c
@@ -612,11 +612,13 @@ static const struct dev_pm_ops microchip_xisc_dev_pm_ops = {
SET_RUNTIME_PM_OPS(xisc_runtime_suspend, xisc_runtime_resume, NULL)
};

+#if IS_ENABLED(CONFIG_OF)
static const struct of_device_id microchip_xisc_of_match[] = {
{ .compatible = "microchip,sama7g5-isc" },
{ }
};
MODULE_DEVICE_TABLE(of, microchip_xisc_of_match);
+#endif

static struct platform_driver microchip_xisc_driver = {
.probe = microchip_xisc_probe,
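
Wrapping the OF match table in IS_ENABLED(CONFIG_OF), as above, silences the unused-variable warning in !CONFIG_OF builds; such a guard is typically paired with of_match_ptr(), which evaluates to NULL when CONFIG_OF is off. A hypothetical, minimal sketch (driver name and compatible string are placeholders):

#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

#if IS_ENABLED(CONFIG_OF)
static const struct of_device_id example_of_match[] = {
        { .compatible = "vendor,example" },
        { /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, example_of_match);
#endif

static struct platform_driver example_driver = {
        .driver = {
                .name = "example",
                /* of_match_ptr() drops the reference entirely without CONFIG_OF */
                .of_match_table = of_match_ptr(example_of_match),
        },
};
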
diff --git a/drivers/media/platform/mediatek/mdp/mtk_mdp_ipi.h b/drivers/media/platform/mediatek/mdp/mtk_mdp_ipi.h
index 2cb8cecb3077..b810c96695c8 100644
--- a/drivers/media/platform/mediatek/mdp/mtk_mdp_ipi.h
+++ b/drivers/media/platform/mediatek/mdp/mtk_mdp_ipi.h
@@ -40,12 +40,14 @@ struct mdp_ipi_init {
* @ipi_id : IPI_MDP
* @ap_inst : AP mtk_mdp_vpu address
* @vpu_inst_addr : VPU MDP instance address
+ * @padding : Alignment padding
*/
struct mdp_ipi_comm {
uint32_t msg_id;
uint32_t ipi_id;
uint64_t ap_inst;
uint32_t vpu_inst_addr;
+ uint32_t padding;
};

/**
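
The @padding member documented above makes the structure's tail padding explicit so the message layout stays stable across the AP/VPU interface. For structs shared with firmware it is common to pin the size down at build time as well; a hypothetical sketch (the struct below is illustrative, not the driver's):

#include <linux/build_bug.h>
#include <linux/types.h>

struct example_ipi_msg {
        u32 msg_id;
        u32 ipi_id;
        u64 ap_inst;
        u32 inst_addr;
        u32 padding;            /* explicit instead of implicit tail padding */
};

/* 4 + 4 + 8 + 4 + 4 bytes; the build breaks if the layout ever drifts */
static_assert(sizeof(struct example_ipi_msg) == 24);
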
diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
index c8ee5e2b4f69..54489f64533e 100644
--- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
+++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
@@ -112,8 +112,6 @@ void mtk_vcodec_dec_set_default_params(struct mtk_vcodec_ctx *ctx)
{
struct mtk_q_data *q_data;

- ctx->dev->vdec_pdata->init_vdec_params(ctx);
-
ctx->m2m_ctx->q_lock = &ctx->dev->dev_mutex;
ctx->fh.m2m_ctx = ctx->m2m_ctx;
ctx->fh.ctrl_handler = &ctx->ctrl_hdl;
@@ -196,6 +194,11 @@ static int vidioc_vdec_querycap(struct file *file, void *priv,
static int vidioc_vdec_subscribe_evt(struct v4l2_fh *fh,
const struct v4l2_event_subscription *sub)
{
+ struct mtk_vcodec_ctx *ctx = fh_to_ctx(fh);
+
+ if (ctx->dev->vdec_pdata->uses_stateless_api)
+ return v4l2_ctrl_subscribe_event(fh, sub);
+
switch (sub->type) {
case V4L2_EVENT_EOS:
return v4l2_event_subscribe(fh, sub, 2, NULL);
diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c
index fe7b2f1739b1..ff8f2a4829a3 100644
--- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c
+++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c
@@ -211,9 +211,12 @@ static int fops_vcodec_open(struct file *file)

dev->dec_capability =
mtk_vcodec_fw_get_vdec_capa(dev->fw_handler);
+
mtk_v4l2_debug(0, "decoder capability %x", dev->dec_capability);
}

+ ctx->dev->vdec_pdata->init_vdec_params(ctx);
+
list_add(&ctx->list, &dev->ctx_list);

mutex_unlock(&dev->dev_mutex);
@@ -391,6 +394,8 @@ static int mtk_vcodec_probe(struct platform_device *pdev)
mtk_v4l2_err("Main device of_platform_populate failed.");
goto err_reg_cont;
}
+ } else {
+ set_bit(MTK_VDEC_CORE, dev->subdev_bitmap);
}

ret = video_register_device(vfd_dec, VFL_TYPE_VIDEO, -1);
diff --git a/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg-hw.c b/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg-hw.c
index 29c604b1b179..718b7b08f93e 100644
--- a/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg-hw.c
+++ b/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg-hw.c
@@ -79,6 +79,11 @@ void mxc_jpeg_enable_irq(void __iomem *reg, int slot)
writel(0xFFFFFFFF, reg + MXC_SLOT_OFFSET(slot, SLOT_IRQ_EN));
}

+void mxc_jpeg_disable_irq(void __iomem *reg, int slot)
+{
+ writel(0x0, reg + MXC_SLOT_OFFSET(slot, SLOT_IRQ_EN));
+}
+
void mxc_jpeg_sw_reset(void __iomem *reg)
{
/*
diff --git a/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg-hw.h b/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg-hw.h
index ae70d3a0dc24..bf4e1973a066 100644
--- a/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg-hw.h
+++ b/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg-hw.h
@@ -53,10 +53,10 @@
#define CAST_REC_REGS_SEL CAST_STATUS4
#define CAST_LUMTH CAST_STATUS5
#define CAST_CHRTH CAST_STATUS6
-#define CAST_NOMFRSIZE_LO CAST_STATUS7
-#define CAST_NOMFRSIZE_HI CAST_STATUS8
-#define CAST_OFBSIZE_LO CAST_STATUS9
-#define CAST_OFBSIZE_HI CAST_STATUS10
+#define CAST_NOMFRSIZE_LO CAST_STATUS16
+#define CAST_NOMFRSIZE_HI CAST_STATUS17
+#define CAST_OFBSIZE_LO CAST_STATUS18
+#define CAST_OFBSIZE_HI CAST_STATUS19

#define MXC_MAX_SLOTS 1 /* TODO use all 4 slots*/
/* JPEG-Decoder Wrapper Slot Registers 0..3 */
@@ -125,6 +125,7 @@ u32 mxc_jpeg_get_offset(void __iomem *reg, int slot);
void mxc_jpeg_enable_slot(void __iomem *reg, int slot);
void mxc_jpeg_set_l_endian(void __iomem *reg, int le);
void mxc_jpeg_enable_irq(void __iomem *reg, int slot);
+void mxc_jpeg_disable_irq(void __iomem *reg, int slot);
int mxc_jpeg_set_input(void __iomem *reg, u32 in_buf, u32 bufsize);
int mxc_jpeg_set_output(void __iomem *reg, u16 out_pitch, u32 out_buf,
u16 w, u16 h);
diff --git a/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.c b/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.c
index d1ec1f4b506b..55c4b29d3e4e 100644
--- a/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.c
+++ b/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.c
@@ -82,6 +82,7 @@ static const struct mxc_jpeg_fmt mxc_formats[] = {
.h_align = 3,
.v_align = 3,
.flags = MXC_JPEG_FMT_TYPE_RAW,
+ .precision = 8,
},
{
.name = "ARGB", /* ARGBARGB packed format */
@@ -93,6 +94,7 @@ static const struct mxc_jpeg_fmt mxc_formats[] = {
.h_align = 3,
.v_align = 3,
.flags = MXC_JPEG_FMT_TYPE_RAW,
+ .precision = 8,
},
{
.name = "YUV420", /* 1st plane = Y, 2nd plane = UV */
@@ -104,6 +106,7 @@ static const struct mxc_jpeg_fmt mxc_formats[] = {
.h_align = 4,
.v_align = 4,
.flags = MXC_JPEG_FMT_TYPE_RAW,
+ .precision = 8,
},
{
.name = "YUV422", /* YUYV */
@@ -115,6 +118,7 @@ static const struct mxc_jpeg_fmt mxc_formats[] = {
.h_align = 4,
.v_align = 3,
.flags = MXC_JPEG_FMT_TYPE_RAW,
+ .precision = 8,
},
{
.name = "YUV444", /* YUVYUV */
@@ -126,6 +130,7 @@ static const struct mxc_jpeg_fmt mxc_formats[] = {
.h_align = 3,
.v_align = 3,
.flags = MXC_JPEG_FMT_TYPE_RAW,
+ .precision = 8,
},
{
.name = "Gray", /* Gray (Y8/Y12) or Single Comp */
@@ -137,6 +142,7 @@ static const struct mxc_jpeg_fmt mxc_formats[] = {
.h_align = 3,
.v_align = 3,
.flags = MXC_JPEG_FMT_TYPE_RAW,
+ .precision = 8,
},
};

@@ -309,6 +315,9 @@ struct mxc_jpeg_src_buf {
/* mxc-jpeg specific */
bool dht_needed;
bool jpeg_parse_error;
+ const struct mxc_jpeg_fmt *fmt;
+ int w;
+ int h;
};

static inline struct mxc_jpeg_src_buf *vb2_to_mxc_buf(struct vb2_buffer *vb)
@@ -321,6 +330,9 @@ static unsigned int debug;
module_param(debug, int, 0644);
MODULE_PARM_DESC(debug, "Debug level (0-3)");

+static void mxc_jpeg_bytesperline(struct mxc_jpeg_q_data *q, u32 precision);
+static void mxc_jpeg_sizeimage(struct mxc_jpeg_q_data *q);
+
static void _bswap16(u16 *a)
{
*a = ((*a & 0x00FF) << 8) | ((*a & 0xFF00) >> 8);
@@ -508,6 +520,7 @@ static bool mxc_jpeg_alloc_slot_data(struct mxc_jpeg_dev *jpeg,
GFP_ATOMIC);
if (!cfg_stm)
goto err;
+ memset(cfg_stm, 0, MXC_JPEG_MAX_CFG_STREAM);
jpeg->slot_data[slot].cfg_stream_vaddr = cfg_stm;

skip_alloc:
@@ -546,6 +559,18 @@ static void mxc_jpeg_free_slot_data(struct mxc_jpeg_dev *jpeg,
jpeg->slot_data[slot].used = false;
}

+static void mxc_jpeg_check_and_set_last_buffer(struct mxc_jpeg_ctx *ctx,
+ struct vb2_v4l2_buffer *src_buf,
+ struct vb2_v4l2_buffer *dst_buf)
+{
+ if (v4l2_m2m_is_last_draining_src_buf(ctx->fh.m2m_ctx, src_buf)) {
+ dst_buf->flags |= V4L2_BUF_FLAG_LAST;
+ v4l2_m2m_mark_stopped(ctx->fh.m2m_ctx);
+ notify_eos(ctx);
+ ctx->header_parsed = false;
+ }
+}
+
static irqreturn_t mxc_jpeg_dec_irq(int irq, void *priv)
{
struct mxc_jpeg_dev *jpeg = priv;
@@ -568,15 +593,8 @@ static irqreturn_t mxc_jpeg_dec_irq(int irq, void *priv)
dev_dbg(dev, "Irq %d on slot %d.\n", irq, slot);

ctx = v4l2_m2m_get_curr_priv(jpeg->m2m_dev);
- if (!ctx) {
- dev_err(dev,
- "Instance released before the end of transaction.\n");
- /* soft reset only resets internal state, not registers */
- mxc_jpeg_sw_reset(reg);
- /* clear all interrupts */
- writel(0xFFFFFFFF, reg + MXC_SLOT_OFFSET(slot, SLOT_STATUS));
+ if (WARN_ON(!ctx))
goto job_unlock;
- }

if (slot != ctx->slot) {
/* TODO investigate when adding multi-instance support */
@@ -620,6 +638,7 @@ static irqreturn_t mxc_jpeg_dec_irq(int irq, void *priv)
dev_dbg(dev, "Decoder DHT cfg finished. Start decoding...\n");
goto job_unlock;
}
+
if (jpeg->mode == MXC_JPEG_ENCODE) {
payload = readl(reg + MXC_SLOT_OFFSET(slot, SLOT_BUF_PTR));
vb2_set_plane_payload(&dst_buf->vb2_buf, 0, payload);
@@ -647,7 +666,9 @@ static irqreturn_t mxc_jpeg_dec_irq(int irq, void *priv)
buf_state = VB2_BUF_STATE_DONE;

buffers_done:
+ mxc_jpeg_disable_irq(reg, ctx->slot);
jpeg->slot_data[slot].used = false; /* unused, but don't free */
+ mxc_jpeg_check_and_set_last_buffer(ctx, src_buf, dst_buf);
v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
v4l2_m2m_buf_done(src_buf, buf_state);
@@ -743,7 +764,13 @@ static unsigned int mxc_jpeg_setup_cfg_stream(void *cfg_stream_vaddr,
u32 fourcc,
u16 w, u16 h)
{
- unsigned int offset = 0;
+ /*
+ * There is a hardware issue that first 128 bytes of configuration data
+ * can't be loaded correctly.
+ * To avoid this issue, we need to write the configuration from
+ * an offset which should be no less than 0x80 (128 bytes).
+ */
+ unsigned int offset = 0x80;
u8 *cfg = (u8 *)cfg_stream_vaddr;
struct mxc_jpeg_sof *sof;
struct mxc_jpeg_sos *sos;
@@ -875,8 +902,8 @@ static void mxc_jpeg_config_enc_desc(struct vb2_buffer *out_buf,
jpeg->slot_data[slot].cfg_stream_size =
mxc_jpeg_setup_cfg_stream(cfg_stream_vaddr,
q_data->fmt->fourcc,
- q_data->w_adjusted,
- q_data->h_adjusted);
+ q_data->w,
+ q_data->h);

/* chain the config descriptor with the encoding descriptor */
cfg_desc->next_descpt_ptr = desc_handle | MXC_NXT_DESCPT_EN;
@@ -916,6 +943,67 @@ static void mxc_jpeg_config_enc_desc(struct vb2_buffer *out_buf,
mxc_jpeg_set_desc(cfg_desc_handle, reg, slot);
}

+static bool mxc_jpeg_source_change(struct mxc_jpeg_ctx *ctx,
+ struct mxc_jpeg_src_buf *jpeg_src_buf)
+{
+ struct device *dev = ctx->mxc_jpeg->dev;
+ struct mxc_jpeg_q_data *q_data_cap;
+
+ if (!jpeg_src_buf->fmt)
+ return false;
+
+ q_data_cap = mxc_jpeg_get_q_data(ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+ if (q_data_cap->fmt != jpeg_src_buf->fmt ||
+ q_data_cap->w != jpeg_src_buf->w ||
+ q_data_cap->h != jpeg_src_buf->h) {
+ dev_dbg(dev, "Detected jpeg res=(%dx%d)->(%dx%d), pixfmt=%c%c%c%c\n",
+ q_data_cap->w, q_data_cap->h,
+ jpeg_src_buf->w, jpeg_src_buf->h,
+ (jpeg_src_buf->fmt->fourcc & 0xff),
+ (jpeg_src_buf->fmt->fourcc >> 8) & 0xff,
+ (jpeg_src_buf->fmt->fourcc >> 16) & 0xff,
+ (jpeg_src_buf->fmt->fourcc >> 24) & 0xff);
+
+ /*
+ * set-up the capture queue with the pixelformat and resolution
+ * detected from the jpeg output stream
+ */
+ q_data_cap->w = jpeg_src_buf->w;
+ q_data_cap->h = jpeg_src_buf->h;
+ q_data_cap->fmt = jpeg_src_buf->fmt;
+ q_data_cap->w_adjusted = q_data_cap->w;
+ q_data_cap->h_adjusted = q_data_cap->h;
+
+ /*
+ * align up the resolution for CAST IP,
+ * but leave the buffer resolution unchanged
+ */
+ v4l_bound_align_image(&q_data_cap->w_adjusted,
+ q_data_cap->w_adjusted, /* adjust up */
+ MXC_JPEG_MAX_WIDTH,
+ q_data_cap->fmt->h_align,
+ &q_data_cap->h_adjusted,
+ q_data_cap->h_adjusted, /* adjust up */
+ MXC_JPEG_MAX_HEIGHT,
+ 0,
+ 0);
+
+ /* setup bytesperline/sizeimage for capture queue */
+ mxc_jpeg_bytesperline(q_data_cap, jpeg_src_buf->fmt->precision);
+ mxc_jpeg_sizeimage(q_data_cap);
+ notify_src_chg(ctx);
+ ctx->source_change = 1;
+ }
+ return ctx->source_change ? true : false;
+}
+
+static int mxc_jpeg_job_ready(void *priv)
+{
+ struct mxc_jpeg_ctx *ctx = priv;
+
+ return ctx->source_change ? 0 : 1;
+}
+
static void mxc_jpeg_device_run(void *priv)
{
struct mxc_jpeg_ctx *ctx = priv;
@@ -954,6 +1042,7 @@ static void mxc_jpeg_device_run(void *priv)
jpeg_src_buf->jpeg_parse_error = true;
}
if (jpeg_src_buf->jpeg_parse_error) {
+ mxc_jpeg_check_and_set_last_buffer(ctx, src_buf, dst_buf);
v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
@@ -963,6 +1052,13 @@ static void mxc_jpeg_device_run(void *priv)

return;
}
+ if (ctx->mxc_jpeg->mode == MXC_JPEG_DECODE) {
+ if (ctx->source_change || mxc_jpeg_source_change(ctx, jpeg_src_buf)) {
+ spin_unlock_irqrestore(&ctx->mxc_jpeg->hw_lock, flags);
+ v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
+ return;
+ }
+ }

mxc_jpeg_enable(reg);
mxc_jpeg_set_l_endian(reg, 1);
@@ -997,44 +1093,33 @@ static void mxc_jpeg_device_run(void *priv)
spin_unlock_irqrestore(&ctx->mxc_jpeg->hw_lock, flags);
}

-static void mxc_jpeg_set_last_buffer_dequeued(struct mxc_jpeg_ctx *ctx)
-{
- struct vb2_queue *q;
-
- ctx->stopped = 1;
- q = v4l2_m2m_get_dst_vq(ctx->fh.m2m_ctx);
- if (!list_empty(&q->done_list))
- return;
-
- q->last_buffer_dequeued = true;
- wake_up(&q->done_wq);
- ctx->stopped = 0;
-}
-
static int mxc_jpeg_decoder_cmd(struct file *file, void *priv,
struct v4l2_decoder_cmd *cmd)
{
struct v4l2_fh *fh = file->private_data;
struct mxc_jpeg_ctx *ctx = mxc_jpeg_fh_to_ctx(fh);
- struct device *dev = ctx->mxc_jpeg->dev;
int ret;

ret = v4l2_m2m_ioctl_try_decoder_cmd(file, fh, cmd);
if (ret < 0)
return ret;

- if (cmd->cmd == V4L2_DEC_CMD_STOP) {
- dev_dbg(dev, "Received V4L2_DEC_CMD_STOP");
- if (v4l2_m2m_num_src_bufs_ready(fh->m2m_ctx) == 0) {
- /* No more src bufs, notify app EOS */
- notify_eos(ctx);
- mxc_jpeg_set_last_buffer_dequeued(ctx);
- } else {
- /* will send EOS later*/
- ctx->stopping = 1;
- }
+ if (!vb2_is_streaming(v4l2_m2m_get_src_vq(fh->m2m_ctx)))
+ return 0;
+
+ ret = v4l2_m2m_ioctl_decoder_cmd(file, priv, cmd);
+ if (ret < 0)
+ return ret;
+
+ if (cmd->cmd == V4L2_DEC_CMD_STOP &&
+ v4l2_m2m_has_stopped(fh->m2m_ctx)) {
+ notify_eos(ctx);
+ ctx->header_parsed = false;
}

+ if (cmd->cmd == V4L2_DEC_CMD_START &&
+ v4l2_m2m_has_stopped(fh->m2m_ctx))
+ vb2_clear_last_buffer_dequeued(&fh->m2m_ctx->cap_q_ctx.q);
return 0;
}

@@ -1043,24 +1128,27 @@ static int mxc_jpeg_encoder_cmd(struct file *file, void *priv,
{
struct v4l2_fh *fh = file->private_data;
struct mxc_jpeg_ctx *ctx = mxc_jpeg_fh_to_ctx(fh);
- struct device *dev = ctx->mxc_jpeg->dev;
int ret;

ret = v4l2_m2m_ioctl_try_encoder_cmd(file, fh, cmd);
if (ret < 0)
return ret;

- if (cmd->cmd == V4L2_ENC_CMD_STOP) {
- dev_dbg(dev, "Received V4L2_ENC_CMD_STOP");
- if (v4l2_m2m_num_src_bufs_ready(fh->m2m_ctx) == 0) {
- /* No more src bufs, notify app EOS */
- notify_eos(ctx);
- mxc_jpeg_set_last_buffer_dequeued(ctx);
- } else {
- /* will send EOS later*/
- ctx->stopping = 1;
- }
- }
+ if (!vb2_is_streaming(v4l2_m2m_get_src_vq(fh->m2m_ctx)) ||
+ !vb2_is_streaming(v4l2_m2m_get_dst_vq(fh->m2m_ctx)))
+ return 0;
+
+ ret = v4l2_m2m_ioctl_encoder_cmd(file, fh, cmd);
+ if (ret < 0)
+ return 0;
+
+ if (cmd->cmd == V4L2_ENC_CMD_STOP &&
+ v4l2_m2m_has_stopped(fh->m2m_ctx))
+ notify_eos(ctx);
+
+ if (cmd->cmd == V4L2_ENC_CMD_START &&
+ v4l2_m2m_has_stopped(fh->m2m_ctx))
+ vb2_clear_last_buffer_dequeued(&fh->m2m_ctx->cap_q_ctx.q);

return 0;
}
@@ -1073,16 +1161,28 @@ static int mxc_jpeg_queue_setup(struct vb2_queue *q,
{
struct mxc_jpeg_ctx *ctx = vb2_get_drv_priv(q);
struct mxc_jpeg_q_data *q_data = NULL;
+ struct mxc_jpeg_q_data tmp_q;
int i;

q_data = mxc_jpeg_get_q_data(ctx, q->type);
if (!q_data)
return -EINVAL;

+ tmp_q.fmt = q_data->fmt;
+ tmp_q.w = q_data->w_adjusted;
+ tmp_q.h = q_data->h_adjusted;
+ for (i = 0; i < MXC_JPEG_MAX_PLANES; i++) {
+ tmp_q.bytesperline[i] = q_data->bytesperline[i];
+ tmp_q.sizeimage[i] = q_data->sizeimage[i];
+ }
+ mxc_jpeg_sizeimage(&tmp_q);
+ for (i = 0; i < MXC_JPEG_MAX_PLANES; i++)
+ tmp_q.sizeimage[i] = max(tmp_q.sizeimage[i], q_data->sizeimage[i]);
+
/* Handle CREATE_BUFS situation - *nplanes != 0 */
if (*nplanes) {
for (i = 0; i < *nplanes; i++) {
- if (sizes[i] < q_data->sizeimage[i])
+ if (sizes[i] < tmp_q.sizeimage[i])
return -EINVAL;
}
return 0;
@@ -1091,7 +1191,7 @@ static int mxc_jpeg_queue_setup(struct vb2_queue *q,
/* Handle REQBUFS situation */
*nplanes = q_data->fmt->colplanes;
for (i = 0; i < *nplanes; i++)
- sizes[i] = q_data->sizeimage[i];
+ sizes[i] = tmp_q.sizeimage[i];

return 0;
}
@@ -1102,6 +1202,10 @@ static int mxc_jpeg_start_streaming(struct vb2_queue *q, unsigned int count)
struct mxc_jpeg_q_data *q_data = mxc_jpeg_get_q_data(ctx, q->type);
int ret;

+ v4l2_m2m_update_start_streaming_state(ctx->fh.m2m_ctx, q);
+
+ if (ctx->mxc_jpeg->mode == MXC_JPEG_DECODE && V4L2_TYPE_IS_CAPTURE(q->type))
+ ctx->source_change = 0;
dev_dbg(ctx->mxc_jpeg->dev, "Start streaming ctx=%p", ctx);
q_data->sequence = 0;

@@ -1131,11 +1235,15 @@ static void mxc_jpeg_stop_streaming(struct vb2_queue *q)
break;
v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_ERROR);
}
- pm_runtime_put_sync(&ctx->mxc_jpeg->pdev->dev);
- if (V4L2_TYPE_IS_OUTPUT(q->type)) {
- ctx->stopping = 0;
- ctx->stopped = 0;
+
+ v4l2_m2m_update_stop_streaming_state(ctx->fh.m2m_ctx, q);
+ if (V4L2_TYPE_IS_OUTPUT(q->type) &&
+ v4l2_m2m_has_stopped(ctx->fh.m2m_ctx)) {
+ notify_eos(ctx);
+ ctx->header_parsed = false;
}
+
+ pm_runtime_put_sync(&ctx->mxc_jpeg->pdev->dev);
}

static int mxc_jpeg_valid_comp_id(struct device *dev,
@@ -1175,14 +1283,17 @@ static u32 mxc_jpeg_get_image_format(struct device *dev,

for (i = 0; i < MXC_JPEG_NUM_FORMATS; i++)
if (mxc_formats[i].subsampling == header->frame.subsampling &&
- mxc_formats[i].nc == header->frame.num_components) {
+ mxc_formats[i].nc == header->frame.num_components &&
+ mxc_formats[i].precision == header->frame.precision) {
fourcc = mxc_formats[i].fourcc;
break;
}
if (fourcc == 0) {
- dev_err(dev, "Could not identify image format nc=%d, subsampling=%d\n",
+ dev_err(dev,
+ "Could not identify image format nc=%d, subsampling=%d, precision=%d\n",
header->frame.num_components,
- header->frame.subsampling);
+ header->frame.subsampling,
+ header->frame.precision);
return fourcc;
}
/*
@@ -1200,26 +1311,29 @@ static u32 mxc_jpeg_get_image_format(struct device *dev,
return fourcc;
}

-static void mxc_jpeg_bytesperline(struct mxc_jpeg_q_data *q,
- u32 precision)
+static void mxc_jpeg_bytesperline(struct mxc_jpeg_q_data *q, u32 precision)
{
/* Bytes distance between the leftmost pixels in two adjacent lines */
if (q->fmt->fourcc == V4L2_PIX_FMT_JPEG) {
/* bytesperline unused for compressed formats */
q->bytesperline[0] = 0;
q->bytesperline[1] = 0;
- } else if (q->fmt->fourcc == V4L2_PIX_FMT_NV12M) {
+ } else if (q->fmt->subsampling == V4L2_JPEG_CHROMA_SUBSAMPLING_420) {
/* When the image format is planar the bytesperline value
* applies to the first plane and is divided by the same factor
* as the width field for the other planes
*/
- q->bytesperline[0] = q->w * (precision / 8) *
- (q->fmt->depth / 8);
+ q->bytesperline[0] = q->w * DIV_ROUND_UP(precision, 8);
q->bytesperline[1] = q->bytesperline[0];
+ } else if (q->fmt->subsampling == V4L2_JPEG_CHROMA_SUBSAMPLING_422) {
+ q->bytesperline[0] = q->w * DIV_ROUND_UP(precision, 8) * 2;
+ q->bytesperline[1] = 0;
+ } else if (q->fmt->subsampling == V4L2_JPEG_CHROMA_SUBSAMPLING_444) {
+ q->bytesperline[0] = q->w * DIV_ROUND_UP(precision, 8) * q->fmt->nc;
+ q->bytesperline[1] = 0;
} else {
- /* single plane formats */
- q->bytesperline[0] = q->w * (precision / 8) *
- (q->fmt->depth / 8);
+ /* grayscale */
+ q->bytesperline[0] = q->w * DIV_ROUND_UP(precision, 8);
q->bytesperline[1] = 0;
}
}
@@ -1245,17 +1359,17 @@ static void mxc_jpeg_sizeimage(struct mxc_jpeg_q_data *q)
}
}

-static int mxc_jpeg_parse(struct mxc_jpeg_ctx *ctx,
- u8 *src_addr, u32 size, bool *dht_needed)
+static int mxc_jpeg_parse(struct mxc_jpeg_ctx *ctx, struct vb2_buffer *vb)
{
struct device *dev = ctx->mxc_jpeg->dev;
- struct mxc_jpeg_q_data *q_data_out, *q_data_cap;
- enum v4l2_buf_type cap_type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
- bool src_chg = false;
+ struct mxc_jpeg_q_data *q_data_out;
u32 fourcc;
struct v4l2_jpeg_header header;
struct mxc_jpeg_sof *psof = NULL;
struct mxc_jpeg_sos *psos = NULL;
+ struct mxc_jpeg_src_buf *jpeg_src_buf = vb2_to_mxc_buf(vb);
+ u8 *src_addr = (u8 *)vb2_plane_vaddr(vb, 0);
+ u32 size = vb2_get_plane_payload(vb, 0);
int ret;

memset(&header, 0, sizeof(header));
@@ -1266,7 +1380,7 @@ static int mxc_jpeg_parse(struct mxc_jpeg_ctx *ctx,
}

/* if DHT marker present, no need to inject default one */
- *dht_needed = (header.num_dht == 0);
+ jpeg_src_buf->dht_needed = (header.num_dht == 0);

q_data_out = mxc_jpeg_get_q_data(ctx,
V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
@@ -1274,21 +1388,15 @@ static int mxc_jpeg_parse(struct mxc_jpeg_ctx *ctx,
dev_warn(dev, "Invalid user resolution 0x0");
dev_warn(dev, "Keeping resolution from JPEG: %dx%d",
header.frame.width, header.frame.height);
- q_data_out->w = header.frame.width;
- q_data_out->h = header.frame.height;
} else if (header.frame.width != q_data_out->w ||
header.frame.height != q_data_out->h) {
dev_err(dev,
"Resolution mismatch: %dx%d (JPEG) versus %dx%d(user)",
header.frame.width, header.frame.height,
q_data_out->w, q_data_out->h);
- return -EINVAL;
- }
- if (header.frame.width % 8 != 0 || header.frame.height % 8 != 0) {
- dev_err(dev, "JPEG width or height not multiple of 8: %dx%d\n",
- header.frame.width, header.frame.height);
- return -EINVAL;
}
+ q_data_out->w = header.frame.width;
+ q_data_out->h = header.frame.height;
if (header.frame.width > MXC_JPEG_MAX_WIDTH ||
header.frame.height > MXC_JPEG_MAX_HEIGHT) {
dev_err(dev, "JPEG width or height should be <= 8192: %dx%d\n",
@@ -1316,51 +1424,13 @@ static int mxc_jpeg_parse(struct mxc_jpeg_ctx *ctx,
if (fourcc == 0)
return -EINVAL;

- /*
- * set-up the capture queue with the pixelformat and resolution
- * detected from the jpeg output stream
- */
- q_data_cap = mxc_jpeg_get_q_data(ctx, cap_type);
- if (q_data_cap->w != header.frame.width ||
- q_data_cap->h != header.frame.height)
- src_chg = true;
- q_data_cap->w = header.frame.width;
- q_data_cap->h = header.frame.height;
- q_data_cap->fmt = mxc_jpeg_find_format(ctx, fourcc);
- q_data_cap->w_adjusted = q_data_cap->w;
- q_data_cap->h_adjusted = q_data_cap->h;
- /*
- * align up the resolution for CAST IP,
- * but leave the buffer resolution unchanged
- */
- v4l_bound_align_image(&q_data_cap->w_adjusted,
- q_data_cap->w_adjusted, /* adjust up */
- MXC_JPEG_MAX_WIDTH,
- q_data_cap->fmt->h_align,
- &q_data_cap->h_adjusted,
- q_data_cap->h_adjusted, /* adjust up */
- MXC_JPEG_MAX_HEIGHT,
- q_data_cap->fmt->v_align,
- 0);
- dev_dbg(dev, "Detected jpeg res=(%dx%d)->(%dx%d), pixfmt=%c%c%c%c\n",
- q_data_cap->w, q_data_cap->h,
- q_data_cap->w_adjusted, q_data_cap->h_adjusted,
- (fourcc & 0xff),
- (fourcc >> 8) & 0xff,
- (fourcc >> 16) & 0xff,
- (fourcc >> 24) & 0xff);
-
- /* setup bytesperline/sizeimage for capture queue */
- mxc_jpeg_bytesperline(q_data_cap, header.frame.precision);
- mxc_jpeg_sizeimage(q_data_cap);
+ jpeg_src_buf->fmt = mxc_jpeg_find_format(ctx, fourcc);
+ jpeg_src_buf->w = header.frame.width;
+ jpeg_src_buf->h = header.frame.height;
+ ctx->header_parsed = true;

- /*
- * if the CAPTURE format was updated with new values, regardless of
- * whether they match the values set by the client or not, signal
- * a source change event
- */
- if (src_chg)
- notify_src_chg(ctx);
+ if (!v4l2_m2m_num_src_bufs_ready(ctx->fh.m2m_ctx))
+ mxc_jpeg_source_change(ctx, jpeg_src_buf);

return 0;
}
@@ -1372,6 +1442,20 @@ static void mxc_jpeg_buf_queue(struct vb2_buffer *vb)
struct mxc_jpeg_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
struct mxc_jpeg_src_buf *jpeg_src_buf;

+ if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) &&
+ vb2_is_streaming(vb->vb2_queue) &&
+ v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) {
+ struct mxc_jpeg_q_data *q_data;
+
+ q_data = mxc_jpeg_get_q_data(ctx, vb->vb2_queue->type);
+ vbuf->field = V4L2_FIELD_NONE;
+ vbuf->sequence = q_data->sequence++;
+ v4l2_m2m_last_buffer_done(ctx->fh.m2m_ctx, vbuf);
+ notify_eos(ctx);
+ ctx->header_parsed = false;
+ return;
+ }
+
if (vb->vb2_queue->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
goto end;

@@ -1381,10 +1465,7 @@ static void mxc_jpeg_buf_queue(struct vb2_buffer *vb)

jpeg_src_buf = vb2_to_mxc_buf(vb);
jpeg_src_buf->jpeg_parse_error = false;
- ret = mxc_jpeg_parse(ctx,
- (u8 *)vb2_plane_vaddr(vb, 0),
- vb2_get_plane_payload(vb, 0),
- &jpeg_src_buf->dht_needed);
+ ret = mxc_jpeg_parse(ctx, vb);
if (ret)
jpeg_src_buf->jpeg_parse_error = true;

@@ -1424,23 +1505,11 @@ static int mxc_jpeg_buf_prepare(struct vb2_buffer *vb)
}
vb2_set_plane_payload(vb, i, sizeimage);
}
- return 0;
-}
-
-static void mxc_jpeg_buf_finish(struct vb2_buffer *vb)
-{
- struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
- struct mxc_jpeg_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
- struct vb2_queue *q = vb->vb2_queue;
-
- if (V4L2_TYPE_IS_OUTPUT(vb->type))
- return;
- if (!ctx->stopped)
- return;
- if (list_empty(&q->done_list)) {
- vbuf->flags |= V4L2_BUF_FLAG_LAST;
- ctx->stopped = 0;
+ if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type)) {
+ vb2_set_plane_payload(vb, 0, 0);
+ vb2_set_plane_payload(vb, 1, 0);
}
+ return 0;
}

static const struct vb2_ops mxc_jpeg_qops = {
@@ -1449,7 +1518,6 @@ static const struct vb2_ops mxc_jpeg_qops = {
.wait_finish = vb2_ops_wait_finish,
.buf_out_validate = mxc_jpeg_buf_out_validate,
.buf_prepare = mxc_jpeg_buf_prepare,
- .buf_finish = mxc_jpeg_buf_finish,
.start_streaming = mxc_jpeg_start_streaming,
.stop_streaming = mxc_jpeg_stop_streaming,
.buf_queue = mxc_jpeg_buf_queue,
@@ -1510,7 +1578,7 @@ static void mxc_jpeg_set_default_params(struct mxc_jpeg_ctx *ctx)
q[i]->h = MXC_JPEG_DEFAULT_HEIGHT;
q[i]->w_adjusted = MXC_JPEG_DEFAULT_WIDTH;
q[i]->h_adjusted = MXC_JPEG_DEFAULT_HEIGHT;
- mxc_jpeg_bytesperline(q[i], 8);
+ mxc_jpeg_bytesperline(q[i], q[i]->fmt->precision);
mxc_jpeg_sizeimage(q[i]);
}
}
@@ -1585,26 +1653,42 @@ static int mxc_jpeg_enum_fmt_vid_cap(struct file *file, void *priv,
struct v4l2_fmtdesc *f)
{
struct mxc_jpeg_ctx *ctx = mxc_jpeg_fh_to_ctx(priv);
+ struct mxc_jpeg_q_data *q_data = mxc_jpeg_get_q_data(ctx, f->type);

- if (ctx->mxc_jpeg->mode == MXC_JPEG_ENCODE)
+ if (ctx->mxc_jpeg->mode == MXC_JPEG_ENCODE) {
return enum_fmt(mxc_formats, MXC_JPEG_NUM_FORMATS, f,
MXC_JPEG_FMT_TYPE_ENC);
- else
+ } else if (!ctx->header_parsed) {
return enum_fmt(mxc_formats, MXC_JPEG_NUM_FORMATS, f,
MXC_JPEG_FMT_TYPE_RAW);
+ } else {
+ /* For the decoder CAPTURE queue, only enumerate the raw formats
+ * supported for the format currently active on OUTPUT
+ * (more precisely what was propagated on capture queue
+ * after jpeg parse on the output buffer)
+ */
+ if (f->index)
+ return -EINVAL;
+ f->pixelformat = q_data->fmt->fourcc;
+ strscpy(f->description, q_data->fmt->name, sizeof(f->description));
+ return 0;
+ }
}

static int mxc_jpeg_enum_fmt_vid_out(struct file *file, void *priv,
struct v4l2_fmtdesc *f)
{
struct mxc_jpeg_ctx *ctx = mxc_jpeg_fh_to_ctx(priv);
+ u32 type = ctx->mxc_jpeg->mode == MXC_JPEG_DECODE ? MXC_JPEG_FMT_TYPE_ENC :
+ MXC_JPEG_FMT_TYPE_RAW;
+ int ret;

+ ret = enum_fmt(mxc_formats, MXC_JPEG_NUM_FORMATS, f, type);
+ if (ret)
+ return ret;
if (ctx->mxc_jpeg->mode == MXC_JPEG_DECODE)
- return enum_fmt(mxc_formats, MXC_JPEG_NUM_FORMATS, f,
- MXC_JPEG_FMT_TYPE_ENC);
- else
- return enum_fmt(mxc_formats, MXC_JPEG_NUM_FORMATS, f,
- MXC_JPEG_FMT_TYPE_RAW);
+ f->flags = V4L2_FMT_FLAG_DYN_RESOLUTION;
+ return 0;
}

static int mxc_jpeg_try_fmt(struct v4l2_format *f, const struct mxc_jpeg_fmt *fmt,
@@ -1624,22 +1708,17 @@ static int mxc_jpeg_try_fmt(struct v4l2_format *f, const struct mxc_jpeg_fmt *fm
pix_mp->num_planes = fmt->colplanes;
pix_mp->pixelformat = fmt->fourcc;

- /*
- * use MXC_JPEG_H_ALIGN instead of fmt->v_align, for vertical
- * alignment, to loosen up the alignment to multiple of 8,
- * otherwise NV12-1080p fails as 1080 is not a multiple of 16
- */
+ pix_mp->width = w;
+ pix_mp->height = h;
v4l_bound_align_image(&w,
- MXC_JPEG_MIN_WIDTH,
- w, /* adjust downwards*/
+ w, /* adjust upwards*/
+ MXC_JPEG_MAX_WIDTH,
fmt->h_align,
&h,
- MXC_JPEG_MIN_HEIGHT,
- h, /* adjust downwards*/
- MXC_JPEG_H_ALIGN,
+ h, /* adjust upwards*/
+ MXC_JPEG_MAX_HEIGHT,
+ 0,
0);
- pix_mp->width = w; /* negotiate the width */
- pix_mp->height = h; /* negotiate the height */

/* get user input into the tmp_q */
tmp_q.w = w;
@@ -1652,7 +1731,7 @@ static int mxc_jpeg_try_fmt(struct v4l2_format *f, const struct mxc_jpeg_fmt *fm
}

/* calculate bytesperline & sizeimage into the tmp_q */
- mxc_jpeg_bytesperline(&tmp_q, 8);
+ mxc_jpeg_bytesperline(&tmp_q, fmt->precision);
mxc_jpeg_sizeimage(&tmp_q);

/* adjust user format according to our calculations */
@@ -1765,35 +1844,19 @@ static int mxc_jpeg_s_fmt(struct mxc_jpeg_ctx *ctx,

q_data->w_adjusted = q_data->w;
q_data->h_adjusted = q_data->h;
- if (jpeg->mode == MXC_JPEG_DECODE) {
- /*
- * align up the resolution for CAST IP,
- * but leave the buffer resolution unchanged
- */
- v4l_bound_align_image(&q_data->w_adjusted,
- q_data->w_adjusted, /* adjust upwards */
- MXC_JPEG_MAX_WIDTH,
- q_data->fmt->h_align,
- &q_data->h_adjusted,
- q_data->h_adjusted, /* adjust upwards */
- MXC_JPEG_MAX_HEIGHT,
- q_data->fmt->v_align,
- 0);
- } else {
- /*
- * align down the resolution for CAST IP,
- * but leave the buffer resolution unchanged
- */
- v4l_bound_align_image(&q_data->w_adjusted,
- MXC_JPEG_MIN_WIDTH,
- q_data->w_adjusted, /* adjust downwards*/
- q_data->fmt->h_align,
- &q_data->h_adjusted,
- MXC_JPEG_MIN_HEIGHT,
- q_data->h_adjusted, /* adjust downwards*/
- q_data->fmt->v_align,
- 0);
- }
+ /*
+ * align up the resolution for CAST IP,
+ * but leave the buffer resolution unchanged
+ */
+ v4l_bound_align_image(&q_data->w_adjusted,
+ q_data->w_adjusted, /* adjust upwards */
+ MXC_JPEG_MAX_WIDTH,
+ q_data->fmt->h_align,
+ &q_data->h_adjusted,
+ q_data->h_adjusted, /* adjust upwards */
+ MXC_JPEG_MAX_HEIGHT,
+ q_data->fmt->v_align,
+ 0);

for (i = 0; i < pix_mp->num_planes; i++) {
q_data->bytesperline[i] = pix_mp->plane_fmt[i].bytesperline;
@@ -1875,27 +1938,6 @@ static int mxc_jpeg_subscribe_event(struct v4l2_fh *fh,
}
}

-static int mxc_jpeg_dqbuf(struct file *file, void *priv,
- struct v4l2_buffer *buf)
-{
- struct v4l2_fh *fh = file->private_data;
- struct mxc_jpeg_ctx *ctx = mxc_jpeg_fh_to_ctx(priv);
- struct device *dev = ctx->mxc_jpeg->dev;
- int num_src_ready = v4l2_m2m_num_src_bufs_ready(fh->m2m_ctx);
- int ret;
-
- dev_dbg(dev, "DQBUF type=%d, index=%d", buf->type, buf->index);
- if (ctx->stopping == 1 && num_src_ready == 0) {
- /* No more src bufs, notify app EOS */
- notify_eos(ctx);
- ctx->stopping = 0;
- mxc_jpeg_set_last_buffer_dequeued(ctx);
- }
-
- ret = v4l2_m2m_dqbuf(file, fh->m2m_ctx, buf);
- return ret;
-}
-
static const struct v4l2_ioctl_ops mxc_jpeg_ioctl_ops = {
.vidioc_querycap = mxc_jpeg_querycap,
.vidioc_enum_fmt_vid_cap = mxc_jpeg_enum_fmt_vid_cap,
@@ -1919,7 +1961,7 @@ static const struct v4l2_ioctl_ops mxc_jpeg_ioctl_ops = {
.vidioc_encoder_cmd = mxc_jpeg_encoder_cmd,

.vidioc_qbuf = v4l2_m2m_ioctl_qbuf,
- .vidioc_dqbuf = mxc_jpeg_dqbuf,
+ .vidioc_dqbuf = v4l2_m2m_ioctl_dqbuf,

.vidioc_create_bufs = v4l2_m2m_ioctl_create_bufs,
.vidioc_prepare_buf = v4l2_m2m_ioctl_prepare_buf,
@@ -1962,6 +2004,7 @@ static const struct v4l2_file_operations mxc_jpeg_fops = {
};

static const struct v4l2_m2m_ops mxc_jpeg_m2m_ops = {
+ .job_ready = mxc_jpeg_job_ready,
.device_run = mxc_jpeg_device_run,
};

@@ -2078,12 +2121,14 @@ static int mxc_jpeg_probe(struct platform_device *pdev)
jpeg->clk_ipg = devm_clk_get(dev, "ipg");
if (IS_ERR(jpeg->clk_ipg)) {
dev_err(dev, "failed to get clock: ipg\n");
+ ret = PTR_ERR(jpeg->clk_ipg);
goto err_clk;
}

jpeg->clk_per = devm_clk_get(dev, "per");
if (IS_ERR(jpeg->clk_per)) {
dev_err(dev, "failed to get clock: per\n");
+ ret = PTR_ERR(jpeg->clk_per);
goto err_clk;
}

diff --git a/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.h b/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.h
index f53f004ba851..542993eb8d5b 100644
--- a/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.h
+++ b/drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.h
@@ -49,6 +49,7 @@ enum mxc_jpeg_mode {
* @h_align: horizontal alignment order (align to 2^h_align)
* @v_align: vertical alignment order (align to 2^v_align)
* @flags: flags describing format applicability
+ * @precision: jpeg sample precision
*/
struct mxc_jpeg_fmt {
const char *name;
@@ -60,6 +61,7 @@ struct mxc_jpeg_fmt {
int h_align;
int v_align;
u32 flags;
+ u8 precision;
};

struct mxc_jpeg_desc {
@@ -90,9 +92,9 @@ struct mxc_jpeg_ctx {
struct mxc_jpeg_q_data cap_q;
struct v4l2_fh fh;
enum mxc_jpeg_enc_state enc_state;
- unsigned int stopping;
- unsigned int stopped;
unsigned int slot;
+ unsigned int source_change;
+ bool header_parsed;
};

struct mxc_jpeg_slot_data {
diff --git a/drivers/media/platform/qcom/camss/camss-csid.c b/drivers/media/platform/qcom/camss/camss-csid.c
index f993f349b66b..80628801cf09 100644
--- a/drivers/media/platform/qcom/camss/camss-csid.c
+++ b/drivers/media/platform/qcom/camss/camss-csid.c
@@ -666,7 +666,7 @@ int msm_csid_subdev_init(struct camss *camss, struct csid_device *csid,
if (csid->num_supplies) {
csid->supplies = devm_kmalloc_array(camss->dev,
csid->num_supplies,
- sizeof(csid->supplies),
+ sizeof(*csid->supplies),
GFP_KERNEL);
if (!csid->supplies)
return -ENOMEM;
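
The camss fix above is the classic sizeof(pointer) versus sizeof(element) slip: sizeof(csid->supplies) sizes the array by the pointer, not by the element it points to. The intended idiom, as a minimal hypothetical sketch:

#include <linux/device.h>
#include <linux/slab.h>

static int example_alloc_table(struct device *dev, unsigned int num, u32 **out)
{
        u32 *vals;

        /* size by the element the pointer points to, never by the pointer */
        vals = devm_kmalloc_array(dev, num, sizeof(*vals), GFP_KERNEL);
        if (!vals)
                return -ENOMEM;

        *out = vals;
        return 0;
}
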
diff --git a/drivers/media/platform/renesas/rcar-vin/rcar-core.c b/drivers/media/platform/renesas/rcar-vin/rcar-core.c
index 64cb05b3907c..ed3ddf36f906 100644
--- a/drivers/media/platform/renesas/rcar-vin/rcar-core.c
+++ b/drivers/media/platform/renesas/rcar-vin/rcar-core.c
@@ -1264,7 +1264,7 @@ static const struct rvin_info rcar_info_r8a77980 = {
};

static const struct rvin_group_route rcar_info_r8a77990_routes[] = {
- { .master = 0, .csi = RVIN_CSI40, .chsel = 0x03 },
+ { .master = 4, .csi = RVIN_CSI40, .chsel = 0x03 },
{ /* Sentinel */ }
};

diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c b/drivers/media/usb/hdpvr/hdpvr-video.c
index 60e57e0f1927..fd7d2a9d0449 100644
--- a/drivers/media/usb/hdpvr/hdpvr-video.c
+++ b/drivers/media/usb/hdpvr/hdpvr-video.c
@@ -409,7 +409,7 @@ static ssize_t hdpvr_read(struct file *file, char __user *buffer, size_t count,
struct hdpvr_device *dev = video_drvdata(file);
struct hdpvr_buffer *buf = NULL;
struct urb *urb;
- unsigned int ret = 0;
+ int ret = 0;
int rem, cnt;

if (*pos)
diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index 675e22895ebe..094e1815209b 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -924,7 +924,7 @@ static __poll_t v4l2_m2m_poll_for_data(struct file *file,
if ((!src_q->streaming || src_q->error ||
list_empty(&src_q->queued_list)) &&
(!dst_q->streaming || dst_q->error ||
- list_empty(&dst_q->queued_list)))
+ (list_empty(&dst_q->queued_list) && !dst_q->last_buffer_dequeued)))
return EPOLLERR;

spin_lock_irqsave(&src_q->done_lock, flags);
diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
index 3993bdd4b519..f8fdf88fb240 100644
--- a/drivers/memstick/core/ms_block.c
+++ b/drivers/memstick/core/ms_block.c
@@ -1341,17 +1341,17 @@ static int msb_ftl_initialize(struct msb_data *msb)
msb->zone_count = msb->block_count / MS_BLOCKS_IN_ZONE;
msb->logical_block_count = msb->zone_count * 496 - 2;

- msb->used_blocks_bitmap = kzalloc(msb->block_count / 8, GFP_KERNEL);
- msb->erased_blocks_bitmap = kzalloc(msb->block_count / 8, GFP_KERNEL);
+ msb->used_blocks_bitmap = bitmap_zalloc(msb->block_count, GFP_KERNEL);
+ msb->erased_blocks_bitmap = bitmap_zalloc(msb->block_count, GFP_KERNEL);
msb->lba_to_pba_table =
kmalloc_array(msb->logical_block_count, sizeof(u16),
GFP_KERNEL);

if (!msb->used_blocks_bitmap || !msb->lba_to_pba_table ||
!msb->erased_blocks_bitmap) {
- kfree(msb->used_blocks_bitmap);
+ bitmap_free(msb->used_blocks_bitmap);
+ bitmap_free(msb->erased_blocks_bitmap);
kfree(msb->lba_to_pba_table);
- kfree(msb->erased_blocks_bitmap);
return -ENOMEM;
}

@@ -1946,7 +1946,8 @@ static DEFINE_MUTEX(msb_disk_lock); /* protects against races in open/release */
static void msb_data_clear(struct msb_data *msb)
{
kfree(msb->boot_page);
- kfree(msb->used_blocks_bitmap);
+ bitmap_free(msb->used_blocks_bitmap);
+ bitmap_free(msb->erased_blocks_bitmap);
kfree(msb->lba_to_pba_table);
kfree(msb->cache);
msb->card = NULL;
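
The ms_block change above moves the block bitmaps from kzalloc(block_count / 8) to the bitmap API: the bitmap helpers read and write whole longs, so block_count / 8 bytes can be too small, while bitmap_zalloc() rounds the allocation up correctly and bitmap_free() is its matching release. A small hypothetical sketch:

#include <linux/bitmap.h>
#include <linux/slab.h>

static unsigned long *example_alloc_block_map(unsigned int block_count)
{
        /* correctly sized even when block_count is not a multiple of BITS_PER_LONG */
        return bitmap_zalloc(block_count, GFP_KERNEL);
}

static void example_free_block_map(unsigned long *map)
{
        bitmap_free(map);       /* NULL-safe counterpart of bitmap_zalloc() */
}
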
diff --git a/drivers/mfd/max77620.c b/drivers/mfd/max77620.c
index fec2096474ad..a6661e07035b 100644
--- a/drivers/mfd/max77620.c
+++ b/drivers/mfd/max77620.c
@@ -419,9 +419,11 @@ static int max77620_initialise_fps(struct max77620_chip *chip)
ret = max77620_config_fps(chip, fps_child);
if (ret < 0) {
of_node_put(fps_child);
+ of_node_put(fps_np);
return ret;
}
}
+ of_node_put(fps_np);

config = chip->enable_global_lpm ? MAX77620_ONOFFCNFG2_SLP_LPM_MSK : 0;
ret = regmap_update_bits(chip->rmap, MAX77620_REG_ONOFFCNFG2,
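
The max77620 fix above, and several of the of_node_put() additions further down (the mmc quirks, the cavium slot probes, physmap-versatile, redboot), apply the same reference-counting rule: a node obtained from of_get_child_by_name() or handed out by for_each_child_of_node() must be released on every exit path, including early breaks. A hypothetical sketch (do_one_child() and the "group" name are placeholders):

#include <linux/of.h>

static int do_one_child(struct device_node *child)
{
        return 0;               /* placeholder for per-child configuration */
}

static int example_parse_children(struct device_node *parent)
{
        struct device_node *group, *child;
        int ret = 0;

        group = of_get_child_by_name(parent, "group"); /* takes a reference */
        if (!group)
                return 0;

        for_each_child_of_node(group, child) {
                ret = do_one_child(child);
                if (ret) {
                        of_node_put(child);     /* drop the iterator's reference */
                        break;
                }
        }

        of_node_put(group);                     /* drop our own reference */
        return ret;
}
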
diff --git a/drivers/mfd/t7l66xb.c b/drivers/mfd/t7l66xb.c
index 5369c67e3280..663ffd4b8570 100644
--- a/drivers/mfd/t7l66xb.c
+++ b/drivers/mfd/t7l66xb.c
@@ -397,11 +397,8 @@ static int t7l66xb_probe(struct platform_device *dev)

static int t7l66xb_remove(struct platform_device *dev)
{
- struct t7l66xb_platform_data *pdata = dev_get_platdata(&dev->dev);
struct t7l66xb *t7l66xb = platform_get_drvdata(dev);
- int ret;

- ret = pdata->disable(dev);
clk_disable_unprepare(t7l66xb->clk48m);
clk_put(t7l66xb->clk48m);
clk_disable_unprepare(t7l66xb->clk32k);
@@ -412,8 +409,7 @@ static int t7l66xb_remove(struct platform_device *dev)
mfd_remove_devices(&dev->dev);
kfree(t7l66xb);

- return ret;
-
+ return 0;
}

static struct platform_driver t7l66xb_platform_driver = {
diff --git a/drivers/misc/cardreader/rtsx_pcr.c b/drivers/misc/cardreader/rtsx_pcr.c
index 2a2619e3c72c..f001d99bf366 100644
--- a/drivers/misc/cardreader/rtsx_pcr.c
+++ b/drivers/misc/cardreader/rtsx_pcr.c
@@ -1507,7 +1507,7 @@ static int rtsx_pci_probe(struct pci_dev *pcidev,
pcr->remap_addr = ioremap(base, len);
if (!pcr->remap_addr) {
ret = -ENOMEM;
- goto free_handle;
+ goto free_idr;
}

pcr->rtsx_resv_buf = dma_alloc_coherent(&(pcidev->dev),
@@ -1570,6 +1570,10 @@ static int rtsx_pci_probe(struct pci_dev *pcidev,
pcr->rtsx_resv_buf, pcr->rtsx_resv_buf_addr);
unmap:
iounmap(pcr->remap_addr);
+free_idr:
+ spin_lock(&rtsx_pci_lock);
+ idr_remove(&rtsx_pci_idr, pcr->id);
+ spin_unlock(&rtsx_pci_lock);
free_handle:
kfree(handle);
free_pcr:
diff --git a/drivers/misc/eeprom/idt_89hpesx.c b/drivers/misc/eeprom/idt_89hpesx.c
index b0cff4b152da..7f430742ce2b 100644
--- a/drivers/misc/eeprom/idt_89hpesx.c
+++ b/drivers/misc/eeprom/idt_89hpesx.c
@@ -909,14 +909,18 @@ static ssize_t idt_dbgfs_csr_write(struct file *filep, const char __user *ubuf,
u32 csraddr, csrval;
char *buf;

+ if (*offp)
+ return 0;
+
/* Copy data from User-space */
buf = kmalloc(count + 1, GFP_KERNEL);
if (!buf)
return -ENOMEM;

- ret = simple_write_to_buffer(buf, count, offp, ubuf, count);
- if (ret < 0)
+ if (copy_from_user(buf, ubuf, count)) {
+ ret = -EFAULT;
goto free_buf;
+ }
buf[count] = 0;

/* Find position of colon in the buffer */
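
Pieced together from the fragments above, the rewritten debugfs handler is a one-shot write: it only accepts offset 0, copies the user data itself, and NUL-terminates before parsing. A stripped-down hypothetical version of that shape:

#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

static ssize_t example_dbgfs_write(struct file *filep, const char __user *ubuf,
                                   size_t count, loff_t *offp)
{
        char *buf;
        ssize_t ret;

        if (*offp)                      /* single-shot: ignore later chunks */
                return 0;

        buf = kmalloc(count + 1, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;

        if (copy_from_user(buf, ubuf, count)) {
                ret = -EFAULT;
                goto free_buf;
        }
        buf[count] = '\0';

        ret = count;                    /* parsing of buf would go here */

free_buf:
        kfree(buf);
        return ret;
}
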
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 23fcb7a16605..0b8d3d25ba7a 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -175,7 +175,7 @@ static inline int mmc_blk_part_switch(struct mmc_card *card,
unsigned int part_type);
static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
struct mmc_card *card,
- int disable_multi,
+ int recovery_mode,
struct mmc_queue *mq);
static void mmc_blk_hsq_req_done(struct mmc_request *mrq);

@@ -1285,7 +1285,7 @@ static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
}

static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
- int disable_multi, bool *do_rel_wr_p,
+ int recovery_mode, bool *do_rel_wr_p,
bool *do_data_tag_p)
{
struct mmc_blk_data *md = mq->blkdata;
@@ -1351,12 +1351,12 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
brq->data.blocks--;

/*
- * After a read error, we redo the request one sector
+ * After a read error, we redo the request one (native) sector
* at a time in order to accurately determine which
* sectors can be read successfully.
*/
- if (disable_multi)
- brq->data.blocks = 1;
+ if (recovery_mode)
+ brq->data.blocks = queue_physical_block_size(mq->queue) >> 9;

/*
* Some controllers have HW issues while operating
@@ -1573,7 +1573,7 @@ static int mmc_blk_cqe_issue_rw_rq(struct mmc_queue *mq, struct request *req)

static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
struct mmc_card *card,
- int disable_multi,
+ int recovery_mode,
struct mmc_queue *mq)
{
u32 readcmd, writecmd;
@@ -1582,7 +1582,7 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
struct mmc_blk_data *md = mq->blkdata;
bool do_rel_wr, do_data_tag;

- mmc_blk_data_prep(mq, mqrq, disable_multi, &do_rel_wr, &do_data_tag);
+ mmc_blk_data_prep(mq, mqrq, recovery_mode, &do_rel_wr, &do_data_tag);

brq->mrq.cmd = &brq->cmd;

@@ -1673,7 +1673,7 @@ static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)

#define MMC_READ_SINGLE_RETRIES 2

-/* Single sector read during recovery */
+/* Single (native) sector read during recovery */
static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
{
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
@@ -1681,6 +1681,7 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
struct mmc_card *card = mq->card;
struct mmc_host *host = card->host;
blk_status_t error = BLK_STS_OK;
+ size_t bytes_per_read = queue_physical_block_size(mq->queue);

do {
u32 status;
@@ -1715,13 +1716,13 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
else
error = BLK_STS_OK;

- } while (blk_update_request(req, error, 512));
+ } while (blk_update_request(req, error, bytes_per_read));

return;

error_exit:
mrq->data->bytes_xfered = 0;
- blk_update_request(req, BLK_STS_IOERR, 512);
+ blk_update_request(req, BLK_STS_IOERR, bytes_per_read);
/* Let it try the remaining request again */
if (mqrq->retries > MMC_MAX_RETRIES - 1)
mqrq->retries = MMC_MAX_RETRIES - 1;
@@ -1862,10 +1863,9 @@ static void mmc_blk_mq_rw_recovery(struct mmc_queue *mq, struct request *req)
return;
}

- /* FIXME: Missing single sector read for large sector size */
- if (!mmc_large_sector(card) && rq_data_dir(req) == READ &&
- brq->data.blocks > 1) {
- /* Read one sector at a time */
+ if (rq_data_dir(req) == READ && brq->data.blocks >
+ queue_physical_block_size(mq->queue) >> 9) {
+ /* Read one (native) sector at a time */
mmc_blk_read_single(mq, req);
return;
}
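
The recovery change above retries failed reads one native sector at a time instead of one 512-byte block, which also removes the old FIXME that excluded large-sector cards from single-read recovery. The unit conversion it relies on, as a hypothetical helper:

#include <linux/blkdev.h>

/* number of 512-byte blocks that make up one native (physical) sector */
static unsigned int example_native_sector_blocks(struct request_queue *q)
{
        /* e.g. queue_physical_block_size() == 4096  ->  4096 >> 9 == 8 */
        return queue_physical_block_size(q) >> 9;
}
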
diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h
index f879dc63d936..be4393988086 100644
--- a/drivers/mmc/core/quirks.h
+++ b/drivers/mmc/core/quirks.h
@@ -163,8 +163,10 @@ static inline bool mmc_fixup_of_compatible_match(struct mmc_card *card,
struct device_node *np;

for_each_child_of_node(mmc_dev(card->host)->of_node, np) {
- if (of_device_is_compatible(np, compatible))
+ if (of_device_is_compatible(np, compatible)) {
+ of_node_put(np);
return true;
+ }
}

return false;
diff --git a/drivers/mmc/host/cavium-octeon.c b/drivers/mmc/host/cavium-octeon.c
index 2c4b2df52adb..12dca91a8ef6 100644
--- a/drivers/mmc/host/cavium-octeon.c
+++ b/drivers/mmc/host/cavium-octeon.c
@@ -277,6 +277,7 @@ static int octeon_mmc_probe(struct platform_device *pdev)
if (ret) {
dev_err(&pdev->dev, "Error populating slots\n");
octeon_mmc_set_shared_power(host, 0);
+ of_node_put(cn);
goto error;
}
i++;
diff --git a/drivers/mmc/host/cavium-thunderx.c b/drivers/mmc/host/cavium-thunderx.c
index 76013bbbcff3..202b1d6da678 100644
--- a/drivers/mmc/host/cavium-thunderx.c
+++ b/drivers/mmc/host/cavium-thunderx.c
@@ -142,8 +142,10 @@ static int thunder_mmc_probe(struct pci_dev *pdev,
continue;

ret = cvm_mmc_of_slot_probe(&host->slot_pdev[i]->dev, host);
- if (ret)
+ if (ret) {
+ of_node_put(child_node);
goto error;
+ }
}
i++;
}
diff --git a/drivers/mmc/host/mxcmmc.c b/drivers/mmc/host/mxcmmc.c
index 40b6878bea6c..b5940083a108 100644
--- a/drivers/mmc/host/mxcmmc.c
+++ b/drivers/mmc/host/mxcmmc.c
@@ -1025,7 +1025,7 @@ static int mxcmci_probe(struct platform_device *pdev)
mmc->max_req_size = mmc->max_blk_size * mmc->max_blk_count;
mmc->max_seg_size = mmc->max_req_size;

- host->devtype = (enum mxcmci_type)of_device_get_match_data(&pdev->dev);
+ host->devtype = (uintptr_t)of_device_get_match_data(&pdev->dev);

/* adjust max_segs after devtype detection */
if (!is_mpc512x_mmc(host))
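
Casting the match data through uintptr_t, as the mxcmmc change does, is the usual way to carry a small integer in the of_device_id .data pointer without tripping warnings such as clang's -Wvoid-pointer-to-enum-cast. A hypothetical sketch:

#include <linux/of_device.h>
#include <linux/types.h>

enum example_devtype {          /* hypothetical device variants */
        EXAMPLE_TYPE_A = 1,
        EXAMPLE_TYPE_B,
};

static enum example_devtype example_get_devtype(struct device *dev)
{
        /* .data holds an integer, not a real pointer: go via uintptr_t */
        return (enum example_devtype)(uintptr_t)of_device_get_match_data(dev);
}
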
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index ddb5ca2f559e..4fb49306275e 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -940,6 +940,10 @@ int renesas_sdhi_probe(struct platform_device *pdev,
if (IS_ERR(priv->clk_cd))
return dev_err_probe(&pdev->dev, PTR_ERR(priv->clk_cd), "cannot get cd clock");

+ priv->rstc = devm_reset_control_get_optional_exclusive(&pdev->dev, NULL);
+ if (IS_ERR(priv->rstc))
+ return PTR_ERR(priv->rstc);
+
priv->pinctrl = devm_pinctrl_get(&pdev->dev);
if (!IS_ERR(priv->pinctrl)) {
priv->pins_default = pinctrl_lookup_state(priv->pinctrl,
@@ -1032,10 +1036,6 @@ int renesas_sdhi_probe(struct platform_device *pdev,
if (ret)
goto efree;

- priv->rstc = devm_reset_control_get_optional_exclusive(&pdev->dev, NULL);
- if (IS_ERR(priv->rstc))
- return PTR_ERR(priv->rstc);
-
ver = sd_ctrl_read16(host, CTL_VERSION);
/* GEN2_SDR104 is first known SDHI to use 32bit block count */
if (ver < SDHI_VER_GEN2_SDR104 && mmc_data->max_blk_count > U16_MAX)
diff --git a/drivers/mmc/host/sdhci-of-at91.c b/drivers/mmc/host/sdhci-of-at91.c
index 10fb4cb2c731..cd0134580a90 100644
--- a/drivers/mmc/host/sdhci-of-at91.c
+++ b/drivers/mmc/host/sdhci-of-at91.c
@@ -100,8 +100,13 @@ static void sdhci_at91_set_clock(struct sdhci_host *host, unsigned int clock)
static void sdhci_at91_set_uhs_signaling(struct sdhci_host *host,
unsigned int timing)
{
- if (timing == MMC_TIMING_MMC_DDR52)
- sdhci_writeb(host, SDMMC_MC1R_DDR, SDMMC_MC1R);
+ u8 mc1r;
+
+ if (timing == MMC_TIMING_MMC_DDR52) {
+ mc1r = sdhci_readb(host, SDMMC_MC1R);
+ mc1r |= SDMMC_MC1R_DDR;
+ sdhci_writeb(host, mc1r, SDMMC_MC1R);
+ }
sdhci_set_uhs_signaling(host, timing);
}

diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c
index d9dc41143bb3..8b3d8119f388 100644
--- a/drivers/mmc/host/sdhci-of-esdhc.c
+++ b/drivers/mmc/host/sdhci-of-esdhc.c
@@ -904,6 +904,7 @@ static int esdhc_signal_voltage_switch(struct mmc_host *mmc,
scfg_node = of_find_matching_node(NULL, scfg_device_ids);
if (scfg_node)
scfg_base = of_iomap(scfg_node, 0);
+ of_node_put(scfg_node);
if (scfg_base) {
sdhciovselcr = SDHCIOVSELCR_TGLEN |
SDHCIOVSELCR_VSELVAL;
diff --git a/drivers/mtd/devices/mtd_dataflash.c b/drivers/mtd/devices/mtd_dataflash.c
index 134e27328597..25bad4318305 100644
--- a/drivers/mtd/devices/mtd_dataflash.c
+++ b/drivers/mtd/devices/mtd_dataflash.c
@@ -112,6 +112,13 @@ static const struct of_device_id dataflash_dt_ids[] = {
MODULE_DEVICE_TABLE(of, dataflash_dt_ids);
#endif

+static const struct spi_device_id dataflash_spi_ids[] = {
+ { .name = "at45", },
+ { .name = "dataflash", },
+ { /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(spi, dataflash_spi_ids);
+
/* ......................................................................... */

/*
@@ -936,6 +943,7 @@ static struct spi_driver dataflash_driver = {

.probe = dataflash_probe,
.remove = dataflash_remove,
+ .id_table = dataflash_spi_ids,

/* FIXME: investigate suspend and resume... */
};
diff --git a/drivers/mtd/devices/st_spi_fsm.c b/drivers/mtd/devices/st_spi_fsm.c
index 983999c020d6..48bda2dd1bb5 100644
--- a/drivers/mtd/devices/st_spi_fsm.c
+++ b/drivers/mtd/devices/st_spi_fsm.c
@@ -2115,10 +2115,12 @@ static int stfsm_probe(struct platform_device *pdev)
(long long)fsm->mtd.size, (long long)(fsm->mtd.size >> 20),
fsm->mtd.erasesize, (fsm->mtd.erasesize >> 10));

- return mtd_device_register(&fsm->mtd, NULL, 0);
-
+ ret = mtd_device_register(&fsm->mtd, NULL, 0);
+ if (ret) {
err_clk_unprepare:
- clk_disable_unprepare(fsm->clk);
+ clk_disable_unprepare(fsm->clk);
+ }
+
return ret;
}

diff --git a/drivers/mtd/hyperbus/rpc-if.c b/drivers/mtd/hyperbus/rpc-if.c
index 6e08ec1d4f09..b70d259e48a7 100644
--- a/drivers/mtd/hyperbus/rpc-if.c
+++ b/drivers/mtd/hyperbus/rpc-if.c
@@ -134,7 +134,7 @@ static int rpcif_hb_probe(struct platform_device *pdev)

error = rpcif_hw_init(&hyperbus->rpc, true);
if (error)
- return error;
+ goto out_disable_rpm;

hyperbus->hbdev.map.size = hyperbus->rpc.size;
hyperbus->hbdev.map.virt = hyperbus->rpc.dirmap;
@@ -145,8 +145,12 @@ static int rpcif_hb_probe(struct platform_device *pdev)
hyperbus->hbdev.np = of_get_next_child(pdev->dev.parent->of_node, NULL);
error = hyperbus_register_device(&hyperbus->hbdev);
if (error)
- rpcif_disable_rpm(&hyperbus->rpc);
+ goto out_disable_rpm;
+
+ return 0;

+out_disable_rpm:
+ rpcif_disable_rpm(&hyperbus->rpc);
return error;
}

diff --git a/drivers/mtd/maps/physmap-versatile.c b/drivers/mtd/maps/physmap-versatile.c
index ad7cd9cfaee0..a1b8b7b25f88 100644
--- a/drivers/mtd/maps/physmap-versatile.c
+++ b/drivers/mtd/maps/physmap-versatile.c
@@ -93,6 +93,7 @@ static int ap_flash_init(struct platform_device *pdev)
return -ENODEV;
}
ebi_base = of_iomap(ebi, 0);
+ of_node_put(ebi);
if (!ebi_base)
return -ENODEV;

@@ -207,6 +208,7 @@ int of_flash_probe_versatile(struct platform_device *pdev,

versatile_flashprot = (enum versatile_flashprot)devid->data;
rmap = syscon_node_to_regmap(sysnp);
+ of_node_put(sysnp);
if (IS_ERR(rmap))
return PTR_ERR(rmap);

diff --git a/drivers/mtd/nand/raw/arasan-nand-controller.c b/drivers/mtd/nand/raw/arasan-nand-controller.c
index 53bd10738418..296fb16c8dc3 100644
--- a/drivers/mtd/nand/raw/arasan-nand-controller.c
+++ b/drivers/mtd/nand/raw/arasan-nand-controller.c
@@ -347,17 +347,17 @@ static int anfc_select_target(struct nand_chip *chip, int target)

/* Update clock frequency */
if (nfc->cur_clk != anand->clk) {
- clk_disable_unprepare(nfc->controller_clk);
- ret = clk_set_rate(nfc->controller_clk, anand->clk);
+ clk_disable_unprepare(nfc->bus_clk);
+ ret = clk_set_rate(nfc->bus_clk, anand->clk);
if (ret) {
dev_err(nfc->dev, "Failed to change clock rate\n");
return ret;
}

- ret = clk_prepare_enable(nfc->controller_clk);
+ ret = clk_prepare_enable(nfc->bus_clk);
if (ret) {
dev_err(nfc->dev,
- "Failed to re-enable the controller clock\n");
+ "Failed to re-enable the bus clock\n");
return ret;
}

@@ -1043,7 +1043,13 @@ static int anfc_setup_interface(struct nand_chip *chip, int target,
DQS_BUFF_SEL_OUT(dqs_mode);
}

- anand->clk = ANFC_XLNX_SDR_DFLT_CORE_CLK;
+ if (nand_interface_is_sdr(conf)) {
+ anand->clk = ANFC_XLNX_SDR_DFLT_CORE_CLK;
+ } else {
+ /* ONFI timings are defined in picoseconds */
+ anand->clk = div_u64((u64)NSEC_PER_SEC * 1000,
+ conf->timings.nvddr.tCK_min);
+ }

/*
* Due to a hardware bug in the ZynqMP SoC, SDR timing modes 0-1 work
diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c
index ac3be92872d0..032180183339 100644
--- a/drivers/mtd/nand/raw/meson_nand.c
+++ b/drivers/mtd/nand/raw/meson_nand.c
@@ -1307,7 +1307,6 @@ static int meson_nfc_nand_chip_cleanup(struct meson_nfc *nfc)
if (ret)
return ret;

- meson_nfc_free_buffer(&meson_chip->nand);
nand_cleanup(&meson_chip->nand);
list_del(&meson_chip->node);
}
diff --git a/drivers/mtd/parsers/ofpart_bcm4908.c b/drivers/mtd/parsers/ofpart_bcm4908.c
index 0eddef4c198e..bb072a0940e4 100644
--- a/drivers/mtd/parsers/ofpart_bcm4908.c
+++ b/drivers/mtd/parsers/ofpart_bcm4908.c
@@ -35,12 +35,15 @@ static long long bcm4908_partitions_fw_offset(void)
err = kstrtoul(s + len + 1, 0, &offset);
if (err) {
pr_err("failed to parse %s\n", s + len + 1);
+ of_node_put(root);
return err;
}

+ of_node_put(root);
return offset << 10;
}

+ of_node_put(root);
return -ENOENT;
}

diff --git a/drivers/mtd/parsers/redboot.c b/drivers/mtd/parsers/redboot.c
index feb44a573d44..a16b42a88581 100644
--- a/drivers/mtd/parsers/redboot.c
+++ b/drivers/mtd/parsers/redboot.c
@@ -58,6 +58,7 @@ static void parse_redboot_of(struct mtd_info *master)
return;

ret = of_property_read_u32(npart, "fis-index-block", &dirblock);
+ of_node_put(npart);
if (ret)
return;

diff --git a/drivers/mtd/sm_ftl.c b/drivers/mtd/sm_ftl.c
index 0cff2cda1b5a..7f955fade838 100644
--- a/drivers/mtd/sm_ftl.c
+++ b/drivers/mtd/sm_ftl.c
@@ -1111,9 +1111,9 @@ static void sm_release(struct mtd_blktrans_dev *dev)
{
struct sm_ftl *ftl = dev->priv;

- mutex_lock(&ftl->mutex);
del_timer_sync(&ftl->timer);
cancel_work_sync(&ftl->flush_work);
+ mutex_lock(&ftl->mutex);
sm_cache_flush(ftl);
mutex_unlock(&ftl->mutex);
}
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index c1630131c734..170182eb431e 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -177,7 +177,7 @@ int spi_nor_controller_ops_write_reg(struct spi_nor *nor, u8 opcode,

static int spi_nor_controller_ops_erase(struct spi_nor *nor, loff_t offs)
{
- if (spi_nor_protocol_is_dtr(nor->write_proto))
+ if (spi_nor_protocol_is_dtr(nor->reg_proto))
return -EOPNOTSUPP;

return nor->controller_ops->erase(nor, offs);
@@ -976,7 +976,7 @@ static int spi_nor_erase_chip(struct spi_nor *nor)
SPI_MEM_OP_NO_DUMMY,
SPI_MEM_OP_NO_DATA);

- spi_nor_spimem_setup_op(nor, &op, nor->write_proto);
+ spi_nor_spimem_setup_op(nor, &op, nor->reg_proto);

ret = spi_mem_exec_op(nor->spimem, &op);
} else {
@@ -1121,7 +1121,7 @@ int spi_nor_erase_sector(struct spi_nor *nor, u32 addr)
SPI_MEM_OP_NO_DUMMY,
SPI_MEM_OP_NO_DATA);

- spi_nor_spimem_setup_op(nor, &op, nor->write_proto);
+ spi_nor_spimem_setup_op(nor, &op, nor->reg_proto);

return spi_mem_exec_op(nor->spimem, &op);
} else if (nor->controller_ops->erase) {
diff --git a/drivers/net/can/dev/netlink.c b/drivers/net/can/dev/netlink.c
index 7633d98e3912..037824011266 100644
--- a/drivers/net/can/dev/netlink.c
+++ b/drivers/net/can/dev/netlink.c
@@ -176,7 +176,8 @@ static int can_changelink(struct net_device *dev, struct nlattr *tb[],
* directly via do_set_bitrate(). Bail out if neither
* is given.
*/
- if (!priv->bittiming_const && !priv->do_set_bittiming)
+ if (!priv->bittiming_const && !priv->do_set_bittiming &&
+ !priv->bitrate_const)
return -EOPNOTSUPP;

memcpy(&bt, nla_data(data[IFLA_CAN_BITTIMING]), sizeof(bt));
@@ -278,7 +279,8 @@ static int can_changelink(struct net_device *dev, struct nlattr *tb[],
* directly via do_set_bitrate(). Bail out if neither
* is given.
*/
- if (!priv->data_bittiming_const && !priv->do_set_data_bittiming)
+ if (!priv->data_bittiming_const && !priv->do_set_data_bittiming &&
+ !priv->data_bitrate_const)
return -EOPNOTSUPP;

memcpy(&dbt, nla_data(data[IFLA_CAN_DATA_BITTIMING]),
diff --git a/drivers/net/can/pch_can.c b/drivers/net/can/pch_can.c
index 888bef03de09..17f8d67ddb18 100644
--- a/drivers/net/can/pch_can.c
+++ b/drivers/net/can/pch_can.c
@@ -489,6 +489,7 @@ static void pch_can_error(struct net_device *ndev, u32 status)
if (!skb)
return;

+ errc = ioread32(&priv->regs->errc);
if (status & PCH_BUS_OFF) {
pch_can_set_tx_all(priv, 0);
pch_can_set_rx_all(priv, 0);
@@ -496,9 +497,11 @@ static void pch_can_error(struct net_device *ndev, u32 status)
cf->can_id |= CAN_ERR_BUSOFF;
priv->can.can_stats.bus_off++;
can_bus_off(ndev);
+ } else {
+ cf->data[6] = errc & PCH_TEC;
+ cf->data[7] = (errc & PCH_REC) >> 8;
}

- errc = ioread32(&priv->regs->errc);
/* Warning interrupt. */
if (status & PCH_EWARN) {
state = CAN_STATE_ERROR_WARNING;
@@ -556,9 +559,6 @@ static void pch_can_error(struct net_device *ndev, u32 status)
break;
}

- cf->data[6] = errc & PCH_TEC;
- cf->data[7] = (errc & PCH_REC) >> 8;
-
priv->can.state = state;
netif_receive_skb(skb);
}
diff --git a/drivers/net/can/rcar/rcar_can.c b/drivers/net/can/rcar/rcar_can.c
index 33e37395379d..4ca5b09f1e5d 100644
--- a/drivers/net/can/rcar/rcar_can.c
+++ b/drivers/net/can/rcar/rcar_can.c
@@ -233,11 +233,8 @@ static void rcar_can_error(struct net_device *ndev)
if (eifr & (RCAR_CAN_EIFR_EWIF | RCAR_CAN_EIFR_EPIF)) {
txerr = readb(&priv->regs->tecr);
rxerr = readb(&priv->regs->recr);
- if (skb) {
+ if (skb)
cf->can_id |= CAN_ERR_CRTL;
- cf->data[6] = txerr;
- cf->data[7] = rxerr;
- }
}
if (eifr & RCAR_CAN_EIFR_BEIF) {
int rx_errors = 0, tx_errors = 0;
@@ -337,6 +334,9 @@ static void rcar_can_error(struct net_device *ndev)
can_bus_off(ndev);
if (skb)
cf->can_id |= CAN_ERR_BUSOFF;
+ } else if (skb) {
+ cf->data[6] = txerr;
+ cf->data[7] = rxerr;
}
if (eifr & RCAR_CAN_EIFR_ORIF) {
netdev_dbg(priv->ndev, "Receive overrun error interrupt\n");
diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
index 966316479485..366a369080ef 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -405,9 +405,6 @@ static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
txerr = priv->read_reg(priv, SJA1000_TXERR);
rxerr = priv->read_reg(priv, SJA1000_RXERR);

- cf->data[6] = txerr;
- cf->data[7] = rxerr;
-
if (isrc & IRQ_DOI) {
/* data overrun interrupt */
netdev_dbg(dev, "data overrun interrupt\n");
@@ -429,6 +426,10 @@ static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
else
state = CAN_STATE_ERROR_ACTIVE;
}
+ if (state != CAN_STATE_BUS_OFF) {
+ cf->data[6] = txerr;
+ cf->data[7] = rxerr;
+ }
if (isrc & IRQ_BEI) {
/* bus error interrupt */
priv->can.can_stats.bus_error++;
diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index a5b2952b8d0f..7dc88a9eca0f 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -672,8 +672,6 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)

txerr = hi3110_read(spi, HI3110_READ_TEC);
rxerr = hi3110_read(spi, HI3110_READ_REC);
- cf->data[6] = txerr;
- cf->data[7] = rxerr;
tx_state = txerr >= rxerr ? new_state : 0;
rx_state = txerr <= rxerr ? new_state : 0;
can_change_state(net, cf, tx_state, rx_state);
@@ -686,6 +684,9 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
hi3110_hw_sleep(spi);
break;
}
+ } else {
+ cf->data[6] = txerr;
+ cf->data[7] = rxerr;
}
}

diff --git a/drivers/net/can/sun4i_can.c b/drivers/net/can/sun4i_can.c
index 25d6d81ab4f4..f3ad585e766f 100644
--- a/drivers/net/can/sun4i_can.c
+++ b/drivers/net/can/sun4i_can.c
@@ -538,11 +538,6 @@ static int sun4i_can_err(struct net_device *dev, u8 isrc, u8 status)
rxerr = (errc >> 16) & 0xFF;
txerr = errc & 0xFF;

- if (skb) {
- cf->data[6] = txerr;
- cf->data[7] = rxerr;
- }
-
if (isrc & SUN4I_INT_DATA_OR) {
/* data overrun interrupt */
netdev_dbg(dev, "data overrun interrupt\n");
@@ -573,6 +568,10 @@ static int sun4i_can_err(struct net_device *dev, u8 isrc, u8 status)
else
state = CAN_STATE_ERROR_ACTIVE;
}
+ if (skb && state != CAN_STATE_BUS_OFF) {
+ cf->data[6] = txerr;
+ cf->data[7] = rxerr;
+ }
if (isrc & SUN4I_INT_BUS_ERR) {
/* bus error interrupt */
netdev_dbg(dev, "bus error interrupt\n");
diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c
index 5d70844ac030..404093468b2f 100644
--- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c
+++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c
@@ -917,8 +917,10 @@ static void kvaser_usb_hydra_update_state(struct kvaser_usb_net_priv *priv,
new_state < CAN_STATE_BUS_OFF)
priv->can.can_stats.restarts++;

- cf->data[6] = bec->txerr;
- cf->data[7] = bec->rxerr;
+ if (new_state != CAN_STATE_BUS_OFF) {
+ cf->data[6] = bec->txerr;
+ cf->data[7] = bec->rxerr;
+ }

netif_rx(skb);
}
@@ -1069,8 +1071,10 @@ kvaser_usb_hydra_error_frame(struct kvaser_usb_net_priv *priv,
shhwtstamps->hwtstamp = hwtstamp;

cf->can_id |= CAN_ERR_BUSERROR;
- cf->data[6] = bec.txerr;
- cf->data[7] = bec.rxerr;
+ if (new_state != CAN_STATE_BUS_OFF) {
+ cf->data[6] = bec.txerr;
+ cf->data[7] = bec.rxerr;
+ }

netif_rx(skb);

diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
index cc809ecd1e62..f551fde16a70 100644
--- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
+++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
@@ -853,8 +853,10 @@ static void kvaser_usb_leaf_rx_error(const struct kvaser_usb *dev,
break;
}

- cf->data[6] = es->txerr;
- cf->data[7] = es->rxerr;
+ if (new_state != CAN_STATE_BUS_OFF) {
+ cf->data[6] = es->txerr;
+ cf->data[7] = es->rxerr;
+ }

netif_rx(skb);
}
diff --git a/drivers/net/can/usb/usb_8dev.c b/drivers/net/can/usb/usb_8dev.c
index b638604bf1ee..ea63dc687fdb 100644
--- a/drivers/net/can/usb/usb_8dev.c
+++ b/drivers/net/can/usb/usb_8dev.c
@@ -439,9 +439,10 @@ static void usb_8dev_rx_err_msg(struct usb_8dev_priv *priv,

if (rx_errors)
stats->rx_errors++;
-
- cf->data[6] = txerr;
- cf->data[7] = rxerr;
+ if (priv->can.state != CAN_STATE_BUS_OFF) {
+ cf->data[6] = txerr;
+ cf->data[7] = rxerr;
+ }

priv->bec.txerr = txerr;
priv->bec.rxerr = rxerr;
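
The CAN changes above (pch_can, rcar_can, sja1000, hi311x, sun4i_can, the Kvaser USB drivers and usb_8dev) all converge on one rule: the TX/RX error counters are still read, but the error frame's data[6]/data[7] bytes are only populated while the controller is not in the bus-off state. A small, self-contained sketch of that rule using the uapi CAN types (the helper name and values are made up):

#include <stdio.h>
#include <linux/can.h>
#include <linux/can/netlink.h>

/* Only report the TX/RX error counters while the controller is not
 * bus-off; in bus-off state data[6]/data[7] are left untouched.
 */
static void report_error_counters(struct can_frame *cf, enum can_state state,
				  __u8 txerr, __u8 rxerr)
{
	if (state != CAN_STATE_BUS_OFF) {
		cf->data[6] = txerr;
		cf->data[7] = rxerr;
	}
}

int main(void)
{
	struct can_frame cf = { .can_id = CAN_ERR_FLAG };

	report_error_counters(&cf, CAN_STATE_ERROR_WARNING, 97, 12);
	printf("txerr=%u rxerr=%u\n", cf.data[6], cf.data[7]);
	return 0;
}
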
diff --git a/drivers/net/dsa/ocelot/Kconfig b/drivers/net/dsa/ocelot/Kconfig
index 220b0b027b55..08db9cf76818 100644
--- a/drivers/net/dsa/ocelot/Kconfig
+++ b/drivers/net/dsa/ocelot/Kconfig
@@ -6,6 +6,7 @@ config NET_DSA_MSCC_FELIX
depends on NET_VENDOR_FREESCALE
depends on HAS_IOMEM
depends on PTP_1588_CLOCK_OPTIONAL
+ depends on NET_SCH_TAPRIO || NET_SCH_TAPRIO=n
select MSCC_OCELOT_SWITCH_LIB
select NET_DSA_TAG_OCELOT_8021Q
select NET_DSA_TAG_OCELOT
diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index faccfb3f0158..d95c2c0a64bd 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -1613,9 +1613,18 @@ static void felix_txtstamp(struct dsa_switch *ds, int port,
static int felix_change_mtu(struct dsa_switch *ds, int port, int new_mtu)
{
struct ocelot *ocelot = ds->priv;
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
+ struct felix *felix = ocelot_to_felix(ocelot);

ocelot_port_set_maxlen(ocelot, port, new_mtu);

+ mutex_lock(&ocelot->tas_lock);
+
+ if (ocelot_port->taprio && felix->info->tas_guard_bands_update)
+ felix->info->tas_guard_bands_update(ocelot, port);
+
+ mutex_unlock(&ocelot->tas_lock);
+
return 0;
}

diff --git a/drivers/net/dsa/ocelot/felix.h b/drivers/net/dsa/ocelot/felix.h
index f083b06fdfe9..b0e4635a8cc2 100644
--- a/drivers/net/dsa/ocelot/felix.h
+++ b/drivers/net/dsa/ocelot/felix.h
@@ -53,6 +53,7 @@ struct felix_info {
struct phylink_link_state *state);
int (*port_setup_tc)(struct dsa_switch *ds, int port,
enum tc_setup_type type, void *type_data);
+ void (*tas_guard_bands_update)(struct ocelot *ocelot, int port);
void (*port_sched_speed_set)(struct ocelot *ocelot, int port,
u32 speed);
struct regmap *(*init_regmap)(struct ocelot *ocelot,
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 4a071f96ea28..d6e8549c12c4 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -1124,9 +1124,212 @@ static void vsc9959_mdio_bus_free(struct ocelot *ocelot)
mdiobus_free(felix->imdio);
}

+/* Extract shortest continuous gate open intervals in ns for each traffic class
+ * of a cyclic tc-taprio schedule. If a gate is always open, the duration is
+ * considered U64_MAX. If the gate is always closed, it is considered 0.
+ */
+static void vsc9959_tas_min_gate_lengths(struct tc_taprio_qopt_offload *taprio,
+ u64 min_gate_len[OCELOT_NUM_TC])
+{
+ struct tc_taprio_sched_entry *entry;
+ u64 gate_len[OCELOT_NUM_TC];
+ u8 gates_ever_opened = 0;
+ int tc, i, n;
+
+ /* Initialize arrays */
+ for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
+ min_gate_len[tc] = U64_MAX;
+ gate_len[tc] = 0;
+ }
+
+ /* If we don't have taprio, consider all gates as permanently open */
+ if (!taprio)
+ return;
+
+ n = taprio->num_entries;
+
+ /* Walk through the gate list twice to determine the length
+ * of consecutively open gates for a traffic class, including
+ * open gates that wrap around. We are just interested in the
+ * minimum window size, and this doesn't change what the
+ * minimum is (if the gate never closes, min_gate_len will
+ * remain U64_MAX).
+ */
+ for (i = 0; i < 2 * n; i++) {
+ entry = &taprio->entries[i % n];
+
+ for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
+ if (entry->gate_mask & BIT(tc)) {
+ gate_len[tc] += entry->interval;
+ gates_ever_opened |= BIT(tc);
+ } else {
+ /* Gate closes now, record a potential new
+ * minimum and reinitialize length
+ */
+ if (min_gate_len[tc] > gate_len[tc] &&
+ gate_len[tc])
+ min_gate_len[tc] = gate_len[tc];
+ gate_len[tc] = 0;
+ }
+ }
+ }
+
+ /* min_gate_len[tc] actually tracks minimum *open* gate time, so for
+ * permanently closed gates, min_gate_len[tc] will still be U64_MAX.
+ * Therefore they are currently indistinguishable from permanently
+ * open gates. Overwrite the gate len with 0 when we know they're
+ * actually permanently closed, i.e. after the loop above.
+ */
+ for (tc = 0; tc < OCELOT_NUM_TC; tc++)
+ if (!(gates_ever_opened & BIT(tc)))
+ min_gate_len[tc] = 0;
+}
+
+/* Update QSYS_PORT_MAX_SDU to make sure the static guard bands added by the
+ * switch (see the ALWAYS_GUARD_BAND_SCH_Q comment) are correct at all MTU
+ * values (the default value is 1518). Also, for traffic class windows smaller
+ * than one MTU sized frame, update QSYS_QMAXSDU_CFG to enable oversized frame
+ * dropping, such that these won't hang the port, as they will never be sent.
+ */
+static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
+{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
+ u64 min_gate_len[OCELOT_NUM_TC];
+ int speed, picos_per_byte;
+ u64 needed_bit_time_ps;
+ u32 val, maxlen;
+ u8 tas_speed;
+ int tc;
+
+ lockdep_assert_held(&ocelot->tas_lock);
+
+ val = ocelot_read_rix(ocelot, QSYS_TAG_CONFIG, port);
+ tas_speed = QSYS_TAG_CONFIG_LINK_SPEED_X(val);
+
+ switch (tas_speed) {
+ case OCELOT_SPEED_10:
+ speed = SPEED_10;
+ break;
+ case OCELOT_SPEED_100:
+ speed = SPEED_100;
+ break;
+ case OCELOT_SPEED_1000:
+ speed = SPEED_1000;
+ break;
+ case OCELOT_SPEED_2500:
+ speed = SPEED_2500;
+ break;
+ default:
+ return;
+ }
+
+ picos_per_byte = (USEC_PER_SEC * 8) / speed;
+
+ val = ocelot_port_readl(ocelot_port, DEV_MAC_MAXLEN_CFG);
+ /* MAXLEN_CFG accounts automatically for VLAN. We need to include it
+ * manually in the bit time calculation, plus the preamble and SFD.
+ */
+ maxlen = val + 2 * VLAN_HLEN;
+ /* Consider the standard Ethernet overhead of 8 octets preamble+SFD,
+ * 4 octets FCS, 12 octets IFG.
+ */
+ needed_bit_time_ps = (maxlen + 24) * picos_per_byte;
+
+ dev_dbg(ocelot->dev,
+ "port %d: max frame size %d needs %llu ps at speed %d\n",
+ port, maxlen, needed_bit_time_ps, speed);
+
+ vsc9959_tas_min_gate_lengths(ocelot_port->taprio, min_gate_len);
+
+ for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
+ u32 max_sdu;
+
+ if (min_gate_len[tc] == U64_MAX /* Gate always open */ ||
+ min_gate_len[tc] * 1000 > needed_bit_time_ps) {
+ /* Setting QMAXSDU_CFG to 0 disables oversized frame
+ * dropping.
+ */
+ max_sdu = 0;
+ dev_dbg(ocelot->dev,
+ "port %d tc %d min gate len %llu"
+ ", sending all frames\n",
+ port, tc, min_gate_len[tc]);
+ } else {
+ /* If traffic class doesn't support a full MTU sized
+ * frame, make sure to enable oversize frame dropping
+ * for frames larger than the smallest that would fit.
+ */
+ max_sdu = div_u64(min_gate_len[tc] * 1000,
+ picos_per_byte);
+ /* A TC gate may be completely closed, which is a
+ * special case where all packets are oversized.
+ * Any limit smaller than 64 octets accomplishes this.
+ */
+ if (!max_sdu)
+ max_sdu = 1;
+ /* Take L1 overhead into account, but just don't allow
+ * max_sdu to go negative or to 0. Here we use 20
+ * because QSYS_MAXSDU_CFG_* already counts the 4 FCS
+ * octets as part of packet size.
+ */
+ if (max_sdu > 20)
+ max_sdu -= 20;
+ dev_info(ocelot->dev,
+ "port %d tc %d min gate length %llu"
+ " ns not enough for max frame size %d at %d"
+ " Mbps, dropping frames over %d"
+ " octets including FCS\n",
+ port, tc, min_gate_len[tc], maxlen, speed,
+ max_sdu);
+ }
+
+ /* ocelot_write_rix is a macro that concatenates
+ * QSYS_MAXSDU_CFG_* with _RSZ, so we need to spell out
+ * the writes to each traffic class
+ */
+ switch (tc) {
+ case 0:
+ ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_0,
+ port);
+ break;
+ case 1:
+ ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_1,
+ port);
+ break;
+ case 2:
+ ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_2,
+ port);
+ break;
+ case 3:
+ ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_3,
+ port);
+ break;
+ case 4:
+ ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_4,
+ port);
+ break;
+ case 5:
+ ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_5,
+ port);
+ break;
+ case 6:
+ ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_6,
+ port);
+ break;
+ case 7:
+ ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_7,
+ port);
+ break;
+ }
+ }
+
+ ocelot_write_rix(ocelot, maxlen, QSYS_PORT_MAX_SDU, port);
+}
+
static void vsc9959_sched_speed_set(struct ocelot *ocelot, int port,
u32 speed)
{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
u8 tas_speed;

switch (speed) {
@@ -1151,6 +1354,13 @@ static void vsc9959_sched_speed_set(struct ocelot *ocelot, int port,
QSYS_TAG_CONFIG_LINK_SPEED(tas_speed),
QSYS_TAG_CONFIG_LINK_SPEED_M,
QSYS_TAG_CONFIG, port);
+
+ mutex_lock(&ocelot->tas_lock);
+
+ if (ocelot_port->taprio)
+ vsc9959_tas_guard_bands_update(ocelot, port);
+
+ mutex_unlock(&ocelot->tas_lock);
}

static void vsc9959_new_base_time(struct ocelot *ocelot, ktime_t base_time,
@@ -1193,10 +1403,13 @@ static void vsc9959_tas_gcl_set(struct ocelot *ocelot, const u32 gcl_ix,
static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
struct tc_taprio_qopt_offload *taprio)
{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
struct timespec64 base_ts;
int ret, i;
u32 val;

+ mutex_lock(&ocelot->tas_lock);
+
if (!taprio->enable) {
ocelot_rmw_rix(ocelot,
QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF),
@@ -1204,15 +1417,25 @@ static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
QSYS_TAG_CONFIG_INIT_GATE_STATE_M,
QSYS_TAG_CONFIG, port);

+ taprio_offload_free(ocelot_port->taprio);
+ ocelot_port->taprio = NULL;
+
+ vsc9959_tas_guard_bands_update(ocelot, port);
+
+ mutex_unlock(&ocelot->tas_lock);
return 0;
}

if (taprio->cycle_time > NSEC_PER_SEC ||
- taprio->cycle_time_extension >= NSEC_PER_SEC)
- return -EINVAL;
+ taprio->cycle_time_extension >= NSEC_PER_SEC) {
+ ret = -EINVAL;
+ goto err;
+ }

- if (taprio->num_entries > VSC9959_TAS_GCL_ENTRY_MAX)
- return -ERANGE;
+ if (taprio->num_entries > VSC9959_TAS_GCL_ENTRY_MAX) {
+ ret = -ERANGE;
+ goto err;
+ }

/* Enable guard band. The switch will schedule frames without taking
* their length into account. Thus we'll always need to enable the
@@ -1233,8 +1456,10 @@ static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
* config is pending, need reset the TAS module
*/
val = ocelot_read(ocelot, QSYS_PARAM_STATUS_REG_8);
- if (val & QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING)
- return -EBUSY;
+ if (val & QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING) {
+ ret = -EBUSY;
+ goto err;
+ }

ocelot_rmw_rix(ocelot,
QSYS_TAG_CONFIG_ENABLE |
@@ -1267,10 +1492,71 @@ static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
ret = readx_poll_timeout(vsc9959_tas_read_cfg_status, ocelot, val,
!(val & QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE),
10, 100000);
+ if (ret)
+ goto err;
+
+ ocelot_port->taprio = taprio_offload_get(taprio);
+ vsc9959_tas_guard_bands_update(ocelot, port);
+
+err:
+ mutex_unlock(&ocelot->tas_lock);

return ret;
}

+static void vsc9959_tas_clock_adjust(struct ocelot *ocelot)
+{
+ struct tc_taprio_qopt_offload *taprio;
+ struct ocelot_port *ocelot_port;
+ struct timespec64 base_ts;
+ int port;
+ u32 val;
+
+ mutex_lock(&ocelot->tas_lock);
+
+ for (port = 0; port < ocelot->num_phys_ports; port++) {
+ ocelot_port = ocelot->ports[port];
+ taprio = ocelot_port->taprio;
+ if (!taprio)
+ continue;
+
+ ocelot_rmw(ocelot,
+ QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM(port),
+ QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM_M,
+ QSYS_TAS_PARAM_CFG_CTRL);
+
+ ocelot_rmw_rix(ocelot,
+ QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF),
+ QSYS_TAG_CONFIG_ENABLE |
+ QSYS_TAG_CONFIG_INIT_GATE_STATE_M,
+ QSYS_TAG_CONFIG, port);
+
+ vsc9959_new_base_time(ocelot, taprio->base_time,
+ taprio->cycle_time, &base_ts);
+
+ ocelot_write(ocelot, base_ts.tv_nsec, QSYS_PARAM_CFG_REG_1);
+ ocelot_write(ocelot, lower_32_bits(base_ts.tv_sec),
+ QSYS_PARAM_CFG_REG_2);
+ val = upper_32_bits(base_ts.tv_sec);
+ ocelot_rmw(ocelot,
+ QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB(val),
+ QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB_M,
+ QSYS_PARAM_CFG_REG_3);
+
+ ocelot_rmw(ocelot, QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE,
+ QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE,
+ QSYS_TAS_PARAM_CFG_CTRL);
+
+ ocelot_rmw_rix(ocelot,
+ QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF) |
+ QSYS_TAG_CONFIG_ENABLE,
+ QSYS_TAG_CONFIG_ENABLE |
+ QSYS_TAG_CONFIG_INIT_GATE_STATE_M,
+ QSYS_TAG_CONFIG, port);
+ }
+ mutex_unlock(&ocelot->tas_lock);
+}
+
static int vsc9959_qos_port_cbs_set(struct dsa_switch *ds, int port,
struct tc_cbs_qopt_offload *cbs_qopt)
{
@@ -2210,6 +2496,7 @@ static const struct ocelot_ops vsc9959_ops = {
.psfp_filter_del = vsc9959_psfp_filter_del,
.psfp_stats_get = vsc9959_psfp_stats_get,
.cut_through_fwd = vsc9959_cut_through_fwd,
+ .tas_clock_adjust = vsc9959_tas_clock_adjust,
};

static const struct felix_info felix_info_vsc9959 = {
@@ -2237,6 +2524,7 @@ static const struct felix_info felix_info_vsc9959 = {
.port_modes = vsc9959_port_modes,
.port_setup_tc = vsc9959_port_setup_tc,
.port_sched_speed_set = vsc9959_sched_speed_set,
+ .tas_guard_bands_update = vsc9959_tas_guard_bands_update,
.init_regmap = ocelot_regmap_init,
};

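As a worked example of the guard-band arithmetic in vsc9959_tas_guard_bands_update() above, the sketch below plugs in an assumed 1 Gbps link, the default 1518-byte MAXLEN_CFG and a traffic class whose shortest open-gate window is 10 us; none of these inputs come from the patch. It reproduces the picoseconds-per-byte, needed-bit-time and QSYS_QMAXSDU_CFG computations and prints the resulting per-TC limit (1230 octets including FCS for these numbers):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t speed_mbps = 1000;				/* SPEED_1000 */
	uint64_t picos_per_byte = (1000000ULL * 8) / speed_mbps;	/* 8000 */
	uint64_t maxlen = 1518 + 2 * 4;				/* + 2 * VLAN_HLEN */
	uint64_t needed_ps = (maxlen + 24) * picos_per_byte;	/* preamble+SFD, FCS, IFG */
	uint64_t min_gate_ns = 10000;				/* 10 us open window */
	uint64_t max_sdu;

	if (min_gate_ns * 1000 > needed_ps) {
		max_sdu = 0;		/* window fits a full frame: no dropping */
	} else {
		max_sdu = (min_gate_ns * 1000) / picos_per_byte;
		if (!max_sdu)
			max_sdu = 1;	/* permanently closed gate: drop everything */
		if (max_sdu > 20)
			max_sdu -= 20;	/* L1 overhead, FCS already counted by hw */
	}
	printf("max_sdu = %llu octets\n", (unsigned long long)max_sdu);
	return 0;
}
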
diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
index ec167af0e3b2..5041265f4226 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -946,7 +946,7 @@ static unsigned int ag71xx_max_frame_len(unsigned int mtu)
return ETH_HLEN + VLAN_HLEN + mtu + ETH_FCS_LEN;
}

-static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac)
+static void ag71xx_hw_set_macaddr(struct ag71xx *ag, const unsigned char *mac)
{
u32 t;

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_dev.h b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
index fb3e89141a0d..a4fbf44f944c 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
@@ -95,9 +95,6 @@ struct hinic_dev {
u16 sq_depth;
u16 rq_depth;

- struct hinic_txq_stats tx_stats;
- struct hinic_rxq_stats rx_stats;
-
u8 rss_tmpl_idx;
u8 rss_hash_engine;
u16 num_rss;
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index 05329292d940..c23ee2ddbce3 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -62,8 +62,6 @@ MODULE_PARM_DESC(rx_weight, "Number Rx packets for NAPI budget (default=64)");

#define HINIC_LRO_RX_TIMER_DEFAULT 16

-#define VLAN_BITMAP_SIZE(nic_dev) (ALIGN(VLAN_N_VID, 8) / 8)
-
#define work_to_rx_mode_work(work) \
container_of(work, struct hinic_rx_mode_work, work)

@@ -82,56 +80,44 @@ static int set_features(struct hinic_dev *nic_dev,
netdev_features_t pre_features,
netdev_features_t features, bool force_change);

-static void update_rx_stats(struct hinic_dev *nic_dev, struct hinic_rxq *rxq)
+static void gather_rx_stats(struct hinic_rxq_stats *nic_rx_stats, struct hinic_rxq *rxq)
{
- struct hinic_rxq_stats *nic_rx_stats = &nic_dev->rx_stats;
struct hinic_rxq_stats rx_stats;

- u64_stats_init(&rx_stats.syncp);
-
hinic_rxq_get_stats(rxq, &rx_stats);

- u64_stats_update_begin(&nic_rx_stats->syncp);
nic_rx_stats->bytes += rx_stats.bytes;
nic_rx_stats->pkts += rx_stats.pkts;
nic_rx_stats->errors += rx_stats.errors;
nic_rx_stats->csum_errors += rx_stats.csum_errors;
nic_rx_stats->other_errors += rx_stats.other_errors;
- u64_stats_update_end(&nic_rx_stats->syncp);
-
- hinic_rxq_clean_stats(rxq);
}

-static void update_tx_stats(struct hinic_dev *nic_dev, struct hinic_txq *txq)
+static void gather_tx_stats(struct hinic_txq_stats *nic_tx_stats, struct hinic_txq *txq)
{
- struct hinic_txq_stats *nic_tx_stats = &nic_dev->tx_stats;
struct hinic_txq_stats tx_stats;

- u64_stats_init(&tx_stats.syncp);
-
hinic_txq_get_stats(txq, &tx_stats);

- u64_stats_update_begin(&nic_tx_stats->syncp);
nic_tx_stats->bytes += tx_stats.bytes;
nic_tx_stats->pkts += tx_stats.pkts;
nic_tx_stats->tx_busy += tx_stats.tx_busy;
nic_tx_stats->tx_wake += tx_stats.tx_wake;
nic_tx_stats->tx_dropped += tx_stats.tx_dropped;
nic_tx_stats->big_frags_pkts += tx_stats.big_frags_pkts;
- u64_stats_update_end(&nic_tx_stats->syncp);
-
- hinic_txq_clean_stats(txq);
}

-static void update_nic_stats(struct hinic_dev *nic_dev)
+static void gather_nic_stats(struct hinic_dev *nic_dev,
+ struct hinic_rxq_stats *nic_rx_stats,
+ struct hinic_txq_stats *nic_tx_stats)
{
int i, num_qps = hinic_hwdev_num_qps(nic_dev->hwdev);

for (i = 0; i < num_qps; i++)
- update_rx_stats(nic_dev, &nic_dev->rxqs[i]);
+ gather_rx_stats(nic_rx_stats, &nic_dev->rxqs[i]);

for (i = 0; i < num_qps; i++)
- update_tx_stats(nic_dev, &nic_dev->txqs[i]);
+ gather_tx_stats(nic_tx_stats, &nic_dev->txqs[i]);
}

/**
@@ -560,8 +546,6 @@ int hinic_close(struct net_device *netdev)
netif_carrier_off(netdev);
netif_tx_disable(netdev);

- update_nic_stats(nic_dev);
-
up(&nic_dev->mgmt_lock);

if (!HINIC_IS_VF(nic_dev->hwdev->hwif))
@@ -855,26 +839,19 @@ static void hinic_get_stats64(struct net_device *netdev,
struct rtnl_link_stats64 *stats)
{
struct hinic_dev *nic_dev = netdev_priv(netdev);
- struct hinic_rxq_stats *nic_rx_stats;
- struct hinic_txq_stats *nic_tx_stats;
-
- nic_rx_stats = &nic_dev->rx_stats;
- nic_tx_stats = &nic_dev->tx_stats;
-
- down(&nic_dev->mgmt_lock);
+ struct hinic_rxq_stats nic_rx_stats = {};
+ struct hinic_txq_stats nic_tx_stats = {};

if (nic_dev->flags & HINIC_INTF_UP)
- update_nic_stats(nic_dev);
-
- up(&nic_dev->mgmt_lock);
+ gather_nic_stats(nic_dev, &nic_rx_stats, &nic_tx_stats);

- stats->rx_bytes = nic_rx_stats->bytes;
- stats->rx_packets = nic_rx_stats->pkts;
- stats->rx_errors = nic_rx_stats->errors;
+ stats->rx_bytes = nic_rx_stats.bytes;
+ stats->rx_packets = nic_rx_stats.pkts;
+ stats->rx_errors = nic_rx_stats.errors;

- stats->tx_bytes = nic_tx_stats->bytes;
- stats->tx_packets = nic_tx_stats->pkts;
- stats->tx_errors = nic_tx_stats->tx_dropped;
+ stats->tx_bytes = nic_tx_stats.bytes;
+ stats->tx_packets = nic_tx_stats.pkts;
+ stats->tx_errors = nic_tx_stats.tx_dropped;
}

static int hinic_set_features(struct net_device *netdev,
@@ -1173,8 +1150,6 @@ static void hinic_free_intr_coalesce(struct hinic_dev *nic_dev)
static int nic_dev_init(struct pci_dev *pdev)
{
struct hinic_rx_mode_work *rx_mode_work;
- struct hinic_txq_stats *tx_stats;
- struct hinic_rxq_stats *rx_stats;
struct hinic_dev *nic_dev;
struct net_device *netdev;
struct hinic_hwdev *hwdev;
@@ -1236,15 +1211,8 @@ static int nic_dev_init(struct pci_dev *pdev)

sema_init(&nic_dev->mgmt_lock, 1);

- tx_stats = &nic_dev->tx_stats;
- rx_stats = &nic_dev->rx_stats;
-
- u64_stats_init(&tx_stats->syncp);
- u64_stats_init(&rx_stats->syncp);
-
- nic_dev->vlan_bitmap = devm_kzalloc(&pdev->dev,
- VLAN_BITMAP_SIZE(nic_dev),
- GFP_KERNEL);
+ nic_dev->vlan_bitmap = devm_bitmap_zalloc(&pdev->dev, VLAN_N_VID,
+ GFP_KERNEL);
if (!nic_dev->vlan_bitmap) {
err = -ENOMEM;
goto err_vlan_bitmap;
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_rx.c b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
index b33ed4d92b71..b6bce622a6a8 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
@@ -73,7 +73,6 @@ void hinic_rxq_get_stats(struct hinic_rxq *rxq, struct hinic_rxq_stats *stats)
struct hinic_rxq_stats *rxq_stats = &rxq->rxq_stats;
unsigned int start;

- u64_stats_update_begin(&stats->syncp);
do {
start = u64_stats_fetch_begin(&rxq_stats->syncp);
stats->pkts = rxq_stats->pkts;
@@ -83,7 +82,6 @@ void hinic_rxq_get_stats(struct hinic_rxq *rxq, struct hinic_rxq_stats *stats)
stats->csum_errors = rxq_stats->csum_errors;
stats->other_errors = rxq_stats->other_errors;
} while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
- u64_stats_update_end(&stats->syncp);
}

/**
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
index 8d59babbf476..082eb2a5088d 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
@@ -98,7 +98,6 @@ void hinic_txq_get_stats(struct hinic_txq *txq, struct hinic_txq_stats *stats)
struct hinic_txq_stats *txq_stats = &txq->txq_stats;
unsigned int start;

- u64_stats_update_begin(&stats->syncp);
do {
start = u64_stats_fetch_begin(&txq_stats->syncp);
stats->pkts = txq_stats->pkts;
@@ -108,7 +107,6 @@ void hinic_txq_get_stats(struct hinic_txq *txq, struct hinic_txq_stats *stats)
stats->tx_dropped = txq_stats->tx_dropped;
stats->big_frags_pkts = txq_stats->big_frags_pkts;
} while (u64_stats_fetch_retry(&txq_stats->syncp, start));
- u64_stats_update_end(&stats->syncp);
}

/**
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 0ea0361cd86b..a988c08e906f 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -92,6 +92,7 @@ struct iavf_vsi {
#define IAVF_HKEY_ARRAY_SIZE ((IAVF_VFQF_HKEY_MAX_INDEX + 1) * 4)
#define IAVF_HLUT_ARRAY_SIZE ((IAVF_VFQF_HLUT_MAX_INDEX + 1) * 4)
#define IAVF_MBPS_DIVISOR 125000 /* divisor to convert to Mbps */
+#define IAVF_MBPS_QUANTA 50

#define IAVF_VIRTCHNL_VF_RESOURCE_SIZE (sizeof(struct virtchnl_vf_resource) + \
(IAVF_MAX_VF_VSI * \
@@ -430,6 +431,11 @@ struct iavf_adapter {
/* lock to protect access to the cloud filter list */
spinlock_t cloud_filter_list_lock;
u16 num_cloud_filters;
+ /* snapshot of "num_active_queues" before setup_tc for qdisc add
+ * is invoked. This information is useful during qdisc del flow,
+ * to restore correct number of queues
+ */
+ int orig_num_active_queues;

#define IAVF_MAX_FDIR_FILTERS 128 /* max allowed Flow Director filters */
u16 fdir_active_fltr;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 2e2c153ce46a..3dbfaead2ac7 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -3322,6 +3322,7 @@ static int iavf_validate_ch_config(struct iavf_adapter *adapter,
struct tc_mqprio_qopt_offload *mqprio_qopt)
{
u64 total_max_rate = 0;
+ u32 tx_rate_rem = 0;
int i, num_qps = 0;
u64 tx_rate = 0;
int ret = 0;
@@ -3336,12 +3337,32 @@ static int iavf_validate_ch_config(struct iavf_adapter *adapter,
return -EINVAL;
if (mqprio_qopt->min_rate[i]) {
dev_err(&adapter->pdev->dev,
- "Invalid min tx rate (greater than 0) specified\n");
+ "Invalid min tx rate (greater than 0) specified for TC%d\n",
+ i);
return -EINVAL;
}
- /*convert to Mbps */
+
+ /* convert to Mbps */
tx_rate = div_u64(mqprio_qopt->max_rate[i],
IAVF_MBPS_DIVISOR);
+
+ if (mqprio_qopt->max_rate[i] &&
+ tx_rate < IAVF_MBPS_QUANTA) {
+ dev_err(&adapter->pdev->dev,
+ "Invalid max tx rate for TC%d, minimum %dMbps\n",
+ i, IAVF_MBPS_QUANTA);
+ return -EINVAL;
+ }
+
+ (void)div_u64_rem(tx_rate, IAVF_MBPS_QUANTA, &tx_rate_rem);
+
+ if (tx_rate_rem != 0) {
+ dev_err(&adapter->pdev->dev,
+ "Invalid max tx rate for TC%d, not divisible by %d\n",
+ i, IAVF_MBPS_QUANTA);
+ return -EINVAL;
+ }
+
total_max_rate += tx_rate;
num_qps += mqprio_qopt->qopt.count[i];
}
@@ -3408,6 +3429,7 @@ static int __iavf_setup_tc(struct net_device *netdev, void *type_data)
netif_tx_disable(netdev);
iavf_del_all_cloud_filters(adapter);
adapter->aq_required = IAVF_FLAG_AQ_DISABLE_CHANNELS;
+ total_qps = adapter->orig_num_active_queues;
goto exit;
} else {
return -EINVAL;
@@ -3451,7 +3473,21 @@ static int __iavf_setup_tc(struct net_device *netdev, void *type_data)
adapter->ch_config.ch_info[i].offset = 0;
}
}
+
+ /* Take snapshot of original config such as "num_active_queues"
+ * It is used later when delete ADQ flow is exercised, so that
+ * once delete ADQ flow completes, VF shall go back to its
+ * original queue configuration
+ */
+
+ adapter->orig_num_active_queues = adapter->num_active_queues;
+
+ /* Store queue info based on TC so that VF gets configured
+ * with correct number of queues when VF completes ADQ config
+ * flow
+ */
adapter->ch_config.total_qps = total_qps;
+
netif_tx_stop_all_queues(netdev);
netif_tx_disable(netdev);
adapter->aq_required |= IAVF_FLAG_AQ_ENABLE_CHANNELS;
@@ -3468,6 +3504,12 @@ static int __iavf_setup_tc(struct net_device *netdev, void *type_data)
}
}
exit:
+ if (test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section))
+ return 0;
+
+ netif_set_real_num_rx_queues(netdev, total_qps);
+ netif_set_real_num_tx_queues(netdev, total_qps);
+
return ret;
}

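The new validation above converts each mqprio max_rate (bytes per second, as the IAVF_MBPS_DIVISOR comment indicates) to Mbps and then requires it to be at least IAVF_MBPS_QUANTA (50 Mbps) and a multiple of it. A standalone sketch of that check with two example rates (the inputs are illustrative, not from the patch):

#include <stdint.h>
#include <stdio.h>

static int validate_max_rate(uint64_t max_rate_Bps)
{
	uint64_t mbps = max_rate_Bps / 125000;	/* IAVF_MBPS_DIVISOR */

	if (max_rate_Bps && mbps < 50)		/* below IAVF_MBPS_QUANTA */
		return -1;
	if (mbps % 50)				/* not a multiple of the quanta */
		return -1;
	return 0;
}

int main(void)
{
	printf("%d\n", validate_max_rate(25000000));	/* 200 Mbps -> accepted (0) */
	printf("%d\n", validate_max_rate(9000000));	/* 72 Mbps -> rejected (-1) */
	return 0;
}
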
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 522462f41067..f48938af960c 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -419,7 +419,7 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
IFF_PROMISC;
goto out_promisc;
}
- if (vsi->current_netdev_flags &
+ if (vsi->netdev->features &
NETIF_F_HW_VLAN_CTAG_FILTER)
vlan_ops->ena_rx_filtering(vsi);
}
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index 25b8f6f726eb..73960c8f9dba 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -4874,7 +4874,7 @@ ice_find_free_recp_res_idx(struct ice_hw *hw, const unsigned long *profiles,
bitmap_zero(recipes, ICE_MAX_NUM_RECIPES);
bitmap_zero(used_idx, ICE_MAX_FV_WORDS);

- bitmap_set(possible_idx, 0, ICE_MAX_FV_WORDS);
+ bitmap_fill(possible_idx, ICE_MAX_FV_WORDS);

/* For each profile we are going to associate the recipe with, add the
* recipes that are associated with that profile. This will give us
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index d0d14325a0d9..3ddf76aea3f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -109,7 +109,7 @@ struct page_pool;
#define MLX5E_REQUIRED_WQE_MTTS (MLX5_ALIGN_MTTS(MLX5_MPWRQ_PAGES_PER_WQE + 1))
#define MLX5E_REQUIRED_MTTS(wqes) (wqes * MLX5E_REQUIRED_WQE_MTTS)
#define MLX5E_MAX_RQ_NUM_MTTS \
- ((1 << 16) * 2) /* So that MLX5_MTT_OCTW(num_mtts) fits into u16 */
+ (ALIGN_DOWN(U16_MAX, 4) * 2) /* So that MLX5_MTT_OCTW(num_mtts) fits into u16 */
#define MLX5E_ORDER2_MAX_PACKET_MTU (order_base_2(10 * 1024))
#define MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW \
(ilog2(MLX5E_MAX_RQ_NUM_MTTS / MLX5E_REQUIRED_WQE_MTTS))
@@ -174,8 +174,8 @@ struct page_pool;
ALIGN_DOWN(MLX5E_KLM_MAX_ENTRIES_PER_WQE(wqe_size), MLX5_UMR_KLM_ALIGNMENT)

#define MLX5E_MAX_KLM_PER_WQE(mdev) \
- MLX5E_KLM_ENTRIES_PER_WQE(mlx5e_get_sw_max_sq_mpw_wqebbs(mlx5e_get_max_sq_wqebbs(mdev)) \
- << MLX5_MKEY_BSF_OCTO_SIZE)
+ MLX5E_KLM_ENTRIES_PER_WQE(MLX5_SEND_WQE_BB * \
+ mlx5e_get_sw_max_sq_mpw_wqebbs(mlx5e_get_max_sq_wqebbs(mdev)))

#define MLX5E_MSG_LEVEL NETIF_MSG_LINK

@@ -233,7 +233,7 @@ static inline u16 mlx5e_get_max_sq_wqebbs(struct mlx5_core_dev *mdev)
MLX5_CAP_GEN(mdev, max_wqe_sz_sq) / MLX5_SEND_WQE_BB);
}

-static inline u16 mlx5e_get_sw_max_sq_mpw_wqebbs(u16 max_sq_wqebbs)
+static inline u8 mlx5e_get_sw_max_sq_mpw_wqebbs(u8 max_sq_wqebbs)
{
/* The return value will be multiplied by MLX5_SEND_WQEBB_NUM_DS.
* Since max_sq_wqebbs may be up to MLX5_SEND_WQE_MAX_WQEBBS == 16,
@@ -242,11 +242,12 @@ static inline u16 mlx5e_get_sw_max_sq_mpw_wqebbs(u16 max_sq_wqebbs)
* than MLX5_SEND_WQE_MAX_WQEBBS to let a full-session WQE be
* cache-aligned.
*/
-#if L1_CACHE_BYTES < 128
- return min_t(u16, max_sq_wqebbs, MLX5_SEND_WQE_MAX_WQEBBS - 1);
-#else
- return min_t(u16, max_sq_wqebbs, MLX5_SEND_WQE_MAX_WQEBBS - 2);
+ u8 wqebbs = min_t(u8, max_sq_wqebbs, MLX5_SEND_WQE_MAX_WQEBBS - 1);
+
+#if L1_CACHE_BYTES >= 128
+ wqebbs = ALIGN_DOWN(wqebbs, 2);
#endif
+ return wqebbs;
}

struct mlx5e_tx_wqe {
@@ -456,7 +457,7 @@ struct mlx5e_txqsq {
struct netdev_queue *txq;
u32 sqn;
u16 stop_room;
- u16 max_sq_mpw_wqebbs;
+ u8 max_sq_mpw_wqebbs;
u8 min_inline_mode;
struct device *pdev;
__be32 mkey_be;
@@ -571,7 +572,7 @@ struct mlx5e_xdpsq {
struct device *pdev;
__be32 mkey_be;
u16 stop_room;
- u16 max_sq_mpw_wqebbs;
+ u8 max_sq_mpw_wqebbs;
u8 min_inline_mode;
unsigned long state;
unsigned int hw_mtu;
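
The reworked mlx5e_get_sw_max_sq_mpw_wqebbs() now always caps at MLX5_SEND_WQE_MAX_WQEBBS - 1 and, on 128-byte cache lines, additionally rounds down to an even WQEBB count so a full MPWQE session stays cache-line aligned. A small sketch of that rounding with illustrative inputs:

#include <stdint.h>
#include <stdio.h>

/* cap at MLX5_SEND_WQE_MAX_WQEBBS - 1 (== 15 per the comment above), then
 * round down to an even count when cache lines are 128 bytes or larger
 */
static uint8_t sw_max_sq_mpw_wqebbs(uint8_t max_sq_wqebbs, unsigned int l1_cache_bytes)
{
	uint8_t wqebbs = max_sq_wqebbs < 15 ? max_sq_wqebbs : 15;

	if (l1_cache_bytes >= 128)
		wqebbs &= ~1u;		/* ALIGN_DOWN(wqebbs, 2) */
	return wqebbs;
}

int main(void)
{
	printf("%u\n", sw_max_sq_mpw_wqebbs(16, 128));	/* 14 */
	printf("%u\n", sw_max_sq_mpw_wqebbs(16, 64));	/* 15 */
	return 0;
}
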
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 08fd1370a8b0..75da12e1b0c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -797,8 +797,20 @@ static u8 mlx5e_build_icosq_log_wq_sz(struct mlx5_core_dev *mdev,
return MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE;

wqebbs = MLX5E_UMR_WQEBBS * BIT(mlx5e_get_rq_log_wq_sz(rqp->rqc));
+
+ /* If XDP program is attached, XSK may be turned on at any time without
+ * restarting the channel. ICOSQ must be big enough to fit UMR WQEs of
+ * both regular RQ and XSK RQ.
+ * Although mlx5e_mpwqe_get_log_rq_size accepts mlx5e_xsk_param, it
+ * doesn't affect its return value, as long as params->xdp_prog != NULL,
+ * so we can just multiply by 2.
+ */
+ if (params->xdp_prog)
+ wqebbs *= 2;
+
if (params->packet_merge.type == MLX5E_PACKET_MERGE_SHAMPO)
wqebbs += mlx5e_shampo_icosq_sz(mdev, params, rqp);
+
return max_t(u8, MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE, order_base_2(wqebbs));
}

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
index dea137dd744b..2b64dd557b5d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
@@ -128,6 +128,7 @@ mlx5e_tc_post_act_add(struct mlx5e_post_act *post_act, struct mlx5_flow_attr *at
post_attr->inner_match_level = MLX5_MATCH_NONE;
post_attr->outer_match_level = MLX5_MATCH_NONE;
post_attr->action &= ~MLX5_FLOW_CONTEXT_ACTION_DECAP;
+ post_attr->flags |= MLX5_ATTR_FLAG_NO_IN_PORT;

handle->ns_type = post_act->ns_type;
/* Splits were handled before post action */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
index 7f88ccf67fdd..8b56cb8b4743 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
@@ -7,6 +7,8 @@
#include "en.h"
#include <net/xdp_sock_drv.h>

+#define MLX5E_MTT_PTAG_MASK 0xfffffffffffffff8ULL
+
/* RX data path */

struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
@@ -22,6 +24,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
struct mlx5e_dma_info *dma_info)
{
+retry:
dma_info->xsk = xsk_buff_alloc(rq->xsk_pool);
if (!dma_info->xsk)
return -ENOMEM;
@@ -33,6 +36,17 @@ static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
*/
dma_info->addr = xsk_buff_xdp_get_frame_dma(dma_info->xsk);

+ /* MTT page mapping has alignment requirements. If they are not
+ * satisfied, leak the descriptor so that it won't come again, and try
+ * to allocate a new one.
+ */
+ if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) {
+ if (unlikely(dma_info->addr & ~MLX5E_MTT_PTAG_MASK)) {
+ xsk_buff_discard(dma_info->xsk);
+ goto retry;
+ }
+ }
+
return 0;
}

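The new check leans on MLX5E_MTT_PTAG_MASK: with striding RQ, the frame's DMA address must have its low three bits clear to be representable in an MTT entry, otherwise the XSK buffer is discarded and a new one is allocated. A standalone sketch of that alignment test (the example addresses are arbitrary):

#include <stdint.h>
#include <stdio.h>

#define MTT_PTAG_MASK 0xfffffffffffffff8ULL	/* as MLX5E_MTT_PTAG_MASK above */

static int mtt_addr_usable(uint64_t dma_addr)
{
	/* low three bits must be clear, i.e. 8-byte aligned */
	return (dma_addr & ~MTT_PTAG_MASK) == 0;
}

int main(void)
{
	printf("%d\n", mtt_addr_usable(0x1000));	/* 1: keep the buffer */
	printf("%d\n", mtt_addr_usable(0x1003));	/* 0: discard and retry */
	return 0;
}
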
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
index d93aadbf10da..90ea78239d40 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c
@@ -16,7 +16,7 @@ static int mlx5e_ktls_add(struct net_device *netdev, struct sock *sk,
struct mlx5_core_dev *mdev = priv->mdev;
int err;

- if (WARN_ON(!mlx5e_ktls_type_check(mdev, crypto_info)))
+ if (!mlx5e_ktls_type_check(mdev, crypto_info))
return -EOPNOTSUPP;

if (direction == TLS_OFFLOAD_CTX_DIR_TX)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 796d97bcf1aa..c6546613f7d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -230,10 +230,8 @@ esw_setup_ft_dest(struct mlx5_flow_destination *dest,
}

static void
-esw_setup_slow_path_dest(struct mlx5_flow_destination *dest,
- struct mlx5_flow_act *flow_act,
- struct mlx5_fs_chains *chains,
- int i)
+esw_setup_accept_dest(struct mlx5_flow_destination *dest, struct mlx5_flow_act *flow_act,
+ struct mlx5_fs_chains *chains, int i)
{
if (mlx5_chains_ignore_flow_level_supported(chains))
flow_act->flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
@@ -241,6 +239,16 @@ esw_setup_slow_path_dest(struct mlx5_flow_destination *dest,
dest[i].ft = mlx5_chains_get_tc_end_ft(chains);
}

+static void
+esw_setup_slow_path_dest(struct mlx5_flow_destination *dest, struct mlx5_flow_act *flow_act,
+ struct mlx5_eswitch *esw, int i)
+{
+ if (MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ignore_flow_level))
+ flow_act->flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+ dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+ dest[i].ft = esw->fdb_table.offloads.slow_fdb;
+}
+
static int
esw_setup_chain_dest(struct mlx5_flow_destination *dest,
struct mlx5_flow_act *flow_act,
@@ -473,8 +481,11 @@ esw_setup_dests(struct mlx5_flow_destination *dest,
} else if (attr->dest_ft) {
esw_setup_ft_dest(dest, flow_act, esw, attr, spec, *i);
(*i)++;
- } else if (mlx5e_tc_attr_flags_skip(attr->flags)) {
- esw_setup_slow_path_dest(dest, flow_act, chains, *i);
+ } else if (attr->flags & MLX5_ATTR_FLAG_SLOW_PATH) {
+ esw_setup_slow_path_dest(dest, flow_act, esw, *i);
+ (*i)++;
+ } else if (attr->flags & MLX5_ATTR_FLAG_ACCEPT) {
+ esw_setup_accept_dest(dest, flow_act, chains, *i);
(*i)++;
} else if (attr->dest_chain) {
err = esw_setup_chain_dest(dest, flow_act, chains, attr->dest_chain,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c
index d758848d34d0..696e45e2bd06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.c
@@ -32,20 +32,17 @@ static void tout_set(struct mlx5_core_dev *dev, u64 val, enum mlx5_timeouts_type
dev->timeouts->to[type] = val;
}

-void mlx5_tout_set_def_val(struct mlx5_core_dev *dev)
+int mlx5_tout_init(struct mlx5_core_dev *dev)
{
int i;

- for (i = 0; i < MAX_TIMEOUT_TYPES; i++)
- tout_set(dev, tout_def_sw_val[i], i);
-}
-
-int mlx5_tout_init(struct mlx5_core_dev *dev)
-{
dev->timeouts = kmalloc(sizeof(*dev->timeouts), GFP_KERNEL);
if (!dev->timeouts)
return -ENOMEM;

+ for (i = 0; i < MAX_TIMEOUT_TYPES; i++)
+ tout_set(dev, tout_def_sw_val[i], i);
+
return 0;
}

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h
index 257c03eeab36..bc9e9aeda847 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/tout.h
@@ -35,7 +35,6 @@ int mlx5_tout_init(struct mlx5_core_dev *dev);
void mlx5_tout_cleanup(struct mlx5_core_dev *dev);
void mlx5_tout_query_iseg(struct mlx5_core_dev *dev);
int mlx5_tout_query_dtor(struct mlx5_core_dev *dev);
-void mlx5_tout_set_def_val(struct mlx5_core_dev *dev);
u64 _mlx5_tout_ms(struct mlx5_core_dev *dev, enum mlx5_timeouts_types type);

#define mlx5_tout_ms(dev, type) _mlx5_tout_ms(dev, MLX5_TO_##type##_MS)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 8b5263699994..75d216246955 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -526,7 +526,7 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)

/* Check log_max_qp from HCA caps to set in current profile */
if (prof->log_max_qp == LOG_MAX_SUPPORTED_QPS) {
- prof->log_max_qp = min_t(u8, 17, MLX5_CAP_GEN_MAX(dev, log_max_qp));
+ prof->log_max_qp = min_t(u8, 18, MLX5_CAP_GEN_MAX(dev, log_max_qp));
} else if (MLX5_CAP_GEN_MAX(dev, log_max_qp) < prof->log_max_qp) {
mlx5_core_warn(dev, "log_max_qp value in current profile is %d, changing it to HCA capability limit (%d)\n",
prof->log_max_qp,
@@ -1025,8 +1025,6 @@ static int mlx5_function_setup(struct mlx5_core_dev *dev, u64 timeout)
if (mlx5_core_is_pf(dev))
pcie_print_link_status(dev->pdev);

- mlx5_tout_set_def_val(dev);
-
/* wait for firmware to accept initialization segments configurations
*/
err = wait_fw_init(dev, timeout,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index 887ee0f729d1..2935614f6fa9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -87,6 +87,11 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev *dev, int num_vfs)
enable_vfs_hca:
num_msix_count = mlx5_get_default_msix_vec_count(dev, num_vfs);
for (vf = 0; vf < num_vfs; vf++) {
+ /* Notify the VF before its enablement to let it set
+ * up its own state.
+ */
+ blocking_notifier_call_chain(&sriov->vfs_ctx[vf].notifier,
+ MLX5_PF_NOTIFY_ENABLE_VF, dev);
err = mlx5_core_enable_hca(dev, vf + 1);
if (err) {
mlx5_core_warn(dev, "failed to enable VF %d (%d)\n", vf, err);
@@ -127,6 +132,11 @@ mlx5_device_disable_sriov(struct mlx5_core_dev *dev, int num_vfs, bool clear_vf)
for (vf = num_vfs - 1; vf >= 0; vf--) {
if (!sriov->vfs_ctx[vf].enabled)
continue;
+ /* Notify the VF before its disablement to let it clean
+ * some resources.
+ */
+ blocking_notifier_call_chain(&sriov->vfs_ctx[vf].notifier,
+ MLX5_PF_NOTIFY_DISABLE_VF, dev);
err = mlx5_core_disable_hca(dev, vf + 1);
if (err) {
mlx5_core_warn(dev, "failed to disable VF %d\n", vf);
@@ -257,7 +267,7 @@ int mlx5_sriov_init(struct mlx5_core_dev *dev)
{
struct mlx5_core_sriov *sriov = &dev->priv.sriov;
struct pci_dev *pdev = dev->pdev;
- int total_vfs;
+ int total_vfs, i;

if (!mlx5_core_is_pf(dev))
return 0;
@@ -269,6 +279,9 @@ int mlx5_sriov_init(struct mlx5_core_dev *dev)
if (!sriov->vfs_ctx)
return -ENOMEM;

+ for (i = 0; i < total_vfs; i++)
+ BLOCKING_INIT_NOTIFIER_HEAD(&sriov->vfs_ctx[i].notifier);
+
return 0;
}

@@ -281,3 +294,53 @@ void mlx5_sriov_cleanup(struct mlx5_core_dev *dev)

kfree(sriov->vfs_ctx);
}
+
+/**
+ * mlx5_sriov_blocking_notifier_unregister - Unregister a VF from
+ * a notification block chain.
+ *
+ * @mdev: The mlx5 core device.
+ * @vf_id: The VF id.
+ * @nb: The notifier block to be unregistered.
+ */
+void mlx5_sriov_blocking_notifier_unregister(struct mlx5_core_dev *mdev,
+ int vf_id,
+ struct notifier_block *nb)
+{
+ struct mlx5_vf_context *vfs_ctx;
+ struct mlx5_core_sriov *sriov;
+
+ sriov = &mdev->priv.sriov;
+ if (WARN_ON(vf_id < 0 || vf_id >= sriov->num_vfs))
+ return;
+
+ vfs_ctx = &sriov->vfs_ctx[vf_id];
+ blocking_notifier_chain_unregister(&vfs_ctx->notifier, nb);
+}
+EXPORT_SYMBOL(mlx5_sriov_blocking_notifier_unregister);
+
+/**
+ * mlx5_sriov_blocking_notifier_register - Register a VF notification
+ * block chain.
+ *
+ * @mdev: The mlx5 core device.
+ * @vf_id: The VF id.
+ * @nb: The notifier block to be called upon the VF events.
+ *
+ * Returns 0 on success or an error code.
+ */
+int mlx5_sriov_blocking_notifier_register(struct mlx5_core_dev *mdev,
+ int vf_id,
+ struct notifier_block *nb)
+{
+ struct mlx5_vf_context *vfs_ctx;
+ struct mlx5_core_sriov *sriov;
+
+ sriov = &mdev->priv.sriov;
+ if (vf_id < 0 || vf_id >= sriov->num_vfs)
+ return -EINVAL;
+
+ vfs_ctx = &sriov->vfs_ctx[vf_id];
+ return blocking_notifier_chain_register(&vfs_ctx->notifier, nb);
+}
+EXPORT_SYMBOL(mlx5_sriov_blocking_notifier_register);
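
A hypothetical consumer of the new per-VF notifier chain could look like the sketch below; the event names and the register/unregister prototypes are the ones added above, while the callback body and the suggestion of who registers it are assumptions:

#include <linux/notifier.h>
#include <linux/mlx5/driver.h>

/* Called just before mlx5_core_enable_hca()/disable_hca() for the VF,
 * as in the hunks above; "data" is the PF's mlx5_core_dev.
 */
static int my_vf_event(struct notifier_block *nb, unsigned long event, void *data)
{
	switch (event) {
	case MLX5_PF_NOTIFY_ENABLE_VF:
		/* prepare per-VF state before the VF's HCA comes up */
		break;
	case MLX5_PF_NOTIFY_DISABLE_VF:
		/* quiesce per-VF state before the VF's HCA goes down */
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block my_vf_nb = { .notifier_call = my_vf_event };

/* registration for one VF id, e.g. from an auxiliary driver on the PF:
 *
 *	err = mlx5_sriov_blocking_notifier_register(mdev, vf_id, &my_vf_nb);
 *	...
 *	mlx5_sriov_blocking_notifier_unregister(mdev, vf_id, &my_vf_nb);
 */
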
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c
index d5998ef59be4..7adcf0eec13b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c
@@ -21,10 +21,11 @@ enum dr_dump_rec_type {
DR_DUMP_REC_TYPE_TABLE_TX = 3102,

DR_DUMP_REC_TYPE_MATCHER = 3200,
- DR_DUMP_REC_TYPE_MATCHER_MASK = 3201,
+ DR_DUMP_REC_TYPE_MATCHER_MASK_DEPRECATED = 3201,
DR_DUMP_REC_TYPE_MATCHER_RX = 3202,
DR_DUMP_REC_TYPE_MATCHER_TX = 3203,
DR_DUMP_REC_TYPE_MATCHER_BUILDER = 3204,
+ DR_DUMP_REC_TYPE_MATCHER_MASK = 3205,

DR_DUMP_REC_TYPE_RULE = 3300,
DR_DUMP_REC_TYPE_RULE_RX_ENTRY_V0 = 3301,
@@ -114,13 +115,15 @@ dr_dump_rule_action_mem(struct seq_file *file, const u64 rule_id,
break;
case DR_ACTION_TYP_FT:
if (action->dest_tbl->is_fw_tbl)
- seq_printf(file, "%d,0x%llx,0x%llx,0x%x\n",
+ seq_printf(file, "%d,0x%llx,0x%llx,0x%x,0x%x\n",
DR_DUMP_REC_TYPE_ACTION_FT, action_id,
- rule_id, action->dest_tbl->fw_tbl.id);
+ rule_id, action->dest_tbl->fw_tbl.id,
+ -1);
else
- seq_printf(file, "%d,0x%llx,0x%llx,0x%x\n",
+ seq_printf(file, "%d,0x%llx,0x%llx,0x%x,0x%llx\n",
DR_DUMP_REC_TYPE_ACTION_FT, action_id,
- rule_id, action->dest_tbl->tbl->table_id);
+ rule_id, action->dest_tbl->tbl->table_id,
+ DR_DBG_PTR_TO_ID(action->dest_tbl->tbl));

break;
case DR_ACTION_TYP_CTR:
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 20ceac81a2c2..497077726916 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -3245,6 +3245,7 @@ int ocelot_init(struct ocelot *ocelot)
mutex_init(&ocelot->ptp_lock);
mutex_init(&ocelot->mact_lock);
mutex_init(&ocelot->fwd_domain_lock);
+ mutex_init(&ocelot->tas_lock);
spin_lock_init(&ocelot->ptp_clock_lock);
spin_lock_init(&ocelot->ts_id_lock);
snprintf(queue_name, sizeof(queue_name), "%s-stats",
diff --git a/drivers/net/ethernet/mscc/ocelot_ptp.c b/drivers/net/ethernet/mscc/ocelot_ptp.c
index 87ad2137ba06..09c703efe946 100644
--- a/drivers/net/ethernet/mscc/ocelot_ptp.c
+++ b/drivers/net/ethernet/mscc/ocelot_ptp.c
@@ -72,6 +72,10 @@ int ocelot_ptp_settime64(struct ptp_clock_info *ptp,
ocelot_write_rix(ocelot, val, PTP_PIN_CFG, TOD_ACC_PIN);

spin_unlock_irqrestore(&ocelot->ptp_clock_lock, flags);
+
+ if (ocelot->ops->tas_clock_adjust)
+ ocelot->ops->tas_clock_adjust(ocelot);
+
return 0;
}
EXPORT_SYMBOL(ocelot_ptp_settime64);
@@ -105,6 +109,9 @@ int ocelot_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
ocelot_write_rix(ocelot, val, PTP_PIN_CFG, TOD_ACC_PIN);

spin_unlock_irqrestore(&ocelot->ptp_clock_lock, flags);
+
+ if (ocelot->ops->tas_clock_adjust)
+ ocelot->ops->tas_clock_adjust(ocelot);
} else {
/* Fall back using ocelot_ptp_settime64 which is not exact. */
struct timespec64 ts;
@@ -117,6 +124,7 @@ int ocelot_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)

ocelot_ptp_settime64(ptp, &ts);
}
+
return 0;
}
EXPORT_SYMBOL(ocelot_ptp_adjtime);
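
The tas_clock_adjust hook exists because tc-taprio base times are interpreted against the PTP clock: once the clock is stepped by settime64 or a large adjtime, every active schedule must be re-armed with a base time that is valid in the new timescale. One plausible way to derive such a base time, assuming the helper advances the configured base time by whole cycles until it is no longer in the past (the numbers below are only an example, not taken from the driver):

#include <stdint.h>
#include <stdio.h>

/* advance base_ns by an integer number of cycles so it is not in the past */
static int64_t advance_base_time(int64_t base_ns, int64_t cycle_ns, int64_t now_ns)
{
	int64_t n;

	if (base_ns >= now_ns)
		return base_ns;
	n = (now_ns - base_ns + cycle_ns - 1) / cycle_ns;
	return base_ns + n * cycle_ns;
}

int main(void)
{
	/* base 1 s, 200 us cycle, PTP clock stepped to 5.0001 s -> 5000200000 */
	printf("%lld\n", (long long)advance_base_time(1000000000LL, 200000LL,
						      5000100000LL));
	return 0;
}
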
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
index f3568901eb91..1443f788ee37 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
@@ -1437,7 +1437,7 @@ static int ionic_set_nic_features(struct ionic_lif *lif,
if ((old_hw_features ^ lif->hw_features) & IONIC_ETH_HW_RX_HASH)
ionic_lif_rss_config(lif, lif->rss_types, NULL, NULL);

- if ((vlan_flags & features) &&
+ if ((vlan_flags & le64_to_cpu(ctx.cmd.lif_setattr.features)) &&
!(vlan_flags & le64_to_cpu(ctx.comp.lif_setattr.features)))
dev_info_once(lif->ionic->dev, "NIC is not supporting vlan offload, likely in SmartNIC mode\n");

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index a43820212932..50854265864d 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -351,10 +351,12 @@ nsim_map_alloc_elem(struct bpf_offloaded_map *offmap, unsigned int idx)
{
struct nsim_bpf_bound_map *nmap = offmap->dev_priv;

- nmap->entry[idx].key = kmalloc(offmap->map.key_size, GFP_USER);
+ nmap->entry[idx].key = kmalloc(offmap->map.key_size,
+ GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
if (!nmap->entry[idx].key)
return -ENOMEM;
- nmap->entry[idx].value = kmalloc(offmap->map.value_size, GFP_USER);
+ nmap->entry[idx].value = kmalloc(offmap->map.value_size,
+ GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
if (!nmap->entry[idx].value) {
kfree(nmap->entry[idx].key);
nmap->entry[idx].key = NULL;
@@ -496,7 +498,7 @@ nsim_bpf_map_alloc(struct netdevsim *ns, struct bpf_offloaded_map *offmap)
if (offmap->map.map_flags)
return -EINVAL;

- nmap = kzalloc(sizeof(*nmap), GFP_USER);
+ nmap = kzalloc(sizeof(*nmap), GFP_KERNEL_ACCOUNT);
if (!nmap)
return -ENOMEM;

diff --git a/drivers/net/netdevsim/fib.c b/drivers/net/netdevsim/fib.c
index 378ee779061c..14787d17f703 100644
--- a/drivers/net/netdevsim/fib.c
+++ b/drivers/net/netdevsim/fib.c
@@ -53,6 +53,7 @@ struct nsim_fib_data {
struct rhashtable nexthop_ht;
struct devlink *devlink;
struct work_struct fib_event_work;
+ struct work_struct fib_flush_work;
struct list_head fib_event_queue;
spinlock_t fib_event_queue_lock; /* Protects fib event queue list */
struct mutex nh_lock; /* Protects NH HT */
@@ -977,7 +978,7 @@ static int nsim_fib_event_schedule_work(struct nsim_fib_data *data,

fib_event = kzalloc(sizeof(*fib_event), GFP_ATOMIC);
if (!fib_event)
- return NOTIFY_BAD;
+ goto err_fib_event_alloc;

fib_event->data = data;
fib_event->event = event;
@@ -1005,6 +1006,9 @@ static int nsim_fib_event_schedule_work(struct nsim_fib_data *data,

err_fib_prepare_event:
kfree(fib_event);
+err_fib_event_alloc:
+ if (event == FIB_EVENT_ENTRY_DEL)
+ schedule_work(&data->fib_flush_work);
return NOTIFY_BAD;
}

@@ -1482,6 +1486,24 @@ static void nsim_fib_event_work(struct work_struct *work)
mutex_unlock(&data->fib_lock);
}

+static void nsim_fib_flush_work(struct work_struct *work)
+{
+ struct nsim_fib_data *data = container_of(work, struct nsim_fib_data,
+ fib_flush_work);
+ struct nsim_fib_rt *fib_rt, *fib_rt_tmp;
+
+ /* Process pending work. */
+ flush_work(&data->fib_event_work);
+
+ mutex_lock(&data->fib_lock);
+ list_for_each_entry_safe(fib_rt, fib_rt_tmp, &data->fib_rt_list, list) {
+ rhashtable_remove_fast(&data->fib_rt_ht, &fib_rt->ht_node,
+ nsim_fib_rt_ht_params);
+ nsim_fib_rt_free(fib_rt, data);
+ }
+ mutex_unlock(&data->fib_lock);
+}
+
static int
nsim_fib_debugfs_init(struct nsim_fib_data *data, struct nsim_dev *nsim_dev)
{
@@ -1540,6 +1562,7 @@ struct nsim_fib_data *nsim_fib_create(struct devlink *devlink,
goto err_rhashtable_nexthop_destroy;

INIT_WORK(&data->fib_event_work, nsim_fib_event_work);
+ INIT_WORK(&data->fib_flush_work, nsim_fib_flush_work);
INIT_LIST_HEAD(&data->fib_event_queue);
spin_lock_init(&data->fib_event_queue_lock);

@@ -1586,6 +1609,7 @@ struct nsim_fib_data *nsim_fib_create(struct devlink *devlink,
err_nexthop_nb_unregister:
unregister_nexthop_notifier(devlink_net(devlink), &data->nexthop_nb);
err_rhashtable_fib_destroy:
+ cancel_work_sync(&data->fib_flush_work);
flush_work(&data->fib_event_work);
rhashtable_free_and_destroy(&data->fib_rt_ht, nsim_fib_rt_free,
data);
@@ -1615,6 +1639,7 @@ void nsim_fib_destroy(struct devlink *devlink, struct nsim_fib_data *data)
NSIM_RESOURCE_IPV4_FIB);
unregister_fib_notifier(devlink_net(devlink), &data->fib_nb);
unregister_nexthop_notifier(devlink_net(devlink), &data->nexthop_nb);
+ cancel_work_sync(&data->fib_flush_work);
flush_work(&data->fib_event_work);
rhashtable_free_and_destroy(&data->fib_rt_ht, nsim_fib_rt_free,
data);
diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index d8cac02a79b9..636b0907a598 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -110,7 +110,7 @@ static int smsc_phy_config_init(struct phy_device *phydev)
struct smsc_phy_priv *priv = phydev->priv;
int rc;

- if (!priv->energy_enable)
+ if (!priv->energy_enable || phydev->irq != PHY_POLL)
return 0;

rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
@@ -210,6 +210,8 @@ static int lan95xx_config_aneg_ext(struct phy_device *phydev)
* response on link pulses to detect presence of plugged Ethernet cable.
* The Energy Detect Power-Down mode is enabled again in the end of procedure to
* save approximately 220 mW of power if cable is unplugged.
+ * The workaround is only applicable to poll mode. Energy Detect Power-Down may
+ * not be used in interrupt mode lest link change detection become unreliable.
*/
static int lan87xx_read_status(struct phy_device *phydev)
{
@@ -217,7 +219,7 @@ static int lan87xx_read_status(struct phy_device *phydev)

int err = genphy_read_status(phydev);

- if (!phydev->link && priv->energy_enable) {
+ if (!phydev->link && priv->energy_enable && phydev->irq == PHY_POLL) {
/* Disable EDPD to wake up PHY */
int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
if (rc < 0)
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index e62fc4f2aee0..76659c1c525a 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -637,8 +637,9 @@ config USB_NET_AQC111
* Aquantia AQtion USB to 5GbE

config USB_RTL8153_ECM
- tristate "RTL8153 ECM support"
+ tristate
depends on USB_NET_CDCETHER && (USB_RTL8152 || USB_RTL8152=n)
+ default y
help
This option supports ECM mode for RTL8153 ethernet adapter, when
CONFIG_USB_RTL8152 is not set, or the RTL8153 device is not
diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index dc1f6d8444ad..873f6deabbd1 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1801,7 +1801,7 @@ static const struct driver_info ax88179_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1814,7 +1814,7 @@ static const struct driver_info ax88178a_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1827,7 +1827,7 @@ static const struct driver_info cypress_GX3_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1840,7 +1840,7 @@ static const struct driver_info dlink_dub1312_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1853,7 +1853,7 @@ static const struct driver_info sitecom_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1866,7 +1866,7 @@ static const struct driver_info samsung_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1879,7 +1879,7 @@ static const struct driver_info lenovo_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1892,7 +1892,7 @@ static const struct driver_info belkin_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1905,7 +1905,7 @@ static const struct driver_info toshiba_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1918,7 +1918,7 @@ static const struct driver_info mct_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1931,7 +1931,7 @@ static const struct driver_info at_umc2000_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1944,7 +1944,7 @@ static const struct driver_info at_umc200_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
@@ -1957,7 +1957,7 @@ static const struct driver_info at_umc2000sp_info = {
.link_reset = ax88179_link_reset,
.reset = ax88179_reset,
.stop = ax88179_stop,
- .flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_FRAMING_AX,
.rx_fixup = ax88179_rx_fixup,
.tx_fixup = ax88179_tx_fixup,
};
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index edf0492ad489..515363d74078 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -18,6 +18,8 @@
#include <linux/usb/usbnet.h>
#include <linux/slab.h>
#include <linux/of_net.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
#include <linux/mdio.h>
#include <linux/phy.h>
#include <net/selftests.h>
@@ -53,6 +55,9 @@
#define SUSPEND_ALLMODES (SUSPEND_SUSPEND0 | SUSPEND_SUSPEND1 | \
SUSPEND_SUSPEND2 | SUSPEND_SUSPEND3)

+#define SMSC95XX_NR_IRQS (1) /* raise to 12 for GPIOs */
+#define PHY_HWIRQ (SMSC95XX_NR_IRQS - 1)
+
struct smsc95xx_priv {
u32 mac_cr;
u32 hash_hi;
@@ -61,8 +66,12 @@ struct smsc95xx_priv {
spinlock_t mac_cr_lock;
u8 features;
u8 suspend_flags;
+ struct irq_chip irqchip;
+ struct irq_domain *irqdomain;
+ struct fwnode_handle *irqfwnode;
struct mii_bus *mdiobus;
struct phy_device *phydev;
+ struct task_struct *pm_task;
};

static bool turbo_mode = true;
@@ -72,13 +81,14 @@ MODULE_PARM_DESC(turbo_mode, "Enable multiple frames per Rx transaction");
static int __must_check __smsc95xx_read_reg(struct usbnet *dev, u32 index,
u32 *data, int in_pm)
{
+ struct smsc95xx_priv *pdata = dev->driver_priv;
u32 buf;
int ret;
int (*fn)(struct usbnet *, u8, u8, u16, u16, void *, u16);

BUG_ON(!dev);

- if (!in_pm)
+ if (current != pdata->pm_task)
fn = usbnet_read_cmd;
else
fn = usbnet_read_cmd_nopm;
@@ -102,13 +112,14 @@ static int __must_check __smsc95xx_read_reg(struct usbnet *dev, u32 index,
static int __must_check __smsc95xx_write_reg(struct usbnet *dev, u32 index,
u32 data, int in_pm)
{
+ struct smsc95xx_priv *pdata = dev->driver_priv;
u32 buf;
int ret;
int (*fn)(struct usbnet *, u8, u8, u16, u16, const void *, u16);

BUG_ON(!dev);

- if (!in_pm)
+ if (current != pdata->pm_task)
fn = usbnet_write_cmd;
else
fn = usbnet_write_cmd_nopm;
@@ -566,16 +577,12 @@ static int smsc95xx_phy_update_flowcontrol(struct usbnet *dev)
return smsc95xx_write_reg(dev, AFC_CFG, afc_cfg);
}

-static int smsc95xx_link_reset(struct usbnet *dev)
+static void smsc95xx_mac_update_fullduplex(struct usbnet *dev)
{
struct smsc95xx_priv *pdata = dev->driver_priv;
unsigned long flags;
int ret;

- ret = smsc95xx_write_reg(dev, INT_STS, INT_STS_CLEAR_ALL_);
- if (ret < 0)
- return ret;
-
spin_lock_irqsave(&pdata->mac_cr_lock, flags);
if (pdata->phydev->duplex != DUPLEX_FULL) {
pdata->mac_cr &= ~MAC_CR_FDPX_;
@@ -587,18 +594,22 @@ static int smsc95xx_link_reset(struct usbnet *dev)
spin_unlock_irqrestore(&pdata->mac_cr_lock, flags);

ret = smsc95xx_write_reg(dev, MAC_CR, pdata->mac_cr);
- if (ret < 0)
- return ret;
+ if (ret < 0) {
+ if (ret != -ENODEV)
+ netdev_warn(dev->net,
+ "Error updating MAC full duplex mode\n");
+ return;
+ }

ret = smsc95xx_phy_update_flowcontrol(dev);
if (ret < 0)
netdev_warn(dev->net, "Error updating PHY flow control\n");
-
- return ret;
}

static void smsc95xx_status(struct usbnet *dev, struct urb *urb)
{
+ struct smsc95xx_priv *pdata = dev->driver_priv;
+ unsigned long flags;
u32 intdata;

if (urb->actual_length != 4) {
@@ -610,11 +621,15 @@ static void smsc95xx_status(struct usbnet *dev, struct urb *urb)
intdata = get_unaligned_le32(urb->transfer_buffer);
netif_dbg(dev, link, dev->net, "intdata: 0x%08X\n", intdata);

+ local_irq_save(flags);
+
if (intdata & INT_ENP_PHY_INT_)
- usbnet_defer_kevent(dev, EVENT_LINK_RESET);
+ generic_handle_domain_irq(pdata->irqdomain, PHY_HWIRQ);
else
netdev_warn(dev->net, "unexpected interrupt, intdata=0x%08X\n",
intdata);
+
+ local_irq_restore(flags);
}

/* Enable or disable Tx & Rx checksum offload engines */
@@ -1092,6 +1107,7 @@ static void smsc95xx_handle_link_change(struct net_device *net)
struct usbnet *dev = netdev_priv(net);

phy_print_status(net->phydev);
+ smsc95xx_mac_update_fullduplex(dev);
usbnet_defer_kevent(dev, EVENT_LINK_CHANGE);
}

@@ -1099,8 +1115,9 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
{
struct smsc95xx_priv *pdata;
bool is_internal_phy;
+ char usb_path[64];
+ int ret, phy_irq;
u32 val;
- int ret;

printk(KERN_INFO SMSC_CHIPNAME " v" SMSC_DRIVER_VERSION "\n");

@@ -1140,10 +1157,38 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
if (ret)
goto free_pdata;

+ /* create irq domain for use by PHY driver and GPIO consumers */
+ usb_make_path(dev->udev, usb_path, sizeof(usb_path));
+ pdata->irqfwnode = irq_domain_alloc_named_fwnode(usb_path);
+ if (!pdata->irqfwnode) {
+ ret = -ENOMEM;
+ goto free_pdata;
+ }
+
+ pdata->irqdomain = irq_domain_create_linear(pdata->irqfwnode,
+ SMSC95XX_NR_IRQS,
+ &irq_domain_simple_ops,
+ pdata);
+ if (!pdata->irqdomain) {
+ ret = -ENOMEM;
+ goto free_irqfwnode;
+ }
+
+ phy_irq = irq_create_mapping(pdata->irqdomain, PHY_HWIRQ);
+ if (!phy_irq) {
+ ret = -ENOENT;
+ goto remove_irqdomain;
+ }
+
+ pdata->irqchip = dummy_irq_chip;
+ pdata->irqchip.name = SMSC_CHIPNAME;
+ irq_set_chip_and_handler_name(phy_irq, &pdata->irqchip,
+ handle_simple_irq, "phy");
+
pdata->mdiobus = mdiobus_alloc();
if (!pdata->mdiobus) {
ret = -ENOMEM;
- goto free_pdata;
+ goto dispose_irq;
}

ret = smsc95xx_read_reg(dev, HW_CFG, &val);
@@ -1176,6 +1221,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
goto unregister_mdio;
}

+ pdata->phydev->irq = phy_irq;
pdata->phydev->is_internal = is_internal_phy;

/* detect device revision as different features may be available */
@@ -1218,6 +1264,15 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
free_mdio:
mdiobus_free(pdata->mdiobus);

+dispose_irq:
+ irq_dispose_mapping(phy_irq);
+
+remove_irqdomain:
+ irq_domain_remove(pdata->irqdomain);
+
+free_irqfwnode:
+ irq_domain_free_fwnode(pdata->irqfwnode);
+
free_pdata:
kfree(pdata);
return ret;
@@ -1230,6 +1285,9 @@ static void smsc95xx_unbind(struct usbnet *dev, struct usb_interface *intf)
phy_disconnect(dev->net->phydev);
mdiobus_unregister(pdata->mdiobus);
mdiobus_free(pdata->mdiobus);
+ irq_dispose_mapping(irq_find_mapping(pdata->irqdomain, PHY_HWIRQ));
+ irq_domain_remove(pdata->irqdomain);
+ irq_domain_free_fwnode(pdata->irqfwnode);
netif_dbg(dev, ifdown, dev->net, "free pdata\n");
kfree(pdata);
}
@@ -1254,29 +1312,6 @@ static u32 smsc_crc(const u8 *buffer, size_t len, int filter)
return crc << ((filter % 2) * 16);
}

-static int smsc95xx_enable_phy_wakeup_interrupts(struct usbnet *dev, u16 mask)
-{
- int ret;
-
- netdev_dbg(dev->net, "enabling PHY wakeup interrupts\n");
-
- /* read to clear */
- ret = smsc95xx_mdio_read_nopm(dev, PHY_INT_SRC);
- if (ret < 0)
- return ret;
-
- /* enable interrupt source */
- ret = smsc95xx_mdio_read_nopm(dev, PHY_INT_MASK);
- if (ret < 0)
- return ret;
-
- ret |= mask;
-
- smsc95xx_mdio_write_nopm(dev, PHY_INT_MASK, ret);
-
- return 0;
-}
-
static int smsc95xx_link_ok_nopm(struct usbnet *dev)
{
int ret;
@@ -1443,7 +1478,6 @@ static int smsc95xx_enter_suspend3(struct usbnet *dev)
static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up)
{
struct smsc95xx_priv *pdata = dev->driver_priv;
- int ret;

if (!netif_running(dev->net)) {
/* interface is ifconfig down so fully power down hw */
@@ -1462,27 +1496,10 @@ static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up)
}

netdev_dbg(dev->net, "autosuspend entering SUSPEND1\n");
-
- /* enable PHY wakeup events for if cable is attached */
- ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
- PHY_INT_MASK_ANEG_COMP_);
- if (ret < 0) {
- netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
- return ret;
- }
-
netdev_info(dev->net, "entering SUSPEND1 mode\n");
return smsc95xx_enter_suspend1(dev);
}

- /* enable PHY wakeup events so we remote wakeup if cable is pulled */
- ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
- PHY_INT_MASK_LINK_DOWN_);
- if (ret < 0) {
- netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
- return ret;
- }
-
netdev_dbg(dev->net, "autosuspend entering SUSPEND3\n");
return smsc95xx_enter_suspend3(dev);
}
@@ -1494,9 +1511,12 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
u32 val, link_up;
int ret;

+ pdata->pm_task = current;
+
ret = usbnet_suspend(intf, message);
if (ret < 0) {
netdev_warn(dev->net, "usbnet_suspend error\n");
+ pdata->pm_task = NULL;
return ret;
}

@@ -1548,13 +1568,6 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
}

if (pdata->wolopts & WAKE_PHY) {
- ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
- (PHY_INT_MASK_ANEG_COMP_ | PHY_INT_MASK_LINK_DOWN_));
- if (ret < 0) {
- netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
- goto done;
- }
-
/* if link is down then configure EDPD and enter SUSPEND1,
* otherwise enter SUSPEND0 below
*/
@@ -1743,6 +1756,7 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
if (ret && PMSG_IS_AUTO(message))
usbnet_resume(intf);

+ pdata->pm_task = NULL;
return ret;
}

@@ -1763,45 +1777,53 @@ static int smsc95xx_resume(struct usb_interface *intf)
/* do this first to ensure it's cleared even in error case */
pdata->suspend_flags = 0;

+ pdata->pm_task = current;
+
if (suspend_flags & SUSPEND_ALLMODES) {
/* clear wake-up sources */
ret = smsc95xx_read_reg_nopm(dev, WUCSR, &val);
if (ret < 0)
- return ret;
+ goto done;

val &= ~(WUCSR_WAKE_EN_ | WUCSR_MPEN_);

ret = smsc95xx_write_reg_nopm(dev, WUCSR, val);
if (ret < 0)
- return ret;
+ goto done;

/* clear wake-up status */
ret = smsc95xx_read_reg_nopm(dev, PM_CTRL, &val);
if (ret < 0)
- return ret;
+ goto done;

val &= ~PM_CTL_WOL_EN_;
val |= PM_CTL_WUPS_;

ret = smsc95xx_write_reg_nopm(dev, PM_CTRL, val);
if (ret < 0)
- return ret;
+ goto done;
}

+ phy_init_hw(pdata->phydev);
+
ret = usbnet_resume(intf);
if (ret < 0)
netdev_warn(dev->net, "usbnet_resume error\n");

- phy_init_hw(pdata->phydev);
+done:
+ pdata->pm_task = NULL;
return ret;
}

static int smsc95xx_reset_resume(struct usb_interface *intf)
{
struct usbnet *dev = usb_get_intfdata(intf);
+ struct smsc95xx_priv *pdata = dev->driver_priv;
int ret;

+ pdata->pm_task = current;
ret = smsc95xx_reset(dev);
+ pdata->pm_task = NULL;
if (ret < 0)
return ret;

@@ -1997,7 +2019,6 @@ static const struct driver_info smsc95xx_info = {
.description = "smsc95xx USB 2.0 Ethernet",
.bind = smsc95xx_bind,
.unbind = smsc95xx_unbind,
- .link_reset = smsc95xx_link_reset,
.reset = smsc95xx_reset,
.check_connect = smsc95xx_start_phy,
.stop = smsc95xx_stop,
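
[Editor's sketch, not part of the patch] The smsc95xx hunks above replace the explicit in_pm flag with a recorded PM task: suspend/resume set pdata->pm_task = current, and the register accessors pick the _nopm usbnet helpers only when running in that task. A minimal sketch of the idea; MY_VENDOR_READ and the struct names are placeholders, not the driver's real definitions:

#include <linux/usb.h>
#include <linux/usb/usbnet.h>
#include <linux/sched.h>

#define MY_VENDOR_READ	0xA0	/* placeholder request code */

struct my_priv {
	struct task_struct *pm_task;	/* set around suspend/resume */
};

static int my_read_reg(struct usbnet *dev, u16 index, u32 *data)
{
	struct my_priv *priv = dev->driver_priv;
	int (*fn)(struct usbnet *, u8, u8, u16, u16, void *, u16);

	/* PM callbacks record themselves in priv->pm_task; every other
	 * caller gets the autosuspend-aware helper.
	 */
	fn = (current == priv->pm_task) ? usbnet_read_cmd_nopm : usbnet_read_cmd;

	return fn(dev, MY_VENDOR_READ, USB_DIR_IN | USB_TYPE_VENDOR,
		  0, index, data, sizeof(*data));
}
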
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 9b2bd09628a3..502e6abbc8e2 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -849,13 +849,11 @@ int usbnet_stop (struct net_device *net)

mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);

- /* deferred work (task, timer, softirq) must also stop.
- * can't flush_scheduled_work() until we drop rtnl (later),
- * else workers could deadlock; so make workers a NOP.
- */
+ /* deferred work (timer, softirq, task) must also stop */
dev->flags = 0;
del_timer_sync (&dev->delay);
tasklet_kill (&dev->bh);
+ cancel_work_sync(&dev->kevent);
if (!pm)
usb_autopm_put_interface(dev->intf);

@@ -1619,8 +1617,6 @@ void usbnet_disconnect (struct usb_interface *intf)
net = dev->net;
unregister_netdev (net);

- cancel_work_sync(&dev->kevent);
-
usb_scuttle_anchored_urbs(&dev->deferred);

if (dev->driver_info->unbind)
diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
index 9a4c8ff32d9d..5bf7822c53f1 100644
--- a/drivers/net/wireguard/allowedips.c
+++ b/drivers/net/wireguard/allowedips.c
@@ -6,6 +6,8 @@
#include "allowedips.h"
#include "peer.h"

+enum { MAX_ALLOWEDIPS_BITS = 128 };
+
static struct kmem_cache *node_cache;

static void swap_endian(u8 *dst, const u8 *src, u8 bits)
@@ -40,7 +42,8 @@ static void push_rcu(struct allowedips_node **stack,
struct allowedips_node __rcu *p, unsigned int *len)
{
if (rcu_access_pointer(p)) {
- WARN_ON(IS_ENABLED(DEBUG) && *len >= 128);
+ if (WARN_ON(IS_ENABLED(DEBUG) && *len >= MAX_ALLOWEDIPS_BITS))
+ return;
stack[(*len)++] = rcu_dereference_raw(p);
}
}
@@ -52,7 +55,7 @@ static void node_free_rcu(struct rcu_head *rcu)

static void root_free_rcu(struct rcu_head *rcu)
{
- struct allowedips_node *node, *stack[128] = {
+ struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_BITS] = {
container_of(rcu, struct allowedips_node, rcu) };
unsigned int len = 1;

@@ -65,7 +68,7 @@ static void root_free_rcu(struct rcu_head *rcu)

static void root_remove_peer_lists(struct allowedips_node *root)
{
- struct allowedips_node *node, *stack[128] = { root };
+ struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_BITS] = { root };
unsigned int len = 1;

while (len > 0 && (node = stack[--len])) {
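
[Editor's sketch, not part of the patch] The allowedips hunks above name the stack bound (MAX_ALLOWEDIPS_BITS) and make push_rcu() bail out, rather than merely warn, when the bound would be exceeded. A simplified illustration of that bounded, non-recursive tree walk; the node layout and names are illustrative only:

#include <linux/bug.h>
#include <linux/slab.h>

enum { MAX_DEPTH = 128 };	/* stands in for MAX_ALLOWEDIPS_BITS */

struct my_node {
	struct my_node *bit[2];
};

static void push(struct my_node **stack, unsigned int *len, struct my_node *n)
{
	if (!n)
		return;
	if (WARN_ON(*len >= MAX_DEPTH))
		return;		/* refuse to overflow the fixed-size stack */
	stack[(*len)++] = n;
}

static void free_tree(struct my_node *root)
{
	struct my_node *stack[MAX_DEPTH] = { root };
	unsigned int len = root ? 1 : 0;

	while (len > 0) {
		struct my_node *node = stack[--len];

		push(stack, &len, node->bit[0]);
		push(stack, &len, node->bit[1]);
		kfree(node);
	}
}
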
diff --git a/drivers/net/wireguard/selftest/allowedips.c b/drivers/net/wireguard/selftest/allowedips.c
index e173204ae7d7..41db10f9be49 100644
--- a/drivers/net/wireguard/selftest/allowedips.c
+++ b/drivers/net/wireguard/selftest/allowedips.c
@@ -593,10 +593,10 @@ bool __init wg_allowedips_selftest(void)
wg_allowedips_remove_by_peer(&t, a, &mutex);
test_negative(4, a, 192, 168, 0, 1);

- /* These will hit the WARN_ON(len >= 128) in free_node if something
- * goes wrong.
+ /* These will hit the WARN_ON(len >= MAX_ALLOWEDIPS_BITS) in free_node
+ * if something goes wrong.
*/
- for (i = 0; i < 128; ++i) {
+ for (i = 0; i < MAX_ALLOWEDIPS_BITS; ++i) {
part = cpu_to_be64(~(1LLU << (i % 64)));
memset(&ip, 0xff, 16);
memcpy((u8 *)&ip + (i < 64) * 8, &part, 8);
diff --git a/drivers/net/wireguard/selftest/ratelimiter.c b/drivers/net/wireguard/selftest/ratelimiter.c
index 007cd4457c5f..ba87d294604f 100644
--- a/drivers/net/wireguard/selftest/ratelimiter.c
+++ b/drivers/net/wireguard/selftest/ratelimiter.c
@@ -6,28 +6,29 @@
#ifdef DEBUG

#include <linux/jiffies.h>
+#include <linux/hrtimer.h>

static const struct {
bool result;
- unsigned int msec_to_sleep_before;
+ u64 nsec_to_sleep_before;
} expected_results[] __initconst = {
[0 ... PACKETS_BURSTABLE - 1] = { true, 0 },
[PACKETS_BURSTABLE] = { false, 0 },
- [PACKETS_BURSTABLE + 1] = { true, MSEC_PER_SEC / PACKETS_PER_SECOND },
+ [PACKETS_BURSTABLE + 1] = { true, NSEC_PER_SEC / PACKETS_PER_SECOND },
[PACKETS_BURSTABLE + 2] = { false, 0 },
- [PACKETS_BURSTABLE + 3] = { true, (MSEC_PER_SEC / PACKETS_PER_SECOND) * 2 },
+ [PACKETS_BURSTABLE + 3] = { true, (NSEC_PER_SEC / PACKETS_PER_SECOND) * 2 },
[PACKETS_BURSTABLE + 4] = { true, 0 },
[PACKETS_BURSTABLE + 5] = { false, 0 }
};

static __init unsigned int maximum_jiffies_at_index(int index)
{
- unsigned int total_msecs = 2 * MSEC_PER_SEC / PACKETS_PER_SECOND / 3;
+ u64 total_nsecs = 2 * NSEC_PER_SEC / PACKETS_PER_SECOND / 3;
int i;

for (i = 0; i <= index; ++i)
- total_msecs += expected_results[i].msec_to_sleep_before;
- return msecs_to_jiffies(total_msecs);
+ total_nsecs += expected_results[i].nsec_to_sleep_before;
+ return nsecs_to_jiffies(total_nsecs);
}

static __init int timings_test(struct sk_buff *skb4, struct iphdr *hdr4,
@@ -42,8 +43,12 @@ static __init int timings_test(struct sk_buff *skb4, struct iphdr *hdr4,
loop_start_time = jiffies;

for (i = 0; i < ARRAY_SIZE(expected_results); ++i) {
- if (expected_results[i].msec_to_sleep_before)
- msleep(expected_results[i].msec_to_sleep_before);
+ if (expected_results[i].nsec_to_sleep_before) {
+ ktime_t timeout = ktime_add(ktime_add_ns(ktime_get_coarse_boottime(), TICK_NSEC * 4 / 3),
+ ns_to_ktime(expected_results[i].nsec_to_sleep_before));
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_hrtimeout_range_clock(&timeout, 0, HRTIMER_MODE_ABS, CLOCK_BOOTTIME);
+ }

if (time_is_before_jiffies(loop_start_time +
maximum_jiffies_at_index(i)))
@@ -127,7 +132,7 @@ bool __init wg_ratelimiter_selftest(void)
if (IS_ENABLED(CONFIG_KASAN) || IS_ENABLED(CONFIG_UBSAN))
return true;

- BUILD_BUG_ON(MSEC_PER_SEC % PACKETS_PER_SECOND != 0);
+ BUILD_BUG_ON(NSEC_PER_SEC % PACKETS_PER_SECOND != 0);

if (wg_ratelimiter_init())
goto out;
@@ -176,7 +181,6 @@ bool __init wg_ratelimiter_selftest(void)
test += test_count;
goto err;
}
- msleep(500);
continue;
} else if (ret < 0) {
test += test_count;
@@ -195,7 +199,6 @@ bool __init wg_ratelimiter_selftest(void)
test += test_count;
goto err;
}
- msleep(50);
continue;
}
test += test_count;
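
[Editor's sketch, not part of the patch] The ratelimiter selftest hunks above drop msleep() in favour of an absolute hrtimer sleep against the coarse boottime clock, padded by a little over one tick to absorb the coarse clock's granularity. A small helper capturing the same primitive; the function name is illustrative:

#include <linux/hrtimer.h>
#include <linux/jiffies.h>
#include <linux/ktime.h>
#include <linux/sched.h>

/* Sleep for @nsecs, measured from the coarse boottime clock, with ~4/3 of a
 * tick of slack for the coarse clock's resolution.
 */
static void sleep_ns_boottime(u64 nsecs)
{
	ktime_t timeout = ktime_add_ns(ktime_get_coarse_boottime(),
				       nsecs + TICK_NSEC * 4 / 3);

	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_hrtimeout_range_clock(&timeout, 0, HRTIMER_MODE_ABS,
				       CLOCK_BOOTTIME);
}
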
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 8328966a0471..603dc7bebc0c 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -1249,13 +1249,12 @@ static void ath10k_snoc_init_napi(struct ath10k *ar)
static int ath10k_snoc_request_irq(struct ath10k *ar)
{
struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
- int irqflags = IRQF_TRIGGER_RISING;
int ret, id;

for (id = 0; id < CE_COUNT_MAX; id++) {
ret = request_irq(ar_snoc->ce_irqs[id].irq_line,
- ath10k_snoc_per_engine_handler,
- irqflags, ce_name[id], ar);
+ ath10k_snoc_per_engine_handler, 0,
+ ce_name[id], ar);
if (ret) {
ath10k_err(ar,
"failed to register IRQ handler for CE %d: %d\n",
diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
index 90a5df1fbdbd..0106e529f3c5 100644
--- a/drivers/net/wireless/ath/ath11k/core.c
+++ b/drivers/net/wireless/ath/ath11k/core.c
@@ -903,23 +903,23 @@ static int ath11k_core_pdev_create(struct ath11k_base *ab)
return ret;
}

- ret = ath11k_mac_register(ab);
+ ret = ath11k_dp_pdev_alloc(ab);
if (ret) {
- ath11k_err(ab, "failed register the radio with mac80211: %d\n", ret);
+ ath11k_err(ab, "failed to attach DP pdev: %d\n", ret);
goto err_pdev_debug;
}

- ret = ath11k_dp_pdev_alloc(ab);
+ ret = ath11k_mac_register(ab);
if (ret) {
- ath11k_err(ab, "failed to attach DP pdev: %d\n", ret);
- goto err_mac_unregister;
+ ath11k_err(ab, "failed register the radio with mac80211: %d\n", ret);
+ goto err_dp_pdev_free;
}

ret = ath11k_thermal_register(ab);
if (ret) {
ath11k_err(ab, "could not register thermal device: %d\n",
ret);
- goto err_dp_pdev_free;
+ goto err_mac_unregister;
}

ret = ath11k_spectral_init(ab);
@@ -932,10 +932,10 @@ static int ath11k_core_pdev_create(struct ath11k_base *ab)

err_thermal_unregister:
ath11k_thermal_unregister(ab);
-err_dp_pdev_free:
- ath11k_dp_pdev_free(ab);
err_mac_unregister:
ath11k_mac_unregister(ab);
+err_dp_pdev_free:
+ ath11k_dp_pdev_free(ab);
err_pdev_debug:
ath11k_debugfs_pdev_destroy(ab);

diff --git a/drivers/net/wireless/ath/ath11k/debug.h b/drivers/net/wireless/ath/ath11k/debug.h
index fbbd5fe02aa8..91545640c47b 100644
--- a/drivers/net/wireless/ath/ath11k/debug.h
+++ b/drivers/net/wireless/ath/ath11k/debug.h
@@ -23,8 +23,8 @@ enum ath11k_debug_mask {
ATH11K_DBG_TESTMODE = 0x00000400,
ATH11k_DBG_HAL = 0x00000800,
ATH11K_DBG_PCI = 0x00001000,
- ATH11K_DBG_DP_TX = 0x00001000,
- ATH11K_DBG_DP_RX = 0x00002000,
+ ATH11K_DBG_DP_TX = 0x00002000,
+ ATH11K_DBG_DP_RX = 0x00004000,
ATH11K_DBG_ANY = 0xffffffff,
};

diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c
index 049774cc158c..b3e133add1ce 100644
--- a/drivers/net/wireless/ath/ath11k/dp_rx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_rx.c
@@ -835,8 +835,9 @@ void ath11k_peer_rx_tid_delete(struct ath11k *ar,
HAL_REO_CMD_UPDATE_RX_QUEUE, &cmd,
ath11k_dp_rx_tid_del_func);
if (ret) {
- ath11k_err(ar->ab, "failed to send HAL_REO_CMD_UPDATE_RX_QUEUE cmd, tid %d (%d)\n",
- tid, ret);
+ if (ret != -ESHUTDOWN)
+ ath11k_err(ar->ab, "failed to send HAL_REO_CMD_UPDATE_RX_QUEUE cmd, tid %d (%d)\n",
+ tid, ret);
dma_unmap_single(ar->ab->dev, rx_tid->paddr, rx_tid->size,
DMA_BIDIRECTIONAL);
kfree(rx_tid->vaddr);
diff --git a/drivers/net/wireless/ath/ath11k/htc.c b/drivers/net/wireless/ath/ath11k/htc.c
index 6913b7494b9b..2de1e953a539 100644
--- a/drivers/net/wireless/ath/ath11k/htc.c
+++ b/drivers/net/wireless/ath/ath11k/htc.c
@@ -258,8 +258,10 @@ void ath11k_htc_tx_completion_handler(struct ath11k_base *ab,
u8 eid;

eid = ATH11K_SKB_CB(skb)->eid;
- if (eid >= ATH11K_HTC_EP_COUNT)
+ if (eid >= ATH11K_HTC_EP_COUNT) {
+ dev_kfree_skb_any(skb);
return;
+ }

ep = &htc->endpoint[eid];
spin_lock_bh(&htc->tx_lock);
diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c
index 8a3ff12057e8..e2382c8595b6 100644
--- a/drivers/net/wireless/ath/ath11k/pci.c
+++ b/drivers/net/wireless/ath/ath11k/pci.c
@@ -1566,7 +1566,9 @@ static void ath11k_pci_remove(struct pci_dev *pdev)
static void ath11k_pci_shutdown(struct pci_dev *pdev)
{
struct ath11k_base *ab = pci_get_drvdata(pdev);
+ struct ath11k_pci *ab_pci = ath11k_pci_priv(ab);

+ ath11k_pci_set_irq_affinity_hint(ab_pci, NULL);
ath11k_pci_power_down(ab);
}

diff --git a/drivers/net/wireless/ath/ath9k/htc.h b/drivers/net/wireless/ath/ath9k/htc.h
index 6b45e63fae4b..e3d546ef71dd 100644
--- a/drivers/net/wireless/ath/ath9k/htc.h
+++ b/drivers/net/wireless/ath/ath9k/htc.h
@@ -327,11 +327,11 @@ static inline struct ath9k_htc_tx_ctl *HTC_SKB_CB(struct sk_buff *skb)
}

#ifdef CONFIG_ATH9K_HTC_DEBUGFS
-
-#define TX_STAT_INC(c) (hif_dev->htc_handle->drv_priv->debug.tx_stats.c++)
-#define TX_STAT_ADD(c, a) (hif_dev->htc_handle->drv_priv->debug.tx_stats.c += a)
-#define RX_STAT_INC(c) (hif_dev->htc_handle->drv_priv->debug.skbrx_stats.c++)
-#define RX_STAT_ADD(c, a) (hif_dev->htc_handle->drv_priv->debug.skbrx_stats.c += a)
+#define __STAT_SAFE(expr) (hif_dev->htc_handle->drv_priv ? (expr) : 0)
+#define TX_STAT_INC(c) __STAT_SAFE(hif_dev->htc_handle->drv_priv->debug.tx_stats.c++)
+#define TX_STAT_ADD(c, a) __STAT_SAFE(hif_dev->htc_handle->drv_priv->debug.tx_stats.c += a)
+#define RX_STAT_INC(c) __STAT_SAFE(hif_dev->htc_handle->drv_priv->debug.skbrx_stats.c++)
+#define RX_STAT_ADD(c, a) __STAT_SAFE(hif_dev->htc_handle->drv_priv->debug.skbrx_stats.c += a)
#define CAB_STAT_INC priv->debug.tx_stats.cab_queued++

#define TX_QSTAT_INC(q) (priv->debug.tx_stats.queue_stats[q]++)
diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_init.c b/drivers/net/wireless/ath/ath9k/htc_drv_init.c
index ff61ae34ecdf..07ac88fb1c57 100644
--- a/drivers/net/wireless/ath/ath9k/htc_drv_init.c
+++ b/drivers/net/wireless/ath/ath9k/htc_drv_init.c
@@ -944,7 +944,6 @@ int ath9k_htc_probe_device(struct htc_target *htc_handle, struct device *dev,
priv->hw = hw;
priv->htc = htc_handle;
priv->dev = dev;
- htc_handle->drv_priv = priv;
SET_IEEE80211_DEV(hw, priv->dev);

ret = ath9k_htc_wait_for_target(priv);
@@ -965,6 +964,8 @@ int ath9k_htc_probe_device(struct htc_target *htc_handle, struct device *dev,
if (ret)
goto err_init;

+ htc_handle->drv_priv = priv;
+
return 0;

err_init:
diff --git a/drivers/net/wireless/ath/wil6210/debugfs.c b/drivers/net/wireless/ath/wil6210/debugfs.c
index 4c944e595978..ac7787e1a7f6 100644
--- a/drivers/net/wireless/ath/wil6210/debugfs.c
+++ b/drivers/net/wireless/ath/wil6210/debugfs.c
@@ -1010,20 +1010,14 @@ static ssize_t wil_write_file_wmi(struct file *file, const char __user *buf,
void *cmd;
int cmdlen = len - sizeof(struct wmi_cmd_hdr);
u16 cmdid;
- int rc, rc1;
+ int rc1;

- if (cmdlen < 0)
+ if (cmdlen < 0 || *ppos != 0)
return -EINVAL;

- wmi = kmalloc(len, GFP_KERNEL);
- if (!wmi)
- return -ENOMEM;
-
- rc = simple_write_to_buffer(wmi, len, ppos, buf, len);
- if (rc < 0) {
- kfree(wmi);
- return rc;
- }
+ wmi = memdup_user(buf, len);
+ if (IS_ERR(wmi))
+ return PTR_ERR(wmi);

cmd = (cmdlen > 0) ? &wmi[1] : NULL;
cmdid = le16_to_cpu(wmi->command_id);
@@ -1033,7 +1027,7 @@ static ssize_t wil_write_file_wmi(struct file *file, const char __user *buf,

wil_info(wil, "0x%04x[%d] -> %d\n", cmdid, cmdlen, rc1);

- return rc;
+ return len;
}

static const struct file_operations fops_wmi = {
diff --git a/drivers/net/wireless/intel/iwlegacy/4965-rs.c b/drivers/net/wireless/intel/iwlegacy/4965-rs.c
index 9a491e5db75b..532e3b91777d 100644
--- a/drivers/net/wireless/intel/iwlegacy/4965-rs.c
+++ b/drivers/net/wireless/intel/iwlegacy/4965-rs.c
@@ -2403,7 +2403,7 @@ il4965_rs_fill_link_cmd(struct il_priv *il, struct il_lq_sta *lq_sta,
/* Repeat initial/next rate.
* For legacy IL_NUMBER_TRY == 1, this loop will not execute.
* For HT IL_HT_NUMBER_TRY == 3, this executes twice. */
- while (repeat_rate > 0 && idx < LINK_QUAL_MAX_RETRY_NUM) {
+ while (repeat_rate > 0) {
if (is_legacy(tbl_type.lq_type)) {
if (ant_toggle_cnt < NUM_TRY_BEFORE_ANT_TOGGLE)
ant_toggle_cnt++;
@@ -2422,6 +2422,8 @@ il4965_rs_fill_link_cmd(struct il_priv *il, struct il_lq_sta *lq_sta,
cpu_to_le32(new_rate);
repeat_rate--;
idx++;
+ if (idx >= LINK_QUAL_MAX_RETRY_NUM)
+ goto out;
}

il4965_rs_get_tbl_info_from_mcs(new_rate, lq_sta->band,
@@ -2466,6 +2468,7 @@ il4965_rs_fill_link_cmd(struct il_priv *il, struct il_lq_sta *lq_sta,
repeat_rate--;
}

+out:
lq_cmd->agg_params.agg_frame_cnt_limit = LINK_QUAL_AGG_FRAME_LIMIT_DEF;
lq_cmd->agg_params.agg_dis_start_th = LINK_QUAL_AGG_DISABLE_START_DEF;

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
index c7f9d3870f21..8a38d1bfe9b3 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
@@ -1862,6 +1862,7 @@ static void iwl_mvm_disable_sta_queues(struct iwl_mvm *mvm,
iwl_mvm_txq_from_mac80211(sta->txq[i]);

mvmtxq->txq_id = IWL_MVM_INVALID_QUEUE;
+ list_del_init(&mvmtxq->list);
}
}

diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
index a3ca6620dc0c..8fa3ec71603e 100644
--- a/drivers/net/wireless/intersil/p54/main.c
+++ b/drivers/net/wireless/intersil/p54/main.c
@@ -682,7 +682,7 @@ static void p54_flush(struct ieee80211_hw *dev, struct ieee80211_vif *vif,
* queues have already been stopped and no new frames can sneak
* up from behind.
*/
- while ((total = p54_flush_count(priv) && i--)) {
+ while ((total = p54_flush_count(priv)) && i--) {
/* waste time */
msleep(20);
}
diff --git a/drivers/net/wireless/intersil/p54/p54spi.c b/drivers/net/wireless/intersil/p54/p54spi.c
index f99b7ba69fc3..19152fd449ba 100644
--- a/drivers/net/wireless/intersil/p54/p54spi.c
+++ b/drivers/net/wireless/intersil/p54/p54spi.c
@@ -164,7 +164,7 @@ static int p54spi_request_firmware(struct ieee80211_hw *dev)

ret = p54_parse_firmware(dev, priv->firmware);
if (ret) {
- release_firmware(priv->firmware);
+ /* the firmware is released by the caller */
return ret;
}

@@ -659,6 +659,7 @@ static int p54spi_probe(struct spi_device *spi)
return 0;

err_free_common:
+ release_firmware(priv->firmware);
free_irq(gpio_to_irq(p54spi_gpio_irq), spi);
err_free_gpio_irq:
gpio_free(p54spi_gpio_irq);
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index e9ec63e0e395..1f39f2ed633d 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -680,7 +680,7 @@ struct mac80211_hwsim_data {
bool ps_poll_pending;
struct dentry *debugfs;

- uintptr_t pending_cookie;
+ atomic_t pending_cookie;
struct sk_buff_head pending; /* packets pending */
/*
* Only radios in the same group can communicate together (the
@@ -1416,8 +1416,7 @@ static void mac80211_hwsim_tx_frame_nl(struct ieee80211_hw *hw,
goto nla_put_failure;

/* We create a cookie to identify this skb */
- data->pending_cookie++;
- cookie = data->pending_cookie;
+ cookie = atomic_inc_return(&data->pending_cookie);
info->rate_driver_data[0] = (void *)cookie;
if (nla_put_u64_64bit(skb, HWSIM_ATTR_COOKIE, cookie, HWSIM_ATTR_PAD))
goto nla_put_failure;
@@ -4080,6 +4079,7 @@ static int hwsim_tx_info_frame_received_nl(struct sk_buff *skb_2,
const u8 *src;
unsigned int hwsim_flags;
int i;
+ unsigned long flags;
bool found = false;

if (!info->attrs[HWSIM_ATTR_ADDR_TRANSMITTER] ||
@@ -4107,18 +4107,20 @@ static int hwsim_tx_info_frame_received_nl(struct sk_buff *skb_2,
}

/* look for the skb matching the cookie passed back from user */
+ spin_lock_irqsave(&data2->pending.lock, flags);
skb_queue_walk_safe(&data2->pending, skb, tmp) {
- u64 skb_cookie;
+ uintptr_t skb_cookie;

txi = IEEE80211_SKB_CB(skb);
- skb_cookie = (u64)(uintptr_t)txi->rate_driver_data[0];
+ skb_cookie = (uintptr_t)txi->rate_driver_data[0];

if (skb_cookie == ret_skb_cookie) {
- skb_unlink(skb, &data2->pending);
+ __skb_unlink(skb, &data2->pending);
found = true;
break;
}
}
+ spin_unlock_irqrestore(&data2->pending.lock, flags);

/* not found */
if (!found)
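
[Editor's sketch, not part of the patch] The mac80211_hwsim hunks above hold the pending queue's own lock for the whole cookie lookup and use __skb_unlink(), so a concurrent producer cannot race the walk. A minimal illustration; the cookie is stashed at the start of skb->cb here purely for the example, whereas the driver keeps it in rate_driver_data[0]:

#include <linux/skbuff.h>
#include <linux/spinlock.h>

static struct sk_buff *take_by_cookie(struct sk_buff_head *pending,
				      uintptr_t cookie)
{
	struct sk_buff *skb, *tmp, *found = NULL;
	unsigned long flags;

	spin_lock_irqsave(&pending->lock, flags);
	skb_queue_walk_safe(pending, skb, tmp) {
		/* illustrative: cookie stored at the start of skb->cb */
		if (*(uintptr_t *)skb->cb == cookie) {
			__skb_unlink(skb, pending);	/* lock already held */
			found = skb;
			break;
		}
	}
	spin_unlock_irqrestore(&pending->lock, flags);

	return found;
}
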
diff --git a/drivers/net/wireless/marvell/libertas/if_usb.c b/drivers/net/wireless/marvell/libertas/if_usb.c
index 5d6dc1dd050d..32fdc4150b60 100644
--- a/drivers/net/wireless/marvell/libertas/if_usb.c
+++ b/drivers/net/wireless/marvell/libertas/if_usb.c
@@ -287,6 +287,7 @@ static int if_usb_probe(struct usb_interface *intf,
return 0;

err_get_fw:
+ usb_put_dev(udev);
lbs_remove_card(priv);
err_add_card:
if_usb_reset_device(cardp);
diff --git a/drivers/net/wireless/mediatek/mt76/eeprom.c b/drivers/net/wireless/mediatek/mt76/eeprom.c
index a499861918fa..9bc8758573fc 100644
--- a/drivers/net/wireless/mediatek/mt76/eeprom.c
+++ b/drivers/net/wireless/mediatek/mt76/eeprom.c
@@ -162,10 +162,13 @@ mt76_find_power_limits_node(struct mt76_dev *dev)
}

if (mt76_string_prop_find(country, dev->alpha2) ||
- mt76_string_prop_find(regd, region_name))
+ mt76_string_prop_find(regd, region_name)) {
+ of_node_put(np);
return cur;
+ }
}

+ of_node_put(np);
return fallback;
}

diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index 8a2fedbb1451..2cd7b3f1db64 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -210,6 +210,7 @@ static int mt76_led_init(struct mt76_dev *dev)
if (!of_property_read_u32(np, "led-sources", &led_pin))
dev->led_pin = led_pin;
dev->led_al = of_property_read_bool(np, "led-active-low");
+ of_node_put(np);
}

return led_classdev_register(dev->dev, &dev->led_cdev);
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c
index bd687f7de628..9e832b27170f 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c
@@ -2282,6 +2282,7 @@ mt7615_dfs_init_radar_specs(struct mt7615_phy *phy)

int mt7615_dfs_init_radar_detector(struct mt7615_phy *phy)
{
+ struct cfg80211_chan_def *chandef = &phy->mt76->chandef;
struct mt7615_dev *dev = phy->dev;
bool ext_phy = phy != &dev->phy;
enum mt76_dfs_state dfs_state, prev_state;
@@ -2292,13 +2293,13 @@ int mt7615_dfs_init_radar_detector(struct mt7615_phy *phy)

prev_state = phy->mt76->dfs_state;
dfs_state = mt76_phy_dfs_state(phy->mt76);
+ if ((chandef->chan->flags & IEEE80211_CHAN_RADAR) &&
+ dfs_state < MT_DFS_STATE_CAC)
+ dfs_state = MT_DFS_STATE_ACTIVE;

if (prev_state == dfs_state)
return 0;

- if (prev_state == MT_DFS_STATE_UNKNOWN)
- mt7615_dfs_stop_radar_detector(phy);
-
if (dfs_state == MT_DFS_STATE_DISABLED)
goto stop;

diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/main.c b/drivers/net/wireless/mediatek/mt76/mt7615/main.c
index 6b8e3e7ae4a2..36990637a8a2 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/main.c
@@ -282,26 +282,6 @@ static void mt7615_remove_interface(struct ieee80211_hw *hw,
mt76_packet_id_flush(&dev->mt76, &mvif->sta.wcid);
}

-static void mt7615_init_dfs_state(struct mt7615_phy *phy)
-{
- struct mt76_phy *mphy = phy->mt76;
- struct ieee80211_hw *hw = mphy->hw;
- struct cfg80211_chan_def *chandef = &hw->conf.chandef;
-
- if (hw->conf.flags & IEEE80211_CONF_OFFCHANNEL)
- return;
-
- if (!(chandef->chan->flags & IEEE80211_CHAN_RADAR) &&
- !(mphy->chandef.chan->flags & IEEE80211_CHAN_RADAR))
- return;
-
- if (mphy->chandef.chan->center_freq == chandef->chan->center_freq &&
- mphy->chandef.width == chandef->width)
- return;
-
- phy->dfs_state = -1;
-}
-
int mt7615_set_channel(struct mt7615_phy *phy)
{
struct mt7615_dev *dev = phy->dev;
@@ -314,7 +294,6 @@ int mt7615_set_channel(struct mt7615_phy *phy)

set_bit(MT76_RESET, &phy->mt76->state);

- mt7615_init_dfs_state(phy);
mt76_set_channel(phy->mt76);

if (is_mt7615(&dev->mt76) && dev->flash_eeprom) {
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c
index 97e2a85cb728..7127d6007ae0 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c
@@ -350,10 +350,11 @@ static int mt7615_mcu_fw_pmctrl(struct mt7615_dev *dev)
}

mt7622_trigger_hif_int(dev, false);
-
- pm->stats.last_doze_event = jiffies;
- pm->stats.awake_time += pm->stats.last_doze_event -
- pm->stats.last_wake_event;
+ if (!err) {
+ pm->stats.last_doze_event = jiffies;
+ pm->stats.awake_time += pm->stats.last_doze_event -
+ pm->stats.last_wake_event;
+ }
out:
mutex_unlock(&pm->mutex);

@@ -402,6 +403,9 @@ mt7615_mcu_rx_radar_detected(struct mt7615_dev *dev, struct sk_buff *skb)
if (r->band_idx && dev->mt76.phy2)
mphy = dev->mt76.phy2;

+ if (mt76_phy_dfs_state(mphy) < MT_DFS_STATE_CAC)
+ return;
+
ieee80211_radar_detected(mphy->hw);
dev->hw_pattern++;
}
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mt7615.h b/drivers/net/wireless/mediatek/mt76/mt7615/mt7615.h
index 2e91f6a27d0f..082c73b571ae 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/mt7615.h
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/mt7615.h
@@ -177,7 +177,6 @@ struct mt7615_phy {

u8 chfreq;
u8 rdd_state;
- int dfs_state;

u32 rx_ampdu_ts;
u32 ampdu_ref;
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c
index 2953df7d8388..c6c16fe8ee85 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c
@@ -108,7 +108,7 @@ __mt76x02u_mcu_send_msg(struct mt76_dev *dev, struct sk_buff *skb,
ret = mt76u_bulk_msg(dev, skb->data, skb->len, NULL, 500,
MT_EP_OUT_INBAND_CMD);
if (ret)
- return ret;
+ goto out;

if (wait_resp)
ret = mt76x02u_mcu_wait_resp(dev, seq);
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/init.c b/drivers/net/wireless/mediatek/mt76/mt7921/init.c
index 91fc41922d95..37453e1c136f 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/init.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/init.c
@@ -49,8 +49,8 @@ mt7921_init_wiphy(struct ieee80211_hw *hw)
struct wiphy *wiphy = hw->wiphy;

hw->queues = 4;
- hw->max_rx_aggregation_subframes = 64;
- hw->max_tx_aggregation_subframes = 128;
+ hw->max_rx_aggregation_subframes = IEEE80211_MAX_AMPDU_BUF_HE;
+ hw->max_tx_aggregation_subframes = IEEE80211_MAX_AMPDU_BUF_HE;
hw->netdev_features = NETIF_F_RXCSUM;

hw->radiotap_timestamp.units_pos =
@@ -291,7 +291,7 @@ int mt7921_register_device(struct mt7921_dev *dev)
IEEE80211_HT_CAP_LDPC_CODING |
IEEE80211_HT_CAP_MAX_AMSDU;
dev->mphy.sband_5g.sband.vht_cap.cap |=
- IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991 |
+ IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_11454 |
IEEE80211_VHT_CAP_MAX_A_MPDU_LENGTH_EXPONENT_MASK |
IEEE80211_VHT_CAP_SU_BEAMFORMEE_CAPABLE |
IEEE80211_VHT_CAP_MU_BEAMFORMEE_CAPABLE |
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
index da2be050ed7c..2a609b25561c 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
@@ -538,13 +538,6 @@ static int mt7921_load_patch(struct mt7921_dev *dev)
if (ret)
dev_err(dev->mt76.dev, "Failed to start patch\n");

- if (mt76_is_sdio(&dev->mt76)) {
- /* activate again */
- ret = __mt7921_mcu_fw_pmctrl(dev);
- if (!ret)
- ret = __mt7921_mcu_drv_pmctrl(dev);
- }
-
out:
sem = mt76_connac_mcu_patch_sem_ctrl(&dev->mt76, false);
switch (sem) {
@@ -555,6 +548,14 @@ static int mt7921_load_patch(struct mt7921_dev *dev)
dev_err(dev->mt76.dev, "Failed to release patch semaphore\n");
break;
}
+
+ if (!ret && mt76_is_sdio(&dev->mt76)) {
+ /* activate again */
+ ret = __mt7921_mcu_fw_pmctrl(dev);
+ if (!ret)
+ ret = __mt7921_mcu_drv_pmctrl(dev);
+ }
+
release_firmware(fw);

return ret;
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mcu.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mcu.c
index 36669e5aeef3..a1ab5f878f81 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mcu.c
@@ -102,7 +102,7 @@ int mt7921e_mcu_fw_pmctrl(struct mt7921_dev *dev)
{
struct mt76_phy *mphy = &dev->mt76.phy;
struct mt76_connac_pm *pm = &dev->pm;
- int i, err = 0;
+ int i;

for (i = 0; i < MT7921_DRV_OWN_RETRY_COUNT; i++) {
mt76_wr(dev, MT_CONN_ON_LPCTL, PCIE_LPCR_HOST_SET_OWN);
@@ -114,12 +114,12 @@ int mt7921e_mcu_fw_pmctrl(struct mt7921_dev *dev)
if (i == MT7921_DRV_OWN_RETRY_COUNT) {
dev_err(dev->mt76.dev, "firmware own failed\n");
clear_bit(MT76_STATE_PM, &mphy->state);
- err = -EIO;
+ return -EIO;
}

pm->stats.last_doze_event = jiffies;
pm->stats.awake_time += pm->stats.last_doze_event -
pm->stats.last_wake_event;

- return err;
+ return 0;
}
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mcu.c b/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mcu.c
index 54a5c712a3c3..c572a3107b8b 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/sdio_mcu.c
@@ -136,8 +136,8 @@ int mt7921s_mcu_fw_pmctrl(struct mt7921_dev *dev)
struct sdio_func *func = dev->mt76.sdio.func;
struct mt76_phy *mphy = &dev->mt76.phy;
struct mt76_connac_pm *pm = &dev->pm;
- int err = 0;
u32 status;
+ int err;

sdio_claim_host(func);

@@ -148,7 +148,7 @@ int mt7921s_mcu_fw_pmctrl(struct mt7921_dev *dev)
2000, 1000000);
if (err < 0) {
dev_err(dev->mt76.dev, "mailbox ACK not cleared\n");
- goto err;
+ goto out;
}
}

@@ -156,18 +156,18 @@ int mt7921s_mcu_fw_pmctrl(struct mt7921_dev *dev)

err = readx_poll_timeout(mt76s_read_pcr, &dev->mt76, status,
!(status & WHLPCR_IS_DRIVER_OWN), 2000, 1000000);
+out:
sdio_release_host(func);

-err:
if (err < 0) {
dev_err(dev->mt76.dev, "firmware own failed\n");
clear_bit(MT76_STATE_PM, &mphy->state);
- err = -EIO;
+ return -EIO;
}

pm->stats.last_doze_event = jiffies;
pm->stats.awake_time += pm->stats.last_doze_event -
pm->stats.last_wake_event;

- return err;
+ return 0;
}
diff --git a/drivers/net/wireless/microchip/wilc1000/spi.c b/drivers/net/wireless/microchip/wilc1000/spi.c
index 18420e954402..2ae8dd3411ac 100644
--- a/drivers/net/wireless/microchip/wilc1000/spi.c
+++ b/drivers/net/wireless/microchip/wilc1000/spi.c
@@ -191,11 +191,11 @@ static void wilc_wlan_power(struct wilc *wilc, bool on)
/* assert ENABLE: */
gpiod_set_value(gpios->enable, 1);
mdelay(5);
- /* deassert RESET: */
- gpiod_set_value(gpios->reset, 0);
- } else {
/* assert RESET: */
gpiod_set_value(gpios->reset, 1);
+ } else {
+ /* deassert RESET: */
+ gpiod_set_value(gpios->reset, 0);
/* deassert ENABLE: */
gpiod_set_value(gpios->enable, 0);
}
diff --git a/drivers/net/wireless/realtek/rtlwifi/debug.c b/drivers/net/wireless/realtek/rtlwifi/debug.c
index 901cdfe3723c..0b1bc04cb6ad 100644
--- a/drivers/net/wireless/realtek/rtlwifi/debug.c
+++ b/drivers/net/wireless/realtek/rtlwifi/debug.c
@@ -329,8 +329,8 @@ static ssize_t rtl_debugfs_set_write_h2c(struct file *filp,

tmp_len = (count > sizeof(tmp) - 1 ? sizeof(tmp) - 1 : count);

- if (!buffer || copy_from_user(tmp, buffer, tmp_len))
- return count;
+ if (copy_from_user(tmp, buffer, tmp_len))
+ return -EFAULT;

tmp[tmp_len] = '\0';

@@ -340,8 +340,8 @@ static ssize_t rtl_debugfs_set_write_h2c(struct file *filp,
&h2c_data[4], &h2c_data[5],
&h2c_data[6], &h2c_data[7]);

- if (h2c_len <= 0)
- return count;
+ if (h2c_len == 0)
+ return -EINVAL;

for (i = 0; i < h2c_len; i++)
h2c_data_packed[i] = (u8)h2c_data[i];
diff --git a/drivers/net/wireless/realtek/rtw88/main.c b/drivers/net/wireless/realtek/rtw88/main.c
index 8b9899e41b0b..ded952913ae6 100644
--- a/drivers/net/wireless/realtek/rtw88/main.c
+++ b/drivers/net/wireless/realtek/rtw88/main.c
@@ -1974,6 +1974,10 @@ int rtw_core_init(struct rtw_dev *rtwdev)
timer_setup(&rtwdev->tx_report.purge_timer,
rtw_tx_report_purge_timer, 0);
rtwdev->tx_wq = alloc_workqueue("rtw_tx_wq", WQ_UNBOUND | WQ_HIGHPRI, 0);
+ if (!rtwdev->tx_wq) {
+ rtw_warn(rtwdev, "alloc_workqueue rtw_tx_wq failed\n");
+ return -ENOMEM;
+ }

INIT_DELAYED_WORK(&rtwdev->watch_dog_work, rtw_watch_dog_work);
INIT_DELAYED_WORK(&coex->bt_relink_work, rtw_coex_bt_relink_work);
diff --git a/drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c b/drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c
index ad272854c442..8e4869eb3a24 100644
--- a/drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c
+++ b/drivers/net/wireless/realtek/rtw89/rtw8852a_rfk.c
@@ -2330,8 +2330,8 @@ static u8 _dpk_pas_read(struct rtw89_dev *rtwdev, bool is_check)
val2_q = abs(sign_extend32(val2_q, 11));

rtw89_debug(rtwdev, RTW89_DBG_RFK, "[DPK] PAS_delta = 0x%x\n",
- (val1_i * val1_i + val1_q * val1_q) /
- (val2_i * val2_i + val2_q * val2_q));
+ phy_div(val1_i * val1_i + val1_q * val1_q,
+ val2_i * val2_i + val2_q * val2_q));

} else {
for (i = 0; i < 32; i++) {
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c9831daafbc6..a58a69999dbc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1897,8 +1897,10 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)

if (ns->head->ids.csi == NVME_CSI_ZNS) {
ret = nvme_update_zone_info(ns, lbaf);
- if (ret)
- goto out_unfreeze;
+ if (ret) {
+ blk_mq_unfreeze_queue(ns->disk->queue);
+ goto out;
+ }
}

set_disk_ro(ns->disk, (id->nsattr & NVME_NS_ATTR_RO) ||
@@ -1909,7 +1911,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
if (blk_queue_is_zoned(ns->queue)) {
ret = nvme_revalidate_zones(ns);
if (ret && !nvme_first_scan(ns->disk))
- return ret;
+ goto out;
}

if (nvme_ns_head_multipath(ns->head)) {
@@ -1924,9 +1926,9 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
disk_update_readahead(ns->head->disk);
blk_mq_unfreeze_queue(ns->head->disk->queue);
}
- return 0;

-out_unfreeze:
+ ret = 0;
+out:
/*
* If probing fails due an unsupported feature, hide the block device,
* but still allow other access.
@@ -1936,7 +1938,6 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
set_bit(NVME_NS_READY, &ns->flags);
ret = 0;
}
- blk_mq_unfreeze_queue(ns->disk->queue);
return ret;
}

@@ -2093,6 +2094,7 @@ static int nvme_report_zones(struct gendisk *disk, sector_t sector,
static const struct block_device_operations nvme_bdev_ops = {
.owner = THIS_MODULE,
.ioctl = nvme_ioctl,
+ .compat_ioctl = blkdev_compat_ptr_ioctl,
.open = nvme_open,
.release = nvme_release,
.getgeo = nvme_getgeo,
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 080f85f4105f..05f9da251758 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1899,6 +1899,24 @@ nvme_fc_ctrl_ioerr_work(struct work_struct *work)
nvme_fc_error_recovery(ctrl, "transport detected io error");
}

+/*
+ * nvme_fc_io_getuuid - Routine called to get the appid field
+ * associated with request by the lldd
+ * @req:IO request from nvme fc to driver
+ * Returns: UUID if there is an appid associated with VM or
+ * NULL if the user/libvirt has not set the appid to VM
+ */
+char *nvme_fc_io_getuuid(struct nvmefc_fcp_req *req)
+{
+ struct nvme_fc_fcp_op *op = fcp_req_to_fcp_op(req);
+ struct request *rq = op->rq;
+
+ if (!IS_ENABLED(CONFIG_BLK_CGROUP_FC_APPID) || !rq->bio)
+ return NULL;
+ return blkcg_get_fc_appid(rq->bio);
+}
+EXPORT_SYMBOL_GPL(nvme_fc_io_getuuid);
+
static void
nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
{
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d464fdf978fb..b0fe23439c4a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -408,6 +408,7 @@ const struct block_device_operations nvme_ns_head_ops = {
.open = nvme_ns_head_open,
.release = nvme_ns_head_release,
.ioctl = nvme_ns_head_ioctl,
+ .compat_ioctl = blkdev_compat_ptr_ioctl,
.getgeo = nvme_getgeo,
.report_zones = nvme_ns_head_report_zones,
.pr_ops = &nvme_pr_ops,
diff --git a/drivers/nvme/host/trace.h b/drivers/nvme/host/trace.h
index 37c7f4c89f92..6f0eaf6a1528 100644
--- a/drivers/nvme/host/trace.h
+++ b/drivers/nvme/host/trace.h
@@ -98,7 +98,7 @@ TRACE_EVENT(nvme_complete_rq,
TP_fast_assign(
__entry->ctrl_id = nvme_req(req)->ctrl->instance;
__entry->qid = nvme_req_qid(req);
- __entry->cid = req->tag;
+ __entry->cid = nvme_req(req)->cmd->common.command_id;
__entry->result = le64_to_cpu(nvme_req(req)->result.u64);
__entry->retries = nvme_req(req)->retries;
__entry->flags = nvme_req(req)->flags;
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index e34718b09550..82b61acf7a72 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -34,8 +34,7 @@ static int validate_conv_zones_cb(struct blk_zone *z,

bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
{
- struct request_queue *q = ns->bdev->bd_disk->queue;
- u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
+ u8 zasl = nvmet_zasl(bdev_max_zone_append_sectors(ns->bdev));
struct gendisk *bd_disk = ns->bdev->bd_disk;
int ret;

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 874f031442dc..75b6cbffa755 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -81,8 +81,11 @@ of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
* restricted-dma-pool region is allowed.
*/
if (of_device_is_compatible(node, "restricted-dma-pool") &&
- of_device_is_available(node))
+ of_device_is_available(node)) {
+ of_node_put(node);
break;
+ }
+ of_node_put(node);
}

/*
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 0f30496ce80b..d624e8b185aa 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -246,7 +246,7 @@ static int populate_node(const void *blob,
}

*pnp = np;
- return true;
+ return 0;
}

static void reverse_nodes(struct device_node *parent)
diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
index 8d374cc552be..91b04b04eec4 100644
--- a/drivers/of/kexec.c
+++ b/drivers/of/kexec.c
@@ -126,6 +126,7 @@ int ima_get_kexec_buffer(void **addr, size_t *size)
{
int ret, len;
unsigned long tmp_addr;
+ unsigned long start_pfn, end_pfn;
size_t tmp_size;
const void *prop;

@@ -140,6 +141,22 @@ int ima_get_kexec_buffer(void **addr, size_t *size)
if (ret)
return ret;

+ /* Do some sanity on the returned size for the ima-kexec buffer */
+ if (!tmp_size)
+ return -ENOENT;
+
+ /*
+ * Calculate the PFNs for the buffer and ensure
+ * they are with in addressable memory.
+ */
+ start_pfn = PHYS_PFN(tmp_addr);
+ end_pfn = PHYS_PFN(tmp_addr + tmp_size - 1);
+ if (!page_is_ram(start_pfn) || !page_is_ram(end_pfn)) {
+ pr_warn("IMA buffer at 0x%lx, size = 0x%zx beyond memory\n",
+ tmp_addr, tmp_size);
+ return -EINVAL;
+ }
+
*addr = __va(tmp_addr);
*size = tmp_size;
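
[Editor's sketch, not part of the patch] The of/kexec hunk above validates the handed-over IMA buffer before converting it with __va(): a zero size is rejected and both the first and last page must be backed by RAM. The same check, condensed into a standalone helper for illustration (name is hypothetical):

#include <linux/ioport.h>	/* page_is_ram() */
#include <linux/pfn.h>		/* PHYS_PFN() */
#include <linux/types.h>

/* True only if the [addr, addr + size) range is non-empty and its first and
 * last pages are addressable RAM.
 */
static bool buffer_is_addressable(phys_addr_t addr, size_t size)
{
	unsigned long start_pfn = PHYS_PFN(addr);
	unsigned long end_pfn = PHYS_PFN(addr + size - 1);

	return size && page_is_ram(start_pfn) && page_is_ram(end_pfn);
}
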

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 740407252298..e154b18ec4b1 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -2413,8 +2413,8 @@ struct opp_table *dev_pm_opp_attach_genpd(struct device *dev,
}

virt_dev = dev_pm_domain_attach_by_name(dev, *name);
- if (IS_ERR(virt_dev)) {
- ret = PTR_ERR(virt_dev);
+ if (IS_ERR_OR_NULL(virt_dev)) {
+ ret = PTR_ERR(virt_dev) ? : -ENODEV;
dev_err(dev, "Couldn't attach to pm_domain: %d\n", ret);
goto err;
}
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 732b516c7bf8..afc6e66ddc31 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1476,9 +1476,13 @@ lba_driver_probe(struct parisc_device *dev)
u32 func_class;
void *tmp_obj;
char *version;
- void __iomem *addr = ioremap(dev->hpa.start, 4096);
+ void __iomem *addr;
int max;

+ addr = ioremap(dev->hpa.start, 4096);
+ if (addr == NULL)
+ return -ENOMEM;
+
/* Read HW Rev First */
func_class = READ_REG32(addr + LBA_FCLASS);

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 0eda8236c125..13c2e73f0eaf 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -780,8 +780,9 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
ep->msi_mem = pci_epc_mem_alloc_addr(epc, &ep->msi_mem_phys,
epc->mem->window.page_size);
if (!ep->msi_mem) {
+ ret = -ENOMEM;
dev_err(dev, "Failed to reserve memory for MSI/MSI-X\n");
- return -ENOMEM;
+ goto err_exit_epc_mem;
}

if (ep->ops->get_features) {
@@ -790,6 +791,19 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
return 0;
}

- return dw_pcie_ep_init_complete(ep);
+ ret = dw_pcie_ep_init_complete(ep);
+ if (ret)
+ goto err_free_epc_mem;
+
+ return 0;
+
+err_free_epc_mem:
+ pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
+ epc->mem->window.page_size);
+
+err_exit_epc_mem:
+ pci_epc_mem_exit(epc);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(dw_pcie_ep_init);
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 9979302532b7..d0d768f22ac3 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -421,8 +421,14 @@ int dw_pcie_host_init(struct pcie_port *pp)
bridge->sysdata = pp;

ret = pci_host_probe(bridge);
- if (!ret)
- return 0;
+ if (ret)
+ goto err_stop_link;
+
+ return 0;
+
+err_stop_link:
+ if (pci->ops && pci->ops->stop_link)
+ pci->ops->stop_link(pci);

err_free_msi:
if (pp->has_msi_ctrl)
@@ -433,8 +439,14 @@ EXPORT_SYMBOL_GPL(dw_pcie_host_init);

void dw_pcie_host_deinit(struct pcie_port *pp)
{
+ struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+
pci_stop_root_bus(pp->bridge->bus);
pci_remove_root_bus(pp->bridge->bus);
+
+ if (pci->ops && pci->ops->stop_link)
+ pci->ops->stop_link(pci);
+
if (pp->has_msi_ctrl)
dw_pcie_free_msi(pp);
}
@@ -531,7 +543,6 @@ static struct pci_ops dw_pcie_ops = {

void dw_pcie_setup_rc(struct pcie_port *pp)
{
- int i;
u32 val, ctrl, num_ctrls;
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);

@@ -582,19 +593,22 @@ void dw_pcie_setup_rc(struct pcie_port *pp)
PCI_COMMAND_MASTER | PCI_COMMAND_SERR;
dw_pcie_writel_dbi(pci, PCI_COMMAND, val);

- /* Ensure all outbound windows are disabled so there are multiple matches */
- for (i = 0; i < pci->num_ob_windows; i++)
- dw_pcie_disable_atu(pci, i, DW_PCIE_REGION_OUTBOUND);
-
/*
* If the platform provides its own child bus config accesses, it means
* the platform uses its own address translation component rather than
* ATU, so we should not program the ATU here.
*/
if (pp->bridge->child_ops == &dw_child_pcie_ops) {
- int atu_idx = 0;
+ int i, atu_idx = 0;
struct resource_entry *entry;

+ /*
+ * Disable all outbound windows to make sure a transaction
+ * can't match multiple windows.
+ */
+ for (i = 0; i < pci->num_ob_windows; i++)
+ dw_pcie_disable_atu(pci, i, DW_PCIE_REGION_OUTBOUND);
+
/* Get last memory resource entry */
resource_list_for_each_entry(entry, &pp->bridge->windows) {
if (resource_type(entry->res) != IORESOURCE_MEM)
diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index d92c8a25094f..5848cc520b52 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -287,8 +287,8 @@ static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, u8 func_no,
dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_UPPER_TARGET,
upper_32_bits(pci_addr));
val = type | PCIE_ATU_FUNC_NUM(func_no);
- val = upper_32_bits(size - 1) ?
- val | PCIE_ATU_INCREASE_REGION_SIZE : val;
+ if (upper_32_bits(limit_addr) > upper_32_bits(cpu_addr))
+ val |= PCIE_ATU_INCREASE_REGION_SIZE;
if (pci->version == 0x490A)
val = dw_pcie_enable_ecrc(val);
dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1, val);
@@ -315,6 +315,7 @@ static void __dw_pcie_prog_outbound_atu(struct dw_pcie *pci, u8 func_no,
u64 pci_addr, u64 size)
{
u32 retries, val;
+ u64 limit_addr;

if (pci->ops && pci->ops->cpu_addr_fixup)
cpu_addr = pci->ops->cpu_addr_fixup(pci, cpu_addr);
@@ -325,6 +326,8 @@ static void __dw_pcie_prog_outbound_atu(struct dw_pcie *pci, u8 func_no,
return;
}

+ limit_addr = cpu_addr + size - 1;
+
dw_pcie_writel_dbi(pci, PCIE_ATU_VIEWPORT,
PCIE_ATU_REGION_OUTBOUND | index);
dw_pcie_writel_dbi(pci, PCIE_ATU_LOWER_BASE,
@@ -332,17 +335,18 @@ static void __dw_pcie_prog_outbound_atu(struct dw_pcie *pci, u8 func_no,
dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_BASE,
upper_32_bits(cpu_addr));
dw_pcie_writel_dbi(pci, PCIE_ATU_LIMIT,
- lower_32_bits(cpu_addr + size - 1));
+ lower_32_bits(limit_addr));
if (pci->version >= 0x460A)
dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_LIMIT,
- upper_32_bits(cpu_addr + size - 1));
+ upper_32_bits(limit_addr));
dw_pcie_writel_dbi(pci, PCIE_ATU_LOWER_TARGET,
lower_32_bits(pci_addr));
dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_TARGET,
upper_32_bits(pci_addr));
val = type | PCIE_ATU_FUNC_NUM(func_no);
- val = ((upper_32_bits(size - 1)) && (pci->version >= 0x460A)) ?
- val | PCIE_ATU_INCREASE_REGION_SIZE : val;
+ if (upper_32_bits(limit_addr) > upper_32_bits(cpu_addr) &&
+ pci->version >= 0x460A)
+ val |= PCIE_ATU_INCREASE_REGION_SIZE;
if (pci->version == 0x490A)
val = dw_pcie_enable_ecrc(val);
dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, val);
@@ -491,7 +495,7 @@ int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, u8 func_no, int index,
void dw_pcie_disable_atu(struct dw_pcie *pci, int index,
enum dw_pcie_region_type type)
{
- int region;
+ u32 region;

switch (type) {
case DW_PCIE_REGION_INBOUND:
@@ -504,8 +508,18 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, int index,
return;
}

- dw_pcie_writel_dbi(pci, PCIE_ATU_VIEWPORT, region | index);
- dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, ~(u32)PCIE_ATU_ENABLE);
+ if (pci->iatu_unroll_enabled) {
+ if (region == PCIE_ATU_REGION_INBOUND) {
+ dw_pcie_writel_ib_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL2,
+ ~(u32)PCIE_ATU_ENABLE);
+ } else {
+ dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL2,
+ ~(u32)PCIE_ATU_ENABLE);
+ }
+ } else {
+ dw_pcie_writel_dbi(pci, PCIE_ATU_VIEWPORT, region | index);
+ dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, ~(u32)PCIE_ATU_ENABLE);
+ }
}

int dw_pcie_wait_for_link(struct dw_pcie *pci)
@@ -726,6 +740,13 @@ void dw_pcie_setup(struct dw_pcie *pci)
val |= PORT_LINK_DLL_LINK_EN;
dw_pcie_writel_dbi(pci, PCIE_PORT_LINK_CONTROL, val);

+ if (of_property_read_bool(np, "snps,enable-cdm-check")) {
+ val = dw_pcie_readl_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS);
+ val |= PCIE_PL_CHK_REG_CHK_REG_CONTINUOUS |
+ PCIE_PL_CHK_REG_CHK_REG_START;
+ dw_pcie_writel_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS, val);
+ }
+
of_property_read_u32(np, "num-lanes", &pci->num_lanes);
if (!pci->num_lanes) {
dev_dbg(pci->dev, "Using h/w default number of lanes\n");
@@ -772,11 +793,4 @@ void dw_pcie_setup(struct dw_pcie *pci)
break;
}
dw_pcie_writel_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL, val);
-
- if (of_property_read_bool(np, "snps,enable-cdm-check")) {
- val = dw_pcie_readl_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS);
- val |= PCIE_PL_CHK_REG_CHK_REG_CONTINUOUS |
- PCIE_PL_CHK_REG_CHK_REG_START;
- dw_pcie_writel_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS, val);
- }
}
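
The outbound ATU hunks above derive the PCIE_ATU_INCREASE_REGION_SIZE decision from the limit address rather than from the size alone, so a small window that happens to straddle a 4 GiB boundary is still widened. A stand-alone userspace sketch of that arithmetic; the helper and the addresses are illustrative stand-ins, not the driver's code:

#include <stdint.h>
#include <stdio.h>

/* local stand-in for the kernel's upper_32_bits() */
static uint32_t upper32(uint64_t v) { return (uint32_t)(v >> 32); }

int main(void)
{
	uint64_t cpu_addr = 0xfff00000;            /* base just below 4 GiB */
	uint64_t size     = 0x200000;              /* 2 MiB window */
	uint64_t limit    = cpu_addr + size - 1;   /* 0x1000fffff */

	/* old test: a region smaller than 4 GiB never set the bit */
	int old_test = upper32(size - 1) != 0;
	/* new test: base and limit sit in different 4 GiB units, so set it */
	int new_test = upper32(limit) > upper32(cpu_addr);

	printf("limit=%#llx old=%d new=%d\n",
	       (unsigned long long)limit, old_test, new_test);
	return 0;
}
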
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index ed55421eb9ba..340542aab8a5 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -337,8 +337,6 @@ static int qcom_pcie_init_2_1_0(struct qcom_pcie *pcie)
reset_control_assert(res->ext_reset);
reset_control_assert(res->phy_reset);

- writel(1, pcie->parf + PCIE20_PARF_PHY_CTRL);
-
ret = regulator_bulk_enable(ARRAY_SIZE(res->supplies), res->supplies);
if (ret < 0) {
dev_err(dev, "cannot enable regulators\n");
@@ -381,15 +379,15 @@ static int qcom_pcie_init_2_1_0(struct qcom_pcie *pcie)
goto err_deassert_axi;
}

- ret = clk_bulk_prepare_enable(ARRAY_SIZE(res->clks), res->clks);
- if (ret)
- goto err_clks;
-
/* enable PCIe clocks and resets */
val = readl(pcie->parf + PCIE20_PARF_PHY_CTRL);
val &= ~BIT(0);
writel(val, pcie->parf + PCIE20_PARF_PHY_CTRL);

+ ret = clk_bulk_prepare_enable(ARRAY_SIZE(res->clks), res->clks);
+ if (ret)
+ goto err_clks;
+
if (of_device_is_compatible(node, "qcom,pcie-ipq8064") ||
of_device_is_compatible(node, "qcom,pcie-ipq8064-v2")) {
writel(PCS_DEEMPH_TX_DEEMPH_GEN1(24) |
@@ -1038,9 +1036,7 @@ static int qcom_pcie_init_2_3_3(struct qcom_pcie *pcie)
struct qcom_pcie_resources_2_3_3 *res = &pcie->res.v2_3_3;
struct dw_pcie *pci = pcie->pci;
struct device *dev = pci->dev;
- u16 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
int i, ret;
- u32 val;

for (i = 0; i < ARRAY_SIZE(res->rst); i++) {
ret = reset_control_assert(res->rst[i]);
@@ -1097,6 +1093,33 @@ static int qcom_pcie_init_2_3_3(struct qcom_pcie *pcie)
goto err_clk_aux;
}

+ return 0;
+
+err_clk_aux:
+ clk_disable_unprepare(res->ahb_clk);
+err_clk_ahb:
+ clk_disable_unprepare(res->axi_s_clk);
+err_clk_axi_s:
+ clk_disable_unprepare(res->axi_m_clk);
+err_clk_axi_m:
+ clk_disable_unprepare(res->iface);
+err_clk_iface:
+ /*
+ * Not checking for failure, will anyway return
+ * the original failure in 'ret'.
+ */
+ for (i = 0; i < ARRAY_SIZE(res->rst); i++)
+ reset_control_assert(res->rst[i]);
+
+ return ret;
+}
+
+static int qcom_pcie_post_init_2_3_3(struct qcom_pcie *pcie)
+{
+ struct dw_pcie *pci = pcie->pci;
+ u16 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
+ u32 val;
+
writel(SLV_ADDR_SPACE_SZ,
pcie->parf + PCIE20_v3_PARF_SLV_ADDR_SPACE_SIZE);

@@ -1124,24 +1147,6 @@ static int qcom_pcie_init_2_3_3(struct qcom_pcie *pcie)
PCI_EXP_DEVCTL2);

return 0;
-
-err_clk_aux:
- clk_disable_unprepare(res->ahb_clk);
-err_clk_ahb:
- clk_disable_unprepare(res->axi_s_clk);
-err_clk_axi_s:
- clk_disable_unprepare(res->axi_m_clk);
-err_clk_axi_m:
- clk_disable_unprepare(res->iface);
-err_clk_iface:
- /*
- * Not checking for failure, will anyway return
- * the original failure in 'ret'.
- */
- for (i = 0; i < ARRAY_SIZE(res->rst); i++)
- reset_control_assert(res->rst[i]);
-
- return ret;
}

static int qcom_pcie_get_resources_2_7_0(struct qcom_pcie *pcie)
@@ -1467,6 +1472,7 @@ static const struct qcom_pcie_ops ops_2_4_0 = {
static const struct qcom_pcie_ops ops_2_3_3 = {
.get_resources = qcom_pcie_get_resources_2_3_3,
.init = qcom_pcie_init_2_3_3,
+ .post_init = qcom_pcie_post_init_2_3_3,
.deinit = qcom_pcie_deinit_2_3_3,
.ltssm_enable = qcom_pcie_2_3_2_ltssm_enable,
};
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index b1b5f836a806..a479e58cc8fb 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -352,15 +352,14 @@ static irqreturn_t tegra_pcie_rp_irq_handler(int irq, void *arg)
struct tegra194_pcie *pcie = arg;
struct dw_pcie *pci = &pcie->pci;
struct pcie_port *pp = &pci->pp;
- u32 val, tmp;
+ u32 val, status_l0, status_l1;
u16 val_w;

- val = appl_readl(pcie, APPL_INTR_STATUS_L0);
- if (val & APPL_INTR_STATUS_L0_LINK_STATE_INT) {
- val = appl_readl(pcie, APPL_INTR_STATUS_L1_0_0);
- if (val & APPL_INTR_STATUS_L1_0_0_LINK_REQ_RST_NOT_CHGED) {
- appl_writel(pcie, val, APPL_INTR_STATUS_L1_0_0);
-
+ status_l0 = appl_readl(pcie, APPL_INTR_STATUS_L0);
+ if (status_l0 & APPL_INTR_STATUS_L0_LINK_STATE_INT) {
+ status_l1 = appl_readl(pcie, APPL_INTR_STATUS_L1_0_0);
+ appl_writel(pcie, status_l1, APPL_INTR_STATUS_L1_0_0);
+ if (status_l1 & APPL_INTR_STATUS_L1_0_0_LINK_REQ_RST_NOT_CHGED) {
/* SBR & Surprise Link Down WAR */
val = appl_readl(pcie, APPL_CAR_RESET_OVRD);
val &= ~APPL_CAR_RESET_OVRD_CYA_OVERRIDE_CORE_RST_N;
@@ -376,15 +375,15 @@ static irqreturn_t tegra_pcie_rp_irq_handler(int irq, void *arg)
}
}

- if (val & APPL_INTR_STATUS_L0_INT_INT) {
- val = appl_readl(pcie, APPL_INTR_STATUS_L1_8_0);
- if (val & APPL_INTR_STATUS_L1_8_0_AUTO_BW_INT_STS) {
+ if (status_l0 & APPL_INTR_STATUS_L0_INT_INT) {
+ status_l1 = appl_readl(pcie, APPL_INTR_STATUS_L1_8_0);
+ if (status_l1 & APPL_INTR_STATUS_L1_8_0_AUTO_BW_INT_STS) {
appl_writel(pcie,
APPL_INTR_STATUS_L1_8_0_AUTO_BW_INT_STS,
APPL_INTR_STATUS_L1_8_0);
apply_bad_link_workaround(pp);
}
- if (val & APPL_INTR_STATUS_L1_8_0_BW_MGT_INT_STS) {
+ if (status_l1 & APPL_INTR_STATUS_L1_8_0_BW_MGT_INT_STS) {
appl_writel(pcie,
APPL_INTR_STATUS_L1_8_0_BW_MGT_INT_STS,
APPL_INTR_STATUS_L1_8_0);
@@ -396,25 +395,24 @@ static irqreturn_t tegra_pcie_rp_irq_handler(int irq, void *arg)
}
}

- val = appl_readl(pcie, APPL_INTR_STATUS_L0);
- if (val & APPL_INTR_STATUS_L0_CDM_REG_CHK_INT) {
- val = appl_readl(pcie, APPL_INTR_STATUS_L1_18);
- tmp = dw_pcie_readl_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS);
- if (val & APPL_INTR_STATUS_L1_18_CDM_REG_CHK_CMPLT) {
+ if (status_l0 & APPL_INTR_STATUS_L0_CDM_REG_CHK_INT) {
+ status_l1 = appl_readl(pcie, APPL_INTR_STATUS_L1_18);
+ val = dw_pcie_readl_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS);
+ if (status_l1 & APPL_INTR_STATUS_L1_18_CDM_REG_CHK_CMPLT) {
dev_info(pci->dev, "CDM check complete\n");
- tmp |= PCIE_PL_CHK_REG_CHK_REG_COMPLETE;
+ val |= PCIE_PL_CHK_REG_CHK_REG_COMPLETE;
}
- if (val & APPL_INTR_STATUS_L1_18_CDM_REG_CHK_CMP_ERR) {
+ if (status_l1 & APPL_INTR_STATUS_L1_18_CDM_REG_CHK_CMP_ERR) {
dev_err(pci->dev, "CDM comparison mismatch\n");
- tmp |= PCIE_PL_CHK_REG_CHK_REG_COMPARISON_ERROR;
+ val |= PCIE_PL_CHK_REG_CHK_REG_COMPARISON_ERROR;
}
- if (val & APPL_INTR_STATUS_L1_18_CDM_REG_CHK_LOGIC_ERR) {
+ if (status_l1 & APPL_INTR_STATUS_L1_18_CDM_REG_CHK_LOGIC_ERR) {
dev_err(pci->dev, "CDM Logic error\n");
- tmp |= PCIE_PL_CHK_REG_CHK_REG_LOGIC_ERROR;
+ val |= PCIE_PL_CHK_REG_CHK_REG_LOGIC_ERROR;
}
- dw_pcie_writel_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS, tmp);
- tmp = dw_pcie_readl_dbi(pci, PCIE_PL_CHK_REG_ERR_ADDR);
- dev_err(pci->dev, "CDM Error Address Offset = 0x%08X\n", tmp);
+ dw_pcie_writel_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS, val);
+ val = dw_pcie_readl_dbi(pci, PCIE_PL_CHK_REG_ERR_ADDR);
+ dev_err(pci->dev, "CDM Error Address Offset = 0x%08X\n", val);
}

return IRQ_HANDLED;
@@ -980,7 +978,7 @@ static int tegra194_pcie_start_link(struct dw_pcie *pci)
offset = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_DLF);
val = dw_pcie_readl_dbi(pci, offset + PCI_DLF_CAP);
val &= ~PCI_DLF_EXCHANGE_ENABLE;
- dw_pcie_writel_dbi(pci, offset, val);
+ dw_pcie_writel_dbi(pci, offset + PCI_DLF_CAP, val);

tegra194_pcie_host_init(pp);
dw_pcie_setup_rc(pp);
@@ -1951,6 +1949,7 @@ static int tegra_pcie_config_ep(struct tegra194_pcie *pcie,
if (ret) {
dev_err(dev, "Failed to initialize DWC Endpoint subsystem: %d\n",
ret);
+ pm_runtime_disable(dev);
return ret;
}

diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
index 5d9fd36b02d1..a02c466a597c 100644
--- a/drivers/pci/controller/pcie-mediatek-gen3.c
+++ b/drivers/pci/controller/pcie-mediatek-gen3.c
@@ -600,7 +600,8 @@ static int mtk_pcie_init_irq_domains(struct mtk_gen3_pcie *pcie)
&intx_domain_ops, pcie);
if (!pcie->intx_domain) {
dev_err(dev, "failed to create INTx IRQ domain\n");
- return -ENODEV;
+ ret = -ENODEV;
+ goto out_put_node;
}

/* Setup MSI */
@@ -623,13 +624,15 @@ static int mtk_pcie_init_irq_domains(struct mtk_gen3_pcie *pcie)
goto err_msi_domain;
}

+ of_node_put(intc_node);
return 0;

err_msi_domain:
irq_domain_remove(pcie->msi_bottom_domain);
err_msi_bottom_domain:
irq_domain_remove(pcie->intx_domain);
-
+out_put_node:
+ of_node_put(intc_node);
return ret;
}

diff --git a/drivers/pci/controller/pcie-microchip-host.c b/drivers/pci/controller/pcie-microchip-host.c
index 2c52a8cef726..932b1c182149 100644
--- a/drivers/pci/controller/pcie-microchip-host.c
+++ b/drivers/pci/controller/pcie-microchip-host.c
@@ -904,6 +904,7 @@ static int mc_pcie_init_irq_domains(struct mc_pcie *port)
&event_domain_ops, port);
if (!port->event_domain) {
dev_err(dev, "failed to get event domain\n");
+ of_node_put(pcie_intc_node);
return -ENOMEM;
}

@@ -913,6 +914,7 @@ static int mc_pcie_init_irq_domains(struct mc_pcie *port)
&intx_domain_ops, port);
if (!port->intx_domain) {
dev_err(dev, "failed to get an INTx IRQ domain\n");
+ of_node_put(pcie_intc_node);
return -ENOMEM;
}

diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
index 5b833f00e980..a5ed779b0a51 100644
--- a/drivers/pci/endpoint/functions/pci-epf-test.c
+++ b/drivers/pci/endpoint/functions/pci-epf-test.c
@@ -627,7 +627,6 @@ static void pci_epf_test_unbind(struct pci_epf *epf)

cancel_delayed_work(&epf_test->cmd_handler);
pci_epf_test_clean_dma_chan(epf_test);
- pci_epc_stop(epc);
for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
epf_bar = &epf->bar[bar];

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 7952e5efd6cf..a1e38ca93cd9 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -538,7 +538,7 @@ static const char *aer_agent_string[] = {
u64 *stats = pdev->aer_stats->stats_array; \
size_t len = 0; \
\
- for (i = 0; i < ARRAY_SIZE(strings_array); i++) { \
+ for (i = 0; i < ARRAY_SIZE(pdev->aer_stats->stats_array); i++) {\
if (strings_array[i]) \
len += sysfs_emit_at(buf, len, "%s %llu\n", \
strings_array[i], \
@@ -1347,6 +1347,11 @@ static int aer_probe(struct pcie_device *dev)
struct device *device = &dev->device;
struct pci_dev *port = dev->port;

+ BUILD_BUG_ON(ARRAY_SIZE(aer_correctable_error_string) <
+ AER_MAX_TYPEOF_COR_ERRS);
+ BUILD_BUG_ON(ARRAY_SIZE(aer_uncorrectable_error_string) <
+ AER_MAX_TYPEOF_UNCOR_ERRS);
+
/* Limit to Root Ports or Root Complex Event Collectors */
if ((pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC) &&
(pci_pcie_type(port) != PCI_EXP_TYPE_ROOT_PORT))
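
The aer.c change above bounds the sysfs loop by the stats array it actually indexes and adds BUILD_BUG_ON() checks that the string tables are at least that large. A minimal stand-alone illustration of the same pattern, with made-up table contents:

#include <stddef.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char *names[] = { "RxErr", "BadTLP", "BadDLLP", "Rollover" };
static unsigned long long stats[4];

/* compile-time equivalent of the BUILD_BUG_ON()s added above */
_Static_assert(ARRAY_SIZE(names) >= ARRAY_SIZE(stats),
	       "every counter needs a name");

int main(void)
{
	/* iterate over the array being indexed, not over the name table */
	for (size_t i = 0; i < ARRAY_SIZE(stats); i++)
		printf("%s %llu\n", names[i], stats[i]);
	return 0;
}
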
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 604feeb84ee4..1ac7fec47d6f 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -222,15 +222,8 @@ static int get_port_device_capability(struct pci_dev *dev)

#ifdef CONFIG_PCIEAER
if (dev->aer_cap && pci_aer_available() &&
- (pcie_ports_native || host->native_aer)) {
+ (pcie_ports_native || host->native_aer))
services |= PCIE_PORT_SERVICE_AER;
-
- /*
- * Disable AER on this port in case it's been enabled by the
- * BIOS (the AER service driver will enable it when necessary).
- */
- pci_disable_pcie_error_reporting(dev);
- }
#endif

/* Root Ports and Root Complex Event Collectors may generate PMEs */
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d99c..cd5945e17fdf 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -39,6 +39,24 @@
#include <asm/mmu.h>
#include <asm/sysreg.h>

+/*
+ * Cache if the event is allowed to trace Context information.
+ * This allows us to perform the check, i.e, perfmon_capable(),
+ * in the context of the event owner, once, during the event_init().
+ */
+#define SPE_PMU_HW_FLAGS_CX BIT(0)
+
+static void set_spe_event_has_cx(struct perf_event *event)
+{
+ if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
+ event->hw.flags |= SPE_PMU_HW_FLAGS_CX;
+}
+
+static bool get_spe_event_has_cx(struct perf_event *event)
+{
+ return !!(event->hw.flags & SPE_PMU_HW_FLAGS_CX);
+}
+
#define ARM_SPE_BUF_PAD_BYTE 0

struct arm_spe_pmu_buf {
@@ -272,7 +290,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
if (!attr->exclude_kernel)
reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);

- if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
+ if (get_spe_event_has_cx(event))
reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);

return reg;
@@ -709,10 +727,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
!(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
return -EOPNOTSUPP;

+ set_spe_event_has_cx(event);
reg = arm_spe_event_to_pmscr(event);
if (!perfmon_capable() &&
(reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
- BIT(SYS_PMSCR_EL1_CX_SHIFT) |
BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
return -EACCES;
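
The SPE change above evaluates perfmon_capable() once, in the event owner's context during event_init(), and caches the answer in a per-event hw flag that the PMSCR builder consults later. A sketch of that split, with invented names standing in for the driver's:

#include <stdbool.h>
#include <stdio.h>

#define EVT_FLAG_CX (1u << 0)          /* cached "may trace context" bit */

struct event { unsigned int flags; };

/* stand-in for perfmon_capable(); imagine it inspects the calling task */
static bool owner_may_trace_context(void) { return true; }

static void event_init(struct event *e)
{
	/* decided once, in the owner's context */
	if (owner_may_trace_context())
		e->flags |= EVT_FLAG_CX;
}

static unsigned int event_to_ctrl(const struct event *e)
{
	/* later callers only consume the cached decision */
	return (e->flags & EVT_FLAG_CX) ? 0x4 : 0x0;
}

int main(void)
{
	struct event e = { 0 };

	event_init(&e);
	printf("ctrl=%#x\n", event_to_ctrl(&e));
	return 0;
}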

diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index b2b8d2074ed0..130b9f1a40e0 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -170,7 +170,6 @@ int riscv_pmu_event_set_period(struct perf_event *event)
left = (max_period >> 1);

local64_set(&hwc->prev_count, (u64)-left);
- perf_event_update_userpage(event);

return overflow;
}
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index a1317a483512..77b826d6c05b 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -274,8 +274,13 @@ static int pmu_sbi_ctr_get_idx(struct perf_event *event)
cflags |= SBI_PMU_CFG_FLAG_SET_UINH;

/* retrieve the available counter index */
+#if defined(CONFIG_32BIT)
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
+ cflags, hwc->event_base, hwc->config, hwc->config >> 32);
+#else
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
cflags, hwc->event_base, hwc->config, 0);
+#endif
if (ret.error) {
pr_debug("Not able to find a counter for event %lx config %llx\n",
hwc->event_base, hwc->config);
@@ -417,8 +422,13 @@ static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
struct hw_perf_event *hwc = &event->hw;
unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;

+#if defined(CONFIG_32BIT)
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, hwc->idx,
1, flag, ival, ival >> 32, 0);
+#else
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, hwc->idx,
+ 1, flag, ival, 0, 0);
+#endif
if (ret.error && (ret.error != SBI_ERR_ALREADY_STARTED))
pr_err("Starting counter idx %d failed with error %d\n",
hwc->idx, sbi_err_map_linux_errno(ret.error));
@@ -525,8 +535,14 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
hwc = &event->hw;
max_period = riscv_pmu_ctr_get_width_mask(event);
init_val = local64_read(&hwc->prev_count) & max_period;
+#if defined(CONFIG_32BIT)
+ sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx, 1,
+ flag, init_val, init_val >> 32, 0);
+#else
sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx, 1,
flag, init_val, 0, 0);
+#endif
+ perf_event_update_userpage(event);
}
ctr_ovf_mask = ctr_ovf_mask >> 1;
idx++;
@@ -666,12 +682,15 @@ static int pmu_sbi_setup_irqs(struct riscv_pmu *pmu, struct platform_device *pde
child = of_get_compatible_child(cpu, "riscv,cpu-intc");
if (!child) {
pr_err("Failed to find INTC node\n");
+ of_node_put(cpu);
return -ENODEV;
}
domain = irq_find_host(child);
of_node_put(child);
- if (domain)
+ if (domain) {
+ of_node_put(cpu);
break;
+ }
}
if (!domain) {
pr_err("Failed to find INTC IRQ root domain\n");
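
The riscv_pmu_sbi.c hunks above pass the upper half of 64-bit values in the next SBI argument slot when built for 32-bit, since one register there only carries 32 bits. A stand-alone sketch of that split; the function is a stub, not the SBI interface itself:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* stand-in for an ecall that takes one value per machine register */
static void start_counter(unsigned long lo, unsigned long hi)
{
	printf("arg0=%#lx arg1=%#lx\n", lo, hi);
}

int main(void)
{
	uint64_t ival = 0x123456789abcdef0ULL;

#if ULONG_MAX == 0xffffffffUL
	/* 32-bit: the value needs two argument slots */
	start_counter((unsigned long)ival, (unsigned long)(ival >> 32));
#else
	/* 64-bit: one register is enough, the second slot stays 0 */
	start_counter((unsigned long)ival, 0);
#endif
	return 0;
}
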
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.h b/drivers/phy/qualcomm/phy-qcom-qmp.h
index 06b2556ed93a..b9a91520439c 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp.h
+++ b/drivers/phy/qualcomm/phy-qcom-qmp.h
@@ -1116,7 +1116,8 @@
#define QSERDES_V5_COM_CORE_CLK_EN 0x174
#define QSERDES_V5_COM_CMN_CONFIG 0x17c
#define QSERDES_V5_COM_CMN_MISC1 0x19c
-#define QSERDES_V5_COM_CMN_MODE 0x1a4
+#define QSERDES_V5_COM_CMN_MODE 0x1a0
+#define QSERDES_V5_COM_CMN_MODE_CONTD 0x1a4
#define QSERDES_V5_COM_VCO_DC_LEVEL_CTRL 0x1a8
#define QSERDES_V5_COM_BIN_VCOCAL_CMP_CODE1_MODE0 0x1ac
#define QSERDES_V5_COM_BIN_VCOCAL_CMP_CODE2_MODE0 0x1b0
diff --git a/drivers/phy/rockchip/phy-rockchip-inno-usb2.c b/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
index cba5c32cbaee..0c6548ac9d8a 100644
--- a/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
+++ b/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
@@ -942,7 +942,9 @@ static irqreturn_t rockchip_usb2phy_irq(int irq, void *data)

switch (rport->port_id) {
case USB2PHY_PORT_OTG:
- ret |= rockchip_usb2phy_otg_mux_irq(irq, rport);
+ if (rport->mode != USB_DR_MODE_HOST &&
+ rport->mode != USB_DR_MODE_UNKNOWN)
+ ret |= rockchip_usb2phy_otg_mux_irq(irq, rport);
break;
case USB2PHY_PORT_HOST:
ret |= rockchip_usb2phy_linestate_irq(irq, rport);
diff --git a/drivers/phy/samsung/phy-exynosautov9-ufs.c b/drivers/phy/samsung/phy-exynosautov9-ufs.c
index 36398a15c2db..d043dfdb598a 100644
--- a/drivers/phy/samsung/phy-exynosautov9-ufs.c
+++ b/drivers/phy/samsung/phy-exynosautov9-ufs.c
@@ -31,22 +31,22 @@ static const struct samsung_ufs_phy_cfg exynosautov9_pre_init_cfg[] = {
PHY_COMN_REG_CFG(0x023, 0xc0, PWR_MODE_ANY),
PHY_COMN_REG_CFG(0x023, 0x00, PWR_MODE_ANY),

- PHY_TRSV_REG_CFG(0x042, 0x5d, PWR_MODE_ANY),
- PHY_TRSV_REG_CFG(0x043, 0x80, PWR_MODE_ANY),
+ PHY_TRSV_REG_CFG_AUTOV9(0x042, 0x5d, PWR_MODE_ANY),
+ PHY_TRSV_REG_CFG_AUTOV9(0x043, 0x80, PWR_MODE_ANY),

END_UFS_PHY_CFG,
};

/* Calibration for HS mode series A/B */
static const struct samsung_ufs_phy_cfg exynosautov9_pre_pwr_hs_cfg[] = {
- PHY_TRSV_REG_CFG(0x032, 0xbc, PWR_MODE_HS_ANY),
- PHY_TRSV_REG_CFG(0x03c, 0x7f, PWR_MODE_HS_ANY),
- PHY_TRSV_REG_CFG(0x048, 0xc0, PWR_MODE_HS_ANY),
+ PHY_TRSV_REG_CFG_AUTOV9(0x032, 0xbc, PWR_MODE_HS_ANY),
+ PHY_TRSV_REG_CFG_AUTOV9(0x03c, 0x7f, PWR_MODE_HS_ANY),
+ PHY_TRSV_REG_CFG_AUTOV9(0x048, 0xc0, PWR_MODE_HS_ANY),

- PHY_TRSV_REG_CFG(0x04a, 0x00, PWR_MODE_HS_G3_SER_B),
- PHY_TRSV_REG_CFG(0x04b, 0x10, PWR_MODE_HS_G1_SER_B |
- PWR_MODE_HS_G3_SER_B),
- PHY_TRSV_REG_CFG(0x04d, 0x63, PWR_MODE_HS_G3_SER_B),
+ PHY_TRSV_REG_CFG_AUTOV9(0x04a, 0x00, PWR_MODE_HS_G3_SER_B),
+ PHY_TRSV_REG_CFG_AUTOV9(0x04b, 0x10, PWR_MODE_HS_G1_SER_B |
+ PWR_MODE_HS_G3_SER_B),
+ PHY_TRSV_REG_CFG_AUTOV9(0x04d, 0x63, PWR_MODE_HS_G3_SER_B),

END_UFS_PHY_CFG,
};
diff --git a/drivers/phy/st/phy-stm32-usbphyc.c b/drivers/phy/st/phy-stm32-usbphyc.c
index 007a23c78d56..a98c911cc37a 100644
--- a/drivers/phy/st/phy-stm32-usbphyc.c
+++ b/drivers/phy/st/phy-stm32-usbphyc.c
@@ -358,7 +358,9 @@ static int stm32_usbphyc_phy_init(struct phy *phy)
return 0;

pll_disable:
- return stm32_usbphyc_pll_disable(usbphyc);
+ stm32_usbphyc_pll_disable(usbphyc);
+
+ return ret;
}

static int stm32_usbphyc_phy_exit(struct phy *phy)
diff --git a/drivers/phy/ti/phy-tusb1210.c b/drivers/phy/ti/phy-tusb1210.c
index c3ab4b69ea68..669c13d6e402 100644
--- a/drivers/phy/ti/phy-tusb1210.c
+++ b/drivers/phy/ti/phy-tusb1210.c
@@ -105,8 +105,9 @@ static int tusb1210_power_on(struct phy *phy)
msleep(TUSB1210_RESET_TIME_MS);

/* Restore the optional eye diagram optimization value */
- return tusb1210_ulpi_write(tusb, TUSB1210_VENDOR_SPECIFIC2,
- tusb->vendor_specific2);
+ tusb1210_ulpi_write(tusb, TUSB1210_VENDOR_SPECIFIC2, tusb->vendor_specific2);
+
+ return 0;
}

static int tusb1210_power_off(struct phy *phy)
diff --git a/drivers/pinctrl/Kconfig b/drivers/pinctrl/Kconfig
index f52960d2dfbe..bff144c97e66 100644
--- a/drivers/pinctrl/Kconfig
+++ b/drivers/pinctrl/Kconfig
@@ -32,7 +32,7 @@ config DEBUG_PINCTRL
Say Y here to add some extra checks and diagnostics to PINCTRL calls.

config PINCTRL_AMD
- tristate "AMD GPIO pin control"
+ bool "AMD GPIO pin control"
depends on HAS_IOMEM
depends on ACPI || COMPILE_TEST
select GPIOLIB
diff --git a/drivers/platform/chrome/cros_ec.c b/drivers/platform/chrome/cros_ec.c
index a5cc8f24299e..4168d7f5f273 100644
--- a/drivers/platform/chrome/cros_ec.c
+++ b/drivers/platform/chrome/cros_ec.c
@@ -135,16 +135,16 @@ static int cros_ec_sleep_event(struct cros_ec_device *ec_dev, u8 sleep_event)
buf.msg.command = EC_CMD_HOST_SLEEP_EVENT;

ret = cros_ec_cmd_xfer_status(ec_dev, &buf.msg);
-
- /* For now, report failure to transition to S0ix with a warning. */
+ /* Report failure to transition to system wide suspend with a warning. */
if (ret >= 0 && ec_dev->host_sleep_v1 &&
- (sleep_event == HOST_SLEEP_EVENT_S0IX_RESUME)) {
+ (sleep_event == HOST_SLEEP_EVENT_S0IX_RESUME ||
+ sleep_event == HOST_SLEEP_EVENT_S3_RESUME)) {
ec_dev->last_resume_result =
buf.u.resp1.resume_response.sleep_transitions;

WARN_ONCE(buf.u.resp1.resume_response.sleep_transitions &
EC_HOST_RESUME_SLEEP_TIMEOUT,
- "EC detected sleep transition timeout. Total slp_s0 transitions: %d",
+ "EC detected sleep transition timeout. Total sleep transitions: %d",
buf.u.resp1.resume_response.sleep_transitions &
EC_HOST_RESUME_SLEEP_TRANSITIONS_MASK);
}
diff --git a/drivers/platform/mellanox/mlxreg-lc.c b/drivers/platform/mellanox/mlxreg-lc.c
index c897a2f15840..55834ccb4ac7 100644
--- a/drivers/platform/mellanox/mlxreg-lc.c
+++ b/drivers/platform/mellanox/mlxreg-lc.c
@@ -716,8 +716,12 @@ mlxreg_lc_config_init(struct mlxreg_lc *mlxreg_lc, void *regmap,
switch (regval) {
case MLXREG_LC_SN4800_C16:
err = mlxreg_lc_sn4800_c16_config_init(mlxreg_lc, regmap, data);
- if (err)
+ if (err) {
+ dev_err(dev, "Failed to config client %s at bus %d at addr 0x%02x\n",
+ data->hpdev.brdinfo->type, data->hpdev.nr,
+ data->hpdev.brdinfo->addr);
return err;
+ }
break;
default:
return -ENODEV;
@@ -730,8 +734,11 @@ mlxreg_lc_config_init(struct mlxreg_lc *mlxreg_lc, void *regmap,
mlxreg_lc->mux = platform_device_register_resndata(dev, "i2c-mux-mlxcpld", data->hpdev.nr,
NULL, 0, mlxreg_lc->mux_data,
sizeof(*mlxreg_lc->mux_data));
- if (IS_ERR(mlxreg_lc->mux))
+ if (IS_ERR(mlxreg_lc->mux)) {
+ dev_err(dev, "Failed to create mux infra for client %s at bus %d at addr 0x%02x\n",
+ data->hpdev.brdinfo->type, data->hpdev.nr, data->hpdev.brdinfo->addr);
return PTR_ERR(mlxreg_lc->mux);
+ }

/* Register IO access driver. */
if (mlxreg_lc->io_data) {
@@ -740,6 +747,9 @@ mlxreg_lc_config_init(struct mlxreg_lc *mlxreg_lc, void *regmap,
platform_device_register_resndata(dev, "mlxreg-io", data->hpdev.nr, NULL, 0,
mlxreg_lc->io_data, sizeof(*mlxreg_lc->io_data));
if (IS_ERR(mlxreg_lc->io_regs)) {
+ dev_err(dev, "Failed to create regio for client %s at bus %d at addr 0x%02x\n",
+ data->hpdev.brdinfo->type, data->hpdev.nr,
+ data->hpdev.brdinfo->addr);
err = PTR_ERR(mlxreg_lc->io_regs);
goto fail_register_io;
}
@@ -753,6 +763,9 @@ mlxreg_lc_config_init(struct mlxreg_lc *mlxreg_lc, void *regmap,
mlxreg_lc->led_data,
sizeof(*mlxreg_lc->led_data));
if (IS_ERR(mlxreg_lc->led)) {
+ dev_err(dev, "Failed to create LED objects for client %s at bus %d at addr 0x%02x\n",
+ data->hpdev.brdinfo->type, data->hpdev.nr,
+ data->hpdev.brdinfo->addr);
err = PTR_ERR(mlxreg_lc->led);
goto fail_register_led;
}
@@ -809,7 +822,8 @@ static int mlxreg_lc_probe(struct platform_device *pdev)
if (!data->hpdev.adapter) {
dev_err(&pdev->dev, "Failed to get adapter for bus %d\n",
data->hpdev.nr);
- return -EFAULT;
+ err = -EFAULT;
+ goto i2c_get_adapter_fail;
}

/* Create device at the top of line card I2C tree.*/
@@ -818,32 +832,40 @@ static int mlxreg_lc_probe(struct platform_device *pdev)
if (IS_ERR(data->hpdev.client)) {
dev_err(&pdev->dev, "Failed to create client %s at bus %d at addr 0x%02x\n",
data->hpdev.brdinfo->type, data->hpdev.nr, data->hpdev.brdinfo->addr);
-
- i2c_put_adapter(data->hpdev.adapter);
- data->hpdev.adapter = NULL;
- return PTR_ERR(data->hpdev.client);
+ err = PTR_ERR(data->hpdev.client);
+ goto i2c_new_device_fail;
}

regmap = devm_regmap_init_i2c(data->hpdev.client,
&mlxreg_lc_regmap_conf);
if (IS_ERR(regmap)) {
+ dev_err(&pdev->dev, "Failed to create regmap for client %s at bus %d at addr 0x%02x\n",
+ data->hpdev.brdinfo->type, data->hpdev.nr, data->hpdev.brdinfo->addr);
err = PTR_ERR(regmap);
- goto mlxreg_lc_probe_fail;
+ goto devm_regmap_init_i2c_fail;
}

/* Set default registers. */
for (i = 0; i < mlxreg_lc_regmap_conf.num_reg_defaults; i++) {
err = regmap_write(regmap, mlxreg_lc_regmap_default[i].reg,
mlxreg_lc_regmap_default[i].def);
- if (err)
- goto mlxreg_lc_probe_fail;
+ if (err) {
+ dev_err(&pdev->dev, "Failed to set default regmap %d for client %s at bus %d at addr 0x%02x\n",
+ i, data->hpdev.brdinfo->type, data->hpdev.nr,
+ data->hpdev.brdinfo->addr);
+ goto regmap_write_fail;
+ }
}

/* Sync registers with hardware. */
regcache_mark_dirty(regmap);
err = regcache_sync(regmap);
- if (err)
- goto mlxreg_lc_probe_fail;
+ if (err) {
+ dev_err(&pdev->dev, "Failed to sync regmap for client %s at bus %d at addr 0x%02x\n",
+ data->hpdev.brdinfo->type, data->hpdev.nr, data->hpdev.brdinfo->addr);
+ err = PTR_ERR(regmap);
+ goto regcache_sync_fail;
+ }

par_pdata = data->hpdev.brdinfo->platform_data;
mlxreg_lc->par_regmap = par_pdata->regmap;
@@ -854,12 +876,27 @@ static int mlxreg_lc_probe(struct platform_device *pdev)
/* Configure line card. */
err = mlxreg_lc_config_init(mlxreg_lc, regmap, data);
if (err)
- goto mlxreg_lc_probe_fail;
+ goto mlxreg_lc_config_init_fail;

return err;

-mlxreg_lc_probe_fail:
+mlxreg_lc_config_init_fail:
+regcache_sync_fail:
+regmap_write_fail:
+devm_regmap_init_i2c_fail:
+ if (data->hpdev.client) {
+ i2c_unregister_device(data->hpdev.client);
+ data->hpdev.client = NULL;
+ }
+i2c_new_device_fail:
i2c_put_adapter(data->hpdev.adapter);
+ data->hpdev.adapter = NULL;
+i2c_get_adapter_fail:
+ /* Clear event notification callback and handle. */
+ if (data->notifier) {
+ data->notifier->user_handler = NULL;
+ data->notifier->handle = NULL;
+ }
return err;
}

@@ -868,11 +905,18 @@ static int mlxreg_lc_remove(struct platform_device *pdev)
struct mlxreg_core_data *data = dev_get_platdata(&pdev->dev);
struct mlxreg_lc *mlxreg_lc = platform_get_drvdata(pdev);

- /* Clear event notification callback. */
- if (data->notifier) {
- data->notifier->user_handler = NULL;
- data->notifier->handle = NULL;
- }
+ /*
+ * Probing and removing are invoked by hotplug events raised upon line card insertion and
+ * removing. If probing procedure fails all data is cleared. However, hotplug event still
+ * will be raised on line card removing and activate removing procedure. In this case there
+ * is nothing to remove.
+ */
+ if (!data->notifier || !data->notifier->handle)
+ return 0;
+
+ /* Clear event notification callback and handle. */
+ data->notifier->user_handler = NULL;
+ data->notifier->handle = NULL;

/* Destroy static I2C device feeding by main power. */
mlxreg_lc_destroy_static_devices(mlxreg_lc, mlxreg_lc->main_devs,
diff --git a/drivers/platform/olpc/olpc-ec.c b/drivers/platform/olpc/olpc-ec.c
index 4ff5c3a12991..921520475ff6 100644
--- a/drivers/platform/olpc/olpc-ec.c
+++ b/drivers/platform/olpc/olpc-ec.c
@@ -264,7 +264,7 @@ static ssize_t ec_dbgfs_cmd_write(struct file *file, const char __user *buf,
int i, m;
unsigned char ec_cmd[EC_MAX_CMD_ARGS];
unsigned int ec_cmd_int[EC_MAX_CMD_ARGS];
- char cmdbuf[64];
+ char cmdbuf[64] = "";
int ec_cmd_bytes;

mutex_lock(&ec_dbgfs_lock);
diff --git a/drivers/platform/x86/pmc_atom.c b/drivers/platform/x86/pmc_atom.c
index a40fae6edc84..f24ab24f2927 100644
--- a/drivers/platform/x86/pmc_atom.c
+++ b/drivers/platform/x86/pmc_atom.c
@@ -402,21 +402,16 @@ static const struct dmi_system_id critclk_systems[] = {
},
},
{
- /* pmc_plt_clk0 - 3 are used for the 4 ethernet controllers */
- .ident = "Lex 3I380D",
+ /*
+ * Lex System / Lex Computech Co. makes a lot of Bay Trail
+ * based embedded boards which often come with multiple
+ * ethernet controllers using multiple pmc_plt_clks. See:
+ * https://www.lex.com.tw/products/embedded-ipc-board/
+ */
+ .ident = "Lex BayTrail",
.callback = dmi_callback,
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Lex BayTrail"),
- DMI_MATCH(DMI_PRODUCT_NAME, "3I380D"),
- },
- },
- {
- /* pmc_plt_clk* - are used for ethernet controllers */
- .ident = "Lex 2I385SW",
- .callback = dmi_callback,
- .matches = {
- DMI_MATCH(DMI_SYS_VENDOR, "Lex BayTrail"),
- DMI_MATCH(DMI_PRODUCT_NAME, "2I385SW"),
},
},
{
diff --git a/drivers/pwm/pwm-lpc18xx-sct.c b/drivers/pwm/pwm-lpc18xx-sct.c
index b909096dba2f..43b5509dde51 100644
--- a/drivers/pwm/pwm-lpc18xx-sct.c
+++ b/drivers/pwm/pwm-lpc18xx-sct.c
@@ -98,7 +98,7 @@ struct lpc18xx_pwm_chip {
unsigned long clk_rate;
unsigned int period_ns;
unsigned int min_period_ns;
- unsigned int max_period_ns;
+ u64 max_period_ns;
unsigned int period_event;
unsigned long event_map;
struct mutex res_lock;
@@ -145,40 +145,48 @@ static void lpc18xx_pwm_set_conflict_res(struct lpc18xx_pwm_chip *lpc18xx_pwm,
mutex_unlock(&lpc18xx_pwm->res_lock);
}

-static void lpc18xx_pwm_config_period(struct pwm_chip *chip, int period_ns)
+static void lpc18xx_pwm_config_period(struct pwm_chip *chip, u64 period_ns)
{
struct lpc18xx_pwm_chip *lpc18xx_pwm = to_lpc18xx_pwm_chip(chip);
- u64 val;
+ u32 val;

- val = (u64)period_ns * lpc18xx_pwm->clk_rate;
- do_div(val, NSEC_PER_SEC);
+ /*
+ * With clk_rate < NSEC_PER_SEC this cannot overflow.
+ * With period_ns < max_period_ns this also fits into an u32.
+ * As period_ns >= min_period_ns = DIV_ROUND_UP(NSEC_PER_SEC, lpc18xx_pwm->clk_rate);
+ * we have val >= 1.
+ */
+ val = mul_u64_u64_div_u64(period_ns, lpc18xx_pwm->clk_rate, NSEC_PER_SEC);

lpc18xx_pwm_writel(lpc18xx_pwm,
LPC18XX_PWM_MATCH(lpc18xx_pwm->period_event),
- (u32)val - 1);
+ val - 1);

lpc18xx_pwm_writel(lpc18xx_pwm,
LPC18XX_PWM_MATCHREL(lpc18xx_pwm->period_event),
- (u32)val - 1);
+ val - 1);
}

static void lpc18xx_pwm_config_duty(struct pwm_chip *chip,
- struct pwm_device *pwm, int duty_ns)
+ struct pwm_device *pwm, u64 duty_ns)
{
struct lpc18xx_pwm_chip *lpc18xx_pwm = to_lpc18xx_pwm_chip(chip);
struct lpc18xx_pwm_data *lpc18xx_data = &lpc18xx_pwm->channeldata[pwm->hwpwm];
- u64 val;
+ u32 val;

- val = (u64)duty_ns * lpc18xx_pwm->clk_rate;
- do_div(val, NSEC_PER_SEC);
+ /*
+ * With clk_rate < NSEC_PER_SEC this cannot overflow.
+ * With duty_ns <= period_ns < max_period_ns this also fits into an u32.
+ */
+ val = mul_u64_u64_div_u64(duty_ns, lpc18xx_pwm->clk_rate, NSEC_PER_SEC);

lpc18xx_pwm_writel(lpc18xx_pwm,
LPC18XX_PWM_MATCH(lpc18xx_data->duty_event),
- (u32)val);
+ val);

lpc18xx_pwm_writel(lpc18xx_pwm,
LPC18XX_PWM_MATCHREL(lpc18xx_data->duty_event),
- (u32)val);
+ val);
}

static int lpc18xx_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
@@ -360,12 +368,27 @@ static int lpc18xx_pwm_probe(struct platform_device *pdev)
goto disable_pwmclk;
}

+ /*
+ * If clkrate is too fast, the calculations in .apply() might overflow.
+ */
+ if (lpc18xx_pwm->clk_rate > NSEC_PER_SEC) {
+ ret = dev_err_probe(&pdev->dev, -EINVAL, "pwm clock to fast\n");
+ goto disable_pwmclk;
+ }
+
+ /*
+ * If clkrate is too fast, the calculations in .apply() might overflow.
+ */
+ if (lpc18xx_pwm->clk_rate > NSEC_PER_SEC) {
+ ret = dev_err_probe(&pdev->dev, -EINVAL, "pwm clock to fast\n");
+ goto disable_pwmclk;
+ }
+
mutex_init(&lpc18xx_pwm->res_lock);
mutex_init(&lpc18xx_pwm->period_lock);

- val = (u64)NSEC_PER_SEC * LPC18XX_PWM_TIMER_MAX;
- do_div(val, lpc18xx_pwm->clk_rate);
- lpc18xx_pwm->max_period_ns = val;
+ lpc18xx_pwm->max_period_ns =
+ mul_u64_u64_div_u64(NSEC_PER_SEC, LPC18XX_PWM_TIMER_MAX, lpc18xx_pwm->clk_rate);

lpc18xx_pwm->min_period_ns = DIV_ROUND_UP(NSEC_PER_SEC,
lpc18xx_pwm->clk_rate);
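
The pwm-lpc18xx-sct.c conversion above replaces the "(u64)ns * rate; do_div()" sequence with mul_u64_u64_div_u64() so the nanoseconds-times-rate product cannot overflow 64 bits, and rejects clock rates above NSEC_PER_SEC at probe so those calculations stay in range. A stand-alone sketch of the same conversion using a 128-bit intermediate (gcc/clang extension) as a stand-in for the kernel helper; the clock rate and period are example values only:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/* stand-in for mul_u64_u64_div_u64(): full-width product, then divide */
static uint64_t mul_div(uint64_t a, uint64_t b, uint64_t c)
{
	return (uint64_t)((unsigned __int128)a * b / c);
}

int main(void)
{
	uint64_t clk_rate  = 180000000;    /* example SCT clock, 180 MHz */
	uint64_t period_ns = 20000000;     /* 20 ms requested period */

	uint64_t ticks = mul_div(period_ns, clk_rate, NSEC_PER_SEC);

	/* the driver programs ticks - 1 into the period match register */
	printf("ticks=%llu match=%llu\n",
	       (unsigned long long)ticks, (unsigned long long)(ticks - 1));
	return 0;
}
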
diff --git a/drivers/pwm/pwm-sifive.c b/drivers/pwm/pwm-sifive.c
index 253c4a17d255..58347fcd4812 100644
--- a/drivers/pwm/pwm-sifive.c
+++ b/drivers/pwm/pwm-sifive.c
@@ -23,7 +23,7 @@
#define PWM_SIFIVE_PWMCFG 0x0
#define PWM_SIFIVE_PWMCOUNT 0x8
#define PWM_SIFIVE_PWMS 0x10
-#define PWM_SIFIVE_PWMCMP0 0x20
+#define PWM_SIFIVE_PWMCMP(i) (0x20 + 4 * (i))

/* PWMCFG fields */
#define PWM_SIFIVE_PWMCFG_SCALE GENMASK(3, 0)
@@ -36,8 +36,6 @@
#define PWM_SIFIVE_PWMCFG_GANG BIT(24)
#define PWM_SIFIVE_PWMCFG_IP BIT(28)

-/* PWM_SIFIVE_SIZE_PWMCMP is used to calculate offset for pwmcmpX registers */
-#define PWM_SIFIVE_SIZE_PWMCMP 4
#define PWM_SIFIVE_CMPWIDTH 16
#define PWM_SIFIVE_DEFAULT_PERIOD 10000000

@@ -112,8 +110,7 @@ static void pwm_sifive_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
struct pwm_sifive_ddata *ddata = pwm_sifive_chip_to_ddata(chip);
u32 duty, val;

- duty = readl(ddata->regs + PWM_SIFIVE_PWMCMP0 +
- pwm->hwpwm * PWM_SIFIVE_SIZE_PWMCMP);
+ duty = readl(ddata->regs + PWM_SIFIVE_PWMCMP(pwm->hwpwm));

state->enabled = duty > 0;

@@ -194,8 +191,7 @@ static int pwm_sifive_apply(struct pwm_chip *chip, struct pwm_device *pwm,
pwm_sifive_update_clock(ddata, clk_get_rate(ddata->clk));
}

- writel(frac, ddata->regs + PWM_SIFIVE_PWMCMP0 +
- pwm->hwpwm * PWM_SIFIVE_SIZE_PWMCMP);
+ writel(frac, ddata->regs + PWM_SIFIVE_PWMCMP(pwm->hwpwm));

if (state->enabled != enabled)
pwm_sifive_enable(chip, state->enabled);
@@ -233,6 +229,8 @@ static int pwm_sifive_probe(struct platform_device *pdev)
struct pwm_sifive_ddata *ddata;
struct pwm_chip *chip;
int ret;
+ u32 val;
+ unsigned int enabled_pwms = 0, enabled_clks = 1;

ddata = devm_kzalloc(dev, sizeof(*ddata), GFP_KERNEL);
if (!ddata)
@@ -259,6 +257,33 @@ static int pwm_sifive_probe(struct platform_device *pdev)
return ret;
}

+ val = readl(ddata->regs + PWM_SIFIVE_PWMCFG);
+ if (val & PWM_SIFIVE_PWMCFG_EN_ALWAYS) {
+ unsigned int i;
+
+ for (i = 0; i < chip->npwm; ++i) {
+ val = readl(ddata->regs + PWM_SIFIVE_PWMCMP(i));
+ if (val > 0)
+ ++enabled_pwms;
+ }
+ }
+
+ /* The clk should be on once for each running PWM. */
+ if (enabled_pwms) {
+ while (enabled_clks < enabled_pwms) {
+ /* This is not expected to fail as the clk is already on */
+ ret = clk_enable(ddata->clk);
+ if (unlikely(ret)) {
+ dev_err_probe(dev, ret, "Failed to enable clk\n");
+ goto disable_clk;
+ }
+ ++enabled_clks;
+ }
+ } else {
+ clk_disable(ddata->clk);
+ enabled_clks = 0;
+ }
+
/* Watch for changes to underlying clock frequency */
ddata->notifier.notifier_call = pwm_sifive_clock_notifier;
ret = clk_notifier_register(ddata->clk, &ddata->notifier);
@@ -281,7 +306,11 @@ static int pwm_sifive_probe(struct platform_device *pdev)
unregister_clk:
clk_notifier_unregister(ddata->clk, &ddata->notifier);
disable_clk:
- clk_disable_unprepare(ddata->clk);
+ while (enabled_clks) {
+ clk_disable(ddata->clk);
+ --enabled_clks;
+ }
+ clk_unprepare(ddata->clk);

return ret;
}
@@ -289,23 +318,19 @@ static int pwm_sifive_probe(struct platform_device *pdev)
static int pwm_sifive_remove(struct platform_device *dev)
{
struct pwm_sifive_ddata *ddata = platform_get_drvdata(dev);
- bool is_enabled = false;
struct pwm_device *pwm;
int ch;

+ pwmchip_remove(&ddata->chip);
+ clk_notifier_unregister(ddata->clk, &ddata->notifier);
+
for (ch = 0; ch < ddata->chip.npwm; ch++) {
pwm = &ddata->chip.pwms[ch];
- if (pwm->state.enabled) {
- is_enabled = true;
- break;
- }
+ if (pwm->state.enabled)
+ clk_disable(ddata->clk);
}
- if (is_enabled)
- clk_disable(ddata->clk);

- clk_disable_unprepare(ddata->clk);
- pwmchip_remove(&ddata->chip);
- clk_notifier_unregister(ddata->clk, &ddata->notifier);
+ clk_unprepare(ddata->clk);

return 0;
}
diff --git a/drivers/regulator/of_regulator.c b/drivers/regulator/of_regulator.c
index f54d4f176882..e12b681c72e5 100644
--- a/drivers/regulator/of_regulator.c
+++ b/drivers/regulator/of_regulator.c
@@ -264,8 +264,12 @@ static int of_get_regulation_constraints(struct device *dev,
}

suspend_np = of_get_child_by_name(np, regulator_states[i]);
- if (!suspend_np || !suspend_state)
+ if (!suspend_np)
continue;
+ if (!suspend_state) {
+ of_node_put(suspend_np);
+ continue;
+ }

if (!of_property_read_u32(suspend_np, "regulator-mode",
&pval)) {
diff --git a/drivers/regulator/qcom_smd-regulator.c b/drivers/regulator/qcom_smd-regulator.c
index 7dff94a2eb7e..0af8286e1b10 100644
--- a/drivers/regulator/qcom_smd-regulator.c
+++ b/drivers/regulator/qcom_smd-regulator.c
@@ -357,10 +357,10 @@ static const struct regulator_desc pm8941_switch = {

static const struct regulator_desc pm8916_pldo = {
.linear_ranges = (struct linear_range[]) {
- REGULATOR_LINEAR_RANGE(750000, 0, 208, 12500),
+ REGULATOR_LINEAR_RANGE(1750000, 0, 127, 12500),
},
.n_linear_ranges = 1,
- .n_voltages = 209,
+ .n_voltages = 128,
.ops = &rpm_smps_ldo_ops,
};
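
The qcom_smd-regulator.c correction above describes the pm8916 pLDO as 128 selectable steps of 12.5 mV starting at 1.75 V. Worked out in a couple of lines, with the values copied from the new table:

#include <stdio.h>

int main(void)
{
	unsigned int min_uV = 1750000, step_uV = 12500;
	unsigned int min_sel = 0, max_sel = 127;

	printf("n_voltages = %u\n", max_sel - min_sel + 1);        /* 128 */
	printf("max_uV     = %u\n", min_uV + max_sel * step_uV);   /* 3337500 */
	return 0;
}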

diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
index 91eb037089ef..f17bb41a6551 100644
--- a/drivers/remoteproc/imx_rproc.c
+++ b/drivers/remoteproc/imx_rproc.c
@@ -562,16 +562,17 @@ static int imx_rproc_addr_init(struct imx_rproc *priv,

node = of_parse_phandle(np, "memory-region", a);
/* Not map vdevbuffer, vdevring region */
- if (!strncmp(node->name, "vdev", strlen("vdev")))
+ if (!strncmp(node->name, "vdev", strlen("vdev"))) {
+ of_node_put(node);
continue;
+ }
err = of_address_to_resource(node, 0, &res);
+ of_node_put(node);
if (err) {
dev_err(dev, "unable to resolve memory region\n");
return err;
}

- of_node_put(node);
-
if (b >= IMX_RPROC_MEM_MAX)
break;

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index 1ae47cc153e5..e7765bbe17a0 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -87,6 +87,9 @@ static void adsp_minidump(struct rproc *rproc)
{
struct qcom_adsp *adsp = rproc->priv;

+ if (rproc->dump_conf == RPROC_COREDUMP_DISABLED)
+ return;
+
qcom_minidump(rproc, adsp->minidump_id);
}

diff --git a/drivers/remoteproc/qcom_sysmon.c b/drivers/remoteproc/qcom_sysmon.c
index 9fca81492863..a9f04dd83ab6 100644
--- a/drivers/remoteproc/qcom_sysmon.c
+++ b/drivers/remoteproc/qcom_sysmon.c
@@ -41,6 +41,7 @@ struct qcom_sysmon {
struct completion comp;
struct completion ind_comp;
struct completion shutdown_comp;
+ struct completion ssctl_comp;
struct mutex lock;

bool ssr_ack;
@@ -445,6 +446,8 @@ static int ssctl_new_server(struct qmi_handle *qmi, struct qmi_service *svc)

svc->priv = sysmon;

+ complete(&sysmon->ssctl_comp);
+
return 0;
}

@@ -501,6 +504,7 @@ static int sysmon_start(struct rproc_subdev *subdev)
.ssr_event = SSCTL_SSR_EVENT_AFTER_POWERUP
};

+ reinit_completion(&sysmon->ssctl_comp);
mutex_lock(&sysmon->state_lock);
sysmon->state = SSCTL_SSR_EVENT_AFTER_POWERUP;
blocking_notifier_call_chain(&sysmon_notifiers, 0, (void *)&event);
@@ -545,6 +549,11 @@ static void sysmon_stop(struct rproc_subdev *subdev, bool crashed)
if (crashed)
return;

+ if (sysmon->ssctl_instance) {
+ if (!wait_for_completion_timeout(&sysmon->ssctl_comp, HZ / 2))
+ dev_err(sysmon->dev, "timeout waiting for ssctl service\n");
+ }
+
if (sysmon->ssctl_version)
sysmon->shutdown_acked = ssctl_request_shutdown(sysmon);
else if (sysmon->ept)
@@ -631,6 +640,7 @@ struct qcom_sysmon *qcom_add_sysmon_subdev(struct rproc *rproc,
init_completion(&sysmon->comp);
init_completion(&sysmon->ind_comp);
init_completion(&sysmon->shutdown_comp);
+ init_completion(&sysmon->ssctl_comp);
mutex_init(&sysmon->lock);
mutex_init(&sysmon->state_lock);

diff --git a/drivers/remoteproc/qcom_wcnss.c b/drivers/remoteproc/qcom_wcnss.c
index 9a223d394087..68f37296b151 100644
--- a/drivers/remoteproc/qcom_wcnss.c
+++ b/drivers/remoteproc/qcom_wcnss.c
@@ -467,6 +467,7 @@ static int wcnss_request_irq(struct qcom_wcnss *wcnss,
irq_handler_t thread_fn)
{
int ret;
+ int irq_number;

ret = platform_get_irq_byname(pdev, name);
if (ret < 0 && optional) {
@@ -477,14 +478,19 @@ static int wcnss_request_irq(struct qcom_wcnss *wcnss,
return ret;
}

+ irq_number = ret;
+
ret = devm_request_threaded_irq(&pdev->dev, ret,
NULL, thread_fn,
IRQF_TRIGGER_RISING | IRQF_ONESHOT,
"wcnss", wcnss);
- if (ret)
+ if (ret) {
dev_err(&pdev->dev, "request %s IRQ failed\n", name);
+ return ret;
+ }

- return ret;
+ /* Return the IRQ number if the IRQ was successfully acquired */
+ return irq_number;
}

static int wcnss_alloc_memory_region(struct qcom_wcnss *wcnss)
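
The qcom_wcnss.c fix above keeps the IRQ number in its own variable because the request call reuses 'ret' and returns 0 on success, which callers would otherwise mistake for the IRQ. A compact stand-alone mock of the fixed flow; both helpers are stubs:

#include <stdio.h>

static int get_irq_byname(const char *name) { (void)name; return 42; }
static int request_irq_stub(int irq)        { (void)irq;  return 0;  }

static int request_named_irq(const char *name)
{
	int ret = get_irq_byname(name);
	if (ret < 0)
		return ret;

	int irq_number = ret;      /* save it before 'ret' is reused */

	ret = request_irq_stub(irq_number);
	if (ret)
		return ret;

	return irq_number;         /* success: hand back the IRQ, not 0 */
}

int main(void)
{
	printf("irq = %d\n", request_named_irq("wdog"));
	return 0;
}
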
diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 4840ad906018..0481926c6975 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -1655,6 +1655,7 @@ static int k3_r5_cluster_of_init(struct platform_device *pdev)
if (!cpdev) {
ret = -ENODEV;
dev_err(dev, "could not get R5 core platform device\n");
+ of_node_put(child);
goto fail;
}

@@ -1663,6 +1664,7 @@ static int k3_r5_cluster_of_init(struct platform_device *pdev)
dev_err(dev, "k3_r5_core_of_init failed, ret = %d\n",
ret);
put_device(&cpdev->dev);
+ of_node_put(child);
goto fail;
}

diff --git a/drivers/rpmsg/mtk_rpmsg.c b/drivers/rpmsg/mtk_rpmsg.c
index 5b4404b8be4c..d1213c33da20 100644
--- a/drivers/rpmsg/mtk_rpmsg.c
+++ b/drivers/rpmsg/mtk_rpmsg.c
@@ -234,7 +234,9 @@ static void mtk_register_device_work_function(struct work_struct *register_work)
if (info->registered)
continue;

+ mutex_unlock(&subdev->channels_lock);
ret = mtk_rpmsg_register_device(subdev, &info->info);
+ mutex_lock(&subdev->channels_lock);
if (ret) {
dev_err(&pdev->dev, "Can't create rpmsg_device\n");
continue;
diff --git a/drivers/rpmsg/qcom_smd.c b/drivers/rpmsg/qcom_smd.c
index 1957b27c4cf3..f7af53891ef9 100644
--- a/drivers/rpmsg/qcom_smd.c
+++ b/drivers/rpmsg/qcom_smd.c
@@ -1383,6 +1383,7 @@ static int qcom_smd_parse_edge(struct device *dev,
}

edge->ipc_regmap = syscon_node_to_regmap(syscon_np);
+ of_node_put(syscon_np);
if (IS_ERR(edge->ipc_regmap)) {
ret = PTR_ERR(edge->ipc_regmap);
goto put_node;
diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index b6183d4f62a2..4f2189111494 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -120,8 +120,11 @@ static int rpmsg_eptdev_open(struct inode *inode, struct file *filp)
struct rpmsg_device *rpdev = eptdev->rpdev;
struct device *dev = &eptdev->dev;

- if (eptdev->ept)
+ mutex_lock(&eptdev->ept_lock);
+ if (eptdev->ept) {
+ mutex_unlock(&eptdev->ept_lock);
return -EBUSY;
+ }

get_device(dev);

@@ -137,11 +140,13 @@ static int rpmsg_eptdev_open(struct inode *inode, struct file *filp)
if (!ept) {
dev_err(dev, "failed to open %s\n", eptdev->chinfo.name);
put_device(dev);
+ mutex_unlock(&eptdev->ept_lock);
return -EINVAL;
}

eptdev->ept = ept;
filp->private_data = eptdev;
+ mutex_unlock(&eptdev->ept_lock);

return 0;
}
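
The rpmsg_char.c hunks above widen the scope of ept_lock so the "already open?" test and the publication of the new endpoint happen under the same lock. The race being closed, sketched with pthreads as a stand-in for the kernel mutex:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ept_lock = PTHREAD_MUTEX_INITIALIZER;
static void *ept;                  /* NULL means "not open yet" */

static int eptdev_open(void *new_ept)
{
	pthread_mutex_lock(&ept_lock);
	if (ept) {
		pthread_mutex_unlock(&ept_lock);
		return -1;         /* -EBUSY in the driver */
	}
	ept = new_ept;             /* publish while still holding the lock */
	pthread_mutex_unlock(&ept_lock);
	return 0;
}

int main(void)
{
	int ch;
	int first  = eptdev_open(&ch);   /* 0: took the endpoint */
	int second = eptdev_open(&ch);   /* -1: already open */

	printf("%d %d\n", first, second);
	return 0;
}
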
diff --git a/drivers/rtc/rtc-rx8025.c b/drivers/rtc/rtc-rx8025.c
index 5bfdd34a72ff..753c79892aa6 100644
--- a/drivers/rtc/rtc-rx8025.c
+++ b/drivers/rtc/rtc-rx8025.c
@@ -55,6 +55,8 @@
#define RX8025_BIT_CTRL2_XST BIT(5)
#define RX8025_BIT_CTRL2_VDET BIT(6)

+#define RX8035_BIT_HOUR_1224 BIT(7)
+
/* Clock precision adjustment */
#define RX8025_ADJ_RESOLUTION 3050 /* in ppb */
#define RX8025_ADJ_DATA_MAX 62
@@ -78,6 +80,7 @@ struct rx8025_data {
struct rtc_device *rtc;
enum rx_model model;
u8 ctrl1;
+ int is_24;
};

static s32 rx8025_read_reg(const struct i2c_client *client, u8 number)
@@ -226,7 +229,7 @@ static int rx8025_get_time(struct device *dev, struct rtc_time *dt)

dt->tm_sec = bcd2bin(date[RX8025_REG_SEC] & 0x7f);
dt->tm_min = bcd2bin(date[RX8025_REG_MIN] & 0x7f);
- if (rx8025->ctrl1 & RX8025_BIT_CTRL1_1224)
+ if (rx8025->is_24)
dt->tm_hour = bcd2bin(date[RX8025_REG_HOUR] & 0x3f);
else
dt->tm_hour = bcd2bin(date[RX8025_REG_HOUR] & 0x1f) % 12
@@ -254,7 +257,7 @@ static int rx8025_set_time(struct device *dev, struct rtc_time *dt)
*/
date[RX8025_REG_SEC] = bin2bcd(dt->tm_sec);
date[RX8025_REG_MIN] = bin2bcd(dt->tm_min);
- if (rx8025->ctrl1 & RX8025_BIT_CTRL1_1224)
+ if (rx8025->is_24)
date[RX8025_REG_HOUR] = bin2bcd(dt->tm_hour);
else
date[RX8025_REG_HOUR] = (dt->tm_hour >= 12 ? 0x20 : 0)
@@ -279,6 +282,7 @@ static int rx8025_init_client(struct i2c_client *client)
struct rx8025_data *rx8025 = i2c_get_clientdata(client);
u8 ctrl[2], ctrl2;
int need_clear = 0;
+ int hour_reg;
int err;

err = rx8025_read_regs(client, RX8025_REG_CTRL1, 2, ctrl);
@@ -303,6 +307,16 @@ static int rx8025_init_client(struct i2c_client *client)

err = rx8025_write_reg(client, RX8025_REG_CTRL2, ctrl2);
}
+
+ if (rx8025->model == model_rx_8035) {
+ /* In RX-8035, 12/24 flag is in the hour register */
+ hour_reg = rx8025_read_reg(client, RX8025_REG_HOUR);
+ if (hour_reg < 0)
+ return hour_reg;
+ rx8025->is_24 = (hour_reg & RX8035_BIT_HOUR_1224);
+ } else {
+ rx8025->is_24 = (ctrl[1] & RX8025_BIT_CTRL1_1224);
+ }
out:
return err;
}
@@ -329,7 +343,7 @@ static int rx8025_read_alarm(struct device *dev, struct rtc_wkalrm *t)
/* Hardware alarms precision is 1 minute! */
t->time.tm_sec = 0;
t->time.tm_min = bcd2bin(ald[0] & 0x7f);
- if (rx8025->ctrl1 & RX8025_BIT_CTRL1_1224)
+ if (rx8025->is_24)
t->time.tm_hour = bcd2bin(ald[1] & 0x3f);
else
t->time.tm_hour = bcd2bin(ald[1] & 0x1f) % 12
@@ -350,7 +364,7 @@ static int rx8025_set_alarm(struct device *dev, struct rtc_wkalrm *t)
int err;

ald[0] = bin2bcd(t->time.tm_min);
- if (rx8025->ctrl1 & RX8025_BIT_CTRL1_1224)
+ if (rx8025->is_24)
ald[1] = bin2bcd(t->time.tm_hour);
else
ald[1] = (t->time.tm_hour >= 12 ? 0x20 : 0)
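
The rtc-rx8025.c change above keys the hour conversion off a cached is_24 flag, read from the hour register on the RX-8035 and from CTRL1 on the other models. The BCD decode itself, including the PM bit in 12-hour mode, in a stand-alone form:

#include <stdio.h>

static unsigned int bcd2bin(unsigned int v)
{
	return (v & 0x0f) + (v >> 4) * 10;
}

static int decode_hour(unsigned int reg, int is_24)
{
	if (is_24)
		return bcd2bin(reg & 0x3f);
	/* 12-hour mode: bit 5 is the PM flag */
	return bcd2bin(reg & 0x1f) % 12 + ((reg & 0x20) ? 12 : 0);
}

int main(void)
{
	printf("%d\n", decode_hour(0x23, 1));   /* 24h register 0x23 -> 23 */
	printf("%d\n", decode_hour(0x31, 0));   /* 12h, PM, "11" -> 23 */
	return 0;
}
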
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index 516783ba950f..92b32ce645b9 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -50,6 +50,7 @@ static struct dentry *zcore_reipl_file;
static struct dentry *zcore_hsa_file;
static struct ipl_parameter_block *zcore_ipl_block;

+static DEFINE_MUTEX(hsa_buf_mutex);
static char hsa_buf[PAGE_SIZE] __aligned(PAGE_SIZE);

/*
@@ -66,19 +67,24 @@ int memcpy_hsa_user(void __user *dest, unsigned long src, size_t count)
if (!hsa_available)
return -ENODATA;

+ mutex_lock(&hsa_buf_mutex);
while (count) {
if (sclp_sdias_copy(hsa_buf, src / PAGE_SIZE + 2, 1)) {
TRACE("sclp_sdias_copy() failed\n");
+ mutex_unlock(&hsa_buf_mutex);
return -EIO;
}
offset = src % PAGE_SIZE;
bytes = min(PAGE_SIZE - offset, count);
- if (copy_to_user(dest, hsa_buf + offset, bytes))
+ if (copy_to_user(dest, hsa_buf + offset, bytes)) {
+ mutex_unlock(&hsa_buf_mutex);
return -EFAULT;
+ }
src += bytes;
dest += bytes;
count -= bytes;
}
+ mutex_unlock(&hsa_buf_mutex);
return 0;
}

@@ -96,9 +102,11 @@ int memcpy_hsa_kernel(void *dest, unsigned long src, size_t count)
if (!hsa_available)
return -ENODATA;

+ mutex_lock(&hsa_buf_mutex);
while (count) {
if (sclp_sdias_copy(hsa_buf, src / PAGE_SIZE + 2, 1)) {
TRACE("sclp_sdias_copy() failed\n");
+ mutex_unlock(&hsa_buf_mutex);
return -EIO;
}
offset = src % PAGE_SIZE;
@@ -108,6 +116,7 @@ int memcpy_hsa_kernel(void *dest, unsigned long src, size_t count)
dest += bytes;
count -= bytes;
}
+ mutex_unlock(&hsa_buf_mutex);
return 0;
}

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index ee182cfb467d..e76a4a3cf1ce 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -107,9 +107,10 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
/*
* Reset to IDLE only if processing of a channel program
* has finished. Do not overwrite a possible processing
- * state if the final interrupt was for HSCH or CSCH.
+ * state if the interrupt was unsolicited, or if the final
+ * interrupt was for HSCH or CSCH.
*/
- if (private->mdev && cp_is_finished)
+ if (cp_is_finished)
private->state = VFIO_CCW_STATE_IDLE;

if (private->io_trigger)
@@ -301,19 +302,11 @@ static int vfio_ccw_sch_event(struct subchannel *sch, int process)
if (work_pending(&sch->todo_work))
goto out_unlock;

- if (cio_update_schib(sch)) {
- vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_NOT_OPER);
- rc = 0;
- goto out_unlock;
- }
-
- private = dev_get_drvdata(&sch->dev);
- if (private->state == VFIO_CCW_STATE_NOT_OPER) {
- private->state = private->mdev ? VFIO_CCW_STATE_IDLE :
- VFIO_CCW_STATE_STANDBY;
- }
rc = 0;

+ if (cio_update_schib(sch))
+ vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_NOT_OPER);
+
out_unlock:
spin_unlock_irqrestore(sch->lock, flags);

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index d8589afac272..8d76f5b26e2b 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -146,7 +146,7 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
vfio_uninit_group_dev(&private->vdev);
atomic_inc(&private->avail);
private->mdev = NULL;
- private->state = VFIO_CCW_STATE_IDLE;
+ private->state = VFIO_CCW_STATE_STANDBY;
return ret;
}

diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c
index 511bf8e0a436..b61acbb09be3 100644
--- a/drivers/s390/scsi/zfcp_fc.c
+++ b/drivers/s390/scsi/zfcp_fc.c
@@ -145,27 +145,33 @@ void zfcp_fc_enqueue_event(struct zfcp_adapter *adapter,

static int zfcp_fc_wka_port_get(struct zfcp_fc_wka_port *wka_port)
{
+ int ret = -EIO;
+
if (mutex_lock_interruptible(&wka_port->mutex))
return -ERESTARTSYS;

if (wka_port->status == ZFCP_FC_WKA_PORT_OFFLINE ||
wka_port->status == ZFCP_FC_WKA_PORT_CLOSING) {
wka_port->status = ZFCP_FC_WKA_PORT_OPENING;
- if (zfcp_fsf_open_wka_port(wka_port))
+ if (zfcp_fsf_open_wka_port(wka_port)) {
+ /* could not even send request, nothing to wait for */
wka_port->status = ZFCP_FC_WKA_PORT_OFFLINE;
+ goto out;
+ }
}

- mutex_unlock(&wka_port->mutex);
-
- wait_event(wka_port->completion_wq,
+ wait_event(wka_port->opened,
wka_port->status == ZFCP_FC_WKA_PORT_ONLINE ||
wka_port->status == ZFCP_FC_WKA_PORT_OFFLINE);

if (wka_port->status == ZFCP_FC_WKA_PORT_ONLINE) {
atomic_inc(&wka_port->refcount);
- return 0;
+ ret = 0;
+ goto out;
}
- return -EIO;
+out:
+ mutex_unlock(&wka_port->mutex);
+ return ret;
}

static void zfcp_fc_wka_port_offline(struct work_struct *work)
@@ -181,9 +187,12 @@ static void zfcp_fc_wka_port_offline(struct work_struct *work)

wka_port->status = ZFCP_FC_WKA_PORT_CLOSING;
if (zfcp_fsf_close_wka_port(wka_port)) {
+ /* could not even send request, nothing to wait for */
wka_port->status = ZFCP_FC_WKA_PORT_OFFLINE;
- wake_up(&wka_port->completion_wq);
+ goto out;
}
+ wait_event(wka_port->closed,
+ wka_port->status == ZFCP_FC_WKA_PORT_OFFLINE);
out:
mutex_unlock(&wka_port->mutex);
}
@@ -193,13 +202,15 @@ static void zfcp_fc_wka_port_put(struct zfcp_fc_wka_port *wka_port)
if (atomic_dec_return(&wka_port->refcount) != 0)
return;
/* wait 10 milliseconds, other reqs might pop in */
- schedule_delayed_work(&wka_port->work, HZ / 100);
+ queue_delayed_work(wka_port->adapter->work_queue, &wka_port->work,
+ msecs_to_jiffies(10));
}

static void zfcp_fc_wka_port_init(struct zfcp_fc_wka_port *wka_port, u32 d_id,
struct zfcp_adapter *adapter)
{
- init_waitqueue_head(&wka_port->completion_wq);
+ init_waitqueue_head(&wka_port->opened);
+ init_waitqueue_head(&wka_port->closed);

wka_port->adapter = adapter;
wka_port->d_id = d_id;
diff --git a/drivers/s390/scsi/zfcp_fc.h b/drivers/s390/scsi/zfcp_fc.h
index 8aaf409ce9cb..97755407ce1b 100644
--- a/drivers/s390/scsi/zfcp_fc.h
+++ b/drivers/s390/scsi/zfcp_fc.h
@@ -185,7 +185,8 @@ enum zfcp_fc_wka_status {
/**
* struct zfcp_fc_wka_port - representation of well-known-address (WKA) FC port
* @adapter: Pointer to adapter structure this WKA port belongs to
- * @completion_wq: Wait for completion of open/close command
+ * @opened: Wait for completion of open command
+ * @closed: Wait for completion of close command
* @status: Current status of WKA port
* @refcount: Reference count to keep port open as long as it is in use
* @d_id: FC destination id or well-known-address
@@ -195,7 +196,8 @@ enum zfcp_fc_wka_status {
*/
struct zfcp_fc_wka_port {
struct zfcp_adapter *adapter;
- wait_queue_head_t completion_wq;
+ wait_queue_head_t opened;
+ wait_queue_head_t closed;
enum zfcp_fc_wka_status status;
atomic_t refcount;
u32 d_id;
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 4f1e4385ce58..19223b075568 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1907,7 +1907,7 @@ static void zfcp_fsf_open_wka_port_handler(struct zfcp_fsf_req *req)
wka_port->status = ZFCP_FC_WKA_PORT_ONLINE;
}
out:
- wake_up(&wka_port->completion_wq);
+ wake_up(&wka_port->opened);
}

/**
@@ -1966,7 +1966,7 @@ static void zfcp_fsf_close_wka_port_handler(struct zfcp_fsf_req *req)
}

wka_port->status = ZFCP_FC_WKA_PORT_OFFLINE;
- wake_up(&wka_port->completion_wq);
+ wake_up(&wka_port->closed);
}

/**
diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 3bb0adefbe06..02026476c39c 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -5745,7 +5745,7 @@ static void beiscsi_remove(struct pci_dev *pcidev)
cancel_work_sync(&phba->sess_work);

beiscsi_iface_destroy_default(phba);
- iscsi_host_remove(phba->shost);
+ iscsi_host_remove(phba->shost, false);
beiscsi_disable_port(phba, 1);

/* after cancelling boot_work */
diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c
index 15fbd09baa94..a3c800e04a2e 100644
--- a/drivers/scsi/bnx2i/bnx2i_iscsi.c
+++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c
@@ -909,7 +909,7 @@ void bnx2i_free_hba(struct bnx2i_hba *hba)
{
struct Scsi_Host *shost = hba->shost;

- iscsi_host_remove(shost);
+ iscsi_host_remove(shost, false);
INIT_LIST_HEAD(&hba->ep_ofld_list);
INIT_LIST_HEAD(&hba->ep_active_list);
INIT_LIST_HEAD(&hba->ep_destroy_list);
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 4365d52c6430..32abdf0fa9aa 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -328,7 +328,7 @@ void cxgbi_hbas_remove(struct cxgbi_device *cdev)
chba = cdev->hbas[i];
if (chba) {
cdev->hbas[i] = NULL;
- iscsi_host_remove(chba->shost);
+ iscsi_host_remove(chba->shost, false);
pci_dev_put(cdev->pdev);
iscsi_host_free(chba->shost);
}
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 9fee70d6434a..52c6f70d60ec 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -898,7 +898,7 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
remove_session:
iscsi_session_teardown(cls_session);
remove_host:
- iscsi_host_remove(shost);
+ iscsi_host_remove(shost, false);
free_host:
iscsi_host_free(shost);
return NULL;
@@ -915,7 +915,7 @@ static void iscsi_sw_tcp_session_destroy(struct iscsi_cls_session *cls_session)
iscsi_tcp_r2tpool_free(cls_session->dd_data);
iscsi_session_teardown(cls_session);

- iscsi_host_remove(shost);
+ iscsi_host_remove(shost, false);
iscsi_host_free(shost);
}

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 797abf4f5399..3ddb701cd29c 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -2828,11 +2828,12 @@ static void iscsi_notify_host_removed(struct iscsi_cls_session *cls_session)
/**
* iscsi_host_remove - remove host and sessions
* @shost: scsi host
+ * @is_shutdown: true if called from a driver shutdown callout
*
* If there are any sessions left, this will initiate the removal and wait
* for the completion.
*/
-void iscsi_host_remove(struct Scsi_Host *shost)
+void iscsi_host_remove(struct Scsi_Host *shost, bool is_shutdown)
{
struct iscsi_host *ihost = shost_priv(shost);
unsigned long flags;
@@ -2841,7 +2842,11 @@ void iscsi_host_remove(struct Scsi_Host *shost)
ihost->state = ISCSI_HOST_REMOVED;
spin_unlock_irqrestore(&ihost->lock, flags);

- iscsi_host_for_each_session(shost, iscsi_notify_host_removed);
+ if (!is_shutdown)
+ iscsi_host_for_each_session(shost, iscsi_notify_host_removed);
+ else
+ iscsi_host_for_each_session(shost, iscsi_force_destroy_session);
+
wait_event_interruptible(ihost->session_removal_wq,
ihost->num_sessions == 0);
if (signal_pending(current))
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index add238dc599b..baed4f0146f5 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5710,7 +5710,6 @@ lpfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd)
cur_iocbq->cmd_flag |= LPFC_IO_VMID;
}
}
- atomic_inc(&ndlp->cmd_pending);

#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
if (unlikely(phba->hdwqstat_on & LPFC_CHECK_SCSI_IO))
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 83ffba7f51da..780d975c85b5 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2414,9 +2414,12 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
int rval;
u16 retry = 10;

- if (mode == QEDI_MODE_NORMAL || mode == QEDI_MODE_SHUTDOWN) {
- iscsi_host_remove(qedi->shost);
+ if (mode == QEDI_MODE_NORMAL)
+ iscsi_host_remove(qedi->shost, false);
+ else if (mode == QEDI_MODE_SHUTDOWN)
+ iscsi_host_remove(qedi->shost, true);

+ if (mode == QEDI_MODE_NORMAL || mode == QEDI_MODE_SHUTDOWN) {
if (qedi->tmf_thread) {
destroy_workqueue(qedi->tmf_thread);
qedi->tmf_thread = NULL;
@@ -2791,7 +2794,7 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
#ifdef CONFIG_DEBUG_FS
qedi_dbg_host_exit(&qedi->dbg_ctx);
#endif
- iscsi_host_remove(qedi->shost);
+ iscsi_host_remove(qedi->shost, false);
stop_iscsi_func:
qedi_ops->stop(qedi->cdev);
stop_slowpath:
diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
index 3b3e4234f37a..412ad888bdc1 100644
--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -2716,17 +2716,24 @@ qla2x00_dev_loss_tmo_callbk(struct fc_rport *rport)
if (!fcport)
return;

- /* Now that the rport has been deleted, set the fcport state to
- FCS_DEVICE_DEAD */
- qla2x00_set_fcport_state(fcport, FCS_DEVICE_DEAD);
+
+ /*
+ * Now that the rport has been deleted, set the fcport state to
+ * FCS_DEVICE_DEAD, if the fcport is still lost.
+ */
+ if (fcport->scan_state != QLA_FCPORT_FOUND)
+ qla2x00_set_fcport_state(fcport, FCS_DEVICE_DEAD);

/*
* Transport has effectively 'deleted' the rport, clear
* all local references.
*/
spin_lock_irqsave(host->host_lock, flags);
- fcport->rport = fcport->drport = NULL;
- *((fc_port_t **)rport->dd_data) = NULL;
+ /* Confirm port has not reappeared before clearing pointers. */
+ if (rport->port_state != FC_PORTSTATE_ONLINE) {
+ fcport->rport = fcport->drport = NULL;
+ *((fc_port_t **)rport->dd_data) = NULL;
+ }
spin_unlock_irqrestore(host->host_lock, flags);

if (test_bit(ABORT_ISP_ACTIVE, &fcport->vha->dpc_flags))
@@ -2759,9 +2766,12 @@ qla2x00_terminate_rport_io(struct fc_rport *rport)
/*
* At this point all fcport's software-states are cleared. Perform any
* final cleanup of firmware resources (PCBs and XCBs).
+ *
+ * Attempt to cleanup only lost devices.
*/
if (fcport->loop_id != FC_NO_LOOP_ID) {
- if (IS_FWI2_CAPABLE(fcport->vha->hw)) {
+ if (IS_FWI2_CAPABLE(fcport->vha->hw) &&
+ fcport->scan_state != QLA_FCPORT_FOUND) {
if (fcport->loop_id != FC_NO_LOOP_ID)
fcport->logout_on_delete = 1;

@@ -2771,7 +2781,7 @@ qla2x00_terminate_rport_io(struct fc_rport *rport)
__LINE__);
qlt_schedule_sess_for_deletion(fcport);
}
- } else {
+ } else if (!IS_FWI2_CAPABLE(fcport->vha->hw)) {
qla2x00_port_logout(fcport->vha, fcport);
}
}
diff --git a/drivers/scsi/qla2xxx/qla_bsg.c b/drivers/scsi/qla2xxx/qla_bsg.c
index c2f00f076f79..726af9e40572 100644
--- a/drivers/scsi/qla2xxx/qla_bsg.c
+++ b/drivers/scsi/qla2xxx/qla_bsg.c
@@ -2975,6 +2975,13 @@ qla24xx_bsg_timeout(struct bsg_job *bsg_job)

ql_log(ql_log_info, vha, 0x708b, "%s CMD timeout. bsg ptr %p.\n",
__func__, bsg_job);
+
+ if (qla2x00_isp_reg_stat(ha)) {
+ ql_log(ql_log_info, vha, 0x9007,
+ "PCI/Register disconnect.\n");
+ qla_pci_set_eeh_busy(vha);
+ }
+
/* find the bsg job from the active list of commands */
spin_lock_irqsave(&ha->hardware_lock, flags);
for (que = 0; que < ha->max_req_queues; que++) {
@@ -2992,7 +2999,8 @@ qla24xx_bsg_timeout(struct bsg_job *bsg_job)
sp->u.bsg_job == bsg_job) {
req->outstanding_cmds[cnt] = NULL;
spin_unlock_irqrestore(&ha->hardware_lock, flags);
- if (ha->isp_ops->abort_command(sp)) {
+
+ if (!ha->flags.eeh_busy && ha->isp_ops->abort_command(sp)) {
ql_log(ql_log_warn, vha, 0x7089,
"mbx abort_command failed.\n");
bsg_reply->result = -EIO;
diff --git a/drivers/scsi/qla2xxx/qla_dbg.h b/drivers/scsi/qla2xxx/qla_dbg.h
index f1f6c740bdcd..feeb1666227f 100644
--- a/drivers/scsi/qla2xxx/qla_dbg.h
+++ b/drivers/scsi/qla2xxx/qla_dbg.h
@@ -383,5 +383,5 @@ ql_mask_match(uint level)
if (ql2xextended_error_logging == 1)
ql2xextended_error_logging = QL_DBG_DEFAULT1_MASK;

- return (level & ql2xextended_error_logging) == level;
+ return level && ((level & ql2xextended_error_logging) == level);
}
diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index e8f69c486be1..01cdd5f8723c 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -2158,6 +2158,11 @@ typedef struct {
#define CS_IOCB_ERROR 0x31 /* Generic error for IOCB request
failure */
#define CS_REJECT_RECEIVED 0x4E /* Reject received */
+#define CS_EDIF_AUTH_ERROR 0x63 /* decrypt error */
+#define CS_EDIF_PAD_LEN_ERROR 0x65 /* pad > frame size, not 4byte align */
+#define CS_EDIF_INV_REQ 0x66 /* invalid request */
+#define CS_EDIF_SPI_ERROR 0x67 /* rx frame unable to locate sa */
+#define CS_EDIF_HDR_ERROR 0x69 /* data frame != expected len */
#define CS_BAD_PAYLOAD 0x80 /* Driver defined */
#define CS_UNKNOWN 0x81 /* Driver defined */
#define CS_RETRY 0x82 /* Driver defined */
@@ -2626,7 +2631,6 @@ typedef struct fc_port {
struct {
uint32_t enable:1; /* device is edif enabled/req'd */
uint32_t app_stop:2;
- uint32_t app_started:1;
uint32_t aes_gmac:1;
uint32_t app_sess_online:1;
uint32_t tx_sa_set:1;
@@ -2637,6 +2641,7 @@ typedef struct fc_port {
uint32_t rx_rekey_cnt;
uint64_t tx_bytes;
uint64_t rx_bytes;
+ uint8_t sess_down_acked;
uint8_t auth_state;
uint16_t authok:1;
uint16_t rekey_cnt;
@@ -3204,6 +3209,8 @@ struct ct_sns_rsp {
#define GFF_NVME_OFFSET 23 /* type = 28h */
struct {
uint8_t fc4_features[128];
+#define FC4_FF_TARGET BIT_0
+#define FC4_FF_INITIATOR BIT_1
} gff_id;
struct {
uint8_t reserved;
@@ -3975,6 +3982,7 @@ struct qla_hw_data {
/* SRB cache. */
#define SRB_MIN_REQ 128
mempool_t *srb_mempool;
+ u8 port_name[WWN_SIZE];

volatile struct {
uint32_t mbox_int :1;
@@ -4040,6 +4048,9 @@ struct qla_hw_data {
uint32_t n2n_fw_acc_sec:1;
uint32_t plogi_template_valid:1;
uint32_t port_isolated:1;
+ uint32_t eeh_flush:2;
+#define EEH_FLUSH_RDY 1
+#define EEH_FLUSH_DONE 2
} flags;

uint16_t max_exchg;
@@ -4074,6 +4085,7 @@ struct qla_hw_data {
uint32_t rsp_que_len;
uint32_t req_que_off;
uint32_t rsp_que_off;
+ unsigned long eeh_jif;

/* Multi queue data structs */
device_reg_t *mqiobase;
@@ -4256,8 +4268,8 @@ struct qla_hw_data {
#define IS_OEM_001(ha) ((ha)->device_type & DT_OEM_001)
#define HAS_EXTENDED_IDS(ha) ((ha)->device_type & DT_EXTENDED_IDS)
#define IS_CT6_SUPPORTED(ha) ((ha)->device_type & DT_CT6_SUPPORTED)
-#define IS_MQUE_CAPABLE(ha) ((ha)->mqenable || IS_QLA83XX(ha) || \
- IS_QLA27XX(ha) || IS_QLA28XX(ha))
+#define IS_MQUE_CAPABLE(ha) (IS_QLA83XX(ha) || IS_QLA27XX(ha) || \
+ IS_QLA28XX(ha))
#define IS_BIDI_CAPABLE(ha) \
(IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) || IS_QLA28XX(ha))
/* Bit 21 of fw_attributes decides the MCTP capabilities */
diff --git a/drivers/scsi/qla2xxx/qla_edif.c b/drivers/scsi/qla2xxx/qla_edif.c
index 0628633c7c7e..dbe8ef887a9d 100644
--- a/drivers/scsi/qla2xxx/qla_edif.c
+++ b/drivers/scsi/qla2xxx/qla_edif.c
@@ -52,6 +52,31 @@ const char *sc_to_str(uint16_t cmd)
return "unknown";
}

+static struct edb_node *qla_edb_getnext(scsi_qla_host_t *vha)
+{
+ unsigned long flags;
+ struct edb_node *edbnode = NULL;
+
+ spin_lock_irqsave(&vha->e_dbell.db_lock, flags);
+
+ /* db nodes are fifo - no qualifications done */
+ if (!list_empty(&vha->e_dbell.head)) {
+ edbnode = list_first_entry(&vha->e_dbell.head,
+ struct edb_node, list);
+ list_del_init(&edbnode->list);
+ }
+
+ spin_unlock_irqrestore(&vha->e_dbell.db_lock, flags);
+
+ return edbnode;
+}
+
+static void qla_edb_node_free(scsi_qla_host_t *vha, struct edb_node *node)
+{
+ list_del_init(&node->list);
+ kfree(node);
+}
+
static struct edif_list_entry *qla_edif_list_find_sa_index(fc_port_t *fcport,
uint16_t handle)
{
@@ -257,14 +282,8 @@ qla2x00_find_fcport_by_pid(scsi_qla_host_t *vha, port_id_t *id)

f = NULL;
list_for_each_entry_safe(f, tf, &vha->vp_fcports, list) {
- if ((f->flags & FCF_FCSP_DEVICE)) {
- ql_dbg(ql_dbg_edif + ql_dbg_verbose, vha, 0x2058,
- "Found secure fcport - nn %8phN pn %8phN portid=0x%x, 0x%x.\n",
- f->node_name, f->port_name,
- f->d_id.b24, id->b24);
- if (f->d_id.b24 == id->b24)
- return f;
- }
+ if (f->d_id.b24 == id->b24)
+ return f;
}
return NULL;
}
@@ -280,14 +299,19 @@ qla_edif_app_check(scsi_qla_host_t *vha, struct app_id appid)
{
/* check that the app is allow/known to the driver */

- if (appid.app_vid == EDIF_APP_ID) {
- ql_dbg(ql_dbg_edif + ql_dbg_verbose, vha, 0x911d, "%s app id ok\n", __func__);
- return true;
+ if (appid.app_vid != EDIF_APP_ID) {
+ ql_dbg(ql_dbg_edif, vha, 0x911d, "%s app id not ok (%x)",
+ __func__, appid.app_vid);
+ return false;
+ }
+
+ if (appid.version != EDIF_VERSION1) {
+ ql_dbg(ql_dbg_edif, vha, 0x911d, "%s app version is not ok (%x)",
+ __func__, appid.version);
+ return false;
}
- ql_dbg(ql_dbg_edif, vha, 0x911d, "%s app id not ok (%x)",
- __func__, appid.app_vid);

- return false;
+ return true;
}

static void
@@ -486,16 +510,35 @@ qla_edif_app_start(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
/* mark doorbell as active since an app is now present */
vha->e_dbell.db_flags |= EDB_ACTIVE;
} else {
- ql_dbg(ql_dbg_edif, vha, 0x911e, "%s doorbell already active\n",
- __func__);
+ goto out;
}

if (N2N_TOPO(vha->hw)) {
- if (vha->hw->flags.n2n_fw_acc_sec)
- set_bit(N2N_LINK_RESET, &vha->dpc_flags);
- else
+ list_for_each_entry_safe(fcport, tf, &vha->vp_fcports, list)
+ fcport->n2n_link_reset_cnt = 0;
+
+ if (vha->hw->flags.n2n_fw_acc_sec) {
+ list_for_each_entry_safe(fcport, tf, &vha->vp_fcports, list)
+ qla_edif_sa_ctl_init(vha, fcport);
+
+ /*
+ * While authentication app was not running, remote device
+ * could still try to login with this local port. Let's
+ * clear the state and try again.
+ */
+ qla2x00_wait_for_sess_deletion(vha);
+
+ /* bounce the link to get the other guy to relogin */
+ if (!vha->hw->flags.n2n_bigger) {
+ set_bit(N2N_LINK_RESET, &vha->dpc_flags);
+ qla2xxx_wake_dpc(vha);
+ }
+ } else {
+ qla2x00_wait_for_hba_online(vha);
set_bit(ISP_ABORT_NEEDED, &vha->dpc_flags);
- qla2xxx_wake_dpc(vha);
+ qla2xxx_wake_dpc(vha);
+ qla2x00_wait_for_hba_online(vha);
+ }
} else {
list_for_each_entry_safe(fcport, tf, &vha->vp_fcports, list) {
ql_dbg(ql_dbg_edif, vha, 0x2058,
@@ -517,19 +560,31 @@ qla_edif_app_start(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
if (atomic_read(&vha->loop_state) == LOOP_DOWN)
break;

- fcport->edif.app_started = 1;
fcport->login_retry = vha->hw->login_retry_count;

- /* no activity */
fcport->edif.app_stop = 0;
+ fcport->edif.app_sess_online = 0;
+
+ if (fcport->scan_state != QLA_FCPORT_FOUND)
+ continue;
+
+ if (fcport->port_type == FCT_UNKNOWN &&
+ !fcport->fc4_features)
+ rval = qla24xx_async_gffid(vha, fcport, true);
+
+ if (!rval && !(fcport->fc4_features & FC4_FF_TARGET ||
+ fcport->port_type & (FCT_TARGET|FCT_NVME_TARGET)))
+ continue;
+
+ rval = 0;

ql_dbg(ql_dbg_edif, vha, 0x911e,
"%s wwpn %8phC calling qla_edif_reset_auth_wait\n",
__func__, fcport->port_name);
- fcport->edif.app_sess_online = 0;
qlt_schedule_sess_for_deletion(fcport);
qla_edif_sa_ctl_init(vha, fcport);
}
+ set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
}

if (vha->pur_cinfo.enode_flags != ENODE_ACTIVE) {
@@ -540,9 +595,11 @@ qla_edif_app_start(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
__func__);
}

+out:
appreply.host_support_edif = vha->hw->flags.edif_enabled;
appreply.edif_enode_active = vha->pur_cinfo.enode_flags;
appreply.edif_edb_active = vha->e_dbell.db_flags;
+ appreply.version = EDIF_VERSION1;

bsg_job->reply_len = sizeof(struct fc_bsg_reply);

@@ -610,9 +667,6 @@ qla_edif_app_stop(scsi_qla_host_t *vha, struct bsg_job *bsg_job)

fcport->send_els_logo = 1;
qlt_schedule_sess_for_deletion(fcport);
-
- /* qla_edif_flush_sa_ctl_lists(fcport); */
- fcport->edif.app_started = 0;
}
}

@@ -673,6 +727,7 @@ qla_edif_app_authok(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
portid.b.area = appplogiok.u.d_id.b.area;
portid.b.al_pa = appplogiok.u.d_id.b.al_pa;

+ appplogireply.version = EDIF_VERSION1;
switch (appplogiok.type) {
case PL_TYPE_WWPN:
fcport = qla2x00_find_fcport_by_wwpn(vha,
@@ -865,6 +920,8 @@ qla_edif_app_getfcinfo(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
} else {
struct fc_port *fcport = NULL, *tf;

+ app_reply->version = EDIF_VERSION1;
+
list_for_each_entry_safe(fcport, tf, &vha->vp_fcports, list) {
if (!(fcport->flags & FCF_FCSP_DEVICE))
continue;
@@ -881,9 +938,25 @@ qla_edif_app_getfcinfo(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
if (tdid.b24 != 0 && tdid.b24 != fcport->d_id.b24)
continue;

- app_reply->ports[pcnt].rekey_count =
- fcport->edif.rekey_cnt;
+ if (!N2N_TOPO(vha->hw)) {
+ if (fcport->scan_state != QLA_FCPORT_FOUND)
+ continue;
+
+ if (fcport->port_type == FCT_UNKNOWN &&
+ !fcport->fc4_features)
+ rval = qla24xx_async_gffid(vha, fcport,
+ true);
+
+ if (!rval &&
+ !(fcport->fc4_features & FC4_FF_TARGET ||
+ fcport->port_type &
+ (FCT_TARGET | FCT_NVME_TARGET)))
+ continue;
+ }
+
+ rval = 0;

+ app_reply->ports[pcnt].version = EDIF_VERSION1;
app_reply->ports[pcnt].remote_type =
VND_CMD_RTYPE_UNKNOWN;
if (fcport->port_type & (FCT_NVME_TARGET | FCT_TARGET))
@@ -980,6 +1053,8 @@ qla_edif_app_getstats(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
} else {
struct fc_port *fcport = NULL, *tf;

+ app_reply->version = EDIF_VERSION1;
+
list_for_each_entry_safe(fcport, tf, &vha->vp_fcports, list) {
if (fcport->edif.enable) {
if (pcnt > app_req.num_ports)
@@ -1013,6 +1088,164 @@ qla_edif_app_getstats(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
return rval;
}

+static int32_t
+qla_edif_ack(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
+{
+ struct fc_port *fcport;
+ struct aen_complete_cmd ack;
+ struct fc_bsg_reply *bsg_reply = bsg_job->reply;
+
+ sg_copy_to_buffer(bsg_job->request_payload.sg_list,
+ bsg_job->request_payload.sg_cnt, &ack, sizeof(ack));
+
+ ql_dbg(ql_dbg_edif, vha, 0x70cf,
+ "%s: %06x event_code %x\n",
+ __func__, ack.port_id.b24, ack.event_code);
+
+ fcport = qla2x00_find_fcport_by_pid(vha, &ack.port_id);
+ SET_DID_STATUS(bsg_reply->result, DID_OK);
+
+ if (!fcport) {
+ ql_dbg(ql_dbg_edif, vha, 0x70cf,
+ "%s: unable to find fcport %06x \n",
+ __func__, ack.port_id.b24);
+ return 0;
+ }
+
+ switch (ack.event_code) {
+ case VND_CMD_AUTH_STATE_SESSION_SHUTDOWN:
+ fcport->edif.sess_down_acked = 1;
+ break;
+ default:
+ break;
+ }
+ return 0;
+}
+
+static int qla_edif_consume_dbell(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
+{
+ struct fc_bsg_reply *bsg_reply = bsg_job->reply;
+ u32 sg_skip, reply_payload_len;
+ bool keep;
+ struct edb_node *dbnode = NULL;
+ struct edif_app_dbell ap;
+ int dat_size = 0;
+
+ sg_skip = 0;
+ reply_payload_len = bsg_job->reply_payload.payload_len;
+
+ while ((reply_payload_len - sg_skip) >= sizeof(struct edb_node)) {
+ dbnode = qla_edb_getnext(vha);
+ if (dbnode) {
+ keep = true;
+ dat_size = 0;
+ ap.event_code = dbnode->ntype;
+ switch (dbnode->ntype) {
+ case VND_CMD_AUTH_STATE_SESSION_SHUTDOWN:
+ case VND_CMD_AUTH_STATE_NEEDED:
+ ap.port_id = dbnode->u.plogi_did;
+ dat_size += sizeof(ap.port_id);
+ break;
+ case VND_CMD_AUTH_STATE_ELS_RCVD:
+ ap.port_id = dbnode->u.els_sid;
+ dat_size += sizeof(ap.port_id);
+ break;
+ case VND_CMD_AUTH_STATE_SAUPDATE_COMPL:
+ ap.port_id = dbnode->u.sa_aen.port_id;
+ memcpy(&ap.event_data, &dbnode->u,
+ sizeof(struct edif_sa_update_aen));
+ dat_size += sizeof(struct edif_sa_update_aen);
+ break;
+ default:
+ keep = false;
+ ql_log(ql_log_warn, vha, 0x09102,
+ "%s unknown DB type=%d %p\n",
+ __func__, dbnode->ntype, dbnode);
+ break;
+ }
+ ap.event_data_size = dat_size;
+ /* 8 = sizeof(ap.event_code + ap.event_data_size) */
+ dat_size += 8;
+ if (keep)
+ sg_skip += sg_copy_buffer(bsg_job->reply_payload.sg_list,
+ bsg_job->reply_payload.sg_cnt,
+ &ap, dat_size, sg_skip, false);
+
+ ql_dbg(ql_dbg_edif, vha, 0x09102,
+ "%s Doorbell consumed : type=%d %p\n",
+ __func__, dbnode->ntype, dbnode);
+
+ kfree(dbnode);
+ } else {
+ break;
+ }
+ }
+
+ SET_DID_STATUS(bsg_reply->result, DID_OK);
+ bsg_reply->reply_payload_rcv_len = sg_skip;
+ bsg_job->reply_len = sizeof(struct fc_bsg_reply);
+
+ return 0;
+}
+
+static void __qla_edif_dbell_bsg_done(scsi_qla_host_t *vha, struct bsg_job *bsg_job,
+ u32 delay)
+{
+ struct fc_bsg_reply *bsg_reply = bsg_job->reply;
+
+ /* small sleep for doorbell events to accumulate */
+ if (delay)
+ msleep(delay);
+
+ qla_edif_consume_dbell(vha, bsg_job);
+
+ bsg_job_done(bsg_job, bsg_reply->result, bsg_reply->reply_payload_rcv_len);
+}
+
+static void qla_edif_dbell_bsg_done(scsi_qla_host_t *vha)
+{
+ unsigned long flags;
+ struct bsg_job *prev_bsg_job = NULL;
+
+ spin_lock_irqsave(&vha->e_dbell.db_lock, flags);
+ if (vha->e_dbell.dbell_bsg_job) {
+ prev_bsg_job = vha->e_dbell.dbell_bsg_job;
+ vha->e_dbell.dbell_bsg_job = NULL;
+ }
+ spin_unlock_irqrestore(&vha->e_dbell.db_lock, flags);
+
+ if (prev_bsg_job)
+ __qla_edif_dbell_bsg_done(vha, prev_bsg_job, 0);
+}
+
+static int
+qla_edif_dbell_bsg(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
+{
+ unsigned long flags;
+ bool return_bsg = false;
+
+ /* flush previous dbell bsg */
+ qla_edif_dbell_bsg_done(vha);
+
+ spin_lock_irqsave(&vha->e_dbell.db_lock, flags);
+ if (list_empty(&vha->e_dbell.head) && DBELL_ACTIVE(vha)) {
+ /*
+ * when the next db event happens, bsg_job will return.
+ * Otherwise, timer will return it.
+ */
+ vha->e_dbell.dbell_bsg_job = bsg_job;
+ vha->e_dbell.bsg_expire = jiffies + 10 * HZ;
+ } else {
+ return_bsg = true;
+ }
+ spin_unlock_irqrestore(&vha->e_dbell.db_lock, flags);
+
+ if (return_bsg)
+ __qla_edif_dbell_bsg_done(vha, bsg_job, 1);
+
+ return 0;
+}
+
int32_t
qla_edif_app_mgmt(struct bsg_job *bsg_job)
{
@@ -1024,8 +1257,13 @@ qla_edif_app_mgmt(struct bsg_job *bsg_job)
bool done = true;
int32_t rval = 0;
uint32_t vnd_sc = bsg_request->rqst_data.h_vendor.vendor_cmd[1];
+ u32 level = ql_dbg_edif;
+
+ /* doorbell is high traffic */
+ if (vnd_sc == QL_VND_SC_READ_DBELL)
+ level = 0;

- ql_dbg(ql_dbg_edif, vha, 0x911d, "%s vnd subcmd=%x\n",
+ ql_dbg(level, vha, 0x911d, "%s vnd subcmd=%x\n",
__func__, vnd_sc);

sg_copy_to_buffer(bsg_job->request_payload.sg_list,
@@ -1034,7 +1272,7 @@ qla_edif_app_mgmt(struct bsg_job *bsg_job)

if (!vha->hw->flags.edif_enabled ||
test_bit(VPORT_DELETE, &vha->dpc_flags)) {
- ql_dbg(ql_dbg_edif, vha, 0x911d,
+ ql_dbg(level, vha, 0x911d,
"%s edif not enabled or vp delete. bsg ptr done %p. dpc_flags %lx\n",
__func__, bsg_job, vha->dpc_flags);

@@ -1043,7 +1281,7 @@ qla_edif_app_mgmt(struct bsg_job *bsg_job)
}

if (!qla_edif_app_check(vha, appcheck)) {
- ql_dbg(ql_dbg_edif, vha, 0x911d,
+ ql_dbg(level, vha, 0x911d,
"%s app checked failed.\n",
__func__);

@@ -1075,6 +1313,13 @@ qla_edif_app_mgmt(struct bsg_job *bsg_job)
case QL_VND_SC_GET_STATS:
rval = qla_edif_app_getstats(vha, bsg_job);
break;
+ case QL_VND_SC_AEN_COMPLETE:
+ rval = qla_edif_ack(vha, bsg_job);
+ break;
+ case QL_VND_SC_READ_DBELL:
+ rval = qla_edif_dbell_bsg(vha, bsg_job);
+ done = false;
+ break;
default:
ql_dbg(ql_dbg_edif, vha, 0x911d, "%s unknown cmd=%x\n",
__func__,
@@ -1086,7 +1331,7 @@ qla_edif_app_mgmt(struct bsg_job *bsg_job)

done:
if (done) {
- ql_dbg(ql_dbg_user, vha, 0x7009,
+ ql_dbg(level, vha, 0x7009,
"%s: %d bsg ptr done %p\n", __func__, __LINE__, bsg_job);
bsg_job_done(bsg_job, bsg_reply->result,
bsg_reply->reply_payload_rcv_len);
@@ -1248,6 +1493,8 @@ qla24xx_check_sadb_avail_slot(struct bsg_job *bsg_job, fc_port_t *fcport,

#define QLA_SA_UPDATE_FLAGS_RX_KEY 0x0
#define QLA_SA_UPDATE_FLAGS_TX_KEY 0x2
+#define EDIF_MSLEEP_INTERVAL 100
+#define EDIF_RETRY_COUNT 50

int
qla24xx_sadb_update(struct bsg_job *bsg_job)
@@ -1260,7 +1507,7 @@ qla24xx_sadb_update(struct bsg_job *bsg_job)
struct edif_list_entry *edif_entry = NULL;
int found = 0;
int rval = 0;
- int result = 0;
+ int result = 0, cnt;
struct qla_sa_update_frame sa_frame;
struct srb_iocb *iocb_cmd;
port_id_t portid;
@@ -1501,11 +1748,23 @@ qla24xx_sadb_update(struct bsg_job *bsg_job)
sp->done = qla2x00_bsg_job_done;
iocb_cmd = &sp->u.iocb_cmd;
iocb_cmd->u.sa_update.sa_frame = sa_frame;
-
+ cnt = 0;
+retry:
rval = qla2x00_start_sp(sp);
- if (rval != QLA_SUCCESS) {
+ switch (rval) {
+ case QLA_SUCCESS:
+ break;
+ case EAGAIN:
+ msleep(EDIF_MSLEEP_INTERVAL);
+ cnt++;
+ if (cnt < EDIF_RETRY_COUNT)
+ goto retry;
+
+ fallthrough;
+ default:
ql_log(ql_dbg_edif, vha, 0x70e3,
- "qla2x00_start_sp failed=%d.\n", rval);
+ "%s qla2x00_start_sp failed=%d.\n",
+ __func__, rval);

qla2x00_rel_sp(sp);
rval = -EIO;
@@ -1798,30 +2057,6 @@ qla_edb_init(scsi_qla_host_t *vha)
/* initialize lock which protects doorbell & init list */
spin_lock_init(&vha->e_dbell.db_lock);
INIT_LIST_HEAD(&vha->e_dbell.head);
-
- /* create and initialize doorbell */
- init_completion(&vha->e_dbell.dbell);
-}
-
-static void
-qla_edb_node_free(scsi_qla_host_t *vha, struct edb_node *node)
-{
- /*
- * releases the space held by this edb node entry
- * this function does _not_ free the edb node itself
- * NB: the edb node entry passed should not be on any list
- *
- * currently for doorbell there's no additional cleanup
- * needed, but here as a placeholder for furture use.
- */
-
- if (!node) {
- ql_dbg(ql_dbg_edif, vha, 0x09122,
- "%s error - no valid node passed\n", __func__);
- return;
- }
-
- node->ntype = N_UNDEF;
}

static void qla_edb_clear(scsi_qla_host_t *vha, port_id_t portid)
@@ -1868,11 +2103,8 @@ static void qla_edb_clear(scsi_qla_host_t *vha, port_id_t portid)
}
spin_unlock_irqrestore(&vha->e_dbell.db_lock, flags);

- list_for_each_entry_safe(e, tmp, &edb_list, list) {
+ list_for_each_entry_safe(e, tmp, &edb_list, list)
qla_edb_node_free(vha, e);
- list_del_init(&e->list);
- kfree(e);
- }
}

/* function called when app is stopping */
@@ -1900,14 +2132,10 @@ qla_edb_stop(scsi_qla_host_t *vha)
"%s freeing edb_node type=%x\n",
__func__, node->ntype);
qla_edb_node_free(vha, node);
- list_del(&node->list);
-
- kfree(node);
}
spin_unlock_irqrestore(&vha->e_dbell.db_lock, flags);

- /* wake up doorbell waiters - they'll be dismissed with error code */
- complete_all(&vha->e_dbell.dbell);
+ qla_edif_dbell_bsg_done(vha);
}

static struct edb_node *
@@ -1945,9 +2173,6 @@ qla_edb_node_add(scsi_qla_host_t *vha, struct edb_node *ptr)
list_add_tail(&ptr->list, &vha->e_dbell.head);
spin_unlock_irqrestore(&vha->e_dbell.db_lock, flags);

- /* ring doorbell for waiters */
- complete(&vha->e_dbell.dbell);
-
return true;
}

@@ -2011,47 +2236,29 @@ qla_edb_eventcreate(scsi_qla_host_t *vha, uint32_t dbtype,
edbnode->u.sa_aen.port_id = fcport->d_id;
edbnode->u.sa_aen.status = data;
edbnode->u.sa_aen.key_type = data2;
+ edbnode->u.sa_aen.version = EDIF_VERSION1;
break;
default:
ql_dbg(ql_dbg_edif, vha, 0x09102,
"%s unknown type: %x\n", __func__, dbtype);
- qla_edb_node_free(vha, edbnode);
kfree(edbnode);
edbnode = NULL;
break;
}

- if (edbnode && (!qla_edb_node_add(vha, edbnode))) {
+ if (edbnode) {
+ if (!qla_edb_node_add(vha, edbnode)) {
+ ql_dbg(ql_dbg_edif, vha, 0x09102,
+ "%s unable to add dbnode\n", __func__);
+ kfree(edbnode);
+ return;
+ }
ql_dbg(ql_dbg_edif, vha, 0x09102,
- "%s unable to add dbnode\n", __func__);
- qla_edb_node_free(vha, edbnode);
- kfree(edbnode);
- return;
- }
- if (edbnode && fcport)
- fcport->edif.auth_state = dbtype;
- ql_dbg(ql_dbg_edif, vha, 0x09102,
- "%s Doorbell produced : type=%d %p\n", __func__, dbtype, edbnode);
-}
-
-static struct edb_node *
-qla_edb_getnext(scsi_qla_host_t *vha)
-{
- unsigned long flags;
- struct edb_node *edbnode = NULL;
-
- spin_lock_irqsave(&vha->e_dbell.db_lock, flags);
-
- /* db nodes are fifo - no qualifications done */
- if (!list_empty(&vha->e_dbell.head)) {
- edbnode = list_first_entry(&vha->e_dbell.head,
- struct edb_node, list);
- list_del(&edbnode->list);
+ "%s Doorbell produced : type=%d %p\n", __func__, dbtype, edbnode);
+ qla_edif_dbell_bsg_done(vha);
+ if (fcport)
+ fcport->edif.auth_state = dbtype;
}
-
- spin_unlock_irqrestore(&vha->e_dbell.db_lock, flags);
-
- return edbnode;
}

void
@@ -2079,6 +2286,9 @@ qla_edif_timer(scsi_qla_host_t *vha)
ha->edif_post_stop_cnt_down = 60;
}
}
+
+ if (vha->e_dbell.dbell_bsg_job && time_after_eq(jiffies, vha->e_dbell.bsg_expire))
+ qla_edif_dbell_bsg_done(vha);
}

/*
@@ -2146,7 +2356,6 @@ edif_doorbell_show(struct device *dev, struct device_attribute *attr,
"%s Doorbell consumed : type=%d %p\n",
__func__, dbnode->ntype, dbnode);
/* we're done with the db node, so free it up */
- qla_edb_node_free(vha, dbnode);
kfree(dbnode);
} else {
break;
@@ -2162,6 +2371,7 @@ edif_doorbell_show(struct device *dev, struct device_attribute *attr,

static void qla_noop_sp_done(srb_t *sp, int res)
{
+ sp->fcport->flags &= ~(FCF_ASYNC_SENT | FCF_ASYNC_ACTIVE);
/* ref: INIT */
kref_put(&sp->cmd_kref, qla2x00_sp_release);
}
@@ -2186,7 +2396,8 @@ qla24xx_issue_sa_replace_iocb(scsi_qla_host_t *vha, struct qla_work_evt *e)
if (!sa_ctl) {
ql_dbg(ql_dbg_edif, vha, 0x70e6,
"sa_ctl allocation failed\n");
- return -ENOMEM;
+ rval = -ENOMEM;
+ goto done;
}

fcport = sa_ctl->fcport;
@@ -2196,7 +2407,8 @@ qla24xx_issue_sa_replace_iocb(scsi_qla_host_t *vha, struct qla_work_evt *e)
if (!sp) {
ql_dbg(ql_dbg_edif, vha, 0x70e6,
"SRB allocation failed\n");
- return -ENOMEM;
+ rval = -ENOMEM;
+ goto done;
}

fcport->flags |= FCF_ASYNC_SENT;
@@ -2225,9 +2437,16 @@ qla24xx_issue_sa_replace_iocb(scsi_qla_host_t *vha, struct qla_work_evt *e)

rval = qla2x00_start_sp(sp);

- if (rval != QLA_SUCCESS)
- rval = QLA_FUNCTION_FAILED;
+ if (rval != QLA_SUCCESS) {
+ goto done_free_sp;
+ }

+ return rval;
+done_free_sp:
+ kref_put(&sp->cmd_kref, qla2x00_sp_release);
+ fcport->flags &= ~FCF_ASYNC_SENT;
+done:
+ fcport->flags &= ~FCF_ASYNC_ACTIVE;
return rval;
}

@@ -2447,8 +2666,7 @@ void qla24xx_auth_els(scsi_qla_host_t *vha, void **pkt, struct rsp_que **rsp)

fcport = qla2x00_find_fcport_by_pid(host, &purex->pur_info.pur_sid);

- if (DBELL_INACTIVE(vha) ||
- (fcport && EDIF_SESSION_DOWN(fcport))) {
+ if (DBELL_INACTIVE(vha)) {
ql_dbg(ql_dbg_edif, host, 0x0910c, "%s e_dbell.db_flags =%x %06x\n",
__func__, host->e_dbell.db_flags,
fcport ? fcport->d_id.b24 : 0);
@@ -2458,6 +2676,22 @@ void qla24xx_auth_els(scsi_qla_host_t *vha, void **pkt, struct rsp_que **rsp)
return;
}

+ if (fcport && EDIF_SESSION_DOWN(fcport)) {
+ ql_dbg(ql_dbg_edif, host, 0x13b6,
+ "%s terminate exchange. Send logo to 0x%x\n",
+ __func__, a.did.b24);
+
+ a.tx_byte_count = a.tx_len = 0;
+ a.tx_addr = 0;
+ a.control_flags = EPD_RX_XCHG; /* EPD_RX_XCHG = terminate cmd */
+ qla_els_reject_iocb(host, (*rsp)->qpair, &a);
+ qla_enode_free(host, ptr);
+ /* send logo to let remote port knows to tear down session */
+ fcport->send_els_logo = 1;
+ qlt_schedule_sess_for_deletion(fcport);
+ return;
+ }
+
/* add the local enode to the list */
qla_enode_add(host, ptr);

@@ -3350,10 +3584,14 @@ int qla_edif_process_els(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
fc_port_t *fcport = NULL;
struct qla_hw_data *ha = vha->hw;
srb_t *sp;
- int rval = (DID_ERROR << 16);
+ int rval = (DID_ERROR << 16), cnt;
port_id_t d_id;
struct qla_bsg_auth_els_request *p =
(struct qla_bsg_auth_els_request *)bsg_job->request;
+ struct qla_bsg_auth_els_reply *rpl =
+ (struct qla_bsg_auth_els_reply *)bsg_job->reply;
+
+ rpl->version = EDIF_VERSION1;

d_id.b.al_pa = bsg_request->rqst_data.h_els.port_id[2];
d_id.b.area = bsg_request->rqst_data.h_els.port_id[1];
@@ -3372,7 +3610,7 @@ int qla_edif_process_els(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
if (qla_bsg_check(vha, bsg_job, fcport))
return 0;

- if (fcport->loop_id == FC_NO_LOOP_ID) {
+ if (EDIF_SESS_DELETE(fcport)) {
ql_dbg(ql_dbg_edif, vha, 0x910d,
"%s ELS code %x, no loop id.\n", __func__,
bsg_request->rqst_data.r_els.els_code);
@@ -3441,17 +3679,26 @@ int qla_edif_process_els(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
sp->free = qla2x00_bsg_sp_free;
sp->done = qla2x00_bsg_job_done;

+ cnt = 0;
+retry:
rval = qla2x00_start_sp(sp);
-
- ql_dbg(ql_dbg_edif, vha, 0x700a,
- "%s %s %8phN xchg %x ctlflag %x hdl %x reqlen %xh bsg ptr %p\n",
- __func__, sc_to_str(p->e.sub_cmd), fcport->port_name,
- p->e.extra_rx_xchg_address, p->e.extra_control_flags,
- sp->handle, sp->remap.req.len, bsg_job);
-
- if (rval != QLA_SUCCESS) {
+ switch (rval) {
+ case QLA_SUCCESS:
+ ql_dbg(ql_dbg_edif, vha, 0x700a,
+ "%s %s %8phN xchg %x ctlflag %x hdl %x reqlen %xh bsg ptr %p\n",
+ __func__, sc_to_str(p->e.sub_cmd), fcport->port_name,
+ p->e.extra_rx_xchg_address, p->e.extra_control_flags,
+ sp->handle, sp->remap.req.len, bsg_job);
+ break;
+ case EAGAIN:
+ msleep(EDIF_MSLEEP_INTERVAL);
+ cnt++;
+ if (cnt < EDIF_RETRY_COUNT)
+ goto retry;
+ fallthrough;
+ default:
ql_log(ql_log_warn, vha, 0x700e,
- "qla2x00_start_sp failed = %d\n", rval);
+ "%s qla2x00_start_sp failed = %d\n", __func__, rval);
SET_DID_STATUS(bsg_reply->result, DID_IMM_RETRY);
rval = -EIO;
goto done_free_remap_rsp;
@@ -3473,14 +3720,29 @@ int qla_edif_process_els(scsi_qla_host_t *vha, struct bsg_job *bsg_job)

void qla_edif_sess_down(struct scsi_qla_host *vha, struct fc_port *sess)
{
+ u16 cnt = 0;
+
if (sess->edif.app_sess_online && DBELL_ACTIVE(vha)) {
ql_dbg(ql_dbg_disc, vha, 0xf09c,
"%s: sess %8phN send port_offline event\n",
__func__, sess->port_name);
sess->edif.app_sess_online = 0;
+ sess->edif.sess_down_acked = 0;
qla_edb_eventcreate(vha, VND_CMD_AUTH_STATE_SESSION_SHUTDOWN,
sess->d_id.b24, 0, sess);
qla2x00_post_aen_work(vha, FCH_EVT_PORT_OFFLINE, sess->d_id.b24);
+
+ while (!READ_ONCE(sess->edif.sess_down_acked) &&
+ !test_bit(VPORT_DELETE, &vha->dpc_flags)) {
+ msleep(100);
+ cnt++;
+ if (cnt > 100)
+ break;
+ }
+ sess->edif.sess_down_acked = 0;
+ ql_dbg(ql_dbg_disc, vha, 0xf09c,
+ "%s: sess %8phN port_offline event completed\n",
+ __func__, sess->port_name);
}
}

diff --git a/drivers/scsi/qla2xxx/qla_edif.h b/drivers/scsi/qla2xxx/qla_edif.h
index a965ca8e47ce..7cdb89ccdc6e 100644
--- a/drivers/scsi/qla2xxx/qla_edif.h
+++ b/drivers/scsi/qla2xxx/qla_edif.h
@@ -51,7 +51,8 @@ struct edif_dbell {
enum db_flags_t db_flags;
spinlock_t db_lock;
struct list_head head;
- struct completion dbell;
+ struct bsg_job *dbell_bsg_job;
+ unsigned long bsg_expire;
};

#define SA_UPDATE_IOCB_TYPE 0x71 /* Security Association Update IOCB entry */
@@ -140,4 +141,8 @@ struct enode {
(DBELL_ACTIVE(_fcport->vha) && \
(_fcport->disc_state == DSC_LOGIN_AUTH_PEND))

+#define EDIF_SESS_DELETE(_s) \
+ (qla_ini_mode_enabled(_s->vha) && (_s->disc_state == DSC_DELETE_PEND || \
+ _s->disc_state == DSC_DELETED))
+
#endif /* __QLA_EDIF_H */
diff --git a/drivers/scsi/qla2xxx/qla_edif_bsg.h b/drivers/scsi/qla2xxx/qla_edif_bsg.h
index 5a26c77157da..0931f4e4e127 100644
--- a/drivers/scsi/qla2xxx/qla_edif_bsg.h
+++ b/drivers/scsi/qla2xxx/qla_edif_bsg.h
@@ -7,13 +7,15 @@
#ifndef __QLA_EDIF_BSG_H
#define __QLA_EDIF_BSG_H

+#define EDIF_VERSION1 1
+
/* BSG Vendor specific commands */
#define ELS_MAX_PAYLOAD 2112
#ifndef WWN_SIZE
#define WWN_SIZE 8
#endif
-#define VND_CMD_APP_RESERVED_SIZE 32
-
+#define VND_CMD_APP_RESERVED_SIZE 28
+#define VND_CMD_PAD_SIZE 3
enum auth_els_sub_cmd {
SEND_ELS = 0,
SEND_ELS_REPLY,
@@ -28,7 +30,9 @@ struct extra_auth_els {
#define BSG_CTL_FLAG_LS_ACC 1
#define BSG_CTL_FLAG_LS_RJT 2
#define BSG_CTL_FLAG_TRM 3
- uint8_t extra_rsvd[3];
+ uint8_t version;
+ uint8_t pad[2];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

struct qla_bsg_auth_els_request {
@@ -39,51 +43,46 @@ struct qla_bsg_auth_els_request {
struct qla_bsg_auth_els_reply {
struct fc_bsg_reply r;
uint32_t rx_xchg_address;
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
};

struct app_id {
int app_vid;
- uint8_t app_key[32];
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

struct app_start_reply {
uint32_t host_support_edif;
uint32_t edif_enode_active;
uint32_t edif_edb_active;
- uint32_t reserved[VND_CMD_APP_RESERVED_SIZE];
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

struct app_start {
struct app_id app_info;
- uint32_t prli_to;
- uint32_t key_shred;
uint8_t app_start_flags;
- uint8_t reserved[VND_CMD_APP_RESERVED_SIZE - 1];
+ uint8_t version;
+ uint8_t pad[2];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

struct app_stop {
struct app_id app_info;
- char buf[16];
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

struct app_plogi_reply {
uint32_t prli_status;
- uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
-} __packed;
-
-#define RECFG_TIME 1
-#define RECFG_BYTES 2
-
-struct app_rekey_cfg {
- struct app_id app_info;
- uint8_t rekey_mode;
- port_id_t d_id;
- uint8_t force;
- union {
- int64_t bytes;
- int64_t time;
- } rky_units;
-
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

@@ -91,7 +90,9 @@ struct app_pinfo_req {
struct app_id app_info;
uint8_t num_ports;
port_id_t remote_pid;
- uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

struct app_pinfo {
@@ -103,11 +104,8 @@ struct app_pinfo {
#define VND_CMD_RTYPE_INITIATOR 2
uint8_t remote_state;
uint8_t auth_state;
- uint8_t rekey_mode;
- int64_t rekey_count;
- int64_t rekey_config_value;
- int64_t rekey_consumed_value;
-
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

@@ -120,6 +118,8 @@ struct app_pinfo {

struct app_pinfo_reply {
uint8_t port_count;
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
struct app_pinfo ports[];
} __packed;
@@ -127,6 +127,8 @@ struct app_pinfo_reply {
struct app_sinfo_req {
struct app_id app_info;
uint8_t num_ports;
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

@@ -140,6 +142,9 @@ struct app_sinfo {

struct app_stats_reply {
uint8_t elem_count;
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
struct app_sinfo elem[];
} __packed;

@@ -163,9 +168,11 @@ struct qla_sa_update_frame {
uint8_t node_name[WWN_SIZE];
uint8_t port_name[WWN_SIZE];
port_id_t port_id;
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved2[VND_CMD_APP_RESERVED_SIZE];
} __packed;

-// used for edif mgmt bsg interface
#define QL_VND_SC_UNDEF 0
#define QL_VND_SC_SA_UPDATE 1
#define QL_VND_SC_APP_START 2
@@ -175,6 +182,22 @@ struct qla_sa_update_frame {
#define QL_VND_SC_REKEY_CONFIG 6
#define QL_VND_SC_GET_FCINFO 7
#define QL_VND_SC_GET_STATS 8
+#define QL_VND_SC_AEN_COMPLETE 9
+#define QL_VND_SC_READ_DBELL 10
+
+/*
+ * bsg caller to provide empty buffer for doorbell events.
+ *
+ * sg_io_v4.din_xferp = empty buffer for door bell events
+ * sg_io_v4.dout_xferp = struct edif_read_dbell *buf
+ */
+struct edif_read_dbell {
+ struct app_id app_info;
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
+};
+

/* Application interface data structure for rtn data */
#define EXT_DEF_EVENT_DATA_SIZE 64
@@ -191,7 +214,9 @@ struct edif_sa_update_aen {
port_id_t port_id;
uint32_t key_type; /* Tx (1) or RX (2) */
uint32_t status; /* 0 succes, 1 failed, 2 timeout , 3 error */
- uint8_t reserved[16];
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

#define QL_VND_SA_STAT_SUCCESS 0
@@ -212,9 +237,22 @@ struct auth_complete_cmd {
uint8_t wwpn[WWN_SIZE];
port_id_t d_id;
} u;
- uint32_t reserved[VND_CMD_APP_RESERVED_SIZE];
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
+} __packed;
+
+struct aen_complete_cmd {
+ struct app_id app_info;
+ port_id_t port_id;
+ uint32_t event_code;
+ uint8_t version;
+ uint8_t pad[VND_CMD_PAD_SIZE];
+ uint8_t reserved[VND_CMD_APP_RESERVED_SIZE];
} __packed;

#define RX_DELAY_DELETE_TIMEOUT 20

+#define FCH_EVT_VENDOR_UNIQUE_VPORT_DOWN 1
+
#endif /* QLA_EDIF_BSG_H */
diff --git a/drivers/scsi/qla2xxx/qla_fw.h b/drivers/scsi/qla2xxx/qla_fw.h
index 0bb1d562f0bf..361015b5763e 100644
--- a/drivers/scsi/qla2xxx/qla_fw.h
+++ b/drivers/scsi/qla2xxx/qla_fw.h
@@ -807,7 +807,7 @@ struct els_entry_24xx {
#define EPD_ELS_COMMAND (0 << 13)
#define EPD_ELS_ACC (1 << 13)
#define EPD_ELS_RJT (2 << 13)
-#define EPD_RX_XCHG (3 << 13)
+#define EPD_RX_XCHG (3 << 13) /* terminate exchange */
#define ECF_CLR_PASSTHRU_PEND BIT_12
#define ECF_INCL_FRAME_HDR BIT_11
#define ECF_SEC_LOGIN BIT_3
diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h
index dac27b5ff0ac..2e5b65072b75 100644
--- a/drivers/scsi/qla2xxx/qla_gbl.h
+++ b/drivers/scsi/qla2xxx/qla_gbl.h
@@ -335,6 +335,7 @@ extern int qla24xx_configure_prot_mode(srb_t *, uint16_t *);
extern int qla24xx_issue_sa_replace_iocb(scsi_qla_host_t *vha,
struct qla_work_evt *e);
void qla2x00_sp_release(struct kref *kref);
+void qla2x00_els_dcmd2_iocb_timeout(void *data);

/*
* Global Function Prototypes in qla_mbx.c source file.
@@ -433,7 +434,8 @@ extern int
qla2x00_get_resource_cnts(scsi_qla_host_t *);

extern int
-qla2x00_get_fcal_position_map(scsi_qla_host_t *ha, char *pos_map);
+qla2x00_get_fcal_position_map(scsi_qla_host_t *ha, char *pos_map,
+ u8 *num_entries);

extern int
qla2x00_get_link_status(scsi_qla_host_t *, uint16_t, struct link_statistics *,
@@ -727,7 +729,7 @@ int qla24xx_async_gpsc(scsi_qla_host_t *, fc_port_t *);
void qla24xx_handle_gpsc_event(scsi_qla_host_t *, struct event_arg *);
int qla2x00_mgmt_svr_login(scsi_qla_host_t *);
void qla24xx_handle_gffid_event(scsi_qla_host_t *vha, struct event_arg *ea);
-int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t *fcport);
+int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t *fcport, bool);
int qla24xx_async_gpnft(scsi_qla_host_t *, u8, srb_t *);
void qla24xx_async_gpnft_done(scsi_qla_host_t *, srb_t *);
void qla24xx_async_gnnft_done(scsi_qla_host_t *, srb_t *);
diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index e811de2f6a25..7ca734337000 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -1596,7 +1596,6 @@ qla2x00_hba_attributes(scsi_qla_host_t *vha, void *entries,
unsigned int callopt)
{
struct qla_hw_data *ha = vha->hw;
- struct init_cb_24xx *icb24 = (void *)ha->init_cb;
struct new_utsname *p_sysid = utsname();
struct ct_fdmi_hba_attr *eiter;
uint16_t alen;
@@ -1758,8 +1757,8 @@ qla2x00_hba_attributes(scsi_qla_host_t *vha, void *entries,
/* MAX CT Payload Length */
eiter = entries + size;
eiter->type = cpu_to_be16(FDMI_HBA_MAXIMUM_CT_PAYLOAD_LENGTH);
- eiter->a.max_ct_len = cpu_to_be32(le16_to_cpu(IS_FWI2_CAPABLE(ha) ?
- icb24->frame_payload_size : ha->init_cb->frame_payload_size));
+ eiter->a.max_ct_len = cpu_to_be32(ha->frame_payload_size >> 2);
+
alen = sizeof(eiter->a.max_ct_len);
alen += FDMI_ATTR_TYPELEN(eiter);
eiter->len = cpu_to_be16(alen);
@@ -1851,7 +1850,6 @@ qla2x00_port_attributes(scsi_qla_host_t *vha, void *entries,
unsigned int callopt)
{
struct qla_hw_data *ha = vha->hw;
- struct init_cb_24xx *icb24 = (void *)ha->init_cb;
struct new_utsname *p_sysid = utsname();
char *hostname = p_sysid ?
p_sysid->nodename : fc_host_system_hostname(vha->host);
@@ -1903,8 +1901,7 @@ qla2x00_port_attributes(scsi_qla_host_t *vha, void *entries,
/* Max frame size. */
eiter = entries + size;
eiter->type = cpu_to_be16(FDMI_PORT_MAX_FRAME_SIZE);
- eiter->a.max_frame_size = cpu_to_be32(le16_to_cpu(IS_FWI2_CAPABLE(ha) ?
- icb24->frame_payload_size : ha->init_cb->frame_payload_size));
+ eiter->a.max_frame_size = cpu_to_be32(ha->frame_payload_size);
alen = sizeof(eiter->a.max_frame_size);
alen += FDMI_ATTR_TYPELEN(eiter);
eiter->len = cpu_to_be16(alen);
@@ -3280,19 +3277,12 @@ int qla24xx_async_gpnid(scsi_qla_host_t *vha, port_id_t *id)
return rval;
}

-void qla24xx_handle_gffid_event(scsi_qla_host_t *vha, struct event_arg *ea)
-{
- fc_port_t *fcport = ea->fcport;
-
- qla24xx_post_gnl_work(vha, fcport);
-}

void qla24xx_async_gffid_sp_done(srb_t *sp, int res)
{
struct scsi_qla_host *vha = sp->vha;
fc_port_t *fcport = sp->fcport;
struct ct_sns_rsp *ct_rsp;
- struct event_arg ea;
uint8_t fc4_scsi_feat;
uint8_t fc4_nvme_feat;

@@ -3300,10 +3290,10 @@ void qla24xx_async_gffid_sp_done(srb_t *sp, int res)
"Async done-%s res %x ID %x. %8phC\n",
sp->name, res, fcport->d_id.b24, fcport->port_name);

- fcport->flags &= ~FCF_ASYNC_SENT;
- ct_rsp = &fcport->ct_desc.ct_sns->p.rsp;
+ ct_rsp = sp->u.iocb_cmd.u.ctarg.rsp;
fc4_scsi_feat = ct_rsp->rsp.gff_id.fc4_features[GFF_FCP_SCSI_OFFSET];
fc4_nvme_feat = ct_rsp->rsp.gff_id.fc4_features[GFF_NVME_OFFSET];
+ sp->rc = res;

/*
* FC-GS-7, 5.2.3.12 FC-4 Features - format
@@ -3324,24 +3314,42 @@ void qla24xx_async_gffid_sp_done(srb_t *sp, int res)
}
}

- memset(&ea, 0, sizeof(ea));
- ea.sp = sp;
- ea.fcport = sp->fcport;
- ea.rc = res;
+ if (sp->flags & SRB_WAKEUP_ON_COMP) {
+ complete(sp->comp);
+ } else {
+ if (sp->u.iocb_cmd.u.ctarg.req) {
+ dma_free_coherent(&vha->hw->pdev->dev,
+ sp->u.iocb_cmd.u.ctarg.req_allocated_size,
+ sp->u.iocb_cmd.u.ctarg.req,
+ sp->u.iocb_cmd.u.ctarg.req_dma);
+ sp->u.iocb_cmd.u.ctarg.req = NULL;
+ }

- qla24xx_handle_gffid_event(vha, &ea);
- /* ref: INIT */
- kref_put(&sp->cmd_kref, qla2x00_sp_release);
+ if (sp->u.iocb_cmd.u.ctarg.rsp) {
+ dma_free_coherent(&vha->hw->pdev->dev,
+ sp->u.iocb_cmd.u.ctarg.rsp_allocated_size,
+ sp->u.iocb_cmd.u.ctarg.rsp,
+ sp->u.iocb_cmd.u.ctarg.rsp_dma);
+ sp->u.iocb_cmd.u.ctarg.rsp = NULL;
+ }
+
+ /* ref: INIT */
+ kref_put(&sp->cmd_kref, qla2x00_sp_release);
+ /* we should not be here */
+ dump_stack();
+ }
}

/* Get FC4 Feature with Nport ID. */
-int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t *fcport)
+int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t *fcport, bool wait)
{
int rval = QLA_FUNCTION_FAILED;
struct ct_sns_req *ct_req;
srb_t *sp;
+ DECLARE_COMPLETION_ONSTACK(comp);

- if (!vha->flags.online || (fcport->flags & FCF_ASYNC_SENT))
+ /* this routine does not have handling for no wait */
+ if (!vha->flags.online || !wait)
return rval;

/* ref: INIT */
@@ -3349,43 +3357,86 @@ int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t *fcport)
if (!sp)
return rval;

- fcport->flags |= FCF_ASYNC_SENT;
sp->type = SRB_CT_PTHRU_CMD;
sp->name = "gffid";
sp->gen1 = fcport->rscn_gen;
sp->gen2 = fcport->login_gen;
qla2x00_init_async_sp(sp, qla2x00_get_async_timeout(vha) + 2,
qla24xx_async_gffid_sp_done);
+ sp->comp = &comp;
+ sp->u.iocb_cmd.timeout = qla2x00_els_dcmd2_iocb_timeout;
+
+ if (wait)
+ sp->flags = SRB_WAKEUP_ON_COMP;
+
+ sp->u.iocb_cmd.u.ctarg.req_allocated_size = sizeof(struct ct_sns_pkt);
+ sp->u.iocb_cmd.u.ctarg.req = dma_alloc_coherent(&vha->hw->pdev->dev,
+ sp->u.iocb_cmd.u.ctarg.req_allocated_size,
+ &sp->u.iocb_cmd.u.ctarg.req_dma,
+ GFP_KERNEL);
+ if (!sp->u.iocb_cmd.u.ctarg.req) {
+ ql_log(ql_log_warn, vha, 0xd041,
+ "%s: Failed to allocate ct_sns request.\n",
+ __func__);
+ goto done_free_sp;
+ }
+
+ sp->u.iocb_cmd.u.ctarg.rsp_allocated_size = sizeof(struct ct_sns_pkt);
+ sp->u.iocb_cmd.u.ctarg.rsp = dma_alloc_coherent(&vha->hw->pdev->dev,
+ sp->u.iocb_cmd.u.ctarg.rsp_allocated_size,
+ &sp->u.iocb_cmd.u.ctarg.rsp_dma,
+ GFP_KERNEL);
+ if (!sp->u.iocb_cmd.u.ctarg.rsp) {
+ ql_log(ql_log_warn, vha, 0xd041,
+ "%s: Failed to allocate ct_sns response.\n",
+ __func__);
+ goto done_free_sp;
+ }

/* CT_IU preamble */
- ct_req = qla2x00_prep_ct_req(fcport->ct_desc.ct_sns, GFF_ID_CMD,
- GFF_ID_RSP_SIZE);
+ ct_req = qla2x00_prep_ct_req(sp->u.iocb_cmd.u.ctarg.req, GFF_ID_CMD, GFF_ID_RSP_SIZE);

ct_req->req.gff_id.port_id[0] = fcport->d_id.b.domain;
ct_req->req.gff_id.port_id[1] = fcport->d_id.b.area;
ct_req->req.gff_id.port_id[2] = fcport->d_id.b.al_pa;

- sp->u.iocb_cmd.u.ctarg.req = fcport->ct_desc.ct_sns;
- sp->u.iocb_cmd.u.ctarg.req_dma = fcport->ct_desc.ct_sns_dma;
- sp->u.iocb_cmd.u.ctarg.rsp = fcport->ct_desc.ct_sns;
- sp->u.iocb_cmd.u.ctarg.rsp_dma = fcport->ct_desc.ct_sns_dma;
sp->u.iocb_cmd.u.ctarg.req_size = GFF_ID_REQ_SIZE;
sp->u.iocb_cmd.u.ctarg.rsp_size = GFF_ID_RSP_SIZE;
sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS;

- ql_dbg(ql_dbg_disc, vha, 0x2132,
- "Async-%s hdl=%x %8phC.\n", sp->name,
- sp->handle, fcport->port_name);
-
rval = qla2x00_start_sp(sp);
- if (rval != QLA_SUCCESS)
+
+ if (rval != QLA_SUCCESS) {
+ rval = QLA_FUNCTION_FAILED;
goto done_free_sp;
+ } else {
+ ql_dbg(ql_dbg_disc, vha, 0x3074,
+ "Async-%s hdl=%x portid %06x\n",
+ sp->name, sp->handle, fcport->d_id.b24);
+ }
+
+ wait_for_completion(sp->comp);
+ rval = sp->rc;

- return rval;
done_free_sp:
+ if (sp->u.iocb_cmd.u.ctarg.req) {
+ dma_free_coherent(&vha->hw->pdev->dev,
+ sp->u.iocb_cmd.u.ctarg.req_allocated_size,
+ sp->u.iocb_cmd.u.ctarg.req,
+ sp->u.iocb_cmd.u.ctarg.req_dma);
+ sp->u.iocb_cmd.u.ctarg.req = NULL;
+ }
+
+ if (sp->u.iocb_cmd.u.ctarg.rsp) {
+ dma_free_coherent(&vha->hw->pdev->dev,
+ sp->u.iocb_cmd.u.ctarg.rsp_allocated_size,
+ sp->u.iocb_cmd.u.ctarg.rsp,
+ sp->u.iocb_cmd.u.ctarg.rsp_dma);
+ sp->u.iocb_cmd.u.ctarg.rsp = NULL;
+ }
+
/* ref: INIT */
kref_put(&sp->cmd_kref, qla2x00_sp_release);
- fcport->flags &= ~FCF_ASYNC_SENT;
return rval;
}

@@ -3578,7 +3629,7 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
do_delete) {
if (fcport->loop_id != FC_NO_LOOP_ID) {
if (fcport->flags & FCF_FCP2_DEVICE)
- fcport->logout_on_delete = 0;
+ continue;

ql_log(ql_log_warn, vha, 0x20f0,
"%s %d %8phC post del sess\n",
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 3f3417a3e891..51503a316b10 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -47,6 +47,7 @@ qla2x00_sp_timeout(struct timer_list *t)
{
srb_t *sp = from_timer(sp, t, u.iocb_cmd.timer);
struct srb_iocb *iocb;
+ scsi_qla_host_t *vha = sp->vha;

WARN_ON(irqs_disabled());
iocb = &sp->u.iocb_cmd;
@@ -54,6 +55,12 @@ qla2x00_sp_timeout(struct timer_list *t)

/* ref: TMR */
kref_put(&sp->cmd_kref, qla2x00_sp_release);
+
+ if (vha && qla2x00_isp_reg_stat(vha->hw)) {
+ ql_log(ql_log_info, vha, 0x9008,
+ "PCI/Register disconnect.\n");
+ qla_pci_set_eeh_busy(vha);
+ }
}

void qla2x00_sp_free(srb_t *sp)
@@ -161,6 +168,7 @@ int qla24xx_async_abort_cmd(srb_t *cmd_sp, bool wait)
struct srb_iocb *abt_iocb;
srb_t *sp;
int rval = QLA_FUNCTION_FAILED;
+ uint8_t bail;

/* ref: INIT for ABTS command */
sp = qla2xxx_get_qpair_sp(cmd_sp->vha, cmd_sp->qpair, cmd_sp->fcport,
@@ -168,6 +176,7 @@ int qla24xx_async_abort_cmd(srb_t *cmd_sp, bool wait)
if (!sp)
return QLA_MEMORY_ALLOC_FAILED;

+ QLA_VHA_MARK_BUSY(vha, bail);
abt_iocb = &sp->u.iocb_cmd;
sp->type = SRB_ABT_CMD;
sp->name = "abort";
@@ -1480,7 +1489,6 @@ static int qla_chk_secure_login(scsi_qla_host_t *vha, fc_port_t *fcport,
ql_dbg(ql_dbg_disc, vha, 0x20ef,
"%s %d %8phC EDIF: post DB_AUTH: AUTH needed\n",
__func__, __LINE__, fcport->port_name);
- fcport->edif.app_started = 1;
fcport->edif.app_sess_online = 1;

qla_edb_eventcreate(vha, VND_CMD_AUTH_STATE_NEEDED,
@@ -1763,8 +1771,16 @@ int qla24xx_fcport_handle_login(struct scsi_qla_host *vha, fc_port_t *fcport)
break;

case DSC_LOGIN_PEND:
- if (fcport->fw_login_state == DSC_LS_PLOGI_COMP)
+ if (vha->hw->flags.edif_enabled)
+ break;
+
+ if (fcport->fw_login_state == DSC_LS_PLOGI_COMP) {
+ ql_dbg(ql_dbg_disc, vha, 0x2118,
+ "%s %d %8phC post %s PRLI\n",
+ __func__, __LINE__, fcport->port_name,
+ NVME_TARGET(vha->hw, fcport) ? "NVME" : "FC");
qla24xx_post_prli_work(vha, fcport);
+ }
break;

case DSC_UPD_FCPORT:
@@ -1818,7 +1834,8 @@ void qla2x00_handle_rscn(scsi_qla_host_t *vha, struct event_arg *ea)
case RSCN_PORT_ADDR:
fcport = qla2x00_find_fcport_by_nportid(vha, &ea->id, 1);
if (fcport) {
- if (fcport->flags & FCF_FCP2_DEVICE) {
+ if (fcport->flags & FCF_FCP2_DEVICE &&
+ atomic_read(&fcport->state) == FCS_ONLINE) {
ql_dbg(ql_dbg_disc, vha, 0x2115,
"Delaying session delete for FCP2 portid=%06x %8phC ",
fcport->d_id.b24, fcport->port_name);
@@ -1850,7 +1867,8 @@ void qla2x00_handle_rscn(scsi_qla_host_t *vha, struct event_arg *ea)
break;
case RSCN_AREA_ADDR:
list_for_each_entry(fcport, &vha->vp_fcports, list) {
- if (fcport->flags & FCF_FCP2_DEVICE)
+ if (fcport->flags & FCF_FCP2_DEVICE &&
+ atomic_read(&fcport->state) == FCS_ONLINE)
continue;

if ((ea->id.b24 & 0xffff00) == (fcport->d_id.b24 & 0xffff00)) {
@@ -1861,7 +1879,8 @@ void qla2x00_handle_rscn(scsi_qla_host_t *vha, struct event_arg *ea)
break;
case RSCN_DOM_ADDR:
list_for_each_entry(fcport, &vha->vp_fcports, list) {
- if (fcport->flags & FCF_FCP2_DEVICE)
+ if (fcport->flags & FCF_FCP2_DEVICE &&
+ atomic_read(&fcport->state) == FCS_ONLINE)
continue;

if ((ea->id.b24 & 0xff0000) == (fcport->d_id.b24 & 0xff0000)) {
@@ -1873,7 +1892,8 @@ void qla2x00_handle_rscn(scsi_qla_host_t *vha, struct event_arg *ea)
case RSCN_FAB_ADDR:
default:
list_for_each_entry(fcport, &vha->vp_fcports, list) {
- if (fcport->flags & FCF_FCP2_DEVICE)
+ if (fcport->flags & FCF_FCP2_DEVICE &&
+ atomic_read(&fcport->state) == FCS_ONLINE)
continue;

fcport->scan_needed = 1;
@@ -2000,12 +2020,14 @@ qla2x00_async_tm_cmd(fc_port_t *fcport, uint32_t flags, uint32_t lun,
struct srb_iocb *tm_iocb;
srb_t *sp;
int rval = QLA_FUNCTION_FAILED;
+ uint8_t bail;

/* ref: INIT */
sp = qla2x00_get_sp(vha, fcport, GFP_KERNEL);
if (!sp)
goto done;

+ QLA_VHA_MARK_BUSY(vha, bail);
sp->type = SRB_TM_CMD;
sp->name = "tmf";
qla2x00_init_async_sp(sp, qla2x00_get_async_timeout(vha),
@@ -2124,6 +2146,13 @@ qla24xx_handle_prli_done_event(struct scsi_qla_host *vha, struct event_arg *ea)
}

if (N2N_TOPO(vha->hw)) {
+ if (ea->fcport->n2n_link_reset_cnt ==
+ vha->hw->login_retry_count &&
+ ea->fcport->flags & FCF_FCSP_DEVICE) {
+ /* remote authentication app just started */
+ ea->fcport->n2n_link_reset_cnt = 0;
+ }
+
if (ea->fcport->n2n_link_reset_cnt <
vha->hw->login_retry_count) {
ea->fcport->n2n_link_reset_cnt++;
@@ -4509,6 +4538,8 @@ qla2x00_init_rings(scsi_qla_host_t *vha)
BIT_6) != 0;
ql_dbg(ql_dbg_init, vha, 0x00bc, "FA-WWPN Support: %s.\n",
(ha->flags.fawwpn_enabled) ? "enabled" : "disabled");
+ /* Init_cb will be reused for other command(s). Save a backup copy of port_name */
+ memcpy(ha->port_name, ha->init_cb->port_name, WWN_SIZE);
}

/* ELS pass through payload is limit by frame size. */
@@ -5273,9 +5304,6 @@ qla2x00_alloc_fcport(scsi_qla_host_t *vha, gfp_t flags)
INIT_LIST_HEAD(&fcport->edif.tx_sa_list);
INIT_LIST_HEAD(&fcport->edif.rx_sa_list);

- if (vha->e_dbell.db_flags == EDB_ACTIVE)
- fcport->edif.app_started = 1;
-
spin_lock_init(&fcport->edif.indx_list_lock);
INIT_LIST_HEAD(&fcport->edif.edif_indx_list);

@@ -5488,6 +5516,22 @@ static int qla2x00_configure_n2n_loop(scsi_qla_host_t *vha)
return QLA_FUNCTION_FAILED;
}

+static void
+qla_reinitialize_link(scsi_qla_host_t *vha)
+{
+ int rval;
+
+ atomic_set(&vha->loop_state, LOOP_DOWN);
+ atomic_set(&vha->loop_down_timer, LOOP_DOWN_TIME);
+ rval = qla2x00_full_login_lip(vha);
+ if (rval == QLA_SUCCESS) {
+ ql_dbg(ql_dbg_disc, vha, 0xd050, "Link reinitialized\n");
+ } else {
+ ql_dbg(ql_dbg_disc, vha, 0xd051,
+ "Link reinitialization failed (%d)\n", rval);
+ }
+}
+
/*
* qla2x00_configure_local_loop
* Updates Fibre Channel Device Database with local loop devices.
@@ -5539,6 +5583,19 @@ qla2x00_configure_local_loop(scsi_qla_host_t *vha)
spin_unlock_irqrestore(&vha->work_lock, flags);

if (vha->scan.scan_retry < MAX_SCAN_RETRIES) {
+ u8 loop_map_entries = 0;
+ int rc;
+
+ rc = qla2x00_get_fcal_position_map(vha, NULL,
+ &loop_map_entries);
+ if (rc == QLA_SUCCESS && loop_map_entries > 1) {
+ /*
+ * There are devices that are still not logged
+ * in. Reinitialize to give them a chance.
+ */
+ qla_reinitialize_link(vha);
+ return QLA_FUNCTION_FAILED;
+ }
set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags);
set_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags);
}
@@ -5767,8 +5824,6 @@ qla2x00_reg_remote_port(scsi_qla_host_t *vha, fc_port_t *fcport)
if (atomic_read(&fcport->state) == FCS_ONLINE)
return;

- qla2x00_set_fcport_state(fcport, FCS_ONLINE);
-
rport_ids.node_name = wwn_to_u64(fcport->node_name);
rport_ids.port_name = wwn_to_u64(fcport->port_name);
rport_ids.port_id = fcport->d_id.b.domain << 16 |
@@ -5869,7 +5924,6 @@ qla2x00_update_fcport(scsi_qla_host_t *vha, fc_port_t *fcport)
qla2x00_reg_remote_port(vha, fcport);
break;
case MODE_TARGET:
- qla2x00_set_fcport_state(fcport, FCS_ONLINE);
if (!vha->vha_tgt.qla_tgt->tgt_stop &&
!vha->vha_tgt.qla_tgt->tgt_stopped)
qlt_fc_port_added(vha, fcport);
@@ -5887,6 +5941,8 @@ qla2x00_update_fcport(scsi_qla_host_t *vha, fc_port_t *fcport)
if (NVME_TARGET(vha->hw, fcport))
qla_nvme_register_remote(vha, fcport);

+ qla2x00_set_fcport_state(fcport, FCS_ONLINE);
+
if (IS_IIDMA_CAPABLE(vha->hw) && vha->hw->flags.gpsc_supported) {
if (fcport->id_changed) {
fcport->id_changed = 0;
@@ -9657,6 +9713,12 @@ int qla2xxx_disable_port(struct Scsi_Host *host)

vha->hw->flags.port_isolated = 1;

+ if (qla2x00_isp_reg_stat(vha->hw)) {
+ ql_log(ql_log_info, vha, 0x9006,
+ "PCI/Register disconnect, exiting.\n");
+ qla_pci_set_eeh_busy(vha);
+ return FAILED;
+ }
if (qla2x00_chip_is_down(vha))
return 0;

@@ -9672,6 +9734,13 @@ int qla2xxx_enable_port(struct Scsi_Host *host)
{
scsi_qla_host_t *vha = shost_priv(host);

+ if (qla2x00_isp_reg_stat(vha->hw)) {
+ ql_log(ql_log_info, vha, 0x9001,
+ "PCI/Register disconnect, exiting.\n");
+ qla_pci_set_eeh_busy(vha);
+ return FAILED;
+ }
+
vha->hw->flags.port_isolated = 0;
/* Set the flag to 1, so that isp_abort can proceed */
vha->flags.online = 1;
diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index e0fe9ddb4bd2..42ce4e1fe744 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2819,7 +2819,7 @@ qla24xx_els_logo_iocb(srb_t *sp, struct els_entry_24xx *els_iocb)
sp->vha->qla_stats.control_requests++;
}

-static void
+void
qla2x00_els_dcmd2_iocb_timeout(void *data)
{
srb_t *sp = data;
@@ -2882,6 +2882,9 @@ static void qla2x00_els_dcmd2_sp_done(srb_t *sp, int res)
sp->name, res, sp->handle, fcport->d_id.b24, fcport->port_name);

fcport->flags &= ~(FCF_ASYNC_SENT|FCF_ASYNC_ACTIVE);
+ /* For edif, set logout on delete to ensure any residual key from FW is flushed.*/
+ fcport->logout_on_delete = 1;
+ fcport->chip_reset = vha->hw->base_qpair->chip_reset;

if (sp->flags & SRB_WAKEUP_ON_COMP)
complete(&lio->u.els_plogi.comp);
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 21b31d6359c8..de348628aa53 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1354,9 +1354,7 @@ qla2x00_async_event(scsi_qla_host_t *vha, struct rsp_que *rsp, uint16_t *mb)
if (!vha->vp_idx) {
if (ha->flags.fawwpn_enabled &&
(ha->current_topology == ISP_CFG_F)) {
- void *wwpn = ha->init_cb->port_name;
-
- memcpy(vha->port_name, wwpn, WWN_SIZE);
+ memcpy(vha->port_name, ha->port_name, WWN_SIZE);
fc_host_port_name(vha->host) =
wwn_to_u64(vha->port_name);
ql_dbg(ql_dbg_init + ql_dbg_verbose,
@@ -2639,7 +2637,7 @@ static void qla24xx_nvme_iocb_entry(scsi_qla_host_t *vha, struct req_que *req,
}

if (unlikely(logit))
- ql_log(ql_dbg_io, fcport->vha, 0x5060,
+ ql_dbg(ql_dbg_io, fcport->vha, 0x5060,
"NVME-%s ERR Handling - hdl=%x status(%x) tr_len:%x resid=%x ox_id=%x\n",
sp->name, sp->handle, comp_status,
fd->transferred_length, le32_to_cpu(sts->residual_len),
@@ -3426,6 +3424,7 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)
case CS_PORT_UNAVAILABLE:
case CS_TIMEOUT:
case CS_RESET:
+ case CS_EDIF_INV_REQ:

/*
* We are going to have the fc class block the rport
@@ -3496,7 +3495,7 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)

out:
if (logit)
- ql_log(ql_dbg_io, fcport->vha, 0x3022,
+ ql_dbg(ql_dbg_io, fcport->vha, 0x3022,
"FCP command status: 0x%x-0x%x (0x%x) nexus=%ld:%d:%llu portid=%02x%02x%02x oxid=0x%x cdb=%10phN len=0x%x rsp_info=0x%x resid=0x%x fw_resid=0x%x sp=%p cp=%p.\n",
comp_status, scsi_status, res, vha->host_no,
cp->device->id, cp->device->lun, fcport->d_id.b.domain,
@@ -4420,16 +4419,12 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
}

/* Enable MSI-X vector for response queue update for queue 0 */
- if (IS_QLA83XX(ha) || IS_QLA27XX(ha) || IS_QLA28XX(ha)) {
- if (ha->msixbase && ha->mqiobase &&
- (ha->max_rsp_queues > 1 || ha->max_req_queues > 1 ||
- ql2xmqsupport))
- ha->mqenable = 1;
- } else
- if (ha->mqiobase &&
- (ha->max_rsp_queues > 1 || ha->max_req_queues > 1 ||
- ql2xmqsupport))
- ha->mqenable = 1;
+ if (IS_MQUE_CAPABLE(ha) &&
+ (ha->msixbase && ha->mqiobase && ha->max_qpairs))
+ ha->mqenable = 1;
+ else
+ ha->mqenable = 0;
+
ql_dbg(ql_dbg_multiq, vha, 0xc005,
"mqiobase=%p, max_rsp_queues=%d, max_req_queues=%d.\n",
ha->mqiobase, ha->max_rsp_queues, ha->max_req_queues);
diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
index 892caf2475df..86d8c455c07a 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -238,6 +238,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
ql_dbg(ql_dbg_mbx, vha, 0x1112,
"mbox[%d]<-0x%04x\n", cnt, *iptr);
wrt_reg_word(optr, *iptr);
+ } else {
+ wrt_reg_word(optr, 0);
}

mboxes >>= 1;
@@ -274,6 +276,12 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
atomic_inc(&ha->num_pend_mbx_stage3);
if (!wait_for_completion_timeout(&ha->mbx_intr_comp,
mcp->tov * HZ)) {
+ ql_dbg(ql_dbg_mbx, vha, 0x117a,
+ "cmd=%x Timeout.\n", command);
+ spin_lock_irqsave(&ha->hardware_lock, flags);
+ clear_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags);
+ spin_unlock_irqrestore(&ha->hardware_lock, flags);
+
if (chip_reset != ha->chip_reset) {
eeh_delay = ha->flags.eeh_busy ? 1 : 0;

@@ -286,12 +294,6 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
rval = QLA_ABORTED;
goto premature_exit;
}
- ql_dbg(ql_dbg_mbx, vha, 0x117a,
- "cmd=%x Timeout.\n", command);
- spin_lock_irqsave(&ha->hardware_lock, flags);
- clear_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags);
- spin_unlock_irqrestore(&ha->hardware_lock, flags);
-
} else if (ha->flags.purge_mbox ||
chip_reset != ha->chip_reset) {
eeh_delay = ha->flags.eeh_busy ? 1 : 0;
@@ -3066,7 +3068,8 @@ qla2x00_get_resource_cnts(scsi_qla_host_t *vha)
* Kernel context.
*/
int
-qla2x00_get_fcal_position_map(scsi_qla_host_t *vha, char *pos_map)
+qla2x00_get_fcal_position_map(scsi_qla_host_t *vha, char *pos_map,
+ u8 *num_entries)
{
int rval;
mbx_cmd_t mc;
@@ -3106,6 +3109,8 @@ qla2x00_get_fcal_position_map(scsi_qla_host_t *vha, char *pos_map)

if (pos_map)
memcpy(pos_map, pmap, FCAL_MAP_SIZE);
+ if (num_entries)
+ *num_entries = pmap[0];
}
dma_pool_free(ha->s_dma_pool, pmap, pmap_dma);

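The qla_mbx.c hunk above moves the timeout log and the MBX_INTR_WAIT clear ahead of the chip-reset and purge checks, so the flag is always dropped once wait_for_completion_timeout() expires, whichever error path follows. A hedged sketch of that ordering, using only generic completion and bit APIs; ex_hw, EX_MBX_INTR_WAIT and the error mapping are made up for illustration, not the driver's real code:

#include <linux/completion.h>
#include <linux/spinlock.h>
#include <linux/jiffies.h>
#include <linux/bitops.h>
#include <linux/errno.h>

#define EX_MBX_INTR_WAIT 0

struct ex_hw {				/* hypothetical, trimmed-down adapter state */
	struct completion intr_comp;
	spinlock_t hardware_lock;
	unsigned long mbx_flags;
	unsigned int chip_reset;	/* bumped by every chip reset */
};

static int ex_wait_mbx(struct ex_hw *hw, unsigned int tov, unsigned int chip_reset)
{
	unsigned long flags;

	if (!wait_for_completion_timeout(&hw->intr_comp, tov * HZ)) {
		/*
		 * Clear the "the ISR will complete us" flag first, under the
		 * same lock the interrupt handler takes, before deciding
		 * which error path applies.
		 */
		spin_lock_irqsave(&hw->hardware_lock, flags);
		clear_bit(EX_MBX_INTR_WAIT, &hw->mbx_flags);
		spin_unlock_irqrestore(&hw->hardware_lock, flags);

		if (chip_reset != hw->chip_reset)
			return -EAGAIN;		/* command was killed by a reset */
		return -ETIMEDOUT;		/* plain firmware timeout */
	}
	return 0;
}
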
diff --git a/drivers/scsi/qla2xxx/qla_mid.c b/drivers/scsi/qla2xxx/qla_mid.c
index e6b5c4ccce97..eb43a5f1b399 100644
--- a/drivers/scsi/qla2xxx/qla_mid.c
+++ b/drivers/scsi/qla2xxx/qla_mid.c
@@ -166,9 +166,13 @@ qla24xx_disable_vp(scsi_qla_host_t *vha)
int ret = QLA_SUCCESS;
fc_port_t *fcport;

- if (vha->hw->flags.edif_enabled)
+ if (vha->hw->flags.edif_enabled) {
+ if (DBELL_ACTIVE(vha))
+ qla2x00_post_aen_work(vha, FCH_EVT_VENDOR_UNIQUE,
+ FCH_EVT_VENDOR_UNIQUE_VPORT_DOWN);
/* delete sessions and flush sa_indexes */
qla2x00_wait_for_sess_deletion(vha);
+ }

if (vha->hw->flags.fw_started)
ret = qla24xx_control_vp(vha, VCE_COMMAND_DISABLE_VPS_LOGO_ALL);
diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
index 87c9404aa401..7450c3458be7 100644
--- a/drivers/scsi/qla2xxx/qla_nvme.c
+++ b/drivers/scsi/qla2xxx/qla_nvme.c
@@ -37,11 +37,6 @@ int qla_nvme_register_remote(struct scsi_qla_host *vha, struct fc_port *fcport)
(fcport->nvme_flag & NVME_FLAG_REGISTERED))
return 0;

- if (atomic_read(&fcport->state) == FCS_ONLINE)
- return 0;
-
- qla2x00_set_fcport_state(fcport, FCS_ONLINE);
-
fcport->nvme_flag &= ~NVME_FLAG_RESETTING;

memset(&req, 0, sizeof(struct nvme_fc_port_info));
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 762229d495a8..f9ad0847782d 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -333,6 +333,11 @@ MODULE_PARM_DESC(ql2xabts_wait_nvme,
"To wait for ABTS response on I/O timeouts for NVMe. (default: 1)");


+u32 ql2xdelay_before_pci_error_handling = 5;
+module_param(ql2xdelay_before_pci_error_handling, uint, 0644);
+MODULE_PARM_DESC(ql2xdelay_before_pci_error_handling,
+ "Number of seconds delayed before qla begin PCI error self-handling (default: 5).\n");
+
static void qla2x00_clear_drv_active(struct qla_hw_data *);
static void qla2x00_free_device(scsi_qla_host_t *);
static int qla2xxx_map_queues(struct Scsi_Host *shost);
@@ -1337,21 +1342,20 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
/*
* Returns: QLA_SUCCESS or QLA_FUNCTION_FAILED.
*/
-int
-qla2x00_eh_wait_for_pending_commands(scsi_qla_host_t *vha, unsigned int t,
- uint64_t l, enum nexus_wait_type type)
+static int
+__qla2x00_eh_wait_for_pending_commands(struct qla_qpair *qpair, unsigned int t,
+ uint64_t l, enum nexus_wait_type type)
{
int cnt, match, status;
unsigned long flags;
- struct qla_hw_data *ha = vha->hw;
- struct req_que *req;
+ scsi_qla_host_t *vha = qpair->vha;
+ struct req_que *req = qpair->req;
srb_t *sp;
struct scsi_cmnd *cmd;

status = QLA_SUCCESS;

- spin_lock_irqsave(&ha->hardware_lock, flags);
- req = vha->req;
+ spin_lock_irqsave(qpair->qp_lock_ptr, flags);
for (cnt = 1; status == QLA_SUCCESS &&
cnt < req->num_outstanding_cmds; cnt++) {
sp = req->outstanding_cmds[cnt];
@@ -1378,15 +1382,35 @@ qla2x00_eh_wait_for_pending_commands(scsi_qla_host_t *vha, unsigned int t,
if (!match)
continue;

- spin_unlock_irqrestore(&ha->hardware_lock, flags);
+ spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
status = qla2x00_eh_wait_on_command(cmd);
- spin_lock_irqsave(&ha->hardware_lock, flags);
+ spin_lock_irqsave(qpair->qp_lock_ptr, flags);
}
- spin_unlock_irqrestore(&ha->hardware_lock, flags);
+ spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);

return status;
}

+int
+qla2x00_eh_wait_for_pending_commands(scsi_qla_host_t *vha, unsigned int t,
+ uint64_t l, enum nexus_wait_type type)
+{
+ struct qla_qpair *qpair;
+ struct qla_hw_data *ha = vha->hw;
+ int i, status = QLA_SUCCESS;
+
+ status = __qla2x00_eh_wait_for_pending_commands(ha->base_qpair, t, l,
+ type);
+ for (i = 0; status == QLA_SUCCESS && i < ha->max_qpairs; i++) {
+ qpair = ha->queue_pair_map[i];
+ if (!qpair)
+ continue;
+ status = __qla2x00_eh_wait_for_pending_commands(qpair, t, l,
+ type);
+ }
+ return status;
+}
+
static char *reset_errors[] = {
"HBA not online",
"HBA not ready",
@@ -1420,7 +1444,7 @@ qla2xxx_eh_device_reset(struct scsi_cmnd *cmd)
return err;

if (fcport->deleted)
- return SUCCESS;
+ return FAILED;

ql_log(ql_log_info, vha, 0x8009,
"DEVICE RESET ISSUED nexus=%ld:%d:%llu cmd=%p.\n", vha->host_no,
@@ -1488,7 +1512,7 @@ qla2xxx_eh_target_reset(struct scsi_cmnd *cmd)
return err;

if (fcport->deleted)
- return SUCCESS;
+ return FAILED;

ql_log(ql_log_info, vha, 0x8009,
"TARGET RESET ISSUED nexus=%ld:%d cmd=%p.\n", vha->host_no,
@@ -5473,7 +5497,7 @@ qla2x00_do_work(struct scsi_qla_host *vha)
e->u.fcport.fcport, false);
break;
case QLA_EVT_SA_REPLACE:
- qla24xx_issue_sa_replace_iocb(vha, e);
+ rc = qla24xx_issue_sa_replace_iocb(vha, e);
break;
}

@@ -7239,6 +7263,44 @@ static void qla_heart_beat(struct scsi_qla_host *vha, u16 dpc_started)
}
}

+static void qla_wind_down_chip(scsi_qla_host_t *vha)
+{
+ struct qla_hw_data *ha = vha->hw;
+
+ if (!ha->flags.eeh_busy)
+ return;
+ if (ha->pci_error_state)
+ /* system is trying to recover */
+ return;
+
+ /*
+ * Current system is not handling PCIE error. At this point, this is
+ * best effort to wind down the adapter.
+ */
+ if (time_after_eq(jiffies, ha->eeh_jif + ql2xdelay_before_pci_error_handling * HZ) &&
+ !ha->flags.eeh_flush) {
+ ql_log(ql_log_info, vha, 0x9009,
+ "PCI Error detected, attempting to reset hardware.\n");
+
+ ha->isp_ops->reset_chip(vha);
+ ha->isp_ops->disable_intrs(ha);
+
+ ha->flags.eeh_flush = EEH_FLUSH_RDY;
+ ha->eeh_jif = jiffies;
+
+ } else if (ha->flags.eeh_flush == EEH_FLUSH_RDY &&
+ time_after_eq(jiffies, ha->eeh_jif + 5 * HZ)) {
+ pci_clear_master(ha->pdev);
+
+ /* flush all command */
+ qla2x00_abort_isp_cleanup(vha);
+ ha->flags.eeh_flush = EEH_FLUSH_DONE;
+
+ ql_log(ql_log_info, vha, 0x900a,
+ "PCI Error handling complete, all IOs aborted.\n");
+ }
+}
+
/**************************************************************************
* qla2x00_timer
*
@@ -7262,6 +7324,8 @@ qla2x00_timer(struct timer_list *t)
fc_port_t *fcport = NULL;

if (ha->flags.eeh_busy) {
+ qla_wind_down_chip(vha);
+
ql_dbg(ql_dbg_timer, vha, 0x6000,
"EEH = %d, restarting timer.\n",
ha->flags.eeh_busy);
@@ -7842,6 +7906,9 @@ void qla_pci_set_eeh_busy(struct scsi_qla_host *vha)

spin_lock_irqsave(&base_vha->work_lock, flags);
if (!ha->flags.eeh_busy) {
+ ha->eeh_jif = jiffies;
+ ha->flags.eeh_flush = 0;
+
ha->flags.eeh_busy = 1;
do_cleanup = true;
}
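
qla_wind_down_chip() added above is a small two-stage state machine driven from the driver timer: after ql2xdelay_before_pci_error_handling seconds with no PCI error handler running it resets the chip and disables interrupts, and a few seconds later it flushes all outstanding I/O. A hedged sketch of the same jiffies-based staging; ex_adapter and the ex_* helpers are made up and declared only for illustration:

#include <linux/jiffies.h>

enum ex_flush_state { EX_FLUSH_NONE, EX_FLUSH_RDY, EX_FLUSH_DONE };

struct ex_adapter {			/* hypothetical, trimmed-down state */
	unsigned long err_jiffies;	/* when the PCI error was first noticed */
	enum ex_flush_state flush;
};

void ex_reset_chip(struct ex_adapter *a);	/* hypothetical helpers, declared only */
void ex_disable_intrs(struct ex_adapter *a);
void ex_abort_all_io(struct ex_adapter *a);

static unsigned int ex_delay_secs = 5;	/* plays the role of the new module parameter */

/* Called from the periodic driver timer while the error flag is set. */
static void ex_wind_down(struct ex_adapter *a)
{
	if (a->flush == EX_FLUSH_NONE &&
	    time_after_eq(jiffies, a->err_jiffies + ex_delay_secs * HZ)) {
		/* Stage 1: quiesce the hardware. */
		ex_reset_chip(a);
		ex_disable_intrs(a);
		a->flush = EX_FLUSH_RDY;
		a->err_jiffies = jiffies;	/* restart the clock for stage 2 */
	} else if (a->flush == EX_FLUSH_RDY &&
		   time_after_eq(jiffies, a->err_jiffies + 5 * HZ)) {
		/* Stage 2: fail all outstanding I/O back to the midlayer. */
		ex_abort_all_io(a);
		a->flush = EX_FLUSH_DONE;
	}
}
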
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 6dfcfd8e7337..34b85c80233f 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -988,22 +988,6 @@ void qlt_free_session_done(struct work_struct *work)
sess->send_els_logo);

if (!IS_SW_RESV_ADDR(sess->d_id)) {
- if (ha->flags.edif_enabled &&
- (!own || own->iocb.u.isp24.status_subcode == ELS_PLOGI)) {
- sess->edif.authok = 0;
- if (!ha->flags.host_shutting_down) {
- ql_dbg(ql_dbg_edif, vha, 0x911e,
- "%s wwpn %8phC calling qla2x00_release_all_sadb\n",
- __func__, sess->port_name);
- qla2x00_release_all_sadb(vha, sess);
- } else {
- ql_dbg(ql_dbg_edif, vha, 0x911e,
- "%s bypassing release_all_sadb\n",
- __func__);
- }
- qla_edif_clear_appdata(vha, sess);
- qla_edif_sess_down(vha, sess);
- }
qla2x00_mark_device_lost(vha, sess, 0);

if (sess->send_els_logo) {
@@ -1049,6 +1033,25 @@ void qlt_free_session_done(struct work_struct *work)
sess->nvme_flag |= NVME_FLAG_DELETING;
qla_nvme_unregister_remote_port(sess);
}
+
+ if (ha->flags.edif_enabled &&
+ (!own || (own &&
+ own->iocb.u.isp24.status_subcode == ELS_PLOGI))) {
+ sess->edif.authok = 0;
+ if (!ha->flags.host_shutting_down) {
+ ql_dbg(ql_dbg_edif, vha, 0x911e,
+ "%s wwpn %8phC calling qla2x00_release_all_sadb\n",
+ __func__, sess->port_name);
+ qla2x00_release_all_sadb(vha, sess);
+ } else {
+ ql_dbg(ql_dbg_edif, vha, 0x911e,
+ "%s bypassing release_all_sadb\n",
+ __func__);
+ }
+
+ qla_edif_clear_appdata(vha, sess);
+ qla_edif_sess_down(vha, sess);
+ }
}

/*
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 5d21f07456c6..2a38cd2d24ef 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2264,16 +2264,8 @@ static void iscsi_if_disconnect_bound_ep(struct iscsi_cls_conn *conn,
}
}

-static int iscsi_if_stop_conn(struct iscsi_transport *transport,
- struct iscsi_uevent *ev)
+static int iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
{
- int flag = ev->u.stop_conn.flag;
- struct iscsi_cls_conn *conn;
-
- conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid);
- if (!conn)
- return -EINVAL;
-
ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop.\n");
/*
* If this is a termination we have to call stop_conn with that flag
@@ -2349,6 +2341,55 @@ static void iscsi_cleanup_conn_work_fn(struct work_struct *work)
ISCSI_DBG_TRANS_CONN(conn, "cleanup done.\n");
}

+static int iscsi_iter_force_destroy_conn_fn(struct device *dev, void *data)
+{
+ struct iscsi_transport *transport;
+ struct iscsi_cls_conn *conn;
+
+ if (!iscsi_is_conn_dev(dev))
+ return 0;
+
+ conn = iscsi_dev_to_conn(dev);
+ transport = conn->transport;
+
+ if (READ_ONCE(conn->state) != ISCSI_CONN_DOWN)
+ iscsi_if_stop_conn(conn, STOP_CONN_TERM);
+
+ transport->destroy_conn(conn);
+ return 0;
+}
+
+/**
+ * iscsi_force_destroy_session - destroy a session from the kernel
+ * @session: session to destroy
+ *
+ * Force the destruction of a session from the kernel. This should only be
+ * used when userspace is no longer running during system shutdown.
+ */
+void iscsi_force_destroy_session(struct iscsi_cls_session *session)
+{
+ struct iscsi_transport *transport = session->transport;
+ unsigned long flags;
+
+ WARN_ON_ONCE(system_state == SYSTEM_RUNNING);
+
+ spin_lock_irqsave(&sesslock, flags);
+ if (list_empty(&session->sess_list)) {
+ spin_unlock_irqrestore(&sesslock, flags);
+ /*
+ * Conn/ep is already freed. Session is being torn down via
+ * async path. For shutdown we don't care about it so return.
+ */
+ return;
+ }
+ spin_unlock_irqrestore(&sesslock, flags);
+
+ device_for_each_child(&session->dev, NULL,
+ iscsi_iter_force_destroy_conn_fn);
+ transport->destroy_session(session);
+}
+EXPORT_SYMBOL_GPL(iscsi_force_destroy_session);
+
void iscsi_free_session(struct iscsi_cls_session *session)
{
ISCSI_DBG_TRANS_SESSION(session, "Freeing session\n");
@@ -3720,7 +3761,12 @@ static int iscsi_if_transport_conn(struct iscsi_transport *transport,
case ISCSI_UEVENT_DESTROY_CONN:
return iscsi_if_destroy_conn(transport, ev);
case ISCSI_UEVENT_STOP_CONN:
- return iscsi_if_stop_conn(transport, ev);
+ conn = iscsi_conn_lookup(ev->u.stop_conn.sid,
+ ev->u.stop_conn.cid);
+ if (!conn)
+ return -EINVAL;
+
+ return iscsi_if_stop_conn(conn, ev->u.stop_conn.flag);
}

/*
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index cbffa712b9f3..508cb855386d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -195,7 +195,7 @@ static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
static Sg_fd *sg_add_sfp(Sg_device * sdp);
static void sg_remove_sfp(struct kref *);
-static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
+static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id, bool *busy);
static Sg_request *sg_add_request(Sg_fd * sfp);
static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
static Sg_device *sg_get_dev(int dev);
@@ -444,6 +444,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos)
Sg_fd *sfp;
Sg_request *srp;
int req_pack_id = -1;
+ bool busy;
sg_io_hdr_t *hp;
struct sg_header *old_hdr;
int retval;
@@ -466,20 +467,16 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos)
if (retval)
return retval;

- srp = sg_get_rq_mark(sfp, req_pack_id);
+ srp = sg_get_rq_mark(sfp, req_pack_id, &busy);
if (!srp) { /* now wait on packet to arrive */
- if (atomic_read(&sdp->detaching))
- return -ENODEV;
if (filp->f_flags & O_NONBLOCK)
return -EAGAIN;
retval = wait_event_interruptible(sfp->read_wait,
- (atomic_read(&sdp->detaching) ||
- (srp = sg_get_rq_mark(sfp, req_pack_id))));
- if (atomic_read(&sdp->detaching))
- return -ENODEV;
- if (retval)
- /* -ERESTARTSYS as signal hit process */
- return retval;
+ ((srp = sg_get_rq_mark(sfp, req_pack_id, &busy)) ||
+ (!busy && atomic_read(&sdp->detaching))));
+ if (!srp)
+ /* signal or detaching */
+ return retval ? retval : -ENODEV;
}
if (srp->header.interface_id != '\0')
return sg_new_read(sfp, buf, count, srp);
@@ -939,9 +936,7 @@ sg_ioctl_common(struct file *filp, Sg_device *sdp, Sg_fd *sfp,
if (result < 0)
return result;
result = wait_event_interruptible(sfp->read_wait,
- (srp_done(sfp, srp) || atomic_read(&sdp->detaching)));
- if (atomic_read(&sdp->detaching))
- return -ENODEV;
+ srp_done(sfp, srp));
write_lock_irq(&sfp->rq_list_lock);
if (srp->done) {
srp->done = 2;
@@ -2078,19 +2073,28 @@ sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp)
}

static Sg_request *
-sg_get_rq_mark(Sg_fd * sfp, int pack_id)
+sg_get_rq_mark(Sg_fd * sfp, int pack_id, bool *busy)
{
Sg_request *resp;
unsigned long iflags;

+ *busy = false;
write_lock_irqsave(&sfp->rq_list_lock, iflags);
list_for_each_entry(resp, &sfp->rq_list, entry) {
- /* look for requests that are ready + not SG_IO owned */
- if ((1 == resp->done) && (!resp->sg_io_owned) &&
+ /* look for requests that are not SG_IO owned */
+ if ((!resp->sg_io_owned) &&
((-1 == pack_id) || (resp->header.pack_id == pack_id))) {
- resp->done = 2; /* guard against other readers */
- write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
- return resp;
+ switch (resp->done) {
+ case 0: /* request active */
+ *busy = true;
+ break;
+ case 1: /* request done; response ready to return */
+ resp->done = 2; /* guard against other readers */
+ write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+ return resp;
+ case 2: /* response already being returned */
+ break;
+ }
}
}
write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
@@ -2144,6 +2148,15 @@ sg_remove_request(Sg_fd * sfp, Sg_request * srp)
res = 1;
}
write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+
+ /*
+ * If the device is detaching, wakeup any readers in case we just
+ * removed the last response, which would leave nothing for them to
+ * return other than -ENODEV.
+ */
+ if (unlikely(atomic_read(&sfp->parentdp->detaching)))
+ wake_up_interruptible_all(&sfp->read_wait);
+
return res;
}

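The sg change above folds the "is anything still in flight" test into the wait condition itself, so a reader only gets -ENODEV when the device is detaching and no response can still arrive, and sg_remove_request() wakes readers when it drops the last request of a detaching device. A hedged sketch of evaluating such a compound condition with wait_event_interruptible(); ex_fd, ex_req and ex_find_ready_req() are made up for illustration:

#include <linux/wait.h>
#include <linux/atomic.h>
#include <linux/types.h>
#include <linux/errno.h>

struct ex_req;				/* opaque, made up for the sketch */

struct ex_fd {				/* hypothetical, trimmed-down per-fd state */
	wait_queue_head_t read_wait;
	atomic_t detaching;
};

/* Takes its own lock; returns a ready request and reports via *busy
 * whether any request is still being processed. Made up for the sketch. */
struct ex_req *ex_find_ready_req(struct ex_fd *fd, bool *busy);

static int ex_read_wait(struct ex_fd *fd, struct ex_req **out)
{
	struct ex_req *req;
	bool busy;
	int ret;

	/*
	 * The condition both grabs a ready request and reports in-flight
	 * state, so -ENODEV is only returned once the device is detaching
	 * and no response can still arrive.
	 */
	ret = wait_event_interruptible(fd->read_wait,
		((req = ex_find_ready_req(fd, &busy)) ||
		 (!busy && atomic_read(&fd->detaching))));
	if (!req)
		return ret ? ret : -ENODEV;	/* signal or detaching */

	*out = req;
	return 0;
}
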
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 7c0d069a3158..e1fc6f5b9612 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -5484,10 +5484,10 @@ static int pqi_raid_submit_scsi_cmd_with_io_request(
}

switch (scmd->sc_data_direction) {
- case DMA_TO_DEVICE:
+ case DMA_FROM_DEVICE:
request->data_direction = SOP_READ_FLAG;
break;
- case DMA_FROM_DEVICE:
+ case DMA_TO_DEVICE:
request->data_direction = SOP_WRITE_FLAG;
break;
case DMA_NONE:
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 874490f7f5e7..9e6f45aea20e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -9500,12 +9500,8 @@ EXPORT_SYMBOL(ufshcd_runtime_resume);
int ufshcd_shutdown(struct ufs_hba *hba)
{
if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba))
- goto out;
-
- pm_runtime_get_sync(hba->dev);
+ ufshcd_suspend(hba);

- ufshcd_suspend(hba);
-out:
hba->is_powered = false;
/* allow force shutdown even in case of errors */
return 0;
diff --git a/drivers/soc/amlogic/meson-mx-socinfo.c b/drivers/soc/amlogic/meson-mx-socinfo.c
index 78f0f1aeca57..92125dd65f33 100644
--- a/drivers/soc/amlogic/meson-mx-socinfo.c
+++ b/drivers/soc/amlogic/meson-mx-socinfo.c
@@ -126,6 +126,7 @@ static int __init meson_mx_socinfo_init(void)
np = of_find_matching_node(NULL, meson_mx_socinfo_analog_top_ids);
if (np) {
analog_top_regmap = syscon_node_to_regmap(np);
+ of_node_put(np);
if (IS_ERR(analog_top_regmap))
return PTR_ERR(analog_top_regmap);

diff --git a/drivers/soc/amlogic/meson-secure-pwrc.c b/drivers/soc/amlogic/meson-secure-pwrc.c
index a10a417a87db..e93518763526 100644
--- a/drivers/soc/amlogic/meson-secure-pwrc.c
+++ b/drivers/soc/amlogic/meson-secure-pwrc.c
@@ -152,8 +152,10 @@ static int meson_secure_pwrc_probe(struct platform_device *pdev)
}

pwrc = devm_kzalloc(&pdev->dev, sizeof(*pwrc), GFP_KERNEL);
- if (!pwrc)
+ if (!pwrc) {
+ of_node_put(sm_np);
return -ENOMEM;
+ }

pwrc->fw = meson_sm_get(sm_np);
of_node_put(sm_np);
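
The meson-mx-socinfo and meson-secure-pwrc hunks above (and the qcom ocmem and qcom_aoss ones further down) are the same class of fix: a reference obtained from the OF API was leaked on an early-exit path. A hedged minimal sketch of the balance rule for of_parse_phandle(); the "ex-sram" property name is made up for illustration:

#include <linux/device.h>
#include <linux/of.h>
#include <linux/of_platform.h>
#include <linux/platform_device.h>
#include <linux/errno.h>

/*
 * Every successful of_parse_phandle() must be matched by exactly one
 * of_node_put(), on error paths as well as on success.
 */
static int ex_lookup_sram(struct device *dev)
{
	struct device_node *np;
	struct platform_device *pdev;

	np = of_parse_phandle(dev->of_node, "ex-sram", 0);
	if (!np)
		return -ENODEV;

	pdev = of_find_device_by_node(np);
	of_node_put(np);		/* done with the node on both paths */
	if (!pdev)
		return -EPROBE_DEFER;

	platform_device_put(pdev);	/* drop the device reference taken above */
	return 0;
}
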
diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
index 5ed2fc1c53a0..be18d46c7b0f 100644
--- a/drivers/soc/fsl/guts.c
+++ b/drivers/soc/fsl/guts.c
@@ -140,7 +140,7 @@ static int fsl_guts_probe(struct platform_device *pdev)
struct device_node *root, *np = pdev->dev.of_node;
struct device *dev = &pdev->dev;
const struct fsl_soc_die_attr *soc_die;
- const char *machine;
+ const char *machine = NULL;
u32 svr;

/* Initialize guts */
diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index e718b8735444..4472fb22ba04 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -129,6 +129,7 @@ config QCOM_RPMHPD

config QCOM_RPMPD
tristate "Qualcomm RPM Power domain driver"
+ depends on PM
depends on QCOM_SMD_RPM
help
QCOM RPM Power domain driver to support power-domains with
diff --git a/drivers/soc/qcom/ocmem.c b/drivers/soc/qcom/ocmem.c
index 97fd24c178f8..c92d26b73e6f 100644
--- a/drivers/soc/qcom/ocmem.c
+++ b/drivers/soc/qcom/ocmem.c
@@ -194,14 +194,17 @@ struct ocmem *of_get_ocmem(struct device *dev)
devnode = of_parse_phandle(dev->of_node, "sram", 0);
if (!devnode || !devnode->parent) {
dev_err(dev, "Cannot look up sram phandle\n");
+ of_node_put(devnode);
return ERR_PTR(-ENODEV);
}

pdev = of_find_device_by_node(devnode->parent);
if (!pdev) {
dev_err(dev, "Cannot find device node %s\n", devnode->name);
+ of_node_put(devnode);
return ERR_PTR(-EPROBE_DEFER);
}
+ of_node_put(devnode);

ocmem = platform_get_drvdata(pdev);
if (!ocmem) {
diff --git a/drivers/soc/qcom/qcom_aoss.c b/drivers/soc/qcom/qcom_aoss.c
index a59bb34e5eba..18c856056475 100644
--- a/drivers/soc/qcom/qcom_aoss.c
+++ b/drivers/soc/qcom/qcom_aoss.c
@@ -399,8 +399,10 @@ static int qmp_cooling_devices_register(struct qmp *qmp)
continue;
ret = qmp_cooling_device_add(qmp, &qmp->cooling_devs[count++],
child);
- if (ret)
+ if (ret) {
+ of_node_put(child);
goto unroll;
+ }
}

if (!count)
diff --git a/drivers/soc/qcom/socinfo.c b/drivers/soc/qcom/socinfo.c
index 8b38d134720a..f6879983da69 100644
--- a/drivers/soc/qcom/socinfo.c
+++ b/drivers/soc/qcom/socinfo.c
@@ -328,7 +328,8 @@ static const struct soc_id soc_id[] = {
{ 455, "QRB5165" },
{ 457, "SM8450" },
{ 459, "SM7225" },
- { 460, "SA8540P" },
+ { 460, "SA8295P" },
+ { 461, "SA8540P" },
{ 480, "SM8450" },
};

diff --git a/drivers/soc/renesas/r8a779a0-sysc.c b/drivers/soc/renesas/r8a779a0-sysc.c
index fdfc857df334..04f1bc322ae7 100644
--- a/drivers/soc/renesas/r8a779a0-sysc.c
+++ b/drivers/soc/renesas/r8a779a0-sysc.c
@@ -57,11 +57,11 @@ static struct rcar_gen4_sysc_area r8a779a0_areas[] __initdata = {
{ "a2cv6", R8A779A0_PD_A2CV6, R8A779A0_PD_A3IR },
{ "a2cn2", R8A779A0_PD_A2CN2, R8A779A0_PD_A3IR },
{ "a2imp23", R8A779A0_PD_A2IMP23, R8A779A0_PD_A3IR },
- { "a2dp1", R8A779A0_PD_A2DP0, R8A779A0_PD_A3IR },
- { "a2cv2", R8A779A0_PD_A2CV0, R8A779A0_PD_A3IR },
- { "a2cv3", R8A779A0_PD_A2CV1, R8A779A0_PD_A3IR },
- { "a2cv5", R8A779A0_PD_A2CV4, R8A779A0_PD_A3IR },
- { "a2cv7", R8A779A0_PD_A2CV6, R8A779A0_PD_A3IR },
+ { "a2dp1", R8A779A0_PD_A2DP1, R8A779A0_PD_A3IR },
+ { "a2cv2", R8A779A0_PD_A2CV2, R8A779A0_PD_A3IR },
+ { "a2cv3", R8A779A0_PD_A2CV3, R8A779A0_PD_A3IR },
+ { "a2cv5", R8A779A0_PD_A2CV5, R8A779A0_PD_A3IR },
+ { "a2cv7", R8A779A0_PD_A2CV7, R8A779A0_PD_A3IR },
{ "a2cn1", R8A779A0_PD_A2CN1, R8A779A0_PD_A3IR },
{ "a1cnn0", R8A779A0_PD_A1CNN0, R8A779A0_PD_A2CN0 },
{ "a1cnn2", R8A779A0_PD_A1CNN2, R8A779A0_PD_A2CN2 },
diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 354d3f89366f..f1cf78c6d477 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -7,6 +7,7 @@
#include <linux/pm_runtime.h>
#include <linux/soundwire/sdw_registers.h>
#include <linux/soundwire/sdw.h>
+#include <linux/soundwire/sdw_type.h>
#include "bus.h"
#include "sysfs_local.h"

@@ -846,15 +847,21 @@ static int sdw_slave_clk_stop_callback(struct sdw_slave *slave,
enum sdw_clk_stop_mode mode,
enum sdw_clk_stop_type type)
{
- int ret;
+ int ret = 0;

- if (slave->ops && slave->ops->clk_stop) {
- ret = slave->ops->clk_stop(slave, mode, type);
- if (ret < 0)
- return ret;
+ mutex_lock(&slave->sdw_dev_lock);
+
+ if (slave->probed) {
+ struct device *dev = &slave->dev;
+ struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
+
+ if (drv->ops && drv->ops->clk_stop)
+ ret = drv->ops->clk_stop(slave, mode, type);
}

- return 0;
+ mutex_unlock(&slave->sdw_dev_lock);
+
+ return ret;
}

static int sdw_slave_clk_stop_prepare(struct sdw_slave *slave,
@@ -1616,14 +1623,24 @@ static int sdw_handle_slave_alerts(struct sdw_slave *slave)
}

/* Update the Slave driver */
- if (slave_notify && slave->ops &&
- slave->ops->interrupt_callback) {
- slave_intr.sdca_cascade = sdca_cascade;
- slave_intr.control_port = clear;
- memcpy(slave_intr.port, &port_status,
- sizeof(slave_intr.port));
-
- slave->ops->interrupt_callback(slave, &slave_intr);
+ if (slave_notify) {
+ mutex_lock(&slave->sdw_dev_lock);
+
+ if (slave->probed) {
+ struct device *dev = &slave->dev;
+ struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
+
+ if (drv->ops && drv->ops->interrupt_callback) {
+ slave_intr.sdca_cascade = sdca_cascade;
+ slave_intr.control_port = clear;
+ memcpy(slave_intr.port, &port_status,
+ sizeof(slave_intr.port));
+
+ drv->ops->interrupt_callback(slave, &slave_intr);
+ }
+ }
+
+ mutex_unlock(&slave->sdw_dev_lock);
}

/* Ack interrupt */
@@ -1697,29 +1714,21 @@ static int sdw_handle_slave_alerts(struct sdw_slave *slave)
static int sdw_update_slave_status(struct sdw_slave *slave,
enum sdw_slave_status status)
{
- unsigned long time;
+ int ret = 0;

- if (!slave->probed) {
- /*
- * the slave status update is typically handled in an
- * interrupt thread, which can race with the driver
- * probe, e.g. when a module needs to be loaded.
- *
- * make sure the probe is complete before updating
- * status.
- */
- time = wait_for_completion_timeout(&slave->probe_complete,
- msecs_to_jiffies(DEFAULT_PROBE_TIMEOUT));
- if (!time) {
- dev_err(&slave->dev, "Probe not complete, timed out\n");
- return -ETIMEDOUT;
- }
+ mutex_lock(&slave->sdw_dev_lock);
+
+ if (slave->probed) {
+ struct device *dev = &slave->dev;
+ struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
+
+ if (drv->ops && drv->ops->update_status)
+ ret = drv->ops->update_status(slave, status);
}

- if (!slave->ops || !slave->ops->update_status)
- return 0;
+ mutex_unlock(&slave->sdw_dev_lock);

- return slave->ops->update_status(slave, status);
+ return ret;
}

/**
diff --git a/drivers/soundwire/bus_type.c b/drivers/soundwire/bus_type.c
index 893296f3fe39..04b3529f8929 100644
--- a/drivers/soundwire/bus_type.c
+++ b/drivers/soundwire/bus_type.c
@@ -98,8 +98,6 @@ static int sdw_drv_probe(struct device *dev)
if (!id)
return -ENODEV;

- slave->ops = drv->ops;
-
/*
* attach to power domain but don't turn on (last arg)
*/
@@ -107,19 +105,23 @@ static int sdw_drv_probe(struct device *dev)
if (ret)
return ret;

+ mutex_lock(&slave->sdw_dev_lock);
+
ret = drv->probe(slave, id);
if (ret) {
name = drv->name;
if (!name)
name = drv->driver.name;
+ mutex_unlock(&slave->sdw_dev_lock);
+
dev_err(dev, "Probe of %s failed: %d\n", name, ret);
dev_pm_domain_detach(dev, false);
return ret;
}

/* device is probed so let's read the properties now */
- if (slave->ops && slave->ops->read_prop)
- slave->ops->read_prop(slave);
+ if (drv->ops && drv->ops->read_prop)
+ drv->ops->read_prop(slave);

/* init the sysfs as we have properties now */
ret = sdw_slave_sysfs_init(slave);
@@ -139,7 +141,19 @@ static int sdw_drv_probe(struct device *dev)
slave->prop.clk_stop_timeout);

slave->probed = true;
- complete(&slave->probe_complete);
+
+ /*
+ * if the probe happened after the bus was started, notify the codec driver
+ * of the current hardware status to e.g. start the initialization.
+ * Errors are only logged as warnings to avoid failing the probe.
+ */
+ if (drv->ops && drv->ops->update_status) {
+ ret = drv->ops->update_status(slave, slave->status);
+ if (ret < 0)
+ dev_warn(dev, "%s: update_status failed with status %d\n", __func__, ret);
+ }
+
+ mutex_unlock(&slave->sdw_dev_lock);

dev_dbg(dev, "probe complete\n");

@@ -152,9 +166,15 @@ static int sdw_drv_remove(struct device *dev)
struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
int ret = 0;

+ mutex_lock(&slave->sdw_dev_lock);
+
+ slave->probed = false;
+
if (drv->remove)
ret = drv->remove(slave);

+ mutex_unlock(&slave->sdw_dev_lock);
+
dev_pm_domain_detach(dev, false);

return ret;
@@ -193,12 +213,8 @@ int __sdw_register_driver(struct sdw_driver *drv, struct module *owner)

drv->driver.owner = owner;
drv->driver.probe = sdw_drv_probe;
-
- if (drv->remove)
- drv->driver.remove = sdw_drv_remove;
-
- if (drv->shutdown)
- drv->driver.shutdown = sdw_drv_shutdown;
+ drv->driver.remove = sdw_drv_remove;
+ drv->driver.shutdown = sdw_drv_shutdown;

return driver_register(&drv->driver);
}
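
The SoundWire changes above stop caching an ops pointer in the slave and instead resolve the callbacks through the bound driver while holding slave->sdw_dev_lock and checking slave->probed, which is now set only at the end of probe and cleared first in remove. A hedged, generic sketch of that guard; ex_device, ex_driver and the notify callback are made-up stand-ins, not the SoundWire types:

#include <linux/device.h>
#include <linux/mutex.h>

struct ex_ops {					/* made-up callback table */
	int (*notify)(struct device *dev, int event);
};

struct ex_driver {				/* made-up bus driver type */
	struct device_driver driver;
	const struct ex_ops *ops;
};

struct ex_device {				/* made-up bus device type */
	struct device dev;
	struct mutex lock;	/* serialises callbacks against probe/remove */
	bool probed;		/* set at the end of probe, cleared first in remove */
};

#define to_ex_driver(d) container_of(d, struct ex_driver, driver)

static int ex_notify(struct ex_device *edev, int event)
{
	int ret = 0;

	mutex_lock(&edev->lock);
	if (edev->probed) {
		struct ex_driver *drv = to_ex_driver(edev->dev.driver);

		/* dev.driver is stable while probed is set and the lock is held */
		if (drv->ops && drv->ops->notify)
			ret = drv->ops->notify(&edev->dev, event);
	}
	mutex_unlock(&edev->lock);

	return ret;
}
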
diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c
index 7b8ef45abee4..eea9d0f338f5 100644
--- a/drivers/soundwire/qcom.c
+++ b/drivers/soundwire/qcom.c
@@ -471,6 +471,10 @@ static int qcom_swrm_enumerate(struct sdw_bus *bus)
char *buf1 = (char *)&val1, *buf2 = (char *)&val2;

for (i = 1; i <= SDW_MAX_DEVICES; i++) {
+ /* do not continue if the status is Not Present */
+ if (!ctrl->status[i])
+ continue;
+
/*SCP_Devid5 - Devid 4*/
ctrl->reg_read(ctrl, SWRM_ENUMERATOR_SLAVE_DEV_ID_1(i), &val1);

diff --git a/drivers/soundwire/slave.c b/drivers/soundwire/slave.c
index 669d7573320b..25e76b5d4a1a 100644
--- a/drivers/soundwire/slave.c
+++ b/drivers/soundwire/slave.c
@@ -12,6 +12,7 @@ static void sdw_slave_release(struct device *dev)
{
struct sdw_slave *slave = dev_to_sdw_dev(dev);

+ mutex_destroy(&slave->sdw_dev_lock);
kfree(slave);
}

@@ -58,9 +59,9 @@ int sdw_slave_add(struct sdw_bus *bus,
init_completion(&slave->enumeration_complete);
init_completion(&slave->initialization_complete);
slave->dev_num = 0;
- init_completion(&slave->probe_complete);
slave->probed = false;
slave->first_interrupt_done = false;
+ mutex_init(&slave->sdw_dev_lock);

for (i = 0; i < SDW_MAX_PORTS; i++)
init_completion(&slave->port_ready[i]);
diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c
index f273459b2023..6963e5f7ea6d 100644
--- a/drivers/soundwire/stream.c
+++ b/drivers/soundwire/stream.c
@@ -13,6 +13,7 @@
#include <linux/slab.h>
#include <linux/soundwire/sdw_registers.h>
#include <linux/soundwire/sdw.h>
+#include <linux/soundwire/sdw_type.h>
#include <sound/soc.h>
#include "bus.h"

@@ -401,20 +402,26 @@ static int sdw_do_port_prep(struct sdw_slave_runtime *s_rt,
struct sdw_prepare_ch prep_ch,
enum sdw_port_prep_ops cmd)
{
- const struct sdw_slave_ops *ops = s_rt->slave->ops;
- int ret;
+ int ret = 0;
+ struct sdw_slave *slave = s_rt->slave;

- if (ops->port_prep) {
- ret = ops->port_prep(s_rt->slave, &prep_ch, cmd);
- if (ret < 0) {
- dev_err(&s_rt->slave->dev,
- "Slave Port Prep cmd %d failed: %d\n",
- cmd, ret);
- return ret;
+ mutex_lock(&slave->sdw_dev_lock);
+
+ if (slave->probed) {
+ struct device *dev = &slave->dev;
+ struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
+
+ if (drv->ops && drv->ops->port_prep) {
+ ret = drv->ops->port_prep(slave, &prep_ch, cmd);
+ if (ret < 0)
+ dev_err(dev, "Slave Port Prep cmd %d failed: %d\n",
+ cmd, ret);
}
}

- return 0;
+ mutex_unlock(&slave->sdw_dev_lock);
+
+ return ret;
}

static int sdw_prep_deprep_slave_ports(struct sdw_bus *bus,
@@ -578,7 +585,7 @@ static int sdw_notify_config(struct sdw_master_runtime *m_rt)
struct sdw_slave_runtime *s_rt;
struct sdw_bus *bus = m_rt->bus;
struct sdw_slave *slave;
- int ret = 0;
+ int ret;

if (bus->ops->set_bus_conf) {
ret = bus->ops->set_bus_conf(bus, &bus->params);
@@ -589,17 +596,27 @@ static int sdw_notify_config(struct sdw_master_runtime *m_rt)
list_for_each_entry(s_rt, &m_rt->slave_rt_list, m_rt_node) {
slave = s_rt->slave;

- if (slave->ops->bus_config) {
- ret = slave->ops->bus_config(slave, &bus->params);
- if (ret < 0) {
- dev_err(bus->dev, "Notify Slave: %d failed\n",
- slave->dev_num);
- return ret;
+ mutex_lock(&slave->sdw_dev_lock);
+
+ if (slave->probed) {
+ struct device *dev = &slave->dev;
+ struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
+
+ if (drv->ops && drv->ops->bus_config) {
+ ret = drv->ops->bus_config(slave, &bus->params);
+ if (ret < 0) {
+ dev_err(dev, "Notify Slave: %d failed\n",
+ slave->dev_num);
+ mutex_unlock(&slave->sdw_dev_lock);
+ return ret;
+ }
}
}
+
+ mutex_unlock(&slave->sdw_dev_lock);
}

- return ret;
+ return 0;
}

/**
diff --git a/drivers/spi/spi-altera-dfl.c b/drivers/spi/spi-altera-dfl.c
index ca40923258af..596e181ae136 100644
--- a/drivers/spi/spi-altera-dfl.c
+++ b/drivers/spi/spi-altera-dfl.c
@@ -128,9 +128,9 @@ static int dfl_spi_altera_probe(struct dfl_device *dfl_dev)
struct spi_master *master;
struct altera_spi *hw;
void __iomem *base;
- int err = -ENODEV;
+ int err;

- master = spi_alloc_master(dev, sizeof(struct altera_spi));
+ master = devm_spi_alloc_master(dev, sizeof(struct altera_spi));
if (!master)
return -ENOMEM;

@@ -159,10 +159,9 @@ static int dfl_spi_altera_probe(struct dfl_device *dfl_dev)
altera_spi_init_master(master);

err = devm_spi_register_master(dev, master);
- if (err) {
- dev_err(dev, "%s failed to register spi master %d\n", __func__, err);
- goto exit;
- }
+ if (err)
+ return dev_err_probe(dev, err, "%s failed to register spi master\n",
+ __func__);

if (dfl_dev->revision == FME_FEATURE_REV_MAX10_SPI_N5010)
strscpy(board_info.modalias, "m10-n5010", SPI_NAME_SIZE);
@@ -179,9 +178,6 @@ static int dfl_spi_altera_probe(struct dfl_device *dfl_dev)
}

return 0;
-exit:
- spi_master_put(master);
- return err;
}

static const struct dfl_device_id dfl_spi_altera_ids[] = {
diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h
index d5ee5130601e..79d853f6d192 100644
--- a/drivers/spi/spi-dw.h
+++ b/drivers/spi/spi-dw.h
@@ -23,7 +23,7 @@
((_dws)->ip == DW_ ## _ip ## _ID)

#define __dw_spi_ver_cmp(_dws, _ip, _ver, _op) \
- (dw_spi_ip_is(_dws, _ip) && (_dws)->ver _op DW_ ## _ip ## _ver)
+ (dw_spi_ip_is(_dws, _ip) && (_dws)->ver _op DW_ ## _ip ## _ ## _ver)

#define dw_spi_ver_is(_dws, _ip, _ver) __dw_spi_ver_cmp(_dws, _ip, _ver, ==)

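The spi-dw.h one-liner adds a missing "_ ##" so the pasted token becomes DW_<ip>_<ver> instead of DW_<ip><ver>. A small stand-alone illustration of the same token-pasting rule; DW_HSSI_102A and EX_VER are made up for the example, only the ## behaviour is the point:

#include <stdio.h>

#define DW_HSSI_102A 0x3130322a		/* made-up version constant */

/*
 * Without the extra "_ ##" the arguments would paste into DW_HSSI102A,
 * which is not defined; the underscore has to be pasted in explicitly.
 */
#define EX_VER(_ip, _ver)	(DW_ ## _ip ## _ ## _ver)

int main(void)
{
	printf("0x%x\n", EX_VER(HSSI, 102A));	/* expands to DW_HSSI_102A */
	return 0;
}
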
diff --git a/drivers/spi/spi-rspi.c b/drivers/spi/spi-rspi.c
index 7a014eeec2d0..411b1307b7fd 100644
--- a/drivers/spi/spi-rspi.c
+++ b/drivers/spi/spi-rspi.c
@@ -613,6 +613,10 @@ static int rspi_dma_transfer(struct rspi_data *rspi, struct sg_table *tx,
rspi->dma_callbacked, HZ);
if (ret > 0 && rspi->dma_callbacked) {
ret = 0;
+ if (tx)
+ dmaengine_synchronize(rspi->ctlr->dma_tx);
+ if (rx)
+ dmaengine_synchronize(rspi->ctlr->dma_rx);
} else {
if (!ret) {
dev_err(&rspi->ctlr->dev, "DMA timeout\n");
diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c
index c26440e9058d..8fa21afc6a35 100644
--- a/drivers/spi/spi-s3c64xx.c
+++ b/drivers/spi/spi-s3c64xx.c
@@ -1413,7 +1413,7 @@ static const struct s3c64xx_spi_port_config exynos5433_spi_port_config = {
.quirks = S3C64XX_SPI_QUIRK_CS_AUTO,
};

-static struct s3c64xx_spi_port_config fsd_spi_port_config = {
+static const struct s3c64xx_spi_port_config fsd_spi_port_config = {
.fifo_lvl_mask = { 0x7f, 0x7f, 0x7f, 0x7f, 0x7f},
.rx_lvl_offset = 15,
.tx_st_done = 25,
diff --git a/drivers/spi/spi-synquacer.c b/drivers/spi/spi-synquacer.c
index ea706d9629cb..47cbe73137c2 100644
--- a/drivers/spi/spi-synquacer.c
+++ b/drivers/spi/spi-synquacer.c
@@ -783,6 +783,7 @@ static int __maybe_unused synquacer_spi_resume(struct device *dev)

ret = synquacer_spi_enable(master);
if (ret) {
+ clk_disable_unprepare(sspi->clk);
dev_err(dev, "failed to enable spi (%d)\n", ret);
return ret;
}
diff --git a/drivers/spi/spi-tegra20-slink.c b/drivers/spi/spi-tegra20-slink.c
index 80c3787deea9..c91c226be69e 100644
--- a/drivers/spi/spi-tegra20-slink.c
+++ b/drivers/spi/spi-tegra20-slink.c
@@ -1137,7 +1137,7 @@ static int tegra_slink_probe(struct platform_device *pdev)

static int tegra_slink_remove(struct platform_device *pdev)
{
- struct spi_master *master = platform_get_drvdata(pdev);
+ struct spi_master *master = spi_master_get(platform_get_drvdata(pdev));
struct tegra_slink_data *tspi = spi_master_get_devdata(master);

spi_unregister_master(master);
@@ -1152,6 +1152,7 @@ static int tegra_slink_remove(struct platform_device *pdev)
if (tspi->rx_dma_chan)
tegra_slink_deinit_dma_param(tspi, true);

+ spi_master_put(master);
return 0;
}

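The tegra20-slink remove path above takes its own reference on the SPI master before unregistering it, because the driver data is still dereferenced for DMA teardown afterwards. A hedged sketch of the same get/put bracket; ex_spi and ex_spi_teardown_dma() are hypothetical:

#include <linux/platform_device.h>
#include <linux/spi/spi.h>

struct ex_spi { int dummy; };			/* hypothetical driver state */

void ex_spi_teardown_dma(struct ex_spi *hw);	/* hypothetical helper, declared only */

static int ex_spi_remove(struct platform_device *pdev)
{
	/*
	 * Take our own reference: spi_unregister_master() drops the one the
	 * core holds, and the driver data is still needed for DMA teardown.
	 */
	struct spi_master *master = spi_master_get(platform_get_drvdata(pdev));
	struct ex_spi *hw = spi_master_get_devdata(master);

	spi_unregister_master(master);
	ex_spi_teardown_dma(hw);
	spi_master_put(master);			/* drop the reference taken above */
	return 0;
}
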
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 2e6d6bbeb784..6a8850e62078 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2416,7 +2416,7 @@ static int acpi_spi_add_resource(struct acpi_resource *ares, void *data)

ctlr = acpi_spi_find_controller_by_adev(adev);
if (!ctlr)
- return -ENODEV;
+ return -EPROBE_DEFER;

lookup->ctlr = ctlr;
}
@@ -3068,9 +3068,9 @@ int spi_register_controller(struct spi_controller *ctlr)
}
EXPORT_SYMBOL_GPL(spi_register_controller);

-static void devm_spi_unregister(void *ctlr)
+static void devm_spi_unregister(struct device *dev, void *res)
{
- spi_unregister_controller(ctlr);
+ spi_unregister_controller(*(struct spi_controller **)res);
}

/**
@@ -3089,13 +3089,22 @@ static void devm_spi_unregister(void *ctlr)
int devm_spi_register_controller(struct device *dev,
struct spi_controller *ctlr)
{
+ struct spi_controller **ptr;
int ret;

+ ptr = devres_alloc(devm_spi_unregister, sizeof(*ptr), GFP_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
+
ret = spi_register_controller(ctlr);
- if (ret)
- return ret;
+ if (!ret) {
+ *ptr = ctlr;
+ devres_add(dev, ptr);
+ } else {
+ devres_free(ptr);
+ }

- return devm_add_action_or_reset(dev, devm_spi_unregister, ctlr);
+ return ret;
}
EXPORT_SYMBOL_GPL(devm_spi_register_controller);

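devm_spi_register_controller() above goes back to an explicit devres_alloc()/devres_add() pair instead of devm_add_action_or_reset(). A hedged sketch of that devres pattern applied to an arbitrary made-up resource; example_res and its acquire/release are illustrations, not a real API:

#include <linux/device.h>
#include <linux/slab.h>
#include <linux/errno.h>

struct example_res { int id; };			/* made-up managed resource */

static void example_res_release(struct device *dev, void *res)
{
	struct example_res *er = *(struct example_res **)res;

	kfree(er);			/* undo whatever the acquire step did */
}

static int devm_example_res_acquire(struct device *dev, int id)
{
	struct example_res **ptr, *er;

	ptr = devres_alloc(example_res_release, sizeof(*ptr), GFP_KERNEL);
	if (!ptr)
		return -ENOMEM;

	er = kzalloc(sizeof(*er), GFP_KERNEL);
	if (!er) {
		devres_free(ptr);	/* nothing registered yet, just free the node */
		return -ENOMEM;
	}
	er->id = id;

	*ptr = er;
	devres_add(dev, ptr);		/* release runs automatically on driver detach */
	return 0;
}
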
diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index 9c4d797e7ae4..4137c1a51e1b 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -656,7 +656,6 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
fbdefio->delay = HZ / fps;
fbdefio->sort_pagelist = true;
fbdefio->deferred_io = fbtft_deferred_io;
- fb_deferred_io_init(info);

snprintf(info->fix.id, sizeof(info->fix.id), "%s", dev->driver->name);
info->fix.type = FB_TYPE_PACKED_PIXELS;
@@ -667,6 +666,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
info->fix.line_length = width * bpp / 8;
info->fix.accel = FB_ACCEL_NONE;
info->fix.smem_len = vmem_size;
+ fb_deferred_io_init(info);

info->var.rotate = pdata->rotate;
info->var.xres = width;
diff --git a/drivers/staging/media/atomisp/pci/atomisp_cmd.c b/drivers/staging/media/atomisp/pci/atomisp_cmd.c
index 97d5a528969b..0da0b69a4637 100644
--- a/drivers/staging/media/atomisp/pci/atomisp_cmd.c
+++ b/drivers/staging/media/atomisp/pci/atomisp_cmd.c
@@ -901,9 +901,9 @@ void atomisp_buf_done(struct atomisp_sub_device *asd, int error,
int err;
unsigned long irqflags;
struct ia_css_frame *frame = NULL;
- struct atomisp_s3a_buf *s3a_buf = NULL, *_s3a_buf_tmp;
- struct atomisp_dis_buf *dis_buf = NULL, *_dis_buf_tmp;
- struct atomisp_metadata_buf *md_buf = NULL, *_md_buf_tmp;
+ struct atomisp_s3a_buf *s3a_buf = NULL, *_s3a_buf_tmp, *s3a_iter;
+ struct atomisp_dis_buf *dis_buf = NULL, *_dis_buf_tmp, *dis_iter;
+ struct atomisp_metadata_buf *md_buf = NULL, *_md_buf_tmp, *md_iter;
enum atomisp_metadata_type md_type;
struct atomisp_device *isp = asd->isp;
struct v4l2_control ctrl;
@@ -942,60 +942,75 @@ void atomisp_buf_done(struct atomisp_sub_device *asd, int error,

switch (buf_type) {
case IA_CSS_BUFFER_TYPE_3A_STATISTICS:
- list_for_each_entry_safe(s3a_buf, _s3a_buf_tmp,
+ list_for_each_entry_safe(s3a_iter, _s3a_buf_tmp,
&asd->s3a_stats_in_css, list) {
- if (s3a_buf->s3a_data ==
+ if (s3a_iter->s3a_data ==
buffer.css_buffer.data.stats_3a) {
- list_del_init(&s3a_buf->list);
- list_add_tail(&s3a_buf->list,
+ list_del_init(&s3a_iter->list);
+ list_add_tail(&s3a_iter->list,
&asd->s3a_stats_ready);
+ s3a_buf = s3a_iter;
break;
}
}

asd->s3a_bufs_in_css[css_pipe_id]--;
atomisp_3a_stats_ready_event(asd, buffer.css_buffer.exp_id);
- dev_dbg(isp->dev, "%s: s3a stat with exp_id %d is ready\n",
- __func__, s3a_buf->s3a_data->exp_id);
+ if (s3a_buf)
+ dev_dbg(isp->dev, "%s: s3a stat with exp_id %d is ready\n",
+ __func__, s3a_buf->s3a_data->exp_id);
+ else
+ dev_dbg(isp->dev, "%s: s3a stat is ready with no exp_id found\n",
+ __func__);
break;
case IA_CSS_BUFFER_TYPE_METADATA:
if (error)
break;

md_type = atomisp_get_metadata_type(asd, css_pipe_id);
- list_for_each_entry_safe(md_buf, _md_buf_tmp,
+ list_for_each_entry_safe(md_iter, _md_buf_tmp,
&asd->metadata_in_css[md_type], list) {
- if (md_buf->metadata ==
+ if (md_iter->metadata ==
buffer.css_buffer.data.metadata) {
- list_del_init(&md_buf->list);
- list_add_tail(&md_buf->list,
+ list_del_init(&md_iter->list);
+ list_add_tail(&md_iter->list,
&asd->metadata_ready[md_type]);
+ md_buf = md_iter;
break;
}
}
asd->metadata_bufs_in_css[stream_id][css_pipe_id]--;
atomisp_metadata_ready_event(asd, md_type);
- dev_dbg(isp->dev, "%s: metadata with exp_id %d is ready\n",
- __func__, md_buf->metadata->exp_id);
+ if (md_buf)
+ dev_dbg(isp->dev, "%s: metadata with exp_id %d is ready\n",
+ __func__, md_buf->metadata->exp_id);
+ else
+ dev_dbg(isp->dev, "%s: metadata is ready with no exp_id found\n",
+ __func__);
break;
case IA_CSS_BUFFER_TYPE_DIS_STATISTICS:
- list_for_each_entry_safe(dis_buf, _dis_buf_tmp,
+ list_for_each_entry_safe(dis_iter, _dis_buf_tmp,
&asd->dis_stats_in_css, list) {
- if (dis_buf->dis_data ==
+ if (dis_iter->dis_data ==
buffer.css_buffer.data.stats_dvs) {
spin_lock_irqsave(&asd->dis_stats_lock,
irqflags);
- list_del_init(&dis_buf->list);
- list_add(&dis_buf->list, &asd->dis_stats);
+ list_del_init(&dis_iter->list);
+ list_add(&dis_iter->list, &asd->dis_stats);
asd->params.dis_proj_data_valid = true;
spin_unlock_irqrestore(&asd->dis_stats_lock,
irqflags);
+ dis_buf = dis_iter;
break;
}
}
asd->dis_bufs_in_css--;
- dev_dbg(isp->dev, "%s: dis stat with exp_id %d is ready\n",
- __func__, dis_buf->dis_data->exp_id);
+ if (dis_buf)
+ dev_dbg(isp->dev, "%s: dis stat with exp_id %d is ready\n",
+ __func__, dis_buf->dis_data->exp_id);
+ else
+ dev_dbg(isp->dev, "%s: dis stat is ready with no exp_id found\n",
+ __func__);
break;
case IA_CSS_BUFFER_TYPE_VF_OUTPUT_FRAME:
case IA_CSS_BUFFER_TYPE_SEC_VF_OUTPUT_FRAME:
diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index bd7d11032c94..ac232b5f7825 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -638,6 +638,7 @@ static const struct of_device_id of_hantro_match[] = {
{ .compatible = "rockchip,rk3288-vpu", .data = &rk3288_vpu_variant, },
{ .compatible = "rockchip,rk3328-vpu", .data = &rk3328_vpu_variant, },
{ .compatible = "rockchip,rk3399-vpu", .data = &rk3399_vpu_variant, },
+ { .compatible = "rockchip,rk3568-vpu", .data = &rk3568_vpu_variant, },
#endif
#ifdef CONFIG_VIDEO_HANTRO_IMX8M
{ .compatible = "nxp,imx8mm-vpu-g1", .data = &imx8mm_vpu_g1_variant, },
diff --git a/drivers/staging/media/hantro/hantro_g2_hevc_dec.c b/drivers/staging/media/hantro/hantro_g2_hevc_dec.c
index 5f3178bac9c8..d28653d04d20 100644
--- a/drivers/staging/media/hantro/hantro_g2_hevc_dec.c
+++ b/drivers/staging/media/hantro/hantro_g2_hevc_dec.c
@@ -8,6 +8,20 @@
#include "hantro_hw.h"
#include "hantro_g2_regs.h"

+#define G2_ALIGN 16
+
+static size_t hantro_hevc_chroma_offset(struct hantro_ctx *ctx)
+{
+ return ctx->dst_fmt.width * ctx->dst_fmt.height;
+}
+
+static size_t hantro_hevc_motion_vectors_offset(struct hantro_ctx *ctx)
+{
+ size_t cr_offset = hantro_hevc_chroma_offset(ctx);
+
+ return ALIGN((cr_offset * 3) / 2, G2_ALIGN);
+}
+
static void prepare_tile_info_buffer(struct hantro_ctx *ctx)
{
struct hantro_dev *vpu = ctx->dev;
@@ -330,7 +344,6 @@ static void set_ref_pic_list(struct hantro_ctx *ctx)
static int set_ref(struct hantro_ctx *ctx)
{
const struct hantro_hevc_dec_ctrls *ctrls = &ctx->hevc_dec.ctrls;
- const struct v4l2_ctrl_hevc_sps *sps = ctrls->sps;
const struct v4l2_ctrl_hevc_pps *pps = ctrls->pps;
const struct v4l2_ctrl_hevc_decode_params *decode_params = ctrls->decode_params;
const struct v4l2_hevc_dpb_entry *dpb = decode_params->dpb;
@@ -338,8 +351,8 @@ static int set_ref(struct hantro_ctx *ctx)
struct hantro_dev *vpu = ctx->dev;
struct vb2_v4l2_buffer *vb2_dst;
struct hantro_decoded_buffer *dst;
- size_t cr_offset = hantro_hevc_chroma_offset(sps);
- size_t mv_offset = hantro_hevc_motion_vectors_offset(sps);
+ size_t cr_offset = hantro_hevc_chroma_offset(ctx);
+ size_t mv_offset = hantro_hevc_motion_vectors_offset(ctx);
u32 max_ref_frames;
u16 dpb_longterm_e;
static const struct hantro_reg cur_poc[] = {
@@ -377,11 +390,10 @@ static int set_ref(struct hantro_ctx *ctx)
!!(pps->flags & V4L2_HEVC_PPS_FLAG_LOOP_FILTER_ACROSS_TILES_ENABLED));

/*
- * Write POC count diff from current pic. For frame decoding only compute
- * pic_order_cnt[0] and ignore pic_order_cnt[1] used in field-coding.
+ * Write POC count diff from current pic.
*/
for (i = 0; i < decode_params->num_active_dpb_entries && i < ARRAY_SIZE(cur_poc); i++) {
- char poc_diff = decode_params->pic_order_cnt_val - dpb[i].pic_order_cnt[0];
+ char poc_diff = decode_params->pic_order_cnt_val - dpb[i].pic_order_cnt_val;

hantro_reg_write(vpu, &cur_poc[i], poc_diff);
}
@@ -401,14 +413,14 @@ static int set_ref(struct hantro_ctx *ctx)

set_ref_pic_list(ctx);

- /* We will only keep the references picture that are still used */
- ctx->hevc_dec.ref_bufs_used = 0;
+ /* We will only keep the reference pictures that are still used */
+ hantro_hevc_ref_init(ctx);

/* Set up addresses of DPB buffers */
dpb_longterm_e = 0;
for (i = 0; i < decode_params->num_active_dpb_entries &&
i < (V4L2_HEVC_DPB_ENTRIES_NUM_MAX - 1); i++) {
- luma_addr = hantro_hevc_get_ref_buf(ctx, dpb[i].pic_order_cnt[0]);
+ luma_addr = hantro_hevc_get_ref_buf(ctx, dpb[i].pic_order_cnt_val);
if (!luma_addr)
return -ENOMEM;

@@ -443,8 +455,6 @@ static int set_ref(struct hantro_ctx *ctx)
hantro_write_addr(vpu, G2_OUT_CHROMA_ADDR, chroma_addr);
hantro_write_addr(vpu, G2_OUT_MV_ADDR, mv_addr);

- hantro_hevc_ref_remove_unused(ctx);
-
for (; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX; i++) {
hantro_write_addr(vpu, G2_REF_LUMA_ADDR(i), 0);
hantro_write_addr(vpu, G2_REF_CHROMA_ADDR(i), 0);
diff --git a/drivers/staging/media/hantro/hantro_g2_regs.h b/drivers/staging/media/hantro/hantro_g2_regs.h
index b7c6f9877b9d..0f43516d0805 100644
--- a/drivers/staging/media/hantro/hantro_g2_regs.h
+++ b/drivers/staging/media/hantro/hantro_g2_regs.h
@@ -107,7 +107,7 @@

#define g2_start_code_e G2_DEC_REG(10, 31, 0x1)
#define g2_init_qp_old G2_DEC_REG(10, 25, 0x3f)
-#define g2_init_qp G2_DEC_REG(10, 24, 0x3f)
+#define g2_init_qp G2_DEC_REG(10, 24, 0x7f)
#define g2_num_tile_cols_old G2_DEC_REG(10, 20, 0x1f)
#define g2_num_tile_cols G2_DEC_REG(10, 19, 0x1f)
#define g2_num_tile_rows_old G2_DEC_REG(10, 15, 0x1f)
diff --git a/drivers/staging/media/hantro/hantro_hevc.c b/drivers/staging/media/hantro/hantro_hevc.c
index b49a41d7ae91..df1f81952bba 100644
--- a/drivers/staging/media/hantro/hantro_hevc.c
+++ b/drivers/staging/media/hantro/hantro_hevc.c
@@ -25,41 +25,20 @@
#define MAX_TILE_COLS 20
#define MAX_TILE_ROWS 22

-#define UNUSED_REF -1
-
-#define G2_ALIGN 16
-
-size_t hantro_hevc_chroma_offset(const struct v4l2_ctrl_hevc_sps *sps)
-{
- int bytes_per_pixel = sps->bit_depth_luma_minus8 == 0 ? 1 : 2;
-
- return sps->pic_width_in_luma_samples *
- sps->pic_height_in_luma_samples * bytes_per_pixel;
-}
-
-size_t hantro_hevc_motion_vectors_offset(const struct v4l2_ctrl_hevc_sps *sps)
-{
- size_t cr_offset = hantro_hevc_chroma_offset(sps);
-
- return ALIGN((cr_offset * 3) / 2, G2_ALIGN);
-}
-
-static void hantro_hevc_ref_init(struct hantro_ctx *ctx)
+void hantro_hevc_ref_init(struct hantro_ctx *ctx)
{
struct hantro_hevc_dec_hw_ctx *hevc_dec = &ctx->hevc_dec;
- int i;

- for (i = 0; i < NUM_REF_PICTURES; i++)
- hevc_dec->ref_bufs_poc[i] = UNUSED_REF;
+ hevc_dec->ref_bufs_used = 0;
}

dma_addr_t hantro_hevc_get_ref_buf(struct hantro_ctx *ctx,
- int poc)
+ s32 poc)
{
struct hantro_hevc_dec_hw_ctx *hevc_dec = &ctx->hevc_dec;
int i;

- /* Find the reference buffer in already know ones */
+ /* Find the reference buffer in already known ones */
for (i = 0; i < NUM_REF_PICTURES; i++) {
if (hevc_dec->ref_bufs_poc[i] == poc) {
hevc_dec->ref_bufs_used |= 1 << i;
@@ -77,7 +56,7 @@ int hantro_hevc_add_ref_buf(struct hantro_ctx *ctx, int poc, dma_addr_t addr)

/* Add a new reference buffer */
for (i = 0; i < NUM_REF_PICTURES; i++) {
- if (hevc_dec->ref_bufs_poc[i] == UNUSED_REF) {
+ if (!(hevc_dec->ref_bufs_used & 1 << i)) {
hevc_dec->ref_bufs_used |= 1 << i;
hevc_dec->ref_bufs_poc[i] = poc;
hevc_dec->ref_bufs[i].dma = addr;
@@ -88,23 +67,6 @@ int hantro_hevc_add_ref_buf(struct hantro_ctx *ctx, int poc, dma_addr_t addr)
return -EINVAL;
}

-void hantro_hevc_ref_remove_unused(struct hantro_ctx *ctx)
-{
- struct hantro_hevc_dec_hw_ctx *hevc_dec = &ctx->hevc_dec;
- int i;
-
- /* Just tag buffer as unused, do not free them */
- for (i = 0; i < NUM_REF_PICTURES; i++) {
- if (hevc_dec->ref_bufs_poc[i] == UNUSED_REF)
- continue;
-
- if (hevc_dec->ref_bufs_used & (1 << i))
- continue;
-
- hevc_dec->ref_bufs_poc[i] = UNUSED_REF;
- }
-}
-
static int tile_buffer_reallocate(struct hantro_ctx *ctx)
{
struct hantro_dev *vpu = ctx->dev;
@@ -192,6 +154,25 @@ static int tile_buffer_reallocate(struct hantro_ctx *ctx)
return -ENOMEM;
}

+static int hantro_hevc_validate_sps(struct hantro_ctx *ctx, const struct v4l2_ctrl_hevc_sps *sps)
+{
+ /*
+ * for tile pixel format check if the width and height match
+ * hardware constraints
+ */
+ if (ctx->vpu_dst_fmt->fourcc == V4L2_PIX_FMT_NV12_4L4) {
+ if (ctx->dst_fmt.width !=
+ ALIGN(sps->pic_width_in_luma_samples, ctx->vpu_dst_fmt->frmsize.step_width))
+ return -EINVAL;
+
+ if (ctx->dst_fmt.height !=
+ ALIGN(sps->pic_height_in_luma_samples, ctx->vpu_dst_fmt->frmsize.step_height))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
int hantro_hevc_dec_prepare_run(struct hantro_ctx *ctx)
{
struct hantro_hevc_dec_hw_ctx *hevc_ctx = &ctx->hevc_dec;
@@ -215,6 +196,10 @@ int hantro_hevc_dec_prepare_run(struct hantro_ctx *ctx)
if (WARN_ON(!ctrls->sps))
return -EINVAL;

+ ret = hantro_hevc_validate_sps(ctx, ctrls->sps);
+ if (ret)
+ return ret;
+
ctrls->pps =
hantro_get_ctrl(ctx, V4L2_CID_MPEG_VIDEO_HEVC_PPS);
if (WARN_ON(!ctrls->pps))
diff --git a/drivers/staging/media/hantro/hantro_hw.h b/drivers/staging/media/hantro/hantro_hw.h
index ed018e293ba0..68c313864b06 100644
--- a/drivers/staging/media/hantro/hantro_hw.h
+++ b/drivers/staging/media/hantro/hantro_hw.h
@@ -18,9 +18,21 @@
#define DEC_8190_ALIGN_MASK 0x07U

#define MB_DIM 16
+#define TILE_MB_DIM 4
#define MB_WIDTH(w) DIV_ROUND_UP(w, MB_DIM)
#define MB_HEIGHT(h) DIV_ROUND_UP(h, MB_DIM)

+#define FMT_MIN_WIDTH 48
+#define FMT_MIN_HEIGHT 48
+#define FMT_HD_WIDTH 1280
+#define FMT_HD_HEIGHT 720
+#define FMT_FHD_WIDTH 1920
+#define FMT_FHD_HEIGHT 1088
+#define FMT_UHD_WIDTH 3840
+#define FMT_UHD_HEIGHT 2160
+#define FMT_4K_WIDTH 4096
+#define FMT_4K_HEIGHT 2304
+
#define NUM_REF_PICTURES (V4L2_HEVC_DPB_ENTRIES_NUM_MAX + 1)

struct hantro_dev;
@@ -131,7 +143,7 @@ struct hantro_hevc_dec_hw_ctx {
struct hantro_aux_buf tile_bsd;
struct hantro_aux_buf ref_bufs[NUM_REF_PICTURES];
struct hantro_aux_buf scaling_lists;
- int ref_bufs_poc[NUM_REF_PICTURES];
+ s32 ref_bufs_poc[NUM_REF_PICTURES];
u32 ref_bufs_used;
struct hantro_hevc_dec_ctrls ctrls;
unsigned int num_tile_cols_allocated;
@@ -300,6 +312,7 @@ extern const struct hantro_variant rk3066_vpu_variant;
extern const struct hantro_variant rk3288_vpu_variant;
extern const struct hantro_variant rk3328_vpu_variant;
extern const struct hantro_variant rk3399_vpu_variant;
+extern const struct hantro_variant rk3568_vpu_variant;
extern const struct hantro_variant sama5d4_vdec_variant;
extern const struct hantro_variant sunxi_vpu_variant;

@@ -337,11 +350,10 @@ int hantro_hevc_dec_init(struct hantro_ctx *ctx);
void hantro_hevc_dec_exit(struct hantro_ctx *ctx);
int hantro_g2_hevc_dec_run(struct hantro_ctx *ctx);
int hantro_hevc_dec_prepare_run(struct hantro_ctx *ctx);
-dma_addr_t hantro_hevc_get_ref_buf(struct hantro_ctx *ctx, int poc);
+void hantro_hevc_ref_init(struct hantro_ctx *ctx);
+dma_addr_t hantro_hevc_get_ref_buf(struct hantro_ctx *ctx, s32 poc);
int hantro_hevc_add_ref_buf(struct hantro_ctx *ctx, int poc, dma_addr_t addr);
-void hantro_hevc_ref_remove_unused(struct hantro_ctx *ctx);
-size_t hantro_hevc_chroma_offset(const struct v4l2_ctrl_hevc_sps *sps);
-size_t hantro_hevc_motion_vectors_offset(const struct v4l2_ctrl_hevc_sps *sps);
+

static inline unsigned short hantro_vp9_num_sbs(unsigned short dimension)
{
diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
index 71a6279750bf..93d0dcf69f4a 100644
--- a/drivers/staging/media/hantro/hantro_v4l2.c
+++ b/drivers/staging/media/hantro/hantro_v4l2.c
@@ -260,7 +260,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
} else if (ctx->is_encoder) {
vpu_fmt = ctx->vpu_dst_fmt;
} else {
- vpu_fmt = ctx->vpu_src_fmt;
+ vpu_fmt = fmt;
/*
* Width/height on the CAPTURE end of a decoder are ignored and
* replaced by the OUTPUT ones.
diff --git a/drivers/staging/media/hantro/imx8m_vpu_hw.c b/drivers/staging/media/hantro/imx8m_vpu_hw.c
index 9802508bade2..77f574fdfa77 100644
--- a/drivers/staging/media/hantro/imx8m_vpu_hw.c
+++ b/drivers/staging/media/hantro/imx8m_vpu_hw.c
@@ -83,6 +83,14 @@ static const struct hantro_fmt imx8m_vpu_postproc_fmts[] = {
.fourcc = V4L2_PIX_FMT_YUYV,
.codec_mode = HANTRO_MODE_NONE,
.postprocessed = true,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
};

@@ -90,17 +98,25 @@ static const struct hantro_fmt imx8m_vpu_dec_fmts[] = {
{
.fourcc = V4L2_PIX_FMT_NV12,
.codec_mode = HANTRO_MODE_NONE,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
{
.fourcc = V4L2_PIX_FMT_MPEG2_SLICE,
.codec_mode = HANTRO_MODE_MPEG2_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1920,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 1088,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -109,11 +125,11 @@ static const struct hantro_fmt imx8m_vpu_dec_fmts[] = {
.codec_mode = HANTRO_MODE_VP8_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 3840,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 2160,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -122,11 +138,11 @@ static const struct hantro_fmt imx8m_vpu_dec_fmts[] = {
.codec_mode = HANTRO_MODE_H264_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 3840,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 2160,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -137,6 +153,14 @@ static const struct hantro_fmt imx8m_vpu_g2_postproc_fmts[] = {
.fourcc = V4L2_PIX_FMT_NV12,
.codec_mode = HANTRO_MODE_NONE,
.postprocessed = true,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
};

@@ -144,18 +168,26 @@ static const struct hantro_fmt imx8m_vpu_g2_dec_fmts[] = {
{
.fourcc = V4L2_PIX_FMT_NV12_4L4,
.codec_mode = HANTRO_MODE_NONE,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = TILE_MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = TILE_MB_DIM,
+ },
},
{
.fourcc = V4L2_PIX_FMT_HEVC_SLICE,
.codec_mode = HANTRO_MODE_HEVC_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 3840,
- .step_width = MB_DIM,
- .min_height = 48,
- .max_height = 2160,
- .step_height = MB_DIM,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = TILE_MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = TILE_MB_DIM,
},
},
{
@@ -163,12 +195,12 @@ static const struct hantro_fmt imx8m_vpu_g2_dec_fmts[] = {
.codec_mode = HANTRO_MODE_VP9_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 3840,
- .step_width = MB_DIM,
- .min_height = 48,
- .max_height = 2160,
- .step_height = MB_DIM,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = TILE_MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = TILE_MB_DIM,
},
},
};
diff --git a/drivers/staging/media/hantro/rockchip_vpu_hw.c b/drivers/staging/media/hantro/rockchip_vpu_hw.c
index 163cf92eafca..26e16b5a6a70 100644
--- a/drivers/staging/media/hantro/rockchip_vpu_hw.c
+++ b/drivers/staging/media/hantro/rockchip_vpu_hw.c
@@ -63,6 +63,14 @@ static const struct hantro_fmt rockchip_vpu1_postproc_fmts[] = {
.fourcc = V4L2_PIX_FMT_YUYV,
.codec_mode = HANTRO_MODE_NONE,
.postprocessed = true,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
};

@@ -70,17 +78,25 @@ static const struct hantro_fmt rk3066_vpu_dec_fmts[] = {
{
.fourcc = V4L2_PIX_FMT_NV12,
.codec_mode = HANTRO_MODE_NONE,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
{
.fourcc = V4L2_PIX_FMT_H264_SLICE,
.codec_mode = HANTRO_MODE_H264_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1920,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 1088,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -89,11 +105,11 @@ static const struct hantro_fmt rk3066_vpu_dec_fmts[] = {
.codec_mode = HANTRO_MODE_MPEG2_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1920,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 1088,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -102,11 +118,11 @@ static const struct hantro_fmt rk3066_vpu_dec_fmts[] = {
.codec_mode = HANTRO_MODE_VP8_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1920,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 1088,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -116,17 +132,25 @@ static const struct hantro_fmt rk3288_vpu_dec_fmts[] = {
{
.fourcc = V4L2_PIX_FMT_NV12,
.codec_mode = HANTRO_MODE_NONE,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_4K_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_4K_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
{
.fourcc = V4L2_PIX_FMT_H264_SLICE,
.codec_mode = HANTRO_MODE_H264_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 4096,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_4K_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 2304,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_4K_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -135,11 +159,11 @@ static const struct hantro_fmt rk3288_vpu_dec_fmts[] = {
.codec_mode = HANTRO_MODE_MPEG2_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1920,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 1088,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -148,31 +172,80 @@ static const struct hantro_fmt rk3288_vpu_dec_fmts[] = {
.codec_mode = HANTRO_MODE_VP8_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 3840,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 2160,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
.step_height = MB_DIM,
},
},
};

-static const struct hantro_fmt rk3399_vpu_dec_fmts[] = {
+static const struct hantro_fmt rockchip_vdpu2_dec_fmts[] = {
{
.fourcc = V4L2_PIX_FMT_NV12,
.codec_mode = HANTRO_MODE_NONE,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
{
.fourcc = V4L2_PIX_FMT_H264_SLICE,
.codec_mode = HANTRO_MODE_H264_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1920,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
+ },
+ {
+ .fourcc = V4L2_PIX_FMT_MPEG2_SLICE,
+ .codec_mode = HANTRO_MODE_MPEG2_DEC,
+ .max_depth = 2,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
+ },
+ {
+ .fourcc = V4L2_PIX_FMT_VP8_FRAME,
+ .codec_mode = HANTRO_MODE_VP8_DEC,
+ .max_depth = 2,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = MB_DIM,
+ },
+ },
+};
+
+static const struct hantro_fmt rk3399_vpu_dec_fmts[] = {
+ {
+ .fourcc = V4L2_PIX_FMT_NV12,
+ .codec_mode = HANTRO_MODE_NONE,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 1088,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -181,11 +254,11 @@ static const struct hantro_fmt rk3399_vpu_dec_fmts[] = {
.codec_mode = HANTRO_MODE_MPEG2_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1920,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_FHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 1088,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_FHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -194,11 +267,11 @@ static const struct hantro_fmt rk3399_vpu_dec_fmts[] = {
.codec_mode = HANTRO_MODE_VP8_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 3840,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 2160,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -516,8 +589,8 @@ const struct hantro_variant rk3288_vpu_variant = {

const struct hantro_variant rk3328_vpu_variant = {
.dec_offset = 0x400,
- .dec_fmts = rk3399_vpu_dec_fmts,
- .num_dec_fmts = ARRAY_SIZE(rk3399_vpu_dec_fmts),
+ .dec_fmts = rockchip_vdpu2_dec_fmts,
+ .num_dec_fmts = ARRAY_SIZE(rockchip_vdpu2_dec_fmts),
.codec = HANTRO_MPEG2_DECODER | HANTRO_VP8_DECODER |
HANTRO_H264_DECODER,
.codec_ops = rk3399_vpu_codec_ops,
@@ -528,6 +601,11 @@ const struct hantro_variant rk3328_vpu_variant = {
.num_clocks = ARRAY_SIZE(rockchip_vpu_clk_names),
};

+/*
+ * H.264 decoding explicitly disabled in RK3399.
+ * This ensures userspace applications use the Rockchip VDEC core,
+ * which has better performance.
+ */
const struct hantro_variant rk3399_vpu_variant = {
.enc_offset = 0x0,
.enc_fmts = rockchip_vpu_enc_fmts,
@@ -545,13 +623,27 @@ const struct hantro_variant rk3399_vpu_variant = {
.num_clocks = ARRAY_SIZE(rockchip_vpu_clk_names)
};

+const struct hantro_variant rk3568_vpu_variant = {
+ .dec_offset = 0x400,
+ .dec_fmts = rockchip_vdpu2_dec_fmts,
+ .num_dec_fmts = ARRAY_SIZE(rockchip_vdpu2_dec_fmts),
+ .codec = HANTRO_MPEG2_DECODER |
+ HANTRO_VP8_DECODER | HANTRO_H264_DECODER,
+ .codec_ops = rk3399_vpu_codec_ops,
+ .irqs = rockchip_vdpu2_irqs,
+ .num_irqs = ARRAY_SIZE(rockchip_vdpu2_irqs),
+ .init = rockchip_vpu_hw_init,
+ .clk_names = rockchip_vpu_clk_names,
+ .num_clocks = ARRAY_SIZE(rockchip_vpu_clk_names)
+};
+
const struct hantro_variant px30_vpu_variant = {
.enc_offset = 0x0,
.enc_fmts = rockchip_vpu_enc_fmts,
.num_enc_fmts = ARRAY_SIZE(rockchip_vpu_enc_fmts),
.dec_offset = 0x400,
- .dec_fmts = rk3399_vpu_dec_fmts,
- .num_dec_fmts = ARRAY_SIZE(rk3399_vpu_dec_fmts),
+ .dec_fmts = rockchip_vdpu2_dec_fmts,
+ .num_dec_fmts = ARRAY_SIZE(rockchip_vdpu2_dec_fmts),
.codec = HANTRO_JPEG_ENCODER | HANTRO_MPEG2_DECODER |
HANTRO_VP8_DECODER | HANTRO_H264_DECODER,
.codec_ops = rk3399_vpu_codec_ops,
diff --git a/drivers/staging/media/hantro/sama5d4_vdec_hw.c b/drivers/staging/media/hantro/sama5d4_vdec_hw.c
index b2fc1c5613e1..b205e2db5b04 100644
--- a/drivers/staging/media/hantro/sama5d4_vdec_hw.c
+++ b/drivers/staging/media/hantro/sama5d4_vdec_hw.c
@@ -16,6 +16,14 @@ static const struct hantro_fmt sama5d4_vdec_postproc_fmts[] = {
.fourcc = V4L2_PIX_FMT_YUYV,
.codec_mode = HANTRO_MODE_NONE,
.postprocessed = true,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_HD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_HD_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
};

@@ -23,17 +31,25 @@ static const struct hantro_fmt sama5d4_vdec_fmts[] = {
{
.fourcc = V4L2_PIX_FMT_NV12,
.codec_mode = HANTRO_MODE_NONE,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_HD_WIDTH,
+ .step_width = MB_DIM,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_HD_HEIGHT,
+ .step_height = MB_DIM,
+ },
},
{
.fourcc = V4L2_PIX_FMT_MPEG2_SLICE,
.codec_mode = HANTRO_MODE_MPEG2_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1280,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_HD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 720,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_HD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -42,11 +58,11 @@ static const struct hantro_fmt sama5d4_vdec_fmts[] = {
.codec_mode = HANTRO_MODE_VP8_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1280,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_HD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 720,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_HD_HEIGHT,
.step_height = MB_DIM,
},
},
@@ -55,11 +71,11 @@ static const struct hantro_fmt sama5d4_vdec_fmts[] = {
.codec_mode = HANTRO_MODE_H264_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 1280,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_HD_WIDTH,
.step_width = MB_DIM,
- .min_height = 48,
- .max_height = 720,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_HD_HEIGHT,
.step_height = MB_DIM,
},
},
diff --git a/drivers/staging/media/hantro/sunxi_vpu_hw.c b/drivers/staging/media/hantro/sunxi_vpu_hw.c
index c0edd5856a0c..fbeac81e59e1 100644
--- a/drivers/staging/media/hantro/sunxi_vpu_hw.c
+++ b/drivers/staging/media/hantro/sunxi_vpu_hw.c
@@ -14,6 +14,14 @@ static const struct hantro_fmt sunxi_vpu_postproc_fmts[] = {
.fourcc = V4L2_PIX_FMT_NV12,
.codec_mode = HANTRO_MODE_NONE,
.postprocessed = true,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = 32,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = 32,
+ },
},
};

@@ -21,17 +29,25 @@ static const struct hantro_fmt sunxi_vpu_dec_fmts[] = {
{
.fourcc = V4L2_PIX_FMT_NV12_4L4,
.codec_mode = HANTRO_MODE_NONE,
+ .frmsize = {
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
+ .step_width = 32,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
+ .step_height = 32,
+ },
},
{
.fourcc = V4L2_PIX_FMT_VP9_FRAME,
.codec_mode = HANTRO_MODE_VP9_DEC,
.max_depth = 2,
.frmsize = {
- .min_width = 48,
- .max_width = 3840,
+ .min_width = FMT_MIN_WIDTH,
+ .max_width = FMT_UHD_WIDTH,
.step_width = 32,
- .min_height = 48,
- .max_height = 2160,
+ .min_height = FMT_MIN_HEIGHT,
+ .max_height = FMT_UHD_HEIGHT,
.step_height = 32,
},
},
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
index 44f385be9f6c..04419381ea56 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
@@ -143,10 +143,13 @@ static void cedrus_h265_frame_info_write_dpb(struct cedrus_ctx *ctx,
for (i = 0; i < num_active_dpb_entries; i++) {
int buffer_index = vb2_find_timestamp(vq, dpb[i].timestamp, 0);
u32 pic_order_cnt[2] = {
- dpb[i].pic_order_cnt[0],
- dpb[i].pic_order_cnt[1]
+ dpb[i].pic_order_cnt_val,
+ dpb[i].pic_order_cnt_val
};

+ if (buffer_index < 0)
+ continue;
+
cedrus_h265_frame_info_write_single(ctx, i, dpb[i].field_pic,
pic_order_cnt,
buffer_index);
@@ -301,6 +304,31 @@ static void cedrus_h265_write_scaling_list(struct cedrus_ctx *ctx,
}
}

+static int cedrus_h265_is_low_delay(struct cedrus_run *run)
+{
+ const struct v4l2_ctrl_hevc_slice_params *slice_params;
+ const struct v4l2_hevc_dpb_entry *dpb;
+ s32 poc;
+ int i;
+
+ slice_params = run->h265.slice_params;
+ poc = run->h265.decode_params->pic_order_cnt_val;
+ dpb = run->h265.decode_params->dpb;
+
+ for (i = 0; i < slice_params->num_ref_idx_l0_active_minus1 + 1; i++)
+ if (dpb[slice_params->ref_idx_l0[i]].pic_order_cnt_val > poc)
+ return 1;
+
+ if (slice_params->slice_type != V4L2_HEVC_SLICE_TYPE_B)
+ return 0;
+
+ for (i = 0; i < slice_params->num_ref_idx_l1_active_minus1 + 1; i++)
+ if (dpb[slice_params->ref_idx_l1[i]].pic_order_cnt_val > poc)
+ return 1;
+
+ return 0;
+}
+
static void cedrus_h265_setup(struct cedrus_ctx *ctx,
struct cedrus_run *run)
{
@@ -559,7 +587,6 @@ static void cedrus_h265_setup(struct cedrus_ctx *ctx,

reg = VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_TC_OFFSET_DIV2(slice_params->slice_tc_offset_div2) |
VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_BETA_OFFSET_DIV2(slice_params->slice_beta_offset_div2) |
- VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_POC_BIGEST_IN_RPS_ST(decode_params->num_poc_st_curr_after == 0) |
VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CR_QP_OFFSET(slice_params->slice_cr_qp_offset) |
VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CB_QP_OFFSET(slice_params->slice_cb_qp_offset) |
VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_QP_DELTA(slice_params->slice_qp_delta);
@@ -572,6 +599,9 @@ static void cedrus_h265_setup(struct cedrus_ctx *ctx,
V4L2_HEVC_SLICE_PARAMS_FLAG_SLICE_LOOP_FILTER_ACROSS_SLICES_ENABLED,
slice_params->flags);

+ if (slice_params->slice_type != V4L2_HEVC_SLICE_TYPE_I && !cedrus_h265_is_low_delay(run))
+ reg |= VE_DEC_H265_DEC_SLICE_HDR_INFO1_FLAG_SLICE_NOT_LOW_DELAY;
+
cedrus_write(dev, VE_DEC_H265_DEC_SLICE_HDR_INFO1, reg);

chroma_log2_weight_denom = pred_weight_table->luma_log2_weight_denom +
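
The new cedrus_h265_is_low_delay() helper scans the active L0 references (and L1 for B slices) and reports whether any of them has a picture order count greater than the current picture's, i.e. whether a reference lies after the current frame in output order; the new VE_DEC_H265_DEC_SLICE_HDR_INFO1_FLAG_SLICE_NOT_LOW_DELAY bit is then set for P/B slices only when no such reference exists. Below is a standalone sketch of that check, using simplified types rather than the V4L2 stateless HEVC structures.

#include <stdbool.h>
#include <stdio.h>

static bool has_future_reference(int cur_poc, const int *ref_poc, int nrefs)
{
        int i;

        for (i = 0; i < nrefs; i++)
                if (ref_poc[i] > cur_poc)   /* reference after current picture */
                        return true;
        return false;
}

int main(void)
{
        const int l0[] = { 8, 4 };      /* both precede the current picture */
        const int l1[] = { 16 };        /* lies after it in output order */
        int cur_poc = 12;
        bool future = has_future_reference(cur_poc, l0, 2) ||
                      has_future_reference(cur_poc, l1, 1);

        printf("future reference present: %s\n", future ? "yes" : "no");
        return 0;
}
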
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
index bdb062ad8682..d81f7513ade0 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h
@@ -377,13 +377,12 @@

#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_FLAG_SLICE_DEBLOCKING_FILTER_DISABLED BIT(23)
#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_FLAG_SLICE_LOOP_FILTER_ACROSS_SLICES_ENABLED BIT(22)
+#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_FLAG_SLICE_NOT_LOW_DELAY BIT(21)

#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_TC_OFFSET_DIV2(v) \
SHIFT_AND_MASK_BITS(v, 31, 28)
#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_BETA_OFFSET_DIV2(v) \
SHIFT_AND_MASK_BITS(v, 27, 24)
-#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_POC_BIGEST_IN_RPS_ST(v) \
- ((v) ? BIT(21) : 0)
#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CR_QP_OFFSET(v) \
SHIFT_AND_MASK_BITS(v, 20, 16)
#define VE_DEC_H265_DEC_SLICE_HDR_INFO1_SLICE_CB_QP_OFFSET(v) \
diff --git a/drivers/staging/rtl8192u/r8192U.h b/drivers/staging/rtl8192u/r8192U.h
index 14ca00a2789b..1942cb849374 100644
--- a/drivers/staging/rtl8192u/r8192U.h
+++ b/drivers/staging/rtl8192u/r8192U.h
@@ -1013,7 +1013,7 @@ typedef struct r8192_priv {
bool bis_any_nonbepkts;
bool bcurrent_turbo_EDCA;
bool bis_cur_rdlstate;
- struct timer_list fsync_timer;
+ struct delayed_work fsync_work;
bool bfsync_processing; /* 500ms Fsync timer is active or not */
u32 rate_record;
u32 rateCountDiffRecord;
diff --git a/drivers/staging/rtl8192u/r8192U_dm.c b/drivers/staging/rtl8192u/r8192U_dm.c
index 725bf5ca9e34..0fcfcaa6500b 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.c
+++ b/drivers/staging/rtl8192u/r8192U_dm.c
@@ -2578,19 +2578,20 @@ static void dm_init_fsync(struct net_device *dev)
priv->ieee80211->fsync_seconddiff_ratethreshold = 200;
priv->ieee80211->fsync_state = Default_Fsync;
priv->framesyncMonitor = 1; /* current default 0xc38 monitor on */
- timer_setup(&priv->fsync_timer, dm_fsync_timer_callback, 0);
+ INIT_DELAYED_WORK(&priv->fsync_work, dm_fsync_work_callback);
}

static void dm_deInit_fsync(struct net_device *dev)
{
struct r8192_priv *priv = ieee80211_priv(dev);

- del_timer_sync(&priv->fsync_timer);
+ cancel_delayed_work_sync(&priv->fsync_work);
}

-void dm_fsync_timer_callback(struct timer_list *t)
+void dm_fsync_work_callback(struct work_struct *work)
{
- struct r8192_priv *priv = from_timer(priv, t, fsync_timer);
+ struct r8192_priv *priv =
+ container_of(work, struct r8192_priv, fsync_work.work);
struct net_device *dev = priv->ieee80211->dev;
u32 rate_index, rate_count = 0, rate_count_diff = 0;
bool bSwitchFromCountDiff = false;
@@ -2657,17 +2658,16 @@ void dm_fsync_timer_callback(struct timer_list *t)
}
}
if (bDoubleTimeInterval) {
- if (timer_pending(&priv->fsync_timer))
- del_timer_sync(&priv->fsync_timer);
- priv->fsync_timer.expires = jiffies +
- msecs_to_jiffies(priv->ieee80211->fsync_time_interval*priv->ieee80211->fsync_multiple_timeinterval);
- add_timer(&priv->fsync_timer);
+ cancel_delayed_work_sync(&priv->fsync_work);
+ schedule_delayed_work(&priv->fsync_work,
+ msecs_to_jiffies(priv
+ ->ieee80211->fsync_time_interval *
+ priv->ieee80211->fsync_multiple_timeinterval));
} else {
- if (timer_pending(&priv->fsync_timer))
- del_timer_sync(&priv->fsync_timer);
- priv->fsync_timer.expires = jiffies +
- msecs_to_jiffies(priv->ieee80211->fsync_time_interval);
- add_timer(&priv->fsync_timer);
+ cancel_delayed_work_sync(&priv->fsync_work);
+ schedule_delayed_work(&priv->fsync_work,
+ msecs_to_jiffies(priv
+ ->ieee80211->fsync_time_interval));
}
} else {
/* Let Register return to default value; */
@@ -2695,7 +2695,7 @@ static void dm_EndSWFsync(struct net_device *dev)
struct r8192_priv *priv = ieee80211_priv(dev);

RT_TRACE(COMP_HALDM, "%s\n", __func__);
- del_timer_sync(&(priv->fsync_timer));
+ cancel_delayed_work_sync(&priv->fsync_work);

/* Let Register return to default value; */
if (priv->bswitch_fsync) {
@@ -2736,11 +2736,9 @@ static void dm_StartSWFsync(struct net_device *dev)
if (priv->ieee80211->fsync_rate_bitmap & rateBitmap)
priv->rate_record += priv->stats.received_rate_histogram[1][rateIndex];
}
- if (timer_pending(&priv->fsync_timer))
- del_timer_sync(&priv->fsync_timer);
- priv->fsync_timer.expires = jiffies +
- msecs_to_jiffies(priv->ieee80211->fsync_time_interval);
- add_timer(&priv->fsync_timer);
+ cancel_delayed_work_sync(&priv->fsync_work);
+ schedule_delayed_work(&priv->fsync_work,
+ msecs_to_jiffies(priv->ieee80211->fsync_time_interval));

write_nic_dword(dev, rOFDM0_RxDetector2, 0x465c12cd);
}
diff --git a/drivers/staging/rtl8192u/r8192U_dm.h b/drivers/staging/rtl8192u/r8192U_dm.h
index 0b2a1c688597..2159018b4e38 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -166,7 +166,7 @@ void dm_force_tx_fw_info(struct net_device *dev,
void dm_init_edca_turbo(struct net_device *dev);
void dm_rf_operation_test_callback(unsigned long data);
void dm_rf_pathcheck_workitemcallback(struct work_struct *work);
-void dm_fsync_timer_callback(struct timer_list *t);
+void dm_fsync_work_callback(struct work_struct *work);
void dm_cck_txpower_adjust(struct net_device *dev, bool binch14);
void dm_shadow_init(struct net_device *dev);
void dm_initialize_txpower_tracking(struct net_device *dev);
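
The rtl8192u hunks convert the self-rearming fsync timer into a delayed work item, so the periodic handling now runs in process context where it may sleep. Below is a minimal sketch of that timer-to-delayed-work conversion pattern, with illustrative names rather than the driver's structures.

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

struct demo_priv {
        struct delayed_work poll_work;
        unsigned int interval_ms;
};

static void demo_poll_work(struct work_struct *work)
{
        struct demo_priv *priv =
                container_of(work, struct demo_priv, poll_work.work);

        /* periodic handling goes here; sleeping is allowed in this context */

        /* re-arm, much as the old timer callback did with add_timer() */
        schedule_delayed_work(&priv->poll_work,
                              msecs_to_jiffies(priv->interval_ms));
}

static void demo_start(struct demo_priv *priv)
{
        INIT_DELAYED_WORK(&priv->poll_work, demo_poll_work);
        schedule_delayed_work(&priv->poll_work,
                              msecs_to_jiffies(priv->interval_ms));
}

static void demo_stop(struct demo_priv *priv)
{
        /* waits for a running callback and removes any pending one */
        cancel_delayed_work_sync(&priv->poll_work);
}
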
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 1c4aac8464a7..1e5a78131aba 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -813,12 +813,13 @@ static const struct attribute_group cooling_device_stats_attr_group = {

static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
{
+ const struct attribute_group *stats_attr_group = NULL;
struct cooling_dev_stats *stats;
unsigned long states;
int var;

if (cdev->ops->get_max_state(cdev, &states))
- return;
+ goto out;

states++; /* Total number of states is highest state + 1 */

@@ -828,7 +829,7 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)

stats = kzalloc(var, GFP_KERNEL);
if (!stats)
- return;
+ goto out;

stats->time_in_state = (ktime_t *)(stats + 1);
stats->trans_table = (unsigned int *)(stats->time_in_state + states);
@@ -838,9 +839,12 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)

spin_lock_init(&stats->lock);

+ stats_attr_group = &cooling_device_stats_attr_group;
+
+out:
/* Fill the empty slot left in cooling_device_attr_groups */
var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
- cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
+ cooling_device_attr_groups[var] = stats_attr_group;
}

static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
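
The thermal fix relies on cooling_device_attr_groups reserving the slot just before its NULL terminator for the optional stats group; with the goto out path that slot is now written with NULL whenever the stats buffer cannot be set up, so the stats attributes are never exposed without backing data. Below is a hedged sketch of that sysfs pattern, with illustrative names only.

#include <linux/kernel.h>
#include <linux/sysfs.h>

static const struct attribute_group demo_base_group = { };     /* always shown */
static const struct attribute_group demo_stats_group = { };    /* optional */

static const struct attribute_group *demo_groups[] = {
        &demo_base_group,
        NULL,   /* reserved: &demo_stats_group or NULL, filled at setup time */
        NULL,   /* terminator */
};

static void demo_stats_setup(void *stats_buffer)
{
        const struct attribute_group *grp = NULL;

        /* expose the stats group only when its backing data really exists */
        if (stats_buffer)
                grp = &demo_stats_group;

        demo_groups[ARRAY_SIZE(demo_groups) - 2] = grp;
}
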
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 9f2a8c0e1e33..3dd71ede4cd4 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -5,6 +5,14 @@
*
* * THIS IS A DEVELOPMENT SNAPSHOT IT IS NOT A FINAL RELEASE *
*
+ * Outgoing path:
+ * tty -> DLCI fifo -> scheduler -> GSM MUX data queue ---o-> ldisc
+ * control message -> GSM MUX control queue --´
+ *
+ * Incoming path:
+ * ldisc -> gsm_queue() -o--> tty
+ * `-> gsm_control_response()
+ *
* TO DO:
* Mostly done: ioctls for setting modes/timing
* Partly done: hooks so you can pull off frames to non tty devs
@@ -210,6 +218,9 @@ struct gsm_mux {
/* Events on the GSM channel */
wait_queue_head_t event;

+ /* ldisc send work */
+ struct work_struct tx_work;
+
/* Bits for GSM mode decoding */

/* Framing Layer */
@@ -235,14 +246,17 @@ struct gsm_mux {
struct gsm_dlci *dlci[NUM_DLCI];
int old_c_iflag; /* termios c_iflag value before attach */
bool constipated; /* Asked by remote to shut up */
+ bool has_devices; /* Devices were registered */

spinlock_t tx_lock;
unsigned int tx_bytes; /* TX data outstanding */
#define TX_THRESH_HI 8192
#define TX_THRESH_LO 2048
- struct list_head tx_list; /* Pending data packets */
+ struct list_head tx_ctrl_list; /* Pending control packets */
+ struct list_head tx_data_list; /* Pending data packets */

/* Control messages */
+ struct timer_list kick_timer; /* Kick TX queuing on timeout */
struct timer_list t2_timer; /* Retransmit timer for commands */
int cretries; /* Command retry counter */
struct gsm_control *pending_cmd;/* Our current pending command */
@@ -369,6 +383,11 @@ static const u8 gsm_fcs8[256] = {

static int gsmld_output(struct gsm_mux *gsm, u8 *data, int len);
static int gsm_modem_update(struct gsm_dlci *dlci, u8 brk);
+static struct gsm_msg *gsm_data_alloc(struct gsm_mux *gsm, u8 addr, int len,
+ u8 ctrl);
+static int gsm_send_packet(struct gsm_mux *gsm, struct gsm_msg *msg);
+static void gsmld_write_trigger(struct gsm_mux *gsm);
+static void gsmld_write_task(struct work_struct *work);

/**
* gsm_fcs_add - update FCS
@@ -419,6 +438,27 @@ static int gsm_read_ea(unsigned int *val, u8 c)
return c & EA;
}

+/**
+ * gsm_read_ea_val - read a value until EA
+ * @val: variable holding value
+ * @data: buffer of data
+ * @dlen: length of data
+ *
+ * Processes an EA value. Updates the passed variable and
+ * returns the processed data length.
+ */
+static unsigned int gsm_read_ea_val(unsigned int *val, const u8 *data, int dlen)
+{
+ unsigned int len = 0;
+
+ for (; dlen > 0; dlen--) {
+ len++;
+ if (gsm_read_ea(val, *data++))
+ break;
+ }
+ return len;
+}
+
/**
* gsm_encode_modem - encode modem data bits
* @dlci: DLCI to encode from
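
The gsm_read_ea_val() helper added above accumulates a 3GPP TS 07.10 value spread over one or more octets: bits 7..1 of each octet carry data and bit 0 is the EA flag that terminates the field. Below is a standalone sketch of that decoding, a simplified rewrite for illustration rather than the driver's helper.

#include <stdio.h>

static unsigned int read_ea_val(unsigned int *val, const unsigned char *data,
                                int dlen)
{
        unsigned int used = 0;

        *val = 0;
        while (dlen-- > 0) {
                unsigned char c = data[used++];

                *val = (*val << 7) | (c >> 1);
                if (c & 1)              /* EA bit set: last octet */
                        break;
        }
        return used;                    /* octets consumed */
}

int main(void)
{
        /* 0x82 0x31: EA clear then set -> value (0x41 << 7) | 0x18 = 0x2098 */
        const unsigned char buf[] = { 0x82, 0x31 };
        unsigned int v, n;

        n = read_ea_val(&v, buf, sizeof(buf));
        printf("consumed %u octet(s), value 0x%x\n", n, v);
        return 0;
}
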
@@ -463,6 +503,68 @@ static void gsm_hex_dump_bytes(const char *fname, const u8 *data,
kfree(prefix);
}

+/**
+ * gsm_register_devices - register all tty devices for a given mux index
+ *
+ * @driver: the tty driver that describes the tty devices
+ * @index: the mux number is used to calculate the minor numbers of the
+ * ttys for this mux and may differ from the position in the
+ * mux array.
+ */
+static int gsm_register_devices(struct tty_driver *driver, unsigned int index)
+{
+ struct device *dev;
+ int i;
+ unsigned int base;
+
+ if (!driver || index >= MAX_MUX)
+ return -EINVAL;
+
+ base = index * NUM_DLCI; /* first minor for this index */
+ for (i = 1; i < NUM_DLCI; i++) {
+ /* Don't register device 0 - this is the control channel
+ * and not a usable tty interface
+ */
+ dev = tty_register_device(gsm_tty_driver, base + i, NULL);
+ if (IS_ERR(dev)) {
+ if (debug & 8)
+ pr_info("%s failed to register device minor %u",
+ __func__, base + i);
+ for (i--; i >= 1; i--)
+ tty_unregister_device(gsm_tty_driver, base + i);
+ return PTR_ERR(dev);
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * gsm_unregister_devices - unregister all tty devices for a given mux index
+ *
+ * @driver: the tty driver that describes the tty devices
+ * @index: the mux number is used to calculate the minor numbers of the
+ * ttys for this mux and may differ from the position in the
+ * mux array.
+ */
+static void gsm_unregister_devices(struct tty_driver *driver,
+ unsigned int index)
+{
+ int i;
+ unsigned int base;
+
+ if (!driver || index >= MAX_MUX)
+ return;
+
+ base = index * NUM_DLCI; /* first minor for this index */
+ for (i = 1; i < NUM_DLCI; i++) {
+ /* Don't unregister device 0 - this is the control
+ * channel and not a usable tty interface
+ */
+ tty_unregister_device(gsm_tty_driver, base + i);
+ }
+}
+
/**
* gsm_print_packet - display a frame for debug
* @hdr: header to print before decode
@@ -570,57 +672,73 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len)
* @cr: command/response bit seen as initiator
* @control: control byte including PF bit
*
- * Format up and transmit a control frame. These do not go via the
- * queueing logic as they should be transmitted ahead of data when
- * they are needed.
- *
- * FIXME: Lock versus data TX path
+ * Format up and transmit a control frame. These should be transmitted
+ * ahead of data when they are needed.
*/
-
-static void gsm_send(struct gsm_mux *gsm, int addr, int cr, int control)
+static int gsm_send(struct gsm_mux *gsm, int addr, int cr, int control)
{
- int len;
- u8 cbuf[10];
- u8 ibuf[3];
+ struct gsm_msg *msg;
+ u8 *dp;
int ocr;
+ unsigned long flags;
+
+ msg = gsm_data_alloc(gsm, addr, 0, control);
+ if (!msg)
+ return -ENOMEM;

/* toggle C/R coding if not initiator */
ocr = cr ^ (gsm->initiator ? 0 : 1);

- switch (gsm->encoding) {
- case 0:
- cbuf[0] = GSM0_SOF;
- cbuf[1] = (addr << 2) | (ocr << 1) | EA;
- cbuf[2] = control;
- cbuf[3] = EA; /* Length of data = 0 */
- cbuf[4] = 0xFF - gsm_fcs_add_block(INIT_FCS, cbuf + 1, 3);
- cbuf[5] = GSM0_SOF;
- len = 6;
- break;
- case 1:
- case 2:
- /* Control frame + packing (but not frame stuffing) in mode 1 */
- ibuf[0] = (addr << 2) | (ocr << 1) | EA;
- ibuf[1] = control;
- ibuf[2] = 0xFF - gsm_fcs_add_block(INIT_FCS, ibuf, 2);
- /* Stuffing may double the size worst case */
- len = gsm_stuff_frame(ibuf, cbuf + 1, 3);
- /* Now add the SOF markers */
- cbuf[0] = GSM1_SOF;
- cbuf[len + 1] = GSM1_SOF;
- /* FIXME: we can omit the lead one in many cases */
- len += 2;
- break;
- default:
- WARN_ON(1);
- return;
- }
- gsmld_output(gsm, cbuf, len);
- if (!gsm->initiator) {
- cr = cr & gsm->initiator;
- control = control & ~PF;
+ msg->data -= 3;
+ dp = msg->data;
+ *dp++ = (addr << 2) | (ocr << 1) | EA;
+ *dp++ = control;
+
+ if (gsm->encoding == 0)
+ *dp++ = EA; /* Length of data = 0 */
+
+ *dp = 0xFF - gsm_fcs_add_block(INIT_FCS, msg->data, dp - msg->data);
+ msg->len = (dp - msg->data) + 1;
+
+ gsm_print_packet("Q->", addr, cr, control, NULL, 0);
+
+ spin_lock_irqsave(&gsm->tx_lock, flags);
+ list_add_tail(&msg->list, &gsm->tx_ctrl_list);
+ gsm->tx_bytes += msg->len;
+ spin_unlock_irqrestore(&gsm->tx_lock, flags);
+ gsmld_write_trigger(gsm);
+
+ return 0;
+}
+
+/**
+ * gsm_dlci_clear_queues - remove outstanding data for a DLCI
+ * @gsm: mux
+ * @dlci: clear for this DLCI
+ *
+ * Clears the data queues for a given DLCI.
+ */
+static void gsm_dlci_clear_queues(struct gsm_mux *gsm, struct gsm_dlci *dlci)
+{
+ struct gsm_msg *msg, *nmsg;
+ int addr = dlci->addr;
+ unsigned long flags;
+
+ /* Clear DLCI write fifo first */
+ spin_lock_irqsave(&dlci->lock, flags);
+ kfifo_reset(&dlci->fifo);
+ spin_unlock_irqrestore(&dlci->lock, flags);
+
+ /* Clear data packets in MUX write queue */
+ spin_lock_irqsave(&gsm->tx_lock, flags);
+ list_for_each_entry_safe(msg, nmsg, &gsm->tx_data_list, list) {
+ if (msg->addr != addr)
+ continue;
+ gsm->tx_bytes -= msg->len;
+ list_del(&msg->list);
+ kfree(msg);
}
- gsm_print_packet("-->", addr, cr, control, NULL, 0);
+ spin_unlock_irqrestore(&gsm->tx_lock, flags);
}

/**
@@ -683,59 +801,151 @@ static struct gsm_msg *gsm_data_alloc(struct gsm_mux *gsm, u8 addr, int len,
}

/**
- * gsm_data_kick - poke the queue
+ * gsm_send_packet - sends a single packet
* @gsm: GSM Mux
- * @dlci: DLCI sending the data
+ * @msg: packet to send
*
- * The tty device has called us to indicate that room has appeared in
- * the transmit queue. Ram more data into the pipe if we have any
- * If we have been flow-stopped by a CMD_FCOFF, then we can only
- * send messages on DLCI0 until CMD_FCON
+ * The given packet is encoded and sent out. No memory is freed.
+ * The caller must hold the gsm tx lock.
+ */
+static int gsm_send_packet(struct gsm_mux *gsm, struct gsm_msg *msg)
+{
+ int len, ret;
+
+
+ if (gsm->encoding == 0) {
+ gsm->txframe[0] = GSM0_SOF;
+ memcpy(gsm->txframe + 1, msg->data, msg->len);
+ gsm->txframe[msg->len + 1] = GSM0_SOF;
+ len = msg->len + 2;
+ } else {
+ gsm->txframe[0] = GSM1_SOF;
+ len = gsm_stuff_frame(msg->data, gsm->txframe + 1, msg->len);
+ gsm->txframe[len + 1] = GSM1_SOF;
+ len += 2;
+ }
+
+ if (debug & 4)
+ gsm_hex_dump_bytes(__func__, gsm->txframe, len);
+ gsm_print_packet("-->", msg->addr, gsm->initiator, msg->ctrl, msg->data,
+ msg->len);
+
+ ret = gsmld_output(gsm, gsm->txframe, len);
+ if (ret <= 0)
+ return ret;
+ /* FIXME: Can eliminate one SOF in many more cases */
+ gsm->tx_bytes -= msg->len;
+
+ return 0;
+}
+
+/**
+ * gsm_is_flow_ctrl_msg - checks if flow control message
+ * @msg: message to check
*
- * FIXME: lock against link layer control transmissions
+ * Returns true if the given message is a flow control command of the
+ * control channel. False is returned in any other case.
*/
+static bool gsm_is_flow_ctrl_msg(struct gsm_msg *msg)
+{
+ unsigned int cmd;
+
+ if (msg->addr > 0)
+ return false;
+
+ switch (msg->ctrl & ~PF) {
+ case UI:
+ case UIH:
+ cmd = 0;
+ if (gsm_read_ea_val(&cmd, msg->data + 2, msg->len - 2) < 1)
+ break;
+ switch (cmd & ~PF) {
+ case CMD_FCOFF:
+ case CMD_FCON:
+ return true;
+ }
+ break;
+ }
+
+ return false;
+}

-static void gsm_data_kick(struct gsm_mux *gsm, struct gsm_dlci *dlci)
+/**
+ * gsm_data_kick - poke the queue
+ * @gsm: GSM Mux
+ *
+ * The tty device has called us to indicate that room has appeared in
+ * the transmit queue. Ram more data into the pipe if we have any.
+ * If we have been flow-stopped by a CMD_FCOFF, then we can only
+ * send messages on DLCI0 until CMD_FCON. The caller must hold
+ * the gsm tx lock.
+ */
+static int gsm_data_kick(struct gsm_mux *gsm)
{
struct gsm_msg *msg, *nmsg;
- int len;
+ struct gsm_dlci *dlci;
+ int ret;

- list_for_each_entry_safe(msg, nmsg, &gsm->tx_list, list) {
- if (gsm->constipated && msg->addr)
- continue;
- if (gsm->encoding != 0) {
- gsm->txframe[0] = GSM1_SOF;
- len = gsm_stuff_frame(msg->data,
- gsm->txframe + 1, msg->len);
- gsm->txframe[len + 1] = GSM1_SOF;
- len += 2;
- } else {
- gsm->txframe[0] = GSM0_SOF;
- memcpy(gsm->txframe + 1 , msg->data, msg->len);
- gsm->txframe[msg->len + 1] = GSM0_SOF;
- len = msg->len + 2;
- }
+ clear_bit(TTY_DO_WRITE_WAKEUP, &gsm->tty->flags);

- if (debug & 4)
- gsm_hex_dump_bytes(__func__, gsm->txframe, len);
- if (gsmld_output(gsm, gsm->txframe, len) <= 0)
+ /* Serialize control messages and control channel messages first */
+ list_for_each_entry_safe(msg, nmsg, &gsm->tx_ctrl_list, list) {
+ if (gsm->constipated && !gsm_is_flow_ctrl_msg(msg))
+ continue;
+ ret = gsm_send_packet(gsm, msg);
+ switch (ret) {
+ case -ENOSPC:
+ return -ENOSPC;
+ case -ENODEV:
+ /* ldisc not open */
+ gsm->tx_bytes -= msg->len;
+ list_del(&msg->list);
+ kfree(msg);
+ continue;
+ default:
+ if (ret >= 0) {
+ list_del(&msg->list);
+ kfree(msg);
+ }
break;
- /* FIXME: Can eliminate one SOF in many more cases */
- gsm->tx_bytes -= msg->len;
-
- list_del(&msg->list);
- kfree(msg);
+ }
+ }

- if (dlci) {
- tty_port_tty_wakeup(&dlci->port);
- } else {
- int i = 0;
+ if (gsm->constipated)
+ return -EAGAIN;

- for (i = 0; i < NUM_DLCI; i++)
- if (gsm->dlci[i])
- tty_port_tty_wakeup(&gsm->dlci[i]->port);
+ /* Serialize other channels */
+ if (list_empty(&gsm->tx_data_list))
+ return 0;
+ list_for_each_entry_safe(msg, nmsg, &gsm->tx_data_list, list) {
+ dlci = gsm->dlci[msg->addr];
+ /* Send only messages for DLCIs with valid state */
+ if (dlci->state != DLCI_OPEN) {
+ gsm->tx_bytes -= msg->len;
+ list_del(&msg->list);
+ kfree(msg);
+ continue;
+ }
+ ret = gsm_send_packet(gsm, msg);
+ switch (ret) {
+ case -ENOSPC:
+ return -ENOSPC;
+ case -ENODEV:
+ /* ldisc not open */
+ gsm->tx_bytes -= msg->len;
+ list_del(&msg->list);
+ kfree(msg);
+ continue;
+ default:
+ if (ret >= 0) {
+ list_del(&msg->list);
+ kfree(msg);
+ }
+ break;
}
}
+
+ return 1;
}

/**
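
The reworked gsm_data_kick() drains two queues in a fixed order: everything on tx_ctrl_list goes out first (and, while flow-controlled, only flow-control messages do), then frames on tx_data_list follow, and the pass stops as soon as the ldisc reports no space. Below is a standalone sketch of that two-queue ordering, a simplified model rather than the driver's gsm_msg lists.

#include <stdio.h>

#define NOSPACE (-1)

struct frame { int len; };

/* pretend ldisc: accepts frames until the byte budget runs out */
static int ldisc_write(int *budget, const struct frame *f)
{
        if (*budget < f->len)
                return NOSPACE;
        *budget -= f->len;
        return f->len;
}

static int drain(int *budget, const struct frame *q, int n, const char *name)
{
        int i;

        for (i = 0; i < n; i++) {
                if (ldisc_write(budget, &q[i]) == NOSPACE)
                        return i;       /* frames actually sent */
                printf("sent %s frame %d (%d bytes)\n", name, i, q[i].len);
        }
        return n;
}

int main(void)
{
        struct frame ctrl[] = { { 6 }, { 6 } };
        struct frame data[] = { { 64 }, { 64 } };
        int budget = 100;

        /* control first, then data; the short budget cuts the data queue off */
        drain(&budget, ctrl, 2, "ctrl");
        drain(&budget, data, 2, "data");
        return 0;
}
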
@@ -784,9 +994,22 @@ static void __gsm_data_queue(struct gsm_dlci *dlci, struct gsm_msg *msg)
msg->data = dp;

/* Add to the actual output queue */
- list_add_tail(&msg->list, &gsm->tx_list);
+ switch (msg->ctrl & ~PF) {
+ case UI:
+ case UIH:
+ if (msg->addr > 0) {
+ list_add_tail(&msg->list, &gsm->tx_data_list);
+ break;
+ }
+ fallthrough;
+ default:
+ list_add_tail(&msg->list, &gsm->tx_ctrl_list);
+ break;
+ }
gsm->tx_bytes += msg->len;
- gsm_data_kick(gsm, dlci);
+
+ gsmld_write_trigger(gsm);
+ mod_timer(&gsm->kick_timer, jiffies + 10 * gsm->t1 * HZ / 100);
}

/**
@@ -823,41 +1046,48 @@ static int gsm_dlci_data_output(struct gsm_mux *gsm, struct gsm_dlci *dlci)
{
struct gsm_msg *msg;
u8 *dp;
- int len, total_size, size;
- int h = dlci->adaption - 1;
+ int h, len, size;

- total_size = 0;
- while (1) {
- len = kfifo_len(&dlci->fifo);
- if (len == 0)
- return total_size;
-
- /* MTU/MRU count only the data bits */
- if (len > gsm->mtu)
- len = gsm->mtu;
-
- size = len + h;
-
- msg = gsm_data_alloc(gsm, dlci->addr, size, gsm->ftype);
- /* FIXME: need a timer or something to kick this so it can't
- get stuck with no work outstanding and no buffer free */
- if (msg == NULL)
- return -ENOMEM;
- dp = msg->data;
- switch (dlci->adaption) {
- case 1: /* Unstructured */
- break;
- case 2: /* Unstructed with modem bits.
- Always one byte as we never send inline break data */
- *dp++ = (gsm_encode_modem(dlci) << 1) | EA;
- break;
- }
- WARN_ON(kfifo_out_locked(&dlci->fifo, dp , len, &dlci->lock) != len);
- __gsm_data_queue(dlci, msg);
- total_size += size;
+ /* for modem bits without break data */
+ h = ((dlci->adaption == 1) ? 0 : 1);
+
+ len = kfifo_len(&dlci->fifo);
+ if (len == 0)
+ return 0;
+
+ /* MTU/MRU count only the data bits but watch adaption mode */
+ if ((len + h) > gsm->mtu)
+ len = gsm->mtu - h;
+
+ size = len + h;
+
+ msg = gsm_data_alloc(gsm, dlci->addr, size, gsm->ftype);
+ if (!msg)
+ return -ENOMEM;
+ dp = msg->data;
+ switch (dlci->adaption) {
+ case 1: /* Unstructured */
+ break;
+ case 2: /* Unstructured with modem bits.
+ * Always one byte as we never send inline break data
+ */
+ *dp++ = (gsm_encode_modem(dlci) << 1) | EA;
+ break;
+ default:
+ pr_err("%s: unsupported adaption %d\n", __func__,
+ dlci->adaption);
+ break;
}
+
+ WARN_ON(len != kfifo_out_locked(&dlci->fifo, dp, len,
+ &dlci->lock));
+
+ /* Notify upper layer about available send space. */
+ tty_port_tty_wakeup(&dlci->port);
+
+ __gsm_data_queue(dlci, msg);
/* Bytes of data we used up */
- return total_size;
+ return size;
}

/**
@@ -908,9 +1138,6 @@ static int gsm_dlci_data_output_framed(struct gsm_mux *gsm,

size = len + overhead;
msg = gsm_data_alloc(gsm, dlci->addr, size, gsm->ftype);
-
- /* FIXME: need a timer or something to kick this so it can't
- get stuck with no work outstanding and no buffer free */
if (msg == NULL) {
skb_queue_tail(&dlci->skb_list, dlci->skb);
dlci->skb = NULL;
@@ -1006,32 +1233,43 @@ static int gsm_dlci_modem_output(struct gsm_mux *gsm, struct gsm_dlci *dlci,
* renegotiate DLCI priorities with optional stuff. Needs optimising.
*/

-static void gsm_dlci_data_sweep(struct gsm_mux *gsm)
+static int gsm_dlci_data_sweep(struct gsm_mux *gsm)
{
- int len;
/* Priority ordering: We should do priority with RR of the groups */
- int i = 1;
-
- while (i < NUM_DLCI) {
- struct gsm_dlci *dlci;
+ int i, len, ret = 0;
+ bool sent;
+ struct gsm_dlci *dlci;

- if (gsm->tx_bytes > TX_THRESH_HI)
- break;
- dlci = gsm->dlci[i];
- if (dlci == NULL || dlci->constipated) {
- i++;
- continue;
+ while (gsm->tx_bytes < TX_THRESH_HI) {
+ for (sent = false, i = 1; i < NUM_DLCI; i++) {
+ dlci = gsm->dlci[i];
+ /* skip unused or blocked channel */
+ if (!dlci || dlci->constipated)
+ continue;
+ /* skip channels with invalid state */
+ if (dlci->state != DLCI_OPEN)
+ continue;
+ /* count the sent data per adaption */
+ if (dlci->adaption < 3 && !dlci->net)
+ len = gsm_dlci_data_output(gsm, dlci);
+ else
+ len = gsm_dlci_data_output_framed(gsm, dlci);
+ /* on error exit */
+ if (len < 0)
+ return ret;
+ if (len > 0) {
+ ret++;
+ sent = true;
+ /* The lower DLCs can starve the higher DLCs! */
+ break;
+ }
+ /* try next */
}
- if (dlci->adaption < 3 && !dlci->net)
- len = gsm_dlci_data_output(gsm, dlci);
- else
- len = gsm_dlci_data_output_framed(gsm, dlci);
- if (len < 0)
+ if (!sent)
break;
- /* DLCI empty - try the next */
- if (len == 0)
- i++;
- }
+ };
+
+ return ret;
}

/**
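
The reworked gsm_dlci_data_sweep() restarts its scan from DLCI 1 after every frame it manages to queue, which gives strict priority to low channel numbers; the in-code comment notes that lower DLCIs can therefore starve higher ones. Below is a standalone sketch of that restart-from-the-top behaviour.

#include <stdio.h>

#define NCHAN 4

int main(void)
{
        int pending[NCHAN] = { 0, 3, 3, 3 };    /* frames queued per channel */
        int budget = 5;                         /* frames we may send now */

        while (budget > 0) {
                int i, sent = 0;

                for (i = 1; i < NCHAN; i++) {   /* channel 0 is control */
                        if (!pending[i])
                                continue;
                        pending[i]--;
                        budget--;
                        printf("sent a frame on channel %d\n", i);
                        sent = 1;
                        break;                  /* restart from channel 1 */
                }
                if (!sent)
                        break;
        }
        /* channel 3 never gets served with this budget */
        return 0;
}
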
@@ -1277,7 +1515,6 @@ static void gsm_control_message(struct gsm_mux *gsm, unsigned int command,
const u8 *data, int clen)
{
u8 buf[1];
- unsigned long flags;

switch (command) {
case CMD_CLD: {
@@ -1299,9 +1536,7 @@ static void gsm_control_message(struct gsm_mux *gsm, unsigned int command,
gsm->constipated = false;
gsm_control_reply(gsm, CMD_FCON, NULL, 0);
/* Kick the link in case it is idling */
- spin_lock_irqsave(&gsm->tx_lock, flags);
- gsm_data_kick(gsm, NULL);
- spin_unlock_irqrestore(&gsm->tx_lock, flags);
+ gsmld_write_trigger(gsm);
break;
case CMD_FCOFF:
/* Modem wants us to STFU */
@@ -1407,7 +1642,7 @@ static void gsm_control_retransmit(struct timer_list *t)
spin_lock_irqsave(&gsm->control_lock, flags);
ctrl = gsm->pending_cmd;
if (ctrl) {
- if (gsm->cretries == 0) {
+ if (gsm->cretries == 0 || !gsm->dlci[0] || gsm->dlci[0]->dead) {
gsm->pending_cmd = NULL;
ctrl->error = -ETIMEDOUT;
ctrl->done = 1;
@@ -1504,25 +1739,24 @@ static int gsm_control_wait(struct gsm_mux *gsm, struct gsm_control *control)

static void gsm_dlci_close(struct gsm_dlci *dlci)
{
- unsigned long flags;
-
del_timer(&dlci->t1);
if (debug & 8)
pr_debug("DLCI %d goes closed.\n", dlci->addr);
dlci->state = DLCI_CLOSED;
+ /* Prevent us from sending data before the link is up again */
+ dlci->constipated = true;
if (dlci->addr != 0) {
tty_port_tty_hangup(&dlci->port, false);
- spin_lock_irqsave(&dlci->lock, flags);
- kfifo_reset(&dlci->fifo);
- spin_unlock_irqrestore(&dlci->lock, flags);
+ gsm_dlci_clear_queues(dlci->gsm, dlci);
/* Ensure that gsmtty_open() can return. */
tty_port_set_initialized(&dlci->port, 0);
wake_up_interruptible(&dlci->port.open_wait);
} else
dlci->gsm->dead = true;
- wake_up(&dlci->gsm->event);
/* A DLCI 0 close is a MUX termination so we need to kick that
back to userspace somehow */
+ gsm_dlci_data_kick(dlci);
+ wake_up(&dlci->gsm->event);
}

/**
@@ -1539,11 +1773,13 @@ static void gsm_dlci_open(struct gsm_dlci *dlci)
del_timer(&dlci->t1);
/* This will let a tty open continue */
dlci->state = DLCI_OPEN;
+ dlci->constipated = false;
if (debug & 8)
pr_debug("DLCI %d goes open.\n", dlci->addr);
/* Send current modem state */
if (dlci->addr)
gsm_modem_update(dlci, 0);
+ gsm_dlci_data_kick(dlci);
wake_up(&dlci->gsm->event);
}

@@ -1569,8 +1805,8 @@ static void gsm_dlci_t1(struct timer_list *t)

switch (dlci->state) {
case DLCI_OPENING:
- dlci->retries--;
if (dlci->retries) {
+ dlci->retries--;
gsm_command(dlci->gsm, dlci->addr, SABM|PF);
mod_timer(&dlci->t1, jiffies + gsm->t1 * HZ / 100);
} else if (!dlci->addr && gsm->control == (DM | PF)) {
@@ -1585,8 +1821,8 @@ static void gsm_dlci_t1(struct timer_list *t)

break;
case DLCI_CLOSING:
- dlci->retries--;
if (dlci->retries) {
+ dlci->retries--;
gsm_command(dlci->gsm, dlci->addr, DISC|PF);
mod_timer(&dlci->t1, jiffies + gsm->t1 * HZ / 100);
} else
@@ -1619,6 +1855,25 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci)
mod_timer(&dlci->t1, jiffies + gsm->t1 * HZ / 100);
}

+/**
+ * gsm_dlci_set_opening - change state to opening
+ * @dlci: DLCI to open
+ *
+ * Change internal state to wait for DLCI open from initiator side.
+ * We set off timers and responses upon reception of an SABM.
+ */
+static void gsm_dlci_set_opening(struct gsm_dlci *dlci)
+{
+ switch (dlci->state) {
+ case DLCI_CLOSED:
+ case DLCI_CLOSING:
+ dlci->state = DLCI_OPENING;
+ break;
+ default:
+ break;
+ }
+}
+
/**
* gsm_dlci_begin_close - start channel open procedure
* @dlci: DLCI to open
@@ -1728,6 +1983,30 @@ static void gsm_dlci_command(struct gsm_dlci *dlci, const u8 *data, int len)
}
}

+/**
+ * gsm_kick_timer - transmit if possible
+ * @t: timer contained in our gsm object
+ *
+ * Transmit data from DLCIs if the queue is empty. We can't rely on
+ * a tty wakeup except when we filled the pipe so we need to fire off
+ * new data ourselves in other cases.
+ */
+static void gsm_kick_timer(struct timer_list *t)
+{
+ struct gsm_mux *gsm = from_timer(gsm, t, kick_timer);
+ unsigned long flags;
+ int sent = 0;
+
+ spin_lock_irqsave(&gsm->tx_lock, flags);
+ /* If we have nothing running then we need to fire up */
+ if (gsm->tx_bytes < TX_THRESH_LO)
+ sent = gsm_dlci_data_sweep(gsm);
+ spin_unlock_irqrestore(&gsm->tx_lock, flags);
+
+ if (sent && debug & 4)
+ pr_info("%s TX queue stalled\n", __func__);
+}
+
/*
* Allocate/Free DLCI channels
*/
@@ -1762,10 +2041,13 @@ static struct gsm_dlci *gsm_dlci_alloc(struct gsm_mux *gsm, int addr)
dlci->addr = addr;
dlci->adaption = gsm->adaption;
dlci->state = DLCI_CLOSED;
- if (addr)
+ if (addr) {
dlci->data = gsm_dlci_data;
- else
+ /* Prevent us from sending data before the link is up */
+ dlci->constipated = true;
+ } else {
dlci->data = gsm_dlci_command;
+ }
gsm->dlci[addr] = dlci;
return dlci;
}
@@ -1929,7 +2211,7 @@ static void gsm_queue(struct gsm_mux *gsm)
goto invalid;
#endif
if (dlci == NULL || dlci->state != DLCI_OPEN) {
- gsm_command(gsm, address, DM|PF);
+ gsm_response(gsm, address, DM|PF);
return;
}
dlci->data(dlci, gsm->buf, gsm->len);
@@ -2052,7 +2334,7 @@ static void gsm1_receive(struct gsm_mux *gsm, unsigned char c)
} else if ((c & ISO_IEC_646_MASK) == XOFF) {
gsm->constipated = false;
/* Kick the link in case it is idling */
- gsm_data_kick(gsm, NULL);
+ gsmld_write_trigger(gsm);
return;
}
if (c == GSM1_SOF) {
@@ -2180,18 +2462,29 @@ static void gsm_cleanup_mux(struct gsm_mux *gsm, bool disc)
}

/* Finish outstanding timers, making sure they are done */
+ del_timer_sync(&gsm->kick_timer);
del_timer_sync(&gsm->t2_timer);

+ /* Finish writing to ldisc */
+ flush_work(&gsm->tx_work);
+
/* Free up any link layer users and finally the control channel */
+ if (gsm->has_devices) {
+ gsm_unregister_devices(gsm_tty_driver, gsm->num);
+ gsm->has_devices = false;
+ }
for (i = NUM_DLCI - 1; i >= 0; i--)
if (gsm->dlci[i])
gsm_dlci_release(gsm->dlci[i]);
mutex_unlock(&gsm->mutex);
/* Now wipe the queues */
tty_ldisc_flush(gsm->tty);
- list_for_each_entry_safe(txq, ntxq, &gsm->tx_list, list)
+ list_for_each_entry_safe(txq, ntxq, &gsm->tx_ctrl_list, list)
+ kfree(txq);
+ INIT_LIST_HEAD(&gsm->tx_ctrl_list);
+ list_for_each_entry_safe(txq, ntxq, &gsm->tx_data_list, list)
kfree(txq);
- INIT_LIST_HEAD(&gsm->tx_list);
+ INIT_LIST_HEAD(&gsm->tx_data_list);
}

/**
@@ -2206,8 +2499,15 @@ static void gsm_cleanup_mux(struct gsm_mux *gsm, bool disc)
static int gsm_activate_mux(struct gsm_mux *gsm)
{
struct gsm_dlci *dlci;
+ int ret;
+
+ dlci = gsm_dlci_alloc(gsm, 0);
+ if (dlci == NULL)
+ return -ENOMEM;

+ timer_setup(&gsm->kick_timer, gsm_kick_timer, 0);
timer_setup(&gsm->t2_timer, gsm_control_retransmit, 0);
+ INIT_WORK(&gsm->tx_work, gsmld_write_task);
init_waitqueue_head(&gsm->event);
spin_lock_init(&gsm->control_lock);
spin_lock_init(&gsm->tx_lock);
@@ -2217,9 +2517,11 @@ static int gsm_activate_mux(struct gsm_mux *gsm)
else
gsm->receive = gsm1_receive;

- dlci = gsm_dlci_alloc(gsm, 0);
- if (dlci == NULL)
- return -ENOMEM;
+ ret = gsm_register_devices(gsm_tty_driver, gsm->num);
+ if (ret)
+ return ret;
+
+ gsm->has_devices = true;
gsm->dead = false; /* Tty opens are now permissible */
return 0;
}
@@ -2312,7 +2614,8 @@ static struct gsm_mux *gsm_alloc_mux(void)
spin_lock_init(&gsm->lock);
mutex_init(&gsm->mutex);
kref_init(&gsm->ref);
- INIT_LIST_HEAD(&gsm->tx_list);
+ INIT_LIST_HEAD(&gsm->tx_ctrl_list);
+ INIT_LIST_HEAD(&gsm->tx_data_list);

gsm->t1 = T1;
gsm->t2 = T2;
@@ -2469,6 +2772,47 @@ static int gsmld_output(struct gsm_mux *gsm, u8 *data, int len)
return gsm->tty->ops->write(gsm->tty, data, len);
}

+
+/**
+ * gsmld_write_trigger - schedule ldisc write task
+ * @gsm: our mux
+ */
+static void gsmld_write_trigger(struct gsm_mux *gsm)
+{
+ if (!gsm || !gsm->dlci[0] || gsm->dlci[0]->dead)
+ return;
+ schedule_work(&gsm->tx_work);
+}
+
+
+/**
+ * gsmld_write_task - ldisc write task
+ * @work: our tx write work
+ *
+ * Writes out data to the ldisc if possible. We do this from a work item to
+ * avoid deadlocks. It returns once no space or no data is left for output.
+ */
+static void gsmld_write_task(struct work_struct *work)
+{
+ struct gsm_mux *gsm = container_of(work, struct gsm_mux, tx_work);
+ unsigned long flags;
+ int i, ret;
+
+ /* All outstanding control channel and control messages, and one data
+ * frame, are sent.
+ */
+ ret = -ENODEV;
+ spin_lock_irqsave(&gsm->tx_lock, flags);
+ if (gsm->tty)
+ ret = gsm_data_kick(gsm);
+ spin_unlock_irqrestore(&gsm->tx_lock, flags);
+
+ if (ret >= 0)
+ for (i = 0; i < NUM_DLCI; i++)
+ if (gsm->dlci[i])
+ tty_port_tty_wakeup(&gsm->dlci[i]->port);
+}
+
/**
* gsmld_attach_gsm - mode set up
* @tty: our tty structure
@@ -2479,39 +2823,14 @@ static int gsmld_output(struct gsm_mux *gsm, u8 *data, int len)
* will need moving to an ioctl path.
*/

-static int gsmld_attach_gsm(struct tty_struct *tty, struct gsm_mux *gsm)
+static void gsmld_attach_gsm(struct tty_struct *tty, struct gsm_mux *gsm)
{
- unsigned int base;
- int ret, i;
-
gsm->tty = tty_kref_get(tty);
/* Turn off tty XON/XOFF handling to handle it explicitly. */
gsm->old_c_iflag = tty->termios.c_iflag;
tty->termios.c_iflag &= (IXON | IXOFF);
- ret = gsm_activate_mux(gsm);
- if (ret != 0)
- tty_kref_put(gsm->tty);
- else {
- /* Don't register device 0 - this is the control channel and not
- a usable tty interface */
- base = mux_num_to_base(gsm); /* Base for this MUX */
- for (i = 1; i < NUM_DLCI; i++) {
- struct device *dev;
-
- dev = tty_register_device(gsm_tty_driver,
- base + i, NULL);
- if (IS_ERR(dev)) {
- for (i--; i >= 1; i--)
- tty_unregister_device(gsm_tty_driver,
- base + i);
- return PTR_ERR(dev);
- }
- }
- }
- return ret;
}

-
/**
* gsmld_detach_gsm - stop doing 0710 mux
* @tty: tty attached to the mux
@@ -2522,12 +2841,7 @@ static int gsmld_attach_gsm(struct tty_struct *tty, struct gsm_mux *gsm)

static void gsmld_detach_gsm(struct tty_struct *tty, struct gsm_mux *gsm)
{
- unsigned int base = mux_num_to_base(gsm); /* Base for this MUX */
- int i;
-
WARN_ON(tty != gsm->tty);
- for (i = 1; i < NUM_DLCI; i++)
- tty_unregister_device(gsm_tty_driver, base + i);
/* Restore tty XON/XOFF handling. */
gsm->tty->termios.c_iflag = gsm->old_c_iflag;
tty_kref_put(gsm->tty);
@@ -2619,7 +2933,6 @@ static void gsmld_close(struct tty_struct *tty)
static int gsmld_open(struct tty_struct *tty)
{
struct gsm_mux *gsm;
- int ret;

if (tty->ops->write == NULL)
return -EINVAL;
@@ -2635,12 +2948,13 @@ static int gsmld_open(struct tty_struct *tty)
/* Attach the initial passive connection */
gsm->encoding = 1;

- ret = gsmld_attach_gsm(tty, gsm);
- if (ret != 0) {
- gsm_cleanup_mux(gsm, false);
- mux_put(gsm);
- }
- return ret;
+ gsmld_attach_gsm(tty, gsm);
+
+ timer_setup(&gsm->kick_timer, gsm_kick_timer, 0);
+ timer_setup(&gsm->t2_timer, gsm_control_retransmit, 0);
+ INIT_WORK(&gsm->tx_work, gsmld_write_task);
+
+ return 0;
}

/**
@@ -2655,16 +2969,9 @@ static int gsmld_open(struct tty_struct *tty)
static void gsmld_write_wakeup(struct tty_struct *tty)
{
struct gsm_mux *gsm = tty->disc_data;
- unsigned long flags;

/* Queue poll */
- clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
- spin_lock_irqsave(&gsm->tx_lock, flags);
- gsm_data_kick(gsm, NULL);
- if (gsm->tx_bytes < TX_THRESH_LO) {
- gsm_dlci_data_sweep(gsm);
- }
- spin_unlock_irqrestore(&gsm->tx_lock, flags);
+ gsmld_write_trigger(gsm);
}

/**
@@ -2708,11 +3015,24 @@ static ssize_t gsmld_read(struct tty_struct *tty, struct file *file,
static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
const unsigned char *buf, size_t nr)
{
- int space = tty_write_room(tty);
+ struct gsm_mux *gsm = tty->disc_data;
+ unsigned long flags;
+ int space;
+ int ret;
+
+ if (!gsm)
+ return -ENODEV;
+
+ ret = -ENOBUFS;
+ spin_lock_irqsave(&gsm->tx_lock, flags);
+ space = tty_write_room(tty);
if (space >= nr)
- return tty->ops->write(tty, buf, nr);
- set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
- return -ENOBUFS;
+ ret = tty->ops->write(tty, buf, nr);
+ else
+ set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+ spin_unlock_irqrestore(&gsm->tx_lock, flags);
+
+ return ret;
}

/**
@@ -2737,12 +3057,15 @@ static __poll_t gsmld_poll(struct tty_struct *tty, struct file *file,

poll_wait(file, &tty->read_wait, wait);
poll_wait(file, &tty->write_wait, wait);
+
+ if (gsm->dead)
+ mask |= EPOLLHUP;
if (tty_hung_up_p(file))
mask |= EPOLLHUP;
+ if (test_bit(TTY_OTHER_CLOSED, &tty->flags))
+ mask |= EPOLLHUP;
if (!tty_is_writelocked(tty) && tty_write_room(tty) > 0)
mask |= EPOLLOUT | EPOLLWRNORM;
- if (gsm->dead)
- mask |= EPOLLHUP;
return mask;
}

@@ -3178,6 +3501,8 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp)
/* Start sending off SABM messages */
if (gsm->initiator)
gsm_dlci_begin_open(dlci);
+ else
+ gsm_dlci_set_opening(dlci);
/* And wait for virtual carrier */
return tty_port_block_til_ready(port, tty, filp);
}
diff --git a/drivers/tty/serial/8250/8250.h b/drivers/tty/serial/8250/8250.h
index db784ace25d8..467372534d1c 100644
--- a/drivers/tty/serial/8250/8250.h
+++ b/drivers/tty/serial/8250/8250.h
@@ -120,6 +120,28 @@ static inline void serial_out(struct uart_8250_port *up, int offset, int value)
up->port.serial_out(&up->port, offset, value);
}

+/*
+ * For the 16C950
+ */
+static void serial_icr_write(struct uart_8250_port *up, int offset, int value)
+{
+ serial_out(up, UART_SCR, offset);
+ serial_out(up, UART_ICR, value);
+}
+
+static unsigned int __maybe_unused serial_icr_read(struct uart_8250_port *up,
+ int offset)
+{
+ unsigned int value;
+
+ serial_icr_write(up, UART_ACR, up->acr | UART_ACR_ICRRD);
+ serial_out(up, UART_SCR, offset);
+ value = serial_in(up, UART_ICR);
+ serial_icr_write(up, UART_ACR, up->acr);
+
+ return value;
+}
+
void serial8250_clear_and_reinit_fifos(struct uart_8250_port *p);

static inline int serial_dl_read(struct uart_8250_port *up)
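
The serial_icr_write()/serial_icr_read() helpers added to 8250.h reach the 16C950's indexed registers indirectly: the index is written to SCR and the data to ICR, and a read additionally needs the ICRRD bit set in ACR, which is why the read helper brackets the access with two ACR writes. Below is a hedged usage sketch; it assumes the 8250.h and serial_reg.h context above, and the ID values are only illustrative of how a 950-class part might be recognised, not something this patch defines.

static bool demo_is_16c950(struct uart_8250_port *up)
{
        unsigned int id1, id2, id3;

        id1 = serial_icr_read(up, UART_ID1);
        id2 = serial_icr_read(up, UART_ID2);
        id3 = serial_icr_read(up, UART_ID3);

        /* illustrative ID match for an OX16C950-class part */
        return id1 == 0x16 && id2 == 0xC9 && (id3 & 0xF0) == 0x50;
}
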
diff --git a/drivers/tty/serial/8250/8250_bcm2835aux.c b/drivers/tty/serial/8250/8250_bcm2835aux.c
index 2a1226a78a0c..21939bb44613 100644
--- a/drivers/tty/serial/8250/8250_bcm2835aux.c
+++ b/drivers/tty/serial/8250/8250_bcm2835aux.c
@@ -166,8 +166,10 @@ static int bcm2835aux_serial_probe(struct platform_device *pdev)
uartclk = clk_get_rate(data->clk);
if (!uartclk) {
ret = device_property_read_u32(&pdev->dev, "clock-frequency", &uartclk);
- if (ret)
- return dev_err_probe(&pdev->dev, ret, "could not get clk rate\n");
+ if (ret) {
+ dev_err_probe(&pdev->dev, ret, "could not get clk rate\n");
+ goto dis_clk;
+ }
}

/* the HW-clock divider for bcm2835aux is 8,
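
The bcm2835aux change routes the clock-frequency fallback failure through the existing unwind label so the clock enabled earlier in probe is released again. Below is a minimal sketch of that goto-based unwind pattern, with illustrative names rather than the driver's probe function.

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/property.h>
#include <linux/types.h>

static int demo_probe_clk(struct device *dev, struct clk *clk, u32 *rate)
{
        int ret;

        ret = clk_prepare_enable(clk);
        if (ret)
                return ret;             /* nothing to unwind yet */

        *rate = clk_get_rate(clk);
        if (!*rate) {
                ret = device_property_read_u32(dev, "clock-frequency", rate);
                if (ret)
                        goto err_clk;   /* release the clock on this path too */
        }

        return 0;

err_clk:
        clk_disable_unprepare(clk);
        return ret;
}
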
diff --git a/drivers/tty/serial/8250/8250_bcm7271.c b/drivers/tty/serial/8250/8250_bcm7271.c
index 9b878d023dac..8efdc271eb75 100644
--- a/drivers/tty/serial/8250/8250_bcm7271.c
+++ b/drivers/tty/serial/8250/8250_bcm7271.c
@@ -1139,16 +1139,19 @@ static int __maybe_unused brcmuart_suspend(struct device *dev)
struct brcmuart_priv *priv = dev_get_drvdata(dev);
struct uart_8250_port *up = serial8250_get_port(priv->line);
struct uart_port *port = &up->port;
-
- serial8250_suspend_port(priv->line);
- clk_disable_unprepare(priv->baud_mux_clk);
+ unsigned long flags;

/*
* This will prevent resume from enabling RTS before the
- * baud rate has been resored.
+ * baud rate has been restored.
*/
+ spin_lock_irqsave(&port->lock, flags);
priv->saved_mctrl = port->mctrl;
- port->mctrl = 0;
+ port->mctrl &= ~TIOCM_RTS;
+ spin_unlock_irqrestore(&port->lock, flags);
+
+ serial8250_suspend_port(priv->line);
+ clk_disable_unprepare(priv->baud_mux_clk);

return 0;
}
@@ -1158,6 +1161,7 @@ static int __maybe_unused brcmuart_resume(struct device *dev)
struct brcmuart_priv *priv = dev_get_drvdata(dev);
struct uart_8250_port *up = serial8250_get_port(priv->line);
struct uart_port *port = &up->port;
+ unsigned long flags;
int ret;

ret = clk_prepare_enable(priv->baud_mux_clk);
@@ -1180,7 +1184,15 @@ static int __maybe_unused brcmuart_resume(struct device *dev)
start_rx_dma(serial8250_get_port(priv->line));
}
serial8250_resume_port(priv->line);
- port->mctrl = priv->saved_mctrl;
+
+ if (priv->saved_mctrl & TIOCM_RTS) {
+ /* Restore RTS */
+ spin_lock_irqsave(&port->lock, flags);
+ port->mctrl |= TIOCM_RTS;
+ port->ops->set_mctrl(port, port->mctrl);
+ spin_unlock_irqrestore(&port->lock, flags);
+ }
+
return 0;
}

diff --git a/drivers/tty/serial/8250/8250_fsl.c b/drivers/tty/serial/8250/8250_fsl.c
index 9c01c531349d..71ce43685797 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -77,7 +77,7 @@ int fsl8250_handle_irq(struct uart_port *port)
if ((lsr & UART_LSR_THRE) && (up->ier & UART_IER_THRI))
serial8250_tx_chars(up);

- up->lsr_saved_flags = orig_lsr;
+ up->lsr_saved_flags |= orig_lsr & UART_LSR_BI;

uart_unlock_and_check_sysrq_irqrestore(&up->port, flags);

diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c
index a293e9f107d0..aeac20f7cbb2 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -11,6 +11,7 @@
#include <linux/pci.h>
#include <linux/string.h>
#include <linux/kernel.h>
+#include <linux/math.h>
#include <linux/slab.h>
#include <linux/delay.h>
#include <linux/tty.h>
@@ -994,41 +995,29 @@ static void pci_ite887x_exit(struct pci_dev *dev)
}

/*
- * EndRun Technologies.
- * Determine the number of ports available on the device.
+ * Oxford Semiconductor Inc.
+ * Check if an OxSemi device is part of the Tornado range of devices.
*/
#define PCI_VENDOR_ID_ENDRUN 0x7401
#define PCI_DEVICE_ID_ENDRUN_1588 0xe100

-static int pci_endrun_init(struct pci_dev *dev)
+static bool pci_oxsemi_tornado_p(struct pci_dev *dev)
{
- u8 __iomem *p;
- unsigned long deviceID;
- unsigned int number_uarts = 0;
+ /* OxSemi Tornado devices are all 0xCxxx */
+ if (dev->vendor == PCI_VENDOR_ID_OXSEMI &&
+ (dev->device & 0xf000) != 0xc000)
+ return false;

- /* EndRun device is all 0xexxx */
+ /* EndRun devices are all 0xExxx */
if (dev->vendor == PCI_VENDOR_ID_ENDRUN &&
- (dev->device & 0xf000) != 0xe000)
- return 0;
-
- p = pci_iomap(dev, 0, 5);
- if (p == NULL)
- return -ENOMEM;
+ (dev->device & 0xf000) != 0xe000)
+ return false;

- deviceID = ioread32(p);
- /* EndRun device */
- if (deviceID == 0x07000200) {
- number_uarts = ioread8(p + 4);
- pci_dbg(dev, "%d ports detected on EndRun PCI Express device\n", number_uarts);
- }
- pci_iounmap(dev, p);
- return number_uarts;
+ return true;
}

/*
- * Oxford Semiconductor Inc.
- * Check that device is part of the Tornado range of devices, then determine
- * the number of ports available on the device.
+ * Determine the number of ports available on a Tornado device.
*/
static int pci_oxsemi_tornado_init(struct pci_dev *dev)
{
@@ -1036,9 +1025,7 @@ static int pci_oxsemi_tornado_init(struct pci_dev *dev)
unsigned long deviceID;
unsigned int number_uarts = 0;

- /* OxSemi Tornado devices are all 0xCxxx */
- if (dev->vendor == PCI_VENDOR_ID_OXSEMI &&
- (dev->device & 0xF000) != 0xC000)
+ if (!pci_oxsemi_tornado_p(dev))
return 0;

p = pci_iomap(dev, 0, 5);
@@ -1049,12 +1036,217 @@ static int pci_oxsemi_tornado_init(struct pci_dev *dev)
/* Tornado device */
if (deviceID == 0x07000200) {
number_uarts = ioread8(p + 4);
- pci_dbg(dev, "%d ports detected on Oxford PCI Express device\n", number_uarts);
+ pci_dbg(dev, "%d ports detected on %s PCI Express device\n",
+ number_uarts,
+ dev->vendor == PCI_VENDOR_ID_ENDRUN ?
+ "EndRun" : "Oxford");
}
pci_iounmap(dev, p);
return number_uarts;
}

+/* Tornado-specific constants for the TCR and CPR registers; see below. */
+#define OXSEMI_TORNADO_TCR_MASK 0xf
+#define OXSEMI_TORNADO_CPR_MASK 0x1ff
+#define OXSEMI_TORNADO_CPR_MIN 0x008
+#define OXSEMI_TORNADO_CPR_DEF 0x10f
+
+/*
+ * Determine the oversampling rate, the clock prescaler, and the clock
+ * divisor for the requested baud rate. The clock rate is 62.5 MHz,
+ * which is four times the baud base, and the prescaler increments in
+ * steps of 1/8. Therefore to make calculations on integers we need
+ * to use a scaled clock rate, which is the baud base multiplied by 32
+ * (or our assumed UART clock rate multiplied by 2).
+ *
+ * The allowed oversampling rates are from 4 up to 16 inclusive (values
+ * from 0 to 3 inclusive map to 16). Likewise the clock prescaler allows
+ * values between 1.000 and 63.875 inclusive (operation for values from
+ * 0.000 to 0.875 has not been specified). The clock divisor is the usual
+ * unsigned 16-bit integer.
+ *
+ * For the most accurate baud rate we use a table of predetermined
+ * oversampling rates and clock prescalers that records all possible
+ * products of the two parameters in the range from 4 up to 255 inclusive,
+ * and additionally 335 for the 1500000bps rate, with the prescaler scaled
+ * by 8. The table is sorted by the decreasing value of the oversampling
+ * rate and ties are resolved by sorting by the decreasing value of the
+ * product. This way preference is given to higher oversampling rates.
+ *
+ * We iterate over the table and choose the product of an oversampling
+ * rate and a clock prescaler that gives the lowest integer division
+ * result deviation, or if an exact integer divider is found we stop
+ * looking for it right away. We do some fixup if the resulting clock
+ * divisor required would be out of its unsigned 16-bit integer range.
+ *
+ * Finally we abuse the supposed fractional part returned to encode the
+ * 4-bit value of the oversampling rate and the 9-bit value of the clock
+ * prescaler which will end up in the TCR and CPR/CPR2 registers.
+ */
+static unsigned int pci_oxsemi_tornado_get_divisor(struct uart_port *port,
+ unsigned int baud,
+ unsigned int *frac)
+{
+ static u8 p[][2] = {
+ { 16, 14, }, { 16, 13, }, { 16, 12, }, { 16, 11, },
+ { 16, 10, }, { 16, 9, }, { 16, 8, }, { 15, 17, },
+ { 15, 16, }, { 15, 15, }, { 15, 14, }, { 15, 13, },
+ { 15, 12, }, { 15, 11, }, { 15, 10, }, { 15, 9, },
+ { 15, 8, }, { 14, 18, }, { 14, 17, }, { 14, 14, },
+ { 14, 13, }, { 14, 12, }, { 14, 11, }, { 14, 10, },
+ { 14, 9, }, { 14, 8, }, { 13, 19, }, { 13, 18, },
+ { 13, 17, }, { 13, 13, }, { 13, 12, }, { 13, 11, },
+ { 13, 10, }, { 13, 9, }, { 13, 8, }, { 12, 19, },
+ { 12, 18, }, { 12, 17, }, { 12, 11, }, { 12, 9, },
+ { 12, 8, }, { 11, 23, }, { 11, 22, }, { 11, 21, },
+ { 11, 20, }, { 11, 19, }, { 11, 18, }, { 11, 17, },
+ { 11, 11, }, { 11, 10, }, { 11, 9, }, { 11, 8, },
+ { 10, 25, }, { 10, 23, }, { 10, 20, }, { 10, 19, },
+ { 10, 17, }, { 10, 10, }, { 10, 9, }, { 10, 8, },
+ { 9, 27, }, { 9, 23, }, { 9, 21, }, { 9, 19, },
+ { 9, 18, }, { 9, 17, }, { 9, 9, }, { 9, 8, },
+ { 8, 31, }, { 8, 29, }, { 8, 23, }, { 8, 19, },
+ { 8, 17, }, { 8, 8, }, { 7, 35, }, { 7, 31, },
+ { 7, 29, }, { 7, 25, }, { 7, 23, }, { 7, 21, },
+ { 7, 19, }, { 7, 17, }, { 7, 15, }, { 7, 14, },
+ { 7, 13, }, { 7, 12, }, { 7, 11, }, { 7, 10, },
+ { 7, 9, }, { 7, 8, }, { 6, 41, }, { 6, 37, },
+ { 6, 31, }, { 6, 29, }, { 6, 23, }, { 6, 19, },
+ { 6, 17, }, { 6, 13, }, { 6, 11, }, { 6, 10, },
+ { 6, 9, }, { 6, 8, }, { 5, 67, }, { 5, 47, },
+ { 5, 43, }, { 5, 41, }, { 5, 37, }, { 5, 31, },
+ { 5, 29, }, { 5, 25, }, { 5, 23, }, { 5, 19, },
+ { 5, 17, }, { 5, 15, }, { 5, 13, }, { 5, 11, },
+ { 5, 10, }, { 5, 9, }, { 5, 8, }, { 4, 61, },
+ { 4, 59, }, { 4, 53, }, { 4, 47, }, { 4, 43, },
+ { 4, 41, }, { 4, 37, }, { 4, 31, }, { 4, 29, },
+ { 4, 23, }, { 4, 19, }, { 4, 17, }, { 4, 13, },
+ { 4, 9, }, { 4, 8, },
+ };
+ /* Scale the quotient for comparison to get the fractional part. */
+ const unsigned int quot_scale = 65536;
+ unsigned int sclk = port->uartclk * 2;
+ unsigned int sdiv = DIV_ROUND_CLOSEST(sclk, baud);
+ unsigned int best_squot;
+ unsigned int squot;
+ unsigned int quot;
+ u16 cpr;
+ u8 tcr;
+ int i;
+
+ /* Old custom speed handling. */
+ if (baud == 38400 && (port->flags & UPF_SPD_MASK) == UPF_SPD_CUST) {
+ unsigned int cust_div = port->custom_divisor;
+
+ quot = cust_div & UART_DIV_MAX;
+ tcr = (cust_div >> 16) & OXSEMI_TORNADO_TCR_MASK;
+ cpr = (cust_div >> 20) & OXSEMI_TORNADO_CPR_MASK;
+ if (cpr < OXSEMI_TORNADO_CPR_MIN)
+ cpr = OXSEMI_TORNADO_CPR_DEF;
+ } else {
+ best_squot = quot_scale;
+ for (i = 0; i < ARRAY_SIZE(p); i++) {
+ unsigned int spre;
+ unsigned int srem;
+ u8 cp;
+ u8 tc;
+
+ tc = p[i][0];
+ cp = p[i][1];
+ spre = tc * cp;
+
+ srem = sdiv % spre;
+ if (srem > spre / 2)
+ srem = spre - srem;
+ squot = DIV_ROUND_CLOSEST(srem * quot_scale, spre);
+
+ if (srem == 0) {
+ tcr = tc;
+ cpr = cp;
+ quot = sdiv / spre;
+ break;
+ } else if (squot < best_squot) {
+ best_squot = squot;
+ tcr = tc;
+ cpr = cp;
+ quot = DIV_ROUND_CLOSEST(sdiv, spre);
+ }
+ }
+ while (tcr <= (OXSEMI_TORNADO_TCR_MASK + 1) >> 1 &&
+ quot % 2 == 0) {
+ quot >>= 1;
+ tcr <<= 1;
+ }
+ while (quot > UART_DIV_MAX) {
+ if (tcr <= (OXSEMI_TORNADO_TCR_MASK + 1) >> 1) {
+ quot >>= 1;
+ tcr <<= 1;
+ } else if (cpr <= OXSEMI_TORNADO_CPR_MASK >> 1) {
+ quot >>= 1;
+ cpr <<= 1;
+ } else {
+ quot = quot * cpr / OXSEMI_TORNADO_CPR_MASK;
+ cpr = OXSEMI_TORNADO_CPR_MASK;
+ }
+ }
+ }
+
+ *frac = (cpr << 8) | (tcr & OXSEMI_TORNADO_TCR_MASK);
+ return quot;
+}
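
To make the long comment above concrete, here is a user-space sketch of the result for 115200 bps, assuming port->uartclk is 250000000 (16 times the 15625000 baud base introduced by this patch, i.e. the 62.5 MHz Tornado clock over 4). For that rate the first exact product in the table works out to an oversampling rate of 14 with a 10/8 prescaler; this is only a worked example, not the driver code.

#include <stdio.h>

int main(void)
{
	unsigned int uartclk = 250000000;		/* assumed: 16 x 15625000 baud base */
	unsigned int baud = 115200;
	unsigned int sclk = uartclk * 2;		/* scaled clock, baud base x 32 */
	unsigned int sdiv = (sclk + baud / 2) / baud;	/* DIV_ROUND_CLOSEST -> 4340 */

	/* First exact product in the table: oversampling 14, prescaler 10/8. */
	unsigned int tcr = 14, cpr = 10;
	unsigned int quot = sdiv / (tcr * cpr);		/* 4340 / 140 = 31 */
	unsigned int frac = (cpr << 8) | (tcr & 0xf);	/* 0xa0e, as packed into *frac */

	/* Effective rate: 62.5 MHz / (14 * 1.25 * 31), about 115207 bps. */
	printf("sdiv=%u quot=%u frac=%#x rate=%.0f\n", sdiv, quot, frac,
	       (uartclk / 4.0) / (tcr * (cpr / 8.0) * quot));
	return 0;
}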
+
+/*
+ * Set the oversampling rate in the transmitter clock cycle register (TCR),
+ * the clock prescaler in the clock prescaler register (CPR and CPR2), and
+ * the clock divisor in the divisor latch (DLL and DLM). Note that for
+ * backwards compatibility any write to CPR clears CPR2 and therefore CPR
+ * has to be written first, followed by CPR2, which occupies the location
+ * of CKS used with earlier UART designs.
+ */
+static void pci_oxsemi_tornado_set_divisor(struct uart_port *port,
+ unsigned int baud,
+ unsigned int quot,
+ unsigned int quot_frac)
+{
+ struct uart_8250_port *up = up_to_u8250p(port);
+ u8 cpr2 = quot_frac >> 16;
+ u8 cpr = quot_frac >> 8;
+ u8 tcr = quot_frac;
+
+ serial_icr_write(up, UART_TCR, tcr);
+ serial_icr_write(up, UART_CPR, cpr);
+ serial_icr_write(up, UART_CKS, cpr2);
+ serial8250_do_set_divisor(port, baud, quot, 0);
+}
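
The packed value then splits back out here: the 9-bit prescaler straddles CPR and CPR2, which is why the write order spelled out in the comment above matters. A small sketch of that unpacking (not part of the patch), using the default 0x10f prescaler (33.875) as the example value:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* 9-bit prescaler 0x10f (33.875), oversampling rate 14. */
	unsigned int frac = (0x10f << 8) | 14;

	uint8_t cpr2 = frac >> 16;	/* top bit of the prescaler -> CPR2 */
	uint8_t cpr  = frac >> 8;	/* low 8 bits of the prescaler -> CPR */
	uint8_t tcr  = frac;		/* oversampling rate -> TCR */

	/* CPR has to be written before CPR2, since writing CPR clears CPR2. */
	printf("tcr=%u cpr=%#x cpr2=%u\n", tcr, cpr, cpr2);	/* tcr=14 cpr=0xf cpr2=1 */
	return 0;
}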
+
+/*
+ * For Tornado devices we force MCR[7] set for the Divide-by-M N/8 baud rate
+ * generator prescaler (CPR and CPR2). Otherwise no prescaler would be used.
+ */
+static void pci_oxsemi_tornado_set_mctrl(struct uart_port *port,
+ unsigned int mctrl)
+{
+ struct uart_8250_port *up = up_to_u8250p(port);
+
+ up->mcr |= UART_MCR_CLKSEL;
+ serial8250_do_set_mctrl(port, mctrl);
+}
+
+static int pci_oxsemi_tornado_setup(struct serial_private *priv,
+ const struct pciserial_board *board,
+ struct uart_8250_port *up, int idx)
+{
+ struct pci_dev *dev = priv->dev;
+
+ if (pci_oxsemi_tornado_p(dev)) {
+ up->port.get_divisor = pci_oxsemi_tornado_get_divisor;
+ up->port.set_divisor = pci_oxsemi_tornado_set_divisor;
+ up->port.set_mctrl = pci_oxsemi_tornado_set_mctrl;
+ }
+
+ return pci_default_setup(priv, board, up, idx);
+}
+
static int pci_asix_setup(struct serial_private *priv,
const struct pciserial_board *board,
struct uart_8250_port *port, int idx)
@@ -2244,7 +2436,7 @@ static struct pci_serial_quirk pci_serial_quirks[] = {
.device = PCI_ANY_ID,
.subvendor = PCI_ANY_ID,
.subdevice = PCI_ANY_ID,
- .init = pci_endrun_init,
+ .init = pci_oxsemi_tornado_init,
.setup = pci_default_setup,
},
/*
@@ -2256,7 +2448,7 @@ static struct pci_serial_quirk pci_serial_quirks[] = {
.subvendor = PCI_ANY_ID,
.subdevice = PCI_ANY_ID,
.init = pci_oxsemi_tornado_init,
- .setup = pci_default_setup,
+ .setup = pci_oxsemi_tornado_setup,
},
{
.vendor = PCI_VENDOR_ID_MAINPINE,
@@ -2264,7 +2456,7 @@ static struct pci_serial_quirk pci_serial_quirks[] = {
.subvendor = PCI_ANY_ID,
.subdevice = PCI_ANY_ID,
.init = pci_oxsemi_tornado_init,
- .setup = pci_default_setup,
+ .setup = pci_oxsemi_tornado_setup,
},
{
.vendor = PCI_VENDOR_ID_DIGI,
@@ -2272,7 +2464,7 @@ static struct pci_serial_quirk pci_serial_quirks[] = {
.subvendor = PCI_SUBVENDOR_ID_IBM,
.subdevice = PCI_ANY_ID,
.init = pci_oxsemi_tornado_init,
- .setup = pci_default_setup,
+ .setup = pci_oxsemi_tornado_setup,
},
{
.vendor = PCI_VENDOR_ID_INTEL,
@@ -2589,7 +2781,7 @@ enum pci_board_num_t {
pbn_b0_2_1843200,
pbn_b0_4_1843200,

- pbn_b0_1_3906250,
+ pbn_b0_1_15625000,

pbn_b0_bt_1_115200,
pbn_b0_bt_2_115200,
@@ -2667,12 +2859,11 @@ enum pci_board_num_t {
pbn_panacom2,
pbn_panacom4,
pbn_plx_romulus,
- pbn_endrun_2_3906250,
pbn_oxsemi,
- pbn_oxsemi_1_3906250,
- pbn_oxsemi_2_3906250,
- pbn_oxsemi_4_3906250,
- pbn_oxsemi_8_3906250,
+ pbn_oxsemi_1_15625000,
+ pbn_oxsemi_2_15625000,
+ pbn_oxsemi_4_15625000,
+ pbn_oxsemi_8_15625000,
pbn_intel_i960,
pbn_sgi_ioc3,
pbn_computone_4,
@@ -2815,10 +3006,10 @@ static struct pciserial_board pci_boards[] = {
.uart_offset = 8,
},

- [pbn_b0_1_3906250] = {
+ [pbn_b0_1_15625000] = {
.flags = FL_BASE0,
.num_ports = 1,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 8,
},

@@ -3189,20 +3380,6 @@ static struct pciserial_board pci_boards[] = {
.first_offset = 0x03,
},

- /*
- * EndRun Technologies
- * Uses the size of PCI Base region 0 to
- * signal now many ports are available
- * 2 port 952 Uart support
- */
- [pbn_endrun_2_3906250] = {
- .flags = FL_BASE0,
- .num_ports = 2,
- .base_baud = 3906250,
- .uart_offset = 0x200,
- .first_offset = 0x1000,
- },
-
/*
* This board uses the size of PCI Base region 0 to
* signal now many ports are available
@@ -3213,31 +3390,31 @@ static struct pciserial_board pci_boards[] = {
.base_baud = 115200,
.uart_offset = 8,
},
- [pbn_oxsemi_1_3906250] = {
+ [pbn_oxsemi_1_15625000] = {
.flags = FL_BASE0,
.num_ports = 1,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 0x200,
.first_offset = 0x1000,
},
- [pbn_oxsemi_2_3906250] = {
+ [pbn_oxsemi_2_15625000] = {
.flags = FL_BASE0,
.num_ports = 2,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 0x200,
.first_offset = 0x1000,
},
- [pbn_oxsemi_4_3906250] = {
+ [pbn_oxsemi_4_15625000] = {
.flags = FL_BASE0,
.num_ports = 4,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 0x200,
.first_offset = 0x1000,
},
- [pbn_oxsemi_8_3906250] = {
+ [pbn_oxsemi_8_15625000] = {
.flags = FL_BASE0,
.num_ports = 8,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 0x200,
.first_offset = 0x1000,
},
@@ -4109,13 +4286,6 @@ static const struct pci_device_id serial_pci_tbl[] = {
{ PCI_VENDOR_ID_PLX, PCI_DEVICE_ID_PLX_ROMULUS,
0x10b5, 0x106a, 0, 0,
pbn_plx_romulus },
- /*
- * EndRun Technologies. PCI express device range.
- * EndRun PTP/1588 has 2 Native UARTs.
- */
- { PCI_VENDOR_ID_ENDRUN, PCI_DEVICE_ID_ENDRUN_1588,
- PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_endrun_2_3906250 },
/*
* Quatech cards. These actually have configurable clocks but for
* now we just use the default.
@@ -4225,158 +4395,165 @@ static const struct pci_device_id serial_pci_tbl[] = {
*/
{ PCI_VENDOR_ID_OXSEMI, 0xc101, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc105, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc11b, /* OXPCIe952 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc11f, /* OXPCIe952 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc120, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc124, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc138, /* OXPCIe952 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc13d, /* OXPCIe952 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc140, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc141, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc144, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc145, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc158, /* OXPCIe952 2 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_2_3906250 },
+ pbn_oxsemi_2_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc15d, /* OXPCIe952 2 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_2_3906250 },
+ pbn_oxsemi_2_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc208, /* OXPCIe954 4 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_4_3906250 },
+ pbn_oxsemi_4_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc20d, /* OXPCIe954 4 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_4_3906250 },
+ pbn_oxsemi_4_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc308, /* OXPCIe958 8 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_8_3906250 },
+ pbn_oxsemi_8_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc30d, /* OXPCIe958 8 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_8_3906250 },
+ pbn_oxsemi_8_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc40b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc40f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc41b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc41f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc42b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc42f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc43b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc43f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc44b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc44f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc45b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc45f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc46b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc46f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc47b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc47f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc48b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc48f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc49b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc49f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4ab, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4af, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4bb, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4bf, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4cb, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4cf, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
/*
* Mainpine Inc. IQ Express "Rev3" utilizing OxSemi Tornado
*/
{ PCI_VENDOR_ID_MAINPINE, 0x4000, /* IQ Express 1 Port V.34 Super-G3 Fax */
PCI_VENDOR_ID_MAINPINE, 0x4001, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_MAINPINE, 0x4000, /* IQ Express 2 Port V.34 Super-G3 Fax */
PCI_VENDOR_ID_MAINPINE, 0x4002, 0, 0,
- pbn_oxsemi_2_3906250 },
+ pbn_oxsemi_2_15625000 },
{ PCI_VENDOR_ID_MAINPINE, 0x4000, /* IQ Express 4 Port V.34 Super-G3 Fax */
PCI_VENDOR_ID_MAINPINE, 0x4004, 0, 0,
- pbn_oxsemi_4_3906250 },
+ pbn_oxsemi_4_15625000 },
{ PCI_VENDOR_ID_MAINPINE, 0x4000, /* IQ Express 8 Port V.34 Super-G3 Fax */
PCI_VENDOR_ID_MAINPINE, 0x4008, 0, 0,
- pbn_oxsemi_8_3906250 },
+ pbn_oxsemi_8_15625000 },

/*
* Digi/IBM PCIe 2-port Async EIA-232 Adapter utilizing OxSemi Tornado
*/
{ PCI_VENDOR_ID_DIGI, PCIE_DEVICE_ID_NEO_2_OX_IBM,
PCI_SUBVENDOR_ID_IBM, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_2_3906250 },
+ pbn_oxsemi_2_15625000 },
+ /*
+ * EndRun Technologies. PCI express device range.
+ * EndRun PTP/1588 has 2 Native UARTs utilizing OxSemi 952.
+ */
+ { PCI_VENDOR_ID_ENDRUN, PCI_DEVICE_ID_ENDRUN_1588,
+ PCI_ANY_ID, PCI_ANY_ID, 0, 0,
+ pbn_oxsemi_2_15625000 },

/*
* SBS Technologies, Inc. P-Octal and PMC-OCTPRO cards,
@@ -4886,6 +5063,115 @@ static const struct pci_device_id serial_pci_tbl[] = {
PCI_ANY_ID, PCI_ANY_ID,
0, 0,
pbn_b2_4_115200 },
+ /*
+ * Brainboxes PX-101
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x4005,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_b0_2_115200 },
+ { PCI_VENDOR_ID_INTASHIELD, 0x4019,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_2_15625000 },
+ /*
+ * Brainboxes PX-235/246
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x4004,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_b0_1_115200 },
+ { PCI_VENDOR_ID_INTASHIELD, 0x4016,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_1_15625000 },
+ /*
+ * Brainboxes PX-203/PX-257
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x4006,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_b0_2_115200 },
+ { PCI_VENDOR_ID_INTASHIELD, 0x4015,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_4_15625000 },
+ /*
+ * Brainboxes PX-260/PX-701
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x400A,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_4_15625000 },
+ /*
+ * Brainboxes PX-310
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x400E,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_2_15625000 },
+ /*
+ * Brainboxes PX-313
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x400C,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_2_15625000 },
+ /*
+ * Brainboxes PX-320/324/PX-376/PX-387
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x400B,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_1_15625000 },
+ /*
+ * Brainboxes PX-335/346
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x400F,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_4_15625000 },
+ /*
+ * Brainboxes PX-368
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x4010,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_4_15625000 },
+ /*
+ * Brainboxes PX-420
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x4000,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_b0_4_115200 },
+ { PCI_VENDOR_ID_INTASHIELD, 0x4011,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_4_15625000 },
+ /*
+ * Brainboxes PX-803
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x4009,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_b0_1_115200 },
+ { PCI_VENDOR_ID_INTASHIELD, 0x401E,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_1_15625000 },
+ /*
+ * Brainboxes PX-846
+ */
+ { PCI_VENDOR_ID_INTASHIELD, 0x4008,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_b0_1_115200 },
+ { PCI_VENDOR_ID_INTASHIELD, 0x4017,
+ PCI_ANY_ID, PCI_ANY_ID,
+ 0, 0,
+ pbn_oxsemi_1_15625000 },
+
/*
* Perle PCI-RAS cards
*/
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index ddaf35daf316..291fd99bd7f1 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -537,27 +537,6 @@ serial_port_out_sync(struct uart_port *p, int offset, int value)
}
}

-/*
- * For the 16C950
- */
-static void serial_icr_write(struct uart_8250_port *up, int offset, int value)
-{
- serial_out(up, UART_SCR, offset);
- serial_out(up, UART_ICR, value);
-}
-
-static unsigned int serial_icr_read(struct uart_8250_port *up, int offset)
-{
- unsigned int value;
-
- serial_icr_write(up, UART_ACR, up->acr | UART_ACR_ICRRD);
- serial_out(up, UART_SCR, offset);
- value = serial_in(up, UART_ICR);
- serial_icr_write(up, UART_ACR, up->acr);
-
- return value;
-}
-
/*
* FIFO support.
*/
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 2cb89491dd09..65b76adf107c 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -990,12 +990,12 @@ static void lpuart32_rxint(struct lpuart_port *sport)

if (sr & (UARTSTAT_PE | UARTSTAT_OR | UARTSTAT_FE)) {
if (sr & UARTSTAT_PE) {
+ sport->port.icount.parity++;
+ } else if (sr & UARTSTAT_FE) {
if (is_break)
sport->port.icount.brk++;
else
- sport->port.icount.parity++;
- } else if (sr & UARTSTAT_FE) {
- sport->port.icount.frame++;
+ sport->port.icount.frame++;
}

if (sr & UARTSTAT_OR)
@@ -1010,12 +1010,12 @@ static void lpuart32_rxint(struct lpuart_port *sport)
sr &= sport->port.read_status_mask;

if (sr & UARTSTAT_PE) {
+ flg = TTY_PARITY;
+ } else if (sr & UARTSTAT_FE) {
if (is_break)
flg = TTY_BREAK;
else
- flg = TTY_PARITY;
- } else if (sr & UARTSTAT_FE) {
- flg = TTY_FRAME;
+ flg = TTY_FRAME;
}

if (sr & UARTSTAT_OR)
diff --git a/drivers/tty/serial/mvebu-uart.c b/drivers/tty/serial/mvebu-uart.c
index 93489fe334d0..65eaecd10b7c 100644
--- a/drivers/tty/serial/mvebu-uart.c
+++ b/drivers/tty/serial/mvebu-uart.c
@@ -265,6 +265,7 @@ static void mvebu_uart_rx_chars(struct uart_port *port, unsigned int status)
struct tty_port *tport = &port->state->port;
unsigned char ch = 0;
char flag = 0;
+ int ret;

do {
if (status & STAT_RX_RDY(port)) {
@@ -277,6 +278,16 @@ static void mvebu_uart_rx_chars(struct uart_port *port, unsigned int status)
port->icount.parity++;
}

+ /*
+ * For UART2, error bits are not cleared on buffer read.
+ * This causes interrupt loop and system hang.
+ */
+ if (IS_EXTENDED(port) && (status & STAT_BRK_ERR)) {
+ ret = readl(port->membase + UART_STAT);
+ ret |= STAT_BRK_ERR;
+ writel(ret, port->membase + UART_STAT);
+ }
+
if (status & STAT_BRK_DET) {
port->icount.brk++;
status &= ~(STAT_FRM_ERR | STAT_PAR_ERR);
diff --git a/drivers/tty/serial/pic32_uart.c b/drivers/tty/serial/pic32_uart.c
index b7a3a1b959b1..a967b586a0a0 100644
--- a/drivers/tty/serial/pic32_uart.c
+++ b/drivers/tty/serial/pic32_uart.c
@@ -427,7 +427,7 @@ static int pic32_uart_startup(struct uart_port *port)
if (!sport->irq_fault_name) {
dev_err(port->dev, "%s: kasprintf err!", __func__);
ret = -ENOMEM;
- goto out_done;
+ goto out_disable_clk;
}
irq_set_status_flags(sport->irq_fault, IRQ_NOAUTOEN);
ret = request_irq(sport->irq_fault, pic32_uart_fault_interrupt,
@@ -493,14 +493,16 @@ static int pic32_uart_startup(struct uart_port *port)
return 0;

out_t:
- kfree(sport->irq_tx_name);
free_irq(sport->irq_tx, port);
+ kfree(sport->irq_tx_name);
out_r:
- kfree(sport->irq_rx_name);
free_irq(sport->irq_rx, port);
+ kfree(sport->irq_rx_name);
out_f:
- kfree(sport->irq_fault_name);
free_irq(sport->irq_fault, port);
+ kfree(sport->irq_fault_name);
+out_disable_clk:
+ clk_disable_unprepare(sport->clk);
out_done:
return ret;
}
@@ -519,8 +521,11 @@ static void pic32_uart_shutdown(struct uart_port *port)

/* free all 3 interrupts for this UART */
free_irq(sport->irq_fault, port);
+ kfree(sport->irq_fault_name);
free_irq(sport->irq_tx, port);
+ kfree(sport->irq_tx_name);
free_irq(sport->irq_rx, port);
+ kfree(sport->irq_rx_name);
}

/* serial core request to change current uart setting */
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index dfc1f4b445f3..6eaf8eb84661 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -344,7 +344,7 @@ static struct uni_screen *vc_uniscr_alloc(unsigned int cols, unsigned int rows)
/* allocate everything in one go */
memsize = cols * rows * sizeof(char32_t);
memsize += rows * sizeof(char32_t *);
- p = vmalloc(memsize);
+ p = vzalloc(memsize);
if (!p)
return NULL;

diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
index d6d515d598dc..ae049eb28b93 100644
--- a/drivers/usb/cdns3/cdns3-gadget.c
+++ b/drivers/usb/cdns3/cdns3-gadget.c
@@ -2280,11 +2280,16 @@ static int cdns3_gadget_ep_enable(struct usb_ep *ep,
int ret = 0;
int val;

+ if (!ep) {
+ pr_debug("usbss: ep not configured?\n");
+ return -EINVAL;
+ }
+
priv_ep = ep_to_cdns3_ep(ep);
priv_dev = priv_ep->cdns3_dev;
comp_desc = priv_ep->endpoint.comp_desc;

- if (!ep || !desc || desc->bDescriptorType != USB_DT_ENDPOINT) {
+ if (!desc || desc->bDescriptorType != USB_DT_ENDPOINT) {
dev_dbg(priv_dev->dev, "usbss: invalid parameters\n");
return -EINVAL;
}
@@ -2596,7 +2601,7 @@ int cdns3_gadget_ep_dequeue(struct usb_ep *ep,
struct usb_request *request)
{
struct cdns3_endpoint *priv_ep = ep_to_cdns3_ep(ep);
- struct cdns3_device *priv_dev = priv_ep->cdns3_dev;
+ struct cdns3_device *priv_dev;
struct usb_request *req, *req_temp;
struct cdns3_request *priv_req;
struct cdns3_trb *link_trb;
@@ -2607,6 +2612,8 @@ int cdns3_gadget_ep_dequeue(struct usb_ep *ep,
if (!ep || !request || !ep->desc)
return -EINVAL;

+ priv_dev = priv_ep->cdns3_dev;
+
spin_lock_irqsave(&priv_dev->lock, flags);

priv_req = to_cdns3_request(request);
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 06eea8848ccc..11c8ea0cccc8 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1691,7 +1691,6 @@ static void usb_giveback_urb_bh(struct tasklet_struct *t)

spin_lock_irq(&bh->lock);
bh->running = true;
- restart:
list_replace_init(&bh->head, &local_list);
spin_unlock_irq(&bh->lock);

@@ -1705,10 +1704,17 @@ static void usb_giveback_urb_bh(struct tasklet_struct *t)
bh->completing_ep = NULL;
}

- /* check if there are new URBs to giveback */
+ /*
+ * giveback new URBs next time to prevent this function
+ * from not exiting for a long time.
+ */
spin_lock_irq(&bh->lock);
- if (!list_empty(&bh->head))
- goto restart;
+ if (!list_empty(&bh->head)) {
+ if (bh->high_prio)
+ tasklet_hi_schedule(&bh->bh);
+ else
+ tasklet_schedule(&bh->bh);
+ }
bh->running = false;
spin_unlock_irq(&bh->lock);
}
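
The point of the new comment is that the handler now completes only the URBs it snapshots on entry and re-arms itself for anything queued meanwhile, rather than looping until the list drains. A toy user-space sketch of that shape (all names and counts invented for the illustration):

#include <stdio.h>

static int queued = 3;		/* stands in for bh->head */
static int arriving = 5;	/* URBs an interrupt will queue while we run */
static int armed;		/* stands in for tasklet_schedule() */

static void giveback_bh(void)
{
	int local = queued;	/* list_replace_init(): take only the current batch */

	queued = 0;
	while (local--) {	/* give back this batch ... */
		if (arriving) {	/* ... while new work keeps arriving */
			arriving--;
			queued++;
		}
	}
	if (queued)		/* re-arm instead of looping, so we return promptly */
		armed++;
}

int main(void)
{
	int runs = 0;

	armed = 1;
	while (armed) {		/* crude stand-in for the softirq core */
		armed--;
		giveback_bh();
		runs++;
	}
	printf("runs=%d queued=%d\n", runs, queued);	/* runs=3 queued=0 */
	return 0;
}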
@@ -1737,7 +1743,7 @@ static void usb_giveback_urb_bh(struct tasklet_struct *t)
void usb_hcd_giveback_urb(struct usb_hcd *hcd, struct urb *urb, int status)
{
struct giveback_urb_bh *bh;
- bool running, high_prio_bh;
+ bool running;

/* pass status to tasklet via unlinked */
if (likely(!urb->unlinked))
@@ -1748,13 +1754,10 @@ void usb_hcd_giveback_urb(struct usb_hcd *hcd, struct urb *urb, int status)
return;
}

- if (usb_pipeisoc(urb->pipe) || usb_pipeint(urb->pipe)) {
+ if (usb_pipeisoc(urb->pipe) || usb_pipeint(urb->pipe))
bh = &hcd->high_prio_bh;
- high_prio_bh = true;
- } else {
+ else
bh = &hcd->low_prio_bh;
- high_prio_bh = false;
- }

spin_lock(&bh->lock);
list_add_tail(&urb->urb_list, &bh->head);
@@ -1763,7 +1766,7 @@ void usb_hcd_giveback_urb(struct usb_hcd *hcd, struct urb *urb, int status)

if (running)
;
- else if (high_prio_bh)
+ else if (bh->high_prio)
tasklet_hi_schedule(&bh->bh);
else
tasklet_schedule(&bh->bh);
@@ -2959,6 +2962,7 @@ int usb_add_hcd(struct usb_hcd *hcd,

/* initialize tasklets */
init_giveback_urb_bh(&hcd->high_prio_bh);
+ hcd->high_prio_bh.high_prio = true;
init_giveback_urb_bh(&hcd->low_prio_bh);

/* enable irqs just before we start the controller,
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index d28cd1a6709b..ecec9cdfa0a6 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -157,8 +157,13 @@ static void __dwc3_set_mode(struct work_struct *work)
break;
}

- /* For DRD host or device mode only */
- if (dwc->desired_dr_role != DWC3_GCTL_PRTCAP_OTG) {
+ /*
+ * When current_dr_role is not set, there's no role switching.
+ * Only perform GCTL.CoreSoftReset when there's DRD role switching.
+ */
+ if (dwc->current_dr_role && ((DWC3_IP_IS(DWC3) ||
+ DWC3_VER_IS_PRIOR(DWC31, 190A)) &&
+ dwc->desired_dr_role != DWC3_GCTL_PRTCAP_OTG)) {
reg = dwc3_readl(dwc->regs, DWC3_GCTL);
reg |= DWC3_GCTL_CORESOFTRESET;
dwc3_writel(dwc->regs, DWC3_GCTL, reg);
diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c
index 6cba990da32e..3582fd6dfa14 100644
--- a/drivers/usb/dwc3/dwc3-qcom.c
+++ b/drivers/usb/dwc3/dwc3-qcom.c
@@ -443,9 +443,9 @@ static int dwc3_qcom_get_irq(struct platform_device *pdev,
int ret;

if (np)
- ret = platform_get_irq_byname(pdev_irq, name);
+ ret = platform_get_irq_byname_optional(pdev_irq, name);
else
- ret = platform_get_irq(pdev_irq, num);
+ ret = platform_get_irq_optional(pdev_irq, num);

return ret;
}
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index f46fc02669a8..d3be2b424244 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1181,17 +1181,49 @@ static u32 dwc3_calc_trbs_left(struct dwc3_ep *dep)
return trbs_left;
}

-static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
- dma_addr_t dma, unsigned int length, unsigned int chain,
- unsigned int node, unsigned int stream_id,
- unsigned int short_not_ok, unsigned int no_interrupt,
- unsigned int is_last, bool must_interrupt)
+/**
+ * dwc3_prepare_one_trb - setup one TRB from one request
+ * @dep: endpoint for which this request is prepared
+ * @req: dwc3_request pointer
+ * @trb_length: buffer size of the TRB
+ * @chain: should this TRB be chained to the next?
+ * @node: only for isochronous endpoints. First TRB needs different type.
+ * @use_bounce_buffer: set to use bounce buffer
+ * @must_interrupt: set to interrupt on TRB completion
+ */
+static void dwc3_prepare_one_trb(struct dwc3_ep *dep,
+ struct dwc3_request *req, unsigned int trb_length,
+ unsigned int chain, unsigned int node, bool use_bounce_buffer,
+ bool must_interrupt)
{
+ struct dwc3_trb *trb;
+ dma_addr_t dma;
+ unsigned int stream_id = req->request.stream_id;
+ unsigned int short_not_ok = req->request.short_not_ok;
+ unsigned int no_interrupt = req->request.no_interrupt;
+ unsigned int is_last = req->request.is_last;
struct dwc3 *dwc = dep->dwc;
struct usb_gadget *gadget = dwc->gadget;
enum usb_device_speed speed = gadget->speed;

- trb->size = DWC3_TRB_SIZE_LENGTH(length);
+ if (use_bounce_buffer)
+ dma = dep->dwc->bounce_addr;
+ else if (req->request.num_sgs > 0)
+ dma = sg_dma_address(req->start_sg);
+ else
+ dma = req->request.dma;
+
+ trb = &dep->trb_pool[dep->trb_enqueue];
+
+ if (!req->trb) {
+ dwc3_gadget_move_started_request(req);
+ req->trb = trb;
+ req->trb_dma = dwc3_trb_dma_offset(dep, trb);
+ }
+
+ req->num_trbs++;
+
+ trb->size = DWC3_TRB_SIZE_LENGTH(trb_length);
trb->bpl = lower_32_bits(dma);
trb->bph = upper_32_bits(dma);

@@ -1231,10 +1263,10 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
unsigned int mult = 2;
unsigned int maxp = usb_endpoint_maxp(ep->desc);

- if (length <= (2 * maxp))
+ if (req->request.length <= (2 * maxp))
mult--;

- if (length <= maxp)
+ if (req->request.length <= maxp)
mult--;

trb->size |= DWC3_TRB_SIZE_PCM1(mult);
@@ -1308,50 +1340,6 @@ static void __dwc3_prepare_one_trb(struct dwc3_ep *dep, struct dwc3_trb *trb,
trace_dwc3_prepare_trb(dep, trb);
}

-/**
- * dwc3_prepare_one_trb - setup one TRB from one request
- * @dep: endpoint for which this request is prepared
- * @req: dwc3_request pointer
- * @trb_length: buffer size of the TRB
- * @chain: should this TRB be chained to the next?
- * @node: only for isochronous endpoints. First TRB needs different type.
- * @use_bounce_buffer: set to use bounce buffer
- * @must_interrupt: set to interrupt on TRB completion
- */
-static void dwc3_prepare_one_trb(struct dwc3_ep *dep,
- struct dwc3_request *req, unsigned int trb_length,
- unsigned int chain, unsigned int node, bool use_bounce_buffer,
- bool must_interrupt)
-{
- struct dwc3_trb *trb;
- dma_addr_t dma;
- unsigned int stream_id = req->request.stream_id;
- unsigned int short_not_ok = req->request.short_not_ok;
- unsigned int no_interrupt = req->request.no_interrupt;
- unsigned int is_last = req->request.is_last;
-
- if (use_bounce_buffer)
- dma = dep->dwc->bounce_addr;
- else if (req->request.num_sgs > 0)
- dma = sg_dma_address(req->start_sg);
- else
- dma = req->request.dma;
-
- trb = &dep->trb_pool[dep->trb_enqueue];
-
- if (!req->trb) {
- dwc3_gadget_move_started_request(req);
- req->trb = trb;
- req->trb_dma = dwc3_trb_dma_offset(dep, trb);
- }
-
- req->num_trbs++;
-
- __dwc3_prepare_one_trb(dep, trb, dma, trb_length, chain, node,
- stream_id, short_not_ok, no_interrupt, is_last,
- must_interrupt);
-}
-
static bool dwc3_needs_extra_trb(struct dwc3_ep *dep, struct dwc3_request *req)
{
unsigned int maxp = usb_endpoint_maxp(dep->endpoint.desc);
diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c
index 3a77bca0ebe1..e884f295504f 100644
--- a/drivers/usb/gadget/function/f_mass_storage.c
+++ b/drivers/usb/gadget/function/f_mass_storage.c
@@ -1192,13 +1192,14 @@ static int do_read_toc(struct fsg_common *common, struct fsg_buffhd *bh)
u8 format;
int i, len;

+ format = common->cmnd[2] & 0xf;
+
if ((common->cmnd[1] & ~0x02) != 0 || /* Mask away MSF */
- start_track > 1) {
+ (start_track > 1 && format != 0x1)) {
curlun->sense_data = SS_INVALID_FIELD_IN_CDB;
return -EINVAL;
}

- format = common->cmnd[2] & 0xf;
/*
* Check if CDB is old style SFF-8020i
* i.e. format is in 2 MSBs of byte 9
@@ -1208,8 +1209,8 @@ static int do_read_toc(struct fsg_common *common, struct fsg_buffhd *bh)
format = (common->cmnd[9] >> 6) & 0x3;

switch (format) {
- case 0:
- /* Formatted TOC */
+ case 0: /* Formatted TOC */
+ case 1: /* Multi-session info */
len = 4 + 2*8; /* 4 byte header + 2 descriptors */
memset(buf, 0, len);
buf[1] = len - 2; /* TOC Length excludes length field */
@@ -1250,7 +1251,7 @@ static int do_read_toc(struct fsg_common *common, struct fsg_buffhd *bh)
return len;

default:
- /* Multi-session, PMA, ATIP, CD-TEXT not supported/required */
+ /* PMA, ATIP, CD-TEXT not supported/required */
curlun->sense_data = SS_INVALID_FIELD_IN_CDB;
return -EINVAL;
}
diff --git a/drivers/usb/gadget/udc/Kconfig b/drivers/usb/gadget/udc/Kconfig
index 69394dc1cdfb..2cdd37be165a 100644
--- a/drivers/usb/gadget/udc/Kconfig
+++ b/drivers/usb/gadget/udc/Kconfig
@@ -311,7 +311,7 @@ source "drivers/usb/gadget/udc/bdc/Kconfig"

config USB_AMD5536UDC
tristate "AMD5536 UDC"
- depends on USB_PCI
+ depends on USB_PCI && HAS_DMA
select USB_SNP_CORE
help
The AMD5536 UDC is part of the AMD Geode CS5536, an x86 southbridge.
diff --git a/drivers/usb/gadget/udc/aspeed-vhub/hub.c b/drivers/usb/gadget/udc/aspeed-vhub/hub.c
index 65cd4e46f031..e2207d014620 100644
--- a/drivers/usb/gadget/udc/aspeed-vhub/hub.c
+++ b/drivers/usb/gadget/udc/aspeed-vhub/hub.c
@@ -1059,8 +1059,10 @@ static int ast_vhub_init_desc(struct ast_vhub *vhub)
/* Initialize vhub String Descriptors. */
INIT_LIST_HEAD(&vhub->vhub_str_desc);
desc_np = of_get_child_by_name(vhub_np, "vhub-strings");
- if (desc_np)
+ if (desc_np) {
ret = ast_vhub_of_parse_str_desc(vhub, desc_np);
+ of_node_put(desc_np);
+ }
else
ret = ast_vhub_str_alloc_add(vhub, &ast_vhub_strings);

diff --git a/drivers/usb/gadget/udc/tegra-xudc.c b/drivers/usb/gadget/udc/tegra-xudc.c
index d9c406bdb680..c25765628625 100644
--- a/drivers/usb/gadget/udc/tegra-xudc.c
+++ b/drivers/usb/gadget/udc/tegra-xudc.c
@@ -3691,15 +3691,15 @@ static int tegra_xudc_powerdomain_init(struct tegra_xudc *xudc)
int err;

xudc->genpd_dev_device = dev_pm_domain_attach_by_name(dev, "dev");
- if (IS_ERR(xudc->genpd_dev_device)) {
- err = PTR_ERR(xudc->genpd_dev_device);
+ if (IS_ERR_OR_NULL(xudc->genpd_dev_device)) {
+ err = PTR_ERR(xudc->genpd_dev_device) ? : -ENODATA;
dev_err(dev, "failed to get device power domain: %d\n", err);
return err;
}

xudc->genpd_dev_ss = dev_pm_domain_attach_by_name(dev, "ss");
- if (IS_ERR(xudc->genpd_dev_ss)) {
- err = PTR_ERR(xudc->genpd_dev_ss);
+ if (IS_ERR_OR_NULL(xudc->genpd_dev_ss)) {
+ err = PTR_ERR(xudc->genpd_dev_ss) ? : -ENODATA;
dev_err(dev, "failed to get SuperSpeed power domain: %d\n", err);
return err;
}
diff --git a/drivers/usb/host/ehci-ppc-of.c b/drivers/usb/host/ehci-ppc-of.c
index 6bbaee74f7e7..28a19693c19f 100644
--- a/drivers/usb/host/ehci-ppc-of.c
+++ b/drivers/usb/host/ehci-ppc-of.c
@@ -148,6 +148,7 @@ static int ehci_hcd_ppc_of_probe(struct platform_device *op)
} else {
ehci->has_amcc_usb23 = 1;
}
+ of_node_put(np);
}

if (of_get_property(dn, "big-endian", NULL)) {
diff --git a/drivers/usb/host/ohci-nxp.c b/drivers/usb/host/ohci-nxp.c
index 85878e8ad331..106a6bcefb08 100644
--- a/drivers/usb/host/ohci-nxp.c
+++ b/drivers/usb/host/ohci-nxp.c
@@ -164,6 +164,7 @@ static int ohci_hcd_nxp_probe(struct platform_device *pdev)
}

isp1301_i2c_client = isp1301_get_client(isp1301_node);
+ of_node_put(isp1301_node);
if (!isp1301_i2c_client)
return -EPROBE_DEFER;

diff --git a/drivers/usb/host/xhci-tegra.c b/drivers/usb/host/xhci-tegra.c
index 996958a6565c..bdb776553826 100644
--- a/drivers/usb/host/xhci-tegra.c
+++ b/drivers/usb/host/xhci-tegra.c
@@ -1010,15 +1010,15 @@ static int tegra_xusb_powerdomain_init(struct device *dev,
int err;

tegra->genpd_dev_host = dev_pm_domain_attach_by_name(dev, "xusb_host");
- if (IS_ERR(tegra->genpd_dev_host)) {
- err = PTR_ERR(tegra->genpd_dev_host);
+ if (IS_ERR_OR_NULL(tegra->genpd_dev_host)) {
+ err = PTR_ERR(tegra->genpd_dev_host) ? : -ENODATA;
dev_err(dev, "failed to get host pm-domain: %d\n", err);
return err;
}

tegra->genpd_dev_ss = dev_pm_domain_attach_by_name(dev, "xusb_ss");
- if (IS_ERR(tegra->genpd_dev_ss)) {
- err = PTR_ERR(tegra->genpd_dev_ss);
+ if (IS_ERR_OR_NULL(tegra->genpd_dev_ss)) {
+ err = PTR_ERR(tegra->genpd_dev_ss) ? : -ENODATA;
dev_err(dev, "failed to get superspeed pm-domain: %d\n", err);
return err;
}
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 1f3f311d9951..1d60e62752f3 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -2393,7 +2393,7 @@ static inline const char *xhci_decode_trb(char *str, size_t size,
field3 & TRB_CYCLE ? 'C' : 'c');
break;
case TRB_STOP_RING:
- sprintf(str,
+ snprintf(str, size,
"%s: slot %d sp %d ep %d flags %c",
xhci_trb_type_string(type),
TRB_TO_SLOT_ID(field3),
diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c
index 9d56138133a9..ef6a2891f290 100644
--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -737,7 +737,8 @@ static void sierra_close(struct usb_serial_port *port)

/*
* Need to take susp_lock to make sure port is not already being
- * resumed, but no need to hold it due to initialized
+ * resumed, but no need to hold it due to the tty-port initialized
+ * flag.
*/
spin_lock_irq(&intfdata->susp_lock);
if (--intfdata->open_ports == 0)
diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
index 24101bd7fcad..e35bea2235c1 100644
--- a/drivers/usb/serial/usb-serial.c
+++ b/drivers/usb/serial/usb-serial.c
@@ -295,7 +295,7 @@ static int serial_open(struct tty_struct *tty, struct file *filp)
*
* Shut down a USB serial port. Serialized against activate by the
* tport mutex and kept to matching open/close pairs
- * of calls by the initialized flag.
+ * of calls by the tty-port initialized flag.
*
* Not called if tty is console.
*/
diff --git a/drivers/usb/serial/usb_wwan.c b/drivers/usb/serial/usb_wwan.c
index dab38b63eaf7..cc81ab7ef4da 100644
--- a/drivers/usb/serial/usb_wwan.c
+++ b/drivers/usb/serial/usb_wwan.c
@@ -388,7 +388,8 @@ void usb_wwan_close(struct usb_serial_port *port)

/*
* Need to take susp_lock to make sure port is not already being
- * resumed, but no need to hold it due to initialized
+ * resumed, but no need to hold it due to the tty-port initialized
+ * flag.
*/
spin_lock_irq(&intfdata->susp_lock);
if (--intfdata->open_ports == 0)
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index a6045aef0d04..668880c3a472 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -76,6 +76,10 @@ static int ucsi_read_error(struct ucsi *ucsi)
if (ret)
return ret;

+ ret = ucsi_acknowledge_command(ucsi);
+ if (ret)
+ return ret;
+
switch (error) {
case UCSI_ERROR_INCOMPATIBLE_PARTNER:
return -EOPNOTSUPP;
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 767b5d47631a..e92376837b29 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -337,6 +337,14 @@ static int vf_qm_cache_wb(struct hisi_qm *qm)
return 0;
}

+static struct hisi_acc_vf_core_device *hssi_acc_drvdata(struct pci_dev *pdev)
+{
+ struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
+
+ return container_of(core_device, struct hisi_acc_vf_core_device,
+ core_device);
+}
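
This helper (and its mlx5 twin later in the patch) recovers the wrapper structure from the vfio_pci_core_device pointer that the probe path now stores as drvdata; the vfio core is taught further down to WARN if a driver forgets to do so. The idiom in isolation, as a user-space sketch with made-up structure names:

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct core_device { int id; };

struct wrapper_device {
	int migrate_cap;
	struct core_device core;	/* this member is what gets stored as drvdata */
};

int main(void)
{
	struct wrapper_device w = { .migrate_cap = 1, .core = { .id = 7 } };
	struct core_device *drvdata = &w.core;	/* what dev_set_drvdata() records */

	/* What the *_drvdata() helpers do on the way back. */
	struct wrapper_device *back =
		container_of(drvdata, struct wrapper_device, core);

	printf("%d %d\n", back->migrate_cap, back->core.id);	/* 1 7 */
	return 0;
}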
+
static void vf_qm_fun_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev,
struct hisi_qm *qm)
{
@@ -962,7 +970,7 @@ hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,

static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hssi_acc_drvdata(pdev);

if (hisi_acc_vdev->core_device.vdev.migration_flags !=
VFIO_MIGRATION_STOP_COPY)
@@ -1274,11 +1282,10 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
&hisi_acc_vfio_pci_ops);
}

+ dev_set_drvdata(&pdev->dev, &hisi_acc_vdev->core_device);
ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
if (ret)
goto out_free;
-
- dev_set_drvdata(&pdev->dev, hisi_acc_vdev);
return 0;

out_free:
@@ -1289,7 +1296,7 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device

static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hssi_acc_drvdata(pdev);

vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index bbec5d288fee..9f59f5807b8a 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -39,6 +39,14 @@ struct mlx5vf_pci_core_device {
struct mlx5_vf_migration_file *saving_migf;
};

+static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev)
+{
+ struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
+
+ return container_of(core_device, struct mlx5vf_pci_core_device,
+ core_device);
+}
+
static struct page *
mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
unsigned long offset)
@@ -505,7 +513,7 @@ static int mlx5vf_pci_get_device_state(struct vfio_device *vdev,

static void mlx5vf_pci_aer_reset_done(struct pci_dev *pdev)
{
- struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
+ struct mlx5vf_pci_core_device *mvdev = mlx5vf_drvdata(pdev);

if (!mvdev->migrate_cap)
return;
@@ -614,11 +622,10 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
}
}

+ dev_set_drvdata(&pdev->dev, &mvdev->core_device);
ret = vfio_pci_core_register_device(&mvdev->core_device);
if (ret)
goto out_free;
-
- dev_set_drvdata(&pdev->dev, mvdev);
return 0;

out_free:
@@ -629,7 +636,7 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,

static void mlx5vf_pci_remove(struct pci_dev *pdev)
{
- struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
+ struct mlx5vf_pci_core_device *mvdev = mlx5vf_drvdata(pdev);

vfio_pci_core_unregister_device(&mvdev->core_device);
vfio_pci_core_uninit_device(&mvdev->core_device);
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2b047469e02f..8c990a1a7def 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -151,10 +151,10 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return -ENOMEM;
vfio_pci_core_init_device(vdev, pdev, &vfio_pci_ops);

+ dev_set_drvdata(&pdev->dev, vdev);
ret = vfio_pci_core_register_device(vdev);
if (ret)
goto out_free;
- dev_set_drvdata(&pdev->dev, vdev);
return 0;

out_free:
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 06b6f3594a13..65587fd5c021 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1821,6 +1821,10 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
struct pci_dev *pdev = vdev->pdev;
int ret;

+ /* Drivers must set the vfio_pci_core_device to their drvdata */
+ if (WARN_ON(vdev != dev_get_drvdata(&vdev->pdev->dev)))
+ return -EINVAL;
+
if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
return -EINVAL;

diff --git a/drivers/video/fbdev/amba-clcd.c b/drivers/video/fbdev/amba-clcd.c
index 8080116aea84..f65c96d1394d 100644
--- a/drivers/video/fbdev/amba-clcd.c
+++ b/drivers/video/fbdev/amba-clcd.c
@@ -698,16 +698,18 @@ static int clcdfb_of_init_display(struct clcd_fb *fb)
return -ENODEV;

panel = of_graph_get_remote_port_parent(endpoint);
- if (!panel)
- return -ENODEV;
+ if (!panel) {
+ err = -ENODEV;
+ goto out_endpoint_put;
+ }

err = clcdfb_of_get_backlight(&fb->dev->dev, fb->panel);
if (err)
- return err;
+ goto out_panel_put;

err = clcdfb_of_get_mode(&fb->dev->dev, panel, fb->panel);
if (err)
- return err;
+ goto out_panel_put;

err = of_property_read_u32(fb->dev->dev.of_node, "max-memory-bandwidth",
&max_bandwidth);
@@ -736,11 +738,21 @@ static int clcdfb_of_init_display(struct clcd_fb *fb)

if (of_property_read_u32_array(endpoint,
"arm,pl11x,tft-r0g0b0-pads",
- tft_r0b0g0, ARRAY_SIZE(tft_r0b0g0)) != 0)
- return -ENOENT;
+ tft_r0b0g0, ARRAY_SIZE(tft_r0b0g0)) != 0) {
+ err = -ENOENT;
+ goto out_panel_put;
+ }
+
+ of_node_put(panel);
+ of_node_put(endpoint);

return clcdfb_of_init_tft_panel(fb, tft_r0b0g0[0],
tft_r0b0g0[1], tft_r0b0g0[2]);
+out_panel_put:
+ of_node_put(panel);
+out_endpoint_put:
+ of_node_put(endpoint);
+ return err;
}

static int clcdfb_of_vram_setup(struct clcd_fb *fb)
diff --git a/drivers/video/fbdev/arkfb.c b/drivers/video/fbdev/arkfb.c
index eb3e47c58c5f..a2a381631628 100644
--- a/drivers/video/fbdev/arkfb.c
+++ b/drivers/video/fbdev/arkfb.c
@@ -781,7 +781,12 @@ static int arkfb_set_par(struct fb_info *info)
return -EINVAL;
}

- ark_set_pixclock(info, (hdiv * info->var.pixclock) / hmul);
+ value = (hdiv * info->var.pixclock) / hmul;
+ if (!value) {
+ fb_dbg(info, "invalid pixclock\n");
+ value = 1;
+ }
+ ark_set_pixclock(info, value);
svga_set_timings(par->state.vgabase, &ark_timing_regs, &(info->var), hmul, hdiv,
(info->var.vmode & FB_VMODE_DOUBLE) ? 2 : 1,
(info->var.vmode & FB_VMODE_INTERLACED) ? 2 : 1,
@@ -792,6 +797,8 @@ static int arkfb_set_par(struct fb_info *info)
value = ((value * hmul / hdiv) / 8) - 5;
vga_wcrt(par->state.vgabase, 0x42, (value + 1) / 2);

+ if (screen_size > info->screen_size)
+ screen_size = info->screen_size;
memset_io(info->screen_base, 0x00, screen_size);
/* Device and screen back on */
svga_wcrt_mask(par->state.vgabase, 0x17, 0x80, 0x80);
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c
index 526a7d2de491..d597f0c31578 100644
--- a/drivers/video/fbdev/core/fbcon.c
+++ b/drivers/video/fbdev/core/fbcon.c
@@ -115,8 +115,8 @@ static int logo_lines;
enums. */
static int logo_shown = FBCON_LOGO_CANSHOW;
/* console mappings */
-static int first_fb_vc;
-static int last_fb_vc = MAX_NR_CONSOLES - 1;
+static unsigned int first_fb_vc;
+static unsigned int last_fb_vc = MAX_NR_CONSOLES - 1;
static int fbcon_is_default = 1;
static int primary_device = -1;
static int fbcon_has_console_bind;
@@ -464,10 +464,12 @@ static int __init fb_console_setup(char *this_opt)
options += 3;
if (*options)
first_fb_vc = simple_strtoul(options, &options, 10) - 1;
- if (first_fb_vc < 0)
+ if (first_fb_vc >= MAX_NR_CONSOLES)
first_fb_vc = 0;
if (*options++ == '-')
last_fb_vc = simple_strtoul(options, &options, 10) - 1;
+ if (last_fb_vc < first_fb_vc || last_fb_vc >= MAX_NR_CONSOLES)
+ last_fb_vc = MAX_NR_CONSOLES - 1;
fbcon_is_default = 0;
continue;
}
@@ -1704,8 +1706,6 @@ static bool fbcon_scroll(struct vc_data *vc, unsigned int t, unsigned int b,
case SM_UP:
if (count > vc->vc_rows) /* Maximum realistic size */
count = vc->vc_rows;
- if (logo_shown >= 0)
- goto redraw_up;
switch (fb_scrollmode(p)) {
case SCROLL_MOVE:
fbcon_redraw_blit(vc, info, p, t, b - t - count,
@@ -1794,8 +1794,6 @@ static bool fbcon_scroll(struct vc_data *vc, unsigned int t, unsigned int b,
case SM_DOWN:
if (count > vc->vc_rows) /* Maximum realistic size */
count = vc->vc_rows;
- if (logo_shown >= 0)
- goto redraw_down;
switch (fb_scrollmode(p)) {
case SCROLL_MOVE:
fbcon_redraw_blit(vc, info, p, b - 1, b - t - count,
diff --git a/drivers/video/fbdev/s3fb.c b/drivers/video/fbdev/s3fb.c
index b93c8eb02336..5069f6f67923 100644
--- a/drivers/video/fbdev/s3fb.c
+++ b/drivers/video/fbdev/s3fb.c
@@ -905,6 +905,8 @@ static int s3fb_set_par(struct fb_info *info)
value = clamp((htotal + hsstart + 1) / 2 + 2, hsstart + 4, htotal + 1);
svga_wcrt_multi(par->state.vgabase, s3_dtpc_regs, value);

+ if (screen_size > info->screen_size)
+ screen_size = info->screen_size;
memset_io(info->screen_base, 0x00, screen_size);
/* Device and screen back on */
svga_wcrt_mask(par->state.vgabase, 0x17, 0x80, 0x80);
diff --git a/drivers/video/fbdev/sis/init.c b/drivers/video/fbdev/sis/init.c
index b568c646a76c..2ba91d62af92 100644
--- a/drivers/video/fbdev/sis/init.c
+++ b/drivers/video/fbdev/sis/init.c
@@ -355,12 +355,12 @@ SiS_GetModeID(int VGAEngine, unsigned int VBFlags, int HDisplay, int VDisplay,
}
break;
case 400:
- if((!(VBFlags & CRT1_LCDA)) || ((LCDwidth >= 800) && (LCDwidth >= 600))) {
+ if((!(VBFlags & CRT1_LCDA)) || ((LCDwidth >= 800) && (LCDheight >= 600))) {
if(VDisplay == 300) ModeIndex = ModeIndex_400x300[Depth];
}
break;
case 512:
- if((!(VBFlags & CRT1_LCDA)) || ((LCDwidth >= 1024) && (LCDwidth >= 768))) {
+ if((!(VBFlags & CRT1_LCDA)) || ((LCDwidth >= 1024) && (LCDheight >= 768))) {
if(VDisplay == 384) ModeIndex = ModeIndex_512x384[Depth];
}
break;
diff --git a/drivers/video/fbdev/vt8623fb.c b/drivers/video/fbdev/vt8623fb.c
index a92a8c670cf0..4274c6efb249 100644
--- a/drivers/video/fbdev/vt8623fb.c
+++ b/drivers/video/fbdev/vt8623fb.c
@@ -507,6 +507,8 @@ static int vt8623fb_set_par(struct fb_info *info)
(info->var.vmode & FB_VMODE_DOUBLE) ? 2 : 1, 1,
1, info->node);

+ if (screen_size > info->screen_size)
+ screen_size = info->screen_size;
memset_io(info->screen_base, 0x00, screen_size);

/* Device and screen back on */
diff --git a/drivers/watchdog/armada_37xx_wdt.c b/drivers/watchdog/armada_37xx_wdt.c
index 1635f421ef2c..854b1cc723cb 100644
--- a/drivers/watchdog/armada_37xx_wdt.c
+++ b/drivers/watchdog/armada_37xx_wdt.c
@@ -274,6 +274,8 @@ static int armada_37xx_wdt_probe(struct platform_device *pdev)
if (!res)
return -ENODEV;
dev->reg = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+ if (!dev->reg)
+ return -ENOMEM;

/* init clock */
dev->clk = devm_clk_get(&pdev->dev, NULL);
diff --git a/drivers/watchdog/f71808e_wdt.c b/drivers/watchdog/f71808e_wdt.c
index 7f59c680de25..6a16d3d0bb1e 100644
--- a/drivers/watchdog/f71808e_wdt.c
+++ b/drivers/watchdog/f71808e_wdt.c
@@ -634,7 +634,9 @@ static int __init fintek_wdt_init(void)

pdata.type = ret;

- platform_driver_register(&fintek_wdt_driver);
+ ret = platform_driver_register(&fintek_wdt_driver);
+ if (ret)
+ return ret;

wdt_res.name = "superio port";
wdt_res.flags = IORESOURCE_IO;
diff --git a/drivers/watchdog/sp5100_tco.c b/drivers/watchdog/sp5100_tco.c
index 86ffb58fbc85..ae54dd33e233 100644
--- a/drivers/watchdog/sp5100_tco.c
+++ b/drivers/watchdog/sp5100_tco.c
@@ -402,6 +402,7 @@ static int sp5100_tco_setupdevice_mmio(struct device *dev,
iounmap(addr);

release_resource(res);
+ kfree(res);

return ret;
}
diff --git a/fs/Makefile b/fs/Makefile
index 208a74e0b00e..93b80529f8e8 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -34,8 +34,6 @@ obj-$(CONFIG_TIMERFD) += timerfd.o
obj-$(CONFIG_EVENTFD) += eventfd.o
obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
obj-$(CONFIG_AIO) += aio.o
-obj-$(CONFIG_IO_URING) += io_uring.o
-obj-$(CONFIG_IO_WQ) += io-wq.o
obj-$(CONFIG_FS_DAX) += dax.o
obj-$(CONFIG_FS_ENCRYPTION) += crypto/
obj-$(CONFIG_FS_VERITY) += verity/
diff --git a/fs/attr.c b/fs/attr.c
index dbe996b0dedf..f581c4d00897 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -184,6 +184,8 @@ EXPORT_SYMBOL(setattr_prepare);
*/
int inode_newsize_ok(const struct inode *inode, loff_t offset)
{
+ if (offset < 0)
+ return -EINVAL;
if (inode->i_size < offset) {
unsigned long limit;

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 667b7025d503..0c7fe3142d7c 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1033,8 +1033,13 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
< block_group->zone_unusable);
WARN_ON(block_group->space_info->disk_total
< block_group->length * factor);
+ WARN_ON(block_group->zone_is_active &&
+ block_group->space_info->active_total_bytes
+ < block_group->length);
}
block_group->space_info->total_bytes -= block_group->length;
+ if (block_group->zone_is_active)
+ block_group->space_info->active_total_bytes -= block_group->length;
block_group->space_info->bytes_readonly -=
(block_group->length - block_group->zone_unusable);
block_group->space_info->bytes_zone_unusable -=
@@ -2102,7 +2107,8 @@ static int read_one_block_group(struct btrfs_fs_info *info,
trace_btrfs_add_block_group(info, cache, 0);
btrfs_update_space_info(info, cache->flags, cache->length,
cache->used, cache->bytes_super,
- cache->zone_unusable, &space_info);
+ cache->zone_unusable, cache->zone_is_active,
+ &space_info);

cache->space_info = space_info;

@@ -2172,7 +2178,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info)
}

btrfs_update_space_info(fs_info, bg->flags, em->len, em->len,
- 0, 0, &space_info);
+ 0, 0, false, &space_info);
bg->space_info = space_info;
link_block_group(bg);

@@ -2553,7 +2559,7 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
trace_btrfs_add_block_group(fs_info, cache, 1);
btrfs_update_space_info(fs_info, cache->flags, size, bytes_used,
cache->bytes_super, cache->zone_unusable,
- &cache->space_info);
+ cache->zone_is_active, &cache->space_info);
btrfs_update_global_block_rsv(fs_info);

link_block_group(cache);
@@ -2653,6 +2659,14 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
ret = btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE);
if (ret < 0)
goto out;
+ /*
+ * We have allocated a new chunk. We also need to activate that chunk to
+ * grant metadata tickets for zoned filesystem.
+ */
+ ret = btrfs_zoned_activate_one_bg(fs_info, cache->space_info, true);
+ if (ret < 0)
+ goto out;
+
ret = inc_block_group_ro(cache, 0);
if (ret == -ETXTBSY)
goto unlock_out;
@@ -3724,6 +3738,7 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
* attempt.
*/
wait_for_alloc = true;
+ force = CHUNK_ALLOC_NO_FORCE;
spin_unlock(&space_info->lock);
mutex_lock(&fs_info->chunk_mutex);
mutex_unlock(&fs_info->chunk_mutex);
@@ -3846,6 +3861,14 @@ static void reserve_chunk_space(struct btrfs_trans_handle *trans,
if (IS_ERR(bg)) {
ret = PTR_ERR(bg);
} else {
+ /*
+ * We have a new chunk. We also need to activate it for
+ * zoned filesystem.
+ */
+ ret = btrfs_zoned_activate_one_bg(fs_info, info, true);
+ if (ret < 0)
+ return;
+
/*
* If we fail to add the chunk item here, we end up
* trying again at phase 2 of chunk allocation, at
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 077c95e9baa5..4ce0977ec9e1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -107,14 +107,6 @@ struct btrfs_ioctl_encoded_io_args;
#define BTRFS_STAT_CURR 0
#define BTRFS_STAT_PREV 1

-/*
- * Count how many BTRFS_MAX_EXTENT_SIZE cover the @size
- */
-static inline u32 count_max_extents(u64 size)
-{
- return div_u64(size + BTRFS_MAX_EXTENT_SIZE - 1, BTRFS_MAX_EXTENT_SIZE);
-}
-
static inline unsigned long btrfs_chunk_item_size(int num_stripes)
{
BUG_ON(num_stripes == 0);
@@ -635,6 +627,9 @@ enum {
/* Indicate we have half completed snapshot deletions pending. */
BTRFS_FS_UNFINISHED_DROPS,

+ /* Indicate we have to finish a zone to do next allocation. */
+ BTRFS_FS_NEED_ZONE_FINISH,
+
#if BITS_PER_LONG == 32
/* Indicate if we have error/warn message printed on 32bit systems */
BTRFS_FS_32BIT_ERROR,
@@ -1032,6 +1027,12 @@ struct btrfs_fs_info {
u32 csums_per_leaf;
u32 stripesize;

+ /*
+ * Maximum size of an extent. BTRFS_MAX_EXTENT_SIZE on a regular
+ * filesystem; on a zoned filesystem it depends on the device constraints.
+ */
+ u64 max_extent_size;
+
/* Block groups and devices containing active swapfiles. */
spinlock_t swapfile_pins_lock;
struct rb_root swapfile_pins;
@@ -1050,6 +1051,8 @@ struct btrfs_fs_info {
u64 zoned;
};

+ /* Max size to emit ZONE_APPEND write command */
+ u64 max_zone_append_size;
struct mutex zoned_meta_io_lock;
spinlock_t treelog_bg_lock;
u64 treelog_bg;
@@ -1066,6 +1069,8 @@ struct btrfs_fs_info {

spinlock_t zone_active_bgs_lock;
struct list_head zone_active_bgs;
+ /* Waiters when BTRFS_FS_NEED_ZONE_FINISH is set */
+ wait_queue_head_t zone_finish_wait;

#ifdef CONFIG_BTRFS_FS_REF_VERIFY
spinlock_t ref_verify_lock;
@@ -3932,6 +3937,19 @@ static inline bool btrfs_is_zoned(const struct btrfs_fs_info *fs_info)
return fs_info->zoned != 0;
}

+/*
+ * Count how many fs_info->max_extent_size cover the @size
+ */
+static inline u32 count_max_extents(struct btrfs_fs_info *fs_info, u64 size)
+{
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
+ if (!fs_info)
+ return div_u64(size + BTRFS_MAX_EXTENT_SIZE - 1, BTRFS_MAX_EXTENT_SIZE);
+#endif
+
+ return div_u64(size + fs_info->max_extent_size - 1, fs_info->max_extent_size);
+}
+
static inline bool btrfs_is_data_reloc_root(const struct btrfs_root *root)
{
return root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID;
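
For readers skimming the ctree.h change above: count_max_extents() is now a per-filesystem ceiling division, size divided by fs_info->max_extent_size and rounded up, instead of a division by the fixed BTRFS_MAX_EXTENT_SIZE. The following is a standalone userspace sketch of that arithmetic only; the 64 MiB zoned limit is an invented example value, not something taken from this patch.

#include <stdint.h>
#include <stdio.h>

/* Ceiling division used by count_max_extents(): how many max-sized extents
 * are needed to cover @size. */
static uint32_t count_max_extents_sketch(uint64_t size, uint64_t max_extent_size)
{
	return (size + max_extent_size - 1) / max_extent_size;
}

int main(void)
{
	const uint64_t regular = 128ULL << 20;	/* BTRFS_MAX_EXTENT_SIZE (128 MiB) */
	const uint64_t zoned   =  64ULL << 20;	/* hypothetical zone-append limit */

	/* A 200 MiB delalloc range needs 2 extents with the regular limit
	 * and 4 with the smaller (made-up) zoned limit. */
	printf("%u %u\n",
	       count_max_extents_sketch(200ULL << 20, regular),
	       count_max_extents_sketch(200ULL << 20, zoned));
	return 0;
}
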
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index bd8267c4687d..a224f7e52bf7 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -273,7 +273,7 @@ static void calc_inode_reservations(struct btrfs_fs_info *fs_info,
u64 num_bytes, u64 disk_num_bytes,
u64 *meta_reserve, u64 *qgroup_reserve)
{
- u64 nr_extents = count_max_extents(num_bytes);
+ u64 nr_extents = count_max_extents(fs_info, num_bytes);
u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes);
u64 inode_update = btrfs_calc_metadata_size(fs_info, 1);

@@ -349,7 +349,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
* needs to free the reservation we just made.
*/
spin_lock(&inode->lock);
- nr_extents = count_max_extents(num_bytes);
+ nr_extents = count_max_extents(fs_info, num_bytes);
btrfs_mod_outstanding_extents(inode, nr_extents);
inode->csum_bytes += disk_num_bytes;
btrfs_calculate_inode_block_rsv_size(fs_info, inode);
@@ -412,7 +412,7 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
unsigned num_extents;

spin_lock(&inode->lock);
- num_extents = count_max_extents(num_bytes);
+ num_extents = count_max_extents(fs_info, num_bytes);
btrfs_mod_outstanding_extents(inode, -num_extents);
btrfs_calculate_inode_block_rsv_size(fs_info, inode);
spin_unlock(&inode->lock);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6f30413ed9a9..59fa7bf3a2e5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3239,6 +3239,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
init_waitqueue_head(&fs_info->transaction_blocked_wait);
init_waitqueue_head(&fs_info->async_submit_wait);
init_waitqueue_head(&fs_info->delayed_iputs_wait);
+ init_waitqueue_head(&fs_info->zone_finish_wait);

/* Usable values until the real ones are cached from the superblock */
fs_info->nodesize = 4096;
@@ -3246,6 +3247,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
fs_info->sectorsize_bits = ilog2(4096);
fs_info->stripesize = 4096;

+ fs_info->max_extent_size = BTRFS_MAX_EXTENT_SIZE;
+
spin_lock_init(&fs_info->swapfile_pins_lock);
fs_info->swapfile_pins = RB_ROOT;

@@ -3577,16 +3580,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
*/
fs_info->compress_type = BTRFS_COMPRESS_ZLIB;

- /*
- * Flag our filesystem as having big metadata blocks if they are bigger
- * than the page size.
- */
- if (btrfs_super_nodesize(disk_super) > PAGE_SIZE) {
- if (!(features & BTRFS_FEATURE_INCOMPAT_BIG_METADATA))
- btrfs_info(fs_info,
- "flagging fs with big metadata feature");
- features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
- }

/* Set up fs_info before parsing mount options */
nodesize = btrfs_super_nodesize(disk_super);
@@ -3627,6 +3620,17 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
btrfs_info(fs_info, "has skinny extents");

+ /*
+ * Flag our filesystem as having big metadata blocks if they are bigger
+ * than the page size.
+ */
+ if (btrfs_super_nodesize(disk_super) > PAGE_SIZE) {
+ if (!(features & BTRFS_FEATURE_INCOMPAT_BIG_METADATA))
+ btrfs_info(fs_info,
+ "flagging fs with big metadata feature");
+ features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
+ }
+
/*
* mixed block groups end up with duplicate but slightly offset
* extent buffers for the same range. It leads to corruptions
@@ -3654,6 +3658,20 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
err = -EINVAL;
goto fail_alloc;
}
+ /*
+ * We have unsupported RO compat features; even though we are mounted RO,
+ * we must not cause any metadata write, including log replay, or we
+ * could screw up whatever the new feature requires.
+ */
+ if (unlikely(features && btrfs_super_log_root(disk_super) &&
+ !btrfs_test_opt(fs_info, NOLOGREPLAY))) {
+ btrfs_err(fs_info,
+"cannot replay dirty log with unsupported compat_ro features (0x%llx), try rescue=nologreplay",
+ features);
+ err = -EINVAL;
+ goto fail_alloc;
+ }
+

if (sectorsize < PAGE_SIZE) {
struct btrfs_subpage_info *subpage_info;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f45ecd939a2c..eee68a6f2be7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3803,8 +3803,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,

/* Check RO and no space case before trying to activate it */
spin_lock(&block_group->lock);
- if (block_group->ro ||
- block_group->alloc_offset == block_group->zone_capacity) {
+ if (block_group->ro || btrfs_zoned_bg_is_full(block_group)) {
ret = 1;
/*
* May need to clear fs_info->{treelog,data_reloc}_bg.
@@ -3985,23 +3984,63 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl,
}
}

-static bool can_allocate_chunk(struct btrfs_fs_info *fs_info,
- struct find_free_extent_ctl *ffe_ctl)
+static int can_allocate_chunk_zoned(struct btrfs_fs_info *fs_info,
+ struct find_free_extent_ctl *ffe_ctl)
+{
+ /* If we can activate new zone, just allocate a chunk and use it */
+ if (btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags))
+ return 0;
+
+ /*
+ * We already reached the max active zones. Try to finish one block
+ * group to make room for a new block group. This is only possible
+ * for a data block group because btrfs_zone_finish() may need to wait
+ * for a running transaction which can cause a deadlock for metadata
+ * allocation.
+ */
+ if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA) {
+ int ret = btrfs_zone_finish_one_bg(fs_info);
+
+ if (ret == 1)
+ return 0;
+ else if (ret < 0)
+ return ret;
+ }
+
+ /*
+ * If we have enough free space left in an already active block group
+ * and we can't activate any other zone now, do not allow allocating a
+ * new chunk and let find_free_extent() retry with a smaller size.
+ */
+ if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size)
+ return -ENOSPC;
+
+ /*
+ * Even min_alloc_size is not left in any block groups. Since we cannot
+ * activate a new block group, allocating it may not help. Let's tell the
+ * caller to try again and hope it makes some progress by writing out some
+ * parts of the region. That is only possible for data block groups,
+ * where a part of the region can be written.
+ */
+ if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA)
+ return -EAGAIN;
+
+ /*
+ * We cannot activate a new block group and not enough space is left in
+ * any block group. So, allocating a new block group may not help. But,
+ * there is nothing to do anyway, so let's go with it.
+ */
+ return 0;
+}
+
+static int can_allocate_chunk(struct btrfs_fs_info *fs_info,
+ struct find_free_extent_ctl *ffe_ctl)
{
switch (ffe_ctl->policy) {
case BTRFS_EXTENT_ALLOC_CLUSTERED:
- return true;
+ return 0;
case BTRFS_EXTENT_ALLOC_ZONED:
- /*
- * If we have enough free space left in an already
- * active block group and we can't activate any other
- * zone now, do not allow allocating a new chunk and
- * let find_free_extent() retry with a smaller size.
- */
- if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size &&
- !btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags))
- return false;
- return true;
+ return can_allocate_chunk_zoned(fs_info, ffe_ctl);
default:
BUG();
}
@@ -4083,8 +4122,9 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
int exist = 0;

/*Check if allocation policy allows to create a new chunk */
- if (!can_allocate_chunk(fs_info, ffe_ctl))
- return -ENOSPC;
+ ret = can_allocate_chunk(fs_info, ffe_ctl);
+ if (ret)
+ return ret;

trans = current->journal_info;
if (trans)
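
As a side note on the extent-tree.c hunk above: can_allocate_chunk() now returns an int so the zoned policy can answer with three outcomes, 0 (go allocate a chunk), -ENOSPC (retry the reservation with a smaller size) and -EAGAIN (write out part of the data region and retry). The fragment below is a simplified userspace model of that decision tree, with plain flags standing in for the kernel predicates; error propagation from btrfs_zone_finish_one_bg() is omitted.

#include <errno.h>

struct zoned_alloc_state {
	int can_activate_zone;     /* btrfs_can_activate_zone() would succeed */
	int is_data;               /* allocating a data block group */
	int finished_one_bg;       /* btrfs_zone_finish_one_bg() freed a zone */
	int have_min_alloc_space;  /* max_extent_size >= min_alloc_size */
};

static int can_allocate_chunk_zoned_model(const struct zoned_alloc_state *s)
{
	if (s->can_activate_zone)
		return 0;               /* a new chunk can be activated directly */
	if (s->is_data && s->finished_one_bg)
		return 0;               /* made room by finishing a data block group */
	if (s->have_min_alloc_space)
		return -ENOSPC;         /* let find_free_extent() retry smaller */
	if (s->is_data)
		return -EAGAIN;         /* caller writes out what it can, then retries */
	return 0;                       /* metadata: nothing better to do, allocate */
}
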
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 68ddd90685d9..bfc7d5b31156 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1992,10 +1992,12 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
struct page *locked_page, u64 *start,
u64 *end)
{
+ struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
const u64 orig_start = *start;
const u64 orig_end = *end;
- u64 max_bytes = BTRFS_MAX_EXTENT_SIZE;
+ /* The sanity tests may not set a valid fs_info. */
+ u64 max_bytes = fs_info ? fs_info->max_extent_size : BTRFS_MAX_EXTENT_SIZE;
u64 delalloc_start;
u64 delalloc_end;
bool found;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 153920acd226..2d24f2dcc0ea 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2344,7 +2344,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
btrfs_release_log_ctx_extents(&ctx);
if (ret < 0) {
/* Fallthrough and commit/free transaction. */
- ret = 1;
+ ret = BTRFS_LOG_FORCE_COMMIT;
}

/* we've logged all the items and now have a consistent
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 01a408db5683..ef84bc5030cd 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2630,16 +2630,19 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group,
static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
u64 bytenr, u64 size, bool used)
{
- struct btrfs_fs_info *fs_info = block_group->fs_info;
+ struct btrfs_space_info *sinfo = block_group->space_info;
struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
u64 offset = bytenr - block_group->start;
u64 to_free, to_unusable;
- const int bg_reclaim_threshold = READ_ONCE(fs_info->bg_reclaim_threshold);
+ int bg_reclaim_threshold = 0;
bool initial = (size == block_group->length);
u64 reclaimable_unusable;

WARN_ON(!initial && offset + size > block_group->zone_capacity);

+ if (!initial)
+ bg_reclaim_threshold = READ_ONCE(sinfo->bg_reclaim_threshold);
+
spin_lock(&ctl->tree_lock);
if (!used)
to_free = size;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5d15e374d032..6cf4b3146667 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -92,7 +92,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent);
static noinline int cow_file_range(struct btrfs_inode *inode,
struct page *locked_page,
u64 start, u64 end, int *page_started,
- unsigned long *nr_written, int unlock);
+ unsigned long *nr_written, int unlock,
+ u64 *done_offset);
static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
u64 len, u64 orig_start, u64 block_start,
u64 block_len, u64 orig_block_len,
@@ -884,15 +885,25 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
* can directly submit them without interruption.
*/
ret = cow_file_range(inode, locked_page, start, end, &page_started,
- &nr_written, 0);
+ &nr_written, 0, NULL);
/* Inline extent inserted, page gets unlocked and everything is done */
if (page_started) {
ret = 0;
goto out;
}
if (ret < 0) {
- if (locked_page)
+ btrfs_cleanup_ordered_extents(inode, locked_page, start, end - start + 1);
+ if (locked_page) {
+ const u64 page_start = page_offset(locked_page);
+ const u64 page_end = page_start + PAGE_SIZE - 1;
+
+ btrfs_page_set_error(inode->root->fs_info, locked_page,
+ page_start, PAGE_SIZE);
+ set_page_writeback(locked_page);
+ end_page_writeback(locked_page);
+ end_extent_writepage(locked_page, ret, page_start, page_end);
unlock_page(locked_page);
+ }
goto out;
}

@@ -1097,15 +1108,39 @@ static u64 get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
* *page_started is set to one if we unlock locked_page and do everything
* required to start IO on it. It may be clean and already done with
* IO when we return.
+ *
+ * When unlock == 1, we unlock the pages in successfully allocated regions.
+ * When unlock == 0, we leave them locked for writing them out.
+ *
+ * However, we unlock all the pages except @locked_page in case of failure.
+ *
+ * In summary, the page locking state will be as follows:
+ *
+ * - page_started == 1 (return value)
+ * - All the pages are unlocked. IO is started.
+ * - Note that this can happen only on success
+ * - unlock == 1
+ * - All the pages except @locked_page are unlocked in any case
+ * - unlock == 0
+ * - On success, all the pages are locked for writing them out
+ * - On failure, all the pages except @locked_page are unlocked
+ *
+ * When a failure happens in the second or later iteration of the
+ * while-loop, the ordered extents created in previous iterations are kept
+ * intact. So, the caller must clean them up by calling
+ * btrfs_cleanup_ordered_extents(). See btrfs_run_delalloc_range() for
+ * example.
*/
static noinline int cow_file_range(struct btrfs_inode *inode,
struct page *locked_page,
u64 start, u64 end, int *page_started,
- unsigned long *nr_written, int unlock)
+ unsigned long *nr_written, int unlock,
+ u64 *done_offset)
{
struct btrfs_root *root = inode->root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 alloc_hint = 0;
+ u64 orig_start = start;
u64 num_bytes;
unsigned long ram_size;
u64 cur_alloc_size = 0;
@@ -1293,18 +1328,62 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
btrfs_dec_block_group_reservations(fs_info, ins.objectid);
btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
out_unlock:
+ /*
+ * If done_offset is non-NULL and ret == -EAGAIN, we expect the
+ * caller to write out the successfully allocated region and retry.
+ */
+ if (done_offset && ret == -EAGAIN) {
+ if (orig_start < start)
+ *done_offset = start - 1;
+ else
+ *done_offset = start;
+ return ret;
+ } else if (ret == -EAGAIN) {
+ /* Convert to -ENOSPC since the caller cannot retry. */
+ ret = -ENOSPC;
+ }
+
+ /*
+ * Now, we have three regions to clean up:
+ *
+ * |-------(1)----|---(2)---|-------------(3)----------|
+ * `- orig_start `- start `- start + cur_alloc_size `- end
+ *
+ * We process each region below.
+ */
+
clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV;
page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK;
+
/*
- * If we reserved an extent for our delalloc range (or a subrange) and
- * failed to create the respective ordered extent, then it means that
- * when we reserved the extent we decremented the extent's size from
- * the data space_info's bytes_may_use counter and incremented the
- * space_info's bytes_reserved counter by the same amount. We must make
- * sure extent_clear_unlock_delalloc() does not try to decrement again
- * the data space_info's bytes_may_use counter, therefore we do not pass
- * it the flag EXTENT_CLEAR_DATA_RESV.
+ * For the range (1). We have already instantiated the ordered extents
+ * for this region. They are cleaned up by
+ * btrfs_cleanup_ordered_extents() in, e.g.,
+ * btrfs_run_delalloc_range(). EXTENT_LOCKED | EXTENT_DELALLOC are
+ * already cleared in the above loop. And, EXTENT_DELALLOC_NEW |
+ * EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV are handled by the cleanup
+ * function.
+ *
+ * However, in case of unlock == 0, we still need to unlock the pages
+ * (except @locked_page) to ensure all the pages are unlocked.
+ */
+ if (!unlock && orig_start < start) {
+ if (!locked_page)
+ mapping_set_error(inode->vfs_inode.i_mapping, ret);
+ extent_clear_unlock_delalloc(inode, orig_start, start - 1,
+ locked_page, 0, page_ops);
+ }
+
+ /*
+ * For the range (2). If we reserved an extent for our delalloc range
+ * (or a subrange) and failed to create the respective ordered extent,
+ * then it means that when we reserved the extent we decremented the
+ * extent's size from the data space_info's bytes_may_use counter and
+ * incremented the space_info's bytes_reserved counter by the same
+ * amount. We must make sure extent_clear_unlock_delalloc() does not try
+ * to decrement again the data space_info's bytes_may_use counter,
+ * therefore we do not pass it the flag EXTENT_CLEAR_DATA_RESV.
*/
if (extent_reserved) {
extent_clear_unlock_delalloc(inode, start,
@@ -1316,6 +1395,13 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
if (start >= end)
goto out;
}
+
+ /*
+ * For the range (3). We never touched the region. In addition to the
+ * clear_bits above, we add EXTENT_CLEAR_DATA_RESV to release the data
+ * space_info's bytes_may_use counter, reserved in
+ * btrfs_check_data_free_space().
+ */
extent_clear_unlock_delalloc(inode, start, end, locked_page,
clear_bits | EXTENT_CLEAR_DATA_RESV,
page_ops);
@@ -1502,19 +1588,42 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
u64 end, int *page_started,
unsigned long *nr_written)
{
+ u64 done_offset = end;
int ret;
+ bool locked_page_done = false;

- ret = cow_file_range(inode, locked_page, start, end, page_started,
- nr_written, 0);
- if (ret)
- return ret;
+ while (start <= end) {
+ ret = cow_file_range(inode, locked_page, start, end, page_started,
+ nr_written, 0, &done_offset);
+ if (ret && ret != -EAGAIN)
+ return ret;

- if (*page_started)
- return 0;
+ if (*page_started) {
+ ASSERT(ret == 0);
+ return 0;
+ }
+
+ if (ret == 0)
+ done_offset = end;
+
+ if (done_offset == start) {
+ struct btrfs_fs_info *info = inode->root->fs_info;
+
+ wait_var_event(&info->zone_finish_wait,
+ !test_bit(BTRFS_FS_NEED_ZONE_FINISH, &info->flags));
+ continue;
+ }
+
+ if (!locked_page_done) {
+ __set_page_dirty_nobuffers(locked_page);
+ account_page_redirty(locked_page);
+ }
+ locked_page_done = true;
+ extent_write_locked_range(&inode->vfs_inode, start, done_offset);
+
+ start = done_offset + 1;
+ }

- __set_page_dirty_nobuffers(locked_page);
- account_page_redirty(locked_page);
- extent_write_locked_range(&inode->vfs_inode, start, end);
*page_started = 1;

return 0;
@@ -1606,7 +1715,7 @@ static int fallback_to_cow(struct btrfs_inode *inode, struct page *locked_page,
}

return cow_file_range(inode, locked_page, start, end, page_started,
- nr_written, 1);
+ nr_written, 1, NULL);
}

/*
@@ -2017,7 +2126,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
page_started, nr_written);
else
ret = cow_file_range(inode, locked_page, start, end,
- page_started, nr_written, 1);
+ page_started, nr_written, 1, NULL);
} else {
set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
ret = cow_file_range_async(inode, wbc, locked_page, start, end,
@@ -2033,6 +2142,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
void btrfs_split_delalloc_extent(struct inode *inode,
struct extent_state *orig, u64 split)
{
+ struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
u64 size;

/* not delalloc, ignore it */
@@ -2040,7 +2150,7 @@ void btrfs_split_delalloc_extent(struct inode *inode,
return;

size = orig->end - orig->start + 1;
- if (size > BTRFS_MAX_EXTENT_SIZE) {
+ if (size > fs_info->max_extent_size) {
u32 num_extents;
u64 new_size;

@@ -2049,10 +2159,10 @@ void btrfs_split_delalloc_extent(struct inode *inode,
* applies here, just in reverse.
*/
new_size = orig->end - split + 1;
- num_extents = count_max_extents(new_size);
+ num_extents = count_max_extents(fs_info, new_size);
new_size = split - orig->start;
- num_extents += count_max_extents(new_size);
- if (count_max_extents(size) >= num_extents)
+ num_extents += count_max_extents(fs_info, new_size);
+ if (count_max_extents(fs_info, size) >= num_extents)
return;
}

@@ -2069,6 +2179,7 @@ void btrfs_split_delalloc_extent(struct inode *inode,
void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
struct extent_state *other)
{
+ struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
u64 new_size, old_size;
u32 num_extents;

@@ -2082,7 +2193,7 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
new_size = other->end - new->start + 1;

/* we're not bigger than the max, unreserve the space and go */
- if (new_size <= BTRFS_MAX_EXTENT_SIZE) {
+ if (new_size <= fs_info->max_extent_size) {
spin_lock(&BTRFS_I(inode)->lock);
btrfs_mod_outstanding_extents(BTRFS_I(inode), -1);
spin_unlock(&BTRFS_I(inode)->lock);
@@ -2108,10 +2219,10 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
* this case.
*/
old_size = other->end - other->start + 1;
- num_extents = count_max_extents(old_size);
+ num_extents = count_max_extents(fs_info, old_size);
old_size = new->end - new->start + 1;
- num_extents += count_max_extents(old_size);
- if (count_max_extents(new_size) >= num_extents)
+ num_extents += count_max_extents(fs_info, old_size);
+ if (count_max_extents(fs_info, new_size) >= num_extents)
return;

spin_lock(&BTRFS_I(inode)->lock);
@@ -2190,7 +2301,7 @@ void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state,
if (!(state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
struct btrfs_root *root = BTRFS_I(inode)->root;
u64 len = state->end + 1 - state->start;
- u32 num_extents = count_max_extents(len);
+ u32 num_extents = count_max_extents(fs_info, len);
bool do_list = !btrfs_is_free_space_inode(BTRFS_I(inode));

spin_lock(&BTRFS_I(inode)->lock);
@@ -2232,7 +2343,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode,
struct btrfs_inode *inode = BTRFS_I(vfs_inode);
struct btrfs_fs_info *fs_info = btrfs_sb(vfs_inode->i_sb);
u64 len = state->end + 1 - state->start;
- u32 num_extents = count_max_extents(len);
+ u32 num_extents = count_max_extents(fs_info, len);

if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG)) {
spin_lock(&inode->lock);
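
The inode.c changes above turn run_delalloc_zoned() into a loop: allocate as much of the range as the active zones allow, submit that part, and either retry the remainder or sleep on zone_finish_wait when no progress was made at all. The following is a compressed control-flow sketch of that loop in plain C; the three helpers are stand-ins for cow_file_range(), extent_write_locked_range() and the waitqueue, not real kernel APIs.

#include <errno.h>
#include <stdint.h>

/* Stub helpers; the real work happens in the btrfs functions named above. */
static int alloc_range(uint64_t start, uint64_t end, uint64_t *done_offset)
{
	*done_offset = end;		/* pretend the whole range got space */
	return 0;
}
static void write_out_range(uint64_t start, uint64_t end) { (void)start; (void)end; }
static void wait_for_zone_finish(void) { }

static int run_delalloc_zoned_sketch(uint64_t start, uint64_t end)
{
	uint64_t done = end;

	while (start <= end) {
		int ret = alloc_range(start, end, &done);

		if (ret && ret != -EAGAIN)
			return ret;	/* hard failure, bail out */
		if (ret == 0)
			done = end;	/* the whole remaining range got space */

		if (done == start) {
			/* No progress at all: wait until some zone is finished. */
			wait_for_zone_finish();
			continue;
		}

		/* Submit the part that did get space, then retry the rest. */
		write_out_range(start, done);
		start = done + 1;
	}
	return 0;
}
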
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index b87931a458eb..104cbc901c0e 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -9,6 +9,7 @@
#include "ordered-data.h"
#include "transaction.h"
#include "block-group.h"
+#include "zoned.h"

/*
* HOW DOES SPACE RESERVATION WORK
@@ -181,6 +182,43 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
found->full = 0;
}

+/*
+ * Block groups with more than this value (percents) of unusable space will be
+ * scheduled for background reclaim.
+ */
+#define BTRFS_DEFAULT_ZONED_RECLAIM_THRESH (75)
+
+/*
+ * Calculate chunk size depending on volume type (regular or zoned).
+ */
+static u64 calc_chunk_size(const struct btrfs_fs_info *fs_info, u64 flags)
+{
+ if (btrfs_is_zoned(fs_info))
+ return fs_info->zone_size;
+
+ ASSERT(flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
+
+ if (flags & BTRFS_BLOCK_GROUP_DATA)
+ return SZ_1G;
+ else if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ return SZ_32M;
+
+ /* Handle BTRFS_BLOCK_GROUP_METADATA */
+ if (fs_info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
+ return SZ_1G;
+
+ return SZ_256M;
+}
+
+/*
+ * Update default chunk size.
+ */
+void btrfs_update_space_info_chunk_size(struct btrfs_space_info *space_info,
+ u64 chunk_size)
+{
+ WRITE_ONCE(space_info->chunk_size, chunk_size);
+}
+
static int create_space_info(struct btrfs_fs_info *info, u64 flags)
{

@@ -202,6 +240,10 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
INIT_LIST_HEAD(&space_info->tickets);
INIT_LIST_HEAD(&space_info->priority_tickets);
space_info->clamp = 1;
+ btrfs_update_space_info_chunk_size(space_info, calc_chunk_size(info, flags));
+
+ if (btrfs_is_zoned(info))
+ space_info->bg_reclaim_threshold = BTRFS_DEFAULT_ZONED_RECLAIM_THRESH;

ret = btrfs_sysfs_add_space_info_type(info, space_info);
if (ret)
@@ -254,7 +296,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
u64 total_bytes, u64 bytes_used,
u64 bytes_readonly, u64 bytes_zone_unusable,
- struct btrfs_space_info **space_info)
+ bool active, struct btrfs_space_info **space_info)
{
struct btrfs_space_info *found;
int factor;
@@ -265,6 +307,8 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
ASSERT(found);
spin_lock(&found->lock);
found->total_bytes += total_bytes;
+ if (active)
+ found->active_total_bytes += total_bytes;
found->disk_total += total_bytes * factor;
found->bytes_used += bytes_used;
found->disk_used += bytes_used * factor;
@@ -328,6 +372,22 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
return avail;
}

+static inline u64 writable_total_bytes(struct btrfs_fs_info *fs_info,
+ struct btrfs_space_info *space_info)
+{
+ /*
+ * On a regular filesystem, all total_bytes are always writable. On a zoned
+ * filesystem, there may be a limitation imposed by max_active_zones.
+ * For metadata allocation, we cannot finish an existing active block
+ * group to avoid a deadlock. Thus, we need to consider only the active
+ * groups to be writable for metadata space.
+ */
+ if (!btrfs_is_zoned(fs_info) || (space_info->flags & BTRFS_BLOCK_GROUP_DATA))
+ return space_info->total_bytes;
+
+ return space_info->active_total_bytes;
+}
+
int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *space_info, u64 bytes,
enum btrfs_reserve_flush_enum flush)
@@ -340,9 +400,12 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
return 0;

used = btrfs_space_info_used(space_info, true);
- avail = calc_available_free_space(fs_info, space_info, flush);
+ if (btrfs_is_zoned(fs_info) && (space_info->flags & BTRFS_BLOCK_GROUP_METADATA))
+ avail = 0;
+ else
+ avail = calc_available_free_space(fs_info, space_info, flush);

- if (used + bytes < space_info->total_bytes + avail)
+ if (used + bytes < writable_total_bytes(fs_info, space_info) + avail)
return 1;
return 0;
}
@@ -378,7 +441,7 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info,
ticket = list_first_entry(head, struct reserve_ticket, list);

/* Check and see if our ticket can be satisfied now. */
- if ((used + ticket->bytes <= space_info->total_bytes) ||
+ if ((used + ticket->bytes <= writable_total_bytes(fs_info, space_info)) ||
btrfs_can_overcommit(fs_info, space_info, ticket->bytes,
flush)) {
btrfs_space_info_update_bytes_may_use(fs_info,
@@ -662,6 +725,18 @@ static void flush_space(struct btrfs_fs_info *fs_info,
break;
case ALLOC_CHUNK:
case ALLOC_CHUNK_FORCE:
+ /*
+ * For metadata space on a zoned filesystem, reaching here means we
+ * don't have enough space left in active_total_bytes. Try to
+ * activate a block group first, because we may already have an
+ * inactive block group allocated.
+ */
+ ret = btrfs_zoned_activate_one_bg(fs_info, space_info, false);
+ if (ret < 0)
+ break;
+ else if (ret == 1)
+ break;
+
trans = btrfs_join_transaction(root);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
@@ -672,6 +747,23 @@ static void flush_space(struct btrfs_fs_info *fs_info,
(state == ALLOC_CHUNK) ? CHUNK_ALLOC_NO_FORCE :
CHUNK_ALLOC_FORCE);
btrfs_end_transaction(trans);
+
+ /*
+ * For metadata space on a zoned filesystem, allocating a new chunk
+ * is not enough. We still need to activate the block group.
+ * Activate the newly allocated block group by (maybe) finishing
+ * a block group.
+ */
+ if (ret == 1) {
+ ret = btrfs_zoned_activate_one_bg(fs_info, space_info, true);
+ /*
+ * Revert to the original ret regardless of whether we could
+ * finish one block group or not.
+ */
+ if (ret >= 0)
+ ret = 1;
+ }
+
if (ret > 0 || ret == -ENOSPC)
ret = 0;
break;
@@ -709,6 +801,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
{
u64 used;
u64 avail;
+ u64 total;
u64 to_reclaim = space_info->reclaim_size;

lockdep_assert_held(&space_info->lock);
@@ -723,8 +816,9 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
* space. If that's the case add in our overage so we make sure to put
* appropriate pressure on the flushing state machine.
*/
- if (space_info->total_bytes + avail < used)
- to_reclaim += used - (space_info->total_bytes + avail);
+ total = writable_total_bytes(fs_info, space_info);
+ if (total + avail < used)
+ to_reclaim += used - (total + avail);

return to_reclaim;
}
@@ -734,9 +828,12 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
{
u64 global_rsv_size = fs_info->global_block_rsv.reserved;
u64 ordered, delalloc;
- u64 thresh = div_factor_fine(space_info->total_bytes, 90);
+ u64 total = writable_total_bytes(fs_info, space_info);
+ u64 thresh;
u64 used;

+ thresh = div_factor_fine(total, 90);
+
lockdep_assert_held(&space_info->lock);

/* If we're just plain full then async reclaim just slows us down. */
@@ -798,8 +895,8 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
BTRFS_RESERVE_FLUSH_ALL);
used = space_info->bytes_used + space_info->bytes_reserved +
space_info->bytes_readonly + global_rsv_size;
- if (used < space_info->total_bytes)
- thresh += space_info->total_bytes - used;
+ if (used < total)
+ thresh += total - used;
thresh >>= space_info->clamp;

used = space_info->bytes_pinned;
@@ -1516,7 +1613,7 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
* can_overcommit() to ensure we can overcommit to continue.
*/
if (!pending_tickets &&
- ((used + orig_bytes <= space_info->total_bytes) ||
+ ((used + orig_bytes <= writable_total_bytes(fs_info, space_info)) ||
btrfs_can_overcommit(fs_info, space_info, orig_bytes, flush))) {
btrfs_space_info_update_bytes_may_use(fs_info, space_info,
orig_bytes);
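
The common thread in the space-info.c hunks above is that every "does this reservation fit" comparison now goes through writable_total_bytes(), so metadata on a zoned filesystem is only reserved against bytes in already-active block groups. Below is a simplified model of that check, ignoring the overcommit allowance for brevity; the struct and helpers are illustrative, not the kernel types.

#include <stdbool.h>
#include <stdint.h>

struct space_info_model {
	bool zoned;
	bool is_data;
	uint64_t total_bytes;
	uint64_t active_total_bytes;	/* only tracked on zoned filesystems */
};

static uint64_t writable_total_bytes_model(const struct space_info_model *s)
{
	/* Data can still go to inactive zones (activated on demand); metadata
	 * must only count block groups that are already active. */
	if (!s->zoned || s->is_data)
		return s->total_bytes;
	return s->active_total_bytes;
}

/* The reservation test: used + requested must fit within the writable total
 * (the real code additionally allows overcommit where appropriate). */
static bool reservation_fits(const struct space_info_model *s,
			     uint64_t used, uint64_t requested)
{
	return used + requested <= writable_total_bytes_model(s);
}
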
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index d841fed73492..b8cee27df213 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -17,12 +17,22 @@ struct btrfs_space_info {
u64 bytes_may_use; /* number of bytes that may be used for
delalloc/allocations */
u64 bytes_readonly; /* total bytes that are read only */
+ /* Total bytes in the space, but only accounts active block groups. */
+ u64 active_total_bytes;
u64 bytes_zone_unusable; /* total bytes that are unusable until
resetting the device zone */

u64 max_extent_size; /* This will hold the maximum extent size of
the space info if we had an ENOSPC in the
allocator. */
+ /* Chunk size in bytes */
+ u64 chunk_size;
+
+ /*
+ * Once a block group drops below this threshold (percents) we'll
+ * schedule it for reclaim.
+ */
+ int bg_reclaim_threshold;

int clamp; /* Used to scale our threshold for preemptive
flushing. The value is >> clamp, so turns
@@ -114,7 +124,9 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags,
u64 total_bytes, u64 bytes_used,
u64 bytes_readonly, u64 bytes_zone_unusable,
- struct btrfs_space_info **space_info);
+ bool active, struct btrfs_space_info **space_info);
+void btrfs_update_space_info_chunk_size(struct btrfs_space_info *space_info,
+ u64 chunk_size);
struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info,
u64 flags);
u64 __pure btrfs_space_info_used(struct btrfs_space_info *s_info,
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index ba78ca5aabbb..43845cae0c74 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -722,6 +722,42 @@ SPACE_INFO_ATTR(bytes_zone_unusable);
SPACE_INFO_ATTR(disk_used);
SPACE_INFO_ATTR(disk_total);

+static ssize_t btrfs_sinfo_bg_reclaim_threshold_show(struct kobject *kobj,
+ struct kobj_attribute *a,
+ char *buf)
+{
+ struct btrfs_space_info *space_info = to_space_info(kobj);
+ ssize_t ret;
+
+ ret = sysfs_emit(buf, "%d\n", READ_ONCE(space_info->bg_reclaim_threshold));
+
+ return ret;
+}
+
+static ssize_t btrfs_sinfo_bg_reclaim_threshold_store(struct kobject *kobj,
+ struct kobj_attribute *a,
+ const char *buf, size_t len)
+{
+ struct btrfs_space_info *space_info = to_space_info(kobj);
+ int thresh;
+ int ret;
+
+ ret = kstrtoint(buf, 10, &thresh);
+ if (ret)
+ return ret;
+
+ if (thresh != 0 && (thresh <= 50 || thresh > 100))
+ return -EINVAL;
+
+ WRITE_ONCE(space_info->bg_reclaim_threshold, thresh);
+
+ return len;
+}
+
+BTRFS_ATTR_RW(space_info, bg_reclaim_threshold,
+ btrfs_sinfo_bg_reclaim_threshold_show,
+ btrfs_sinfo_bg_reclaim_threshold_store);
+
/*
* Allocation information about block group types.
*
@@ -738,6 +774,7 @@ static struct attribute *space_info_attrs[] = {
BTRFS_ATTR_PTR(space_info, bytes_zone_unusable),
BTRFS_ATTR_PTR(space_info, disk_used),
BTRFS_ATTR_PTR(space_info, disk_total),
+ BTRFS_ATTR_PTR(space_info, bg_reclaim_threshold),
NULL,
};
ATTRIBUTE_GROUPS(space_info);
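
For context on the new per-space-info knob above: the store handler accepts 0 (disable) or a percentage in (50, 100], and the value is later compared against the zone-unusable share of a block group. The sketch below models that consumer side in plain C; the exact comparison lives outside this hunk, so treat it as an approximation.

#include <stdbool.h>
#include <stdint.h>

/* Approximate reclaim decision driven by bg_reclaim_threshold: once the
 * reclaimable zone-unusable bytes reach the given percentage of the zone
 * capacity, the block group becomes a reclaim candidate. A threshold of 0
 * disables this; valid non-zero values are 51..100. */
static bool should_reclaim_bg(uint64_t reclaimable_unusable,
			      uint64_t zone_capacity, int thresh)
{
	if (thresh == 0)
		return false;
	return reclaimable_unusable * 100 >= zone_capacity * (uint64_t)thresh;
}
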
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index e65633686378..2cc376775ae4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -171,7 +171,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
int index = (root->log_transid + 1) % 2;

if (btrfs_need_log_full_commit(trans)) {
- ret = -EAGAIN;
+ ret = BTRFS_LOG_FORCE_COMMIT;
goto out;
}

@@ -194,7 +194,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
* writing.
*/
if (zoned && !created) {
- ret = -EAGAIN;
+ ret = BTRFS_LOG_FORCE_COMMIT;
goto out;
}

@@ -3122,7 +3122,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,

/* bail out if we need to do a full commit */
if (btrfs_need_log_full_commit(trans)) {
- ret = -EAGAIN;
+ ret = BTRFS_LOG_FORCE_COMMIT;
mutex_unlock(&root->log_mutex);
goto out;
}
@@ -3223,7 +3223,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
}
btrfs_wait_tree_log_extents(log, mark);
mutex_unlock(&log_root_tree->log_mutex);
- ret = -EAGAIN;
+ ret = BTRFS_LOG_FORCE_COMMIT;
goto out;
}

@@ -3262,7 +3262,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
blk_finish_plug(&plug);
btrfs_wait_tree_log_extents(log, mark);
mutex_unlock(&log_root_tree->log_mutex);
- ret = -EAGAIN;
+ ret = BTRFS_LOG_FORCE_COMMIT;
goto out_wake_log_root;
}

@@ -5849,7 +5849,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
inode_only == LOG_INODE_ALL &&
inode->last_unlink_trans >= trans->transid) {
btrfs_set_log_full_commit(trans);
- ret = 1;
+ ret = BTRFS_LOG_FORCE_COMMIT;
goto out_unlock;
}

@@ -6563,12 +6563,12 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
bool log_dentries = false;

if (btrfs_test_opt(fs_info, NOTREELOG)) {
- ret = 1;
+ ret = BTRFS_LOG_FORCE_COMMIT;
goto end_no_trans;
}

if (btrfs_root_refs(&root->root_item) == 0) {
- ret = 1;
+ ret = BTRFS_LOG_FORCE_COMMIT;
goto end_no_trans;
}

@@ -6666,7 +6666,7 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
end_trans:
if (ret < 0) {
btrfs_set_log_full_commit(trans);
- ret = 1;
+ ret = BTRFS_LOG_FORCE_COMMIT;
}

if (ret)
@@ -7030,8 +7030,15 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
* anyone from syncing the log until we have updated both inodes
* in the log.
*/
+ ret = join_running_log_trans(root);
+ /*
+ * At least one of the inodes was logged before, so this should
+ * not fail, but if it does, it's not serious, just bail out and
+ * mark the log for a full commit.
+ */
+ if (WARN_ON_ONCE(ret < 0))
+ goto out;
log_pinned = true;
- btrfs_pin_log_trans(root);

path = btrfs_alloc_path();
if (!path) {
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 1620f8170629..57ab5f3b8dc7 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -12,6 +12,9 @@
/* return value for btrfs_log_dentry_safe that means we don't need to log it at all */
#define BTRFS_NO_LOG_SYNC 256

+/* We can't use the tree log for whatever reason, force a transaction commit */
+#define BTRFS_LOG_FORCE_COMMIT (1)
+
struct btrfs_log_ctx {
int log_ret;
int log_transid;
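
With BTRFS_LOG_FORCE_COMMIT defined above, the tree-log paths stop overloading -EAGAIN and a bare 1: any positive return now uniformly means "the log cannot be used, fall back to a full transaction commit". The fragment below is only a schematic of how an fsync caller treats that value; the helpers are placeholders, not the real btrfs_sync_file() body.

#define BTRFS_LOG_FORCE_COMMIT	(1)

/* Placeholder entry points standing in for the logging and commit paths. */
static int log_inode_and_sync_log(void)	{ return BTRFS_LOG_FORCE_COMMIT; }
static int end_transaction(void)	{ return 0; }
static int commit_transaction(void)	{ return 0; }

static int fsync_sketch(void)
{
	int ret = log_inode_and_sync_log();

	if (ret == 0)
		return end_transaction();	/* log synced, fast path done */

	/* BTRFS_LOG_FORCE_COMMIT (or a negative error): full commit instead. */
	return commit_transaction();
}
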
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 659575526e9f..4bc97e7d8e46 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5091,26 +5091,16 @@ static void init_alloc_chunk_ctl_policy_regular(
struct btrfs_fs_devices *fs_devices,
struct alloc_chunk_ctl *ctl)
{
- u64 type = ctl->type;
+ struct btrfs_space_info *space_info;

- if (type & BTRFS_BLOCK_GROUP_DATA) {
- ctl->max_stripe_size = SZ_1G;
- ctl->max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
- } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
- /* For larger filesystems, use larger metadata chunks */
- if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
- ctl->max_stripe_size = SZ_1G;
- else
- ctl->max_stripe_size = SZ_256M;
- ctl->max_chunk_size = ctl->max_stripe_size;
- } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
- ctl->max_stripe_size = SZ_32M;
- ctl->max_chunk_size = 2 * ctl->max_stripe_size;
- ctl->devs_max = min_t(int, ctl->devs_max,
- BTRFS_MAX_DEVS_SYS_CHUNK);
- } else {
- BUG();
- }
+ space_info = btrfs_find_space_info(fs_devices->fs_info, ctl->type);
+ ASSERT(space_info);
+
+ ctl->max_chunk_size = READ_ONCE(space_info->chunk_size);
+ ctl->max_stripe_size = ctl->max_chunk_size;
+
+ if (ctl->type & BTRFS_BLOCK_GROUP_SYSTEM)
+ ctl->devs_max = min_t(int, ctl->devs_max, BTRFS_MAX_DEVS_SYS_CHUNK);

/* We don't want a chunk larger than 10% of writable space */
ctl->max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 84b6d39509bd..45e29b8c705c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -407,6 +407,16 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
nr_sectors = bdev_nr_sectors(bdev);
zone_info->zone_size_shift = ilog2(zone_info->zone_size);
zone_info->nr_zones = nr_sectors >> ilog2(zone_sectors);
+ /*
+ * We limit max_zone_append_size also by max_segments *
+ * PAGE_SIZE. Technically, we can have multiple pages per segment. But,
+ * since btrfs adds the pages one by one to a bio, and btrfs cannot
+ * increase the metadata reservation even if it increases the number of
+ * extents, it is safe to stick with the limit.
+ */
+ zone_info->max_zone_append_size =
+ min_t(u64, (u64)bdev_max_zone_append_sectors(bdev) << SECTOR_SHIFT,
+ (u64)bdev_max_segments(bdev) << PAGE_SHIFT);
if (!IS_ALIGNED(nr_sectors, zone_sectors))
zone_info->nr_zones++;

@@ -632,6 +642,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
u64 zoned_devices = 0;
u64 nr_devices = 0;
u64 zone_size = 0;
+ u64 max_zone_append_size = 0;
const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED);
int ret = 0;

@@ -666,6 +677,11 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
ret = -EINVAL;
goto out;
}
+ if (!max_zone_append_size ||
+ (zone_info->max_zone_append_size &&
+ zone_info->max_zone_append_size < max_zone_append_size))
+ max_zone_append_size =
+ zone_info->max_zone_append_size;
}
nr_devices++;
}
@@ -715,7 +731,11 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
}

fs_info->zone_size = zone_size;
+ fs_info->max_zone_append_size = ALIGN_DOWN(max_zone_append_size,
+ fs_info->sectorsize);
fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED;
+ if (fs_info->max_zone_append_size < fs_info->max_extent_size)
+ fs_info->max_extent_size = fs_info->max_zone_append_size;

/*
* Check mount options here, because we might change fs_info->zoned
@@ -1821,6 +1841,7 @@ struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
bool btrfs_zone_activate(struct btrfs_block_group *block_group)
{
struct btrfs_fs_info *fs_info = block_group->fs_info;
+ struct btrfs_space_info *space_info = block_group->space_info;
struct map_lookup *map;
struct btrfs_device *device;
u64 physical;
@@ -1832,6 +1853,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)

map = block_group->physical_map;

+ spin_lock(&space_info->lock);
spin_lock(&block_group->lock);
if (block_group->zone_is_active) {
ret = true;
@@ -1839,7 +1861,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
}

/* No space left */
- if (block_group->alloc_offset == block_group->zone_capacity) {
+ if (btrfs_zoned_bg_is_full(block_group)) {
ret = false;
goto out_unlock;
}
@@ -1860,7 +1882,10 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)

/* Successfully activated all the zones */
block_group->zone_is_active = 1;
+ space_info->active_total_bytes += block_group->length;
spin_unlock(&block_group->lock);
+ btrfs_try_granting_tickets(fs_info, space_info);
+ spin_unlock(&space_info->lock);

/* For the active block group list */
btrfs_get_block_group(block_group);
@@ -1873,6 +1898,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)

out_unlock:
spin_unlock(&block_group->lock);
+ spin_unlock(&space_info->lock);
return ret;
}

@@ -1967,6 +1993,9 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
/* For active_bg_list */
btrfs_put_block_group(block_group);

+ clear_bit(BTRFS_FS_NEED_ZONE_FINISH, &fs_info->flags);
+ wake_up_all(&fs_info->zone_finish_wait);
+
return 0;
}

@@ -1995,6 +2024,9 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
}
mutex_unlock(&fs_info->chunk_mutex);

+ if (!ret)
+ set_bit(BTRFS_FS_NEED_ZONE_FINISH, &fs_info->flags);
+
return ret;
}

@@ -2156,3 +2188,96 @@ void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logica
spin_unlock(&block_group->lock);
btrfs_put_block_group(block_group);
}
+
+int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_block_group *block_group;
+ struct btrfs_block_group *min_bg = NULL;
+ u64 min_avail = U64_MAX;
+ int ret;
+
+ spin_lock(&fs_info->zone_active_bgs_lock);
+ list_for_each_entry(block_group, &fs_info->zone_active_bgs,
+ active_bg_list) {
+ u64 avail;
+
+ spin_lock(&block_group->lock);
+ if (block_group->reserved ||
+ (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)) {
+ spin_unlock(&block_group->lock);
+ continue;
+ }
+
+ avail = block_group->zone_capacity - block_group->alloc_offset;
+ if (min_avail > avail) {
+ if (min_bg)
+ btrfs_put_block_group(min_bg);
+ min_bg = block_group;
+ min_avail = avail;
+ btrfs_get_block_group(min_bg);
+ }
+ spin_unlock(&block_group->lock);
+ }
+ spin_unlock(&fs_info->zone_active_bgs_lock);
+
+ if (!min_bg)
+ return 0;
+
+ ret = btrfs_zone_finish(min_bg);
+ btrfs_put_block_group(min_bg);
+
+ return ret < 0 ? ret : 1;
+}
+
+int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+ struct btrfs_space_info *space_info,
+ bool do_finish)
+{
+ struct btrfs_block_group *bg;
+ int index;
+
+ if (!btrfs_is_zoned(fs_info) || (space_info->flags & BTRFS_BLOCK_GROUP_DATA))
+ return 0;
+
+ /* No more block groups to activate */
+ if (space_info->active_total_bytes == space_info->total_bytes)
+ return 0;
+
+ for (;;) {
+ int ret;
+ bool need_finish = false;
+
+ down_read(&space_info->groups_sem);
+ for (index = 0; index < BTRFS_NR_RAID_TYPES; index++) {
+ list_for_each_entry(bg, &space_info->block_groups[index],
+ list) {
+ if (!spin_trylock(&bg->lock))
+ continue;
+ if (btrfs_zoned_bg_is_full(bg) || bg->zone_is_active) {
+ spin_unlock(&bg->lock);
+ continue;
+ }
+ spin_unlock(&bg->lock);
+
+ if (btrfs_zone_activate(bg)) {
+ up_read(&space_info->groups_sem);
+ return 1;
+ }
+
+ need_finish = true;
+ }
+ }
+ up_read(&space_info->groups_sem);
+
+ if (!do_finish || !need_finish)
+ break;
+
+ ret = btrfs_zone_finish_one_bg(fs_info);
+ if (ret == 0)
+ break;
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
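
To summarize the arithmetic introduced in zoned.c above: each device's zone-append limit is clamped by max_segments * PAGE_SIZE, the filesystem-wide max_zone_append_size is the minimum across devices rounded down to the sector size, and that value then caps fs_info->max_extent_size. Below is a standalone worked example with invented device numbers.

#include <stdint.h>
#include <stdio.h>

#define SECTOR_SHIFT 9

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t align_down(uint64_t x, uint64_t a) { return x - (x % a); }

int main(void)
{
	/* Invented limits for one device, just to show the math. */
	uint64_t zone_append_sectors = 1536;	/* 768 KiB hardware limit */
	uint64_t max_segments = 128;
	uint64_t page_size = 4096;
	uint64_t sectorsize = 4096;

	uint64_t dev_limit = min_u64(zone_append_sectors << SECTOR_SHIFT,
				     max_segments * page_size);

	/* Filesystem-wide: minimum over all devices, sector aligned; this also
	 * becomes the new upper bound for max_extent_size. */
	uint64_t fs_limit = align_down(dev_limit, sectorsize);

	printf("per-device limit %llu, fs-wide limit %llu\n",
	       (unsigned long long)dev_limit, (unsigned long long)fs_limit);
	return 0;
}
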
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index cf6320feef46..1cac32266276 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -10,11 +10,7 @@
#include "block-group.h"
#include "btrfs_inode.h"

-/*
- * Block groups with more than this value (percents) of unusable space will be
- * scheduled for background reclaim.
- */
-#define BTRFS_DEFAULT_RECLAIM_THRESH 75
+#define BTRFS_DEFAULT_RECLAIM_THRESH (75)

struct btrfs_zoned_device_info {
/*
@@ -23,6 +19,7 @@ struct btrfs_zoned_device_info {
*/
u64 zone_size;
u8 zone_size_shift;
+ u64 max_zone_append_size;
u32 nr_zones;
unsigned int max_active_zones;
atomic_t active_zones_left;
@@ -82,6 +79,9 @@ void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
u64 length);
+int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info);
+int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+ struct btrfs_space_info *space_info, bool do_finish);
#else /* CONFIG_BLK_DEV_ZONED */
static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
struct blk_zone *zone)
@@ -246,6 +246,20 @@ static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }

static inline void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info,
u64 logical, u64 length) { }
+
+static inline int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info)
+{
+ return 1;
+}
+
+static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
+ struct btrfs_space_info *space_info,
+ bool do_finish)
+{
+ /* Consider all block groups to be active */
+ return 0;
+}
+
#endif

static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
@@ -380,4 +394,10 @@ static inline void btrfs_zoned_data_reloc_unlock(struct btrfs_inode *inode)
mutex_unlock(&root->fs_info->zoned_data_reloc_io_lock);
}

+static inline bool btrfs_zoned_bg_is_full(const struct btrfs_block_group *bg)
+{
+ ASSERT(btrfs_is_zoned(bg->fs_info));
+ return (bg->alloc_offset == bg->zone_capacity);
+}
+
#endif
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 58dce567ceaf..6aeca5b301c8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4456,10 +4456,10 @@ static void cifs_readahead(struct readahead_control *ractl)
* TODO: Send a whole batch of pages to be read
* by the cache.
*/
- page = readahead_page(ractl);
- last_batch_size = 1 << thp_order(page);
+ struct folio *folio = readahead_folio(ractl);
+ last_batch_size = folio_nr_pages(folio);
if (cifs_readpage_from_fscache(ractl->mapping->host,
- page) < 0) {
+ &folio->page) < 0) {
/*
* TODO: Deal with cache read failure
* here, but for the moment, delegate
@@ -4467,7 +4467,7 @@ static void cifs_readahead(struct readahead_control *ractl)
*/
caching = false;
}
- unlock_page(page);
+ folio_unlock(folio);
next_cached++;
cache_nr_pages--;
if (cache_nr_pages == 0)
@@ -4807,8 +4807,6 @@ void cifs_oplock_break(struct work_struct *work)
struct TCP_Server_Info *server = tcon->ses->server;
int rc = 0;
bool purge_cache = false;
- bool is_deferred = false;
- struct cifs_deferred_close *dclose;

wait_on_bit(&cinode->flags, CIFS_INODE_PENDING_WRITERS,
TASK_UNINTERRUPTIBLE);
@@ -4844,22 +4842,6 @@ void cifs_oplock_break(struct work_struct *work)
cifs_dbg(VFS, "Push locks rc = %d\n", rc);

oplock_break_ack:
- /*
- * When oplock break is received and there are no active
- * file handles but cached, then schedule deferred close immediately.
- * So, new open will not use cached handle.
- */
- spin_lock(&CIFS_I(inode)->deferred_lock);
- is_deferred = cifs_is_deferred_close(cfile, &dclose);
- spin_unlock(&CIFS_I(inode)->deferred_lock);
- if (is_deferred &&
- cfile->deferred_close_scheduled &&
- delayed_work_pending(&cfile->deferred)) {
- if (cancel_delayed_work(&cfile->deferred)) {
- _cifsFileInfo_put(cfile, false, false);
- goto oplock_break_done;
- }
- }
/*
* releasing stale oplock after recent reconnect of smb session using
* a now incorrect file handle is not a data integrity issue but do
@@ -4871,7 +4853,7 @@ void cifs_oplock_break(struct work_struct *work)
cinode);
cifs_dbg(FYI, "Oplock release rc = %d\n", rc);
}
-oplock_break_done:
+
_cifsFileInfo_put(cfile, false /* do not wait for ourself */, false);
cifs_done_oplock_break(cinode);
}
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 0e0d1fc0f130..9bbd9a59426b 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -93,14 +93,18 @@ static int z_erofs_lz4_prepare_dstpages(struct z_erofs_lz4_decompress_ctx *ctx,

if (page) {
__clear_bit(j, bounced);
- if (kaddr) {
- if (kaddr + PAGE_SIZE == page_address(page))
+ if (!PageHighMem(page)) {
+ if (!i) {
+ kaddr = page_address(page);
+ continue;
+ }
+ if (kaddr &&
+ kaddr + PAGE_SIZE == page_address(page)) {
kaddr += PAGE_SIZE;
- else
- kaddr = NULL;
- } else if (!i) {
- kaddr = page_address(page);
+ continue;
+ }
}
+ kaddr = NULL;
continue;
}
kaddr = NULL;
diff --git a/fs/erofs/decompressor_lzma.c b/fs/erofs/decompressor_lzma.c
index 05a3063cf2bc..5e59b3f523eb 100644
--- a/fs/erofs/decompressor_lzma.c
+++ b/fs/erofs/decompressor_lzma.c
@@ -143,6 +143,7 @@ int z_erofs_load_lzma_config(struct super_block *sb,
DBG_BUGON(z_erofs_lzma_head);
z_erofs_lzma_head = head;
spin_unlock(&z_erofs_lzma_lock);
+ wake_up_all(&z_erofs_lzma_wq);

z_erofs_lzma_max_dictsize = dict_size;
mutex_unlock(&lzma_resize_mutex);
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e2daa940ebce..8b56b94e2f56 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1747,6 +1747,21 @@ static struct timespec64 *ep_timeout_to_timespec(struct timespec64 *to, long ms)
return to;
}

+/*
+ * autoremove_wake_function, but remove even on failure to wake up, because we
+ * know that default_wake_function/ttwu will only fail if the thread is already
+ * woken, and in that case the ep_poll loop will remove the entry anyway, not
+ * try to reuse it.
+ */
+static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
+ unsigned int mode, int sync, void *key)
+{
+ int ret = default_wake_function(wq_entry, mode, sync, key);
+
+ list_del_init(&wq_entry->entry);
+ return ret;
+}
+
/**
* ep_poll - Retrieves ready events, and delivers them to the caller-supplied
* event buffer.
@@ -1828,8 +1843,15 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
* normal wakeup path no need to call __remove_wait_queue()
* explicitly, thus ep->lock is not taken, which halts the
* event delivery.
+ *
+ * In fact, we now use an even more aggressive function that
+ * unconditionally removes, because we don't reuse the wait
+ * entry between loop iterations. This lets us also avoid the
+ * performance issue if a process is killed, causing all of its
+ * threads to wake up without being removed normally.
*/
init_wait(&wait);
+ wait.func = ep_autoremove_wake_function;

write_lock_irq(&ep->lock);
/*
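
The eventpoll change above boils down to one behavioral difference: autoremove_wake_function() leaves the wait entry queued when the wakeup "fails" (the task was already running), while the new ep_autoremove_wake_function() always dequeues it, which is safe because ep_poll() re-initializes the entry on each loop iteration. A toy userspace model of that difference, with invented helpers:

#include <stdbool.h>

struct wait_entry_model {
	bool on_queue;
};

/* Stand-in for default_wake_function(): the wakeup "fails" only when the
 * task was already woken. */
static bool try_wake(bool already_woken)
{
	return !already_woken;
}

/* Old behavior: dequeue only when the wakeup succeeded. */
static bool autoremove_model(struct wait_entry_model *e, bool already_woken)
{
	bool woke = try_wake(already_woken);

	if (woke)
		e->on_queue = false;
	return woke;
}

/* New ep_poll behavior: always dequeue, since the entry is never reused
 * across loop iterations anyway. */
static bool ep_autoremove_model(struct wait_entry_model *e, bool already_woken)
{
	bool woke = try_wake(already_woken);

	e->on_queue = false;
	return woke;
}
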
diff --git a/fs/exec.c b/fs/exec.c
index 5a75e92b1a0a..a9f5acf8f0ec 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1297,6 +1297,9 @@ int begin_new_exec(struct linux_binprm * bprm)
bprm->mm = NULL;

#ifdef CONFIG_POSIX_TIMERS
+ spin_lock_irq(&me->sighand->siglock);
+ posix_cpu_timers_exit(me);
+ spin_unlock_irq(&me->sighand->siglock);
exit_itimers(me);
flush_itimer_signals();
#endif
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index f6a19f6d9f6d..cdffa2a041af 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1059,9 +1059,10 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_frags_per_group);
goto failed_mount;
}
- if (sbi->s_inodes_per_group > sb->s_blocksize * 8) {
+ if (sbi->s_inodes_per_group < sbi->s_inodes_per_block ||
+ sbi->s_inodes_per_group > sb->s_blocksize * 8) {
ext2_msg(sb, KERN_ERR,
- "error: #inodes per group too big: %lu",
+ "error: invalid #inodes per group: %lu",
sbi->s_inodes_per_group);
goto failed_mount;
}
@@ -1071,6 +1072,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) - 1)
/ EXT2_BLOCKS_PER_GROUP(sb)) + 1;
+ if ((u64)sbi->s_groups_count * sbi->s_inodes_per_group !=
+ le32_to_cpu(es->s_inodes_count)) {
+ ext2_msg(sb, KERN_ERR, "error: invalid #inodes: %u vs computed %llu",
+ le32_to_cpu(es->s_inodes_count),
+ (u64)sbi->s_groups_count * sbi->s_inodes_per_group);
+ goto failed_mount;
+ }
db_count = (sbi->s_groups_count + EXT2_DESC_PER_BLOCK(sb) - 1) /
EXT2_DESC_PER_BLOCK(sb);
sbi->s_group_desc = kmalloc_array(db_count,
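
The ext2 mount-time checks above add two constraints: s_inodes_per_group must lie between s_inodes_per_block and 8 * blocksize, and the group count times the per-group inode count must reproduce the superblock's total inode count exactly. A small illustration of the second check, with example numbers:

#include <stdbool.h>
#include <stdint.h>

/* The consistency rule enforced above: groups * inodes_per_group must equal
 * the superblock's s_inodes_count, otherwise the superblock is rejected. */
static bool inode_counts_consistent(uint32_t groups_count,
				    uint32_t inodes_per_group,
				    uint32_t s_inodes_count)
{
	return (uint64_t)groups_count * inodes_per_group == s_inodes_count;
}

/* Example: 64 groups with 2048 inodes each is only valid when
 * s_inodes_count == 131072. */
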
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index e9ef5cf30969..84fcd06a8e8a 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -35,6 +35,9 @@ static int get_max_inline_xattr_value_size(struct inode *inode,
struct ext4_inode *raw_inode;
int free, min_offs;

+ if (!EXT4_INODE_HAS_XATTR_SPACE(inode))
+ return 0;
+
min_offs = EXT4_SB(inode->i_sb)->s_inode_size -
EXT4_GOOD_OLD_INODE_SIZE -
EXT4_I(inode)->i_extra_isize -
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index beed9e32571c..e94ec798dce1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -178,6 +178,8 @@ void ext4_evict_inode(struct inode *inode)

trace_ext4_evict_inode(inode);

+ if (EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)
+ ext4_evict_ea_inode(inode);
if (inode->i_nlink) {
/*
* When journalling data dirty buffers are tracked only in the
@@ -1559,7 +1561,14 @@ static void mpage_release_unused_pages(struct mpage_da_data *mpd,
ext4_lblk_t start, last;
start = index << (PAGE_SHIFT - inode->i_blkbits);
last = end << (PAGE_SHIFT - inode->i_blkbits);
+
+ /*
+ * avoid racing with extent status tree scans made by
+ * ext4_insert_delayed_block()
+ */
+ down_write(&EXT4_I(inode)->i_data_sem);
ext4_es_remove_extent(inode, start, last - start + 1);
+ up_write(&EXT4_I(inode)->i_data_sem);
}

pagevec_init(&pvec);
@@ -3130,13 +3139,15 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
{
struct inode *inode = mapping->host;
journal_t *journal;
+ sector_t ret = 0;
int err;

+ inode_lock_shared(inode);
/*
* We can get here for an inline file via the FIBMAP ioctl
*/
if (ext4_has_inline_data(inode))
- return 0;
+ goto out;

if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
test_opt(inode->i_sb, DELALLOC)) {
@@ -3175,10 +3186,14 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
jbd2_journal_unlock_updates(journal);

if (err)
- return 0;
+ goto out;
}

- return iomap_bmap(mapping, block, &ext4_iomap_ops);
+ ret = iomap_bmap(mapping, block, &ext4_iomap_ops);
+
+out:
+ inode_unlock_shared(inode);
+ return ret;
}

static int ext4_readpage(struct file *file, struct page *page)
@@ -4674,8 +4689,7 @@ static inline int ext4_iget_extra_inode(struct inode *inode,
__le32 *magic = (void *)raw_inode +
EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize;

- if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize + sizeof(__le32) <=
- EXT4_INODE_SIZE(inode->i_sb) &&
+ if (EXT4_INODE_HAS_XATTR_SPACE(inode) &&
*magic == cpu_to_le32(EXT4_XATTR_MAGIC)) {
ext4_set_inode_state(inode, EXT4_STATE_XATTR);
return ext4_find_inline_data_nolock(inode);
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 7a5353a8cfd7..f2c4d9eee475 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -417,7 +417,7 @@ int ext4_ext_migrate(struct inode *inode)
struct inode *tmp_inode = NULL;
struct migrate_struct lb;
unsigned long max_entries;
- __u32 goal;
+ __u32 goal, tmp_csum_seed;
uid_t owner[2];

/*
@@ -465,6 +465,7 @@ int ext4_ext_migrate(struct inode *inode)
* the migration.
*/
ei = EXT4_I(inode);
+ tmp_csum_seed = EXT4_I(tmp_inode)->i_csum_seed;
EXT4_I(tmp_inode)->i_csum_seed = ei->i_csum_seed;
i_size_write(tmp_inode, i_size_read(inode));
/*
@@ -575,6 +576,7 @@ int ext4_ext_migrate(struct inode *inode)
* the inode is not visible to user space.
*/
tmp_inode->i_blocks = 0;
+ EXT4_I(tmp_inode)->i_csum_seed = tmp_csum_seed;

/* Reset the extent details */
ext4_ext_tree_init(handle, tmp_inode);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 4f0420b1ff3e..13b6265848c2 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -54,6 +54,7 @@ static struct buffer_head *ext4_append(handle_t *handle,
struct inode *inode,
ext4_lblk_t *block)
{
+ struct ext4_map_blocks map;
struct buffer_head *bh;
int err;

@@ -63,6 +64,21 @@ static struct buffer_head *ext4_append(handle_t *handle,
return ERR_PTR(-ENOSPC);

*block = inode->i_size >> inode->i_sb->s_blocksize_bits;
+ map.m_lblk = *block;
+ map.m_len = 1;
+
+ /*
+ * We're appending new directory block. Make sure the block is not
+ * allocated yet, otherwise we will end up corrupting the
+ * directory.
+ */
+ err = ext4_map_blocks(NULL, inode, &map, 0);
+ if (err < 0)
+ return ERR_PTR(err);
+ if (err) {
+ EXT4_ERROR_INODE(inode, "Logical block already allocated");
+ return ERR_PTR(-EFSCORRUPTED);
+ }

bh = ext4_bread(handle, inode, *block, EXT4_GET_BLOCKS_CREATE);
if (IS_ERR(bh))
@@ -110,6 +126,13 @@ static struct buffer_head *__ext4_read_dirblock(struct inode *inode,
struct ext4_dir_entry *dirent;
int is_dx_block = 0;

+ if (block >= inode->i_size) {
+ ext4_error_inode(inode, func, line, block,
+ "Attempting to read directory block (%u) that is past i_size (%llu)",
+ block, inode->i_size);
+ return ERR_PTR(-EFSCORRUPTED);
+ }
+
if (ext4_simulate_fail(inode->i_sb, EXT4_SIM_DIRBLOCK_EIO))
bh = ERR_PTR(-EIO);
else
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 8b70a4701293..e5c2713aa11a 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1484,6 +1484,7 @@ static void ext4_update_super(struct super_block *sb,
* Update the fs overhead information
*/
ext4_calculate_overhead(sb);
+ es->s_overhead_clusters = cpu_to_le32(sbi->s_overhead);

if (test_opt(sb, DEBUG))
printk(KERN_DEBUG "EXT4-fs: added group %u:"
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 042325349098..533216e80fa2 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -436,6 +436,21 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
return err;
}

+/* Remove entry from mbcache when EA inode is getting evicted */
+void ext4_evict_ea_inode(struct inode *inode)
+{
+ struct mb_cache_entry *oe;
+
+ if (!EA_INODE_CACHE(inode))
+ return;
+ /* Wait for entry to get unused so that we can remove it */
+ while ((oe = mb_cache_entry_delete_or_get(EA_INODE_CACHE(inode),
+ ext4_xattr_inode_get_hash(inode), inode->i_ino))) {
+ mb_cache_entry_wait_unused(oe);
+ mb_cache_entry_put(EA_INODE_CACHE(inode), oe);
+ }
+}
+
static int
ext4_xattr_inode_verify_hashes(struct inode *ea_inode,
struct ext4_xattr_entry *entry, void *buffer,
@@ -976,10 +991,8 @@ int __ext4_xattr_set_credits(struct super_block *sb, struct inode *inode,
static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,
int ref_change)
{
- struct mb_cache *ea_inode_cache = EA_INODE_CACHE(ea_inode);
struct ext4_iloc iloc;
s64 ref_count;
- u32 hash;
int ret;

inode_lock(ea_inode);
@@ -1002,14 +1015,6 @@ static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,

set_nlink(ea_inode, 1);
ext4_orphan_del(handle, ea_inode);
-
- if (ea_inode_cache) {
- hash = ext4_xattr_inode_get_hash(ea_inode);
- mb_cache_entry_create(ea_inode_cache,
- GFP_NOFS, hash,
- ea_inode->i_ino,
- true /* reusable */);
- }
}
} else {
WARN_ONCE(ref_count < 0, "EA inode %lu ref_count=%lld",
@@ -1022,12 +1027,6 @@ static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,

clear_nlink(ea_inode);
ext4_orphan_add(handle, ea_inode);
-
- if (ea_inode_cache) {
- hash = ext4_xattr_inode_get_hash(ea_inode);
- mb_cache_entry_delete(ea_inode_cache, hash,
- ea_inode->i_ino);
- }
}
}

@@ -1237,6 +1236,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
if (error)
goto out;

+retry_ref:
lock_buffer(bh);
hash = le32_to_cpu(BHDR(bh)->h_hash);
ref = le32_to_cpu(BHDR(bh)->h_refcount);
@@ -1246,9 +1246,18 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
* This must happen under buffer lock for
* ext4_xattr_block_set() to reliably detect freed block
*/
- if (ea_block_cache)
- mb_cache_entry_delete(ea_block_cache, hash,
- bh->b_blocknr);
+ if (ea_block_cache) {
+ struct mb_cache_entry *oe;
+
+ oe = mb_cache_entry_delete_or_get(ea_block_cache, hash,
+ bh->b_blocknr);
+ if (oe) {
+ unlock_buffer(bh);
+ mb_cache_entry_wait_unused(oe);
+ mb_cache_entry_put(ea_block_cache, oe);
+ goto retry_ref;
+ }
+ }
get_bh(bh);
unlock_buffer(bh);

@@ -1858,6 +1867,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
#define header(x) ((struct ext4_xattr_header *)(x))

if (s->base) {
+ int offset = (char *)s->here - bs->bh->b_data;
+
BUFFER_TRACE(bs->bh, "get_write_access");
error = ext4_journal_get_write_access(handle, sb, bs->bh,
EXT4_JTR_NONE);
@@ -1873,9 +1884,20 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
* ext4_xattr_block_set() to reliably detect modified
* block
*/
- if (ea_block_cache)
- mb_cache_entry_delete(ea_block_cache, hash,
- bs->bh->b_blocknr);
+ if (ea_block_cache) {
+ struct mb_cache_entry *oe;
+
+ oe = mb_cache_entry_delete_or_get(ea_block_cache,
+ hash, bs->bh->b_blocknr);
+ if (oe) {
+ /*
+ * Xattr block is getting reused. Leave
+ * it alone.
+ */
+ mb_cache_entry_put(ea_block_cache, oe);
+ goto clone_block;
+ }
+ }
ea_bdebug(bs->bh, "modifying in-place");
error = ext4_xattr_set_entry(i, s, handle, inode,
true /* is_block */);
@@ -1890,50 +1912,47 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
if (error)
goto cleanup;
goto inserted;
- } else {
- int offset = (char *)s->here - bs->bh->b_data;
+ }
+clone_block:
+ unlock_buffer(bs->bh);
+ ea_bdebug(bs->bh, "cloning");
+ s->base = kmemdup(BHDR(bs->bh), bs->bh->b_size, GFP_NOFS);
+ error = -ENOMEM;
+ if (s->base == NULL)
+ goto cleanup;
+ s->first = ENTRY(header(s->base)+1);
+ header(s->base)->h_refcount = cpu_to_le32(1);
+ s->here = ENTRY(s->base + offset);
+ s->end = s->base + bs->bh->b_size;

- unlock_buffer(bs->bh);
- ea_bdebug(bs->bh, "cloning");
- s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
- error = -ENOMEM;
- if (s->base == NULL)
+ /*
+ * If existing entry points to an xattr inode, we need
+ * to prevent ext4_xattr_set_entry() from decrementing
+ * ref count on it because the reference belongs to the
+ * original block. In this case, make the entry look
+ * like it has an empty value.
+ */
+ if (!s->not_found && s->here->e_value_inum) {
+ ea_ino = le32_to_cpu(s->here->e_value_inum);
+ error = ext4_xattr_inode_iget(inode, ea_ino,
+ le32_to_cpu(s->here->e_hash),
+ &tmp_inode);
+ if (error)
goto cleanup;
- memcpy(s->base, BHDR(bs->bh), bs->bh->b_size);
- s->first = ENTRY(header(s->base)+1);
- header(s->base)->h_refcount = cpu_to_le32(1);
- s->here = ENTRY(s->base + offset);
- s->end = s->base + bs->bh->b_size;
-
- /*
- * If existing entry points to an xattr inode, we need
- * to prevent ext4_xattr_set_entry() from decrementing
- * ref count on it because the reference belongs to the
- * original block. In this case, make the entry look
- * like it has an empty value.
- */
- if (!s->not_found && s->here->e_value_inum) {
- ea_ino = le32_to_cpu(s->here->e_value_inum);
- error = ext4_xattr_inode_iget(inode, ea_ino,
- le32_to_cpu(s->here->e_hash),
- &tmp_inode);
- if (error)
- goto cleanup;
-
- if (!ext4_test_inode_state(tmp_inode,
- EXT4_STATE_LUSTRE_EA_INODE)) {
- /*
- * Defer quota free call for previous
- * inode until success is guaranteed.
- */
- old_ea_inode_quota = le32_to_cpu(
- s->here->e_value_size);
- }
- iput(tmp_inode);

- s->here->e_value_inum = 0;
- s->here->e_value_size = 0;
+ if (!ext4_test_inode_state(tmp_inode,
+ EXT4_STATE_LUSTRE_EA_INODE)) {
+ /*
+ * Defer quota free call for previous
+ * inode until success is guaranteed.
+ */
+ old_ea_inode_quota = le32_to_cpu(
+ s->here->e_value_size);
}
+ iput(tmp_inode);
+
+ s->here->e_value_inum = 0;
+ s->here->e_value_size = 0;
}
} else {
/* Allocate a buffer where we construct the new block. */
@@ -2000,18 +2019,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
lock_buffer(new_bh);
/*
* We have to be careful about races with
- * freeing, rehashing or adding references to
- * xattr block. Once we hold buffer lock xattr
- * block's state is stable so we can check
- * whether the block got freed / rehashed or
- * not. Since we unhash mbcache entry under
- * buffer lock when freeing / rehashing xattr
- * block, checking whether entry is still
- * hashed is reliable. Same rules hold for
- * e_reusable handling.
+				 * adding references to xattr block. Once we
+				 * hold the buffer lock, the xattr block's
+				 * state is stable, so we can check whether
+				 * the additional reference fits.
*/
- if (hlist_bl_unhashed(&ce->e_hash_list) ||
- !ce->e_reusable) {
+ ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
+ if (ref > EXT4_XATTR_REFCOUNT_MAX) {
/*
* Undo everything and check mbcache
* again.
@@ -2026,9 +2040,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
new_bh = NULL;
goto inserted;
}
- ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
BHDR(new_bh)->h_refcount = cpu_to_le32(ref);
- if (ref >= EXT4_XATTR_REFCOUNT_MAX)
+ if (ref == EXT4_XATTR_REFCOUNT_MAX)
ce->e_reusable = 0;
ea_bdebug(new_bh, "reusing; refcount now=%d",
ref);
@@ -2176,8 +2189,9 @@ int ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i,
struct ext4_inode *raw_inode;
int error;

- if (EXT4_I(inode)->i_extra_isize == 0)
+ if (!EXT4_INODE_HAS_XATTR_SPACE(inode))
return 0;
+
raw_inode = ext4_raw_inode(&is->iloc);
header = IHDR(inode, raw_inode);
is->s.base = is->s.first = IFIRST(header);
@@ -2205,8 +2219,9 @@ int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode,
struct ext4_xattr_search *s = &is->s;
int error;

- if (EXT4_I(inode)->i_extra_isize == 0)
+ if (!EXT4_INODE_HAS_XATTR_SPACE(inode))
return -ENOSPC;
+
error = ext4_xattr_set_entry(i, s, handle, inode, false /* is_block */);
if (error)
return error;
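
The xattr hunks above replace the old mb_cache_entry_delete() calls with
mb_cache_entry_delete_or_get() plus mb_cache_entry_wait_unused(): instead of
unhashing an entry that another lookup may still be reusing, the caller pins
it, waits for it to become unused, and retries, so an xattr block or EA
inode is never freed or rewritten under a concurrent user. A standalone
pthread toy of that shape (the struct and helpers below are made up for
illustration; this is not the kernel mbcache code):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct entry {
	pthread_mutex_t	lock;
	pthread_cond_t	unused;
	int		users;		/* references held by lookups */
	bool		hashed;		/* still visible in the cache */
};

static struct entry e = {
	.lock	= PTHREAD_MUTEX_INITIALIZER,
	.unused	= PTHREAD_COND_INITIALIZER,
	.hashed	= true,
};

static void put_entry(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	e->users--;
	pthread_cond_broadcast(&e->unused);	/* waiters recheck the count */
	pthread_mutex_unlock(&e->lock);
}

/* Unhash the entry if nobody uses it; otherwise pin it and report busy. */
static bool delete_or_get(struct entry *e)
{
	bool removed = false;

	pthread_mutex_lock(&e->lock);
	if (e->users == 0) {
		e->hashed = false;
		removed = true;
	} else {
		e->users++;		/* keep it alive while we wait */
	}
	pthread_mutex_unlock(&e->lock);
	return removed;
}

static void wait_unused(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	while (e->users > 1)		/* our own pin is the last reference */
		pthread_cond_wait(&e->unused, &e->lock);
	pthread_mutex_unlock(&e->lock);
}

static void *short_lived_user(void *arg)
{
	struct entry *e = arg;

	pthread_mutex_lock(&e->lock);
	e->users++;			/* what a cache lookup would do */
	pthread_mutex_unlock(&e->lock);
	usleep(100 * 1000);		/* pretend to reuse the xattr block */
	put_entry(e);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, short_lived_user, &e);
	usleep(10 * 1000);		/* let the user take its reference */

	while (!delete_or_get(&e)) {	/* the eviction / release side */
		wait_unused(&e);
		put_entry(&e);		/* drop our pin and retry */
	}
	printf("entry unhashed once the last user dropped it\n");

	pthread_join(t, NULL);
	return 0;
}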
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 77efb9a627ad..e5e36bd11f05 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -95,6 +95,19 @@ struct ext4_xattr_entry {

#define EXT4_ZERO_XATTR_VALUE ((void *)-1)

+/*
+ * If we want to add an xattr to the inode, we should make sure that
+ * i_extra_isize is not 0 and that the inode size is not less than
+ * EXT4_GOOD_OLD_INODE_SIZE + extra_isize + pad.
+ * EXT4_GOOD_OLD_INODE_SIZE extra_isize header entry pad data
+ * |--------------------------|------------|------|---------|---|-------|
+ */
+#define EXT4_INODE_HAS_XATTR_SPACE(inode) \
+ ((EXT4_I(inode)->i_extra_isize != 0) && \
+ (EXT4_GOOD_OLD_INODE_SIZE + EXT4_I(inode)->i_extra_isize + \
+ sizeof(struct ext4_xattr_ibody_header) + EXT4_XATTR_PAD <= \
+ EXT4_INODE_SIZE((inode)->i_sb)))
+
struct ext4_xattr_info {
const char *name;
const void *value;
@@ -178,6 +191,7 @@ extern void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *array);

extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
struct ext4_inode *raw_inode, handle_t *handle);
+extern void ext4_evict_ea_inode(struct inode *inode);

extern const struct xattr_handler *ext4_xattr_handlers[];

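The EXT4_INODE_HAS_XATTR_SPACE() macro above is pure arithmetic on the
on-disk inode layout sketched in its comment. A small standalone sketch of
the same check, assuming the usual constants (128-byte "good old" inode,
4-byte ibody header magic, 4-byte xattr padding); purely illustrative:

#include <stdbool.h>
#include <stdio.h>

#define GOOD_OLD_INODE_SIZE	128
#define IBODY_HEADER_SIZE	4	/* struct ext4_xattr_ibody_header */
#define XATTR_PAD		4

static bool inode_has_xattr_space(unsigned inode_size, unsigned extra_isize)
{
	if (extra_isize == 0)
		return false;
	return GOOD_OLD_INODE_SIZE + extra_isize + IBODY_HEADER_SIZE +
	       XATTR_PAD <= inode_size;
}

int main(void)
{
	/* 256-byte inodes with a 32-byte extra area leave room: prints 1 */
	printf("256/32  -> %d\n", inode_has_xattr_space(256, 32));
	/* a bogus extra_isize that fills the inode leaves none: prints 0 */
	printf("256/128 -> %d\n", inode_has_xattr_space(256, 128));
	return 0;
}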
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index beceac9885c3..63ab8c9d674e 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1004,9 +1004,7 @@ static void __add_dirty_inode(struct inode *inode, enum inode_type type)
return;

set_inode_flag(inode, flag);
- if (!f2fs_is_volatile_file(inode))
- list_add_tail(&F2FS_I(inode)->dirty_list,
- &sbi->inode_list[type]);
+ list_add_tail(&F2FS_I(inode)->dirty_list, &sbi->inode_list[type]);
stat_inc_dirty_inode(sbi, type);
}

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9a1a526f2092..07522e82c7c4 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -69,8 +69,7 @@ static bool __is_cp_guaranteed(struct page *page)

if (f2fs_is_compressed_page(page))
return false;
- if ((S_ISREG(inode->i_mode) &&
- (f2fs_is_atomic_file(inode) || IS_NOQUOTA(inode))) ||
+ if ((S_ISREG(inode->i_mode) && IS_NOQUOTA(inode)) ||
page_private_gcing(page))
return true;
return false;
@@ -1436,9 +1435,12 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
*map->m_next_extent = pgofs + map->m_len;

/* for hardware encryption, but to avoid potential issue in future */
- if (flag == F2FS_GET_BLOCK_DIO)
+ if (flag == F2FS_GET_BLOCK_DIO) {
f2fs_wait_on_block_writeback_range(inode,
map->m_pblk, map->m_len);
+ invalidate_mapping_pages(META_MAPPING(sbi),
+ map->m_pblk, map->m_pblk + map->m_len - 1);
+ }

if (map->m_multidev_dio) {
block_t blk_addr = map->m_pblk;
@@ -1655,7 +1657,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
f2fs_wait_on_block_writeback_range(inode,
map->m_pblk, map->m_len);
invalidate_mapping_pages(META_MAPPING(sbi),
- map->m_pblk, map->m_pblk);
+ map->m_pblk, map->m_pblk + map->m_len - 1);

if (map->m_multidev_dio) {
block_t blk_addr = map->m_pblk;
@@ -2563,7 +2565,12 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
bool ipu_force = false;
int err = 0;

- set_new_dnode(&dn, inode, NULL, NULL, 0);
+ /* Use COW inode to make dnode_of_data for atomic write */
+ if (f2fs_is_atomic_file(inode))
+ set_new_dnode(&dn, F2FS_I(inode)->cow_inode, NULL, NULL, 0);
+ else
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+
if (need_inplace_update(fio) &&
f2fs_lookup_extent_cache(inode, page->index, &ei)) {
fio->old_blkaddr = ei.blk + page->index - ei.fofs;
@@ -2600,6 +2607,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
err = -EFSCORRUPTED;
goto out_writepage;
}
+
/*
* If current allocation needs SSR,
* it had better in-place writes for updated data.
@@ -2736,11 +2744,6 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
write:
if (f2fs_is_drop_cache(inode))
goto out;
- /* we should not write 0'th page having journal header */
- if (f2fs_is_volatile_file(inode) && (!page->index ||
- (!wbc->for_reclaim &&
- f2fs_available_free_memory(sbi, BASE_CHECK))))
- goto redirty_out;

/* Dentry/quota blocks are controlled by checkpoint */
if (S_ISDIR(inode->i_mode) || IS_NOQUOTA(inode)) {
@@ -3313,6 +3316,100 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
return err;
}

+static int __find_data_block(struct inode *inode, pgoff_t index,
+ block_t *blk_addr)
+{
+ struct dnode_of_data dn;
+ struct page *ipage;
+ struct extent_info ei = {0, };
+ int err = 0;
+
+ ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino);
+ if (IS_ERR(ipage))
+ return PTR_ERR(ipage);
+
+ set_new_dnode(&dn, inode, ipage, ipage, 0);
+
+ if (f2fs_lookup_extent_cache(inode, index, &ei)) {
+ dn.data_blkaddr = ei.blk + index - ei.fofs;
+ } else {
+ /* hole case */
+ err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE);
+ if (err) {
+ dn.data_blkaddr = NULL_ADDR;
+ err = 0;
+ }
+ }
+ *blk_addr = dn.data_blkaddr;
+ f2fs_put_dnode(&dn);
+ return err;
+}
+
+static int __reserve_data_block(struct inode *inode, pgoff_t index,
+ block_t *blk_addr, bool *node_changed)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct dnode_of_data dn;
+ struct page *ipage;
+ int err = 0;
+
+ f2fs_do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
+
+ ipage = f2fs_get_node_page(sbi, inode->i_ino);
+ if (IS_ERR(ipage)) {
+ err = PTR_ERR(ipage);
+ goto unlock_out;
+ }
+ set_new_dnode(&dn, inode, ipage, ipage, 0);
+
+ err = f2fs_get_block(&dn, index);
+
+ *blk_addr = dn.data_blkaddr;
+ *node_changed = dn.node_changed;
+ f2fs_put_dnode(&dn);
+
+unlock_out:
+ f2fs_do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
+ return err;
+}
+
+static int prepare_atomic_write_begin(struct f2fs_sb_info *sbi,
+ struct page *page, loff_t pos, unsigned int len,
+ block_t *blk_addr, bool *node_changed)
+{
+ struct inode *inode = page->mapping->host;
+ struct inode *cow_inode = F2FS_I(inode)->cow_inode;
+ pgoff_t index = page->index;
+ int err = 0;
+ block_t ori_blk_addr;
+
+ /* If pos is beyond the end of file, reserve a new block in COW inode */
+ if ((pos & PAGE_MASK) >= i_size_read(inode))
+ return __reserve_data_block(cow_inode, index, blk_addr,
+ node_changed);
+
+ /* Look for the block in COW inode first */
+ err = __find_data_block(cow_inode, index, blk_addr);
+ if (err)
+ return err;
+ else if (*blk_addr != NULL_ADDR)
+ return 0;
+
+ /* Look for the block in the original inode */
+ err = __find_data_block(inode, index, &ori_blk_addr);
+ if (err)
+ return err;
+
+ /* Finally, we should reserve a new block in COW inode for the update */
+ err = __reserve_data_block(cow_inode, index, blk_addr, node_changed);
+ if (err)
+ return err;
+
+ if (ori_blk_addr != NULL_ADDR)
+ *blk_addr = ori_blk_addr;
+ return 0;
+}
+
static int f2fs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
@@ -3321,7 +3418,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct page *page = NULL;
pgoff_t index = ((unsigned long long) pos) >> PAGE_SHIFT;
- bool need_balance = false, drop_atomic = false;
+ bool need_balance = false;
block_t blkaddr = NULL_ADDR;
int err = 0;

@@ -3332,14 +3429,6 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
goto fail;
}

- if ((f2fs_is_atomic_file(inode) &&
- !f2fs_available_free_memory(sbi, INMEM_PAGES)) ||
- is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) {
- err = -ENOMEM;
- drop_atomic = true;
- goto fail;
- }
-
/*
* We should check this at this moment to avoid deadlock on inode page
* and #0 page. The locking rule for inline_data conversion should be:
@@ -3387,7 +3476,11 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,

*pagep = page;

- err = prepare_write_begin(sbi, page, pos, len,
+ if (f2fs_is_atomic_file(inode))
+ err = prepare_atomic_write_begin(sbi, page, pos, len,
+ &blkaddr, &need_balance);
+ else
+ err = prepare_write_begin(sbi, page, pos, len,
&blkaddr, &need_balance);
if (err)
goto fail;
@@ -3443,8 +3536,6 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
fail:
f2fs_put_page(page, 1);
f2fs_write_failed(inode, pos + len);
- if (drop_atomic)
- f2fs_drop_inmem_pages_all(sbi, false);
return err;
}

@@ -3488,8 +3579,12 @@ static int f2fs_write_end(struct file *file,
set_page_dirty(page);

if (pos + copied > i_size_read(inode) &&
- !f2fs_verity_in_progress(inode))
+ !f2fs_verity_in_progress(inode)) {
f2fs_i_size_write(inode, pos + copied);
+ if (f2fs_is_atomic_file(inode))
+ f2fs_i_size_write(F2FS_I(inode)->cow_inode,
+ pos + copied);
+ }
unlock_out:
f2fs_put_page(page, 1);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
@@ -3522,9 +3617,6 @@ void f2fs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
inode->i_ino == F2FS_COMPRESS_INO(sbi))
clear_page_private_data(&folio->page);

- if (page_private_atomic(&folio->page))
- return f2fs_drop_inmem_page(inode, &folio->page);
-
folio_detach_private(folio);
}

@@ -3534,10 +3626,6 @@ int f2fs_release_page(struct page *page, gfp_t wait)
if (PageDirty(page))
return 0;

- /* This is atomic written page, keep Private */
- if (page_private_atomic(page))
- return 0;
-
if (test_opt(F2FS_P_SB(page), COMPRESS_CACHE)) {
struct inode *inode = page->mapping->host;

@@ -3563,18 +3651,6 @@ static bool f2fs_dirty_data_folio(struct address_space *mapping,
folio_mark_uptodate(folio);
BUG_ON(folio_test_swapcache(folio));

- if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) {
- if (!page_private_atomic(&folio->page)) {
- f2fs_register_inmem_page(inode, &folio->page);
- return true;
- }
- /*
- * Previously, this page has been registered, we just
- * return here.
- */
- return false;
- }
-
if (!folio_test_dirty(folio)) {
filemap_dirty_folio(mapping, folio);
f2fs_update_dirty_folio(inode, folio);
@@ -3654,42 +3730,14 @@ static sector_t f2fs_bmap(struct address_space *mapping, sector_t block)
int f2fs_migrate_page(struct address_space *mapping,
struct page *newpage, struct page *page, enum migrate_mode mode)
{
- int rc, extra_count;
- struct f2fs_inode_info *fi = F2FS_I(mapping->host);
- bool atomic_written = page_private_atomic(page);
+ int rc, extra_count = 0;

BUG_ON(PageWriteback(page));

- /* migrating an atomic written page is safe with the inmem_lock hold */
- if (atomic_written) {
- if (mode != MIGRATE_SYNC)
- return -EBUSY;
- if (!mutex_trylock(&fi->inmem_lock))
- return -EAGAIN;
- }
-
- /* one extra reference was held for atomic_write page */
- extra_count = atomic_written ? 1 : 0;
rc = migrate_page_move_mapping(mapping, newpage,
page, extra_count);
- if (rc != MIGRATEPAGE_SUCCESS) {
- if (atomic_written)
- mutex_unlock(&fi->inmem_lock);
+ if (rc != MIGRATEPAGE_SUCCESS)
return rc;
- }
-
- if (atomic_written) {
- struct inmem_pages *cur;
-
- list_for_each_entry(cur, &fi->inmem_pages, list)
- if (cur->page == page) {
- cur->page = newpage;
- break;
- }
- mutex_unlock(&fi->inmem_lock);
- put_page(page);
- get_page(newpage);
- }

/* guarantee to start from no stale private field */
set_page_private(newpage, 0);
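
prepare_atomic_write_begin() above decides where a page written under an
open atomic-write transaction comes from: appends past EOF get a fresh
block reserved in the COW inode, blocks the COW inode already owns are
reused, and anything else gets a new COW block while the old data is still
read from the original inode. A toy, single-process model of that lookup
order (block maps are plain arrays; NEW_ADDR below is just a marker for a
freshly reserved block, not the kernel constant):

#include <stdio.h>

#define NBLK		8
#define NULL_ADDR	0
#define NEW_ADDR	(-1)	/* reserved but not yet written back */

static int cow[NBLK];		/* COW inode's block map */
static int orig[NBLK];		/* original inode's block map */

static int pick_block_for_write(int index, int i_size_blocks)
{
	/* 1. appending past EOF: reserve straight in the COW inode */
	if (index >= i_size_blocks)
		return cow[index] = NEW_ADDR;

	/* 2. the COW inode already has this block: keep using it */
	if (cow[index] != NULL_ADDR)
		return cow[index];

	/* 3. reserve in the COW inode, but read old data from the original */
	cow[index] = NEW_ADDR;
	return orig[index] != NULL_ADDR ? orig[index] : NEW_ADDR;
}

int main(void)
{
	orig[0] = 100;
	orig[1] = 101;		/* two blocks existed before the write */

	printf("%d\n", pick_block_for_write(0, 2));	/* 100: read old block */
	printf("%d\n", pick_block_for_write(0, 2));	/* -1: now served by COW */
	printf("%d\n", pick_block_for_write(5, 2));	/* -1: append past EOF */
	return 0;
}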
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index fcdf253cd211..c92625ef16d0 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -91,11 +91,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
- si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES);
si->aw_cnt = sbi->atomic_files;
- si->vw_cnt = atomic_read(&sbi->vw_cnt);
si->max_aw_cnt = atomic_read(&sbi->max_aw_cnt);
- si->max_vw_cnt = atomic_read(&sbi->max_vw_cnt);
si->nr_dio_read = get_pages(sbi, F2FS_DIO_READ);
si->nr_dio_write = get_pages(sbi, F2FS_DIO_WRITE);
si->nr_wb_cp_data = get_pages(sbi, F2FS_WB_CP_DATA);
@@ -167,8 +164,6 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID];
si->io_skip_bggc = sbi->io_skip_bggc;
si->other_skip_bggc = sbi->other_skip_bggc;
- si->skipped_atomic_files[BG_GC] = sbi->skipped_atomic_files[BG_GC];
- si->skipped_atomic_files[FG_GC] = sbi->skipped_atomic_files[FG_GC];
si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
/ 2;
@@ -296,7 +291,6 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
sizeof(struct nat_entry);
si->cache_mem += NM_I(sbi)->nat_cnt[DIRTY_NAT] *
sizeof(struct nat_entry_set);
- si->cache_mem += si->inmem_pages * sizeof(struct inmem_pages);
for (i = 0; i < MAX_INO_ENTRY; i++)
si->cache_mem += sbi->im[i].ino_num * sizeof(struct ino_entry);
si->cache_mem += atomic_read(&sbi->total_ext_tree) *
@@ -491,10 +485,6 @@ static int stat_show(struct seq_file *s, void *v)
si->bg_data_blks);
seq_printf(s, " - node blocks : %d (%d)\n", si->node_blks,
si->bg_node_blks);
- seq_printf(s, "Skipped : atomic write %llu (%llu)\n",
- si->skipped_atomic_files[BG_GC] +
- si->skipped_atomic_files[FG_GC],
- si->skipped_atomic_files[BG_GC]);
seq_printf(s, "BG skip : IO: %u, Other: %u\n",
si->io_skip_bggc, si->other_skip_bggc);
seq_puts(s, "\nExtent Cache:\n");
@@ -519,10 +509,8 @@ static int stat_show(struct seq_file *s, void *v)
si->flush_list_empty,
si->nr_discarding, si->nr_discarded,
si->nr_discard_cmd, si->undiscard_blks);
- seq_printf(s, " - inmem: %4d, atomic IO: %4d (Max. %4d), "
- "volatile IO: %4d (Max. %4d)\n",
- si->inmem_pages, si->aw_cnt, si->max_aw_cnt,
- si->vw_cnt, si->max_vw_cnt);
+ seq_printf(s, " - atomic IO: %4d (Max. %4d)\n",
+ si->aw_cnt, si->max_aw_cnt);
seq_printf(s, " - compress: %4d, hit:%8d\n", si->compress_pages, si->compress_page_hit);
seq_printf(s, " - nodes: %4d in %4d\n",
si->ndirty_node, si->node_pages);
@@ -623,9 +611,7 @@ int f2fs_build_stats(struct f2fs_sb_info *sbi)
for (i = META_CP; i < META_MAX; i++)
atomic_set(&sbi->meta_count[i], 0);

- atomic_set(&sbi->vw_cnt, 0);
atomic_set(&sbi->max_aw_cnt, 0);
- atomic_set(&sbi->max_vw_cnt, 0);

raw_spin_lock_irqsave(&f2fs_stat_lock, flags);
list_add_tail(&si->stat_list, &f2fs_stat_list);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9b89f26af1f3..e2022fad1574 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -716,7 +716,6 @@ enum {

enum {
GC_FAILURE_PIN,
- GC_FAILURE_ATOMIC,
MAX_GC_FAILURE
};

@@ -738,8 +737,6 @@ enum {
FI_UPDATE_WRITE, /* inode has in-place-update data */
FI_NEED_IPU, /* used for ipu per file */
FI_ATOMIC_FILE, /* indicate atomic file */
- FI_ATOMIC_COMMIT, /* indicate the state of atomical committing */
- FI_VOLATILE_FILE, /* indicate volatile file */
FI_FIRST_BLOCK_WRITTEN, /* indicate #0 data block was written */
FI_DROP_CACHE, /* drop dirty page cache */
FI_DATA_EXIST, /* indicate data exists */
@@ -752,7 +749,6 @@ enum {
FI_EXTRA_ATTR, /* indicate file has extra attribute */
FI_PROJ_INHERIT, /* indicate file inherits projectid */
FI_PIN_FILE, /* indicate file should not be gced */
- FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */
FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
FI_COMPRESSED_FILE, /* indicate file's data can be compressed */
FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */
@@ -760,6 +756,7 @@ enum {
FI_ENABLE_COMPRESS, /* enable compression in "user" compression mode */
FI_COMPRESS_RELEASED, /* compressed blocks were released */
FI_ALIGNED_WRITE, /* enable aligned write */
+ FI_COW_FILE, /* indicate COW file */
FI_MAX, /* max flag, never be used */
};

@@ -794,11 +791,9 @@ struct f2fs_inode_info {
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
- struct list_head inmem_ilist; /* list for inmem inodes */
- struct list_head inmem_pages; /* inmemory pages managed by f2fs */
- struct task_struct *inmem_task; /* store inmemory task */
- struct mutex inmem_lock; /* lock for inmemory pages */
+ struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree; /* cached extent_tree entry */
+ struct inode *cow_inode; /* copy-on-write inode for atomic write */

/* avoid racing between foreground op and gc */
struct f2fs_rwsem i_gc_rwsem[2];
@@ -1092,7 +1087,6 @@ enum count_type {
F2FS_DIRTY_QDATA,
F2FS_DIRTY_NODES,
F2FS_DIRTY_META,
- F2FS_INMEM_PAGES,
F2FS_DIRTY_IMETA,
F2FS_WB_CP_DATA,
F2FS_WB_DATA,
@@ -1122,11 +1116,7 @@ enum page_type {
META,
NR_PAGE_TYPE,
META_FLUSH,
- INMEM, /* the below types are used by tracepoints only. */
- INMEM_DROP,
- INMEM_INVALIDATE,
- INMEM_REVOKE,
- IPU,
+ IPU, /* the below types are used by tracepoints only. */
OPU,
};

@@ -1718,7 +1708,6 @@ struct f2fs_sb_info {

/* for skip statistic */
unsigned int atomic_files; /* # of opened atomic file */
- unsigned long long skipped_atomic_files[2]; /* FG_GC and BG_GC */
unsigned long long skipped_gc_rwsem; /* FG_GC only */

/* threshold for gc trials on pinned files */
@@ -1749,9 +1738,7 @@ struct f2fs_sb_info {
atomic_t inline_dir; /* # of inline_dentry inodes */
atomic_t compr_inode; /* # of compressed inodes */
atomic64_t compr_blocks; /* # of compressed blocks */
- atomic_t vw_cnt; /* # of volatile writes */
atomic_t max_aw_cnt; /* max # of atomic writes */
- atomic_t max_vw_cnt; /* max # of volatile writes */
unsigned int io_skip_bggc; /* skip background gc for in-flight IO */
unsigned int other_skip_bggc; /* skip background gc for other reasons */
unsigned int ndirty_inode[NR_INODE_TYPE]; /* # of dirty inodes */
@@ -3202,14 +3189,9 @@ static inline bool f2fs_is_atomic_file(struct inode *inode)
return is_inode_flag_set(inode, FI_ATOMIC_FILE);
}

-static inline bool f2fs_is_commit_atomic_write(struct inode *inode)
+static inline bool f2fs_is_cow_file(struct inode *inode)
{
- return is_inode_flag_set(inode, FI_ATOMIC_COMMIT);
-}
-
-static inline bool f2fs_is_volatile_file(struct inode *inode)
-{
- return is_inode_flag_set(inode, FI_VOLATILE_FILE);
+ return is_inode_flag_set(inode, FI_COW_FILE);
}

static inline bool f2fs_is_first_block_written(struct inode *inode)
@@ -3444,6 +3426,8 @@ void f2fs_handle_failed_inode(struct inode *inode);
int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name,
bool hot, bool set);
struct dentry *f2fs_get_parent(struct dentry *child);
+int f2fs_get_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
+ struct inode **new_inode);

/*
* dir.c
@@ -3579,11 +3563,8 @@ void f2fs_destroy_node_manager_caches(void);
* segment.c
*/
bool f2fs_need_SSR(struct f2fs_sb_info *sbi);
-void f2fs_register_inmem_page(struct inode *inode, struct page *page);
-void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure);
-void f2fs_drop_inmem_pages(struct inode *inode);
-void f2fs_drop_inmem_page(struct inode *inode, struct page *page);
-int f2fs_commit_inmem_pages(struct inode *inode);
+int f2fs_commit_atomic_write(struct inode *inode);
+void f2fs_abort_atomic_write(struct inode *inode, bool clean);
void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need);
void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi, bool from_bg);
int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino);
@@ -3815,7 +3796,6 @@ struct f2fs_stat_info {
int ext_tree, zombie_tree, ext_node;
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
int ndirty_data, ndirty_qdata;
- int inmem_pages;
unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
@@ -3833,7 +3813,7 @@ struct f2fs_stat_info {
int inline_xattr, inline_inode, inline_dir, append, update, orphans;
int compr_inode;
unsigned long long compr_blocks;
- int aw_cnt, max_aw_cnt, vw_cnt, max_vw_cnt;
+ int aw_cnt, max_aw_cnt;
unsigned int valid_count, valid_node_count, valid_inode_count, discard_blks;
unsigned int bimodal, avg_vblocks;
int util_free, util_valid, util_invalid;
@@ -3845,7 +3825,6 @@ struct f2fs_stat_info {
int bg_node_segs, bg_data_segs;
int tot_blks, data_blks, node_blks;
int bg_data_blks, bg_node_blks;
- unsigned long long skipped_atomic_files[2];
int curseg[NR_CURSEG_TYPE];
int cursec[NR_CURSEG_TYPE];
int curzone[NR_CURSEG_TYPE];
@@ -3945,17 +3924,6 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi)
if (cur > max) \
atomic_set(&F2FS_I_SB(inode)->max_aw_cnt, cur); \
} while (0)
-#define stat_inc_volatile_write(inode) \
- (atomic_inc(&F2FS_I_SB(inode)->vw_cnt))
-#define stat_dec_volatile_write(inode) \
- (atomic_dec(&F2FS_I_SB(inode)->vw_cnt))
-#define stat_update_max_volatile_write(inode) \
- do { \
- int cur = atomic_read(&F2FS_I_SB(inode)->vw_cnt); \
- int max = atomic_read(&F2FS_I_SB(inode)->max_vw_cnt); \
- if (cur > max) \
- atomic_set(&F2FS_I_SB(inode)->max_vw_cnt, cur); \
- } while (0)
#define stat_inc_seg_count(sbi, type, gc_type) \
do { \
struct f2fs_stat_info *si = F2FS_STAT(sbi); \
@@ -4017,9 +3985,6 @@ void f2fs_update_sit_info(struct f2fs_sb_info *sbi);
#define stat_add_compr_blocks(inode, blocks) do { } while (0)
#define stat_sub_compr_blocks(inode, blocks) do { } while (0)
#define stat_update_max_atomic_write(inode) do { } while (0)
-#define stat_inc_volatile_write(inode) do { } while (0)
-#define stat_dec_volatile_write(inode) do { } while (0)
-#define stat_update_max_volatile_write(inode) do { } while (0)
#define stat_inc_meta_count(sbi, blkaddr) do { } while (0)
#define stat_inc_seg_type(sbi, curseg) do { } while (0)
#define stat_inc_block_count(sbi, curseg) do { } while (0)
@@ -4423,8 +4388,7 @@ static inline bool f2fs_lfs_mode(struct f2fs_sb_info *sbi)
static inline bool f2fs_may_compress(struct inode *inode)
{
if (IS_SWAPFILE(inode) || f2fs_is_pinned_file(inode) ||
- f2fs_is_atomic_file(inode) ||
- f2fs_is_volatile_file(inode))
+ f2fs_is_atomic_file(inode) || f2fs_has_inline_data(inode))
return false;
return S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode);
}
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5d1b97e852e7..d3415aca5736 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1816,16 +1816,8 @@ static int f2fs_release_file(struct inode *inode, struct file *filp)
atomic_read(&inode->i_writecount) != 1)
return 0;

- /* some remained atomic pages should discarded */
if (f2fs_is_atomic_file(inode))
- f2fs_drop_inmem_pages(inode);
- if (f2fs_is_volatile_file(inode)) {
- set_inode_flag(inode, FI_DROP_CACHE);
- filemap_fdatawrite(inode->i_mapping);
- clear_inode_flag(inode, FI_DROP_CACHE);
- clear_inode_flag(inode, FI_VOLATILE_FILE);
- stat_dec_volatile_write(inode);
- }
+ f2fs_abort_atomic_write(inode, true);
return 0;
}

@@ -1840,8 +1832,8 @@ static int f2fs_file_flush(struct file *file, fl_owner_t id)
* before dropping file lock, it needs to do in ->flush.
*/
if (f2fs_is_atomic_file(inode) &&
- F2FS_I(inode)->inmem_task == current)
- f2fs_drop_inmem_pages(inode);
+ F2FS_I(inode)->atomic_write_task == current)
+ f2fs_abort_atomic_write(inode, true);
return 0;
}

@@ -1875,10 +1867,7 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
if (masked_flags & F2FS_COMPR_FL) {
if (!f2fs_disable_compressed_file(inode))
return -EINVAL;
- }
- if (iflags & F2FS_NOCOMP_FL)
- return -EINVAL;
- if (iflags & F2FS_COMPR_FL) {
+ } else {
if (!f2fs_may_compress(inode))
return -EINVAL;
if (S_ISREG(inode->i_mode) && inode->i_size)
@@ -1887,10 +1876,6 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
set_compress_context(inode);
}
}
- if ((iflags ^ masked_flags) & F2FS_NOCOMP_FL) {
- if (masked_flags & F2FS_COMPR_FL)
- return -EINVAL;
- }

fi->i_flags = iflags | (fi->i_flags & ~mask);
f2fs_bug_on(F2FS_I_SB(inode), (fi->i_flags & F2FS_COMPR_FL) &&
@@ -2004,6 +1989,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
struct f2fs_inode_info *fi = F2FS_I(inode);
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct inode *pinode;
int ret;

if (!inode_owner_or_capable(mnt_userns, inode))
@@ -2026,11 +2012,8 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
goto out;
}

- if (f2fs_is_atomic_file(inode)) {
- if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST))
- ret = -EINVAL;
+ if (f2fs_is_atomic_file(inode))
goto out;
- }

ret = f2fs_convert_inline_inode(inode);
if (ret)
@@ -2051,19 +2034,33 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
goto out;
}

+ /* Create a COW inode for atomic write */
+ pinode = f2fs_iget(inode->i_sb, fi->i_pino);
+ if (IS_ERR(pinode)) {
+ f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+ ret = PTR_ERR(pinode);
+ goto out;
+ }
+
+ ret = f2fs_get_tmpfile(mnt_userns, pinode, &fi->cow_inode);
+ iput(pinode);
+ if (ret) {
+ f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+ goto out;
+ }
+ f2fs_i_size_write(fi->cow_inode, i_size_read(inode));
+
spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
- if (list_empty(&fi->inmem_ilist))
- list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
sbi->atomic_files++;
spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);

- /* add inode in inmem_list first and set atomic_file */
set_inode_flag(inode, FI_ATOMIC_FILE);
- clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
+ set_inode_flag(fi->cow_inode, FI_COW_FILE);
+ clear_inode_flag(fi->cow_inode, FI_INLINE_DATA);
f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);

f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
- F2FS_I(inode)->inmem_task = current;
+ F2FS_I(inode)->atomic_write_task = current;
stat_update_max_atomic_write(inode);
out:
inode_unlock(inode);
@@ -2088,99 +2085,24 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp)

inode_lock(inode);

- if (f2fs_is_volatile_file(inode)) {
- ret = -EINVAL;
- goto err_out;
- }
-
if (f2fs_is_atomic_file(inode)) {
- ret = f2fs_commit_inmem_pages(inode);
+ ret = f2fs_commit_atomic_write(inode);
if (ret)
- goto err_out;
+ goto unlock_out;

ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
if (!ret)
- f2fs_drop_inmem_pages(inode);
+ f2fs_abort_atomic_write(inode, false);
} else {
ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false);
}
-err_out:
- if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) {
- clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
- ret = -EINVAL;
- }
+unlock_out:
inode_unlock(inode);
mnt_drop_write_file(filp);
return ret;
}

-static int f2fs_ioc_start_volatile_write(struct file *filp)
-{
- struct inode *inode = file_inode(filp);
- struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
- int ret;
-
- if (!inode_owner_or_capable(mnt_userns, inode))
- return -EACCES;
-
- if (!S_ISREG(inode->i_mode))
- return -EINVAL;
-
- ret = mnt_want_write_file(filp);
- if (ret)
- return ret;
-
- inode_lock(inode);
-
- if (f2fs_is_volatile_file(inode))
- goto out;
-
- ret = f2fs_convert_inline_inode(inode);
- if (ret)
- goto out;
-
- stat_inc_volatile_write(inode);
- stat_update_max_volatile_write(inode);
-
- set_inode_flag(inode, FI_VOLATILE_FILE);
- f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
-out:
- inode_unlock(inode);
- mnt_drop_write_file(filp);
- return ret;
-}
-
-static int f2fs_ioc_release_volatile_write(struct file *filp)
-{
- struct inode *inode = file_inode(filp);
- struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
- int ret;
-
- if (!inode_owner_or_capable(mnt_userns, inode))
- return -EACCES;
-
- ret = mnt_want_write_file(filp);
- if (ret)
- return ret;
-
- inode_lock(inode);
-
- if (!f2fs_is_volatile_file(inode))
- goto out;
-
- if (!f2fs_is_first_block_written(inode)) {
- ret = truncate_partial_data_page(inode, 0, true);
- goto out;
- }
-
- ret = punch_hole(inode, 0, F2FS_BLKSIZE);
-out:
- inode_unlock(inode);
- mnt_drop_write_file(filp);
- return ret;
-}
-
-static int f2fs_ioc_abort_volatile_write(struct file *filp)
+static int f2fs_ioc_abort_atomic_write(struct file *filp)
{
struct inode *inode = file_inode(filp);
struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
@@ -2196,14 +2118,7 @@ static int f2fs_ioc_abort_volatile_write(struct file *filp)
inode_lock(inode);

if (f2fs_is_atomic_file(inode))
- f2fs_drop_inmem_pages(inode);
- if (f2fs_is_volatile_file(inode)) {
- clear_inode_flag(inode, FI_VOLATILE_FILE);
- stat_dec_volatile_write(inode);
- ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
- }
-
- clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
+ f2fs_abort_atomic_write(inode, true);

inode_unlock(inode);

@@ -4024,8 +3939,8 @@ static int f2fs_ioc_decompress_file(struct file *filp, unsigned long arg)
goto out;
}

- if (f2fs_is_mmap_file(inode)) {
- ret = -EBUSY;
+ if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
+ ret = -EINVAL;
goto out;
}

@@ -4096,8 +4011,8 @@ static int f2fs_ioc_compress_file(struct file *filp, unsigned long arg)
goto out;
}

- if (f2fs_is_mmap_file(inode)) {
- ret = -EBUSY;
+ if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) {
+ ret = -EINVAL;
goto out;
}

@@ -4149,12 +4064,11 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return f2fs_ioc_start_atomic_write(filp);
case F2FS_IOC_COMMIT_ATOMIC_WRITE:
return f2fs_ioc_commit_atomic_write(filp);
+ case F2FS_IOC_ABORT_ATOMIC_WRITE:
+ return f2fs_ioc_abort_atomic_write(filp);
case F2FS_IOC_START_VOLATILE_WRITE:
- return f2fs_ioc_start_volatile_write(filp);
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
- return f2fs_ioc_release_volatile_write(filp);
- case F2FS_IOC_ABORT_VOLATILE_WRITE:
- return f2fs_ioc_abort_volatile_write(filp);
+ return -EOPNOTSUPP;
case F2FS_IOC_SHUTDOWN:
return f2fs_ioc_shutdown(filp, arg);
case FITRIM:
@@ -4778,7 +4692,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_COMMIT_ATOMIC_WRITE:
case F2FS_IOC_START_VOLATILE_WRITE:
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
- case F2FS_IOC_ABORT_VOLATILE_WRITE:
+ case F2FS_IOC_ABORT_ATOMIC_WRITE:
case F2FS_IOC_SHUTDOWN:
case FITRIM:
case FS_IOC_SET_ENCRYPTION_POLICY:
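
With volatile writes gone, the remaining userspace interface is
F2FS_IOC_START_ATOMIC_WRITE, F2FS_IOC_COMMIT_ATOMIC_WRITE and the renamed
F2FS_IOC_ABORT_ATOMIC_WRITE. A minimal caller, assuming a uapi
<linux/f2fs.h> new enough to carry the renamed abort ioctl (older headers
may spell the same request F2FS_IOC_ABORT_VOLATILE_WRITE); illustration
only, not part of the patch:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/f2fs.h>

int main(int argc, char **argv)
{
	const char msg[] = "all-or-nothing update\n";
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDWR);
	if (fd < 0)
		return 1;

	if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
		return 1;

	/* everything written here lands in the COW inode first */
	if (write(fd, msg, strlen(msg)) < 0) {
		ioctl(fd, F2FS_IOC_ABORT_ATOMIC_WRITE);	/* throw it all away */
		return 1;
	}

	/* publish the whole update atomically */
	if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
		return 1;

	close(fd);
	return 0;
}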
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index ea5b93b689cd..ba8e93e517be 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -646,6 +646,54 @@ static void release_victim_entry(struct f2fs_sb_info *sbi)
f2fs_bug_on(sbi, !list_empty(&am->victim_list));
}

+static bool f2fs_pin_section(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
+
+ if (!dirty_i->enable_pin_section)
+ return false;
+ if (!test_and_set_bit(secno, dirty_i->pinned_secmap))
+ dirty_i->pinned_secmap_cnt++;
+ return true;
+}
+
+static bool f2fs_pinned_section_exists(struct dirty_seglist_info *dirty_i)
+{
+ return dirty_i->pinned_secmap_cnt;
+}
+
+static bool f2fs_section_is_pinned(struct dirty_seglist_info *dirty_i,
+ unsigned int secno)
+{
+ return dirty_i->enable_pin_section &&
+ f2fs_pinned_section_exists(dirty_i) &&
+ test_bit(secno, dirty_i->pinned_secmap);
+}
+
+static void f2fs_unpin_all_sections(struct f2fs_sb_info *sbi, bool enable)
+{
+ unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
+
+ if (f2fs_pinned_section_exists(DIRTY_I(sbi))) {
+ memset(DIRTY_I(sbi)->pinned_secmap, 0, bitmap_size);
+ DIRTY_I(sbi)->pinned_secmap_cnt = 0;
+ }
+ DIRTY_I(sbi)->enable_pin_section = enable;
+}
+
+static int f2fs_gc_pinned_control(struct inode *inode, int gc_type,
+ unsigned int segno)
+{
+ if (!f2fs_is_pinned_file(inode))
+ return 0;
+ if (gc_type != FG_GC)
+ return -EBUSY;
+ if (!f2fs_pin_section(F2FS_I_SB(inode), segno))
+ f2fs_pin_file_control(inode, true);
+ return -EAGAIN;
+}
+
/*
* This function is called from two paths.
* One is garbage collection and the other is SSR segment selection.
@@ -787,6 +835,9 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
goto next;

+ if (gc_type == FG_GC && f2fs_section_is_pinned(dirty_i, secno))
+ goto next;
+
if (is_atgc) {
add_victim_entry(sbi, &p, segno);
goto next;
@@ -1194,18 +1245,9 @@ static int move_data_block(struct inode *inode, block_t bidx,
goto out;
}

- if (f2fs_is_atomic_file(inode)) {
- F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
- F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
- err = -EAGAIN;
- goto out;
- }
-
- if (f2fs_is_pinned_file(inode)) {
- f2fs_pin_file_control(inode, true);
- err = -EAGAIN;
+ err = f2fs_gc_pinned_control(inode, gc_type, segno);
+ if (err)
goto out;
- }

set_new_dnode(&dn, inode, NULL, NULL, 0);
err = f2fs_get_dnode_of_data(&dn, bidx, LOOKUP_NODE);
@@ -1344,18 +1386,9 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
goto out;
}

- if (f2fs_is_atomic_file(inode)) {
- F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
- F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
- err = -EAGAIN;
- goto out;
- }
- if (f2fs_is_pinned_file(inode)) {
- if (gc_type == FG_GC)
- f2fs_pin_file_control(inode, true);
- err = -EAGAIN;
+ err = f2fs_gc_pinned_control(inode, gc_type, segno);
+ if (err)
goto out;
- }

if (gc_type == BG_GC) {
if (PageWriteback(page)) {
@@ -1475,11 +1508,19 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
ofs_in_node = le16_to_cpu(entry->ofs_in_node);

if (phase == 3) {
+ int err;
+
inode = f2fs_iget(sb, dni.ino);
if (IS_ERR(inode) || is_bad_inode(inode) ||
special_file(inode->i_mode))
continue;

+ err = f2fs_gc_pinned_control(inode, gc_type, segno);
+ if (err == -EAGAIN) {
+ iput(inode);
+ return submitted;
+ }
+
if (!f2fs_down_write_trylock(
&F2FS_I(inode)->i_gc_rwsem[WRITE])) {
iput(inode);
@@ -1711,8 +1752,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
.ilist = LIST_HEAD_INIT(gc_list.ilist),
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};
- unsigned long long last_skipped = sbi->skipped_atomic_files[FG_GC];
- unsigned long long first_skipped;
unsigned int skipped_round = 0, round = 0;

trace_f2fs_gc_begin(sbi->sb, sync, background,
@@ -1726,7 +1765,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,

cpc.reason = __get_cp_reason(sbi);
sbi->skipped_gc_rwsem = 0;
- first_skipped = last_skipped;
gc_more:
if (unlikely(!(sbi->sb->s_flags & SB_ACTIVE))) {
ret = -EINVAL;
@@ -1758,9 +1796,17 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
ret = -EINVAL;
goto stop;
}
+retry:
ret = __get_victim(sbi, &segno, gc_type);
- if (ret)
+ if (ret) {
+			/* allow picking victims from sections that have pinned data */
+ if (ret == -ENODATA && gc_type == FG_GC &&
+ f2fs_pinned_section_exists(DIRTY_I(sbi))) {
+ f2fs_unpin_all_sections(sbi, false);
+ goto retry;
+ }
goto stop;
+ }

seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type, force);
if (gc_type == FG_GC &&
@@ -1769,10 +1815,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
total_freed += seg_freed;

if (gc_type == FG_GC) {
- if (sbi->skipped_atomic_files[FG_GC] > last_skipped ||
- sbi->skipped_gc_rwsem)
+ if (sbi->skipped_gc_rwsem)
skipped_round++;
- last_skipped = sbi->skipped_atomic_files[FG_GC];
round++;
}

@@ -1782,27 +1826,31 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
if (sync)
goto stop;

- if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
- if (skipped_round <= MAX_SKIP_GC_COUNT ||
- skipped_round * 2 < round) {
- segno = NULL_SEGNO;
- goto gc_more;
- }
+ if (!has_not_enough_free_secs(sbi, sec_freed, 0))
+ goto stop;

- if (first_skipped < last_skipped &&
- (last_skipped - first_skipped) >
- sbi->skipped_gc_rwsem) {
- f2fs_drop_inmem_pages_all(sbi, true);
- segno = NULL_SEGNO;
- goto gc_more;
- }
- if (gc_type == FG_GC && !is_sbi_flag_set(sbi, SBI_CP_DISABLED))
+ if (skipped_round <= MAX_SKIP_GC_COUNT || skipped_round * 2 < round) {
+
+ /* Write checkpoint to reclaim prefree segments */
+ if (free_sections(sbi) < NR_CURSEG_PERSIST_TYPE &&
+ prefree_segments(sbi) &&
+ !is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
ret = f2fs_write_checkpoint(sbi, &cpc);
+ if (ret)
+ goto stop;
+ }
+ segno = NULL_SEGNO;
+ goto gc_more;
}
+ if (gc_type == FG_GC && !is_sbi_flag_set(sbi, SBI_CP_DISABLED))
+ ret = f2fs_write_checkpoint(sbi, &cpc);
stop:
SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
SIT_I(sbi)->last_victim[FLUSH_DEVICE] = init_segno;

+ if (gc_type == FG_GC)
+ f2fs_unpin_all_sections(sbi, true);
+
trace_f2fs_gc_end(sbi->sb, ret, total_freed, sec_freed,
get_pages(sbi, F2FS_DIRTY_NODES),
get_pages(sbi, F2FS_DIRTY_DENTS),
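
The pinned-section handling above lets foreground GC mark the whole section
that holds a pinned file and skip it, and only when no other victim can be
found (the -ENODATA path in f2fs_gc()) are the pins dropped and the search
retried. A plain-C toy of that retry shape, with an array standing in for
pinned_secmap (nothing below is kernel code):

#include <stdbool.h>
#include <stdio.h>

#define NSEC 8

static bool dirty[NSEC]  = { [1] = true, [4] = true, [6] = true };
static bool pinned[NSEC];
static unsigned pinned_cnt;

static void pin_section(unsigned s)
{
	if (!pinned[s]) {
		pinned[s] = true;
		pinned_cnt++;
	}
}

static int get_victim(void)
{
	for (unsigned s = 0; s < NSEC; s++)
		if (dirty[s] && !pinned[s])
			return s;
	return -1;			/* -ENODATA in the kernel */
}

int main(void)
{
	pin_section(1);
	pin_section(4);
	pin_section(6);			/* every dirty section holds pinned data */

	int victim = get_victim();
	if (victim < 0 && pinned_cnt) {
		/* mirror f2fs_unpin_all_sections(sbi, false) + retry */
		for (unsigned s = 0; s < NSEC; s++)
			pinned[s] = false;
		pinned_cnt = 0;
		victim = get_victim();
	}
	printf("victim section: %d\n", victim);	/* 1 after the retry */
	return 0;
}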
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index e9818723103c..938961a9084e 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -744,9 +744,8 @@ void f2fs_evict_inode(struct inode *inode)
nid_t xnid = F2FS_I(inode)->i_xattr_nid;
int err = 0;

- /* some remained atomic pages should discarded */
if (f2fs_is_atomic_file(inode))
- f2fs_drop_inmem_pages(inode);
+ f2fs_abort_atomic_write(inode, true);

trace_f2fs_evict_inode(inode);
truncate_inode_pages_final(&inode->i_data);
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 3764e12f19db..28d5981443a7 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -848,8 +848,8 @@ static int f2fs_mknod(struct user_namespace *mnt_userns, struct inode *dir,
}

static int __f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
- struct dentry *dentry, umode_t mode,
- struct inode **whiteout)
+ struct dentry *dentry, umode_t mode, bool is_whiteout,
+ struct inode **new_inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
struct inode *inode;
@@ -863,7 +863,7 @@ static int __f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
if (IS_ERR(inode))
return PTR_ERR(inode);

- if (whiteout) {
+ if (is_whiteout) {
init_special_inode(inode, inode->i_mode, WHITEOUT_DEV);
inode->i_op = &f2fs_special_inode_operations;
} else {
@@ -888,21 +888,25 @@ static int __f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
f2fs_add_orphan_inode(inode);
f2fs_alloc_nid_done(sbi, inode->i_ino);

- if (whiteout) {
+ if (is_whiteout) {
f2fs_i_links_write(inode, false);

spin_lock(&inode->i_lock);
inode->i_state |= I_LINKABLE;
spin_unlock(&inode->i_lock);
-
- *whiteout = inode;
} else {
- d_tmpfile(dentry, inode);
+ if (dentry)
+ d_tmpfile(dentry, inode);
+ else
+ f2fs_i_links_write(inode, false);
}
/* link_count was changed by d_tmpfile as well. */
f2fs_unlock_op(sbi);
unlock_new_inode(inode);

+ if (new_inode)
+ *new_inode = inode;
+
f2fs_balance_fs(sbi, true);
return 0;

@@ -923,7 +927,7 @@ static int f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
if (!f2fs_is_checkpoint_ready(sbi))
return -ENOSPC;

- return __f2fs_tmpfile(mnt_userns, dir, dentry, mode, NULL);
+ return __f2fs_tmpfile(mnt_userns, dir, dentry, mode, false, NULL);
}

static int f2fs_create_whiteout(struct user_namespace *mnt_userns,
@@ -933,7 +937,13 @@ static int f2fs_create_whiteout(struct user_namespace *mnt_userns,
return -EIO;

return __f2fs_tmpfile(mnt_userns, dir, NULL,
- S_IFCHR | WHITEOUT_MODE, whiteout);
+ S_IFCHR | WHITEOUT_MODE, true, whiteout);
+}
+
+int f2fs_get_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
+ struct inode **new_inode)
+{
+ return __f2fs_tmpfile(mnt_userns, dir, NULL, S_IFREG, false, new_inode);
}

static int f2fs_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index aedc3d334113..9dabf99eedc5 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -90,10 +90,6 @@ bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type)
atomic_read(&sbi->total_ext_node) *
sizeof(struct extent_node)) >> PAGE_SHIFT;
res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1);
- } else if (type == INMEM_PAGES) {
- /* it allows 20% / total_ram for inmemory pages */
- mem_size = get_pages(sbi, F2FS_INMEM_PAGES);
- res = mem_size < (val.totalram / 5);
} else if (type == DISCARD_CACHE) {
mem_size = (atomic_read(&dcc->discard_cmd_cnt) *
sizeof(struct discard_cmd)) >> PAGE_SHIFT;
diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index 4c1d34bfea78..3c09cae058b0 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -147,7 +147,6 @@ enum mem_type {
DIRTY_DENTS, /* indicates dirty dentry pages */
INO_ENTRIES, /* indicates inode entries */
EXTENT_CACHE, /* indicates extent cache */
- INMEM_PAGES, /* indicates inmemory pages */
DISCARD_CACHE, /* indicates memory of cached discard cmds */
COMPRESS_PAGE, /* indicates memory of cached compressed pages */
BASE_CHECK, /* check kernel status */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index aa0162664a1e..331ad46722b2 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -30,7 +30,7 @@
static struct kmem_cache *discard_entry_slab;
static struct kmem_cache *discard_cmd_slab;
static struct kmem_cache *sit_entry_set_slab;
-static struct kmem_cache *inmem_entry_slab;
+static struct kmem_cache *revoke_entry_slab;

static unsigned long __reverse_ulong(unsigned char *str)
{
@@ -185,304 +185,180 @@ bool f2fs_need_SSR(struct f2fs_sb_info *sbi)
SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
}

-void f2fs_register_inmem_page(struct inode *inode, struct page *page)
+void f2fs_abort_atomic_write(struct inode *inode, bool clean)
{
- struct inmem_pages *new;
-
- set_page_private_atomic(page);
-
- new = f2fs_kmem_cache_alloc(inmem_entry_slab,
- GFP_NOFS, true, NULL);
-
- /* add atomic page indices to the list */
- new->page = page;
- INIT_LIST_HEAD(&new->list);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_inode_info *fi = F2FS_I(inode);

- /* increase reference count with clean state */
- get_page(page);
- mutex_lock(&F2FS_I(inode)->inmem_lock);
- list_add_tail(&new->list, &F2FS_I(inode)->inmem_pages);
- inc_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
- mutex_unlock(&F2FS_I(inode)->inmem_lock);
+ if (f2fs_is_atomic_file(inode)) {
+ if (clean)
+ truncate_inode_pages_final(inode->i_mapping);
+ clear_inode_flag(fi->cow_inode, FI_COW_FILE);
+ iput(fi->cow_inode);
+ fi->cow_inode = NULL;
+ clear_inode_flag(inode, FI_ATOMIC_FILE);

- trace_f2fs_register_inmem_page(page, INMEM);
+ spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
+ sbi->atomic_files--;
+ spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
+ }
}

-static int __revoke_inmem_pages(struct inode *inode,
- struct list_head *head, bool drop, bool recover,
- bool trylock)
+static int __replace_atomic_write_block(struct inode *inode, pgoff_t index,
+ block_t new_addr, block_t *old_addr, bool recover)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct inmem_pages *cur, *tmp;
- int err = 0;
-
- list_for_each_entry_safe(cur, tmp, head, list) {
- struct page *page = cur->page;
-
- if (drop)
- trace_f2fs_commit_inmem_page(page, INMEM_DROP);
-
- if (trylock) {
- /*
- * to avoid deadlock in between page lock and
- * inmem_lock.
- */
- if (!trylock_page(page))
- continue;
- } else {
- lock_page(page);
- }
-
- f2fs_wait_on_page_writeback(page, DATA, true, true);
-
- if (recover) {
- struct dnode_of_data dn;
- struct node_info ni;
+ struct dnode_of_data dn;
+ struct node_info ni;
+ int err;

- trace_f2fs_commit_inmem_page(page, INMEM_REVOKE);
retry:
- set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = f2fs_get_dnode_of_data(&dn, page->index,
- LOOKUP_NODE);
- if (err) {
- if (err == -ENOMEM) {
- memalloc_retry_wait(GFP_NOFS);
- goto retry;
- }
- err = -EAGAIN;
- goto next;
- }
-
- err = f2fs_get_node_info(sbi, dn.nid, &ni, false);
- if (err) {
- f2fs_put_dnode(&dn);
- return err;
- }
-
- if (cur->old_addr == NEW_ADDR) {
- f2fs_invalidate_blocks(sbi, dn.data_blkaddr);
- f2fs_update_data_blkaddr(&dn, NEW_ADDR);
- } else
- f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
- cur->old_addr, ni.version, true, true);
- f2fs_put_dnode(&dn);
- }
-next:
- /* we don't need to invalidate this in the sccessful status */
- if (drop || recover) {
- ClearPageUptodate(page);
- clear_page_private_gcing(page);
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE_RA);
+ if (err) {
+ if (err == -ENOMEM) {
+ f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
+ goto retry;
}
- detach_page_private(page);
- set_page_private(page, 0);
- f2fs_put_page(page, 1);
-
- list_del(&cur->list);
- kmem_cache_free(inmem_entry_slab, cur);
- dec_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
+ return err;
}
- return err;
-}

-void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
-{
- struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
- struct inode *inode;
- struct f2fs_inode_info *fi;
- unsigned int count = sbi->atomic_files;
- unsigned int looped = 0;
-next:
- spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
- if (list_empty(head)) {
- spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
- return;
+ err = f2fs_get_node_info(sbi, dn.nid, &ni, false);
+ if (err) {
+ f2fs_put_dnode(&dn);
+ return err;
}
- fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
- inode = igrab(&fi->vfs_inode);
- if (inode)
- list_move_tail(&fi->inmem_ilist, head);
- spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);

- if (inode) {
- if (gc_failure) {
- if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
- goto skip;
+ if (recover) {
+ /* dn.data_blkaddr is always valid */
+ if (!__is_valid_data_blkaddr(new_addr)) {
+ if (new_addr == NULL_ADDR)
+ dec_valid_block_count(sbi, inode, 1);
+ f2fs_invalidate_blocks(sbi, dn.data_blkaddr);
+ f2fs_update_data_blkaddr(&dn, new_addr);
+ } else {
+ f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
+ new_addr, ni.version, true, true);
}
- set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
- f2fs_drop_inmem_pages(inode);
-skip:
- iput(inode);
- }
- f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
- if (gc_failure) {
- if (++looped >= count)
- return;
- }
- goto next;
-}
-
-void f2fs_drop_inmem_pages(struct inode *inode)
-{
- struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct f2fs_inode_info *fi = F2FS_I(inode);
+ } else {
+ blkcnt_t count = 1;

- do {
- mutex_lock(&fi->inmem_lock);
- if (list_empty(&fi->inmem_pages)) {
- fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
-
- spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
- if (!list_empty(&fi->inmem_ilist))
- list_del_init(&fi->inmem_ilist);
- if (f2fs_is_atomic_file(inode)) {
- clear_inode_flag(inode, FI_ATOMIC_FILE);
- sbi->atomic_files--;
- }
- spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
+ *old_addr = dn.data_blkaddr;
+ f2fs_truncate_data_blocks_range(&dn, 1);
+ dec_valid_block_count(sbi, F2FS_I(inode)->cow_inode, count);
+ inc_valid_block_count(sbi, inode, &count);
+ f2fs_replace_block(sbi, &dn, dn.data_blkaddr, new_addr,
+ ni.version, true, false);
+ }

- mutex_unlock(&fi->inmem_lock);
- break;
- }
- __revoke_inmem_pages(inode, &fi->inmem_pages,
- true, false, true);
- mutex_unlock(&fi->inmem_lock);
- } while (1);
+ f2fs_put_dnode(&dn);
+ return 0;
}

-void f2fs_drop_inmem_page(struct inode *inode, struct page *page)
+static void __complete_revoke_list(struct inode *inode, struct list_head *head,
+ bool revoke)
{
- struct f2fs_inode_info *fi = F2FS_I(inode);
- struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct list_head *head = &fi->inmem_pages;
- struct inmem_pages *cur = NULL;
- struct inmem_pages *tmp;
-
- f2fs_bug_on(sbi, !page_private_atomic(page));
+ struct revoke_entry *cur, *tmp;

- mutex_lock(&fi->inmem_lock);
- list_for_each_entry(tmp, head, list) {
- if (tmp->page == page) {
- cur = tmp;
- break;
- }
+ list_for_each_entry_safe(cur, tmp, head, list) {
+ if (revoke)
+ __replace_atomic_write_block(inode, cur->index,
+ cur->old_addr, NULL, true);
+ list_del(&cur->list);
+ kmem_cache_free(revoke_entry_slab, cur);
}
-
- f2fs_bug_on(sbi, !cur);
- list_del(&cur->list);
- mutex_unlock(&fi->inmem_lock);
-
- dec_page_count(sbi, F2FS_INMEM_PAGES);
- kmem_cache_free(inmem_entry_slab, cur);
-
- ClearPageUptodate(page);
- clear_page_private_atomic(page);
- f2fs_put_page(page, 0);
-
- detach_page_private(page);
- set_page_private(page, 0);
-
- trace_f2fs_commit_inmem_page(page, INMEM_INVALIDATE);
}

-static int __f2fs_commit_inmem_pages(struct inode *inode)
+static int __f2fs_commit_atomic_write(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_inode_info *fi = F2FS_I(inode);
- struct inmem_pages *cur, *tmp;
- struct f2fs_io_info fio = {
- .sbi = sbi,
- .ino = inode->i_ino,
- .type = DATA,
- .op = REQ_OP_WRITE,
- .op_flags = REQ_SYNC | REQ_PRIO,
- .io_type = FS_DATA_IO,
- };
+ struct inode *cow_inode = fi->cow_inode;
+ struct revoke_entry *new;
struct list_head revoke_list;
- bool submit_bio = false;
- int err = 0;
+ block_t blkaddr;
+ struct dnode_of_data dn;
+ pgoff_t len = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ pgoff_t off = 0, blen, index;
+ int ret = 0, i;

INIT_LIST_HEAD(&revoke_list);

- list_for_each_entry_safe(cur, tmp, &fi->inmem_pages, list) {
- struct page *page = cur->page;
+ while (len) {
+ blen = min_t(pgoff_t, ADDRS_PER_BLOCK(cow_inode), len);

- lock_page(page);
- if (page->mapping == inode->i_mapping) {
- trace_f2fs_commit_inmem_page(page, INMEM);
+ set_new_dnode(&dn, cow_inode, NULL, NULL, 0);
+ ret = f2fs_get_dnode_of_data(&dn, off, LOOKUP_NODE_RA);
+ if (ret && ret != -ENOENT) {
+ goto out;
+ } else if (ret == -ENOENT) {
+ ret = 0;
+ if (dn.max_level == 0)
+ goto out;
+ goto next;
+ }

- f2fs_wait_on_page_writeback(page, DATA, true, true);
+ blen = min((pgoff_t)ADDRS_PER_PAGE(dn.node_page, cow_inode),
+ len);
+ index = off;
+ for (i = 0; i < blen; i++, dn.ofs_in_node++, index++) {
+ blkaddr = f2fs_data_blkaddr(&dn);

- set_page_dirty(page);
- if (clear_page_dirty_for_io(page)) {
- inode_dec_dirty_pages(inode);
- f2fs_remove_dirty_inode(inode);
- }
-retry:
- fio.page = page;
- fio.old_blkaddr = NULL_ADDR;
- fio.encrypted_page = NULL;
- fio.need_lock = LOCK_DONE;
- err = f2fs_do_write_data_page(&fio);
- if (err) {
- if (err == -ENOMEM) {
- memalloc_retry_wait(GFP_NOFS);
- goto retry;
- }
- unlock_page(page);
- break;
+ if (!__is_valid_data_blkaddr(blkaddr)) {
+ continue;
+ } else if (!f2fs_is_valid_blkaddr(sbi, blkaddr,
+ DATA_GENERIC_ENHANCE)) {
+ f2fs_put_dnode(&dn);
+ ret = -EFSCORRUPTED;
+ goto out;
}
- /* record old blkaddr for revoking */
- cur->old_addr = fio.old_blkaddr;
- submit_bio = true;
- }
- unlock_page(page);
- list_move_tail(&cur->list, &revoke_list);
- }

- if (submit_bio)
- f2fs_submit_merged_write_cond(sbi, inode, NULL, 0, DATA);
+ new = f2fs_kmem_cache_alloc(revoke_entry_slab, GFP_NOFS,
+ true, NULL);
+ if (!new) {
+ f2fs_put_dnode(&dn);
+ ret = -ENOMEM;
+ goto out;
+ }

- if (err) {
- /*
- * try to revoke all committed pages, but still we could fail
- * due to no memory or other reason, if that happened, EAGAIN
- * will be returned, which means in such case, transaction is
- * already not integrity, caller should use journal to do the
- * recovery or rewrite & commit last transaction. For other
- * error number, revoking was done by filesystem itself.
- */
- err = __revoke_inmem_pages(inode, &revoke_list,
- false, true, false);
+ ret = __replace_atomic_write_block(inode, index, blkaddr,
+ &new->old_addr, false);
+ if (ret) {
+ f2fs_put_dnode(&dn);
+ kmem_cache_free(revoke_entry_slab, new);
+ goto out;
+ }

- /* drop all uncommitted pages */
- __revoke_inmem_pages(inode, &fi->inmem_pages,
- true, false, false);
- } else {
- __revoke_inmem_pages(inode, &revoke_list,
- false, false, false);
+ f2fs_update_data_blkaddr(&dn, NULL_ADDR);
+ new->index = index;
+ list_add_tail(&new->list, &revoke_list);
+ }
+ f2fs_put_dnode(&dn);
+next:
+ off += blen;
+ len -= blen;
}

- return err;
+out:
+ __complete_revoke_list(inode, &revoke_list, ret ? true : false);
+
+ return ret;
}

-int f2fs_commit_inmem_pages(struct inode *inode)
+int f2fs_commit_atomic_write(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_inode_info *fi = F2FS_I(inode);
int err;

- f2fs_balance_fs(sbi, true);
+ err = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
+ if (err)
+ return err;

f2fs_down_write(&fi->i_gc_rwsem[WRITE]);
-
f2fs_lock_op(sbi);
- set_inode_flag(inode, FI_ATOMIC_COMMIT);
-
- mutex_lock(&fi->inmem_lock);
- err = __f2fs_commit_inmem_pages(inode);
- mutex_unlock(&fi->inmem_lock);

- clear_inode_flag(inode, FI_ATOMIC_COMMIT);
+ err = __f2fs_commit_atomic_write(inode);

f2fs_unlock_op(sbi);
f2fs_up_write(&fi->i_gc_rwsem[WRITE]);
@@ -3291,8 +3167,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
return CURSEG_COLD_DATA;
if (file_is_hot(inode) ||
is_inode_flag_set(inode, FI_HOT_DATA) ||
- f2fs_is_atomic_file(inode) ||
- f2fs_is_volatile_file(inode))
+ f2fs_is_cow_file(inode))
return CURSEG_HOT_DATA;
return f2fs_rw_hint_to_seg_type(inode->i_write_hint);
} else {
@@ -4653,6 +4528,13 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi)
dirty_i->victim_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
if (!dirty_i->victim_secmap)
return -ENOMEM;
+
+ dirty_i->pinned_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
+ if (!dirty_i->pinned_secmap)
+ return -ENOMEM;
+
+ dirty_i->pinned_secmap_cnt = 0;
+ dirty_i->enable_pin_section = true;
return 0;
}

@@ -5241,6 +5123,7 @@ static void destroy_victim_secmap(struct f2fs_sb_info *sbi)
{
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);

+ kvfree(dirty_i->pinned_secmap);
kvfree(dirty_i->victim_secmap);
}

@@ -5351,9 +5234,9 @@ int __init f2fs_create_segment_manager_caches(void)
if (!sit_entry_set_slab)
goto destroy_discard_cmd;

- inmem_entry_slab = f2fs_kmem_cache_create("f2fs_inmem_page_entry",
- sizeof(struct inmem_pages));
- if (!inmem_entry_slab)
+ revoke_entry_slab = f2fs_kmem_cache_create("f2fs_revoke_entry",
+ sizeof(struct revoke_entry));
+ if (!revoke_entry_slab)
goto destroy_sit_entry_set;
return 0;

@@ -5372,5 +5255,5 @@ void f2fs_destroy_segment_manager_caches(void)
kmem_cache_destroy(sit_entry_set_slab);
kmem_cache_destroy(discard_cmd_slab);
kmem_cache_destroy(discard_entry_slab);
- kmem_cache_destroy(inmem_entry_slab);
+ kmem_cache_destroy(revoke_entry_slab);
}
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 1fa26a9603cb..3f277dfcb131 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -225,10 +225,10 @@ struct segment_allocation {

#define MAX_SKIP_GC_COUNT 16

-struct inmem_pages {
+struct revoke_entry {
struct list_head list;
- struct page *page;
block_t old_addr; /* for revoking when fail to commit */
+ pgoff_t index;
};

struct sit_info {
@@ -295,6 +295,9 @@ struct dirty_seglist_info {
struct mutex seglist_lock; /* lock for segment bitmaps */
int nr_dirty[NR_DIRTY_TYPE]; /* # of dirty segments */
unsigned long *victim_secmap; /* background GC victims */
+ unsigned long *pinned_secmap; /* pinned victims from foreground GC */
+ unsigned int pinned_secmap_cnt; /* count of victims which has pinned data */
+ bool enable_pin_section; /* enable pinning section */
};

/* victim selection function for cleaning and SSR */
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 1b59f95606c7..e3539c027a6c 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1339,9 +1339,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
spin_lock_init(&fi->i_size_lock);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
- INIT_LIST_HEAD(&fi->inmem_ilist);
- INIT_LIST_HEAD(&fi->inmem_pages);
- mutex_init(&fi->inmem_lock);
init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
init_f2fs_rwsem(&fi->i_xattr_sem);
@@ -1382,9 +1379,8 @@ static int f2fs_drop_inode(struct inode *inode)
atomic_inc(&inode->i_count);
spin_unlock(&inode->i_lock);

- /* some remained atomic pages should discarded */
if (f2fs_is_atomic_file(inode))
- f2fs_drop_inmem_pages(inode);
+ f2fs_abort_atomic_write(inode, true);

/* should remain fi->extent_tree for writepage */
f2fs_destroy_extent_node(inode);
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 3d793202cc9f..5ac7e756a1bb 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -128,7 +128,7 @@ static int f2fs_begin_enable_verity(struct file *filp)
if (f2fs_verity_in_progress(inode))
return -EBUSY;

- if (f2fs_is_atomic_file(inode) || f2fs_is_volatile_file(inode))
+ if (f2fs_is_atomic_file(inode))
return -EOPNOTSUPP;

/*
diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index 7cede9a3bc96..247ef4f76761 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -258,7 +258,7 @@ int fuse_ctl_add_conn(struct fuse_conn *fc)
struct dentry *parent;
char name[32];

- if (!fuse_control_sb)
+ if (!fuse_control_sb || fc->no_control)
return 0;

parent = fuse_control_sb->s_root;
@@ -296,7 +296,7 @@ void fuse_ctl_remove_conn(struct fuse_conn *fc)
{
int i;

- if (!fuse_control_sb)
+ if (!fuse_control_sb || fc->no_control)
return;

for (i = fc->ctl_ndents - 1; i >= 0; i--) {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 9ff27b8a9782..9dfee44e97ad 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -537,6 +537,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
struct fuse_file *ff;
void *security_ctx = NULL;
u32 security_ctxlen;
+ bool trunc = flags & O_TRUNC;

/* Userspace expects S_IFREG in create mode */
BUG_ON((mode & S_IFMT) != S_IFREG);
@@ -561,7 +562,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
inarg.mode = mode;
inarg.umask = current_umask();

- if (fm->fc->handle_killpriv_v2 && (flags & O_TRUNC) &&
+ if (fm->fc->handle_killpriv_v2 && trunc &&
!(flags & O_EXCL) && !capable(CAP_FSETID)) {
inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
}
@@ -623,6 +624,10 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
} else {
file->private_data = ff;
fuse_finish_open(inode, file);
+ if (fm->fc->atomic_o_trunc && trunc)
+ truncate_pagecache(inode, 0);
+ else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
+ invalidate_inode_pages2(inode->i_mapping);
}
return err;

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f18d14d5fea1..9b64e2ff1c96 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -210,13 +210,9 @@ void fuse_finish_open(struct inode *inode, struct file *file)
fi->attr_version = atomic64_inc_return(&fc->attr_version);
i_size_write(inode, 0);
spin_unlock(&fi->lock);
- truncate_pagecache(inode, 0);
file_update_time(file);
fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
- } else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) {
- invalidate_inode_pages2(inode->i_mapping);
}
-
if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
fuse_link_write_file(file);
}
@@ -239,30 +235,38 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
if (err)
return err;

- if (is_wb_truncate || dax_truncate) {
+ if (is_wb_truncate || dax_truncate)
inode_lock(inode);
- fuse_set_nowrite(inode);
- }

if (dax_truncate) {
filemap_invalidate_lock(inode->i_mapping);
err = fuse_dax_break_layouts(inode, 0, 0);
if (err)
- goto out;
+ goto out_inode_unlock;
}

+ if (is_wb_truncate || dax_truncate)
+ fuse_set_nowrite(inode);
+
err = fuse_do_open(fm, get_node_id(inode), file, isdir);
if (!err)
fuse_finish_open(inode, file);

-out:
+ if (is_wb_truncate || dax_truncate)
+ fuse_release_nowrite(inode);
+ if (!err) {
+ struct fuse_file *ff = file->private_data;
+
+ if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC))
+ truncate_pagecache(inode, 0);
+ else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
+ invalidate_inode_pages2(inode->i_mapping);
+ }
if (dax_truncate)
filemap_invalidate_unlock(inode->i_mapping);
-
- if (is_wb_truncate | dax_truncate) {
- fuse_release_nowrite(inode);
+out_inode_unlock:
+ if (is_wb_truncate || dax_truncate)
inode_unlock(inode);
- }

return err;
}
@@ -338,6 +342,15 @@ static int fuse_open(struct inode *inode, struct file *file)

static int fuse_release(struct inode *inode, struct file *file)
{
+ struct fuse_conn *fc = get_fuse_conn(inode);
+
+ /*
+ * Dirty pages might remain despite write_inode_now() call from
+ * fuse_flush() due to writes racing with the close.
+ */
+ if (fc->writeback_cache)
+ write_inode_now(inode, 1);
+
fuse_release_common(file, false);

/* return value is ignored by VFS */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 8c0665c5dff8..7c290089e693 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -180,6 +180,12 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
inode->i_uid = make_kuid(fc->user_ns, attr->uid);
inode->i_gid = make_kgid(fc->user_ns, attr->gid);
inode->i_blocks = attr->blocks;
+
+ /* Sanitize nsecs */
+ attr->atimensec = min_t(u32, attr->atimensec, NSEC_PER_SEC - 1);
+ attr->mtimensec = min_t(u32, attr->mtimensec, NSEC_PER_SEC - 1);
+ attr->ctimensec = min_t(u32, attr->ctimensec, NSEC_PER_SEC - 1);
+
inode->i_atime.tv_sec = attr->atime;
inode->i_atime.tv_nsec = attr->atimensec;
/* mtime from server may be stale due to local buffered write */
diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 33cde4bbccdc..61d8afcb10a3 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -9,6 +9,17 @@
#include <linux/compat.h>
#include <linux/fileattr.h>

+static ssize_t fuse_send_ioctl(struct fuse_mount *fm, struct fuse_args *args)
+{
+ ssize_t ret = fuse_simple_request(fm, args);
+
+ /* Translate ENOSYS, which shouldn't be returned from fs */
+ if (ret == -ENOSYS)
+ ret = -ENOTTY;
+
+ return ret;
+}
+
/*
* CUSE servers compiled on 32bit broke on 64bit kernels because the
* ABI was defined to be 'struct iovec' which is different on 32bit
@@ -259,7 +270,7 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
ap.args.out_pages = true;
ap.args.out_argvar = true;

- transferred = fuse_simple_request(fm, &ap.args);
+ transferred = fuse_send_ioctl(fm, &ap.args);
err = transferred;
if (transferred < 0)
goto out;
@@ -393,7 +404,7 @@ static int fuse_priv_ioctl(struct inode *inode, struct fuse_file *ff,
args.out_args[1].size = inarg.out_size;
args.out_args[1].value = ptr;

- err = fuse_simple_request(fm, &args);
+ err = fuse_send_ioctl(fm, &args);
if (!err) {
if (outarg.result < 0)
err = outarg.result;
diff --git a/fs/io-wq.c b/fs/io-wq.c
deleted file mode 100644
index 32aeb2c581c5..000000000000
--- a/fs/io-wq.c
+++ /dev/null
@@ -1,1424 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Basic worker thread pool for io_uring
- *
- * Copyright (C) 2019 Jens Axboe
- *
- */
-#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/errno.h>
-#include <linux/sched/signal.h>
-#include <linux/percpu.h>
-#include <linux/slab.h>
-#include <linux/rculist_nulls.h>
-#include <linux/cpu.h>
-#include <linux/task_work.h>
-#include <linux/audit.h>
-#include <uapi/linux/io_uring.h>
-
-#include "io-wq.h"
-
-#define WORKER_IDLE_TIMEOUT (5 * HZ)
-
-enum {
- IO_WORKER_F_UP = 1, /* up and active */
- IO_WORKER_F_RUNNING = 2, /* account as running */
- IO_WORKER_F_FREE = 4, /* worker on free list */
- IO_WORKER_F_BOUND = 8, /* is doing bounded work */
-};
-
-enum {
- IO_WQ_BIT_EXIT = 0, /* wq exiting */
-};
-
-enum {
- IO_ACCT_STALLED_BIT = 0, /* stalled on hash */
-};
-
-/*
- * One for each thread in a wqe pool
- */
-struct io_worker {
- refcount_t ref;
- unsigned flags;
- struct hlist_nulls_node nulls_node;
- struct list_head all_list;
- struct task_struct *task;
- struct io_wqe *wqe;
-
- struct io_wq_work *cur_work;
- struct io_wq_work *next_work;
- raw_spinlock_t lock;
-
- struct completion ref_done;
-
- unsigned long create_state;
- struct callback_head create_work;
- int create_index;
-
- union {
- struct rcu_head rcu;
- struct work_struct work;
- };
-};
-
-#if BITS_PER_LONG == 64
-#define IO_WQ_HASH_ORDER 6
-#else
-#define IO_WQ_HASH_ORDER 5
-#endif
-
-#define IO_WQ_NR_HASH_BUCKETS (1u << IO_WQ_HASH_ORDER)
-
-struct io_wqe_acct {
- unsigned nr_workers;
- unsigned max_workers;
- int index;
- atomic_t nr_running;
- raw_spinlock_t lock;
- struct io_wq_work_list work_list;
- unsigned long flags;
-};
-
-enum {
- IO_WQ_ACCT_BOUND,
- IO_WQ_ACCT_UNBOUND,
- IO_WQ_ACCT_NR,
-};
-
-/*
- * Per-node worker thread pool
- */
-struct io_wqe {
- raw_spinlock_t lock;
- struct io_wqe_acct acct[IO_WQ_ACCT_NR];
-
- int node;
-
- struct hlist_nulls_head free_list;
- struct list_head all_list;
-
- struct wait_queue_entry wait;
-
- struct io_wq *wq;
- struct io_wq_work *hash_tail[IO_WQ_NR_HASH_BUCKETS];
-
- cpumask_var_t cpu_mask;
-};
-
-/*
- * Per io_wq state
- */
-struct io_wq {
- unsigned long state;
-
- free_work_fn *free_work;
- io_wq_work_fn *do_work;
-
- struct io_wq_hash *hash;
-
- atomic_t worker_refs;
- struct completion worker_done;
-
- struct hlist_node cpuhp_node;
-
- struct task_struct *task;
-
- struct io_wqe *wqes[];
-};
-
-static enum cpuhp_state io_wq_online;
-
-struct io_cb_cancel_data {
- work_cancel_fn *fn;
- void *data;
- int nr_running;
- int nr_pending;
- bool cancel_all;
-};
-
-static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index);
-static void io_wqe_dec_running(struct io_worker *worker);
-static bool io_acct_cancel_pending_work(struct io_wqe *wqe,
- struct io_wqe_acct *acct,
- struct io_cb_cancel_data *match);
-static void create_worker_cb(struct callback_head *cb);
-static void io_wq_cancel_tw_create(struct io_wq *wq);
-
-static bool io_worker_get(struct io_worker *worker)
-{
- return refcount_inc_not_zero(&worker->ref);
-}
-
-static void io_worker_release(struct io_worker *worker)
-{
- if (refcount_dec_and_test(&worker->ref))
- complete(&worker->ref_done);
-}
-
-static inline struct io_wqe_acct *io_get_acct(struct io_wqe *wqe, bool bound)
-{
- return &wqe->acct[bound ? IO_WQ_ACCT_BOUND : IO_WQ_ACCT_UNBOUND];
-}
-
-static inline struct io_wqe_acct *io_work_get_acct(struct io_wqe *wqe,
- struct io_wq_work *work)
-{
- return io_get_acct(wqe, !(work->flags & IO_WQ_WORK_UNBOUND));
-}
-
-static inline struct io_wqe_acct *io_wqe_get_acct(struct io_worker *worker)
-{
- return io_get_acct(worker->wqe, worker->flags & IO_WORKER_F_BOUND);
-}
-
-static void io_worker_ref_put(struct io_wq *wq)
-{
- if (atomic_dec_and_test(&wq->worker_refs))
- complete(&wq->worker_done);
-}
-
-static void io_worker_cancel_cb(struct io_worker *worker)
-{
- struct io_wqe_acct *acct = io_wqe_get_acct(worker);
- struct io_wqe *wqe = worker->wqe;
- struct io_wq *wq = wqe->wq;
-
- atomic_dec(&acct->nr_running);
- raw_spin_lock(&worker->wqe->lock);
- acct->nr_workers--;
- raw_spin_unlock(&worker->wqe->lock);
- io_worker_ref_put(wq);
- clear_bit_unlock(0, &worker->create_state);
- io_worker_release(worker);
-}
-
-static bool io_task_worker_match(struct callback_head *cb, void *data)
-{
- struct io_worker *worker;
-
- if (cb->func != create_worker_cb)
- return false;
- worker = container_of(cb, struct io_worker, create_work);
- return worker == data;
-}
-
-static void io_worker_exit(struct io_worker *worker)
-{
- struct io_wqe *wqe = worker->wqe;
- struct io_wq *wq = wqe->wq;
-
- while (1) {
- struct callback_head *cb = task_work_cancel_match(wq->task,
- io_task_worker_match, worker);
-
- if (!cb)
- break;
- io_worker_cancel_cb(worker);
- }
-
- io_worker_release(worker);
- wait_for_completion(&worker->ref_done);
-
- raw_spin_lock(&wqe->lock);
- if (worker->flags & IO_WORKER_F_FREE)
- hlist_nulls_del_rcu(&worker->nulls_node);
- list_del_rcu(&worker->all_list);
- raw_spin_unlock(&wqe->lock);
- io_wqe_dec_running(worker);
- worker->flags = 0;
- preempt_disable();
- current->flags &= ~PF_IO_WORKER;
- preempt_enable();
-
- kfree_rcu(worker, rcu);
- io_worker_ref_put(wqe->wq);
- do_exit(0);
-}
-
-static inline bool io_acct_run_queue(struct io_wqe_acct *acct)
-{
- bool ret = false;
-
- raw_spin_lock(&acct->lock);
- if (!wq_list_empty(&acct->work_list) &&
- !test_bit(IO_ACCT_STALLED_BIT, &acct->flags))
- ret = true;
- raw_spin_unlock(&acct->lock);
-
- return ret;
-}
-
-/*
- * Check head of free list for an available worker. If one isn't available,
- * caller must create one.
- */
-static bool io_wqe_activate_free_worker(struct io_wqe *wqe,
- struct io_wqe_acct *acct)
- __must_hold(RCU)
-{
- struct hlist_nulls_node *n;
- struct io_worker *worker;
-
- /*
- * Iterate free_list and see if we can find an idle worker to
- * activate. If a given worker is on the free_list but in the process
- * of exiting, keep trying.
- */
- hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) {
- if (!io_worker_get(worker))
- continue;
- if (io_wqe_get_acct(worker) != acct) {
- io_worker_release(worker);
- continue;
- }
- if (wake_up_process(worker->task)) {
- io_worker_release(worker);
- return true;
- }
- io_worker_release(worker);
- }
-
- return false;
-}
-
-/*
- * We need a worker. If we find a free one, we're good. If not, and we're
- * below the max number of workers, create one.
- */
-static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
-{
- /*
- * Most likely an attempt to queue unbounded work on an io_wq that
- * wasn't setup with any unbounded workers.
- */
- if (unlikely(!acct->max_workers))
- pr_warn_once("io-wq is not configured for unbound workers");
-
- raw_spin_lock(&wqe->lock);
- if (acct->nr_workers >= acct->max_workers) {
- raw_spin_unlock(&wqe->lock);
- return true;
- }
- acct->nr_workers++;
- raw_spin_unlock(&wqe->lock);
- atomic_inc(&acct->nr_running);
- atomic_inc(&wqe->wq->worker_refs);
- return create_io_worker(wqe->wq, wqe, acct->index);
-}
-
-static void io_wqe_inc_running(struct io_worker *worker)
-{
- struct io_wqe_acct *acct = io_wqe_get_acct(worker);
-
- atomic_inc(&acct->nr_running);
-}
-
-static void create_worker_cb(struct callback_head *cb)
-{
- struct io_worker *worker;
- struct io_wq *wq;
- struct io_wqe *wqe;
- struct io_wqe_acct *acct;
- bool do_create = false;
-
- worker = container_of(cb, struct io_worker, create_work);
- wqe = worker->wqe;
- wq = wqe->wq;
- acct = &wqe->acct[worker->create_index];
- raw_spin_lock(&wqe->lock);
- if (acct->nr_workers < acct->max_workers) {
- acct->nr_workers++;
- do_create = true;
- }
- raw_spin_unlock(&wqe->lock);
- if (do_create) {
- create_io_worker(wq, wqe, worker->create_index);
- } else {
- atomic_dec(&acct->nr_running);
- io_worker_ref_put(wq);
- }
- clear_bit_unlock(0, &worker->create_state);
- io_worker_release(worker);
-}
-
-static bool io_queue_worker_create(struct io_worker *worker,
- struct io_wqe_acct *acct,
- task_work_func_t func)
-{
- struct io_wqe *wqe = worker->wqe;
- struct io_wq *wq = wqe->wq;
-
- /* raced with exit, just ignore create call */
- if (test_bit(IO_WQ_BIT_EXIT, &wq->state))
- goto fail;
- if (!io_worker_get(worker))
- goto fail;
- /*
- * create_state manages ownership of create_work/index. We should
- * only need one entry per worker, as the worker going to sleep
- * will trigger the condition, and waking will clear it once it
- * runs the task_work.
- */
- if (test_bit(0, &worker->create_state) ||
- test_and_set_bit_lock(0, &worker->create_state))
- goto fail_release;
-
- atomic_inc(&wq->worker_refs);
- init_task_work(&worker->create_work, func);
- worker->create_index = acct->index;
- if (!task_work_add(wq->task, &worker->create_work, TWA_SIGNAL)) {
- /*
- * EXIT may have been set after checking it above, check after
- * adding the task_work and remove any creation item if it is
- * now set. wq exit does that too, but we can have added this
- * work item after we canceled in io_wq_exit_workers().
- */
- if (test_bit(IO_WQ_BIT_EXIT, &wq->state))
- io_wq_cancel_tw_create(wq);
- io_worker_ref_put(wq);
- return true;
- }
- io_worker_ref_put(wq);
- clear_bit_unlock(0, &worker->create_state);
-fail_release:
- io_worker_release(worker);
-fail:
- atomic_dec(&acct->nr_running);
- io_worker_ref_put(wq);
- return false;
-}
-
-static void io_wqe_dec_running(struct io_worker *worker)
-{
- struct io_wqe_acct *acct = io_wqe_get_acct(worker);
- struct io_wqe *wqe = worker->wqe;
-
- if (!(worker->flags & IO_WORKER_F_UP))
- return;
-
- if (!atomic_dec_and_test(&acct->nr_running))
- return;
- if (!io_acct_run_queue(acct))
- return;
-
- atomic_inc(&acct->nr_running);
- atomic_inc(&wqe->wq->worker_refs);
- io_queue_worker_create(worker, acct, create_worker_cb);
-}
-
-/*
- * Worker will start processing some work. Move it to the busy list, if
- * it's currently on the freelist
- */
-static void __io_worker_busy(struct io_wqe *wqe, struct io_worker *worker)
-{
- if (worker->flags & IO_WORKER_F_FREE) {
- worker->flags &= ~IO_WORKER_F_FREE;
- raw_spin_lock(&wqe->lock);
- hlist_nulls_del_init_rcu(&worker->nulls_node);
- raw_spin_unlock(&wqe->lock);
- }
-}
-
-/*
- * No work, worker going to sleep. Move to freelist, and unuse mm if we
- * have one attached. Dropping the mm may potentially sleep, so we drop
- * the lock in that case and return success. Since the caller has to
- * retry the loop in that case (we changed task state), we don't regrab
- * the lock if we return success.
- */
-static void __io_worker_idle(struct io_wqe *wqe, struct io_worker *worker)
- __must_hold(wqe->lock)
-{
- if (!(worker->flags & IO_WORKER_F_FREE)) {
- worker->flags |= IO_WORKER_F_FREE;
- hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list);
- }
-}
-
-static inline unsigned int io_get_work_hash(struct io_wq_work *work)
-{
- return work->flags >> IO_WQ_HASH_SHIFT;
-}
-
-static bool io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
-{
- struct io_wq *wq = wqe->wq;
- bool ret = false;
-
- spin_lock_irq(&wq->hash->wait.lock);
- if (list_empty(&wqe->wait.entry)) {
- __add_wait_queue(&wq->hash->wait, &wqe->wait);
- if (!test_bit(hash, &wq->hash->map)) {
- __set_current_state(TASK_RUNNING);
- list_del_init(&wqe->wait.entry);
- ret = true;
- }
- }
- spin_unlock_irq(&wq->hash->wait.lock);
- return ret;
-}
-
-static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
- struct io_worker *worker)
- __must_hold(acct->lock)
-{
- struct io_wq_work_node *node, *prev;
- struct io_wq_work *work, *tail;
- unsigned int stall_hash = -1U;
- struct io_wqe *wqe = worker->wqe;
-
- wq_list_for_each(node, prev, &acct->work_list) {
- unsigned int hash;
-
- work = container_of(node, struct io_wq_work, list);
-
- /* not hashed, can run anytime */
- if (!io_wq_is_hashed(work)) {
- wq_list_del(&acct->work_list, node, prev);
- return work;
- }
-
- hash = io_get_work_hash(work);
- /* all items with this hash lie in [work, tail] */
- tail = wqe->hash_tail[hash];
-
- /* hashed, can run if not already running */
- if (!test_and_set_bit(hash, &wqe->wq->hash->map)) {
- wqe->hash_tail[hash] = NULL;
- wq_list_cut(&acct->work_list, &tail->list, prev);
- return work;
- }
- if (stall_hash == -1U)
- stall_hash = hash;
- /* fast forward to a next hash, for-each will fix up @prev */
- node = &tail->list;
- }
-
- if (stall_hash != -1U) {
- bool unstalled;
-
- /*
- * Set this before dropping the lock to avoid racing with new
- * work being added and clearing the stalled bit.
- */
- set_bit(IO_ACCT_STALLED_BIT, &acct->flags);
- raw_spin_unlock(&acct->lock);
- unstalled = io_wait_on_hash(wqe, stall_hash);
- raw_spin_lock(&acct->lock);
- if (unstalled) {
- clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
- if (wq_has_sleeper(&wqe->wq->hash->wait))
- wake_up(&wqe->wq->hash->wait);
- }
- }
-
- return NULL;
-}
-
-static bool io_flush_signals(void)
-{
- if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL))) {
- __set_current_state(TASK_RUNNING);
- clear_notify_signal();
- if (task_work_pending(current))
- task_work_run();
- return true;
- }
- return false;
-}
-
-static void io_assign_current_work(struct io_worker *worker,
- struct io_wq_work *work)
-{
- if (work) {
- io_flush_signals();
- cond_resched();
- }
-
- raw_spin_lock(&worker->lock);
- worker->cur_work = work;
- worker->next_work = NULL;
- raw_spin_unlock(&worker->lock);
-}
-
-static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work);
-
-static void io_worker_handle_work(struct io_worker *worker)
-{
- struct io_wqe_acct *acct = io_wqe_get_acct(worker);
- struct io_wqe *wqe = worker->wqe;
- struct io_wq *wq = wqe->wq;
- bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state);
-
- do {
- struct io_wq_work *work;
-
- /*
- * If we got some work, mark us as busy. If we didn't, but
- * the list isn't empty, it means we stalled on hashed work.
- * Mark us stalled so we don't keep looking for work when we
- * can't make progress, any work completion or insertion will
- * clear the stalled flag.
- */
- raw_spin_lock(&acct->lock);
- work = io_get_next_work(acct, worker);
- raw_spin_unlock(&acct->lock);
- if (work) {
- __io_worker_busy(wqe, worker);
-
- /*
- * Make sure cancelation can find this, even before
- * it becomes the active work. That avoids a window
- * where the work has been removed from our general
- * work list, but isn't yet discoverable as the
- * current work item for this worker.
- */
- raw_spin_lock(&worker->lock);
- worker->next_work = work;
- raw_spin_unlock(&worker->lock);
- } else {
- break;
- }
- io_assign_current_work(worker, work);
- __set_current_state(TASK_RUNNING);
-
- /* handle a whole dependent link */
- do {
- struct io_wq_work *next_hashed, *linked;
- unsigned int hash = io_get_work_hash(work);
-
- next_hashed = wq_next_work(work);
-
- if (unlikely(do_kill) && (work->flags & IO_WQ_WORK_UNBOUND))
- work->flags |= IO_WQ_WORK_CANCEL;
- wq->do_work(work);
- io_assign_current_work(worker, NULL);
-
- linked = wq->free_work(work);
- work = next_hashed;
- if (!work && linked && !io_wq_is_hashed(linked)) {
- work = linked;
- linked = NULL;
- }
- io_assign_current_work(worker, work);
- if (linked)
- io_wqe_enqueue(wqe, linked);
-
- if (hash != -1U && !next_hashed) {
- /* serialize hash clear with wake_up() */
- spin_lock_irq(&wq->hash->wait.lock);
- clear_bit(hash, &wq->hash->map);
- clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
- spin_unlock_irq(&wq->hash->wait.lock);
- if (wq_has_sleeper(&wq->hash->wait))
- wake_up(&wq->hash->wait);
- }
- } while (work);
- } while (1);
-}
-
-static int io_wqe_worker(void *data)
-{
- struct io_worker *worker = data;
- struct io_wqe_acct *acct = io_wqe_get_acct(worker);
- struct io_wqe *wqe = worker->wqe;
- struct io_wq *wq = wqe->wq;
- bool last_timeout = false;
- char buf[TASK_COMM_LEN];
-
- worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING);
-
- snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid);
- set_task_comm(current, buf);
-
- audit_alloc_kernel(current);
-
- while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
- long ret;
-
- set_current_state(TASK_INTERRUPTIBLE);
- while (io_acct_run_queue(acct))
- io_worker_handle_work(worker);
-
- raw_spin_lock(&wqe->lock);
- /* timed out, exit unless we're the last worker */
- if (last_timeout && acct->nr_workers > 1) {
- acct->nr_workers--;
- raw_spin_unlock(&wqe->lock);
- __set_current_state(TASK_RUNNING);
- break;
- }
- last_timeout = false;
- __io_worker_idle(wqe, worker);
- raw_spin_unlock(&wqe->lock);
- if (io_flush_signals())
- continue;
- ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
- if (signal_pending(current)) {
- struct ksignal ksig;
-
- if (!get_signal(&ksig))
- continue;
- break;
- }
- last_timeout = !ret;
- }
-
- if (test_bit(IO_WQ_BIT_EXIT, &wq->state))
- io_worker_handle_work(worker);
-
- audit_free(current);
- io_worker_exit(worker);
- return 0;
-}
-
-/*
- * Called when a worker is scheduled in. Mark us as currently running.
- */
-void io_wq_worker_running(struct task_struct *tsk)
-{
- struct io_worker *worker = tsk->worker_private;
-
- if (!worker)
- return;
- if (!(worker->flags & IO_WORKER_F_UP))
- return;
- if (worker->flags & IO_WORKER_F_RUNNING)
- return;
- worker->flags |= IO_WORKER_F_RUNNING;
- io_wqe_inc_running(worker);
-}
-
-/*
- * Called when worker is going to sleep. If there are no workers currently
- * running and we have work pending, wake up a free one or create a new one.
- */
-void io_wq_worker_sleeping(struct task_struct *tsk)
-{
- struct io_worker *worker = tsk->worker_private;
-
- if (!worker)
- return;
- if (!(worker->flags & IO_WORKER_F_UP))
- return;
- if (!(worker->flags & IO_WORKER_F_RUNNING))
- return;
-
- worker->flags &= ~IO_WORKER_F_RUNNING;
- io_wqe_dec_running(worker);
-}
-
-static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker,
- struct task_struct *tsk)
-{
- tsk->worker_private = worker;
- worker->task = tsk;
- set_cpus_allowed_ptr(tsk, wqe->cpu_mask);
- tsk->flags |= PF_NO_SETAFFINITY;
-
- raw_spin_lock(&wqe->lock);
- hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list);
- list_add_tail_rcu(&worker->all_list, &wqe->all_list);
- worker->flags |= IO_WORKER_F_FREE;
- raw_spin_unlock(&wqe->lock);
- wake_up_new_task(tsk);
-}
-
-static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
-{
- return true;
-}
-
-static inline bool io_should_retry_thread(long err)
-{
- /*
- * Prevent perpetual task_work retry, if the task (or its group) is
- * exiting.
- */
- if (fatal_signal_pending(current))
- return false;
-
- switch (err) {
- case -EAGAIN:
- case -ERESTARTSYS:
- case -ERESTARTNOINTR:
- case -ERESTARTNOHAND:
- return true;
- default:
- return false;
- }
-}
-
-static void create_worker_cont(struct callback_head *cb)
-{
- struct io_worker *worker;
- struct task_struct *tsk;
- struct io_wqe *wqe;
-
- worker = container_of(cb, struct io_worker, create_work);
- clear_bit_unlock(0, &worker->create_state);
- wqe = worker->wqe;
- tsk = create_io_thread(io_wqe_worker, worker, wqe->node);
- if (!IS_ERR(tsk)) {
- io_init_new_worker(wqe, worker, tsk);
- io_worker_release(worker);
- return;
- } else if (!io_should_retry_thread(PTR_ERR(tsk))) {
- struct io_wqe_acct *acct = io_wqe_get_acct(worker);
-
- atomic_dec(&acct->nr_running);
- raw_spin_lock(&wqe->lock);
- acct->nr_workers--;
- if (!acct->nr_workers) {
- struct io_cb_cancel_data match = {
- .fn = io_wq_work_match_all,
- .cancel_all = true,
- };
-
- raw_spin_unlock(&wqe->lock);
- while (io_acct_cancel_pending_work(wqe, acct, &match))
- ;
- } else {
- raw_spin_unlock(&wqe->lock);
- }
- io_worker_ref_put(wqe->wq);
- kfree(worker);
- return;
- }
-
- /* re-create attempts grab a new worker ref, drop the existing one */
- io_worker_release(worker);
- schedule_work(&worker->work);
-}
-
-static void io_workqueue_create(struct work_struct *work)
-{
- struct io_worker *worker = container_of(work, struct io_worker, work);
- struct io_wqe_acct *acct = io_wqe_get_acct(worker);
-
- if (!io_queue_worker_create(worker, acct, create_worker_cont))
- kfree(worker);
-}
-
-static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
-{
- struct io_wqe_acct *acct = &wqe->acct[index];
- struct io_worker *worker;
- struct task_struct *tsk;
-
- __set_current_state(TASK_RUNNING);
-
- worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, wqe->node);
- if (!worker) {
-fail:
- atomic_dec(&acct->nr_running);
- raw_spin_lock(&wqe->lock);
- acct->nr_workers--;
- raw_spin_unlock(&wqe->lock);
- io_worker_ref_put(wq);
- return false;
- }
-
- refcount_set(&worker->ref, 1);
- worker->wqe = wqe;
- raw_spin_lock_init(&worker->lock);
- init_completion(&worker->ref_done);
-
- if (index == IO_WQ_ACCT_BOUND)
- worker->flags |= IO_WORKER_F_BOUND;
-
- tsk = create_io_thread(io_wqe_worker, worker, wqe->node);
- if (!IS_ERR(tsk)) {
- io_init_new_worker(wqe, worker, tsk);
- } else if (!io_should_retry_thread(PTR_ERR(tsk))) {
- kfree(worker);
- goto fail;
- } else {
- INIT_WORK(&worker->work, io_workqueue_create);
- schedule_work(&worker->work);
- }
-
- return true;
-}
-
-/*
- * Iterate the passed in list and call the specific function for each
- * worker that isn't exiting
- */
-static bool io_wq_for_each_worker(struct io_wqe *wqe,
- bool (*func)(struct io_worker *, void *),
- void *data)
-{
- struct io_worker *worker;
- bool ret = false;
-
- list_for_each_entry_rcu(worker, &wqe->all_list, all_list) {
- if (io_worker_get(worker)) {
- /* no task if node is/was offline */
- if (worker->task)
- ret = func(worker, data);
- io_worker_release(worker);
- if (ret)
- break;
- }
- }
-
- return ret;
-}
-
-static bool io_wq_worker_wake(struct io_worker *worker, void *data)
-{
- set_notify_signal(worker->task);
- wake_up_process(worker->task);
- return false;
-}
-
-static void io_run_cancel(struct io_wq_work *work, struct io_wqe *wqe)
-{
- struct io_wq *wq = wqe->wq;
-
- do {
- work->flags |= IO_WQ_WORK_CANCEL;
- wq->do_work(work);
- work = wq->free_work(work);
- } while (work);
-}
-
-static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work)
-{
- struct io_wqe_acct *acct = io_work_get_acct(wqe, work);
- unsigned int hash;
- struct io_wq_work *tail;
-
- if (!io_wq_is_hashed(work)) {
-append:
- wq_list_add_tail(&work->list, &acct->work_list);
- return;
- }
-
- hash = io_get_work_hash(work);
- tail = wqe->hash_tail[hash];
- wqe->hash_tail[hash] = work;
- if (!tail)
- goto append;
-
- wq_list_add_after(&work->list, &tail->list, &acct->work_list);
-}
-
-static bool io_wq_work_match_item(struct io_wq_work *work, void *data)
-{
- return work == data;
-}
-
-static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
-{
- struct io_wqe_acct *acct = io_work_get_acct(wqe, work);
- struct io_cb_cancel_data match;
- unsigned work_flags = work->flags;
- bool do_create;
-
- /*
- * If io-wq is exiting for this task, or if the request has explicitly
- * been marked as one that should not get executed, cancel it here.
- */
- if (test_bit(IO_WQ_BIT_EXIT, &wqe->wq->state) ||
- (work->flags & IO_WQ_WORK_CANCEL)) {
- io_run_cancel(work, wqe);
- return;
- }
-
- raw_spin_lock(&acct->lock);
- io_wqe_insert_work(wqe, work);
- clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
- raw_spin_unlock(&acct->lock);
-
- raw_spin_lock(&wqe->lock);
- rcu_read_lock();
- do_create = !io_wqe_activate_free_worker(wqe, acct);
- rcu_read_unlock();
-
- raw_spin_unlock(&wqe->lock);
-
- if (do_create && ((work_flags & IO_WQ_WORK_CONCURRENT) ||
- !atomic_read(&acct->nr_running))) {
- bool did_create;
-
- did_create = io_wqe_create_worker(wqe, acct);
- if (likely(did_create))
- return;
-
- raw_spin_lock(&wqe->lock);
- if (acct->nr_workers) {
- raw_spin_unlock(&wqe->lock);
- return;
- }
- raw_spin_unlock(&wqe->lock);
-
- /* fatal condition, failed to create the first worker */
- match.fn = io_wq_work_match_item,
- match.data = work,
- match.cancel_all = false,
-
- io_acct_cancel_pending_work(wqe, acct, &match);
- }
-}
-
-void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work)
-{
- struct io_wqe *wqe = wq->wqes[numa_node_id()];
-
- io_wqe_enqueue(wqe, work);
-}
-
-/*
- * Work items that hash to the same value will not be done in parallel.
- * Used to limit concurrent writes, generally hashed by inode.
- */
-void io_wq_hash_work(struct io_wq_work *work, void *val)
-{
- unsigned int bit;
-
- bit = hash_ptr(val, IO_WQ_HASH_ORDER);
- work->flags |= (IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT));
-}
-
-static bool __io_wq_worker_cancel(struct io_worker *worker,
- struct io_cb_cancel_data *match,
- struct io_wq_work *work)
-{
- if (work && match->fn(work, match->data)) {
- work->flags |= IO_WQ_WORK_CANCEL;
- set_notify_signal(worker->task);
- return true;
- }
-
- return false;
-}
-
-static bool io_wq_worker_cancel(struct io_worker *worker, void *data)
-{
- struct io_cb_cancel_data *match = data;
-
- /*
- * Hold the lock to avoid ->cur_work going out of scope, caller
- * may dereference the passed in work.
- */
- raw_spin_lock(&worker->lock);
- if (__io_wq_worker_cancel(worker, match, worker->cur_work) ||
- __io_wq_worker_cancel(worker, match, worker->next_work))
- match->nr_running++;
- raw_spin_unlock(&worker->lock);
-
- return match->nr_running && !match->cancel_all;
-}
-
-static inline void io_wqe_remove_pending(struct io_wqe *wqe,
- struct io_wq_work *work,
- struct io_wq_work_node *prev)
-{
- struct io_wqe_acct *acct = io_work_get_acct(wqe, work);
- unsigned int hash = io_get_work_hash(work);
- struct io_wq_work *prev_work = NULL;
-
- if (io_wq_is_hashed(work) && work == wqe->hash_tail[hash]) {
- if (prev)
- prev_work = container_of(prev, struct io_wq_work, list);
- if (prev_work && io_get_work_hash(prev_work) == hash)
- wqe->hash_tail[hash] = prev_work;
- else
- wqe->hash_tail[hash] = NULL;
- }
- wq_list_del(&acct->work_list, &work->list, prev);
-}
-
-static bool io_acct_cancel_pending_work(struct io_wqe *wqe,
- struct io_wqe_acct *acct,
- struct io_cb_cancel_data *match)
-{
- struct io_wq_work_node *node, *prev;
- struct io_wq_work *work;
-
- raw_spin_lock(&acct->lock);
- wq_list_for_each(node, prev, &acct->work_list) {
- work = container_of(node, struct io_wq_work, list);
- if (!match->fn(work, match->data))
- continue;
- io_wqe_remove_pending(wqe, work, prev);
- raw_spin_unlock(&acct->lock);
- io_run_cancel(work, wqe);
- match->nr_pending++;
- /* not safe to continue after unlock */
- return true;
- }
- raw_spin_unlock(&acct->lock);
-
- return false;
-}
-
-static void io_wqe_cancel_pending_work(struct io_wqe *wqe,
- struct io_cb_cancel_data *match)
-{
- int i;
-retry:
- for (i = 0; i < IO_WQ_ACCT_NR; i++) {
- struct io_wqe_acct *acct = io_get_acct(wqe, i == 0);
-
- if (io_acct_cancel_pending_work(wqe, acct, match)) {
- if (match->cancel_all)
- goto retry;
- break;
- }
- }
-}
-
-static void io_wqe_cancel_running_work(struct io_wqe *wqe,
- struct io_cb_cancel_data *match)
-{
- rcu_read_lock();
- io_wq_for_each_worker(wqe, io_wq_worker_cancel, match);
- rcu_read_unlock();
-}
-
-enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
- void *data, bool cancel_all)
-{
- struct io_cb_cancel_data match = {
- .fn = cancel,
- .data = data,
- .cancel_all = cancel_all,
- };
- int node;
-
- /*
- * First check pending list, if we're lucky we can just remove it
- * from there. CANCEL_OK means that the work is returned as-new,
- * no completion will be posted for it.
- *
- * Then check if a free (going busy) or busy worker has the work
- * currently running. If we find it there, we'll return CANCEL_RUNNING
- * as an indication that we attempt to signal cancellation. The
- * completion will run normally in this case.
- *
- * Do both of these while holding the wqe->lock, to ensure that
- * we'll find a work item regardless of state.
- */
- for_each_node(node) {
- struct io_wqe *wqe = wq->wqes[node];
-
- io_wqe_cancel_pending_work(wqe, &match);
- if (match.nr_pending && !match.cancel_all)
- return IO_WQ_CANCEL_OK;
-
- raw_spin_lock(&wqe->lock);
- io_wqe_cancel_running_work(wqe, &match);
- raw_spin_unlock(&wqe->lock);
- if (match.nr_running && !match.cancel_all)
- return IO_WQ_CANCEL_RUNNING;
- }
-
- if (match.nr_running)
- return IO_WQ_CANCEL_RUNNING;
- if (match.nr_pending)
- return IO_WQ_CANCEL_OK;
- return IO_WQ_CANCEL_NOTFOUND;
-}
-
-static int io_wqe_hash_wake(struct wait_queue_entry *wait, unsigned mode,
- int sync, void *key)
-{
- struct io_wqe *wqe = container_of(wait, struct io_wqe, wait);
- int i;
-
- list_del_init(&wait->entry);
-
- rcu_read_lock();
- for (i = 0; i < IO_WQ_ACCT_NR; i++) {
- struct io_wqe_acct *acct = &wqe->acct[i];
-
- if (test_and_clear_bit(IO_ACCT_STALLED_BIT, &acct->flags))
- io_wqe_activate_free_worker(wqe, acct);
- }
- rcu_read_unlock();
- return 1;
-}
-
-struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
-{
- int ret, node, i;
- struct io_wq *wq;
-
- if (WARN_ON_ONCE(!data->free_work || !data->do_work))
- return ERR_PTR(-EINVAL);
- if (WARN_ON_ONCE(!bounded))
- return ERR_PTR(-EINVAL);
-
- wq = kzalloc(struct_size(wq, wqes, nr_node_ids), GFP_KERNEL);
- if (!wq)
- return ERR_PTR(-ENOMEM);
- ret = cpuhp_state_add_instance_nocalls(io_wq_online, &wq->cpuhp_node);
- if (ret)
- goto err_wq;
-
- refcount_inc(&data->hash->refs);
- wq->hash = data->hash;
- wq->free_work = data->free_work;
- wq->do_work = data->do_work;
-
- ret = -ENOMEM;
- for_each_node(node) {
- struct io_wqe *wqe;
- int alloc_node = node;
-
- if (!node_online(alloc_node))
- alloc_node = NUMA_NO_NODE;
- wqe = kzalloc_node(sizeof(struct io_wqe), GFP_KERNEL, alloc_node);
- if (!wqe)
- goto err;
- if (!alloc_cpumask_var(&wqe->cpu_mask, GFP_KERNEL))
- goto err;
- cpumask_copy(wqe->cpu_mask, cpumask_of_node(node));
- wq->wqes[node] = wqe;
- wqe->node = alloc_node;
- wqe->acct[IO_WQ_ACCT_BOUND].max_workers = bounded;
- wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers =
- task_rlimit(current, RLIMIT_NPROC);
- INIT_LIST_HEAD(&wqe->wait.entry);
- wqe->wait.func = io_wqe_hash_wake;
- for (i = 0; i < IO_WQ_ACCT_NR; i++) {
- struct io_wqe_acct *acct = &wqe->acct[i];
-
- acct->index = i;
- atomic_set(&acct->nr_running, 0);
- INIT_WQ_LIST(&acct->work_list);
- raw_spin_lock_init(&acct->lock);
- }
- wqe->wq = wq;
- raw_spin_lock_init(&wqe->lock);
- INIT_HLIST_NULLS_HEAD(&wqe->free_list, 0);
- INIT_LIST_HEAD(&wqe->all_list);
- }
-
- wq->task = get_task_struct(data->task);
- atomic_set(&wq->worker_refs, 1);
- init_completion(&wq->worker_done);
- return wq;
-err:
- io_wq_put_hash(data->hash);
- cpuhp_state_remove_instance_nocalls(io_wq_online, &wq->cpuhp_node);
- for_each_node(node) {
- if (!wq->wqes[node])
- continue;
- free_cpumask_var(wq->wqes[node]->cpu_mask);
- kfree(wq->wqes[node]);
- }
-err_wq:
- kfree(wq);
- return ERR_PTR(ret);
-}
-
-static bool io_task_work_match(struct callback_head *cb, void *data)
-{
- struct io_worker *worker;
-
- if (cb->func != create_worker_cb && cb->func != create_worker_cont)
- return false;
- worker = container_of(cb, struct io_worker, create_work);
- return worker->wqe->wq == data;
-}
-
-void io_wq_exit_start(struct io_wq *wq)
-{
- set_bit(IO_WQ_BIT_EXIT, &wq->state);
-}
-
-static void io_wq_cancel_tw_create(struct io_wq *wq)
-{
- struct callback_head *cb;
-
- while ((cb = task_work_cancel_match(wq->task, io_task_work_match, wq)) != NULL) {
- struct io_worker *worker;
-
- worker = container_of(cb, struct io_worker, create_work);
- io_worker_cancel_cb(worker);
- }
-}
-
-static void io_wq_exit_workers(struct io_wq *wq)
-{
- int node;
-
- if (!wq->task)
- return;
-
- io_wq_cancel_tw_create(wq);
-
- rcu_read_lock();
- for_each_node(node) {
- struct io_wqe *wqe = wq->wqes[node];
-
- io_wq_for_each_worker(wqe, io_wq_worker_wake, NULL);
- }
- rcu_read_unlock();
- io_worker_ref_put(wq);
- wait_for_completion(&wq->worker_done);
-
- for_each_node(node) {
- spin_lock_irq(&wq->hash->wait.lock);
- list_del_init(&wq->wqes[node]->wait.entry);
- spin_unlock_irq(&wq->hash->wait.lock);
- }
- put_task_struct(wq->task);
- wq->task = NULL;
-}
-
-static void io_wq_destroy(struct io_wq *wq)
-{
- int node;
-
- cpuhp_state_remove_instance_nocalls(io_wq_online, &wq->cpuhp_node);
-
- for_each_node(node) {
- struct io_wqe *wqe = wq->wqes[node];
- struct io_cb_cancel_data match = {
- .fn = io_wq_work_match_all,
- .cancel_all = true,
- };
- io_wqe_cancel_pending_work(wqe, &match);
- free_cpumask_var(wqe->cpu_mask);
- kfree(wqe);
- }
- io_wq_put_hash(wq->hash);
- kfree(wq);
-}
-
-void io_wq_put_and_exit(struct io_wq *wq)
-{
- WARN_ON_ONCE(!test_bit(IO_WQ_BIT_EXIT, &wq->state));
-
- io_wq_exit_workers(wq);
- io_wq_destroy(wq);
-}
-
-struct online_data {
- unsigned int cpu;
- bool online;
-};
-
-static bool io_wq_worker_affinity(struct io_worker *worker, void *data)
-{
- struct online_data *od = data;
-
- if (od->online)
- cpumask_set_cpu(od->cpu, worker->wqe->cpu_mask);
- else
- cpumask_clear_cpu(od->cpu, worker->wqe->cpu_mask);
- return false;
-}
-
-static int __io_wq_cpu_online(struct io_wq *wq, unsigned int cpu, bool online)
-{
- struct online_data od = {
- .cpu = cpu,
- .online = online
- };
- int i;
-
- rcu_read_lock();
- for_each_node(i)
- io_wq_for_each_worker(wq->wqes[i], io_wq_worker_affinity, &od);
- rcu_read_unlock();
- return 0;
-}
-
-static int io_wq_cpu_online(unsigned int cpu, struct hlist_node *node)
-{
- struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node);
-
- return __io_wq_cpu_online(wq, cpu, true);
-}
-
-static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node)
-{
- struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node);
-
- return __io_wq_cpu_online(wq, cpu, false);
-}
-
-int io_wq_cpu_affinity(struct io_wq *wq, cpumask_var_t mask)
-{
- int i;
-
- rcu_read_lock();
- for_each_node(i) {
- struct io_wqe *wqe = wq->wqes[i];
-
- if (mask)
- cpumask_copy(wqe->cpu_mask, mask);
- else
- cpumask_copy(wqe->cpu_mask, cpumask_of_node(i));
- }
- rcu_read_unlock();
- return 0;
-}
-
-/*
- * Set max number of unbounded workers, returns old value. If new_count is 0,
- * then just return the old value.
- */
-int io_wq_max_workers(struct io_wq *wq, int *new_count)
-{
- int prev[IO_WQ_ACCT_NR];
- bool first_node = true;
- int i, node;
-
- BUILD_BUG_ON((int) IO_WQ_ACCT_BOUND != (int) IO_WQ_BOUND);
- BUILD_BUG_ON((int) IO_WQ_ACCT_UNBOUND != (int) IO_WQ_UNBOUND);
- BUILD_BUG_ON((int) IO_WQ_ACCT_NR != 2);
-
- for (i = 0; i < IO_WQ_ACCT_NR; i++) {
- if (new_count[i] > task_rlimit(current, RLIMIT_NPROC))
- new_count[i] = task_rlimit(current, RLIMIT_NPROC);
- }
-
- for (i = 0; i < IO_WQ_ACCT_NR; i++)
- prev[i] = 0;
-
- rcu_read_lock();
- for_each_node(node) {
- struct io_wqe *wqe = wq->wqes[node];
- struct io_wqe_acct *acct;
-
- raw_spin_lock(&wqe->lock);
- for (i = 0; i < IO_WQ_ACCT_NR; i++) {
- acct = &wqe->acct[i];
- if (first_node)
- prev[i] = max_t(int, acct->max_workers, prev[i]);
- if (new_count[i])
- acct->max_workers = new_count[i];
- }
- raw_spin_unlock(&wqe->lock);
- first_node = false;
- }
- rcu_read_unlock();
-
- for (i = 0; i < IO_WQ_ACCT_NR; i++)
- new_count[i] = prev[i];
-
- return 0;
-}
-
-static __init int io_wq_init(void)
-{
- int ret;
-
- ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "io-wq/online",
- io_wq_cpu_online, io_wq_cpu_offline);
- if (ret < 0)
- return ret;
- io_wq_online = ret;
- return 0;
-}
-subsys_initcall(io_wq_init);
diff --git a/fs/io-wq.h b/fs/io-wq.h
deleted file mode 100644
index dbecd27656c7..000000000000
--- a/fs/io-wq.h
+++ /dev/null
@@ -1,227 +0,0 @@
-#ifndef INTERNAL_IO_WQ_H
-#define INTERNAL_IO_WQ_H
-
-#include <linux/refcount.h>
-
-struct io_wq;
-
-enum {
- IO_WQ_WORK_CANCEL = 1,
- IO_WQ_WORK_HASHED = 2,
- IO_WQ_WORK_UNBOUND = 4,
- IO_WQ_WORK_CONCURRENT = 16,
-
- IO_WQ_HASH_SHIFT = 24, /* upper 8 bits are used for hash key */
-};
-
-enum io_wq_cancel {
- IO_WQ_CANCEL_OK, /* cancelled before started */
- IO_WQ_CANCEL_RUNNING, /* found, running, and attempted cancelled */
- IO_WQ_CANCEL_NOTFOUND, /* work not found */
-};
-
-struct io_wq_work_node {
- struct io_wq_work_node *next;
-};
-
-struct io_wq_work_list {
- struct io_wq_work_node *first;
- struct io_wq_work_node *last;
-};
-
-#define wq_list_for_each(pos, prv, head) \
- for (pos = (head)->first, prv = NULL; pos; prv = pos, pos = (pos)->next)
-
-#define wq_list_for_each_resume(pos, prv) \
- for (; pos; prv = pos, pos = (pos)->next)
-
-#define wq_list_empty(list) (READ_ONCE((list)->first) == NULL)
-#define INIT_WQ_LIST(list) do { \
- (list)->first = NULL; \
-} while (0)
-
-static inline void wq_list_add_after(struct io_wq_work_node *node,
- struct io_wq_work_node *pos,
- struct io_wq_work_list *list)
-{
- struct io_wq_work_node *next = pos->next;
-
- pos->next = node;
- node->next = next;
- if (!next)
- list->last = node;
-}
-
-/**
- * wq_list_merge - merge the second list to the first one.
- * @list0: the first list
- * @list1: the second list
- * Return the first node after mergence.
- */
-static inline struct io_wq_work_node *wq_list_merge(struct io_wq_work_list *list0,
- struct io_wq_work_list *list1)
-{
- struct io_wq_work_node *ret;
-
- if (!list0->first) {
- ret = list1->first;
- } else {
- ret = list0->first;
- list0->last->next = list1->first;
- }
- INIT_WQ_LIST(list0);
- INIT_WQ_LIST(list1);
- return ret;
-}
-
-static inline void wq_list_add_tail(struct io_wq_work_node *node,
- struct io_wq_work_list *list)
-{
- node->next = NULL;
- if (!list->first) {
- list->last = node;
- WRITE_ONCE(list->first, node);
- } else {
- list->last->next = node;
- list->last = node;
- }
-}
-
-static inline void wq_list_add_head(struct io_wq_work_node *node,
- struct io_wq_work_list *list)
-{
- node->next = list->first;
- if (!node->next)
- list->last = node;
- WRITE_ONCE(list->first, node);
-}
-
-static inline void wq_list_cut(struct io_wq_work_list *list,
- struct io_wq_work_node *last,
- struct io_wq_work_node *prev)
-{
- /* first in the list, if prev==NULL */
- if (!prev)
- WRITE_ONCE(list->first, last->next);
- else
- prev->next = last->next;
-
- if (last == list->last)
- list->last = prev;
- last->next = NULL;
-}
-
-static inline void __wq_list_splice(struct io_wq_work_list *list,
- struct io_wq_work_node *to)
-{
- list->last->next = to->next;
- to->next = list->first;
- INIT_WQ_LIST(list);
-}
-
-static inline bool wq_list_splice(struct io_wq_work_list *list,
- struct io_wq_work_node *to)
-{
- if (!wq_list_empty(list)) {
- __wq_list_splice(list, to);
- return true;
- }
- return false;
-}
-
-static inline void wq_stack_add_head(struct io_wq_work_node *node,
- struct io_wq_work_node *stack)
-{
- node->next = stack->next;
- stack->next = node;
-}
-
-static inline void wq_list_del(struct io_wq_work_list *list,
- struct io_wq_work_node *node,
- struct io_wq_work_node *prev)
-{
- wq_list_cut(list, node, prev);
-}
-
-static inline
-struct io_wq_work_node *wq_stack_extract(struct io_wq_work_node *stack)
-{
- struct io_wq_work_node *node = stack->next;
-
- stack->next = node->next;
- return node;
-}
-
-struct io_wq_work {
- struct io_wq_work_node list;
- unsigned flags;
-};
-
-static inline struct io_wq_work *wq_next_work(struct io_wq_work *work)
-{
- if (!work->list.next)
- return NULL;
-
- return container_of(work->list.next, struct io_wq_work, list);
-}
-
-typedef struct io_wq_work *(free_work_fn)(struct io_wq_work *);
-typedef void (io_wq_work_fn)(struct io_wq_work *);
-
-struct io_wq_hash {
- refcount_t refs;
- unsigned long map;
- struct wait_queue_head wait;
-};
-
-static inline void io_wq_put_hash(struct io_wq_hash *hash)
-{
- if (refcount_dec_and_test(&hash->refs))
- kfree(hash);
-}
-
-struct io_wq_data {
- struct io_wq_hash *hash;
- struct task_struct *task;
- io_wq_work_fn *do_work;
- free_work_fn *free_work;
-};
-
-struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data);
-void io_wq_exit_start(struct io_wq *wq);
-void io_wq_put_and_exit(struct io_wq *wq);
-
-void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work);
-void io_wq_hash_work(struct io_wq_work *work, void *val);
-
-int io_wq_cpu_affinity(struct io_wq *wq, cpumask_var_t mask);
-int io_wq_max_workers(struct io_wq *wq, int *new_count);
-
-static inline bool io_wq_is_hashed(struct io_wq_work *work)
-{
- return work->flags & IO_WQ_WORK_HASHED;
-}
-
-typedef bool (work_cancel_fn)(struct io_wq_work *, void *);
-
-enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
- void *data, bool cancel_all);
-
-#if defined(CONFIG_IO_WQ)
-extern void io_wq_worker_sleeping(struct task_struct *);
-extern void io_wq_worker_running(struct task_struct *);
-#else
-static inline void io_wq_worker_sleeping(struct task_struct *tsk)
-{
-}
-static inline void io_wq_worker_running(struct task_struct *tsk)
-{
-}
-#endif
-
-static inline bool io_wq_current_is_worker(void)
-{
- return in_task() && (current->flags & PF_IO_WORKER) &&
- current->worker_private;
-}
-#endif
diff --git a/fs/io_uring.c b/fs/io_uring.c
deleted file mode 100644
index 3d97372e811e..000000000000
--- a/fs/io_uring.c
+++ /dev/null
@@ -1,11915 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Shared application/kernel submission and completion ring pairs, for
- * supporting fast/efficient IO.
- *
- * A note on the read/write ordering memory barriers that are matched between
- * the application and kernel side.
- *
- * After the application reads the CQ ring tail, it must use an
- * appropriate smp_rmb() to pair with the smp_wmb() the kernel uses
- * before writing the tail (using smp_load_acquire to read the tail will
- * do). It also needs a smp_mb() before updating CQ head (ordering the
- * entry load(s) with the head store), pairing with an implicit barrier
- * through a control-dependency in io_get_cqe (smp_store_release to
- * store head will do). Failure to do so could lead to reading invalid
- * CQ entries.
- *
- * Likewise, the application must use an appropriate smp_wmb() before
- * writing the SQ tail (ordering SQ entry stores with the tail store),
- * which pairs with smp_load_acquire in io_get_sqring (smp_store_release
- * to store the tail will do). And it needs a barrier ordering the SQ
- * head load before writing new SQ entries (smp_load_acquire to read
- * head will do).
- *
- * When using the SQ poll thread (IORING_SETUP_SQPOLL), the application
- * needs to check the SQ flags for IORING_SQ_NEED_WAKEUP *after*
- * updating the SQ tail; a full memory barrier smp_mb() is needed
- * between.
- *
- * Also see the examples in the liburing library:
- *
- * git://git.kernel.dk/liburing
- *
- * io_uring also uses READ/WRITE_ONCE() for _any_ store or load that happens
- * from data shared between the kernel and application. This is done both
- * for ordering purposes, but also to ensure that once a value is loaded from
- * data that the application could potentially modify, it remains stable.
- *
- * Copyright (C) 2018-2019 Jens Axboe
- * Copyright (c) 2018-2019 Christoph Hellwig
- */
-#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/errno.h>
-#include <linux/syscalls.h>
-#include <linux/compat.h>
-#include <net/compat.h>
-#include <linux/refcount.h>
-#include <linux/uio.h>
-#include <linux/bits.h>
-
-#include <linux/sched/signal.h>
-#include <linux/fs.h>
-#include <linux/file.h>
-#include <linux/fdtable.h>
-#include <linux/mm.h>
-#include <linux/mman.h>
-#include <linux/percpu.h>
-#include <linux/slab.h>
-#include <linux/blk-mq.h>
-#include <linux/bvec.h>
-#include <linux/net.h>
-#include <net/sock.h>
-#include <net/af_unix.h>
-#include <net/scm.h>
-#include <linux/anon_inodes.h>
-#include <linux/sched/mm.h>
-#include <linux/uaccess.h>
-#include <linux/nospec.h>
-#include <linux/sizes.h>
-#include <linux/hugetlb.h>
-#include <linux/highmem.h>
-#include <linux/namei.h>
-#include <linux/fsnotify.h>
-#include <linux/fadvise.h>
-#include <linux/eventpoll.h>
-#include <linux/splice.h>
-#include <linux/task_work.h>
-#include <linux/pagemap.h>
-#include <linux/io_uring.h>
-#include <linux/audit.h>
-#include <linux/security.h>
-
-#define CREATE_TRACE_POINTS
-#include <trace/events/io_uring.h>
-
-#include <uapi/linux/io_uring.h>
-
-#include "internal.h"
-#include "io-wq.h"
-
-#define IORING_MAX_ENTRIES 32768
-#define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES)
-#define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
-
-/* only define max */
-#define IORING_MAX_FIXED_FILES (1U << 15)
-#define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
- IORING_REGISTER_LAST + IORING_OP_LAST)
-
-#define IO_RSRC_TAG_TABLE_SHIFT (PAGE_SHIFT - 3)
-#define IO_RSRC_TAG_TABLE_MAX (1U << IO_RSRC_TAG_TABLE_SHIFT)
-#define IO_RSRC_TAG_TABLE_MASK (IO_RSRC_TAG_TABLE_MAX - 1)
-
-#define IORING_MAX_REG_BUFFERS (1U << 14)
-
-#define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
- IOSQE_IO_HARDLINK | IOSQE_ASYNC)
-
-#define SQE_VALID_FLAGS (SQE_COMMON_FLAGS | IOSQE_BUFFER_SELECT | \
- IOSQE_IO_DRAIN | IOSQE_CQE_SKIP_SUCCESS)
-
-#define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \
- REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS | \
- REQ_F_ASYNC_DATA)
-
-#define IO_TCTX_REFS_CACHE_NR (1U << 10)
-
-struct io_uring {
- u32 head ____cacheline_aligned_in_smp;
- u32 tail ____cacheline_aligned_in_smp;
-};
-
-/*
- * This data is shared with the application through the mmap at offsets
- * IORING_OFF_SQ_RING and IORING_OFF_CQ_RING.
- *
- * The offsets to the member fields are published through struct
- * io_sqring_offsets when calling io_uring_setup.
- */
-struct io_rings {
- /*
- * Head and tail offsets into the ring; the offsets need to be
- * masked to get valid indices.
- *
- * The kernel controls head of the sq ring and the tail of the cq ring,
- * and the application controls tail of the sq ring and the head of the
- * cq ring.
- */
- struct io_uring sq, cq;
- /*
- * Bitmasks to apply to head and tail offsets (constant, equals
- * ring_entries - 1)
- */
- u32 sq_ring_mask, cq_ring_mask;
- /* Ring sizes (constant, power of 2) */
- u32 sq_ring_entries, cq_ring_entries;
- /*
- * Number of invalid entries dropped by the kernel due to
-	 * an invalid index stored in the array
- *
- * Written by the kernel, shouldn't be modified by the
- * application (i.e. get number of "new events" by comparing to
- * cached value).
- *
- * After a new SQ head value was read by the application this
- * counter includes all submissions that were dropped reaching
- * the new SQ head (and possibly more).
- */
- u32 sq_dropped;
- /*
- * Runtime SQ flags
- *
- * Written by the kernel, shouldn't be modified by the
- * application.
- *
- * The application needs a full memory barrier before checking
- * for IORING_SQ_NEED_WAKEUP after updating the sq tail.
- */
- u32 sq_flags;
- /*
- * Runtime CQ flags
- *
- * Written by the application, shouldn't be modified by the
- * kernel.
- */
- u32 cq_flags;
- /*
- * Number of completion events lost because the queue was full;
- * this should be avoided by the application by making sure
- * there are not more requests pending than there is space in
- * the completion queue.
- *
- * Written by the kernel, shouldn't be modified by the
- * application (i.e. get number of "new events" by comparing to
- * cached value).
- *
- * As completion events come in out of order this counter is not
- * ordered with any other data.
- */
- u32 cq_overflow;
- /*
- * Ring buffer of completion events.
- *
- * The kernel writes completion events fresh every time they are
- * produced, so the application is allowed to modify pending
- * entries.
- */
- struct io_uring_cqe cqes[] ____cacheline_aligned_in_smp;
-};
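
For reference, a sketch of how an application reaches this structure: io_uring_setup() publishes the member offsets in struct io_uring_params, and the rings are then mmapped at the fixed IORING_OFF_* offsets. Error handling is omitted, the helper name is made up, and this assumes headers new enough to define __NR_io_uring_setup.

#include <linux/io_uring.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <string.h>
#include <unistd.h>

/* Returns the ring fd; *sq_ring, *cq_ring and *sqes receive the mappings. */
static int setup_rings(unsigned entries, struct io_uring_params *p,
		       void **sq_ring, void **cq_ring, void **sqes)
{
	int fd;

	memset(p, 0, sizeof(*p));
	fd = syscall(__NR_io_uring_setup, entries, p);
	if (fd < 0)
		return -1;

	/* SQ ring: the index array lives at sq_off.array */
	*sq_ring = mmap(NULL, p->sq_off.array + p->sq_entries * sizeof(__u32),
			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
			fd, IORING_OFF_SQ_RING);
	/* CQ ring: the CQE array lives at cq_off.cqes */
	*cq_ring = mmap(NULL, p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe),
			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
			fd, IORING_OFF_CQ_RING);
	/* the SQE array is a separate mapping */
	*sqes = mmap(NULL, p->sq_entries * sizeof(struct io_uring_sqe),
		     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
		     fd, IORING_OFF_SQES);
	return fd;
}
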
-
-enum io_uring_cmd_flags {
- IO_URING_F_COMPLETE_DEFER = 1,
- IO_URING_F_UNLOCKED = 2,
- /* int's last bit, sign checks are usually faster than a bit test */
- IO_URING_F_NONBLOCK = INT_MIN,
-};
-
-struct io_mapped_ubuf {
- u64 ubuf;
- u64 ubuf_end;
- unsigned int nr_bvecs;
- unsigned long acct_pages;
- struct bio_vec bvec[];
-};
-
-struct io_ring_ctx;
-
-struct io_overflow_cqe {
- struct io_uring_cqe cqe;
- struct list_head list;
-};
-
-struct io_fixed_file {
- /* file * with additional FFS_* flags */
- unsigned long file_ptr;
-};
-
-struct io_rsrc_put {
- struct list_head list;
- u64 tag;
- union {
- void *rsrc;
- struct file *file;
- struct io_mapped_ubuf *buf;
- };
-};
-
-struct io_file_table {
- struct io_fixed_file *files;
-};
-
-struct io_rsrc_node {
- struct percpu_ref refs;
- struct list_head node;
- struct list_head rsrc_list;
- struct io_rsrc_data *rsrc_data;
- struct llist_node llist;
- bool done;
-};
-
-typedef void (rsrc_put_fn)(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc);
-
-struct io_rsrc_data {
- struct io_ring_ctx *ctx;
-
- u64 **tags;
- unsigned int nr;
- rsrc_put_fn *do_put;
- atomic_t refs;
- struct completion done;
- bool quiesce;
-};
-
-struct io_buffer_list {
- struct list_head list;
- struct list_head buf_list;
- __u16 bgid;
-};
-
-struct io_buffer {
- struct list_head list;
- __u64 addr;
- __u32 len;
- __u16 bid;
- __u16 bgid;
-};
-
-struct io_restriction {
- DECLARE_BITMAP(register_op, IORING_REGISTER_LAST);
- DECLARE_BITMAP(sqe_op, IORING_OP_LAST);
- u8 sqe_flags_allowed;
- u8 sqe_flags_required;
- bool registered;
-};
-
-enum {
- IO_SQ_THREAD_SHOULD_STOP = 0,
- IO_SQ_THREAD_SHOULD_PARK,
-};
-
-struct io_sq_data {
- refcount_t refs;
- atomic_t park_pending;
- struct mutex lock;
-
- /* ctx's that are using this sqd */
- struct list_head ctx_list;
-
- struct task_struct *thread;
- struct wait_queue_head wait;
-
- unsigned sq_thread_idle;
- int sq_cpu;
- pid_t task_pid;
- pid_t task_tgid;
-
- unsigned long state;
- struct completion exited;
-};
-
-#define IO_COMPL_BATCH 32
-#define IO_REQ_CACHE_SIZE 32
-#define IO_REQ_ALLOC_BATCH 8
-
-struct io_submit_link {
- struct io_kiocb *head;
- struct io_kiocb *last;
-};
-
-struct io_submit_state {
- /* inline/task_work completion list, under ->uring_lock */
- struct io_wq_work_node free_list;
- /* batch completion logic */
- struct io_wq_work_list compl_reqs;
- struct io_submit_link link;
-
- bool plug_started;
- bool need_plug;
- bool flush_cqes;
- unsigned short submit_nr;
- struct blk_plug plug;
-};
-
-struct io_ev_fd {
- struct eventfd_ctx *cq_ev_fd;
- unsigned int eventfd_async: 1;
- struct rcu_head rcu;
-};
-
-#define IO_BUFFERS_HASH_BITS 5
-
-struct io_ring_ctx {
- /* const or read-mostly hot data */
- struct {
- struct percpu_ref refs;
-
- struct io_rings *rings;
- unsigned int flags;
- unsigned int compat: 1;
- unsigned int drain_next: 1;
- unsigned int restricted: 1;
- unsigned int off_timeout_used: 1;
- unsigned int drain_active: 1;
- unsigned int drain_disabled: 1;
- unsigned int has_evfd: 1;
- } ____cacheline_aligned_in_smp;
-
- /* submission data */
- struct {
- struct mutex uring_lock;
-
- /*
- * Ring buffer of indices into array of io_uring_sqe, which is
- * mmapped by the application using the IORING_OFF_SQES offset.
- *
- * This indirection could e.g. be used to assign fixed
- * io_uring_sqe entries to operations and only submit them to
- * the queue when needed.
- *
- * The kernel modifies neither the indices array nor the entries
- * array.
- */
- u32 *sq_array;
- struct io_uring_sqe *sq_sqes;
- unsigned cached_sq_head;
- unsigned sq_entries;
- struct list_head defer_list;
-
- /*
- * Fixed resources fast path, should be accessed only under
- * uring_lock, and updated through io_uring_register(2)
- */
- struct io_rsrc_node *rsrc_node;
- int rsrc_cached_refs;
- struct io_file_table file_table;
- unsigned nr_user_files;
- unsigned nr_user_bufs;
- struct io_mapped_ubuf **user_bufs;
-
- struct io_submit_state submit_state;
- struct list_head timeout_list;
- struct list_head ltimeout_list;
- struct list_head cq_overflow_list;
- struct list_head *io_buffers;
- struct list_head io_buffers_cache;
- struct list_head apoll_cache;
- struct xarray personalities;
- u32 pers_next;
- unsigned sq_thread_idle;
- } ____cacheline_aligned_in_smp;
-
- /* IRQ completion list, under ->completion_lock */
- struct io_wq_work_list locked_free_list;
- unsigned int locked_free_nr;
-
- const struct cred *sq_creds; /* cred used for __io_sq_thread() */
- struct io_sq_data *sq_data; /* if using sq thread polling */
-
- struct wait_queue_head sqo_sq_wait;
- struct list_head sqd_list;
-
- unsigned long check_cq_overflow;
-
- struct {
- unsigned cached_cq_tail;
- unsigned cq_entries;
- struct io_ev_fd __rcu *io_ev_fd;
- struct wait_queue_head cq_wait;
- unsigned cq_extra;
- atomic_t cq_timeouts;
- unsigned cq_last_tm_flush;
- } ____cacheline_aligned_in_smp;
-
- struct {
- spinlock_t completion_lock;
-
- spinlock_t timeout_lock;
-
- /*
- * ->iopoll_list is protected by the ctx->uring_lock for
- * io_uring instances that don't use IORING_SETUP_SQPOLL.
- * For SQPOLL, only the single threaded io_sq_thread() will
- * manipulate the list, hence no extra locking is needed there.
- */
- struct io_wq_work_list iopoll_list;
- struct hlist_head *cancel_hash;
- unsigned cancel_hash_bits;
- bool poll_multi_queue;
-
- struct list_head io_buffers_comp;
- } ____cacheline_aligned_in_smp;
-
- struct io_restriction restrictions;
-
-	/* slow path rsrc auxiliary data, used by update/register */
- struct {
- struct io_rsrc_node *rsrc_backup_node;
- struct io_mapped_ubuf *dummy_ubuf;
- struct io_rsrc_data *file_data;
- struct io_rsrc_data *buf_data;
-
- struct delayed_work rsrc_put_work;
- struct llist_head rsrc_put_llist;
- struct list_head rsrc_ref_list;
- spinlock_t rsrc_ref_lock;
-
- struct list_head io_buffers_pages;
- };
-
- /* Keep this last, we don't need it for the fast path */
- struct {
- #if defined(CONFIG_UNIX)
- struct socket *ring_sock;
- #endif
- /* hashed buffered write serialization */
- struct io_wq_hash *hash_map;
-
- /* Only used for accounting purposes */
- struct user_struct *user;
- struct mm_struct *mm_account;
-
- /* ctx exit and cancelation */
- struct llist_head fallback_llist;
- struct delayed_work fallback_work;
- struct work_struct exit_work;
- struct list_head tctx_list;
- struct completion ref_comp;
- u32 iowq_limits[2];
- bool iowq_limits_set;
- };
-};
-
-/*
- * Arbitrary limit, can be raised if need be
- */
-#define IO_RINGFD_REG_MAX 16
-
-struct io_uring_task {
- /* submission side */
- int cached_refs;
- struct xarray xa;
- struct wait_queue_head wait;
- const struct io_ring_ctx *last;
- struct io_wq *io_wq;
- struct percpu_counter inflight;
- atomic_t inflight_tracked;
- atomic_t in_idle;
-
- spinlock_t task_lock;
- struct io_wq_work_list task_list;
- struct io_wq_work_list prior_task_list;
- struct callback_head task_work;
- struct file **registered_rings;
- bool task_running;
-};
-
-/*
- * First field must be the file pointer in all the
- * iocb unions! See also 'struct kiocb' in <linux/fs.h>
- */
-struct io_poll_iocb {
- struct file *file;
- struct wait_queue_head *head;
- __poll_t events;
- struct wait_queue_entry wait;
-};
-
-struct io_poll_update {
- struct file *file;
- u64 old_user_data;
- u64 new_user_data;
- __poll_t events;
- bool update_events;
- bool update_user_data;
-};
-
-struct io_close {
- struct file *file;
- int fd;
- u32 file_slot;
-};
-
-struct io_timeout_data {
- struct io_kiocb *req;
- struct hrtimer timer;
- struct timespec64 ts;
- enum hrtimer_mode mode;
- u32 flags;
-};
-
-struct io_accept {
- struct file *file;
- struct sockaddr __user *addr;
- int __user *addr_len;
- int flags;
- u32 file_slot;
- unsigned long nofile;
-};
-
-struct io_sync {
- struct file *file;
- loff_t len;
- loff_t off;
- int flags;
- int mode;
-};
-
-struct io_cancel {
- struct file *file;
- u64 addr;
-};
-
-struct io_timeout {
- struct file *file;
- u32 off;
- u32 target_seq;
- struct list_head list;
- /* head of the link, used by linked timeouts only */
- struct io_kiocb *head;
- /* for linked completions */
- struct io_kiocb *prev;
-};
-
-struct io_timeout_rem {
- struct file *file;
- u64 addr;
-
- /* timeout update */
- struct timespec64 ts;
- u32 flags;
- bool ltimeout;
-};
-
-struct io_rw {
- /* NOTE: kiocb has the file as the first member, so don't do it here */
- struct kiocb kiocb;
- u64 addr;
- u32 len;
- u32 flags;
-};
-
-struct io_connect {
- struct file *file;
- struct sockaddr __user *addr;
- int addr_len;
-};
-
-struct io_sr_msg {
- struct file *file;
- union {
- struct compat_msghdr __user *umsg_compat;
- struct user_msghdr __user *umsg;
- void __user *buf;
- };
- int msg_flags;
- int bgid;
- size_t len;
- size_t done_io;
-};
-
-struct io_open {
- struct file *file;
- int dfd;
- u32 file_slot;
- struct filename *filename;
- struct open_how how;
- unsigned long nofile;
-};
-
-struct io_rsrc_update {
- struct file *file;
- u64 arg;
- u32 nr_args;
- u32 offset;
-};
-
-struct io_fadvise {
- struct file *file;
- u64 offset;
- u32 len;
- u32 advice;
-};
-
-struct io_madvise {
- struct file *file;
- u64 addr;
- u32 len;
- u32 advice;
-};
-
-struct io_epoll {
- struct file *file;
- int epfd;
- int op;
- int fd;
- struct epoll_event event;
-};
-
-struct io_splice {
- struct file *file_out;
- loff_t off_out;
- loff_t off_in;
- u64 len;
- int splice_fd_in;
- unsigned int flags;
-};
-
-struct io_provide_buf {
- struct file *file;
- __u64 addr;
- __u32 len;
- __u32 bgid;
- __u16 nbufs;
- __u16 bid;
-};
-
-struct io_statx {
- struct file *file;
- int dfd;
- unsigned int mask;
- unsigned int flags;
- struct filename *filename;
- struct statx __user *buffer;
-};
-
-struct io_shutdown {
- struct file *file;
- int how;
-};
-
-struct io_rename {
- struct file *file;
- int old_dfd;
- int new_dfd;
- struct filename *oldpath;
- struct filename *newpath;
- int flags;
-};
-
-struct io_unlink {
- struct file *file;
- int dfd;
- int flags;
- struct filename *filename;
-};
-
-struct io_mkdir {
- struct file *file;
- int dfd;
- umode_t mode;
- struct filename *filename;
-};
-
-struct io_symlink {
- struct file *file;
- int new_dfd;
- struct filename *oldpath;
- struct filename *newpath;
-};
-
-struct io_hardlink {
- struct file *file;
- int old_dfd;
- int new_dfd;
- struct filename *oldpath;
- struct filename *newpath;
- int flags;
-};
-
-struct io_msg {
- struct file *file;
- u64 user_data;
- u32 len;
-};
-
-struct io_async_connect {
- struct sockaddr_storage address;
-};
-
-struct io_async_msghdr {
- struct iovec fast_iov[UIO_FASTIOV];
- /* points to an allocated iov, if NULL we use fast_iov instead */
- struct iovec *free_iov;
- struct sockaddr __user *uaddr;
- struct msghdr msg;
- struct sockaddr_storage addr;
-};
-
-struct io_rw_state {
- struct iov_iter iter;
- struct iov_iter_state iter_state;
- struct iovec fast_iov[UIO_FASTIOV];
-};
-
-struct io_async_rw {
- struct io_rw_state s;
- const struct iovec *free_iovec;
- size_t bytes_done;
- struct wait_page_queue wpq;
-};
-
-enum {
- REQ_F_FIXED_FILE_BIT = IOSQE_FIXED_FILE_BIT,
- REQ_F_IO_DRAIN_BIT = IOSQE_IO_DRAIN_BIT,
- REQ_F_LINK_BIT = IOSQE_IO_LINK_BIT,
- REQ_F_HARDLINK_BIT = IOSQE_IO_HARDLINK_BIT,
- REQ_F_FORCE_ASYNC_BIT = IOSQE_ASYNC_BIT,
- REQ_F_BUFFER_SELECT_BIT = IOSQE_BUFFER_SELECT_BIT,
- REQ_F_CQE_SKIP_BIT = IOSQE_CQE_SKIP_SUCCESS_BIT,
-
- /* first byte is taken by user flags, shift it to not overlap */
- REQ_F_FAIL_BIT = 8,
- REQ_F_INFLIGHT_BIT,
- REQ_F_CUR_POS_BIT,
- REQ_F_NOWAIT_BIT,
- REQ_F_LINK_TIMEOUT_BIT,
- REQ_F_NEED_CLEANUP_BIT,
- REQ_F_POLLED_BIT,
- REQ_F_BUFFER_SELECTED_BIT,
- REQ_F_COMPLETE_INLINE_BIT,
- REQ_F_REISSUE_BIT,
- REQ_F_CREDS_BIT,
- REQ_F_REFCOUNT_BIT,
- REQ_F_ARM_LTIMEOUT_BIT,
- REQ_F_ASYNC_DATA_BIT,
- REQ_F_SKIP_LINK_CQES_BIT,
- REQ_F_SINGLE_POLL_BIT,
- REQ_F_DOUBLE_POLL_BIT,
- REQ_F_PARTIAL_IO_BIT,
- /* keep async read/write and isreg together and in order */
- REQ_F_SUPPORT_NOWAIT_BIT,
- REQ_F_ISREG_BIT,
-
- /* not a real bit, just to check we're not overflowing the space */
- __REQ_F_LAST_BIT,
-};
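
The two constraints spelled out in the comments above (user-visible flags stay in the first byte, and all REQ_F_* bits fit in req->flags) could be verified at compile time; a purely illustrative sketch, not present in this version of the file:

/* IOSQE_* flags shared with userspace must fit in the low byte */
_Static_assert(SQE_VALID_FLAGS < (1 << 8), "IOSQE_* flags must fit in one byte");
/* and the internal bits must not overflow req->flags (unsigned int) */
_Static_assert(__REQ_F_LAST_BIT <= 8 * sizeof(unsigned int), "REQ_F_* bits overflow req->flags");
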
-
-enum {
- /* ctx owns file */
- REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT),
- /* drain existing IO first */
- REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT),
- /* linked sqes */
- REQ_F_LINK = BIT(REQ_F_LINK_BIT),
- /* doesn't sever on completion < 0 */
- REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT),
- /* IOSQE_ASYNC */
- REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT),
- /* IOSQE_BUFFER_SELECT */
- REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT),
- /* IOSQE_CQE_SKIP_SUCCESS */
- REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT),
-
- /* fail rest of links */
- REQ_F_FAIL = BIT(REQ_F_FAIL_BIT),
- /* on inflight list, should be cancelled and waited on exit reliably */
- REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT),
- /* read/write uses file position */
- REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT),
- /* must not punt to workers */
- REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT),
- /* has or had linked timeout */
- REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT),
- /* needs cleanup */
- REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT),
- /* already went through poll handler */
- REQ_F_POLLED = BIT(REQ_F_POLLED_BIT),
- /* buffer already selected */
- REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT),
- /* completion is deferred through io_comp_state */
- REQ_F_COMPLETE_INLINE = BIT(REQ_F_COMPLETE_INLINE_BIT),
- /* caller should reissue async */
- REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT),
- /* supports async reads/writes */
- REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT),
- /* regular file */
- REQ_F_ISREG = BIT(REQ_F_ISREG_BIT),
- /* has creds assigned */
- REQ_F_CREDS = BIT(REQ_F_CREDS_BIT),
- /* skip refcounting if not set */
- REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT),
- /* there is a linked timeout that has to be armed */
- REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT),
- /* ->async_data allocated */
- REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT),
- /* don't post CQEs while failing linked requests */
- REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT),
- /* single poll may be active */
- REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT),
-	/* double poll may be active */
- REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT),
- /* request has already done partial IO */
- REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT),
-};
-
-struct async_poll {
- struct io_poll_iocb poll;
- struct io_poll_iocb *double_poll;
-};
-
-typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
-
-struct io_task_work {
- union {
- struct io_wq_work_node node;
- struct llist_node fallback_node;
- };
- io_req_tw_func_t func;
-};
-
-enum {
- IORING_RSRC_FILE = 0,
- IORING_RSRC_BUFFER = 1,
-};
-
-/*
- * NOTE! Each of the iocb union members has the file pointer
- * as the first entry in their struct definition. So you can
- * access the file pointer through any of the sub-structs,
- * or directly as just 'file' in this struct.
- */
-struct io_kiocb {
- union {
- struct file *file;
- struct io_rw rw;
- struct io_poll_iocb poll;
- struct io_poll_update poll_update;
- struct io_accept accept;
- struct io_sync sync;
- struct io_cancel cancel;
- struct io_timeout timeout;
- struct io_timeout_rem timeout_rem;
- struct io_connect connect;
- struct io_sr_msg sr_msg;
- struct io_open open;
- struct io_close close;
- struct io_rsrc_update rsrc_update;
- struct io_fadvise fadvise;
- struct io_madvise madvise;
- struct io_epoll epoll;
- struct io_splice splice;
- struct io_provide_buf pbuf;
- struct io_statx statx;
- struct io_shutdown shutdown;
- struct io_rename rename;
- struct io_unlink unlink;
- struct io_mkdir mkdir;
- struct io_symlink symlink;
- struct io_hardlink hardlink;
- struct io_msg msg;
- };
-
- u8 opcode;
- /* polled IO has completed */
- u8 iopoll_completed;
- u16 buf_index;
- unsigned int flags;
-
- u64 user_data;
- u32 result;
- /* fd initially, then cflags for completion */
- union {
- u32 cflags;
- int fd;
- };
-
- struct io_ring_ctx *ctx;
- struct task_struct *task;
-
- struct percpu_ref *fixed_rsrc_refs;
- /* store used ubuf, so we can prevent reloading */
- struct io_mapped_ubuf *imu;
-
- union {
- /* used by request caches, completion batching and iopoll */
- struct io_wq_work_node comp_list;
- /* cache ->apoll->events */
- __poll_t apoll_events;
- };
- atomic_t refs;
- atomic_t poll_refs;
- struct io_task_work io_task_work;
- /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
- struct hlist_node hash_node;
- /* internal polling, see IORING_FEAT_FAST_POLL */
- struct async_poll *apoll;
- /* opcode allocated if it needs to store data for async defer */
- void *async_data;
- /* stores selected buf, valid IFF REQ_F_BUFFER_SELECTED is set */
- struct io_buffer *kbuf;
- /* linked requests, IFF REQ_F_HARDLINK or REQ_F_LINK are set */
- struct io_kiocb *link;
- /* custom credentials, valid IFF REQ_F_CREDS is set */
- const struct cred *creds;
- struct io_wq_work work;
-};
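
The layout rule in the NOTE above (every union member keeps its struct file * as the very first field) is what makes the bare req->file access legal. A hypothetical compile-time check of that layout could look like the sketch below; it is not in the original file, and an in-kernel version would use the kernel's own offsetof/static_assert helpers.

#include <stddef.h>

_Static_assert(offsetof(struct io_poll_iocb, file) == 0, "poll.file must be first");
_Static_assert(offsetof(struct io_accept, file) == 0, "accept.file must be first");
_Static_assert(offsetof(struct io_sr_msg, file) == 0, "sr_msg.file must be first");
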
-
-struct io_tctx_node {
- struct list_head ctx_node;
- struct task_struct *task;
- struct io_ring_ctx *ctx;
-};
-
-struct io_defer_entry {
- struct list_head list;
- struct io_kiocb *req;
- u32 seq;
-};
-
-struct io_op_def {
- /* needs req->file assigned */
- unsigned needs_file : 1;
- /* should block plug */
- unsigned plug : 1;
- /* hash wq insertion if file is a regular file */
- unsigned hash_reg_file : 1;
- /* unbound wq insertion if file is a non-regular file */
- unsigned unbound_nonreg_file : 1;
- /* set if opcode supports polled "wait" */
- unsigned pollin : 1;
- unsigned pollout : 1;
- unsigned poll_exclusive : 1;
- /* op supports buffer selection */
- unsigned buffer_select : 1;
-	/* do prep async if it is going to be punted */
- unsigned needs_async_setup : 1;
- /* opcode is not supported by this kernel */
- unsigned not_supported : 1;
- /* skip auditing */
- unsigned audit_skip : 1;
- /* size of async data needed, if any */
- unsigned short async_size;
-};
-
-static const struct io_op_def io_op_defs[] = {
- [IORING_OP_NOP] = {},
- [IORING_OP_READV] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollin = 1,
- .buffer_select = 1,
- .needs_async_setup = 1,
- .plug = 1,
- .audit_skip = 1,
- .async_size = sizeof(struct io_async_rw),
- },
- [IORING_OP_WRITEV] = {
- .needs_file = 1,
- .hash_reg_file = 1,
- .unbound_nonreg_file = 1,
- .pollout = 1,
- .needs_async_setup = 1,
- .plug = 1,
- .audit_skip = 1,
- .async_size = sizeof(struct io_async_rw),
- },
- [IORING_OP_FSYNC] = {
- .needs_file = 1,
- .audit_skip = 1,
- },
- [IORING_OP_READ_FIXED] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollin = 1,
- .plug = 1,
- .audit_skip = 1,
- .async_size = sizeof(struct io_async_rw),
- },
- [IORING_OP_WRITE_FIXED] = {
- .needs_file = 1,
- .hash_reg_file = 1,
- .unbound_nonreg_file = 1,
- .pollout = 1,
- .plug = 1,
- .audit_skip = 1,
- .async_size = sizeof(struct io_async_rw),
- },
- [IORING_OP_POLL_ADD] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .audit_skip = 1,
- },
- [IORING_OP_POLL_REMOVE] = {
- .audit_skip = 1,
- },
- [IORING_OP_SYNC_FILE_RANGE] = {
- .needs_file = 1,
- .audit_skip = 1,
- },
- [IORING_OP_SENDMSG] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollout = 1,
- .needs_async_setup = 1,
- .async_size = sizeof(struct io_async_msghdr),
- },
- [IORING_OP_RECVMSG] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollin = 1,
- .buffer_select = 1,
- .needs_async_setup = 1,
- .async_size = sizeof(struct io_async_msghdr),
- },
- [IORING_OP_TIMEOUT] = {
- .audit_skip = 1,
- .async_size = sizeof(struct io_timeout_data),
- },
- [IORING_OP_TIMEOUT_REMOVE] = {
- /* used by timeout updates' prep() */
- .audit_skip = 1,
- },
- [IORING_OP_ACCEPT] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollin = 1,
- .poll_exclusive = 1,
- },
- [IORING_OP_ASYNC_CANCEL] = {
- .audit_skip = 1,
- },
- [IORING_OP_LINK_TIMEOUT] = {
- .audit_skip = 1,
- .async_size = sizeof(struct io_timeout_data),
- },
- [IORING_OP_CONNECT] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollout = 1,
- .needs_async_setup = 1,
- .async_size = sizeof(struct io_async_connect),
- },
- [IORING_OP_FALLOCATE] = {
- .needs_file = 1,
- },
- [IORING_OP_OPENAT] = {},
- [IORING_OP_CLOSE] = {},
- [IORING_OP_FILES_UPDATE] = {
- .audit_skip = 1,
- },
- [IORING_OP_STATX] = {
- .audit_skip = 1,
- },
- [IORING_OP_READ] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollin = 1,
- .buffer_select = 1,
- .plug = 1,
- .audit_skip = 1,
- .async_size = sizeof(struct io_async_rw),
- },
- [IORING_OP_WRITE] = {
- .needs_file = 1,
- .hash_reg_file = 1,
- .unbound_nonreg_file = 1,
- .pollout = 1,
- .plug = 1,
- .audit_skip = 1,
- .async_size = sizeof(struct io_async_rw),
- },
- [IORING_OP_FADVISE] = {
- .needs_file = 1,
- .audit_skip = 1,
- },
- [IORING_OP_MADVISE] = {},
- [IORING_OP_SEND] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollout = 1,
- .audit_skip = 1,
- },
- [IORING_OP_RECV] = {
- .needs_file = 1,
- .unbound_nonreg_file = 1,
- .pollin = 1,
- .buffer_select = 1,
- .audit_skip = 1,
- },
- [IORING_OP_OPENAT2] = {
- },
- [IORING_OP_EPOLL_CTL] = {
- .unbound_nonreg_file = 1,
- .audit_skip = 1,
- },
- [IORING_OP_SPLICE] = {
- .needs_file = 1,
- .hash_reg_file = 1,
- .unbound_nonreg_file = 1,
- .audit_skip = 1,
- },
- [IORING_OP_PROVIDE_BUFFERS] = {
- .audit_skip = 1,
- },
- [IORING_OP_REMOVE_BUFFERS] = {
- .audit_skip = 1,
- },
- [IORING_OP_TEE] = {
- .needs_file = 1,
- .hash_reg_file = 1,
- .unbound_nonreg_file = 1,
- .audit_skip = 1,
- },
- [IORING_OP_SHUTDOWN] = {
- .needs_file = 1,
- },
- [IORING_OP_RENAMEAT] = {},
- [IORING_OP_UNLINKAT] = {},
- [IORING_OP_MKDIRAT] = {},
- [IORING_OP_SYMLINKAT] = {},
- [IORING_OP_LINKAT] = {},
- [IORING_OP_MSG_RING] = {
- .needs_file = 1,
- },
-};
-
-/* requests with any of those set should undergo io_disarm_next() */
-#define IO_DISARM_MASK (REQ_F_ARM_LTIMEOUT | REQ_F_LINK_TIMEOUT | REQ_F_FAIL)
-
-static bool io_disarm_next(struct io_kiocb *req);
-static void io_uring_del_tctx_node(unsigned long index);
-static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
- struct task_struct *task,
- bool cancel_all);
-static void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd);
-
-static void io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags);
-
-static void io_put_req(struct io_kiocb *req);
-static void io_put_req_deferred(struct io_kiocb *req);
-static void io_dismantle_req(struct io_kiocb *req);
-static void io_queue_linked_timeout(struct io_kiocb *req);
-static int __io_register_rsrc_update(struct io_ring_ctx *ctx, unsigned type,
- struct io_uring_rsrc_update2 *up,
- unsigned nr_args);
-static void io_clean_op(struct io_kiocb *req);
-static inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
- unsigned issue_flags);
-static inline struct file *io_file_get_normal(struct io_kiocb *req, int fd);
-static void __io_queue_sqe(struct io_kiocb *req);
-static void io_rsrc_put_work(struct work_struct *work);
-
-static void io_req_task_queue(struct io_kiocb *req);
-static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
-static int io_req_prep_async(struct io_kiocb *req);
-
-static int io_install_fixed_file(struct io_kiocb *req, struct file *file,
- unsigned int issue_flags, u32 slot_index);
-static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
-
-static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
-static void io_eventfd_signal(struct io_ring_ctx *ctx);
-
-static struct kmem_cache *req_cachep;
-
-static const struct file_operations io_uring_fops;
-
-struct sock *io_uring_get_socket(struct file *file)
-{
-#if defined(CONFIG_UNIX)
- if (file->f_op == &io_uring_fops) {
- struct io_ring_ctx *ctx = file->private_data;
-
- return ctx->ring_sock->sk;
- }
-#endif
- return NULL;
-}
-EXPORT_SYMBOL(io_uring_get_socket);
-
-static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
-{
- if (!*locked) {
- mutex_lock(&ctx->uring_lock);
- *locked = true;
- }
-}
-
-#define io_for_each_link(pos, head) \
- for (pos = (head); pos; pos = pos->link)
-
-/*
- * Shamelessly stolen from the mm implementation of page reference checking,
- * see commit f958d7b528b1 for details.
- */
-#define req_ref_zero_or_close_to_overflow(req) \
- ((unsigned int) atomic_read(&(req->refs)) + 127u <= 127u)
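
The macro above relies on unsigned wrap-around: adding 127 maps both a zero count and a count within 127 of UINT_MAX into the range [0, 127], so a single comparison catches use-after-free (refcount already zero) and imminent overflow. A standalone illustration:

#include <assert.h>
#include <limits.h>

static int zero_or_close_to_overflow(unsigned int refs)
{
	return refs + 127u <= 127u;
}

int main(void)
{
	assert(zero_or_close_to_overflow(0));			/* 0 + 127 = 127 */
	assert(zero_or_close_to_overflow(UINT_MAX - 100));	/* wraps to 26   */
	assert(!zero_or_close_to_overflow(1));			/* 128 > 127     */
	assert(!zero_or_close_to_overflow(1000));
	return 0;
}
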
-
-static inline bool req_ref_inc_not_zero(struct io_kiocb *req)
-{
- WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT));
- return atomic_inc_not_zero(&req->refs);
-}
-
-static inline bool req_ref_put_and_test(struct io_kiocb *req)
-{
- if (likely(!(req->flags & REQ_F_REFCOUNT)))
- return true;
-
- WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req));
- return atomic_dec_and_test(&req->refs);
-}
-
-static inline void req_ref_get(struct io_kiocb *req)
-{
- WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT));
- WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req));
- atomic_inc(&req->refs);
-}
-
-static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
-{
- if (!wq_list_empty(&ctx->submit_state.compl_reqs))
- __io_submit_flush_completions(ctx);
-}
-
-static inline void __io_req_set_refcount(struct io_kiocb *req, int nr)
-{
- if (!(req->flags & REQ_F_REFCOUNT)) {
- req->flags |= REQ_F_REFCOUNT;
- atomic_set(&req->refs, nr);
- }
-}
-
-static inline void io_req_set_refcount(struct io_kiocb *req)
-{
- __io_req_set_refcount(req, 1);
-}
-
-#define IO_RSRC_REF_BATCH 100
-
-static inline void io_req_put_rsrc_locked(struct io_kiocb *req,
- struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
-{
- struct percpu_ref *ref = req->fixed_rsrc_refs;
-
- if (ref) {
- if (ref == &ctx->rsrc_node->refs)
- ctx->rsrc_cached_refs++;
- else
- percpu_ref_put(ref);
- }
-}
-
-static inline void io_req_put_rsrc(struct io_kiocb *req, struct io_ring_ctx *ctx)
-{
- if (req->fixed_rsrc_refs)
- percpu_ref_put(req->fixed_rsrc_refs);
-}
-
-static __cold void io_rsrc_refs_drop(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
-{
- if (ctx->rsrc_cached_refs) {
- percpu_ref_put_many(&ctx->rsrc_node->refs, ctx->rsrc_cached_refs);
- ctx->rsrc_cached_refs = 0;
- }
-}
-
-static void io_rsrc_refs_refill(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
-{
- ctx->rsrc_cached_refs += IO_RSRC_REF_BATCH;
- percpu_ref_get_many(&ctx->rsrc_node->refs, IO_RSRC_REF_BATCH);
-}
-
-static inline void io_req_set_rsrc_node(struct io_kiocb *req,
- struct io_ring_ctx *ctx,
- unsigned int issue_flags)
-{
- if (!req->fixed_rsrc_refs) {
- req->fixed_rsrc_refs = &ctx->rsrc_node->refs;
-
- if (!(issue_flags & IO_URING_F_UNLOCKED)) {
- lockdep_assert_held(&ctx->uring_lock);
- ctx->rsrc_cached_refs--;
- if (unlikely(ctx->rsrc_cached_refs < 0))
- io_rsrc_refs_refill(ctx);
- } else {
- percpu_ref_get(req->fixed_rsrc_refs);
- }
- }
-}
-
-static unsigned int __io_put_kbuf(struct io_kiocb *req, struct list_head *list)
-{
- struct io_buffer *kbuf = req->kbuf;
- unsigned int cflags;
-
- cflags = IORING_CQE_F_BUFFER | (kbuf->bid << IORING_CQE_BUFFER_SHIFT);
- req->flags &= ~REQ_F_BUFFER_SELECTED;
- list_add(&kbuf->list, list);
- req->kbuf = NULL;
- return cflags;
-}
-
-static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
-{
- lockdep_assert_held(&req->ctx->completion_lock);
-
- if (likely(!(req->flags & REQ_F_BUFFER_SELECTED)))
- return 0;
- return __io_put_kbuf(req, &req->ctx->io_buffers_comp);
-}
-
-static inline unsigned int io_put_kbuf(struct io_kiocb *req,
- unsigned issue_flags)
-{
- unsigned int cflags;
-
- if (likely(!(req->flags & REQ_F_BUFFER_SELECTED)))
- return 0;
-
- /*
- * We can add this buffer back to two lists:
- *
- * 1) The io_buffers_cache list. This one is protected by the
- * ctx->uring_lock. If we already hold this lock, add back to this
- * list as we can grab it from issue as well.
- * 2) The io_buffers_comp list. This one is protected by the
- * ctx->completion_lock.
- *
- * We migrate buffers from the comp_list to the issue cache list
- * when we need one.
- */
- if (issue_flags & IO_URING_F_UNLOCKED) {
- struct io_ring_ctx *ctx = req->ctx;
-
- spin_lock(&ctx->completion_lock);
- cflags = __io_put_kbuf(req, &ctx->io_buffers_comp);
- spin_unlock(&ctx->completion_lock);
- } else {
- lockdep_assert_held(&req->ctx->uring_lock);
-
- cflags = __io_put_kbuf(req, &req->ctx->io_buffers_cache);
- }
-
- return cflags;
-}
-
-static struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
- unsigned int bgid)
-{
- struct list_head *hash_list;
- struct io_buffer_list *bl;
-
- hash_list = &ctx->io_buffers[hash_32(bgid, IO_BUFFERS_HASH_BITS)];
- list_for_each_entry(bl, hash_list, list)
- if (bl->bgid == bgid || bgid == -1U)
- return bl;
-
- return NULL;
-}
-
-static void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct io_buffer_list *bl;
- struct io_buffer *buf;
-
- if (likely(!(req->flags & REQ_F_BUFFER_SELECTED)))
- return;
- /* don't recycle if we already did IO to this buffer */
- if (req->flags & REQ_F_PARTIAL_IO)
- return;
-
- if (issue_flags & IO_URING_F_UNLOCKED)
- mutex_lock(&ctx->uring_lock);
-
- lockdep_assert_held(&ctx->uring_lock);
-
- buf = req->kbuf;
- bl = io_buffer_get_list(ctx, buf->bgid);
- list_add(&buf->list, &bl->buf_list);
- req->flags &= ~REQ_F_BUFFER_SELECTED;
- req->kbuf = NULL;
-
- if (issue_flags & IO_URING_F_UNLOCKED)
- mutex_unlock(&ctx->uring_lock);
-}
-
-static bool io_match_task(struct io_kiocb *head, struct task_struct *task,
- bool cancel_all)
- __must_hold(&req->ctx->timeout_lock)
-{
- struct io_kiocb *req;
-
- if (task && head->task != task)
- return false;
- if (cancel_all)
- return true;
-
- io_for_each_link(req, head) {
- if (req->flags & REQ_F_INFLIGHT)
- return true;
- }
- return false;
-}
-
-static bool io_match_linked(struct io_kiocb *head)
-{
- struct io_kiocb *req;
-
- io_for_each_link(req, head) {
- if (req->flags & REQ_F_INFLIGHT)
- return true;
- }
- return false;
-}
-
-/*
- * As io_match_task() but protected against racing with linked timeouts.
- * User must not hold timeout_lock.
- */
-static bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
- bool cancel_all)
-{
- bool matched;
-
- if (task && head->task != task)
- return false;
- if (cancel_all)
- return true;
-
- if (head->flags & REQ_F_LINK_TIMEOUT) {
- struct io_ring_ctx *ctx = head->ctx;
-
- /* protect against races with linked timeouts */
- spin_lock_irq(&ctx->timeout_lock);
- matched = io_match_linked(head);
- spin_unlock_irq(&ctx->timeout_lock);
- } else {
- matched = io_match_linked(head);
- }
- return matched;
-}
-
-static inline bool req_has_async_data(struct io_kiocb *req)
-{
- return req->flags & REQ_F_ASYNC_DATA;
-}
-
-static inline void req_set_fail(struct io_kiocb *req)
-{
- req->flags |= REQ_F_FAIL;
- if (req->flags & REQ_F_CQE_SKIP) {
- req->flags &= ~REQ_F_CQE_SKIP;
- req->flags |= REQ_F_SKIP_LINK_CQES;
- }
-}
-
-static inline void req_fail_link_node(struct io_kiocb *req, int res)
-{
- req_set_fail(req);
- req->result = res;
-}
-
-static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
-{
- struct io_ring_ctx *ctx = container_of(ref, struct io_ring_ctx, refs);
-
- complete(&ctx->ref_comp);
-}
-
-static inline bool io_is_timeout_noseq(struct io_kiocb *req)
-{
- return !req->timeout.off;
-}
-
-static __cold void io_fallback_req_func(struct work_struct *work)
-{
- struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
- fallback_work.work);
- struct llist_node *node = llist_del_all(&ctx->fallback_llist);
- struct io_kiocb *req, *tmp;
- bool locked = false;
-
- percpu_ref_get(&ctx->refs);
- llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
- req->io_task_work.func(req, &locked);
-
- if (locked) {
- io_submit_flush_completions(ctx);
- mutex_unlock(&ctx->uring_lock);
- }
- percpu_ref_put(&ctx->refs);
-}
-
-static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
-{
- struct io_ring_ctx *ctx;
- int i, hash_bits;
-
- ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
- if (!ctx)
- return NULL;
-
- /*
-	 * Use 5 bits less than the max cq entries; that should give us around
- * 32 entries per hash list if totally full and uniformly spread.
- */
- hash_bits = ilog2(p->cq_entries);
- hash_bits -= 5;
- if (hash_bits <= 0)
- hash_bits = 1;
- ctx->cancel_hash_bits = hash_bits;
- ctx->cancel_hash = kmalloc((1U << hash_bits) * sizeof(struct hlist_head),
- GFP_KERNEL);
- if (!ctx->cancel_hash)
- goto err;
- __hash_init(ctx->cancel_hash, 1U << hash_bits);
-
- ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
- if (!ctx->dummy_ubuf)
- goto err;
-	/* set invalid range, so io_import_fixed() fails when meeting it */
- ctx->dummy_ubuf->ubuf = -1UL;
-
- ctx->io_buffers = kcalloc(1U << IO_BUFFERS_HASH_BITS,
- sizeof(struct list_head), GFP_KERNEL);
- if (!ctx->io_buffers)
- goto err;
- for (i = 0; i < (1U << IO_BUFFERS_HASH_BITS); i++)
- INIT_LIST_HEAD(&ctx->io_buffers[i]);
-
- if (percpu_ref_init(&ctx->refs, io_ring_ctx_ref_free,
- PERCPU_REF_ALLOW_REINIT, GFP_KERNEL))
- goto err;
-
- ctx->flags = p->flags;
- init_waitqueue_head(&ctx->sqo_sq_wait);
- INIT_LIST_HEAD(&ctx->sqd_list);
- INIT_LIST_HEAD(&ctx->cq_overflow_list);
- INIT_LIST_HEAD(&ctx->io_buffers_cache);
- INIT_LIST_HEAD(&ctx->apoll_cache);
- init_completion(&ctx->ref_comp);
- xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1);
- mutex_init(&ctx->uring_lock);
- init_waitqueue_head(&ctx->cq_wait);
- spin_lock_init(&ctx->completion_lock);
- spin_lock_init(&ctx->timeout_lock);
- INIT_WQ_LIST(&ctx->iopoll_list);
- INIT_LIST_HEAD(&ctx->io_buffers_pages);
- INIT_LIST_HEAD(&ctx->io_buffers_comp);
- INIT_LIST_HEAD(&ctx->defer_list);
- INIT_LIST_HEAD(&ctx->timeout_list);
- INIT_LIST_HEAD(&ctx->ltimeout_list);
- spin_lock_init(&ctx->rsrc_ref_lock);
- INIT_LIST_HEAD(&ctx->rsrc_ref_list);
- INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
- init_llist_head(&ctx->rsrc_put_llist);
- INIT_LIST_HEAD(&ctx->tctx_list);
- ctx->submit_state.free_list.next = NULL;
- INIT_WQ_LIST(&ctx->locked_free_list);
- INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
- INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
- return ctx;
-err:
- kfree(ctx->dummy_ubuf);
- kfree(ctx->cancel_hash);
- kfree(ctx->io_buffers);
- kfree(ctx);
- return NULL;
-}
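
To make the cancel_hash sizing in io_ring_ctx_alloc() concrete: with p->cq_entries == 4096, ilog2(4096) == 12, so hash_bits becomes 7, the table gets 1 << 7 == 128 buckets, and a completely full CQ averages 4096 / 128 == 32 entries per hash list, which is the "around 32" the comment aims for. (Example numbers only; cq_entries depends on the setup parameters.)
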
-
-static void io_account_cq_overflow(struct io_ring_ctx *ctx)
-{
- struct io_rings *r = ctx->rings;
-
- WRITE_ONCE(r->cq_overflow, READ_ONCE(r->cq_overflow) + 1);
- ctx->cq_extra--;
-}
-
-static bool req_need_defer(struct io_kiocb *req, u32 seq)
-{
- if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
- struct io_ring_ctx *ctx = req->ctx;
-
- return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail;
- }
-
- return false;
-}
-
-#define FFS_NOWAIT 0x1UL
-#define FFS_ISREG 0x2UL
-#define FFS_MASK ~(FFS_NOWAIT|FFS_ISREG)
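
FFS_NOWAIT and FFS_ISREG are stuffed into the low bits of the struct file pointer held in io_fixed_file.file_ptr; pointer alignment guarantees those bits are otherwise zero. A sketch of the encode/decode implied by the masks above, with hypothetical helper names:

static inline unsigned long io_ffs_encode(struct file *file, bool nowait, bool isreg)
{
	unsigned long file_ptr = (unsigned long) file;

	if (nowait)
		file_ptr |= FFS_NOWAIT;
	if (isreg)
		file_ptr |= FFS_ISREG;
	return file_ptr;
}

static inline struct file *io_ffs_file(unsigned long file_ptr)
{
	return (struct file *) (file_ptr & FFS_MASK);
}
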
-
-static inline bool io_req_ffs_set(struct io_kiocb *req)
-{
- return req->flags & REQ_F_FIXED_FILE;
-}
-
-static inline void io_req_track_inflight(struct io_kiocb *req)
-{
- if (!(req->flags & REQ_F_INFLIGHT)) {
- req->flags |= REQ_F_INFLIGHT;
- atomic_inc(&req->task->io_uring->inflight_tracked);
- }
-}
-
-static struct io_kiocb *__io_prep_linked_timeout(struct io_kiocb *req)
-{
- if (WARN_ON_ONCE(!req->link))
- return NULL;
-
- req->flags &= ~REQ_F_ARM_LTIMEOUT;
- req->flags |= REQ_F_LINK_TIMEOUT;
-
- /* linked timeouts should have two refs once prep'ed */
- io_req_set_refcount(req);
- __io_req_set_refcount(req->link, 2);
- return req->link;
-}
-
-static inline struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
-{
- if (likely(!(req->flags & REQ_F_ARM_LTIMEOUT)))
- return NULL;
- return __io_prep_linked_timeout(req);
-}
-
-static void io_prep_async_work(struct io_kiocb *req)
-{
- const struct io_op_def *def = &io_op_defs[req->opcode];
- struct io_ring_ctx *ctx = req->ctx;
-
- if (!(req->flags & REQ_F_CREDS)) {
- req->flags |= REQ_F_CREDS;
- req->creds = get_current_cred();
- }
-
- req->work.list.next = NULL;
- req->work.flags = 0;
- if (req->flags & REQ_F_FORCE_ASYNC)
- req->work.flags |= IO_WQ_WORK_CONCURRENT;
-
- if (req->flags & REQ_F_ISREG) {
- if (def->hash_reg_file || (ctx->flags & IORING_SETUP_IOPOLL))
- io_wq_hash_work(&req->work, file_inode(req->file));
- } else if (!req->file || !S_ISBLK(file_inode(req->file)->i_mode)) {
- if (def->unbound_nonreg_file)
- req->work.flags |= IO_WQ_WORK_UNBOUND;
- }
-}
-
-static void io_prep_async_link(struct io_kiocb *req)
-{
- struct io_kiocb *cur;
-
- if (req->flags & REQ_F_LINK_TIMEOUT) {
- struct io_ring_ctx *ctx = req->ctx;
-
- spin_lock_irq(&ctx->timeout_lock);
- io_for_each_link(cur, req)
- io_prep_async_work(cur);
- spin_unlock_irq(&ctx->timeout_lock);
- } else {
- io_for_each_link(cur, req)
- io_prep_async_work(cur);
- }
-}
-
-static inline void io_req_add_compl_list(struct io_kiocb *req)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct io_submit_state *state = &ctx->submit_state;
-
- if (!(req->flags & REQ_F_CQE_SKIP))
- ctx->submit_state.flush_cqes = true;
- wq_list_add_tail(&req->comp_list, &state->compl_reqs);
-}
-
-static void io_queue_async_work(struct io_kiocb *req, bool *dont_use)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct io_kiocb *link = io_prep_linked_timeout(req);
- struct io_uring_task *tctx = req->task->io_uring;
-
- BUG_ON(!tctx);
- BUG_ON(!tctx->io_wq);
-
- /* init ->work of the whole link before punting */
- io_prep_async_link(req);
-
- /*
- * Not expected to happen, but if we do have a bug where this _can_
- * happen, catch it here and ensure the request is marked as
- * canceled. That will make io-wq go through the usual work cancel
- * procedure rather than attempt to run this request (or create a new
- * worker for it).
- */
- if (WARN_ON_ONCE(!same_thread_group(req->task, current)))
- req->work.flags |= IO_WQ_WORK_CANCEL;
-
- trace_io_uring_queue_async_work(ctx, req, req->user_data, req->opcode, req->flags,
- &req->work, io_wq_is_hashed(&req->work));
- io_wq_enqueue(tctx->io_wq, &req->work);
- if (link)
- io_queue_linked_timeout(link);
-}
-
-static void io_kill_timeout(struct io_kiocb *req, int status)
- __must_hold(&req->ctx->completion_lock)
- __must_hold(&req->ctx->timeout_lock)
-{
- struct io_timeout_data *io = req->async_data;
-
- if (hrtimer_try_to_cancel(&io->timer) != -1) {
- if (status)
- req_set_fail(req);
- atomic_set(&req->ctx->cq_timeouts,
- atomic_read(&req->ctx->cq_timeouts) + 1);
- list_del_init(&req->timeout.list);
- io_fill_cqe_req(req, status, 0);
- io_put_req_deferred(req);
- }
-}
-
-static __cold void io_queue_deferred(struct io_ring_ctx *ctx)
-{
- while (!list_empty(&ctx->defer_list)) {
- struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
- struct io_defer_entry, list);
-
- if (req_need_defer(de->req, de->seq))
- break;
- list_del_init(&de->list);
- io_req_task_queue(de->req);
- kfree(de);
- }
-}
-
-static __cold void io_flush_timeouts(struct io_ring_ctx *ctx)
- __must_hold(&ctx->completion_lock)
-{
- u32 seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
- struct io_kiocb *req, *tmp;
-
- spin_lock_irq(&ctx->timeout_lock);
- list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) {
- u32 events_needed, events_got;
-
- if (io_is_timeout_noseq(req))
- break;
-
- /*
- * Since seq can easily wrap around over time, subtract
- * the last seq at which timeouts were flushed before comparing.
- * Assuming not more than 2^31-1 events have happened since,
- * these subtractions won't have wrapped, so we can check if
- * target is in [last_seq, current_seq] by comparing the two.
- */
- events_needed = req->timeout.target_seq - ctx->cq_last_tm_flush;
- events_got = seq - ctx->cq_last_tm_flush;
- if (events_got < events_needed)
- break;
-
- io_kill_timeout(req, 0);
- }
- ctx->cq_last_tm_flush = seq;
- spin_unlock_irq(&ctx->timeout_lock);
-}
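
A worked instance of the wrap-safe comparison used in io_flush_timeouts() above, with the u32 sequence counter wrapping between flushes (made-up values):

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t last_flush = UINT32_MAX - 5;	/* cq_last_tm_flush at the previous flush */
	uint32_t seq        = 10;		/* current seq, already wrapped past 0 */
	uint32_t target     = 4;		/* req->timeout.target_seq */

	uint32_t events_needed = target - last_flush;	/* 10 */
	uint32_t events_got    = seq - last_flush;	/* 16 */

	/* target lies in (last_flush, seq], so this timeout must fire */
	assert(events_got >= events_needed);
	return 0;
}
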
-
-static inline void io_commit_cqring(struct io_ring_ctx *ctx)
-{
- /* order cqe stores with ring update */
- smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
-}
-
-static void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
-{
- if (ctx->off_timeout_used || ctx->drain_active) {
- spin_lock(&ctx->completion_lock);
- if (ctx->off_timeout_used)
- io_flush_timeouts(ctx);
- if (ctx->drain_active)
- io_queue_deferred(ctx);
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- }
- if (ctx->has_evfd)
- io_eventfd_signal(ctx);
-}
-
-static inline bool io_sqring_full(struct io_ring_ctx *ctx)
-{
- struct io_rings *r = ctx->rings;
-
- return READ_ONCE(r->sq.tail) - ctx->cached_sq_head == ctx->sq_entries;
-}
-
-static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx)
-{
- return ctx->cached_cq_tail - READ_ONCE(ctx->rings->cq.head);
-}
-
-static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
-{
- struct io_rings *rings = ctx->rings;
- unsigned tail, mask = ctx->cq_entries - 1;
-
- /*
- * writes to the cq entry need to come after reading head; the
- * control dependency is enough as we're using WRITE_ONCE to
- * fill the cq entry
- */
- if (__io_cqring_events(ctx) == ctx->cq_entries)
- return NULL;
-
- tail = ctx->cached_cq_tail++;
- return &rings->cqes[tail & mask];
-}
-
-static void io_eventfd_signal(struct io_ring_ctx *ctx)
-{
- struct io_ev_fd *ev_fd;
-
- rcu_read_lock();
- /*
-	 * rcu_dereference ctx->io_ev_fd once and use it both for checking
- * and eventfd_signal
- */
- ev_fd = rcu_dereference(ctx->io_ev_fd);
-
- /*
-	 * Check again if ev_fd exists in case an io_eventfd_unregister call
- * completed between the NULL check of ctx->io_ev_fd at the start of
- * the function and rcu_read_lock.
- */
- if (unlikely(!ev_fd))
- goto out;
- if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
- goto out;
-
- if (!ev_fd->eventfd_async || io_wq_current_is_worker())
- eventfd_signal(ev_fd->cq_ev_fd, 1);
-out:
- rcu_read_unlock();
-}
-
-static inline void io_cqring_wake(struct io_ring_ctx *ctx)
-{
- /*
- * wake_up_all() may seem excessive, but io_wake_function() and
- * io_should_wake() handle the termination of the loop and only
- * wake as many waiters as we need to.
- */
- if (wq_has_sleeper(&ctx->cq_wait))
- wake_up_all(&ctx->cq_wait);
-}
-
-/*
- * This should only get called when at least one event has been posted.
- * Some applications rely on the eventfd notification count only changing
- * IFF a new CQE has been added to the CQ ring. There's no dependency on a
- * 1:1 relationship between how many times this function is called (and
- * hence the eventfd count) and the number of CQEs posted to the CQ ring.
- */
-static inline void io_cqring_ev_posted(struct io_ring_ctx *ctx)
-{
- if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
- ctx->has_evfd))
- __io_commit_cqring_flush(ctx);
-
- io_cqring_wake(ctx);
-}
-
-static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
-{
- if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
- ctx->has_evfd))
- __io_commit_cqring_flush(ctx);
-
- if (ctx->flags & IORING_SETUP_SQPOLL)
- io_cqring_wake(ctx);
-}
-
-/* Returns true if there are no backlogged entries after the flush */
-static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
-{
- bool all_flushed, posted;
-
- if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
- return false;
-
- posted = false;
- spin_lock(&ctx->completion_lock);
- while (!list_empty(&ctx->cq_overflow_list)) {
- struct io_uring_cqe *cqe = io_get_cqe(ctx);
- struct io_overflow_cqe *ocqe;
-
- if (!cqe && !force)
- break;
- ocqe = list_first_entry(&ctx->cq_overflow_list,
- struct io_overflow_cqe, list);
- if (cqe)
- memcpy(cqe, &ocqe->cqe, sizeof(*cqe));
- else
- io_account_cq_overflow(ctx);
-
- posted = true;
- list_del(&ocqe->list);
- kfree(ocqe);
- }
-
- all_flushed = list_empty(&ctx->cq_overflow_list);
- if (all_flushed) {
- clear_bit(0, &ctx->check_cq_overflow);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_CQ_OVERFLOW);
- }
-
- if (posted)
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- if (posted)
- io_cqring_ev_posted(ctx);
- return all_flushed;
-}
-
-static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
-{
- bool ret = true;
-
- if (test_bit(0, &ctx->check_cq_overflow)) {
- /* iopoll syncs against uring_lock, not completion_lock */
- if (ctx->flags & IORING_SETUP_IOPOLL)
- mutex_lock(&ctx->uring_lock);
- ret = __io_cqring_overflow_flush(ctx, false);
- if (ctx->flags & IORING_SETUP_IOPOLL)
- mutex_unlock(&ctx->uring_lock);
- }
-
- return ret;
-}
-
-/* must be called somewhat shortly after putting a request */
-static inline void io_put_task(struct task_struct *task, int nr)
-{
- struct io_uring_task *tctx = task->io_uring;
-
- if (likely(task == current)) {
- tctx->cached_refs += nr;
- } else {
- percpu_counter_sub(&tctx->inflight, nr);
- if (unlikely(atomic_read(&tctx->in_idle)))
- wake_up(&tctx->wait);
- put_task_struct_many(task, nr);
- }
-}
-
-static void io_task_refs_refill(struct io_uring_task *tctx)
-{
- unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR;
-
- percpu_counter_add(&tctx->inflight, refill);
- refcount_add(refill, &current->usage);
- tctx->cached_refs += refill;
-}
-
-static inline void io_get_task_refs(int nr)
-{
- struct io_uring_task *tctx = current->io_uring;
-
- tctx->cached_refs -= nr;
- if (unlikely(tctx->cached_refs < 0))
- io_task_refs_refill(tctx);
-}
-
-static __cold void io_uring_drop_tctx_refs(struct task_struct *task)
-{
- struct io_uring_task *tctx = task->io_uring;
- unsigned int refs = tctx->cached_refs;
-
- if (refs) {
- tctx->cached_refs = 0;
- percpu_counter_sub(&tctx->inflight, refs);
- put_task_struct_many(task, refs);
- }
-}
-
-static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
- s32 res, u32 cflags)
-{
- struct io_overflow_cqe *ocqe;
-
- ocqe = kmalloc(sizeof(*ocqe), GFP_ATOMIC | __GFP_ACCOUNT);
- if (!ocqe) {
- /*
- * If we're in ring overflow flush mode, or in task cancel mode,
- * or cannot allocate an overflow entry, then we need to drop it
- * on the floor.
- */
- io_account_cq_overflow(ctx);
- return false;
- }
- if (list_empty(&ctx->cq_overflow_list)) {
- set_bit(0, &ctx->check_cq_overflow);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_CQ_OVERFLOW);
-
- }
- ocqe->cqe.user_data = user_data;
- ocqe->cqe.res = res;
- ocqe->cqe.flags = cflags;
- list_add_tail(&ocqe->list, &ctx->cq_overflow_list);
- return true;
-}
-
-static inline bool __io_fill_cqe(struct io_ring_ctx *ctx, u64 user_data,
- s32 res, u32 cflags)
-{
- struct io_uring_cqe *cqe;
-
- /*
- * If we can't get a cq entry, userspace overflowed the
- * submission (by quite a lot). Increment the overflow count in
- * the ring.
- */
- cqe = io_get_cqe(ctx);
- if (likely(cqe)) {
- WRITE_ONCE(cqe->user_data, user_data);
- WRITE_ONCE(cqe->res, res);
- WRITE_ONCE(cqe->flags, cflags);
- return true;
- }
- return io_cqring_event_overflow(ctx, user_data, res, cflags);
-}
-
-static inline bool __io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
-{
- trace_io_uring_complete(req->ctx, req, req->user_data, res, cflags);
- return __io_fill_cqe(req->ctx, req->user_data, res, cflags);
-}
-
-static noinline void io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
-{
- if (!(req->flags & REQ_F_CQE_SKIP))
- __io_fill_cqe_req(req, res, cflags);
-}
-
-static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
- s32 res, u32 cflags)
-{
- ctx->cq_extra++;
- trace_io_uring_complete(ctx, NULL, user_data, res, cflags);
- return __io_fill_cqe(ctx, user_data, res, cflags);
-}
-
-static void __io_req_complete_post(struct io_kiocb *req, s32 res,
- u32 cflags)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- if (!(req->flags & REQ_F_CQE_SKIP))
- __io_fill_cqe_req(req, res, cflags);
- /*
- * If we're the last reference to this request, add to our locked
- * free_list cache.
- */
- if (req_ref_put_and_test(req)) {
- if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
- if (req->flags & IO_DISARM_MASK)
- io_disarm_next(req);
- if (req->link) {
- io_req_task_queue(req->link);
- req->link = NULL;
- }
- }
- io_req_put_rsrc(req, ctx);
- /*
- * Selected buffer deallocation in io_clean_op() assumes that
- * we don't hold ->completion_lock. Clean them here to avoid
- * deadlocks.
- */
- io_put_kbuf_comp(req);
- io_dismantle_req(req);
- io_put_task(req->task, 1);
- wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
- ctx->locked_free_nr++;
- }
-}
-
-static void io_req_complete_post(struct io_kiocb *req, s32 res,
- u32 cflags)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- spin_lock(&ctx->completion_lock);
- __io_req_complete_post(req, res, cflags);
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- io_cqring_ev_posted(ctx);
-}
-
-static inline void io_req_complete_state(struct io_kiocb *req, s32 res,
- u32 cflags)
-{
- req->result = res;
- req->cflags = cflags;
- req->flags |= REQ_F_COMPLETE_INLINE;
-}
-
-static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
- s32 res, u32 cflags)
-{
- if (issue_flags & IO_URING_F_COMPLETE_DEFER)
- io_req_complete_state(req, res, cflags);
- else
- io_req_complete_post(req, res, cflags);
-}
-
-static inline void io_req_complete(struct io_kiocb *req, s32 res)
-{
- __io_req_complete(req, 0, res, 0);
-}
-
-static void io_req_complete_failed(struct io_kiocb *req, s32 res)
-{
- req_set_fail(req);
- io_req_complete_post(req, res, io_put_kbuf(req, IO_URING_F_UNLOCKED));
-}
-
-static void io_req_complete_fail_submit(struct io_kiocb *req)
-{
- /*
-	 * We don't submit; fail them all. For that, replace hardlinks with
- * normal links. Extra REQ_F_LINK is tolerated.
- */
- req->flags &= ~REQ_F_HARDLINK;
- req->flags |= REQ_F_LINK;
- io_req_complete_failed(req, req->result);
-}
-
-/*
- * Don't initialise the fields below on every allocation, but do that in
- * advance and keep them valid across allocations.
- */
-static void io_preinit_req(struct io_kiocb *req, struct io_ring_ctx *ctx)
-{
- req->ctx = ctx;
- req->link = NULL;
- req->async_data = NULL;
- /* not necessary, but safer to zero */
- req->result = 0;
-}
-
-static void io_flush_cached_locked_reqs(struct io_ring_ctx *ctx,
- struct io_submit_state *state)
-{
- spin_lock(&ctx->completion_lock);
- wq_list_splice(&ctx->locked_free_list, &state->free_list);
- ctx->locked_free_nr = 0;
- spin_unlock(&ctx->completion_lock);
-}
-
-/* Returns true IFF there are requests in the cache */
-static bool io_flush_cached_reqs(struct io_ring_ctx *ctx)
-{
- struct io_submit_state *state = &ctx->submit_state;
-
- /*
- * If we have more than a batch's worth of requests in our IRQ side
- * locked cache, grab the lock and move them over to our submission
- * side cache.
- */
- if (READ_ONCE(ctx->locked_free_nr) > IO_COMPL_BATCH)
- io_flush_cached_locked_reqs(ctx, state);
- return !!state->free_list.next;
-}
-
-/*
- * A request might get retired back into the request caches even before opcode
- * handlers and io_issue_sqe() are done with it, e.g. inline completion path.
- * Because of that, io_alloc_req() should be called only under ->uring_lock
- * and with extra caution to not get a request that is still worked on.
- */
-static __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
-{
- struct io_submit_state *state = &ctx->submit_state;
- gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
- void *reqs[IO_REQ_ALLOC_BATCH];
- struct io_kiocb *req;
- int ret, i;
-
- if (likely(state->free_list.next || io_flush_cached_reqs(ctx)))
- return true;
-
- ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
-
- /*
- * Bulk alloc is all-or-nothing. If we fail to get a batch,
- * retry single alloc to be on the safe side.
- */
- if (unlikely(ret <= 0)) {
- reqs[0] = kmem_cache_alloc(req_cachep, gfp);
- if (!reqs[0])
- return false;
- ret = 1;
- }
-
- percpu_ref_get_many(&ctx->refs, ret);
- for (i = 0; i < ret; i++) {
- req = reqs[i];
-
- io_preinit_req(req, ctx);
- wq_stack_add_head(&req->comp_list, &state->free_list);
- }
- return true;
-}
-
-static inline bool io_alloc_req_refill(struct io_ring_ctx *ctx)
-{
- if (unlikely(!ctx->submit_state.free_list.next))
- return __io_alloc_req_refill(ctx);
- return true;
-}
-
-static inline struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
-{
- struct io_wq_work_node *node;
-
- node = wq_stack_extract(&ctx->submit_state.free_list);
- return container_of(node, struct io_kiocb, comp_list);
-}
-
-static inline void io_put_file(struct file *file)
-{
- if (file)
- fput(file);
-}
-
-static inline void io_dismantle_req(struct io_kiocb *req)
-{
- unsigned int flags = req->flags;
-
- if (unlikely(flags & IO_REQ_CLEAN_FLAGS))
- io_clean_op(req);
- if (!(flags & REQ_F_FIXED_FILE))
- io_put_file(req->file);
-}
-
-static __cold void __io_free_req(struct io_kiocb *req)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- io_req_put_rsrc(req, ctx);
- io_dismantle_req(req);
- io_put_task(req->task, 1);
-
- spin_lock(&ctx->completion_lock);
- wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
- ctx->locked_free_nr++;
- spin_unlock(&ctx->completion_lock);
-}
-
-static inline void io_remove_next_linked(struct io_kiocb *req)
-{
- struct io_kiocb *nxt = req->link;
-
- req->link = nxt->link;
- nxt->link = NULL;
-}
-
-static bool io_kill_linked_timeout(struct io_kiocb *req)
- __must_hold(&req->ctx->completion_lock)
- __must_hold(&req->ctx->timeout_lock)
-{
- struct io_kiocb *link = req->link;
-
- if (link && link->opcode == IORING_OP_LINK_TIMEOUT) {
- struct io_timeout_data *io = link->async_data;
-
- io_remove_next_linked(req);
- link->timeout.head = NULL;
- if (hrtimer_try_to_cancel(&io->timer) != -1) {
- list_del(&link->timeout.list);
- /* leave REQ_F_CQE_SKIP to io_fill_cqe_req */
- io_fill_cqe_req(link, -ECANCELED, 0);
- io_put_req_deferred(link);
- return true;
- }
- }
- return false;
-}
-
-static void io_fail_links(struct io_kiocb *req)
- __must_hold(&req->ctx->completion_lock)
-{
- struct io_kiocb *nxt, *link = req->link;
- bool ignore_cqes = req->flags & REQ_F_SKIP_LINK_CQES;
-
- req->link = NULL;
- while (link) {
- long res = -ECANCELED;
-
- if (link->flags & REQ_F_FAIL)
- res = link->result;
-
- nxt = link->link;
- link->link = NULL;
-
- trace_io_uring_fail_link(req->ctx, req, req->user_data,
- req->opcode, link);
-
- if (!ignore_cqes) {
- link->flags &= ~REQ_F_CQE_SKIP;
- io_fill_cqe_req(link, res, 0);
- }
- io_put_req_deferred(link);
- link = nxt;
- }
-}
-
-static bool io_disarm_next(struct io_kiocb *req)
- __must_hold(&req->ctx->completion_lock)
-{
- bool posted = false;
-
- if (req->flags & REQ_F_ARM_LTIMEOUT) {
- struct io_kiocb *link = req->link;
-
- req->flags &= ~REQ_F_ARM_LTIMEOUT;
- if (link && link->opcode == IORING_OP_LINK_TIMEOUT) {
- io_remove_next_linked(req);
- /* leave REQ_F_CQE_SKIP to io_fill_cqe_req */
- io_fill_cqe_req(link, -ECANCELED, 0);
- io_put_req_deferred(link);
- posted = true;
- }
- } else if (req->flags & REQ_F_LINK_TIMEOUT) {
- struct io_ring_ctx *ctx = req->ctx;
-
- spin_lock_irq(&ctx->timeout_lock);
- posted = io_kill_linked_timeout(req);
- spin_unlock_irq(&ctx->timeout_lock);
- }
- if (unlikely((req->flags & REQ_F_FAIL) &&
- !(req->flags & REQ_F_HARDLINK))) {
- posted |= (req->link != NULL);
- io_fail_links(req);
- }
- return posted;
-}
-
-static void __io_req_find_next_prep(struct io_kiocb *req)
-{
- struct io_ring_ctx *ctx = req->ctx;
- bool posted;
-
- spin_lock(&ctx->completion_lock);
- posted = io_disarm_next(req);
- if (posted)
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- if (posted)
- io_cqring_ev_posted(ctx);
-}
-
-static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
-{
- struct io_kiocb *nxt;
-
- if (likely(!(req->flags & (REQ_F_LINK|REQ_F_HARDLINK))))
- return NULL;
- /*
- * If LINK is set, we have dependent requests in this chain. If we
- * didn't fail this request, queue the first one up, moving any other
- * dependencies to the next request. In case of failure, fail the rest
- * of the chain.
- */
- if (unlikely(req->flags & IO_DISARM_MASK))
- __io_req_find_next_prep(req);
- nxt = req->link;
- req->link = NULL;
- return nxt;
-}
-
-static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
-{
- if (!ctx)
- return;
- if (*locked) {
- io_submit_flush_completions(ctx);
- mutex_unlock(&ctx->uring_lock);
- *locked = false;
- }
- percpu_ref_put(&ctx->refs);
-}
-
-static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
-{
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- io_cqring_ev_posted(ctx);
-}
-
-static void handle_prev_tw_list(struct io_wq_work_node *node,
- struct io_ring_ctx **ctx, bool *uring_locked)
-{
- if (*ctx && !*uring_locked)
- spin_lock(&(*ctx)->completion_lock);
-
- do {
- struct io_wq_work_node *next = node->next;
- struct io_kiocb *req = container_of(node, struct io_kiocb,
- io_task_work.node);
-
- prefetch(container_of(next, struct io_kiocb, io_task_work.node));
-
- if (req->ctx != *ctx) {
- if (unlikely(!*uring_locked && *ctx))
- ctx_commit_and_unlock(*ctx);
-
- ctx_flush_and_put(*ctx, uring_locked);
- *ctx = req->ctx;
- /* if not contended, grab and improve batching */
- *uring_locked = mutex_trylock(&(*ctx)->uring_lock);
- percpu_ref_get(&(*ctx)->refs);
- if (unlikely(!*uring_locked))
- spin_lock(&(*ctx)->completion_lock);
- }
- if (likely(*uring_locked))
- req->io_task_work.func(req, uring_locked);
- else
- __io_req_complete_post(req, req->result,
- io_put_kbuf_comp(req));
- node = next;
- } while (node);
-
- if (unlikely(!*uring_locked))
- ctx_commit_and_unlock(*ctx);
-}
-
-static void handle_tw_list(struct io_wq_work_node *node,
- struct io_ring_ctx **ctx, bool *locked)
-{
- do {
- struct io_wq_work_node *next = node->next;
- struct io_kiocb *req = container_of(node, struct io_kiocb,
- io_task_work.node);
-
- prefetch(container_of(next, struct io_kiocb, io_task_work.node));
-
- if (req->ctx != *ctx) {
- ctx_flush_and_put(*ctx, locked);
- *ctx = req->ctx;
- /* if not contended, grab and improve batching */
- *locked = mutex_trylock(&(*ctx)->uring_lock);
- percpu_ref_get(&(*ctx)->refs);
- }
- req->io_task_work.func(req, locked);
- node = next;
- } while (node);
-}
-
-static void tctx_task_work(struct callback_head *cb)
-{
- bool uring_locked = false;
- struct io_ring_ctx *ctx = NULL;
- struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
- task_work);
-
- while (1) {
- struct io_wq_work_node *node1, *node2;
-
- if (!tctx->task_list.first &&
- !tctx->prior_task_list.first && uring_locked)
- io_submit_flush_completions(ctx);
-
- spin_lock_irq(&tctx->task_lock);
- node1 = tctx->prior_task_list.first;
- node2 = tctx->task_list.first;
- INIT_WQ_LIST(&tctx->task_list);
- INIT_WQ_LIST(&tctx->prior_task_list);
- if (!node2 && !node1)
- tctx->task_running = false;
- spin_unlock_irq(&tctx->task_lock);
- if (!node2 && !node1)
- break;
-
- if (node1)
- handle_prev_tw_list(node1, &ctx, &uring_locked);
-
- if (node2)
- handle_tw_list(node2, &ctx, &uring_locked);
- cond_resched();
- }
-
- ctx_flush_and_put(ctx, &uring_locked);
-
- /* relaxed read is enough as only the task itself sets ->in_idle */
- if (unlikely(atomic_read(&tctx->in_idle)))
- io_uring_drop_tctx_refs(current);
-}
-
-static void io_req_task_work_add(struct io_kiocb *req, bool priority)
-{
- struct task_struct *tsk = req->task;
- struct io_uring_task *tctx = tsk->io_uring;
- enum task_work_notify_mode notify;
- struct io_wq_work_node *node;
- unsigned long flags;
- bool running;
-
- WARN_ON_ONCE(!tctx);
-
- spin_lock_irqsave(&tctx->task_lock, flags);
- if (priority)
- wq_list_add_tail(&req->io_task_work.node, &tctx->prior_task_list);
- else
- wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
- running = tctx->task_running;
- if (!running)
- tctx->task_running = true;
- spin_unlock_irqrestore(&tctx->task_lock, flags);
-
- /* task_work already pending, we're done */
- if (running)
- return;
-
- /*
- * SQPOLL kernel thread doesn't need notification, just a wakeup. For
- * all other cases, use TWA_SIGNAL unconditionally to ensure we're
- * processing task_work. There's no reliable way to tell if TWA_RESUME
- * will do the job.
- */
- notify = (req->ctx->flags & IORING_SETUP_SQPOLL) ? TWA_NONE : TWA_SIGNAL;
- if (likely(!task_work_add(tsk, &tctx->task_work, notify))) {
- if (notify == TWA_NONE)
- wake_up_process(tsk);
- return;
- }
-
- spin_lock_irqsave(&tctx->task_lock, flags);
- tctx->task_running = false;
- node = wq_list_merge(&tctx->prior_task_list, &tctx->task_list);
- spin_unlock_irqrestore(&tctx->task_lock, flags);
-
- while (node) {
- req = container_of(node, struct io_kiocb, io_task_work.node);
- node = node->next;
- if (llist_add(&req->io_task_work.fallback_node,
- &req->ctx->fallback_llist))
- schedule_delayed_work(&req->ctx->fallback_work, 1);
- }
-}
-
-static void io_req_task_cancel(struct io_kiocb *req, bool *locked)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- /* not needed for normal modes, but SQPOLL depends on it */
- io_tw_lock(ctx, locked);
- io_req_complete_failed(req, req->result);
-}
-
-static void io_req_task_submit(struct io_kiocb *req, bool *locked)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- io_tw_lock(ctx, locked);
- /* req->task == current here, checking PF_EXITING is safe */
- if (likely(!(req->task->flags & PF_EXITING)))
- __io_queue_sqe(req);
- else
- io_req_complete_failed(req, -EFAULT);
-}
-
-static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
-{
- req->result = ret;
- req->io_task_work.func = io_req_task_cancel;
- io_req_task_work_add(req, false);
-}
-
-static void io_req_task_queue(struct io_kiocb *req)
-{
- req->io_task_work.func = io_req_task_submit;
- io_req_task_work_add(req, false);
-}
-
-static void io_req_task_queue_reissue(struct io_kiocb *req)
-{
- req->io_task_work.func = io_queue_async_work;
- io_req_task_work_add(req, false);
-}
-
-static inline void io_queue_next(struct io_kiocb *req)
-{
- struct io_kiocb *nxt = io_req_find_next(req);
-
- if (nxt)
- io_req_task_queue(nxt);
-}
-
-static void io_free_req(struct io_kiocb *req)
-{
- io_queue_next(req);
- __io_free_req(req);
-}
-
-static void io_free_req_work(struct io_kiocb *req, bool *locked)
-{
- io_free_req(req);
-}
-
-static void io_free_batch_list(struct io_ring_ctx *ctx,
- struct io_wq_work_node *node)
- __must_hold(&ctx->uring_lock)
-{
- struct task_struct *task = NULL;
- int task_refs = 0;
-
- do {
- struct io_kiocb *req = container_of(node, struct io_kiocb,
- comp_list);
-
- if (unlikely(req->flags & REQ_F_REFCOUNT)) {
- node = req->comp_list.next;
- if (!req_ref_put_and_test(req))
- continue;
- }
-
- io_req_put_rsrc_locked(req, ctx);
- io_queue_next(req);
- io_dismantle_req(req);
-
- if (req->task != task) {
- if (task)
- io_put_task(task, task_refs);
- task = req->task;
- task_refs = 0;
- }
- task_refs++;
- node = req->comp_list.next;
- wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
- } while (node);
-
- if (task)
- io_put_task(task, task_refs);
-}
-
-static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
-{
- struct io_wq_work_node *node, *prev;
- struct io_submit_state *state = &ctx->submit_state;
-
- if (state->flush_cqes) {
- spin_lock(&ctx->completion_lock);
- wq_list_for_each(node, prev, &state->compl_reqs) {
- struct io_kiocb *req = container_of(node, struct io_kiocb,
- comp_list);
-
- if (!(req->flags & REQ_F_CQE_SKIP))
- __io_fill_cqe_req(req, req->result, req->cflags);
- if ((req->flags & REQ_F_POLLED) && req->apoll) {
- struct async_poll *apoll = req->apoll;
-
- if (apoll->double_poll)
- kfree(apoll->double_poll);
- list_add(&apoll->poll.wait.entry,
- &ctx->apoll_cache);
- req->flags &= ~REQ_F_POLLED;
- }
- }
-
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- io_cqring_ev_posted(ctx);
- state->flush_cqes = false;
- }
-
- io_free_batch_list(ctx, state->compl_reqs.first);
- INIT_WQ_LIST(&state->compl_reqs);
-}
-
-/*
- * Drop reference to request, return next in chain (if there is one) if this
- * was the last reference to this request.
- */
-static inline struct io_kiocb *io_put_req_find_next(struct io_kiocb *req)
-{
- struct io_kiocb *nxt = NULL;
-
- if (req_ref_put_and_test(req)) {
- nxt = io_req_find_next(req);
- __io_free_req(req);
- }
- return nxt;
-}
-
-static inline void io_put_req(struct io_kiocb *req)
-{
- if (req_ref_put_and_test(req))
- io_free_req(req);
-}
-
-static inline void io_put_req_deferred(struct io_kiocb *req)
-{
- if (req_ref_put_and_test(req)) {
- req->io_task_work.func = io_free_req_work;
- io_req_task_work_add(req, false);
- }
-}
-
-static unsigned io_cqring_events(struct io_ring_ctx *ctx)
-{
- /* See comment at the top of this file */
- smp_rmb();
- return __io_cqring_events(ctx);
-}
-
-static inline unsigned int io_sqring_entries(struct io_ring_ctx *ctx)
-{
- struct io_rings *rings = ctx->rings;
-
- /* make sure SQ entry isn't read before tail */
- return smp_load_acquire(&rings->sq.tail) - ctx->cached_sq_head;
-}
-
-static inline bool io_run_task_work(void)
-{
- if (test_thread_flag(TIF_NOTIFY_SIGNAL) || task_work_pending(current)) {
- __set_current_state(TASK_RUNNING);
- clear_notify_signal();
- if (task_work_pending(current))
- task_work_run();
- return true;
- }
-
- return false;
-}
-
-static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
-{
- struct io_wq_work_node *pos, *start, *prev;
- unsigned int poll_flags = BLK_POLL_NOSLEEP;
- DEFINE_IO_COMP_BATCH(iob);
- int nr_events = 0;
-
- /*
- * Only spin for completions if we don't have multiple devices hanging
- * off our complete list.
- */
- if (ctx->poll_multi_queue || force_nonspin)
- poll_flags |= BLK_POLL_ONESHOT;
-
- wq_list_for_each(pos, start, &ctx->iopoll_list) {
- struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
- struct kiocb *kiocb = &req->rw.kiocb;
- int ret;
-
- /*
- * Move completed and retryable entries to our local lists.
- * If we find a request that requires polling, break out
- * and complete those lists first, if we have entries there.
- */
- if (READ_ONCE(req->iopoll_completed))
- break;
-
- ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob, poll_flags);
- if (unlikely(ret < 0))
- return ret;
- else if (ret)
- poll_flags |= BLK_POLL_ONESHOT;
-
- /* iopoll may have completed current req */
- if (!rq_list_empty(iob.req_list) ||
- READ_ONCE(req->iopoll_completed))
- break;
- }
-
- if (!rq_list_empty(iob.req_list))
- iob.complete(&iob);
- else if (!pos)
- return 0;
-
- prev = start;
- wq_list_for_each_resume(pos, prev) {
- struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
-
- /* order with io_complete_rw_iopoll(), e.g. ->result updates */
- if (!smp_load_acquire(&req->iopoll_completed))
- break;
- nr_events++;
- if (unlikely(req->flags & REQ_F_CQE_SKIP))
- continue;
- __io_fill_cqe_req(req, req->result, io_put_kbuf(req, 0));
- }
-
- if (unlikely(!nr_events))
- return 0;
-
- io_commit_cqring(ctx);
- io_cqring_ev_posted_iopoll(ctx);
- pos = start ? start->next : ctx->iopoll_list.first;
- wq_list_cut(&ctx->iopoll_list, prev, start);
- io_free_batch_list(ctx, pos);
- return nr_events;
-}
-
-/*
- * We can't just wait for polled events to come to us, we have to actively
- * find and complete them.
- */
-static __cold void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
-{
- if (!(ctx->flags & IORING_SETUP_IOPOLL))
- return;
-
- mutex_lock(&ctx->uring_lock);
- while (!wq_list_empty(&ctx->iopoll_list)) {
- /* let it sleep and repeat later if we can't complete a request */
- if (io_do_iopoll(ctx, true) == 0)
- break;
- /*
- * Ensure we allow local-to-the-cpu processing to take place,
- * in this case we need to ensure that we reap all events.
- * Also let task_work, etc. to progress by releasing the mutex
- */
- if (need_resched()) {
- mutex_unlock(&ctx->uring_lock);
- cond_resched();
- mutex_lock(&ctx->uring_lock);
- }
- }
- mutex_unlock(&ctx->uring_lock);
-}
-
-static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
-{
- unsigned int nr_events = 0;
- int ret = 0;
-
- /*
- * We disallow the app entering submit/complete with polling, but we
- * still need to lock the ring to prevent racing with polled issue
- * that got punted to a workqueue.
- */
- mutex_lock(&ctx->uring_lock);
- /*
- * Don't enter poll loop if we already have events pending.
- * If we do, we can potentially be spinning for commands that
- * already triggered a CQE (eg in error).
- */
- if (test_bit(0, &ctx->check_cq_overflow))
- __io_cqring_overflow_flush(ctx, false);
- if (io_cqring_events(ctx))
- goto out;
- do {
- /*
- * If a submit got punted to a workqueue, we can have the
- * application entering polling for a command before it gets
- * issued. That app will hold the uring_lock for the duration
- * of the poll right here, so we need to take a breather every
- * now and then to ensure that the issue has a chance to add
- * the poll to the issued list. Otherwise we can spin here
- * forever, while the workqueue is stuck trying to acquire the
- * very same mutex.
- */
- if (wq_list_empty(&ctx->iopoll_list)) {
- u32 tail = ctx->cached_cq_tail;
-
- mutex_unlock(&ctx->uring_lock);
- io_run_task_work();
- mutex_lock(&ctx->uring_lock);
-
- /* some requests don't go through iopoll_list */
- if (tail != ctx->cached_cq_tail ||
- wq_list_empty(&ctx->iopoll_list))
- break;
- }
- ret = io_do_iopoll(ctx, !min);
- if (ret < 0)
- break;
- nr_events += ret;
- ret = 0;
- } while (nr_events < min && !need_resched());
-out:
- mutex_unlock(&ctx->uring_lock);
- return ret;
-}
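
A hedged illustration, not taken from this patch: the reap loop above only runs for rings created with IORING_SETUP_IOPOLL, where completions are never delivered by interrupt and userspace entering the kernel is what drives io_iopoll_check()/io_do_iopoll(). A minimal liburing sketch of that userspace side follows; liburing is assumed to be installed, and the file name, queue depth and buffer size are made-up example values.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <liburing.h>

int main(void)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        void *buf;
        int fd;

        /* IOPOLL ring: CQEs are reaped by polling, never by interrupt */
        if (io_uring_queue_init(8, &ring, IORING_SETUP_IOPOLL) < 0)
                return 1;
        /* IOPOLL I/O must be O_DIRECT; io_rw_init_file() rejects it otherwise */
        fd = open("disk.img", O_RDONLY | O_DIRECT);
        if (fd < 0 || posix_memalign(&buf, 4096, 4096))
                return 1;

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, 4096, 0);
        io_uring_submit(&ring);

        /* waiting enters the kernel, which reaps through io_iopoll_check() */
        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
                printf("read returned %d\n", cqe->res);
                io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
}
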
-
-static void kiocb_end_write(struct io_kiocb *req)
-{
- /*
- * Tell lockdep we inherited freeze protection from submission
- * thread.
- */
- if (req->flags & REQ_F_ISREG) {
- struct super_block *sb = file_inode(req->file)->i_sb;
-
- __sb_writers_acquired(sb, SB_FREEZE_WRITE);
- sb_end_write(sb);
- }
-}
-
-#ifdef CONFIG_BLOCK
-static bool io_resubmit_prep(struct io_kiocb *req)
-{
- struct io_async_rw *rw = req->async_data;
-
- if (!req_has_async_data(req))
- return !io_req_prep_async(req);
- iov_iter_restore(&rw->s.iter, &rw->s.iter_state);
- return true;
-}
-
-static bool io_rw_should_reissue(struct io_kiocb *req)
-{
- umode_t mode = file_inode(req->file)->i_mode;
- struct io_ring_ctx *ctx = req->ctx;
-
- if (!S_ISBLK(mode) && !S_ISREG(mode))
- return false;
- if ((req->flags & REQ_F_NOWAIT) || (io_wq_current_is_worker() &&
- !(ctx->flags & IORING_SETUP_IOPOLL)))
- return false;
- /*
- * If ref is dying, we might be running poll reap from the exit work.
- * Don't attempt to reissue from that path, just let it fail with
- * -EAGAIN.
- */
- if (percpu_ref_is_dying(&ctx->refs))
- return false;
- /*
- * Play it safe and assume not safe to re-import and reissue if we're
- * not in the original thread group (or not in task context).
- */
- if (!same_thread_group(req->task, current) || !in_task())
- return false;
- return true;
-}
-#else
-static bool io_resubmit_prep(struct io_kiocb *req)
-{
- return false;
-}
-static bool io_rw_should_reissue(struct io_kiocb *req)
-{
- return false;
-}
-#endif
-
-static bool __io_complete_rw_common(struct io_kiocb *req, long res)
-{
- if (req->rw.kiocb.ki_flags & IOCB_WRITE) {
- kiocb_end_write(req);
- fsnotify_modify(req->file);
- } else {
- fsnotify_access(req->file);
- }
- if (unlikely(res != req->result)) {
- if ((res == -EAGAIN || res == -EOPNOTSUPP) &&
- io_rw_should_reissue(req)) {
- req->flags |= REQ_F_REISSUE;
- return true;
- }
- req_set_fail(req);
- req->result = res;
- }
- return false;
-}
-
-static inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
-{
- int res = req->result;
-
- if (*locked) {
- io_req_complete_state(req, res, io_put_kbuf(req, 0));
- io_req_add_compl_list(req);
- } else {
- io_req_complete_post(req, res,
- io_put_kbuf(req, IO_URING_F_UNLOCKED));
- }
-}
-
-static void __io_complete_rw(struct io_kiocb *req, long res,
- unsigned int issue_flags)
-{
- if (__io_complete_rw_common(req, res))
- return;
- __io_req_complete(req, issue_flags, req->result,
- io_put_kbuf(req, issue_flags));
-}
-
-static void io_complete_rw(struct kiocb *kiocb, long res)
-{
- struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
-
- if (__io_complete_rw_common(req, res))
- return;
- req->result = res;
- req->io_task_work.func = io_req_task_complete;
- io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));
-}
-
-static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
-{
- struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
-
- if (kiocb->ki_flags & IOCB_WRITE)
- kiocb_end_write(req);
- if (unlikely(res != req->result)) {
- if (res == -EAGAIN && io_rw_should_reissue(req)) {
- req->flags |= REQ_F_REISSUE;
- return;
- }
- req->result = res;
- }
-
- /* order with io_iopoll_complete() checking ->iopoll_completed */
- smp_store_release(&req->iopoll_completed, 1);
-}
-
-/*
- * After the iocb has been issued, it's safe to be found on the poll list.
- * Adding the kiocb to the list AFTER submission ensures that we don't
- * find it from a io_do_iopoll() thread before the issuer is done
- * accessing the kiocb cookie.
- */
-static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
- const bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
-
- /* workqueue context doesn't hold uring_lock, grab it now */
- if (unlikely(needs_lock))
- mutex_lock(&ctx->uring_lock);
-
- /*
- * Track whether we have multiple files in our lists. This will impact
- * how we do polling eventually, not spinning if we're on potentially
- * different devices.
- */
- if (wq_list_empty(&ctx->iopoll_list)) {
- ctx->poll_multi_queue = false;
- } else if (!ctx->poll_multi_queue) {
- struct io_kiocb *list_req;
-
- list_req = container_of(ctx->iopoll_list.first, struct io_kiocb,
- comp_list);
- if (list_req->file != req->file)
- ctx->poll_multi_queue = true;
- }
-
- /*
- * For fast devices, IO may have already completed. If it has, add
- * it to the front so we find it first.
- */
- if (READ_ONCE(req->iopoll_completed))
- wq_list_add_head(&req->comp_list, &ctx->iopoll_list);
- else
- wq_list_add_tail(&req->comp_list, &ctx->iopoll_list);
-
- if (unlikely(needs_lock)) {
- /*
- * If IORING_SETUP_SQPOLL is enabled, sqes are either handled
- * in sq thread task context or in io worker task context. If
- * the current task context is the sq thread, we don't need to
- * check whether we should wake up the sq thread.
- */
- if ((ctx->flags & IORING_SETUP_SQPOLL) &&
- wq_has_sleeper(&ctx->sq_data->wait))
- wake_up(&ctx->sq_data->wait);
-
- mutex_unlock(&ctx->uring_lock);
- }
-}
-
-static bool io_bdev_nowait(struct block_device *bdev)
-{
- return !bdev || blk_queue_nowait(bdev_get_queue(bdev));
-}
-
-/*
- * If we tracked the file through the SCM inflight mechanism, we could support
- * any file. For now, just ensure that anything potentially problematic is done
- * inline.
- */
-static bool __io_file_supports_nowait(struct file *file, umode_t mode)
-{
- if (S_ISBLK(mode)) {
- if (IS_ENABLED(CONFIG_BLOCK) &&
- io_bdev_nowait(I_BDEV(file->f_mapping->host)))
- return true;
- return false;
- }
- if (S_ISSOCK(mode))
- return true;
- if (S_ISREG(mode)) {
- if (IS_ENABLED(CONFIG_BLOCK) &&
- io_bdev_nowait(file->f_inode->i_sb->s_bdev) &&
- file->f_op != &io_uring_fops)
- return true;
- return false;
- }
-
- /* any ->read/write should understand O_NONBLOCK */
- if (file->f_flags & O_NONBLOCK)
- return true;
- return file->f_mode & FMODE_NOWAIT;
-}
-
-/*
- * If we tracked the file through the SCM inflight mechanism, we could support
- * any file. For now, just ensure that anything potentially problematic is done
- * inline.
- */
-static unsigned int io_file_get_flags(struct file *file)
-{
- umode_t mode = file_inode(file)->i_mode;
- unsigned int res = 0;
-
- if (S_ISREG(mode))
- res |= FFS_ISREG;
- if (__io_file_supports_nowait(file, mode))
- res |= FFS_NOWAIT;
- return res;
-}
-
-static inline bool io_file_supports_nowait(struct io_kiocb *req)
-{
- return req->flags & REQ_F_SUPPORT_NOWAIT;
-}
-
-static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct kiocb *kiocb = &req->rw.kiocb;
- unsigned ioprio;
- int ret;
-
- kiocb->ki_pos = READ_ONCE(sqe->off);
- /* used for fixed read/write too - just read unconditionally */
- req->buf_index = READ_ONCE(sqe->buf_index);
- req->imu = NULL;
-
- if (req->opcode == IORING_OP_READ_FIXED ||
- req->opcode == IORING_OP_WRITE_FIXED) {
- struct io_ring_ctx *ctx = req->ctx;
- u16 index;
-
- if (unlikely(req->buf_index >= ctx->nr_user_bufs))
- return -EFAULT;
- index = array_index_nospec(req->buf_index, ctx->nr_user_bufs);
- req->imu = ctx->user_bufs[index];
- io_req_set_rsrc_node(req, ctx, 0);
- }
-
- ioprio = READ_ONCE(sqe->ioprio);
- if (ioprio) {
- ret = ioprio_check_cap(ioprio);
- if (ret)
- return ret;
-
- kiocb->ki_ioprio = ioprio;
- } else {
- kiocb->ki_ioprio = get_current_ioprio();
- }
-
- req->rw.addr = READ_ONCE(sqe->addr);
- req->rw.len = READ_ONCE(sqe->len);
- req->rw.flags = READ_ONCE(sqe->rw_flags);
- return 0;
-}
-
-static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
-{
- switch (ret) {
- case -EIOCBQUEUED:
- break;
- case -ERESTARTSYS:
- case -ERESTARTNOINTR:
- case -ERESTARTNOHAND:
- case -ERESTART_RESTARTBLOCK:
- /*
- * We can't just restart the syscall, since previously
- * submitted sqes may already be in progress. Just fail this
- * IO with EINTR.
- */
- ret = -EINTR;
- fallthrough;
- default:
- kiocb->ki_complete(kiocb, ret);
- }
-}
-
-static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
-{
- struct kiocb *kiocb = &req->rw.kiocb;
-
- if (kiocb->ki_pos != -1)
- return &kiocb->ki_pos;
-
- if (!(req->file->f_mode & FMODE_STREAM)) {
- req->flags |= REQ_F_CUR_POS;
- kiocb->ki_pos = req->file->f_pos;
- return &kiocb->ki_pos;
- }
-
- kiocb->ki_pos = 0;
- return NULL;
-}
-
-static void kiocb_done(struct io_kiocb *req, ssize_t ret,
- unsigned int issue_flags)
-{
- struct io_async_rw *io = req->async_data;
-
- /* add previously done IO, if any */
- if (req_has_async_data(req) && io->bytes_done > 0) {
- if (ret < 0)
- ret = io->bytes_done;
- else
- ret += io->bytes_done;
- }
-
- if (req->flags & REQ_F_CUR_POS)
- req->file->f_pos = req->rw.kiocb.ki_pos;
- if (ret >= 0 && (req->rw.kiocb.ki_complete == io_complete_rw))
- __io_complete_rw(req, ret, issue_flags);
- else
- io_rw_done(&req->rw.kiocb, ret);
-
- if (req->flags & REQ_F_REISSUE) {
- req->flags &= ~REQ_F_REISSUE;
- if (io_resubmit_prep(req))
- io_req_task_queue_reissue(req);
- else
- io_req_task_queue_fail(req, ret);
- }
-}
-
-static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter,
- struct io_mapped_ubuf *imu)
-{
- size_t len = req->rw.len;
- u64 buf_end, buf_addr = req->rw.addr;
- size_t offset;
-
- if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
- return -EFAULT;
- /* not inside the mapped region */
- if (unlikely(buf_addr < imu->ubuf || buf_end > imu->ubuf_end))
- return -EFAULT;
-
- /*
- * May not be the start of the buffer, set size appropriately
- * and advance us to the beginning.
- */
- offset = buf_addr - imu->ubuf;
- iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
-
- if (offset) {
- /*
- * Don't use iov_iter_advance() here, as it's really slow for
- * using the latter parts of a big fixed buffer - it iterates
- * over each segment manually. We can cheat a bit here, because
- * we know that:
- *
- * 1) it's a BVEC iter, we set it up
- * 2) all bvecs are PAGE_SIZE in size, except potentially the
- * first and last bvec
- *
- * So just find our index, and adjust the iterator afterwards.
- * If the offset is within the first bvec (or the whole first
- * bvec), just use iov_iter_advance(). This makes it easier
- * since we can just skip the first segment, which may not
- * be PAGE_SIZE aligned.
- */
- const struct bio_vec *bvec = imu->bvec;
-
- if (offset <= bvec->bv_len) {
- iov_iter_advance(iter, offset);
- } else {
- unsigned long seg_skip;
-
- /* skip first vec */
- offset -= bvec->bv_len;
- seg_skip = 1 + (offset >> PAGE_SHIFT);
-
- iter->bvec = bvec + seg_skip;
- iter->nr_segs -= seg_skip;
- iter->count -= bvec->bv_len + offset;
- iter->iov_offset = offset & ~PAGE_MASK;
- }
- }
-
- return 0;
-}
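
A hedged illustration, not taken from this patch, of what feeds __io_import_fixed() above: userspace registers buffers once and then refers to them by index from IORING_OP_READ_FIXED / IORING_OP_WRITE_FIXED. The liburing sketch below assumes liburing is available; the path, sizes and buffer index are example values only.

#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

static int read_fixed_example(const char *path)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct iovec iov;
        int fd, ret;

        if (io_uring_queue_init(4, &ring, 0) < 0)
                return -1;

        iov.iov_len = 64 * 1024;
        iov.iov_base = malloc(iov.iov_len);
        /* pins the pages and builds the io_mapped_ubuf consulted above */
        ret = io_uring_register_buffers(&ring, &iov, 1);
        if (ret < 0)
                return ret;

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -1;
        sqe = io_uring_get_sqe(&ring);
        /* buf_index 0 names the registered iovec; addr/len must fall inside
         * it, which is exactly what __io_import_fixed() range-checks */
        io_uring_prep_read_fixed(sqe, fd, iov.iov_base, 4096, 0, 0);
        io_uring_submit(&ring);

        ret = io_uring_wait_cqe(&ring, &cqe);
        if (!ret) {
                ret = cqe->res;
                io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return ret;
}
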
-
-static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter,
- unsigned int issue_flags)
-{
- if (WARN_ON_ONCE(!req->imu))
- return -EFAULT;
- return __io_import_fixed(req, rw, iter, req->imu);
-}
-
-static void io_ring_submit_unlock(struct io_ring_ctx *ctx, bool needs_lock)
-{
- if (needs_lock)
- mutex_unlock(&ctx->uring_lock);
-}
-
-static void io_ring_submit_lock(struct io_ring_ctx *ctx, bool needs_lock)
-{
- /*
- * "Normal" inline submissions always hold the uring_lock, since we
- * grab it from the system call. Same is true for the SQPOLL offload.
- * The only exception is when we've detached the request and issue it
- * from an async worker thread, grab the lock for that case.
- */
- if (needs_lock)
- mutex_lock(&ctx->uring_lock);
-}
-
-static void io_buffer_add_list(struct io_ring_ctx *ctx,
- struct io_buffer_list *bl, unsigned int bgid)
-{
- struct list_head *list;
-
- list = &ctx->io_buffers[hash_32(bgid, IO_BUFFERS_HASH_BITS)];
- INIT_LIST_HEAD(&bl->buf_list);
- bl->bgid = bgid;
- list_add(&bl->list, list);
-}
-
-static struct io_buffer *io_buffer_select(struct io_kiocb *req, size_t *len,
- int bgid, unsigned int issue_flags)
-{
- struct io_buffer *kbuf = req->kbuf;
- bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
- struct io_ring_ctx *ctx = req->ctx;
- struct io_buffer_list *bl;
-
- if (req->flags & REQ_F_BUFFER_SELECTED)
- return kbuf;
-
- io_ring_submit_lock(ctx, needs_lock);
-
- lockdep_assert_held(&ctx->uring_lock);
-
- bl = io_buffer_get_list(ctx, bgid);
- if (bl && !list_empty(&bl->buf_list)) {
- kbuf = list_first_entry(&bl->buf_list, struct io_buffer, list);
- list_del(&kbuf->list);
- if (*len > kbuf->len)
- *len = kbuf->len;
- req->flags |= REQ_F_BUFFER_SELECTED;
- req->kbuf = kbuf;
- } else {
- kbuf = ERR_PTR(-ENOBUFS);
- }
-
- io_ring_submit_unlock(req->ctx, needs_lock);
- return kbuf;
-}
-
-static void __user *io_rw_buffer_select(struct io_kiocb *req, size_t *len,
- unsigned int issue_flags)
-{
- struct io_buffer *kbuf;
- u16 bgid;
-
- bgid = req->buf_index;
- kbuf = io_buffer_select(req, len, bgid, issue_flags);
- if (IS_ERR(kbuf))
- return kbuf;
- return u64_to_user_ptr(kbuf->addr);
-}
-
-#ifdef CONFIG_COMPAT
-static ssize_t io_compat_import(struct io_kiocb *req, struct iovec *iov,
- unsigned int issue_flags)
-{
- struct compat_iovec __user *uiov;
- compat_ssize_t clen;
- void __user *buf;
- ssize_t len;
-
- uiov = u64_to_user_ptr(req->rw.addr);
- if (!access_ok(uiov, sizeof(*uiov)))
- return -EFAULT;
- if (__get_user(clen, &uiov->iov_len))
- return -EFAULT;
- if (clen < 0)
- return -EINVAL;
-
- len = clen;
- buf = io_rw_buffer_select(req, &len, issue_flags);
- if (IS_ERR(buf))
- return PTR_ERR(buf);
- iov[0].iov_base = buf;
- iov[0].iov_len = (compat_size_t) len;
- return 0;
-}
-#endif
-
-static ssize_t __io_iov_buffer_select(struct io_kiocb *req, struct iovec *iov,
- unsigned int issue_flags)
-{
- struct iovec __user *uiov = u64_to_user_ptr(req->rw.addr);
- void __user *buf;
- ssize_t len;
-
- if (copy_from_user(iov, uiov, sizeof(*uiov)))
- return -EFAULT;
-
- len = iov[0].iov_len;
- if (len < 0)
- return -EINVAL;
- buf = io_rw_buffer_select(req, &len, issue_flags);
- if (IS_ERR(buf))
- return PTR_ERR(buf);
- iov[0].iov_base = buf;
- iov[0].iov_len = len;
- return 0;
-}
-
-static ssize_t io_iov_buffer_select(struct io_kiocb *req, struct iovec *iov,
- unsigned int issue_flags)
-{
- if (req->flags & REQ_F_BUFFER_SELECTED) {
- struct io_buffer *kbuf = req->kbuf;
-
- iov[0].iov_base = u64_to_user_ptr(kbuf->addr);
- iov[0].iov_len = kbuf->len;
- return 0;
- }
- if (req->rw.len != 1)
- return -EINVAL;
-
-#ifdef CONFIG_COMPAT
- if (req->ctx->compat)
- return io_compat_import(req, iov, issue_flags);
-#endif
-
- return __io_iov_buffer_select(req, iov, issue_flags);
-}
-
-static inline bool io_do_buffer_select(struct io_kiocb *req)
-{
- if (!(req->flags & REQ_F_BUFFER_SELECT))
- return false;
- return !(req->flags & REQ_F_BUFFER_SELECTED);
-}
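
The helpers above are the kernel side of "provided buffers". A hedged sketch of the matching userspace flow, not taken from this patch: buffers are handed to the kernel with IORING_OP_PROVIDE_BUFFERS, and a later read flagged IOSQE_BUFFER_SELECT lets io_buffer_select() pick one out of the group. liburing is assumed; the group id, buffer count and sizes are made-up values.

#include <stdlib.h>
#include <liburing.h>

#define BGID    7
#define NBUFS   8
#define BSIZE   4096

static int provide_and_read(struct io_uring *ring, int fd)
{
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        void *pool = malloc(NBUFS * BSIZE);
        int bid, ret;

        /* producer side: populates the io_buffer_list walked by
         * io_buffer_select() */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_provide_buffers(sqe, pool, BSIZE, NBUFS, BGID, 0);
        io_uring_submit(ring);
        io_uring_wait_cqe(ring, &cqe);
        io_uring_cqe_seen(ring, cqe);

        /* consumer side: no buffer passed in, the kernel picks one from BGID */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_read(sqe, fd, NULL, BSIZE, 0);
        sqe->flags |= IOSQE_BUFFER_SELECT;
        sqe->buf_group = BGID;
        io_uring_submit(ring);

        ret = io_uring_wait_cqe(ring, &cqe);
        if (!ret) {
                /* the chosen buffer id is reported in the CQE flags */
                bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
                (void)bid;
                ret = cqe->res;
                io_uring_cqe_seen(ring, cqe);
        }
        return ret;
}
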
-
-static struct iovec *__io_import_iovec(int rw, struct io_kiocb *req,
- struct io_rw_state *s,
- unsigned int issue_flags)
-{
- struct iov_iter *iter = &s->iter;
- u8 opcode = req->opcode;
- struct iovec *iovec;
- void __user *buf;
- size_t sqe_len;
- ssize_t ret;
-
- if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) {
- ret = io_import_fixed(req, rw, iter, issue_flags);
- if (ret)
- return ERR_PTR(ret);
- return NULL;
- }
-
- /* buffer index only valid with fixed read/write, or buffer select */
- if (unlikely(req->buf_index && !(req->flags & REQ_F_BUFFER_SELECT)))
- return ERR_PTR(-EINVAL);
-
- buf = u64_to_user_ptr(req->rw.addr);
- sqe_len = req->rw.len;
-
- if (opcode == IORING_OP_READ || opcode == IORING_OP_WRITE) {
- if (req->flags & REQ_F_BUFFER_SELECT) {
- buf = io_rw_buffer_select(req, &sqe_len, issue_flags);
- if (IS_ERR(buf))
- return ERR_CAST(buf);
- req->rw.len = sqe_len;
- }
-
- ret = import_single_range(rw, buf, sqe_len, s->fast_iov, iter);
- if (ret)
- return ERR_PTR(ret);
- return NULL;
- }
-
- iovec = s->fast_iov;
- if (req->flags & REQ_F_BUFFER_SELECT) {
- ret = io_iov_buffer_select(req, iovec, issue_flags);
- if (ret)
- return ERR_PTR(ret);
- iov_iter_init(iter, rw, iovec, 1, iovec->iov_len);
- return NULL;
- }
-
- ret = __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, &iovec, iter,
- req->ctx->compat);
- if (unlikely(ret < 0))
- return ERR_PTR(ret);
- return iovec;
-}
-
-static inline int io_import_iovec(int rw, struct io_kiocb *req,
- struct iovec **iovec, struct io_rw_state *s,
- unsigned int issue_flags)
-{
- *iovec = __io_import_iovec(rw, req, s, issue_flags);
- if (unlikely(IS_ERR(*iovec)))
- return PTR_ERR(*iovec);
-
- iov_iter_save_state(&s->iter, &s->iter_state);
- return 0;
-}
-
-static inline loff_t *io_kiocb_ppos(struct kiocb *kiocb)
-{
- return (kiocb->ki_filp->f_mode & FMODE_STREAM) ? NULL : &kiocb->ki_pos;
-}
-
-/*
- * For files that don't have ->read_iter() and ->write_iter(), handle them
- * by looping over ->read() or ->write() manually.
- */
-static ssize_t loop_rw_iter(int rw, struct io_kiocb *req, struct iov_iter *iter)
-{
- struct kiocb *kiocb = &req->rw.kiocb;
- struct file *file = req->file;
- ssize_t ret = 0;
- loff_t *ppos;
-
- /*
- * Don't support polled IO through this interface, and we can't
- * support non-blocking either. For the latter, this just causes
- * the kiocb to be handled from an async context.
- */
- if (kiocb->ki_flags & IOCB_HIPRI)
- return -EOPNOTSUPP;
- if ((kiocb->ki_flags & IOCB_NOWAIT) &&
- !(kiocb->ki_filp->f_flags & O_NONBLOCK))
- return -EAGAIN;
-
- ppos = io_kiocb_ppos(kiocb);
-
- while (iov_iter_count(iter)) {
- struct iovec iovec;
- ssize_t nr;
-
- if (!iov_iter_is_bvec(iter)) {
- iovec = iov_iter_iovec(iter);
- } else {
- iovec.iov_base = u64_to_user_ptr(req->rw.addr);
- iovec.iov_len = req->rw.len;
- }
-
- if (rw == READ) {
- nr = file->f_op->read(file, iovec.iov_base,
- iovec.iov_len, ppos);
- } else {
- nr = file->f_op->write(file, iovec.iov_base,
- iovec.iov_len, ppos);
- }
-
- if (nr < 0) {
- if (!ret)
- ret = nr;
- break;
- }
- ret += nr;
- if (!iov_iter_is_bvec(iter)) {
- iov_iter_advance(iter, nr);
- } else {
- req->rw.addr += nr;
- req->rw.len -= nr;
- if (!req->rw.len)
- break;
- }
- if (nr != iovec.iov_len)
- break;
- }
-
- return ret;
-}
-
-static void io_req_map_rw(struct io_kiocb *req, const struct iovec *iovec,
- const struct iovec *fast_iov, struct iov_iter *iter)
-{
- struct io_async_rw *rw = req->async_data;
-
- memcpy(&rw->s.iter, iter, sizeof(*iter));
- rw->free_iovec = iovec;
- rw->bytes_done = 0;
- /* can only be fixed buffers, no need to do anything */
- if (iov_iter_is_bvec(iter))
- return;
- if (!iovec) {
- unsigned iov_off = 0;
-
- rw->s.iter.iov = rw->s.fast_iov;
- if (iter->iov != fast_iov) {
- iov_off = iter->iov - fast_iov;
- rw->s.iter.iov += iov_off;
- }
- if (rw->s.fast_iov != fast_iov)
- memcpy(rw->s.fast_iov + iov_off, fast_iov + iov_off,
- sizeof(struct iovec) * iter->nr_segs);
- } else {
- req->flags |= REQ_F_NEED_CLEANUP;
- }
-}
-
-static inline bool io_alloc_async_data(struct io_kiocb *req)
-{
- WARN_ON_ONCE(!io_op_defs[req->opcode].async_size);
- req->async_data = kmalloc(io_op_defs[req->opcode].async_size, GFP_KERNEL);
- if (req->async_data) {
- req->flags |= REQ_F_ASYNC_DATA;
- return false;
- }
- return true;
-}
-
-static int io_setup_async_rw(struct io_kiocb *req, const struct iovec *iovec,
- struct io_rw_state *s, bool force)
-{
- if (!force && !io_op_defs[req->opcode].needs_async_setup)
- return 0;
- if (!req_has_async_data(req)) {
- struct io_async_rw *iorw;
-
- if (io_alloc_async_data(req)) {
- kfree(iovec);
- return -ENOMEM;
- }
-
- io_req_map_rw(req, iovec, s->fast_iov, &s->iter);
- iorw = req->async_data;
- /* we've copied and mapped the iter, ensure state is saved */
- iov_iter_save_state(&iorw->s.iter, &iorw->s.iter_state);
- }
- return 0;
-}
-
-static inline int io_rw_prep_async(struct io_kiocb *req, int rw)
-{
- struct io_async_rw *iorw = req->async_data;
- struct iovec *iov;
- int ret;
-
- /* submission path, ->uring_lock should already be taken */
- ret = io_import_iovec(rw, req, &iov, &iorw->s, 0);
- if (unlikely(ret < 0))
- return ret;
-
- iorw->bytes_done = 0;
- iorw->free_iovec = iov;
- if (iov)
- req->flags |= REQ_F_NEED_CLEANUP;
- return 0;
-}
-
-/*
- * This is our waitqueue callback handler, registered through __folio_lock_async()
- * when we initially tried to do the IO with the iocb and armed our waitqueue.
- * This gets called when the page is unlocked, and we generally expect that to
- * happen when the page IO is completed and the page is now uptodate. This will
- * queue a task_work based retry of the operation, attempting to copy the data
- * again. If the latter fails because the page was NOT uptodate, then we will
- * do a thread based blocking retry of the operation. That's the unexpected
- * slow path.
- */
-static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
- int sync, void *arg)
-{
- struct wait_page_queue *wpq;
- struct io_kiocb *req = wait->private;
- struct wait_page_key *key = arg;
-
- wpq = container_of(wait, struct wait_page_queue, wait);
-
- if (!wake_page_match(wpq, key))
- return 0;
-
- req->rw.kiocb.ki_flags &= ~IOCB_WAITQ;
- list_del_init(&wait->entry);
- io_req_task_queue(req);
- return 1;
-}
-
-/*
- * This controls whether a given IO request should be armed for async page
- * based retry. If we return false here, the request is handed to the async
- * worker threads for retry. If we're doing buffered reads on a regular file,
- * we prepare a private wait_page_queue entry and retry the operation. This
- * will either succeed because the page is now uptodate and unlocked, or it
- * will register a callback when the page is unlocked at IO completion. Through
- * that callback, io_uring uses task_work to setup a retry of the operation.
- * That retry will attempt the buffered read again. The retry will generally
- * succeed, or in rare cases where it fails, we then fall back to using the
- * async worker threads for a blocking retry.
- */
-static bool io_rw_should_retry(struct io_kiocb *req)
-{
- struct io_async_rw *rw = req->async_data;
- struct wait_page_queue *wait = &rw->wpq;
- struct kiocb *kiocb = &req->rw.kiocb;
-
- /* never retry for NOWAIT, we just complete with -EAGAIN */
- if (req->flags & REQ_F_NOWAIT)
- return false;
-
- /* Only for buffered IO */
- if (kiocb->ki_flags & (IOCB_DIRECT | IOCB_HIPRI))
- return false;
-
- /*
- * just use poll if we can, and don't attempt if the fs doesn't
- * support callback based unlocks
- */
- if (file_can_poll(req->file) || !(req->file->f_mode & FMODE_BUF_RASYNC))
- return false;
-
- wait->wait.func = io_async_buf_func;
- wait->wait.private = req;
- wait->wait.flags = 0;
- INIT_LIST_HEAD(&wait->wait.entry);
- kiocb->ki_flags |= IOCB_WAITQ;
- kiocb->ki_flags &= ~IOCB_NOWAIT;
- kiocb->ki_waitq = wait;
- return true;
-}
-
-static inline int io_iter_do_read(struct io_kiocb *req, struct iov_iter *iter)
-{
- if (likely(req->file->f_op->read_iter))
- return call_read_iter(req->file, &req->rw.kiocb, iter);
- else if (req->file->f_op->read)
- return loop_rw_iter(READ, req, iter);
- else
- return -EINVAL;
-}
-
-static bool need_read_all(struct io_kiocb *req)
-{
- return req->flags & REQ_F_ISREG ||
- S_ISBLK(file_inode(req->file)->i_mode);
-}
-
-static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
-{
- struct kiocb *kiocb = &req->rw.kiocb;
- struct io_ring_ctx *ctx = req->ctx;
- struct file *file = req->file;
- int ret;
-
- if (unlikely(!file || !(file->f_mode & mode)))
- return -EBADF;
-
- if (!io_req_ffs_set(req))
- req->flags |= io_file_get_flags(file) << REQ_F_SUPPORT_NOWAIT_BIT;
-
- kiocb->ki_flags = iocb_flags(file);
- ret = kiocb_set_rw_flags(kiocb, req->rw.flags);
- if (unlikely(ret))
- return ret;
-
- /*
- * If the file is marked O_NONBLOCK, still allow retry for it if it
- * supports async. Otherwise it's impossible to use O_NONBLOCK files
- * reliably. If not, or if IOCB_NOWAIT is set, don't retry.
- */
- if ((kiocb->ki_flags & IOCB_NOWAIT) ||
- ((file->f_flags & O_NONBLOCK) && !io_file_supports_nowait(req)))
- req->flags |= REQ_F_NOWAIT;
-
- if (ctx->flags & IORING_SETUP_IOPOLL) {
- if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll)
- return -EOPNOTSUPP;
-
- kiocb->private = NULL;
- kiocb->ki_flags |= IOCB_HIPRI | IOCB_ALLOC_CACHE;
- kiocb->ki_complete = io_complete_rw_iopoll;
- req->iopoll_completed = 0;
- } else {
- if (kiocb->ki_flags & IOCB_HIPRI)
- return -EINVAL;
- kiocb->ki_complete = io_complete_rw;
- }
-
- return 0;
-}
-
-static int io_read(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_rw_state __s, *s = &__s;
- struct iovec *iovec;
- struct kiocb *kiocb = &req->rw.kiocb;
- bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
- struct io_async_rw *rw;
- ssize_t ret, ret2;
- loff_t *ppos;
-
- if (!req_has_async_data(req)) {
- ret = io_import_iovec(READ, req, &iovec, s, issue_flags);
- if (unlikely(ret < 0))
- return ret;
- } else {
- rw = req->async_data;
- s = &rw->s;
-
- /*
- * Safe and required to re-import if we're using provided
- * buffers, as we dropped the selected one before retry.
- */
- if (io_do_buffer_select(req)) {
- ret = io_import_iovec(READ, req, &iovec, s, issue_flags);
- if (unlikely(ret < 0))
- return ret;
- }
-
- /*
- * We come here from an earlier attempt, restore our state to
- * match in case it doesn't. It's cheap enough that we don't
- * need to make this conditional.
- */
- iov_iter_restore(&s->iter, &s->iter_state);
- iovec = NULL;
- }
- ret = io_rw_init_file(req, FMODE_READ);
- if (unlikely(ret)) {
- kfree(iovec);
- return ret;
- }
- req->result = iov_iter_count(&s->iter);
-
- if (force_nonblock) {
- /* If the file doesn't support async, just async punt */
- if (unlikely(!io_file_supports_nowait(req))) {
- ret = io_setup_async_rw(req, iovec, s, true);
- return ret ?: -EAGAIN;
- }
- kiocb->ki_flags |= IOCB_NOWAIT;
- } else {
- /* Ensure we clear previously set non-block flag */
- kiocb->ki_flags &= ~IOCB_NOWAIT;
- }
-
- ppos = io_kiocb_update_pos(req);
-
- ret = rw_verify_area(READ, req->file, ppos, req->result);
- if (unlikely(ret)) {
- kfree(iovec);
- return ret;
- }
-
- ret = io_iter_do_read(req, &s->iter);
-
- if (ret == -EAGAIN || (req->flags & REQ_F_REISSUE)) {
- req->flags &= ~REQ_F_REISSUE;
- /* if we can poll, just do that */
- if (req->opcode == IORING_OP_READ && file_can_poll(req->file))
- return -EAGAIN;
- /* IOPOLL retry should happen for io-wq threads */
- if (!force_nonblock && !(req->ctx->flags & IORING_SETUP_IOPOLL))
- goto done;
- /* no retry on NONBLOCK nor RWF_NOWAIT */
- if (req->flags & REQ_F_NOWAIT)
- goto done;
- ret = 0;
- } else if (ret == -EIOCBQUEUED) {
- goto out_free;
- } else if (ret == req->result || ret <= 0 || !force_nonblock ||
- (req->flags & REQ_F_NOWAIT) || !need_read_all(req)) {
- /* read all, failed, already did sync or don't want to retry */
- goto done;
- }
-
- /*
- * Don't depend on the iter state matching what was consumed, or being
- * untouched in case of error. Restore it and we'll advance it
- * manually if we need to.
- */
- iov_iter_restore(&s->iter, &s->iter_state);
-
- ret2 = io_setup_async_rw(req, iovec, s, true);
- if (ret2)
- return ret2;
-
- iovec = NULL;
- rw = req->async_data;
- s = &rw->s;
- /*
- * Now use our persistent iterator and state, if we aren't already.
- * We've restored and mapped the iter to match.
- */
-
- do {
- /*
- * We end up here because of a partial read, either from
- * above or inside this loop. Advance the iter by the bytes
- * that were consumed.
- */
- iov_iter_advance(&s->iter, ret);
- if (!iov_iter_count(&s->iter))
- break;
- rw->bytes_done += ret;
- iov_iter_save_state(&s->iter, &s->iter_state);
-
- /* if we can retry, do so with the callbacks armed */
- if (!io_rw_should_retry(req)) {
- kiocb->ki_flags &= ~IOCB_WAITQ;
- return -EAGAIN;
- }
-
- /*
- * Now retry read with the IOCB_WAITQ parts set in the iocb. If
- * we get -EIOCBQUEUED, then we'll get a notification when the
- * desired page gets unlocked. We can also get a partial read
- * here, and if we do, then just retry at the new offset.
- */
- ret = io_iter_do_read(req, &s->iter);
- if (ret == -EIOCBQUEUED)
- return 0;
- /* we got some bytes, but not all. retry. */
- kiocb->ki_flags &= ~IOCB_WAITQ;
- iov_iter_restore(&s->iter, &s->iter_state);
- } while (ret > 0);
-done:
- kiocb_done(req, ret, issue_flags);
-out_free:
- /* it's faster to check here than to delegate to kfree */
- if (iovec)
- kfree(iovec);
- return 0;
-}
-
-static int io_write(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_rw_state __s, *s = &__s;
- struct iovec *iovec;
- struct kiocb *kiocb = &req->rw.kiocb;
- bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
- ssize_t ret, ret2;
- loff_t *ppos;
-
- if (!req_has_async_data(req)) {
- ret = io_import_iovec(WRITE, req, &iovec, s, issue_flags);
- if (unlikely(ret < 0))
- return ret;
- } else {
- struct io_async_rw *rw = req->async_data;
-
- s = &rw->s;
- iov_iter_restore(&s->iter, &s->iter_state);
- iovec = NULL;
- }
- ret = io_rw_init_file(req, FMODE_WRITE);
- if (unlikely(ret)) {
- kfree(iovec);
- return ret;
- }
- req->result = iov_iter_count(&s->iter);
-
- if (force_nonblock) {
- /* If the file doesn't support async, just async punt */
- if (unlikely(!io_file_supports_nowait(req)))
- goto copy_iov;
-
- /* file path doesn't support NOWAIT for non-direct_IO */
- if (force_nonblock && !(kiocb->ki_flags & IOCB_DIRECT) &&
- (req->flags & REQ_F_ISREG))
- goto copy_iov;
-
- kiocb->ki_flags |= IOCB_NOWAIT;
- } else {
- /* Ensure we clear previously set non-block flag */
- kiocb->ki_flags &= ~IOCB_NOWAIT;
- }
-
- ppos = io_kiocb_update_pos(req);
-
- ret = rw_verify_area(WRITE, req->file, ppos, req->result);
- if (unlikely(ret))
- goto out_free;
-
- /*
- * Open-code file_start_write here to grab freeze protection,
- * which will be released by another thread in
- * io_complete_rw(). Fool lockdep by telling it the lock got
- * released so that it doesn't complain about the held lock when
- * we return to userspace.
- */
- if (req->flags & REQ_F_ISREG) {
- sb_start_write(file_inode(req->file)->i_sb);
- __sb_writers_release(file_inode(req->file)->i_sb,
- SB_FREEZE_WRITE);
- }
- kiocb->ki_flags |= IOCB_WRITE;
-
- if (likely(req->file->f_op->write_iter))
- ret2 = call_write_iter(req->file, kiocb, &s->iter);
- else if (req->file->f_op->write)
- ret2 = loop_rw_iter(WRITE, req, &s->iter);
- else
- ret2 = -EINVAL;
-
- if (req->flags & REQ_F_REISSUE) {
- req->flags &= ~REQ_F_REISSUE;
- ret2 = -EAGAIN;
- }
-
- /*
- * Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just
- * retry them without IOCB_NOWAIT.
- */
- if (ret2 == -EOPNOTSUPP && (kiocb->ki_flags & IOCB_NOWAIT))
- ret2 = -EAGAIN;
- /* no retry on NONBLOCK nor RWF_NOWAIT */
- if (ret2 == -EAGAIN && (req->flags & REQ_F_NOWAIT))
- goto done;
- if (!force_nonblock || ret2 != -EAGAIN) {
- /* IOPOLL retry should happen for io-wq threads */
- if (ret2 == -EAGAIN && (req->ctx->flags & IORING_SETUP_IOPOLL))
- goto copy_iov;
-done:
- kiocb_done(req, ret2, issue_flags);
- } else {
-copy_iov:
- iov_iter_restore(&s->iter, &s->iter_state);
- ret = io_setup_async_rw(req, iovec, s, false);
- return ret ?: -EAGAIN;
- }
-out_free:
- /* it's reportedly faster than delegating the null check to kfree() */
- if (iovec)
- kfree(iovec);
- return ret;
-}
-
-static int io_renameat_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_rename *ren = &req->rename;
- const char __user *oldf, *newf;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->buf_index || sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->flags & REQ_F_FIXED_FILE))
- return -EBADF;
-
- ren->old_dfd = READ_ONCE(sqe->fd);
- oldf = u64_to_user_ptr(READ_ONCE(sqe->addr));
- newf = u64_to_user_ptr(READ_ONCE(sqe->addr2));
- ren->new_dfd = READ_ONCE(sqe->len);
- ren->flags = READ_ONCE(sqe->rename_flags);
-
- ren->oldpath = getname(oldf);
- if (IS_ERR(ren->oldpath))
- return PTR_ERR(ren->oldpath);
-
- ren->newpath = getname(newf);
- if (IS_ERR(ren->newpath)) {
- putname(ren->oldpath);
- return PTR_ERR(ren->newpath);
- }
-
- req->flags |= REQ_F_NEED_CLEANUP;
- return 0;
-}
-
-static int io_renameat(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_rename *ren = &req->rename;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- ret = do_renameat2(ren->old_dfd, ren->oldpath, ren->new_dfd,
- ren->newpath, ren->flags);
-
- req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int io_unlinkat_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_unlink *un = &req->unlink;
- const char __user *fname;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->off || sqe->len || sqe->buf_index ||
- sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->flags & REQ_F_FIXED_FILE))
- return -EBADF;
-
- un->dfd = READ_ONCE(sqe->fd);
-
- un->flags = READ_ONCE(sqe->unlink_flags);
- if (un->flags & ~AT_REMOVEDIR)
- return -EINVAL;
-
- fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
- un->filename = getname(fname);
- if (IS_ERR(un->filename))
- return PTR_ERR(un->filename);
-
- req->flags |= REQ_F_NEED_CLEANUP;
- return 0;
-}
-
-static int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_unlink *un = &req->unlink;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- if (un->flags & AT_REMOVEDIR)
- ret = do_rmdir(un->dfd, un->filename);
- else
- ret = do_unlinkat(un->dfd, un->filename);
-
- req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int io_mkdirat_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_mkdir *mkd = &req->mkdir;
- const char __user *fname;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->off || sqe->rw_flags || sqe->buf_index ||
- sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->flags & REQ_F_FIXED_FILE))
- return -EBADF;
-
- mkd->dfd = READ_ONCE(sqe->fd);
- mkd->mode = READ_ONCE(sqe->len);
-
- fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
- mkd->filename = getname(fname);
- if (IS_ERR(mkd->filename))
- return PTR_ERR(mkd->filename);
-
- req->flags |= REQ_F_NEED_CLEANUP;
- return 0;
-}
-
-static int io_mkdirat(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_mkdir *mkd = &req->mkdir;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- ret = do_mkdirat(mkd->dfd, mkd->filename, mkd->mode);
-
- req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int io_symlinkat_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_symlink *sl = &req->symlink;
- const char __user *oldpath, *newpath;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->len || sqe->rw_flags || sqe->buf_index ||
- sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->flags & REQ_F_FIXED_FILE))
- return -EBADF;
-
- sl->new_dfd = READ_ONCE(sqe->fd);
- oldpath = u64_to_user_ptr(READ_ONCE(sqe->addr));
- newpath = u64_to_user_ptr(READ_ONCE(sqe->addr2));
-
- sl->oldpath = getname(oldpath);
- if (IS_ERR(sl->oldpath))
- return PTR_ERR(sl->oldpath);
-
- sl->newpath = getname(newpath);
- if (IS_ERR(sl->newpath)) {
- putname(sl->oldpath);
- return PTR_ERR(sl->newpath);
- }
-
- req->flags |= REQ_F_NEED_CLEANUP;
- return 0;
-}
-
-static int io_symlinkat(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_symlink *sl = &req->symlink;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- ret = do_symlinkat(sl->oldpath, sl->new_dfd, sl->newpath);
-
- req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int io_linkat_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_hardlink *lnk = &req->hardlink;
- const char __user *oldf, *newf;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->flags & REQ_F_FIXED_FILE))
- return -EBADF;
-
- lnk->old_dfd = READ_ONCE(sqe->fd);
- lnk->new_dfd = READ_ONCE(sqe->len);
- oldf = u64_to_user_ptr(READ_ONCE(sqe->addr));
- newf = u64_to_user_ptr(READ_ONCE(sqe->addr2));
- lnk->flags = READ_ONCE(sqe->hardlink_flags);
-
- lnk->oldpath = getname(oldf);
- if (IS_ERR(lnk->oldpath))
- return PTR_ERR(lnk->oldpath);
-
- lnk->newpath = getname(newf);
- if (IS_ERR(lnk->newpath)) {
- putname(lnk->oldpath);
- return PTR_ERR(lnk->newpath);
- }
-
- req->flags |= REQ_F_NEED_CLEANUP;
- return 0;
-}
-
-static int io_linkat(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_hardlink *lnk = &req->hardlink;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- ret = do_linkat(lnk->old_dfd, lnk->oldpath, lnk->new_dfd,
- lnk->newpath, lnk->flags);
-
- req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int io_shutdown_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
-#if defined(CONFIG_NET)
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (unlikely(sqe->ioprio || sqe->off || sqe->addr || sqe->rw_flags ||
- sqe->buf_index || sqe->splice_fd_in))
- return -EINVAL;
-
- req->shutdown.how = READ_ONCE(sqe->len);
- return 0;
-#else
- return -EOPNOTSUPP;
-#endif
-}
-
-static int io_shutdown(struct io_kiocb *req, unsigned int issue_flags)
-{
-#if defined(CONFIG_NET)
- struct socket *sock;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- sock = sock_from_file(req->file);
- if (unlikely(!sock))
- return -ENOTSOCK;
-
- ret = __sys_shutdown_sock(sock, req->shutdown.how);
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-#else
- return -EOPNOTSUPP;
-#endif
-}
-
-static int __io_splice_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_splice *sp = &req->splice;
- unsigned int valid_flags = SPLICE_F_FD_IN_FIXED | SPLICE_F_ALL;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
-
- sp->len = READ_ONCE(sqe->len);
- sp->flags = READ_ONCE(sqe->splice_flags);
- if (unlikely(sp->flags & ~valid_flags))
- return -EINVAL;
- sp->splice_fd_in = READ_ONCE(sqe->splice_fd_in);
- return 0;
-}
-
-static int io_tee_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- if (READ_ONCE(sqe->splice_off_in) || READ_ONCE(sqe->off))
- return -EINVAL;
- return __io_splice_prep(req, sqe);
-}
-
-static int io_tee(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_splice *sp = &req->splice;
- struct file *out = sp->file_out;
- unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED;
- struct file *in;
- long ret = 0;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- if (sp->flags & SPLICE_F_FD_IN_FIXED)
- in = io_file_get_fixed(req, sp->splice_fd_in, issue_flags);
- else
- in = io_file_get_normal(req, sp->splice_fd_in);
- if (!in) {
- ret = -EBADF;
- goto done;
- }
-
- if (sp->len)
- ret = do_tee(in, out, sp->len, flags);
-
- if (!(sp->flags & SPLICE_F_FD_IN_FIXED))
- io_put_file(in);
-done:
- if (ret != sp->len)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int io_splice_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct io_splice *sp = &req->splice;
-
- sp->off_in = READ_ONCE(sqe->splice_off_in);
- sp->off_out = READ_ONCE(sqe->off);
- return __io_splice_prep(req, sqe);
-}
-
-static int io_splice(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_splice *sp = &req->splice;
- struct file *out = sp->file_out;
- unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED;
- loff_t *poff_in, *poff_out;
- struct file *in;
- long ret = 0;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- if (sp->flags & SPLICE_F_FD_IN_FIXED)
- in = io_file_get_fixed(req, sp->splice_fd_in, issue_flags);
- else
- in = io_file_get_normal(req, sp->splice_fd_in);
- if (!in) {
- ret = -EBADF;
- goto done;
- }
-
- poff_in = (sp->off_in == -1) ? NULL : &sp->off_in;
- poff_out = (sp->off_out == -1) ? NULL : &sp->off_out;
-
- if (sp->len)
- ret = do_splice(in, poff_in, out, poff_out, sp->len, flags);
-
- if (!(sp->flags & SPLICE_F_FD_IN_FIXED))
- io_put_file(in);
-done:
- if (ret != sp->len)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
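
A hedged userspace counterpart to io_splice() above, not taken from this patch: a liburing sketch splicing from a pipe into a file. The -1 offsets mean "no explicit offset", which is what the NULL poff_in/poff_out handling above implements; the fds and byte count are example values, and SPLICE_F_FD_IN_FIXED would be set in splice_flags if the input were a registered (fixed) file.

#include <liburing.h>

static int splice_pipe_to_file(struct io_uring *ring, int pipe_rd, int out_fd)
{
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        struct io_uring_cqe *cqe;
        int ret;

        if (!sqe)
                return -1;
        io_uring_prep_splice(sqe, pipe_rd, -1, out_fd, -1, 4096, 0);
        io_uring_submit(ring);

        ret = io_uring_wait_cqe(ring, &cqe);
        if (!ret) {
                ret = cqe->res;         /* bytes spliced, or -errno */
                io_uring_cqe_seen(ring, cqe);
        }
        return ret;
}
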
-
-/*
- * IORING_OP_NOP just posts a completion event, nothing else.
- */
-static int io_nop(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
-
- __io_req_complete(req, issue_flags, 0, 0);
- return 0;
-}
-
-static int io_msg_ring_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- if (unlikely(sqe->addr || sqe->ioprio || sqe->rw_flags ||
- sqe->splice_fd_in || sqe->buf_index || sqe->personality))
- return -EINVAL;
-
- req->msg.user_data = READ_ONCE(sqe->off);
- req->msg.len = READ_ONCE(sqe->len);
- return 0;
-}
-
-static int io_msg_ring(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_ring_ctx *target_ctx;
- struct io_msg *msg = &req->msg;
- bool filled;
- int ret;
-
- ret = -EBADFD;
- if (req->file->f_op != &io_uring_fops)
- goto done;
-
- ret = -EOVERFLOW;
- target_ctx = req->file->private_data;
-
- spin_lock(&target_ctx->completion_lock);
- filled = io_fill_cqe_aux(target_ctx, msg->user_data, msg->len, 0);
- io_commit_cqring(target_ctx);
- spin_unlock(&target_ctx->completion_lock);
-
- if (filled) {
- io_cqring_ev_posted(target_ctx);
- ret = 0;
- }
-
-done:
- if (ret < 0)
- req_set_fail(req);
- __io_req_complete(req, issue_flags, ret, 0);
- /* put file to avoid an attempt to IOPOLL the req */
- io_put_file(req->file);
- req->file = NULL;
- return 0;
-}
-
-static int io_fsync_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index ||
- sqe->splice_fd_in))
- return -EINVAL;
-
- req->sync.flags = READ_ONCE(sqe->fsync_flags);
- if (unlikely(req->sync.flags & ~IORING_FSYNC_DATASYNC))
- return -EINVAL;
-
- req->sync.off = READ_ONCE(sqe->off);
- req->sync.len = READ_ONCE(sqe->len);
- return 0;
-}
-
-static int io_fsync(struct io_kiocb *req, unsigned int issue_flags)
-{
- loff_t end = req->sync.off + req->sync.len;
- int ret;
-
- /* fsync always requires a blocking context */
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- ret = vfs_fsync_range(req->file, req->sync.off,
- end > 0 ? end : LLONG_MAX,
- req->sync.flags & IORING_FSYNC_DATASYNC);
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int io_fallocate_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- if (sqe->ioprio || sqe->buf_index || sqe->rw_flags ||
- sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
-
- req->sync.off = READ_ONCE(sqe->off);
- req->sync.len = READ_ONCE(sqe->addr);
- req->sync.mode = READ_ONCE(sqe->len);
- return 0;
-}
-
-static int io_fallocate(struct io_kiocb *req, unsigned int issue_flags)
-{
- int ret;
-
-	/* fallocate always requires a blocking context */
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
- ret = vfs_fallocate(req->file, req->sync.mode, req->sync.off,
- req->sync.len);
- if (ret < 0)
- req_set_fail(req);
- else
- fsnotify_modify(req->file);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- const char __user *fname;
- int ret;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (unlikely(sqe->ioprio || sqe->buf_index))
- return -EINVAL;
- if (unlikely(req->flags & REQ_F_FIXED_FILE))
- return -EBADF;
-
- /* open.how should be already initialised */
- if (!(req->open.how.flags & O_PATH) && force_o_largefile())
- req->open.how.flags |= O_LARGEFILE;
-
- req->open.dfd = READ_ONCE(sqe->fd);
- fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
- req->open.filename = getname(fname);
- if (IS_ERR(req->open.filename)) {
- ret = PTR_ERR(req->open.filename);
- req->open.filename = NULL;
- return ret;
- }
-
- req->open.file_slot = READ_ONCE(sqe->file_index);
- if (req->open.file_slot && (req->open.how.flags & O_CLOEXEC))
- return -EINVAL;
-
- req->open.nofile = rlimit(RLIMIT_NOFILE);
- req->flags |= REQ_F_NEED_CLEANUP;
- return 0;
-}
-
-static int io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- u64 mode = READ_ONCE(sqe->len);
- u64 flags = READ_ONCE(sqe->open_flags);
-
- req->open.how = build_open_how(flags, mode);
- return __io_openat_prep(req, sqe);
-}
-
-static int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct open_how __user *how;
- size_t len;
- int ret;
-
- how = u64_to_user_ptr(READ_ONCE(sqe->addr2));
- len = READ_ONCE(sqe->len);
- if (len < OPEN_HOW_SIZE_VER0)
- return -EINVAL;
-
- ret = copy_struct_from_user(&req->open.how, sizeof(req->open.how), how,
- len);
- if (ret)
- return ret;
-
- return __io_openat_prep(req, sqe);
-}
-
-static int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct open_flags op;
- struct file *file;
- bool resolve_nonblock, nonblock_set;
- bool fixed = !!req->open.file_slot;
- int ret;
-
- ret = build_open_flags(&req->open.how, &op);
- if (ret)
- goto err;
- nonblock_set = op.open_flag & O_NONBLOCK;
- resolve_nonblock = req->open.how.resolve & RESOLVE_CACHED;
- if (issue_flags & IO_URING_F_NONBLOCK) {
- /*
- * Don't bother trying for O_TRUNC, O_CREAT, or O_TMPFILE open,
- * it'll always -EAGAIN
- */
- if (req->open.how.flags & (O_TRUNC | O_CREAT | O_TMPFILE))
- return -EAGAIN;
- op.lookup_flags |= LOOKUP_CACHED;
- op.open_flag |= O_NONBLOCK;
- }
-
- if (!fixed) {
- ret = __get_unused_fd_flags(req->open.how.flags, req->open.nofile);
- if (ret < 0)
- goto err;
- }
-
- file = do_filp_open(req->open.dfd, req->open.filename, &op);
- if (IS_ERR(file)) {
- /*
- * We could hang on to this 'fd' on retrying, but seems like
- * marginal gain for something that is now known to be a slower
- * path. So just put it, and we'll get a new one when we retry.
- */
- if (!fixed)
- put_unused_fd(ret);
-
- ret = PTR_ERR(file);
- /* only retry if RESOLVE_CACHED wasn't already set by application */
- if (ret == -EAGAIN &&
- (!resolve_nonblock && (issue_flags & IO_URING_F_NONBLOCK)))
- return -EAGAIN;
- goto err;
- }
-
- if ((issue_flags & IO_URING_F_NONBLOCK) && !nonblock_set)
- file->f_flags &= ~O_NONBLOCK;
- fsnotify_open(file);
-
- if (!fixed)
- fd_install(ret, file);
- else
- ret = io_install_fixed_file(req, file, issue_flags,
- req->open.file_slot - 1);
-err:
- putname(req->open.filename);
- req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret < 0)
- req_set_fail(req);
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static int io_openat(struct io_kiocb *req, unsigned int issue_flags)
-{
- return io_openat2(req, issue_flags);
-}
-
-static int io_remove_buffers_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_provide_buf *p = &req->pbuf;
- u64 tmp;
-
- if (sqe->ioprio || sqe->rw_flags || sqe->addr || sqe->len || sqe->off ||
- sqe->splice_fd_in)
- return -EINVAL;
-
- tmp = READ_ONCE(sqe->fd);
- if (!tmp || tmp > USHRT_MAX)
- return -EINVAL;
-
- memset(p, 0, sizeof(*p));
- p->nbufs = tmp;
- p->bgid = READ_ONCE(sqe->buf_group);
- return 0;
-}
-
-static int __io_remove_buffers(struct io_ring_ctx *ctx,
- struct io_buffer_list *bl, unsigned nbufs)
-{
- unsigned i = 0;
-
- /* shouldn't happen */
- if (!nbufs)
- return 0;
-
- /* the head kbuf is the list itself */
- while (!list_empty(&bl->buf_list)) {
- struct io_buffer *nxt;
-
- nxt = list_first_entry(&bl->buf_list, struct io_buffer, list);
- list_del(&nxt->list);
- if (++i == nbufs)
- return i;
- cond_resched();
- }
- i++;
-
- return i;
-}
-
-static int io_remove_buffers(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_provide_buf *p = &req->pbuf;
- struct io_ring_ctx *ctx = req->ctx;
- struct io_buffer_list *bl;
- int ret = 0;
- bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
-
- io_ring_submit_lock(ctx, needs_lock);
-
- lockdep_assert_held(&ctx->uring_lock);
-
- ret = -ENOENT;
- bl = io_buffer_get_list(ctx, p->bgid);
- if (bl)
- ret = __io_remove_buffers(ctx, bl, p->nbufs);
- if (ret < 0)
- req_set_fail(req);
-
- /* complete before unlock, IOPOLL may need the lock */
- __io_req_complete(req, issue_flags, ret, 0);
- io_ring_submit_unlock(ctx, needs_lock);
- return 0;
-}
-
-static int io_provide_buffers_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- unsigned long size, tmp_check;
- struct io_provide_buf *p = &req->pbuf;
- u64 tmp;
-
- if (sqe->ioprio || sqe->rw_flags || sqe->splice_fd_in)
- return -EINVAL;
-
- tmp = READ_ONCE(sqe->fd);
- if (!tmp || tmp > USHRT_MAX)
- return -E2BIG;
- p->nbufs = tmp;
- p->addr = READ_ONCE(sqe->addr);
- p->len = READ_ONCE(sqe->len);
-
- if (check_mul_overflow((unsigned long)p->len, (unsigned long)p->nbufs,
- &size))
- return -EOVERFLOW;
- if (check_add_overflow((unsigned long)p->addr, size, &tmp_check))
- return -EOVERFLOW;
-
- size = (unsigned long)p->len * p->nbufs;
- if (!access_ok(u64_to_user_ptr(p->addr), size))
- return -EFAULT;
-
- p->bgid = READ_ONCE(sqe->buf_group);
- tmp = READ_ONCE(sqe->off);
- if (tmp > USHRT_MAX)
- return -E2BIG;
- p->bid = tmp;
- return 0;
-}
-
-static int io_refill_buffer_cache(struct io_ring_ctx *ctx)
-{
- struct io_buffer *buf;
- struct page *page;
- int bufs_in_page;
-
- /*
- * Completions that don't happen inline (eg not under uring_lock) will
- * add to ->io_buffers_comp. If we don't have any free buffers, check
- * the completion list and splice those entries first.
- */
- if (!list_empty_careful(&ctx->io_buffers_comp)) {
- spin_lock(&ctx->completion_lock);
- if (!list_empty(&ctx->io_buffers_comp)) {
- list_splice_init(&ctx->io_buffers_comp,
- &ctx->io_buffers_cache);
- spin_unlock(&ctx->completion_lock);
- return 0;
- }
- spin_unlock(&ctx->completion_lock);
- }
-
- /*
- * No free buffers and no completion entries either. Allocate a new
- * page worth of buffer entries and add those to our freelist.
- */
- page = alloc_page(GFP_KERNEL_ACCOUNT);
- if (!page)
- return -ENOMEM;
-
- list_add(&page->lru, &ctx->io_buffers_pages);
-
- buf = page_address(page);
- bufs_in_page = PAGE_SIZE / sizeof(*buf);
- while (bufs_in_page) {
- list_add_tail(&buf->list, &ctx->io_buffers_cache);
- buf++;
- bufs_in_page--;
- }
-
- return 0;
-}
-
-static int io_add_buffers(struct io_ring_ctx *ctx, struct io_provide_buf *pbuf,
- struct io_buffer_list *bl)
-{
- struct io_buffer *buf;
- u64 addr = pbuf->addr;
- int i, bid = pbuf->bid;
-
- for (i = 0; i < pbuf->nbufs; i++) {
- if (list_empty(&ctx->io_buffers_cache) &&
- io_refill_buffer_cache(ctx))
- break;
- buf = list_first_entry(&ctx->io_buffers_cache, struct io_buffer,
- list);
- list_move_tail(&buf->list, &bl->buf_list);
- buf->addr = addr;
- buf->len = min_t(__u32, pbuf->len, MAX_RW_COUNT);
- buf->bid = bid;
- buf->bgid = pbuf->bgid;
- addr += pbuf->len;
- bid++;
- cond_resched();
- }
-
- return i ? 0 : -ENOMEM;
-}
-
-static int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_provide_buf *p = &req->pbuf;
- struct io_ring_ctx *ctx = req->ctx;
- struct io_buffer_list *bl;
- int ret = 0;
- bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
-
- io_ring_submit_lock(ctx, needs_lock);
-
- lockdep_assert_held(&ctx->uring_lock);
-
- bl = io_buffer_get_list(ctx, p->bgid);
- if (unlikely(!bl)) {
- bl = kmalloc(sizeof(*bl), GFP_KERNEL);
- if (!bl) {
- ret = -ENOMEM;
- goto err;
- }
- io_buffer_add_list(ctx, bl, p->bgid);
- }
-
- ret = io_add_buffers(ctx, p, bl);
-err:
- if (ret < 0)
- req_set_fail(req);
- /* complete before unlock, IOPOLL may need the lock */
- __io_req_complete(req, issue_flags, ret, 0);
- io_ring_submit_unlock(ctx, needs_lock);
- return 0;
-}
-
-static int io_epoll_ctl_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
-#if defined(CONFIG_EPOLL)
- if (sqe->ioprio || sqe->buf_index || sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
-
- req->epoll.epfd = READ_ONCE(sqe->fd);
- req->epoll.op = READ_ONCE(sqe->len);
- req->epoll.fd = READ_ONCE(sqe->off);
-
- if (ep_op_has_event(req->epoll.op)) {
- struct epoll_event __user *ev;
-
- ev = u64_to_user_ptr(READ_ONCE(sqe->addr));
- if (copy_from_user(&req->epoll.event, ev, sizeof(*ev)))
- return -EFAULT;
- }
-
- return 0;
-#else
- return -EOPNOTSUPP;
-#endif
-}
-
-static int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags)
-{
-#if defined(CONFIG_EPOLL)
- struct io_epoll *ie = &req->epoll;
- int ret;
- bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
-
- ret = do_epoll_ctl(ie->epfd, ie->op, ie->fd, &ie->event, force_nonblock);
- if (force_nonblock && ret == -EAGAIN)
- return -EAGAIN;
-
- if (ret < 0)
- req_set_fail(req);
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-#else
- return -EOPNOTSUPP;
-#endif
-}
-
-static int io_madvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
-#if defined(CONFIG_ADVISE_SYSCALLS) && defined(CONFIG_MMU)
- if (sqe->ioprio || sqe->buf_index || sqe->off || sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
-
- req->madvise.addr = READ_ONCE(sqe->addr);
- req->madvise.len = READ_ONCE(sqe->len);
- req->madvise.advice = READ_ONCE(sqe->fadvise_advice);
- return 0;
-#else
- return -EOPNOTSUPP;
-#endif
-}
-
-static int io_madvise(struct io_kiocb *req, unsigned int issue_flags)
-{
-#if defined(CONFIG_ADVISE_SYSCALLS) && defined(CONFIG_MMU)
- struct io_madvise *ma = &req->madvise;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- ret = do_madvise(current->mm, ma->addr, ma->len, ma->advice);
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-#else
- return -EOPNOTSUPP;
-#endif
-}
-
-static int io_fadvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- if (sqe->ioprio || sqe->buf_index || sqe->addr || sqe->splice_fd_in)
- return -EINVAL;
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
-
- req->fadvise.offset = READ_ONCE(sqe->off);
- req->fadvise.len = READ_ONCE(sqe->len);
- req->fadvise.advice = READ_ONCE(sqe->fadvise_advice);
- return 0;
-}
-
-static int io_fadvise(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_fadvise *fa = &req->fadvise;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK) {
- switch (fa->advice) {
- case POSIX_FADV_NORMAL:
- case POSIX_FADV_RANDOM:
- case POSIX_FADV_SEQUENTIAL:
- break;
- default:
- return -EAGAIN;
- }
- }
-
- ret = vfs_fadvise(req->file, fa->offset, fa->len, fa->advice);
- if (ret < 0)
- req_set_fail(req);
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- const char __user *path;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->buf_index || sqe->splice_fd_in)
- return -EINVAL;
- if (req->flags & REQ_F_FIXED_FILE)
- return -EBADF;
-
- req->statx.dfd = READ_ONCE(sqe->fd);
- req->statx.mask = READ_ONCE(sqe->len);
- path = u64_to_user_ptr(READ_ONCE(sqe->addr));
- req->statx.buffer = u64_to_user_ptr(READ_ONCE(sqe->addr2));
- req->statx.flags = READ_ONCE(sqe->statx_flags);
-
- req->statx.filename = getname_flags(path,
- getname_statx_lookup_flags(req->statx.flags),
- NULL);
-
- if (IS_ERR(req->statx.filename)) {
- int ret = PTR_ERR(req->statx.filename);
-
- req->statx.filename = NULL;
- return ret;
- }
-
- req->flags |= REQ_F_NEED_CLEANUP;
- return 0;
-}
-
-static int io_statx(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_statx *ctx = &req->statx;
- int ret;
-
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- ret = do_statx(ctx->dfd, ctx->filename, ctx->flags, ctx->mask,
- ctx->buffer);
-
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->off || sqe->addr || sqe->len ||
- sqe->rw_flags || sqe->buf_index)
- return -EINVAL;
- if (req->flags & REQ_F_FIXED_FILE)
- return -EBADF;
-
- req->close.fd = READ_ONCE(sqe->fd);
- req->close.file_slot = READ_ONCE(sqe->file_index);
- if (req->close.file_slot && req->close.fd)
- return -EINVAL;
-
- return 0;
-}
-
-static int io_close(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct files_struct *files = current->files;
- struct io_close *close = &req->close;
- struct fdtable *fdt;
- struct file *file = NULL;
- int ret = -EBADF;
-
- if (req->close.file_slot) {
- ret = io_close_fixed(req, issue_flags);
- goto err;
- }
-
- spin_lock(&files->file_lock);
- fdt = files_fdtable(files);
- if (close->fd >= fdt->max_fds) {
- spin_unlock(&files->file_lock);
- goto err;
- }
- file = fdt->fd[close->fd];
- if (!file || file->f_op == &io_uring_fops) {
- spin_unlock(&files->file_lock);
- file = NULL;
- goto err;
- }
-
- /* if the file has a flush method, be safe and punt to async */
- if (file->f_op->flush && (issue_flags & IO_URING_F_NONBLOCK)) {
- spin_unlock(&files->file_lock);
- return -EAGAIN;
- }
-
- ret = __close_fd_get_file(close->fd, &file);
- spin_unlock(&files->file_lock);
- if (ret < 0) {
- if (ret == -ENOENT)
- ret = -EBADF;
- goto err;
- }
-
- /* No ->flush() or already async, safely close from here */
- ret = filp_close(file, current->files);
-err:
- if (ret < 0)
- req_set_fail(req);
- if (file)
- fput(file);
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static int io_sfr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index ||
- sqe->splice_fd_in))
- return -EINVAL;
-
- req->sync.off = READ_ONCE(sqe->off);
- req->sync.len = READ_ONCE(sqe->len);
- req->sync.flags = READ_ONCE(sqe->sync_range_flags);
- return 0;
-}
-
-static int io_sync_file_range(struct io_kiocb *req, unsigned int issue_flags)
-{
- int ret;
-
- /* sync_file_range always requires a blocking context */
- if (issue_flags & IO_URING_F_NONBLOCK)
- return -EAGAIN;
-
- ret = sync_file_range(req->file, req->sync.off, req->sync.len,
- req->sync.flags);
- if (ret < 0)
- req_set_fail(req);
- io_req_complete(req, ret);
- return 0;
-}
-
-#if defined(CONFIG_NET)
-static int io_setup_async_msg(struct io_kiocb *req,
- struct io_async_msghdr *kmsg)
-{
- struct io_async_msghdr *async_msg = req->async_data;
-
- if (async_msg)
- return -EAGAIN;
- if (io_alloc_async_data(req)) {
- kfree(kmsg->free_iov);
- return -ENOMEM;
- }
- async_msg = req->async_data;
- req->flags |= REQ_F_NEED_CLEANUP;
- memcpy(async_msg, kmsg, sizeof(*kmsg));
- async_msg->msg.msg_name = &async_msg->addr;
-	/* if we're using fast_iov, set it to the new one */
- if (!async_msg->free_iov)
- async_msg->msg.msg_iter.iov = async_msg->fast_iov;
-
- return -EAGAIN;
-}
-
-static int io_sendmsg_copy_hdr(struct io_kiocb *req,
- struct io_async_msghdr *iomsg)
-{
- iomsg->msg.msg_name = &iomsg->addr;
- iomsg->free_iov = iomsg->fast_iov;
- return sendmsg_copy_msghdr(&iomsg->msg, req->sr_msg.umsg,
- req->sr_msg.msg_flags, &iomsg->free_iov);
-}
-
-static int io_sendmsg_prep_async(struct io_kiocb *req)
-{
- int ret;
-
- ret = io_sendmsg_copy_hdr(req, req->async_data);
- if (!ret)
- req->flags |= REQ_F_NEED_CLEANUP;
- return ret;
-}
-
-static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct io_sr_msg *sr = &req->sr_msg;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (unlikely(sqe->addr2 || sqe->file_index || sqe->ioprio))
- return -EINVAL;
-
- sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
- sr->len = READ_ONCE(sqe->len);
- sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
- if (sr->msg_flags & MSG_DONTWAIT)
- req->flags |= REQ_F_NOWAIT;
-
-#ifdef CONFIG_COMPAT
- if (req->ctx->compat)
- sr->msg_flags |= MSG_CMSG_COMPAT;
-#endif
- return 0;
-}
-
-static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_async_msghdr iomsg, *kmsg;
- struct socket *sock;
- unsigned flags;
- int min_ret = 0;
- int ret;
-
- sock = sock_from_file(req->file);
- if (unlikely(!sock))
- return -ENOTSOCK;
-
- if (req_has_async_data(req)) {
- kmsg = req->async_data;
- } else {
- ret = io_sendmsg_copy_hdr(req, &iomsg);
- if (ret)
- return ret;
- kmsg = &iomsg;
- }
-
- flags = req->sr_msg.msg_flags;
- if (issue_flags & IO_URING_F_NONBLOCK)
- flags |= MSG_DONTWAIT;
- if (flags & MSG_WAITALL)
- min_ret = iov_iter_count(&kmsg->msg.msg_iter);
-
- ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
-
- if (ret < min_ret) {
- if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
- return io_setup_async_msg(req, kmsg);
- if (ret == -ERESTARTSYS)
- ret = -EINTR;
- req_set_fail(req);
- }
- /* fast path, check for non-NULL to avoid function call */
- if (kmsg->free_iov)
- kfree(kmsg->free_iov);
- req->flags &= ~REQ_F_NEED_CLEANUP;
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static int io_send(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_sr_msg *sr = &req->sr_msg;
- struct msghdr msg;
- struct iovec iov;
- struct socket *sock;
- unsigned flags;
- int min_ret = 0;
- int ret;
-
- sock = sock_from_file(req->file);
- if (unlikely(!sock))
- return -ENOTSOCK;
-
- ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
- if (unlikely(ret))
- return ret;
-
- msg.msg_name = NULL;
- msg.msg_control = NULL;
- msg.msg_controllen = 0;
- msg.msg_namelen = 0;
-
- flags = req->sr_msg.msg_flags;
- if (issue_flags & IO_URING_F_NONBLOCK)
- flags |= MSG_DONTWAIT;
- if (flags & MSG_WAITALL)
- min_ret = iov_iter_count(&msg.msg_iter);
-
- msg.msg_flags = flags;
- ret = sock_sendmsg(sock, &msg);
- if (ret < min_ret) {
- if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
- return -EAGAIN;
- if (ret == -ERESTARTSYS)
- ret = -EINTR;
- req_set_fail(req);
- }
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
- struct io_async_msghdr *iomsg)
-{
- struct io_sr_msg *sr = &req->sr_msg;
- struct iovec __user *uiov;
- size_t iov_len;
- int ret;
-
- ret = __copy_msghdr_from_user(&iomsg->msg, sr->umsg,
- &iomsg->uaddr, &uiov, &iov_len);
- if (ret)
- return ret;
-
- if (req->flags & REQ_F_BUFFER_SELECT) {
- if (iov_len > 1)
- return -EINVAL;
- if (copy_from_user(iomsg->fast_iov, uiov, sizeof(*uiov)))
- return -EFAULT;
- sr->len = iomsg->fast_iov[0].iov_len;
- iomsg->free_iov = NULL;
- } else {
- iomsg->free_iov = iomsg->fast_iov;
- ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
- &iomsg->free_iov, &iomsg->msg.msg_iter,
- false);
- if (ret > 0)
- ret = 0;
- }
-
- return ret;
-}
-
-#ifdef CONFIG_COMPAT
-static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
- struct io_async_msghdr *iomsg)
-{
- struct io_sr_msg *sr = &req->sr_msg;
- struct compat_iovec __user *uiov;
- compat_uptr_t ptr;
- compat_size_t len;
- int ret;
-
- ret = __get_compat_msghdr(&iomsg->msg, sr->umsg_compat, &iomsg->uaddr,
- &ptr, &len);
- if (ret)
- return ret;
-
- uiov = compat_ptr(ptr);
- if (req->flags & REQ_F_BUFFER_SELECT) {
- compat_ssize_t clen;
-
- if (len > 1)
- return -EINVAL;
- if (!access_ok(uiov, sizeof(*uiov)))
- return -EFAULT;
- if (__get_user(clen, &uiov->iov_len))
- return -EFAULT;
- if (clen < 0)
- return -EINVAL;
- sr->len = clen;
- iomsg->free_iov = NULL;
- } else {
- iomsg->free_iov = iomsg->fast_iov;
- ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
- UIO_FASTIOV, &iomsg->free_iov,
- &iomsg->msg.msg_iter, true);
- if (ret < 0)
- return ret;
- }
-
- return 0;
-}
-#endif
-
-static int io_recvmsg_copy_hdr(struct io_kiocb *req,
- struct io_async_msghdr *iomsg)
-{
- iomsg->msg.msg_name = &iomsg->addr;
-
-#ifdef CONFIG_COMPAT
- if (req->ctx->compat)
- return __io_compat_recvmsg_copy_hdr(req, iomsg);
-#endif
-
- return __io_recvmsg_copy_hdr(req, iomsg);
-}
-
-static struct io_buffer *io_recv_buffer_select(struct io_kiocb *req,
- unsigned int issue_flags)
-{
- struct io_sr_msg *sr = &req->sr_msg;
-
- return io_buffer_select(req, &sr->len, sr->bgid, issue_flags);
-}
-
-static int io_recvmsg_prep_async(struct io_kiocb *req)
-{
- int ret;
-
- ret = io_recvmsg_copy_hdr(req, req->async_data);
- if (!ret)
- req->flags |= REQ_F_NEED_CLEANUP;
- return ret;
-}
-
-static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct io_sr_msg *sr = &req->sr_msg;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (unlikely(sqe->addr2 || sqe->file_index || sqe->ioprio))
- return -EINVAL;
-
- sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
- sr->len = READ_ONCE(sqe->len);
- sr->bgid = READ_ONCE(sqe->buf_group);
- sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
- if (sr->msg_flags & MSG_DONTWAIT)
- req->flags |= REQ_F_NOWAIT;
-
-#ifdef CONFIG_COMPAT
- if (req->ctx->compat)
- sr->msg_flags |= MSG_CMSG_COMPAT;
-#endif
- sr->done_io = 0;
- return 0;
-}
-
-static bool io_net_retry(struct socket *sock, int flags)
-{
- if (!(flags & MSG_WAITALL))
- return false;
- return sock->type == SOCK_STREAM || sock->type == SOCK_SEQPACKET;
-}
-
-static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_async_msghdr iomsg, *kmsg;
- struct io_sr_msg *sr = &req->sr_msg;
- struct socket *sock;
- struct io_buffer *kbuf;
- unsigned flags;
- int ret, min_ret = 0;
- bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
-
- sock = sock_from_file(req->file);
- if (unlikely(!sock))
- return -ENOTSOCK;
-
- if (req_has_async_data(req)) {
- kmsg = req->async_data;
- } else {
- ret = io_recvmsg_copy_hdr(req, &iomsg);
- if (ret)
- return ret;
- kmsg = &iomsg;
- }
-
- if (req->flags & REQ_F_BUFFER_SELECT) {
- kbuf = io_recv_buffer_select(req, issue_flags);
- if (IS_ERR(kbuf))
- return PTR_ERR(kbuf);
- kmsg->fast_iov[0].iov_base = u64_to_user_ptr(kbuf->addr);
- kmsg->fast_iov[0].iov_len = req->sr_msg.len;
- iov_iter_init(&kmsg->msg.msg_iter, READ, kmsg->fast_iov,
- 1, req->sr_msg.len);
- }
-
- flags = req->sr_msg.msg_flags;
- if (force_nonblock)
- flags |= MSG_DONTWAIT;
- if (flags & MSG_WAITALL)
- min_ret = iov_iter_count(&kmsg->msg.msg_iter);
-
- ret = __sys_recvmsg_sock(sock, &kmsg->msg, req->sr_msg.umsg,
- kmsg->uaddr, flags);
- if (ret < min_ret) {
- if (ret == -EAGAIN && force_nonblock)
- return io_setup_async_msg(req, kmsg);
- if (ret == -ERESTARTSYS)
- ret = -EINTR;
- if (ret > 0 && io_net_retry(sock, flags)) {
- sr->done_io += ret;
- req->flags |= REQ_F_PARTIAL_IO;
- return io_setup_async_msg(req, kmsg);
- }
- req_set_fail(req);
- } else if ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
- req_set_fail(req);
- }
-
- /* fast path, check for non-NULL to avoid function call */
- if (kmsg->free_iov)
- kfree(kmsg->free_iov);
- req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret >= 0)
- ret += sr->done_io;
- else if (sr->done_io)
- ret = sr->done_io;
- __io_req_complete(req, issue_flags, ret, io_put_kbuf(req, issue_flags));
- return 0;
-}
-
-static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_buffer *kbuf;
- struct io_sr_msg *sr = &req->sr_msg;
- struct msghdr msg;
- void __user *buf = sr->buf;
- struct socket *sock;
- struct iovec iov;
- unsigned flags;
- int ret, min_ret = 0;
- bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
-
- sock = sock_from_file(req->file);
- if (unlikely(!sock))
- return -ENOTSOCK;
-
- if (req->flags & REQ_F_BUFFER_SELECT) {
- kbuf = io_recv_buffer_select(req, issue_flags);
- if (IS_ERR(kbuf))
- return PTR_ERR(kbuf);
- buf = u64_to_user_ptr(kbuf->addr);
- }
-
- ret = import_single_range(READ, buf, sr->len, &iov, &msg.msg_iter);
- if (unlikely(ret))
- goto out_free;
-
- msg.msg_name = NULL;
- msg.msg_control = NULL;
- msg.msg_controllen = 0;
- msg.msg_namelen = 0;
- msg.msg_iocb = NULL;
- msg.msg_flags = 0;
-
- flags = req->sr_msg.msg_flags;
- if (force_nonblock)
- flags |= MSG_DONTWAIT;
- if (flags & MSG_WAITALL)
- min_ret = iov_iter_count(&msg.msg_iter);
-
- ret = sock_recvmsg(sock, &msg, flags);
- if (ret < min_ret) {
- if (ret == -EAGAIN && force_nonblock)
- return -EAGAIN;
- if (ret == -ERESTARTSYS)
- ret = -EINTR;
- if (ret > 0 && io_net_retry(sock, flags)) {
- sr->len -= ret;
- sr->buf += ret;
- sr->done_io += ret;
- req->flags |= REQ_F_PARTIAL_IO;
- return -EAGAIN;
- }
- req_set_fail(req);
- } else if ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
-out_free:
- req_set_fail(req);
- }
-
- if (ret >= 0)
- ret += sr->done_io;
- else if (sr->done_io)
- ret = sr->done_io;
- __io_req_complete(req, issue_flags, ret, io_put_kbuf(req, issue_flags));
- return 0;
-}
-
-static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct io_accept *accept = &req->accept;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->len || sqe->buf_index)
- return -EINVAL;
-
- accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
- accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
- accept->flags = READ_ONCE(sqe->accept_flags);
- accept->nofile = rlimit(RLIMIT_NOFILE);
-
- accept->file_slot = READ_ONCE(sqe->file_index);
- if (accept->file_slot && (accept->flags & SOCK_CLOEXEC))
- return -EINVAL;
- if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
- return -EINVAL;
- if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
- accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
- return 0;
-}
-
-static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_accept *accept = &req->accept;
- bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
- unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
- bool fixed = !!accept->file_slot;
- struct file *file;
- int ret, fd;
-
- if (!fixed) {
- fd = __get_unused_fd_flags(accept->flags, accept->nofile);
- if (unlikely(fd < 0))
- return fd;
- }
- file = do_accept(req->file, file_flags, accept->addr, accept->addr_len,
- accept->flags);
- if (IS_ERR(file)) {
- if (!fixed)
- put_unused_fd(fd);
- ret = PTR_ERR(file);
- if (ret == -EAGAIN && force_nonblock)
- return -EAGAIN;
- if (ret == -ERESTARTSYS)
- ret = -EINTR;
- req_set_fail(req);
- } else if (!fixed) {
- fd_install(fd, file);
- ret = fd;
- } else {
- ret = io_install_fixed_file(req, file, issue_flags,
- accept->file_slot - 1);
- }
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static int io_connect_prep_async(struct io_kiocb *req)
-{
- struct io_async_connect *io = req->async_data;
- struct io_connect *conn = &req->connect;
-
- return move_addr_to_kernel(conn->addr, conn->addr_len, &io->address);
-}
-
-static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct io_connect *conn = &req->connect;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->len || sqe->buf_index || sqe->rw_flags ||
- sqe->splice_fd_in)
- return -EINVAL;
-
- conn->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
- conn->addr_len = READ_ONCE(sqe->addr2);
- return 0;
-}
-
-static int io_connect(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_async_connect __io, *io;
- unsigned file_flags;
- int ret;
- bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
-
- if (req_has_async_data(req)) {
- io = req->async_data;
- } else {
- ret = move_addr_to_kernel(req->connect.addr,
- req->connect.addr_len,
- &__io.address);
- if (ret)
- goto out;
- io = &__io;
- }
-
- file_flags = force_nonblock ? O_NONBLOCK : 0;
-
- ret = __sys_connect_file(req->file, &io->address,
- req->connect.addr_len, file_flags);
- if ((ret == -EAGAIN || ret == -EINPROGRESS) && force_nonblock) {
- if (req_has_async_data(req))
- return -EAGAIN;
- if (io_alloc_async_data(req)) {
- ret = -ENOMEM;
- goto out;
- }
- memcpy(req->async_data, &__io, sizeof(__io));
- return -EAGAIN;
- }
- if (ret == -ERESTARTSYS)
- ret = -EINTR;
-out:
- if (ret < 0)
- req_set_fail(req);
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-#else /* !CONFIG_NET */
-#define IO_NETOP_FN(op) \
-static int io_##op(struct io_kiocb *req, unsigned int issue_flags) \
-{ \
- return -EOPNOTSUPP; \
-}
-
-#define IO_NETOP_PREP(op) \
-IO_NETOP_FN(op) \
-static int io_##op##_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) \
-{ \
- return -EOPNOTSUPP; \
-} \
-
-#define IO_NETOP_PREP_ASYNC(op) \
-IO_NETOP_PREP(op) \
-static int io_##op##_prep_async(struct io_kiocb *req) \
-{ \
- return -EOPNOTSUPP; \
-}
-
-IO_NETOP_PREP_ASYNC(sendmsg);
-IO_NETOP_PREP_ASYNC(recvmsg);
-IO_NETOP_PREP_ASYNC(connect);
-IO_NETOP_PREP(accept);
-IO_NETOP_FN(send);
-IO_NETOP_FN(recv);
-#endif /* CONFIG_NET */
-
-struct io_poll_table {
- struct poll_table_struct pt;
- struct io_kiocb *req;
- int nr_entries;
- int error;
-};
-
-#define IO_POLL_CANCEL_FLAG BIT(31)
-#define IO_POLL_REF_MASK GENMASK(30, 0)
-
-/*
- * If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can
- * bump it and acquire ownership. Modifying a request without owning it is
- * disallowed; that prevents races when enqueueing task_work and between
- * arming poll and wakeups.
- */
-static inline bool io_poll_get_ownership(struct io_kiocb *req)
-{
- return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
-}
-
-static void io_poll_mark_cancelled(struct io_kiocb *req)
-{
- atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs);
-}
-
-static struct io_poll_iocb *io_poll_get_double(struct io_kiocb *req)
-{
- /* pure poll stashes this in ->async_data, poll driven retry elsewhere */
- if (req->opcode == IORING_OP_POLL_ADD)
- return req->async_data;
- return req->apoll->double_poll;
-}
-
-static struct io_poll_iocb *io_poll_get_single(struct io_kiocb *req)
-{
- if (req->opcode == IORING_OP_POLL_ADD)
- return &req->poll;
- return &req->apoll->poll;
-}
-
-static void io_poll_req_insert(struct io_kiocb *req)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct hlist_head *list;
-
- list = &ctx->cancel_hash[hash_long(req->user_data, ctx->cancel_hash_bits)];
- hlist_add_head(&req->hash_node, list);
-}
-
-static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
- wait_queue_func_t wake_func)
-{
- poll->head = NULL;
-#define IO_POLL_UNMASK (EPOLLERR|EPOLLHUP|EPOLLNVAL|EPOLLRDHUP)
- /* mask in events that we always want/need */
- poll->events = events | IO_POLL_UNMASK;
- INIT_LIST_HEAD(&poll->wait.entry);
- init_waitqueue_func_entry(&poll->wait, wake_func);
-}
-
-static inline void io_poll_remove_entry(struct io_poll_iocb *poll)
-{
- struct wait_queue_head *head = smp_load_acquire(&poll->head);
-
- if (head) {
- spin_lock_irq(&head->lock);
- list_del_init(&poll->wait.entry);
- poll->head = NULL;
- spin_unlock_irq(&head->lock);
- }
-}
-
-static void io_poll_remove_entries(struct io_kiocb *req)
-{
- /*
- * Nothing to do if neither of those flags are set. Avoid dipping
- * into the poll/apoll/double cachelines if we can.
- */
- if (!(req->flags & (REQ_F_SINGLE_POLL | REQ_F_DOUBLE_POLL)))
- return;
-
- /*
- * While we hold the waitqueue lock and the waitqueue is nonempty,
- * wake_up_pollfree() will wait for us. However, taking the waitqueue
- * lock in the first place can race with the waitqueue being freed.
- *
- * We solve this as eventpoll does: by taking advantage of the fact that
- * all users of wake_up_pollfree() will RCU-delay the actual free. If
- * we enter rcu_read_lock() and see that the pointer to the queue is
- * non-NULL, we can then lock it without the memory being freed out from
- * under us.
- *
- * Keep holding rcu_read_lock() as long as we hold the queue lock, in
- * case the caller deletes the entry from the queue, leaving it empty.
- * In that case, only RCU prevents the queue memory from being freed.
- */
- rcu_read_lock();
- if (req->flags & REQ_F_SINGLE_POLL)
- io_poll_remove_entry(io_poll_get_single(req));
- if (req->flags & REQ_F_DOUBLE_POLL)
- io_poll_remove_entry(io_poll_get_double(req));
- rcu_read_unlock();
-}
-
-/*
- * All poll tw should go through this. Checks for poll events, manages
- * references, does rewait, etc.
- *
- * Returns a negative error on failure. >0 when no action is required, which is
- * either a spurious wakeup or a multishot CQE being served. 0 when it's done with
- * the request, then the mask is stored in req->result.
- */
-static int io_poll_check_events(struct io_kiocb *req, bool locked)
-{
- struct io_ring_ctx *ctx = req->ctx;
- int v;
-
- /* req->task == current here, checking PF_EXITING is safe */
- if (unlikely(req->task->flags & PF_EXITING))
- io_poll_mark_cancelled(req);
-
- do {
- v = atomic_read(&req->poll_refs);
-
- /* tw handler should be the owner, and so have some references */
- if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))
- return 0;
- if (v & IO_POLL_CANCEL_FLAG)
- return -ECANCELED;
-
- if (!req->result) {
- struct poll_table_struct pt = { ._key = req->apoll_events };
- req->result = vfs_poll(req->file, &pt) & req->apoll_events;
- }
-
-		/* multishot, just fill a CQE and proceed */
- if (req->result && !(req->apoll_events & EPOLLONESHOT)) {
- __poll_t mask = mangle_poll(req->result & req->apoll_events);
- bool filled;
-
- spin_lock(&ctx->completion_lock);
- filled = io_fill_cqe_aux(ctx, req->user_data, mask,
- IORING_CQE_F_MORE);
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- if (unlikely(!filled))
- return -ECANCELED;
- io_cqring_ev_posted(ctx);
- } else if (req->result) {
- return 0;
- }
-
- /*
- * Release all references, retry if someone tried to restart
- * task_work while we were executing it.
- */
- } while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
-
- return 1;
-}
-
-static void io_poll_task_func(struct io_kiocb *req, bool *locked)
-{
- struct io_ring_ctx *ctx = req->ctx;
- int ret;
-
- ret = io_poll_check_events(req, *locked);
- if (ret > 0)
- return;
-
- if (!ret) {
- req->result = mangle_poll(req->result & req->poll.events);
- } else {
- req->result = ret;
- req_set_fail(req);
- }
-
- io_poll_remove_entries(req);
- spin_lock(&ctx->completion_lock);
- hash_del(&req->hash_node);
- __io_req_complete_post(req, req->result, 0);
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- io_cqring_ev_posted(ctx);
-}
-
-static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
-{
- struct io_ring_ctx *ctx = req->ctx;
- int ret;
-
- ret = io_poll_check_events(req, *locked);
- if (ret > 0)
- return;
-
- io_poll_remove_entries(req);
- spin_lock(&ctx->completion_lock);
- hash_del(&req->hash_node);
- spin_unlock(&ctx->completion_lock);
-
- if (!ret)
- io_req_task_submit(req, locked);
- else
- io_req_complete_failed(req, ret);
-}
-
-static void __io_poll_execute(struct io_kiocb *req, int mask,
- __poll_t __maybe_unused events)
-{
- req->result = mask;
- /*
- * This is useful for poll that is armed on behalf of another
- * request, and where the wakeup path could be on a different
- * CPU. We want to avoid pulling in req->apoll->events for that
- * case.
- */
- if (req->opcode == IORING_OP_POLL_ADD)
- req->io_task_work.func = io_poll_task_func;
- else
- req->io_task_work.func = io_apoll_task_func;
-
- trace_io_uring_task_add(req->ctx, req, req->user_data, req->opcode, mask);
- io_req_task_work_add(req, false);
-}
-
-static inline void io_poll_execute(struct io_kiocb *req, int res,
- __poll_t events)
-{
- if (io_poll_get_ownership(req))
- __io_poll_execute(req, res, events);
-}
-
-static void io_poll_cancel_req(struct io_kiocb *req)
-{
- io_poll_mark_cancelled(req);
- /* kick tw, which should complete the request */
- io_poll_execute(req, 0, 0);
-}
-
-#define wqe_to_req(wait) ((void *)((unsigned long) (wait)->private & ~1))
-#define wqe_is_double(wait) ((unsigned long) (wait)->private & 1)
-#define IO_ASYNC_POLL_COMMON (EPOLLONESHOT | POLLPRI)
-
-static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
- void *key)
-{
- struct io_kiocb *req = wqe_to_req(wait);
- struct io_poll_iocb *poll = container_of(wait, struct io_poll_iocb,
- wait);
- __poll_t mask = key_to_poll(key);
-
- if (unlikely(mask & POLLFREE)) {
- io_poll_mark_cancelled(req);
- /* we have to kick tw in case it's not already */
- io_poll_execute(req, 0, poll->events);
-
- /*
-		 * If the waitqueue is being freed early but someone already
- * holds ownership over it, we have to tear down the request as
- * best we can. That means immediately removing the request from
- * its waitqueue and preventing all further accesses to the
- * waitqueue via the request.
- */
- list_del_init(&poll->wait.entry);
-
- /*
- * Careful: this *must* be the last step, since as soon
- * as req->head is NULL'ed out, the request can be
- * completed and freed, since aio_poll_complete_work()
- * will no longer need to take the waitqueue lock.
- */
- smp_store_release(&poll->head, NULL);
- return 1;
- }
-
- /* for instances that support it check for an event match first */
- if (mask && !(mask & (poll->events & ~IO_ASYNC_POLL_COMMON)))
- return 0;
-
- if (io_poll_get_ownership(req)) {
- /* optional, saves extra locking for removal in tw handler */
- if (mask && poll->events & EPOLLONESHOT) {
- list_del_init(&poll->wait.entry);
- poll->head = NULL;
- if (wqe_is_double(wait))
- req->flags &= ~REQ_F_DOUBLE_POLL;
- else
- req->flags &= ~REQ_F_SINGLE_POLL;
- }
- __io_poll_execute(req, mask, poll->events);
- }
- return 1;
-}
-
-static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
- struct wait_queue_head *head,
- struct io_poll_iocb **poll_ptr)
-{
- struct io_kiocb *req = pt->req;
- unsigned long wqe_private = (unsigned long) req;
-
- /*
- * The file being polled uses multiple waitqueues for poll handling
- * (e.g. one for read, one for write). Setup a separate io_poll_iocb
- * if this happens.
- */
- if (unlikely(pt->nr_entries)) {
- struct io_poll_iocb *first = poll;
-
- /* double add on the same waitqueue head, ignore */
- if (first->head == head)
- return;
- /* already have a 2nd entry, fail a third attempt */
- if (*poll_ptr) {
- if ((*poll_ptr)->head == head)
- return;
- pt->error = -EINVAL;
- return;
- }
-
- poll = kmalloc(sizeof(*poll), GFP_ATOMIC);
- if (!poll) {
- pt->error = -ENOMEM;
- return;
- }
- /* mark as double wq entry */
- wqe_private |= 1;
- req->flags |= REQ_F_DOUBLE_POLL;
- io_init_poll_iocb(poll, first->events, first->wait.func);
- *poll_ptr = poll;
- if (req->opcode == IORING_OP_POLL_ADD)
- req->flags |= REQ_F_ASYNC_DATA;
- }
-
- req->flags |= REQ_F_SINGLE_POLL;
- pt->nr_entries++;
- poll->head = head;
- poll->wait.private = (void *) wqe_private;
-
- if (poll->events & EPOLLEXCLUSIVE)
- add_wait_queue_exclusive(head, &poll->wait);
- else
- add_wait_queue(head, &poll->wait);
-}
-
-static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
- struct poll_table_struct *p)
-{
- struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
-
- __io_queue_proc(&pt->req->poll, pt, head,
- (struct io_poll_iocb **) &pt->req->async_data);
-}
-
-static int __io_arm_poll_handler(struct io_kiocb *req,
- struct io_poll_iocb *poll,
- struct io_poll_table *ipt, __poll_t mask)
-{
- struct io_ring_ctx *ctx = req->ctx;
- int v;
-
- INIT_HLIST_NODE(&req->hash_node);
- io_init_poll_iocb(poll, mask, io_poll_wake);
- poll->file = req->file;
-
- req->apoll_events = poll->events;
-
- ipt->pt._key = mask;
- ipt->req = req;
- ipt->error = 0;
- ipt->nr_entries = 0;
-
- /*
- * Take the ownership to delay any tw execution up until we're done
- * with poll arming. see io_poll_get_ownership().
- */
- atomic_set(&req->poll_refs, 1);
- mask = vfs_poll(req->file, &ipt->pt) & poll->events;
-
- if (mask && (poll->events & EPOLLONESHOT)) {
- io_poll_remove_entries(req);
- /* no one else has access to the req, forget about the ref */
- return mask;
- }
- if (!mask && unlikely(ipt->error || !ipt->nr_entries)) {
- io_poll_remove_entries(req);
- if (!ipt->error)
- ipt->error = -EINVAL;
- return 0;
- }
-
- spin_lock(&ctx->completion_lock);
- io_poll_req_insert(req);
- spin_unlock(&ctx->completion_lock);
-
- if (mask) {
- /* can't multishot if failed, just queue the event we've got */
- if (unlikely(ipt->error || !ipt->nr_entries)) {
- poll->events |= EPOLLONESHOT;
- req->apoll_events |= EPOLLONESHOT;
- ipt->error = 0;
- }
- __io_poll_execute(req, mask, poll->events);
- return 0;
- }
-
- /*
- * Release ownership. If someone tried to queue a tw while it was
- * locked, kick it off for them.
- */
- v = atomic_dec_return(&req->poll_refs);
- if (unlikely(v & IO_POLL_REF_MASK))
- __io_poll_execute(req, 0, poll->events);
- return 0;
-}
-
-static void io_async_queue_proc(struct file *file, struct wait_queue_head *head,
- struct poll_table_struct *p)
-{
- struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
- struct async_poll *apoll = pt->req->apoll;
-
- __io_queue_proc(&apoll->poll, pt, head, &apoll->double_poll);
-}
-
-enum {
- IO_APOLL_OK,
- IO_APOLL_ABORTED,
- IO_APOLL_READY
-};
-
-static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
-{
- const struct io_op_def *def = &io_op_defs[req->opcode];
- struct io_ring_ctx *ctx = req->ctx;
- struct async_poll *apoll;
- struct io_poll_table ipt;
- __poll_t mask = IO_ASYNC_POLL_COMMON | POLLERR;
- int ret;
-
- if (!def->pollin && !def->pollout)
- return IO_APOLL_ABORTED;
- if (!file_can_poll(req->file) || (req->flags & REQ_F_POLLED))
- return IO_APOLL_ABORTED;
-
- if (def->pollin) {
- mask |= POLLIN | POLLRDNORM;
-
- /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
- if ((req->opcode == IORING_OP_RECVMSG) &&
- (req->sr_msg.msg_flags & MSG_ERRQUEUE))
- mask &= ~POLLIN;
- } else {
- mask |= POLLOUT | POLLWRNORM;
- }
- if (def->poll_exclusive)
- mask |= EPOLLEXCLUSIVE;
- if (!(issue_flags & IO_URING_F_UNLOCKED) &&
- !list_empty(&ctx->apoll_cache)) {
- apoll = list_first_entry(&ctx->apoll_cache, struct async_poll,
- poll.wait.entry);
- list_del_init(&apoll->poll.wait.entry);
- } else {
- apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
- if (unlikely(!apoll))
- return IO_APOLL_ABORTED;
- }
- apoll->double_poll = NULL;
- req->apoll = apoll;
- req->flags |= REQ_F_POLLED;
- ipt.pt._qproc = io_async_queue_proc;
-
- io_kbuf_recycle(req, issue_flags);
-
- ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask);
- if (ret || ipt.error)
- return ret ? IO_APOLL_READY : IO_APOLL_ABORTED;
-
- trace_io_uring_poll_arm(ctx, req, req->user_data, req->opcode,
- mask, apoll->poll.events);
- return IO_APOLL_OK;
-}
-
-/*
- * Returns true if we found and killed one or more poll requests
- */
-static __cold bool io_poll_remove_all(struct io_ring_ctx *ctx,
- struct task_struct *tsk, bool cancel_all)
-{
- struct hlist_node *tmp;
- struct io_kiocb *req;
- bool found = false;
- int i;
-
- spin_lock(&ctx->completion_lock);
- for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct hlist_head *list;
-
- list = &ctx->cancel_hash[i];
- hlist_for_each_entry_safe(req, tmp, list, hash_node) {
- if (io_match_task_safe(req, tsk, cancel_all)) {
- hlist_del_init(&req->hash_node);
- io_poll_cancel_req(req);
- found = true;
- }
- }
- }
- spin_unlock(&ctx->completion_lock);
- return found;
-}
-
-static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, __u64 sqe_addr,
- bool poll_only)
- __must_hold(&ctx->completion_lock)
-{
- struct hlist_head *list;
- struct io_kiocb *req;
-
- list = &ctx->cancel_hash[hash_long(sqe_addr, ctx->cancel_hash_bits)];
- hlist_for_each_entry(req, list, hash_node) {
- if (sqe_addr != req->user_data)
- continue;
- if (poll_only && req->opcode != IORING_OP_POLL_ADD)
- continue;
- return req;
- }
- return NULL;
-}
-
-static bool io_poll_disarm(struct io_kiocb *req)
- __must_hold(&ctx->completion_lock)
-{
- if (!io_poll_get_ownership(req))
- return false;
- io_poll_remove_entries(req);
- hash_del(&req->hash_node);
- return true;
-}
-
-static int io_poll_cancel(struct io_ring_ctx *ctx, __u64 sqe_addr,
- bool poll_only)
- __must_hold(&ctx->completion_lock)
-{
- struct io_kiocb *req = io_poll_find(ctx, sqe_addr, poll_only);
-
- if (!req)
- return -ENOENT;
- io_poll_cancel_req(req);
- return 0;
-}
-
-static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
- unsigned int flags)
-{
- u32 events;
-
- events = READ_ONCE(sqe->poll32_events);
-#ifdef __BIG_ENDIAN
- events = swahw32(events);
-#endif
- if (!(flags & IORING_POLL_ADD_MULTI))
- events |= EPOLLONESHOT;
- return demangle_poll(events) | (events & (EPOLLEXCLUSIVE|EPOLLONESHOT));
-}
-
-static int io_poll_update_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_poll_update *upd = &req->poll_update;
- u32 flags;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->buf_index || sqe->splice_fd_in)
- return -EINVAL;
- flags = READ_ONCE(sqe->len);
- if (flags & ~(IORING_POLL_UPDATE_EVENTS | IORING_POLL_UPDATE_USER_DATA |
- IORING_POLL_ADD_MULTI))
- return -EINVAL;
- /* meaningless without update */
- if (flags == IORING_POLL_ADD_MULTI)
- return -EINVAL;
-
- upd->old_user_data = READ_ONCE(sqe->addr);
- upd->update_events = flags & IORING_POLL_UPDATE_EVENTS;
- upd->update_user_data = flags & IORING_POLL_UPDATE_USER_DATA;
-
- upd->new_user_data = READ_ONCE(sqe->off);
- if (!upd->update_user_data && upd->new_user_data)
- return -EINVAL;
- if (upd->update_events)
- upd->events = io_poll_parse_events(sqe, flags);
- else if (sqe->poll32_events)
- return -EINVAL;
-
- return 0;
-}
-
-static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- struct io_poll_iocb *poll = &req->poll;
- u32 flags;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->buf_index || sqe->off || sqe->addr)
- return -EINVAL;
- flags = READ_ONCE(sqe->len);
- if (flags & ~IORING_POLL_ADD_MULTI)
- return -EINVAL;
- if ((flags & IORING_POLL_ADD_MULTI) && (req->flags & REQ_F_CQE_SKIP))
- return -EINVAL;
-
- io_req_set_refcount(req);
- poll->events = io_poll_parse_events(sqe, flags);
- return 0;
-}
-
-static int io_poll_add(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_poll_iocb *poll = &req->poll;
- struct io_poll_table ipt;
- int ret;
-
- ipt.pt._qproc = io_poll_queue_proc;
-
- ret = __io_arm_poll_handler(req, &req->poll, &ipt, poll->events);
- if (!ret && ipt.error)
- req_set_fail(req);
- ret = ret ?: ipt.error;
- if (ret)
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static int io_poll_update(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct io_kiocb *preq;
- int ret2, ret = 0;
- bool locked;
-
- spin_lock(&ctx->completion_lock);
- preq = io_poll_find(ctx, req->poll_update.old_user_data, true);
- if (!preq || !io_poll_disarm(preq)) {
- spin_unlock(&ctx->completion_lock);
- ret = preq ? -EALREADY : -ENOENT;
- goto out;
- }
- spin_unlock(&ctx->completion_lock);
-
- if (req->poll_update.update_events || req->poll_update.update_user_data) {
- /* only mask one event flags, keep behavior flags */
- if (req->poll_update.update_events) {
- preq->poll.events &= ~0xffff;
- preq->poll.events |= req->poll_update.events & 0xffff;
- preq->poll.events |= IO_POLL_UNMASK;
- }
- if (req->poll_update.update_user_data)
- preq->user_data = req->poll_update.new_user_data;
-
- ret2 = io_poll_add(preq, issue_flags);
- /* successfully updated, don't complete poll request */
- if (!ret2)
- goto out;
- }
-
- req_set_fail(preq);
- preq->result = -ECANCELED;
- locked = !(issue_flags & IO_URING_F_UNLOCKED);
- io_req_task_complete(preq, &locked);
-out:
- if (ret < 0)
- req_set_fail(req);
- /* complete update request, we're done with it */
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static enum hrtimer_restart io_timeout_fn(struct hrtimer *timer)
-{
- struct io_timeout_data *data = container_of(timer,
- struct io_timeout_data, timer);
- struct io_kiocb *req = data->req;
- struct io_ring_ctx *ctx = req->ctx;
- unsigned long flags;
-
- spin_lock_irqsave(&ctx->timeout_lock, flags);
- list_del_init(&req->timeout.list);
- atomic_set(&req->ctx->cq_timeouts,
- atomic_read(&req->ctx->cq_timeouts) + 1);
- spin_unlock_irqrestore(&ctx->timeout_lock, flags);
-
- if (!(data->flags & IORING_TIMEOUT_ETIME_SUCCESS))
- req_set_fail(req);
-
- req->result = -ETIME;
- req->io_task_work.func = io_req_task_complete;
- io_req_task_work_add(req, false);
- return HRTIMER_NORESTART;
-}
-
-static struct io_kiocb *io_timeout_extract(struct io_ring_ctx *ctx,
- __u64 user_data)
- __must_hold(&ctx->timeout_lock)
-{
- struct io_timeout_data *io;
- struct io_kiocb *req;
- bool found = false;
-
- list_for_each_entry(req, &ctx->timeout_list, timeout.list) {
- found = user_data == req->user_data;
- if (found)
- break;
- }
- if (!found)
- return ERR_PTR(-ENOENT);
-
- io = req->async_data;
- if (hrtimer_try_to_cancel(&io->timer) == -1)
- return ERR_PTR(-EALREADY);
- list_del_init(&req->timeout.list);
- return req;
-}
-
-static int io_timeout_cancel(struct io_ring_ctx *ctx, __u64 user_data)
- __must_hold(&ctx->completion_lock)
- __must_hold(&ctx->timeout_lock)
-{
- struct io_kiocb *req = io_timeout_extract(ctx, user_data);
-
- if (IS_ERR(req))
- return PTR_ERR(req);
- io_req_task_queue_fail(req, -ECANCELED);
- return 0;
-}
-
-static clockid_t io_timeout_get_clock(struct io_timeout_data *data)
-{
- switch (data->flags & IORING_TIMEOUT_CLOCK_MASK) {
- case IORING_TIMEOUT_BOOTTIME:
- return CLOCK_BOOTTIME;
- case IORING_TIMEOUT_REALTIME:
- return CLOCK_REALTIME;
- default:
- /* can't happen, vetted at prep time */
- WARN_ON_ONCE(1);
- fallthrough;
- case 0:
- return CLOCK_MONOTONIC;
- }
-}
-
-static int io_linked_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
- struct timespec64 *ts, enum hrtimer_mode mode)
- __must_hold(&ctx->timeout_lock)
-{
- struct io_timeout_data *io;
- struct io_kiocb *req;
- bool found = false;
-
- list_for_each_entry(req, &ctx->ltimeout_list, timeout.list) {
- found = user_data == req->user_data;
- if (found)
- break;
- }
- if (!found)
- return -ENOENT;
-
- io = req->async_data;
- if (hrtimer_try_to_cancel(&io->timer) == -1)
- return -EALREADY;
- hrtimer_init(&io->timer, io_timeout_get_clock(io), mode);
- io->timer.function = io_link_timeout_fn;
- hrtimer_start(&io->timer, timespec64_to_ktime(*ts), mode);
- return 0;
-}
-
-static int io_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
- struct timespec64 *ts, enum hrtimer_mode mode)
- __must_hold(&ctx->timeout_lock)
-{
- struct io_kiocb *req = io_timeout_extract(ctx, user_data);
- struct io_timeout_data *data;
-
- if (IS_ERR(req))
- return PTR_ERR(req);
-
- req->timeout.off = 0; /* noseq */
- data = req->async_data;
- list_add_tail(&req->timeout.list, &ctx->timeout_list);
- hrtimer_init(&data->timer, io_timeout_get_clock(data), mode);
- data->timer.function = io_timeout_fn;
- hrtimer_start(&data->timer, timespec64_to_ktime(*ts), mode);
- return 0;
-}
-
-static int io_timeout_remove_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_timeout_rem *tr = &req->timeout_rem;
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
- return -EINVAL;
- if (sqe->ioprio || sqe->buf_index || sqe->len || sqe->splice_fd_in)
- return -EINVAL;
-
- tr->ltimeout = false;
- tr->addr = READ_ONCE(sqe->addr);
- tr->flags = READ_ONCE(sqe->timeout_flags);
- if (tr->flags & IORING_TIMEOUT_UPDATE_MASK) {
- if (hweight32(tr->flags & IORING_TIMEOUT_CLOCK_MASK) > 1)
- return -EINVAL;
- if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
- tr->ltimeout = true;
- if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
- return -EINVAL;
- if (get_timespec64(&tr->ts, u64_to_user_ptr(sqe->addr2)))
- return -EFAULT;
- if (tr->ts.tv_sec < 0 || tr->ts.tv_nsec < 0)
- return -EINVAL;
- } else if (tr->flags) {
- /* timeout removal doesn't support flags */
- return -EINVAL;
- }
-
- return 0;
-}
-
-static inline enum hrtimer_mode io_translate_timeout_mode(unsigned int flags)
-{
- return (flags & IORING_TIMEOUT_ABS) ? HRTIMER_MODE_ABS
- : HRTIMER_MODE_REL;
-}
-
-/*
- * Remove or update an existing timeout command
- */
-static int io_timeout_remove(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_timeout_rem *tr = &req->timeout_rem;
- struct io_ring_ctx *ctx = req->ctx;
- int ret;
-
- if (!(req->timeout_rem.flags & IORING_TIMEOUT_UPDATE)) {
- spin_lock(&ctx->completion_lock);
- spin_lock_irq(&ctx->timeout_lock);
- ret = io_timeout_cancel(ctx, tr->addr);
- spin_unlock_irq(&ctx->timeout_lock);
- spin_unlock(&ctx->completion_lock);
- } else {
- enum hrtimer_mode mode = io_translate_timeout_mode(tr->flags);
-
- spin_lock_irq(&ctx->timeout_lock);
- if (tr->ltimeout)
- ret = io_linked_timeout_update(ctx, tr->addr, &tr->ts, mode);
- else
- ret = io_timeout_update(ctx, tr->addr, &tr->ts, mode);
- spin_unlock_irq(&ctx->timeout_lock);
- }
-
- if (ret < 0)
- req_set_fail(req);
- io_req_complete_post(req, ret, 0);
- return 0;
-}
-
-static int io_timeout_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
- bool is_timeout_link)
-{
- struct io_timeout_data *data;
- unsigned flags;
- u32 off = READ_ONCE(sqe->off);
-
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (sqe->ioprio || sqe->buf_index || sqe->len != 1 ||
- sqe->splice_fd_in)
- return -EINVAL;
- if (off && is_timeout_link)
- return -EINVAL;
- flags = READ_ONCE(sqe->timeout_flags);
- if (flags & ~(IORING_TIMEOUT_ABS | IORING_TIMEOUT_CLOCK_MASK |
- IORING_TIMEOUT_ETIME_SUCCESS))
- return -EINVAL;
- /* more than one clock specified is invalid, obviously */
- if (hweight32(flags & IORING_TIMEOUT_CLOCK_MASK) > 1)
- return -EINVAL;
-
- INIT_LIST_HEAD(&req->timeout.list);
- req->timeout.off = off;
- if (unlikely(off && !req->ctx->off_timeout_used))
- req->ctx->off_timeout_used = true;
-
- if (WARN_ON_ONCE(req_has_async_data(req)))
- return -EFAULT;
- if (io_alloc_async_data(req))
- return -ENOMEM;
-
- data = req->async_data;
- data->req = req;
- data->flags = flags;
-
- if (get_timespec64(&data->ts, u64_to_user_ptr(sqe->addr)))
- return -EFAULT;
-
- if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0)
- return -EINVAL;
-
- INIT_LIST_HEAD(&req->timeout.list);
- data->mode = io_translate_timeout_mode(flags);
- hrtimer_init(&data->timer, io_timeout_get_clock(data), data->mode);
-
- if (is_timeout_link) {
- struct io_submit_link *link = &req->ctx->submit_state.link;
-
- if (!link->head)
- return -EINVAL;
- if (link->last->opcode == IORING_OP_LINK_TIMEOUT)
- return -EINVAL;
- req->timeout.head = link->last;
- link->last->flags |= REQ_F_ARM_LTIMEOUT;
- }
- return 0;
-}
-
-static int io_timeout(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct io_timeout_data *data = req->async_data;
- struct list_head *entry;
- u32 tail, off = req->timeout.off;
-
- spin_lock_irq(&ctx->timeout_lock);
-
- /*
- * sqe->off holds how many events that need to occur for this
- * timeout event to be satisfied. If it isn't set, then this is
- * a pure timeout request, sequence isn't used.
- */
- if (io_is_timeout_noseq(req)) {
- entry = ctx->timeout_list.prev;
- goto add;
- }
-
- tail = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
- req->timeout.target_seq = tail + off;
-
- /* Update the last seq here in case io_flush_timeouts() hasn't.
- * This is safe because ->completion_lock is held, and submissions
- * and completions are never mixed in the same ->completion_lock section.
- */
- ctx->cq_last_tm_flush = tail;
-
- /*
- * Insertion sort, ensuring the first entry in the list is always
- * the one we need first.
- */
- list_for_each_prev(entry, &ctx->timeout_list) {
- struct io_kiocb *nxt = list_entry(entry, struct io_kiocb,
- timeout.list);
-
- if (io_is_timeout_noseq(nxt))
- continue;
- /* nxt.seq is behind @tail, otherwise would've been completed */
- if (off >= nxt->timeout.target_seq - tail)
- break;
- }
-add:
- list_add(&req->timeout.list, entry);
- data->timer.function = io_timeout_fn;
- hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), data->mode);
- spin_unlock_irq(&ctx->timeout_lock);
- return 0;
-}
-
-struct io_cancel_data {
- struct io_ring_ctx *ctx;
- u64 user_data;
-};
-
-static bool io_cancel_cb(struct io_wq_work *work, void *data)
-{
- struct io_kiocb *req = container_of(work, struct io_kiocb, work);
- struct io_cancel_data *cd = data;
-
- return req->ctx == cd->ctx && req->user_data == cd->user_data;
-}
-
-static int io_async_cancel_one(struct io_uring_task *tctx, u64 user_data,
- struct io_ring_ctx *ctx)
-{
- struct io_cancel_data data = { .ctx = ctx, .user_data = user_data, };
- enum io_wq_cancel cancel_ret;
- int ret = 0;
-
- if (!tctx || !tctx->io_wq)
- return -ENOENT;
-
- cancel_ret = io_wq_cancel_cb(tctx->io_wq, io_cancel_cb, &data, false);
- switch (cancel_ret) {
- case IO_WQ_CANCEL_OK:
- ret = 0;
- break;
- case IO_WQ_CANCEL_RUNNING:
- ret = -EALREADY;
- break;
- case IO_WQ_CANCEL_NOTFOUND:
- ret = -ENOENT;
- break;
- }
-
- return ret;
-}
-
-static int io_try_cancel_userdata(struct io_kiocb *req, u64 sqe_addr)
-{
- struct io_ring_ctx *ctx = req->ctx;
- int ret;
-
- WARN_ON_ONCE(!io_wq_current_is_worker() && req->task != current);
-
- ret = io_async_cancel_one(req->task->io_uring, sqe_addr, ctx);
- /*
- * Fall-through even for -EALREADY, as we may have poll armed
- * that need unarming.
- */
- if (!ret)
- return 0;
-
- spin_lock(&ctx->completion_lock);
- ret = io_poll_cancel(ctx, sqe_addr, false);
- if (ret != -ENOENT)
- goto out;
-
- spin_lock_irq(&ctx->timeout_lock);
- ret = io_timeout_cancel(ctx, sqe_addr);
- spin_unlock_irq(&ctx->timeout_lock);
-out:
- spin_unlock(&ctx->completion_lock);
- return ret;
-}
-
-static int io_async_cancel_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
- return -EINVAL;
- if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
- return -EINVAL;
- if (sqe->ioprio || sqe->off || sqe->len || sqe->cancel_flags ||
- sqe->splice_fd_in)
- return -EINVAL;
-
- req->cancel.addr = READ_ONCE(sqe->addr);
- return 0;
-}
-
-static int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
- u64 sqe_addr = req->cancel.addr;
- bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
- struct io_tctx_node *node;
- int ret;
-
- ret = io_try_cancel_userdata(req, sqe_addr);
- if (ret != -ENOENT)
- goto done;
-
- /* slow path, try all io-wq's */
- io_ring_submit_lock(ctx, needs_lock);
- ret = -ENOENT;
- list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
- struct io_uring_task *tctx = node->task->io_uring;
-
- ret = io_async_cancel_one(tctx, req->cancel.addr, ctx);
- if (ret != -ENOENT)
- break;
- }
- io_ring_submit_unlock(ctx, needs_lock);
-done:
- if (ret < 0)
- req_set_fail(req);
- io_req_complete_post(req, ret, 0);
- return 0;
-}
-
-static int io_rsrc_update_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
- return -EINVAL;
- if (sqe->ioprio || sqe->rw_flags || sqe->splice_fd_in)
- return -EINVAL;
-
- req->rsrc_update.offset = READ_ONCE(sqe->off);
- req->rsrc_update.nr_args = READ_ONCE(sqe->len);
- if (!req->rsrc_update.nr_args)
- return -EINVAL;
- req->rsrc_update.arg = READ_ONCE(sqe->addr);
- return 0;
-}
-
-static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
- bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
- struct io_uring_rsrc_update2 up;
- int ret;
-
- up.offset = req->rsrc_update.offset;
- up.data = req->rsrc_update.arg;
- up.nr = 0;
- up.tags = 0;
- up.resv = 0;
- up.resv2 = 0;
-
- io_ring_submit_lock(ctx, needs_lock);
- ret = __io_register_rsrc_update(ctx, IORING_RSRC_FILE,
- &up, req->rsrc_update.nr_args);
- io_ring_submit_unlock(ctx, needs_lock);
-
- if (ret < 0)
- req_set_fail(req);
- __io_req_complete(req, issue_flags, ret, 0);
- return 0;
-}
-
-static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
- switch (req->opcode) {
- case IORING_OP_NOP:
- return 0;
- case IORING_OP_READV:
- case IORING_OP_READ_FIXED:
- case IORING_OP_READ:
- case IORING_OP_WRITEV:
- case IORING_OP_WRITE_FIXED:
- case IORING_OP_WRITE:
- return io_prep_rw(req, sqe);
- case IORING_OP_POLL_ADD:
- return io_poll_add_prep(req, sqe);
- case IORING_OP_POLL_REMOVE:
- return io_poll_update_prep(req, sqe);
- case IORING_OP_FSYNC:
- return io_fsync_prep(req, sqe);
- case IORING_OP_SYNC_FILE_RANGE:
- return io_sfr_prep(req, sqe);
- case IORING_OP_SENDMSG:
- case IORING_OP_SEND:
- return io_sendmsg_prep(req, sqe);
- case IORING_OP_RECVMSG:
- case IORING_OP_RECV:
- return io_recvmsg_prep(req, sqe);
- case IORING_OP_CONNECT:
- return io_connect_prep(req, sqe);
- case IORING_OP_TIMEOUT:
- return io_timeout_prep(req, sqe, false);
- case IORING_OP_TIMEOUT_REMOVE:
- return io_timeout_remove_prep(req, sqe);
- case IORING_OP_ASYNC_CANCEL:
- return io_async_cancel_prep(req, sqe);
- case IORING_OP_LINK_TIMEOUT:
- return io_timeout_prep(req, sqe, true);
- case IORING_OP_ACCEPT:
- return io_accept_prep(req, sqe);
- case IORING_OP_FALLOCATE:
- return io_fallocate_prep(req, sqe);
- case IORING_OP_OPENAT:
- return io_openat_prep(req, sqe);
- case IORING_OP_CLOSE:
- return io_close_prep(req, sqe);
- case IORING_OP_FILES_UPDATE:
- return io_rsrc_update_prep(req, sqe);
- case IORING_OP_STATX:
- return io_statx_prep(req, sqe);
- case IORING_OP_FADVISE:
- return io_fadvise_prep(req, sqe);
- case IORING_OP_MADVISE:
- return io_madvise_prep(req, sqe);
- case IORING_OP_OPENAT2:
- return io_openat2_prep(req, sqe);
- case IORING_OP_EPOLL_CTL:
- return io_epoll_ctl_prep(req, sqe);
- case IORING_OP_SPLICE:
- return io_splice_prep(req, sqe);
- case IORING_OP_PROVIDE_BUFFERS:
- return io_provide_buffers_prep(req, sqe);
- case IORING_OP_REMOVE_BUFFERS:
- return io_remove_buffers_prep(req, sqe);
- case IORING_OP_TEE:
- return io_tee_prep(req, sqe);
- case IORING_OP_SHUTDOWN:
- return io_shutdown_prep(req, sqe);
- case IORING_OP_RENAMEAT:
- return io_renameat_prep(req, sqe);
- case IORING_OP_UNLINKAT:
- return io_unlinkat_prep(req, sqe);
- case IORING_OP_MKDIRAT:
- return io_mkdirat_prep(req, sqe);
- case IORING_OP_SYMLINKAT:
- return io_symlinkat_prep(req, sqe);
- case IORING_OP_LINKAT:
- return io_linkat_prep(req, sqe);
- case IORING_OP_MSG_RING:
- return io_msg_ring_prep(req, sqe);
- }
-
- printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
- req->opcode);
- return -EINVAL;
-}
-
-static int io_req_prep_async(struct io_kiocb *req)
-{
- const struct io_op_def *def = &io_op_defs[req->opcode];
-
- /* assign early for deferred execution for non-fixed file */
- if (def->needs_file && !(req->flags & REQ_F_FIXED_FILE))
- req->file = io_file_get_normal(req, req->fd);
- if (!def->needs_async_setup)
- return 0;
- if (WARN_ON_ONCE(req_has_async_data(req)))
- return -EFAULT;
- if (io_alloc_async_data(req))
- return -EAGAIN;
-
- switch (req->opcode) {
- case IORING_OP_READV:
- return io_rw_prep_async(req, READ);
- case IORING_OP_WRITEV:
- return io_rw_prep_async(req, WRITE);
- case IORING_OP_SENDMSG:
- return io_sendmsg_prep_async(req);
- case IORING_OP_RECVMSG:
- return io_recvmsg_prep_async(req);
- case IORING_OP_CONNECT:
- return io_connect_prep_async(req);
- }
- printk_once(KERN_WARNING "io_uring: prep_async() bad opcode %d\n",
- req->opcode);
- return -EFAULT;
-}
-
-static u32 io_get_sequence(struct io_kiocb *req)
-{
- u32 seq = req->ctx->cached_sq_head;
-
- /* need original cached_sq_head, but it was increased for each req */
- io_for_each_link(req, req)
- seq--;
- return seq;
-}
-
-static __cold void io_drain_req(struct io_kiocb *req)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct io_defer_entry *de;
- int ret;
- u32 seq = io_get_sequence(req);
-
- /* Still need defer if there is pending req in defer list. */
- spin_lock(&ctx->completion_lock);
- if (!req_need_defer(req, seq) && list_empty_careful(&ctx->defer_list)) {
- spin_unlock(&ctx->completion_lock);
-queue:
- ctx->drain_active = false;
- io_req_task_queue(req);
- return;
- }
- spin_unlock(&ctx->completion_lock);
-
- ret = io_req_prep_async(req);
- if (ret) {
-fail:
- io_req_complete_failed(req, ret);
- return;
- }
- io_prep_async_link(req);
- de = kmalloc(sizeof(*de), GFP_KERNEL);
- if (!de) {
- ret = -ENOMEM;
- goto fail;
- }
-
- spin_lock(&ctx->completion_lock);
- if (!req_need_defer(req, seq) && list_empty(&ctx->defer_list)) {
- spin_unlock(&ctx->completion_lock);
- kfree(de);
- goto queue;
- }
-
- trace_io_uring_defer(ctx, req, req->user_data, req->opcode);
- de->req = req;
- de->seq = seq;
- list_add_tail(&de->list, &ctx->defer_list);
- spin_unlock(&ctx->completion_lock);
-}
-
-static void io_clean_op(struct io_kiocb *req)
-{
- if (req->flags & REQ_F_BUFFER_SELECTED) {
- spin_lock(&req->ctx->completion_lock);
- io_put_kbuf_comp(req);
- spin_unlock(&req->ctx->completion_lock);
- }
-
- if (req->flags & REQ_F_NEED_CLEANUP) {
- switch (req->opcode) {
- case IORING_OP_READV:
- case IORING_OP_READ_FIXED:
- case IORING_OP_READ:
- case IORING_OP_WRITEV:
- case IORING_OP_WRITE_FIXED:
- case IORING_OP_WRITE: {
- struct io_async_rw *io = req->async_data;
-
- kfree(io->free_iovec);
- break;
- }
- case IORING_OP_RECVMSG:
- case IORING_OP_SENDMSG: {
- struct io_async_msghdr *io = req->async_data;
-
- kfree(io->free_iov);
- break;
- }
- case IORING_OP_OPENAT:
- case IORING_OP_OPENAT2:
- if (req->open.filename)
- putname(req->open.filename);
- break;
- case IORING_OP_RENAMEAT:
- putname(req->rename.oldpath);
- putname(req->rename.newpath);
- break;
- case IORING_OP_UNLINKAT:
- putname(req->unlink.filename);
- break;
- case IORING_OP_MKDIRAT:
- putname(req->mkdir.filename);
- break;
- case IORING_OP_SYMLINKAT:
- putname(req->symlink.oldpath);
- putname(req->symlink.newpath);
- break;
- case IORING_OP_LINKAT:
- putname(req->hardlink.oldpath);
- putname(req->hardlink.newpath);
- break;
- case IORING_OP_STATX:
- if (req->statx.filename)
- putname(req->statx.filename);
- break;
- }
- }
- if ((req->flags & REQ_F_POLLED) && req->apoll) {
- kfree(req->apoll->double_poll);
- kfree(req->apoll);
- req->apoll = NULL;
- }
- if (req->flags & REQ_F_INFLIGHT) {
- struct io_uring_task *tctx = req->task->io_uring;
-
- atomic_dec(&tctx->inflight_tracked);
- }
- if (req->flags & REQ_F_CREDS)
- put_cred(req->creds);
- if (req->flags & REQ_F_ASYNC_DATA) {
- kfree(req->async_data);
- req->async_data = NULL;
- }
- req->flags &= ~IO_REQ_CLEAN_FLAGS;
-}
-
-static bool io_assign_file(struct io_kiocb *req, unsigned int issue_flags)
-{
- if (req->file || !io_op_defs[req->opcode].needs_file)
- return true;
-
- if (req->flags & REQ_F_FIXED_FILE)
- req->file = io_file_get_fixed(req, req->fd, issue_flags);
- else
- req->file = io_file_get_normal(req, req->fd);
- if (req->file)
- return true;
-
- req_set_fail(req);
- req->result = -EBADF;
- return false;
-}
-
-static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
-{
- const struct cred *creds = NULL;
- int ret;
-
- if (unlikely(!io_assign_file(req, issue_flags)))
- return -EBADF;
-
- if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
- creds = override_creds(req->creds);
-
- if (!io_op_defs[req->opcode].audit_skip)
- audit_uring_entry(req->opcode);
-
- switch (req->opcode) {
- case IORING_OP_NOP:
- ret = io_nop(req, issue_flags);
- break;
- case IORING_OP_READV:
- case IORING_OP_READ_FIXED:
- case IORING_OP_READ:
- ret = io_read(req, issue_flags);
- break;
- case IORING_OP_WRITEV:
- case IORING_OP_WRITE_FIXED:
- case IORING_OP_WRITE:
- ret = io_write(req, issue_flags);
- break;
- case IORING_OP_FSYNC:
- ret = io_fsync(req, issue_flags);
- break;
- case IORING_OP_POLL_ADD:
- ret = io_poll_add(req, issue_flags);
- break;
- case IORING_OP_POLL_REMOVE:
- ret = io_poll_update(req, issue_flags);
- break;
- case IORING_OP_SYNC_FILE_RANGE:
- ret = io_sync_file_range(req, issue_flags);
- break;
- case IORING_OP_SENDMSG:
- ret = io_sendmsg(req, issue_flags);
- break;
- case IORING_OP_SEND:
- ret = io_send(req, issue_flags);
- break;
- case IORING_OP_RECVMSG:
- ret = io_recvmsg(req, issue_flags);
- break;
- case IORING_OP_RECV:
- ret = io_recv(req, issue_flags);
- break;
- case IORING_OP_TIMEOUT:
- ret = io_timeout(req, issue_flags);
- break;
- case IORING_OP_TIMEOUT_REMOVE:
- ret = io_timeout_remove(req, issue_flags);
- break;
- case IORING_OP_ACCEPT:
- ret = io_accept(req, issue_flags);
- break;
- case IORING_OP_CONNECT:
- ret = io_connect(req, issue_flags);
- break;
- case IORING_OP_ASYNC_CANCEL:
- ret = io_async_cancel(req, issue_flags);
- break;
- case IORING_OP_FALLOCATE:
- ret = io_fallocate(req, issue_flags);
- break;
- case IORING_OP_OPENAT:
- ret = io_openat(req, issue_flags);
- break;
- case IORING_OP_CLOSE:
- ret = io_close(req, issue_flags);
- break;
- case IORING_OP_FILES_UPDATE:
- ret = io_files_update(req, issue_flags);
- break;
- case IORING_OP_STATX:
- ret = io_statx(req, issue_flags);
- break;
- case IORING_OP_FADVISE:
- ret = io_fadvise(req, issue_flags);
- break;
- case IORING_OP_MADVISE:
- ret = io_madvise(req, issue_flags);
- break;
- case IORING_OP_OPENAT2:
- ret = io_openat2(req, issue_flags);
- break;
- case IORING_OP_EPOLL_CTL:
- ret = io_epoll_ctl(req, issue_flags);
- break;
- case IORING_OP_SPLICE:
- ret = io_splice(req, issue_flags);
- break;
- case IORING_OP_PROVIDE_BUFFERS:
- ret = io_provide_buffers(req, issue_flags);
- break;
- case IORING_OP_REMOVE_BUFFERS:
- ret = io_remove_buffers(req, issue_flags);
- break;
- case IORING_OP_TEE:
- ret = io_tee(req, issue_flags);
- break;
- case IORING_OP_SHUTDOWN:
- ret = io_shutdown(req, issue_flags);
- break;
- case IORING_OP_RENAMEAT:
- ret = io_renameat(req, issue_flags);
- break;
- case IORING_OP_UNLINKAT:
- ret = io_unlinkat(req, issue_flags);
- break;
- case IORING_OP_MKDIRAT:
- ret = io_mkdirat(req, issue_flags);
- break;
- case IORING_OP_SYMLINKAT:
- ret = io_symlinkat(req, issue_flags);
- break;
- case IORING_OP_LINKAT:
- ret = io_linkat(req, issue_flags);
- break;
- case IORING_OP_MSG_RING:
- ret = io_msg_ring(req, issue_flags);
- break;
- default:
- ret = -EINVAL;
- break;
- }
-
- if (!io_op_defs[req->opcode].audit_skip)
- audit_uring_exit(!ret, ret);
-
- if (creds)
- revert_creds(creds);
- if (ret)
- return ret;
- /* If the op doesn't have a file, we're not polling for it */
- if ((req->ctx->flags & IORING_SETUP_IOPOLL) && req->file)
- io_iopoll_req_issued(req, issue_flags);
-
- return 0;
-}
-
-static struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
-{
- struct io_kiocb *req = container_of(work, struct io_kiocb, work);
-
- req = io_put_req_find_next(req);
- return req ? &req->work : NULL;
-}
-
-static void io_wq_submit_work(struct io_wq_work *work)
-{
- struct io_kiocb *req = container_of(work, struct io_kiocb, work);
- const struct io_op_def *def = &io_op_defs[req->opcode];
- unsigned int issue_flags = IO_URING_F_UNLOCKED;
- bool needs_poll = false;
- struct io_kiocb *timeout;
- int ret = 0, err = -ECANCELED;
-
- /* one will be dropped by ->io_free_work() after returning to io-wq */
- if (!(req->flags & REQ_F_REFCOUNT))
- __io_req_set_refcount(req, 2);
- else
- req_ref_get(req);
-
- timeout = io_prep_linked_timeout(req);
- if (timeout)
- io_queue_linked_timeout(timeout);
-
-
- /* either cancelled or io-wq is dying, so don't touch tctx->iowq */
- if (work->flags & IO_WQ_WORK_CANCEL) {
-fail:
- io_req_task_queue_fail(req, err);
- return;
- }
- if (!io_assign_file(req, issue_flags)) {
- err = -EBADF;
- work->flags |= IO_WQ_WORK_CANCEL;
- goto fail;
- }
-
- if (req->flags & REQ_F_FORCE_ASYNC) {
- bool opcode_poll = def->pollin || def->pollout;
-
- if (opcode_poll && file_can_poll(req->file)) {
- needs_poll = true;
- issue_flags |= IO_URING_F_NONBLOCK;
- }
- }
-
- do {
- ret = io_issue_sqe(req, issue_flags);
- if (ret != -EAGAIN)
- break;
- /*
- * We can get EAGAIN for iopolled IO even though we're
- * forcing a sync submission from here, since we can't
- * wait for request slots on the block side.
- */
- if (!needs_poll) {
- if (!(req->ctx->flags & IORING_SETUP_IOPOLL))
- break;
- cond_resched();
- continue;
- }
-
- if (io_arm_poll_handler(req, issue_flags) == IO_APOLL_OK)
- return;
- /* aborted or ready, in either case retry blocking */
- needs_poll = false;
- issue_flags &= ~IO_URING_F_NONBLOCK;
- } while (1);
-
- /* avoid locking problems by failing it from a clean context */
- if (ret)
- io_req_task_queue_fail(req, ret);
-}
-
-static inline struct io_fixed_file *io_fixed_file_slot(struct io_file_table *table,
- unsigned i)
-{
- return &table->files[i];
-}
-
-static inline struct file *io_file_from_index(struct io_ring_ctx *ctx,
- int index)
-{
- struct io_fixed_file *slot = io_fixed_file_slot(&ctx->file_table, index);
-
- return (struct file *) (slot->file_ptr & FFS_MASK);
-}
-
-static void io_fixed_file_set(struct io_fixed_file *file_slot, struct file *file)
-{
- unsigned long file_ptr = (unsigned long) file;
-
- file_ptr |= io_file_get_flags(file);
- file_slot->file_ptr = file_ptr;
-}
-
-static inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
- unsigned int issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct file *file = NULL;
- unsigned long file_ptr;
-
- if (issue_flags & IO_URING_F_UNLOCKED)
- mutex_lock(&ctx->uring_lock);
-
- if (unlikely((unsigned int)fd >= ctx->nr_user_files))
- goto out;
- fd = array_index_nospec(fd, ctx->nr_user_files);
- file_ptr = io_fixed_file_slot(&ctx->file_table, fd)->file_ptr;
- file = (struct file *) (file_ptr & FFS_MASK);
- file_ptr &= ~FFS_MASK;
- /* mask in overlapping REQ_F and FFS bits */
- req->flags |= (file_ptr << REQ_F_SUPPORT_NOWAIT_BIT);
- io_req_set_rsrc_node(req, ctx, 0);
-out:
- if (issue_flags & IO_URING_F_UNLOCKED)
- mutex_unlock(&ctx->uring_lock);
- return file;
-}
-
-static struct file *io_file_get_normal(struct io_kiocb *req, int fd)
-{
- struct file *file = fget(fd);
-
- trace_io_uring_file_get(req->ctx, req, req->user_data, fd);
-
- /* we don't allow fixed io_uring files */
- if (file && file->f_op == &io_uring_fops)
- io_req_track_inflight(req);
- return file;
-}
-
-static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
-{
- struct io_kiocb *prev = req->timeout.prev;
- int ret = -ENOENT;
-
- if (prev) {
- if (!(req->task->flags & PF_EXITING))
- ret = io_try_cancel_userdata(req, prev->user_data);
- io_req_complete_post(req, ret ?: -ETIME, 0);
- io_put_req(prev);
- } else {
- io_req_complete_post(req, -ETIME, 0);
- }
-}
-
-static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
-{
- struct io_timeout_data *data = container_of(timer,
- struct io_timeout_data, timer);
- struct io_kiocb *prev, *req = data->req;
- struct io_ring_ctx *ctx = req->ctx;
- unsigned long flags;
-
- spin_lock_irqsave(&ctx->timeout_lock, flags);
- prev = req->timeout.head;
- req->timeout.head = NULL;
-
- /*
- * We don't expect the list to be empty, that will only happen if we
- * race with the completion of the linked work.
- */
- if (prev) {
- io_remove_next_linked(prev);
- if (!req_ref_inc_not_zero(prev))
- prev = NULL;
- }
- list_del(&req->timeout.list);
- req->timeout.prev = prev;
- spin_unlock_irqrestore(&ctx->timeout_lock, flags);
-
- req->io_task_work.func = io_req_task_link_timeout;
- io_req_task_work_add(req, false);
- return HRTIMER_NORESTART;
-}
-
-static void io_queue_linked_timeout(struct io_kiocb *req)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- spin_lock_irq(&ctx->timeout_lock);
- /*
- * If the back reference is NULL, then our linked request finished
- * before we got a chance to setup the timer
- */
- if (req->timeout.head) {
- struct io_timeout_data *data = req->async_data;
-
- data->timer.function = io_link_timeout_fn;
- hrtimer_start(&data->timer, timespec64_to_ktime(data->ts),
- data->mode);
- list_add_tail(&req->timeout.list, &ctx->ltimeout_list);
- }
- spin_unlock_irq(&ctx->timeout_lock);
- /* drop submission reference */
- io_put_req(req);
-}
-
-static void io_queue_sqe_arm_apoll(struct io_kiocb *req)
- __must_hold(&req->ctx->uring_lock)
-{
- struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
-
- switch (io_arm_poll_handler(req, 0)) {
- case IO_APOLL_READY:
- io_req_task_queue(req);
- break;
- case IO_APOLL_ABORTED:
- /*
- * Queued up for async execution, worker will release
- * submit reference when the iocb is actually submitted.
- */
- io_queue_async_work(req, NULL);
- break;
- case IO_APOLL_OK:
- break;
- }
-
- if (linked_timeout)
- io_queue_linked_timeout(linked_timeout);
-}
-
-static inline void __io_queue_sqe(struct io_kiocb *req)
- __must_hold(&req->ctx->uring_lock)
-{
- struct io_kiocb *linked_timeout;
- int ret;
-
- ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
-
- if (req->flags & REQ_F_COMPLETE_INLINE) {
- io_req_add_compl_list(req);
- return;
- }
- /*
- * We async punt it if the file wasn't marked NOWAIT, or if the file
- * doesn't support non-blocking read/write attempts
- */
- if (likely(!ret)) {
- linked_timeout = io_prep_linked_timeout(req);
- if (linked_timeout)
- io_queue_linked_timeout(linked_timeout);
- } else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
- io_queue_sqe_arm_apoll(req);
- } else {
- io_req_complete_failed(req, ret);
- }
-}
-
-static void io_queue_sqe_fallback(struct io_kiocb *req)
- __must_hold(&req->ctx->uring_lock)
-{
- if (req->flags & REQ_F_FAIL) {
- io_req_complete_fail_submit(req);
- } else if (unlikely(req->ctx->drain_active)) {
- io_drain_req(req);
- } else {
- int ret = io_req_prep_async(req);
-
- if (unlikely(ret))
- io_req_complete_failed(req, ret);
- else
- io_queue_async_work(req, NULL);
- }
-}
-
-static inline void io_queue_sqe(struct io_kiocb *req)
- __must_hold(&req->ctx->uring_lock)
-{
- if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL))))
- __io_queue_sqe(req);
- else
- io_queue_sqe_fallback(req);
-}
-
-/*
- * Check SQE restrictions (opcode and flags).
- *
- * Returns 'true' if SQE is allowed, 'false' otherwise.
- */
-static inline bool io_check_restriction(struct io_ring_ctx *ctx,
- struct io_kiocb *req,
- unsigned int sqe_flags)
-{
- if (!test_bit(req->opcode, ctx->restrictions.sqe_op))
- return false;
-
- if ((sqe_flags & ctx->restrictions.sqe_flags_required) !=
- ctx->restrictions.sqe_flags_required)
- return false;
-
- if (sqe_flags & ~(ctx->restrictions.sqe_flags_allowed |
- ctx->restrictions.sqe_flags_required))
- return false;
-
- return true;
-}
-
-static void io_init_req_drain(struct io_kiocb *req)
-{
- struct io_ring_ctx *ctx = req->ctx;
- struct io_kiocb *head = ctx->submit_state.link.head;
-
- ctx->drain_active = true;
- if (head) {
- /*
- * If we need to drain a request in the middle of a link, drain
- * the head request and the next request/link after the current
- * link. Considering sequential execution of links,
- * REQ_F_IO_DRAIN will be maintained for every request of our
- * link.
- */
- head->flags |= REQ_F_IO_DRAIN | REQ_F_FORCE_ASYNC;
- ctx->drain_next = true;
- }
-}
-
-static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
- __must_hold(&ctx->uring_lock)
-{
- unsigned int sqe_flags;
- int personality;
- u8 opcode;
-
- /* req is partially pre-initialised, see io_preinit_req() */
- req->opcode = opcode = READ_ONCE(sqe->opcode);
- /* same numerical values with corresponding REQ_F_*, safe to copy */
- req->flags = sqe_flags = READ_ONCE(sqe->flags);
- req->user_data = READ_ONCE(sqe->user_data);
- req->file = NULL;
- req->fixed_rsrc_refs = NULL;
- req->task = current;
-
- if (unlikely(opcode >= IORING_OP_LAST)) {
- req->opcode = 0;
- return -EINVAL;
- }
- if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) {
- /* enforce forwards compatibility on users */
- if (sqe_flags & ~SQE_VALID_FLAGS)
- return -EINVAL;
- if ((sqe_flags & IOSQE_BUFFER_SELECT) &&
- !io_op_defs[opcode].buffer_select)
- return -EOPNOTSUPP;
- if (sqe_flags & IOSQE_CQE_SKIP_SUCCESS)
- ctx->drain_disabled = true;
- if (sqe_flags & IOSQE_IO_DRAIN) {
- if (ctx->drain_disabled)
- return -EOPNOTSUPP;
- io_init_req_drain(req);
- }
- }
- if (unlikely(ctx->restricted || ctx->drain_active || ctx->drain_next)) {
- if (ctx->restricted && !io_check_restriction(ctx, req, sqe_flags))
- return -EACCES;
- /* knock it to the slow queue path, will be drained there */
- if (ctx->drain_active)
- req->flags |= REQ_F_FORCE_ASYNC;
- /* if there is no link, we're at "next" request and need to drain */
- if (unlikely(ctx->drain_next) && !ctx->submit_state.link.head) {
- ctx->drain_next = false;
- ctx->drain_active = true;
- req->flags |= REQ_F_IO_DRAIN | REQ_F_FORCE_ASYNC;
- }
- }
-
- if (io_op_defs[opcode].needs_file) {
- struct io_submit_state *state = &ctx->submit_state;
-
- req->fd = READ_ONCE(sqe->fd);
-
- /*
- * Plug now if we have more than 2 IO left after this, and the
- * target is potentially a read/write to block based storage.
- */
- if (state->need_plug && io_op_defs[opcode].plug) {
- state->plug_started = true;
- state->need_plug = false;
- blk_start_plug_nr_ios(&state->plug, state->submit_nr);
- }
- }
-
- personality = READ_ONCE(sqe->personality);
- if (personality) {
- int ret;
-
- req->creds = xa_load(&ctx->personalities, personality);
- if (!req->creds)
- return -EINVAL;
- get_cred(req->creds);
- ret = security_uring_override_creds(req->creds);
- if (ret) {
- put_cred(req->creds);
- return ret;
- }
- req->flags |= REQ_F_CREDS;
- }
-
- return io_req_prep(req, sqe);
-}
-
-static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
- __must_hold(&ctx->uring_lock)
-{
- struct io_submit_link *link = &ctx->submit_state.link;
- int ret;
-
- ret = io_init_req(ctx, req, sqe);
- if (unlikely(ret)) {
- trace_io_uring_req_failed(sqe, ctx, req, ret);
-
- /* fail even hard links since we don't submit */
- if (link->head) {
- /*
- * we can judge a link req is failed or cancelled by if
- * REQ_F_FAIL is set, but the head is an exception since
- * it may be set REQ_F_FAIL because of other req's failure
- * so let's leverage req->result to distinguish if a head
- * is set REQ_F_FAIL because of its failure or other req's
- * failure so that we can set the correct ret code for it.
- * init result here to avoid affecting the normal path.
- */
- if (!(link->head->flags & REQ_F_FAIL))
- req_fail_link_node(link->head, -ECANCELED);
- } else if (!(req->flags & (REQ_F_LINK | REQ_F_HARDLINK))) {
- /*
- * the current req is a normal req, we should return
- * error and thus break the submittion loop.
- */
- io_req_complete_failed(req, ret);
- return ret;
- }
- req_fail_link_node(req, ret);
- }
-
- /* don't need @sqe from now on */
- trace_io_uring_submit_sqe(ctx, req, req->user_data, req->opcode,
- req->flags, true,
- ctx->flags & IORING_SETUP_SQPOLL);
-
- /*
- * If we already have a head request, queue this one for async
- * submittal once the head completes. If we don't have a head but
- * IOSQE_IO_LINK is set in the sqe, start a new head. This one will be
- * submitted sync once the chain is complete. If none of those
- * conditions are true (normal request), then just queue it.
- */
- if (link->head) {
- struct io_kiocb *head = link->head;
-
- if (!(req->flags & REQ_F_FAIL)) {
- ret = io_req_prep_async(req);
- if (unlikely(ret)) {
- req_fail_link_node(req, ret);
- if (!(head->flags & REQ_F_FAIL))
- req_fail_link_node(head, -ECANCELED);
- }
- }
- trace_io_uring_link(ctx, req, head);
- link->last->link = req;
- link->last = req;
-
- if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK))
- return 0;
- /* last request of a link, enqueue the link */
- link->head = NULL;
- req = head;
- } else if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
- link->head = req;
- link->last = req;
- return 0;
- }
-
- io_queue_sqe(req);
- return 0;
-}
-
-/*
- * Batched submission is done, ensure local IO is flushed out.
- */
-static void io_submit_state_end(struct io_ring_ctx *ctx)
-{
- struct io_submit_state *state = &ctx->submit_state;
-
- if (state->link.head)
- io_queue_sqe(state->link.head);
- /* flush only after queuing links as they can generate completions */
- io_submit_flush_completions(ctx);
- if (state->plug_started)
- blk_finish_plug(&state->plug);
-}
-
-/*
- * Start submission side cache.
- */
-static void io_submit_state_start(struct io_submit_state *state,
- unsigned int max_ios)
-{
- state->plug_started = false;
- state->need_plug = max_ios > 2;
- state->submit_nr = max_ios;
- /* set only head, no need to init link_last in advance */
- state->link.head = NULL;
-}
-
-static void io_commit_sqring(struct io_ring_ctx *ctx)
-{
- struct io_rings *rings = ctx->rings;
-
- /*
- * Ensure any loads from the SQEs are done at this point,
- * since once we write the new head, the application could
- * write new data to them.
- */
- smp_store_release(&rings->sq.head, ctx->cached_sq_head);
-}
-
-/*
- * Fetch an sqe, if one is available. Note this returns a pointer to memory
- * that is mapped by userspace. This means that care needs to be taken to
- * ensure that reads are stable, as we cannot rely on userspace always
- * being a good citizen. If members of the sqe are validated and then later
- * used, it's important that those reads are done through READ_ONCE() to
- * prevent a re-load down the line.
- */
-static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
-{
- unsigned head, mask = ctx->sq_entries - 1;
- unsigned sq_idx = ctx->cached_sq_head++ & mask;
-
- /*
- * The cached sq head (or cq tail) serves two purposes:
- *
- * 1) allows us to batch the cost of updating the user visible
- * head updates.
- * 2) allows the kernel side to track the head on its own, even
- * though the application is the one updating it.
- */
- head = READ_ONCE(ctx->sq_array[sq_idx]);
- if (likely(head < ctx->sq_entries))
- return &ctx->sq_sqes[head];
-
- /* drop invalid entries */
- ctx->cq_extra--;
- WRITE_ONCE(ctx->rings->sq_dropped,
- READ_ONCE(ctx->rings->sq_dropped) + 1);
- return NULL;
-}
-
-static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
- __must_hold(&ctx->uring_lock)
-{
- unsigned int entries = io_sqring_entries(ctx);
- int submitted = 0;
-
- if (unlikely(!entries))
- return 0;
- /* make sure SQ entry isn't read before tail */
- nr = min3(nr, ctx->sq_entries, entries);
- io_get_task_refs(nr);
-
- io_submit_state_start(&ctx->submit_state, nr);
- do {
- const struct io_uring_sqe *sqe;
- struct io_kiocb *req;
-
- if (unlikely(!io_alloc_req_refill(ctx))) {
- if (!submitted)
- submitted = -EAGAIN;
- break;
- }
- req = io_alloc_req(ctx);
- sqe = io_get_sqe(ctx);
- if (unlikely(!sqe)) {
- wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
- break;
- }
- /* will complete beyond this point, count as submitted */
- submitted++;
- if (io_submit_sqe(ctx, req, sqe)) {
- /*
- * Continue submitting even for sqe failure if the
- * ring was setup with IORING_SETUP_SUBMIT_ALL
- */
- if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
- break;
- }
- } while (submitted < nr);
-
- if (unlikely(submitted != nr)) {
- int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
- int unused = nr - ref_used;
-
- current->io_uring->cached_refs += unused;
- }
-
- io_submit_state_end(ctx);
- /* Commit SQ ring head once we've consumed and submitted all SQEs */
- io_commit_sqring(ctx);
-
- return submitted;
-}
-
-static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
-{
- return READ_ONCE(sqd->state);
-}
-
-static inline void io_ring_set_wakeup_flag(struct io_ring_ctx *ctx)
-{
- /* Tell userspace we may need a wakeup call */
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
-static inline void io_ring_clear_wakeup_flag(struct io_ring_ctx *ctx)
-{
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
-static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
-{
- unsigned int to_submit;
- int ret = 0;
-
- to_submit = io_sqring_entries(ctx);
- /* if we're handling multiple rings, cap submit size for fairness */
- if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
- to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
-
- if (!wq_list_empty(&ctx->iopoll_list) || to_submit) {
- const struct cred *creds = NULL;
-
- if (ctx->sq_creds != current_cred())
- creds = override_creds(ctx->sq_creds);
-
- mutex_lock(&ctx->uring_lock);
- if (!wq_list_empty(&ctx->iopoll_list))
- io_do_iopoll(ctx, true);
-
- /*
- * Don't submit if refs are dying, good for io_uring_register(),
- * but also it is relied upon by io_ring_exit_work()
- */
- if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
- !(ctx->flags & IORING_SETUP_R_DISABLED))
- ret = io_submit_sqes(ctx, to_submit);
- mutex_unlock(&ctx->uring_lock);
-
- if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
- wake_up(&ctx->sqo_sq_wait);
- if (creds)
- revert_creds(creds);
- }
-
- return ret;
-}
-
-static __cold void io_sqd_update_thread_idle(struct io_sq_data *sqd)
-{
- struct io_ring_ctx *ctx;
- unsigned sq_thread_idle = 0;
-
- list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- sq_thread_idle = max(sq_thread_idle, ctx->sq_thread_idle);
- sqd->sq_thread_idle = sq_thread_idle;
-}
-
-static bool io_sqd_handle_event(struct io_sq_data *sqd)
-{
- bool did_sig = false;
- struct ksignal ksig;
-
- if (test_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state) ||
- signal_pending(current)) {
- mutex_unlock(&sqd->lock);
- if (signal_pending(current))
- did_sig = get_signal(&ksig);
- cond_resched();
- mutex_lock(&sqd->lock);
- }
- return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
-}
-
-static int io_sq_thread(void *data)
-{
- struct io_sq_data *sqd = data;
- struct io_ring_ctx *ctx;
- unsigned long timeout = 0;
- char buf[TASK_COMM_LEN];
- DEFINE_WAIT(wait);
-
- snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
- set_task_comm(current, buf);
-
- if (sqd->sq_cpu != -1)
- set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
- else
- set_cpus_allowed_ptr(current, cpu_online_mask);
- current->flags |= PF_NO_SETAFFINITY;
-
- audit_alloc_kernel(current);
-
- mutex_lock(&sqd->lock);
- while (1) {
- bool cap_entries, sqt_spin = false;
-
- if (io_sqd_events_pending(sqd) || signal_pending(current)) {
- if (io_sqd_handle_event(sqd))
- break;
- timeout = jiffies + sqd->sq_thread_idle;
- }
-
- cap_entries = !list_is_singular(&sqd->ctx_list);
- list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
- int ret = __io_sq_thread(ctx, cap_entries);
-
- if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
- sqt_spin = true;
- }
- if (io_run_task_work())
- sqt_spin = true;
-
- if (sqt_spin || !time_after(jiffies, timeout)) {
- cond_resched();
- if (sqt_spin)
- timeout = jiffies + sqd->sq_thread_idle;
- continue;
- }
-
- prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
- if (!io_sqd_events_pending(sqd) && !task_work_pending(current)) {
- bool needs_sched = true;
-
- list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
- io_ring_set_wakeup_flag(ctx);
-
- if ((ctx->flags & IORING_SETUP_IOPOLL) &&
- !wq_list_empty(&ctx->iopoll_list)) {
- needs_sched = false;
- break;
- }
-
- /*
- * Ensure the store of the wakeup flag is not
- * reordered with the load of the SQ tail
- */
- smp_mb();
-
- if (io_sqring_entries(ctx)) {
- needs_sched = false;
- break;
- }
- }
-
- if (needs_sched) {
- mutex_unlock(&sqd->lock);
- schedule();
- mutex_lock(&sqd->lock);
- }
- list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_clear_wakeup_flag(ctx);
- }
-
- finish_wait(&sqd->wait, &wait);
- timeout = jiffies + sqd->sq_thread_idle;
- }
-
- io_uring_cancel_generic(true, sqd);
- sqd->thread = NULL;
- list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_set_wakeup_flag(ctx);
- io_run_task_work();
- mutex_unlock(&sqd->lock);
-
- audit_free(current);
-
- complete(&sqd->exited);
- do_exit(0);
-}
-
-struct io_wait_queue {
- struct wait_queue_entry wq;
- struct io_ring_ctx *ctx;
- unsigned cq_tail;
- unsigned nr_timeouts;
-};
-
-static inline bool io_should_wake(struct io_wait_queue *iowq)
-{
- struct io_ring_ctx *ctx = iowq->ctx;
- int dist = ctx->cached_cq_tail - (int) iowq->cq_tail;
-
- /*
- * Wake up if we have enough events, or if a timeout occurred since we
- * started waiting. For timeouts, we always want to return to userspace,
- * regardless of event count.
- */
- return dist >= 0 || atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
-}
-
-static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
- int wake_flags, void *key)
-{
- struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
- wq);
-
- /*
- * Cannot safely flush overflowed CQEs from here, ensure we wake up
- * the task, and the next invocation will do it.
- */
- if (io_should_wake(iowq) || test_bit(0, &iowq->ctx->check_cq_overflow))
- return autoremove_wake_function(curr, mode, wake_flags, key);
- return -1;
-}
-
-static int io_run_task_work_sig(void)
-{
- if (io_run_task_work())
- return 1;
- if (test_thread_flag(TIF_NOTIFY_SIGNAL))
- return -ERESTARTSYS;
- if (task_sigpending(current))
- return -EINTR;
- return 0;
-}
-
-/* when returns >0, the caller should retry */
-static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
- struct io_wait_queue *iowq,
- ktime_t timeout)
-{
- int ret;
-
- /* make sure we run task_work before checking for signals */
- ret = io_run_task_work_sig();
- if (ret || io_should_wake(iowq))
- return ret;
- /* let the caller flush overflows, retry */
- if (test_bit(0, &ctx->check_cq_overflow))
- return 1;
-
- if (!schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS))
- return -ETIME;
- return 1;
-}
-
-/*
- * Wait until events become available, if we don't already have some. The
- * application must reap them itself, as they reside on the shared cq ring.
- */
-static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
- const sigset_t __user *sig, size_t sigsz,
- struct __kernel_timespec __user *uts)
-{
- struct io_wait_queue iowq;
- struct io_rings *rings = ctx->rings;
- ktime_t timeout = KTIME_MAX;
- int ret;
-
- do {
- io_cqring_overflow_flush(ctx);
- if (io_cqring_events(ctx) >= min_events)
- return 0;
- if (!io_run_task_work())
- break;
- } while (1);
-
- if (sig) {
-#ifdef CONFIG_COMPAT
- if (in_compat_syscall())
- ret = set_compat_user_sigmask((const compat_sigset_t __user *)sig,
- sigsz);
- else
-#endif
- ret = set_user_sigmask(sig, sigsz);
-
- if (ret)
- return ret;
- }
-
- if (uts) {
- struct timespec64 ts;
-
- if (get_timespec64(&ts, uts))
- return -EFAULT;
- timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
- }
-
- init_waitqueue_func_entry(&iowq.wq, io_wake_function);
- iowq.wq.private = current;
- INIT_LIST_HEAD(&iowq.wq.entry);
- iowq.ctx = ctx;
- iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
- iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
-
- trace_io_uring_cqring_wait(ctx, min_events);
- do {
- /* if we can't even flush overflow, don't wait for more */
- if (!io_cqring_overflow_flush(ctx)) {
- ret = -EBUSY;
- break;
- }
- prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
- TASK_INTERRUPTIBLE);
- ret = io_cqring_wait_schedule(ctx, &iowq, timeout);
- finish_wait(&ctx->cq_wait, &iowq.wq);
- cond_resched();
- } while (ret > 0);
-
- restore_saved_sigmask_unless(ret == -EINTR);
-
- return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
-}
-
-static void io_free_page_table(void **table, size_t size)
-{
- unsigned i, nr_tables = DIV_ROUND_UP(size, PAGE_SIZE);
-
- for (i = 0; i < nr_tables; i++)
- kfree(table[i]);
- kfree(table);
-}
-
-static __cold void **io_alloc_page_table(size_t size)
-{
- unsigned i, nr_tables = DIV_ROUND_UP(size, PAGE_SIZE);
- size_t init_size = size;
- void **table;
-
- table = kcalloc(nr_tables, sizeof(*table), GFP_KERNEL_ACCOUNT);
- if (!table)
- return NULL;
-
- for (i = 0; i < nr_tables; i++) {
- unsigned int this_size = min_t(size_t, size, PAGE_SIZE);
-
- table[i] = kzalloc(this_size, GFP_KERNEL_ACCOUNT);
- if (!table[i]) {
- io_free_page_table(table, init_size);
- return NULL;
- }
- size -= this_size;
- }
- return table;
-}
-
-static void io_rsrc_node_destroy(struct io_rsrc_node *ref_node)
-{
- percpu_ref_exit(&ref_node->refs);
- kfree(ref_node);
-}
-
-static __cold void io_rsrc_node_ref_zero(struct percpu_ref *ref)
-{
- struct io_rsrc_node *node = container_of(ref, struct io_rsrc_node, refs);
- struct io_ring_ctx *ctx = node->rsrc_data->ctx;
- unsigned long flags;
- bool first_add = false;
- unsigned long delay = HZ;
-
- spin_lock_irqsave(&ctx->rsrc_ref_lock, flags);
- node->done = true;
-
- /* if we are mid-quiesce then do not delay */
- if (node->rsrc_data->quiesce)
- delay = 0;
-
- while (!list_empty(&ctx->rsrc_ref_list)) {
- node = list_first_entry(&ctx->rsrc_ref_list,
- struct io_rsrc_node, node);
- /* recycle ref nodes in order */
- if (!node->done)
- break;
- list_del(&node->node);
- first_add |= llist_add(&node->llist, &ctx->rsrc_put_llist);
- }
- spin_unlock_irqrestore(&ctx->rsrc_ref_lock, flags);
-
- if (first_add)
- mod_delayed_work(system_wq, &ctx->rsrc_put_work, delay);
-}
-
-static struct io_rsrc_node *io_rsrc_node_alloc(void)
-{
- struct io_rsrc_node *ref_node;
-
- ref_node = kzalloc(sizeof(*ref_node), GFP_KERNEL);
- if (!ref_node)
- return NULL;
-
- if (percpu_ref_init(&ref_node->refs, io_rsrc_node_ref_zero,
- 0, GFP_KERNEL)) {
- kfree(ref_node);
- return NULL;
- }
- INIT_LIST_HEAD(&ref_node->node);
- INIT_LIST_HEAD(&ref_node->rsrc_list);
- ref_node->done = false;
- return ref_node;
-}
-
-static void io_rsrc_node_switch(struct io_ring_ctx *ctx,
- struct io_rsrc_data *data_to_kill)
- __must_hold(&ctx->uring_lock)
-{
- WARN_ON_ONCE(!ctx->rsrc_backup_node);
- WARN_ON_ONCE(data_to_kill && !ctx->rsrc_node);
-
- io_rsrc_refs_drop(ctx);
-
- if (data_to_kill) {
- struct io_rsrc_node *rsrc_node = ctx->rsrc_node;
-
- rsrc_node->rsrc_data = data_to_kill;
- spin_lock_irq(&ctx->rsrc_ref_lock);
- list_add_tail(&rsrc_node->node, &ctx->rsrc_ref_list);
- spin_unlock_irq(&ctx->rsrc_ref_lock);
-
- atomic_inc(&data_to_kill->refs);
- percpu_ref_kill(&rsrc_node->refs);
- ctx->rsrc_node = NULL;
- }
-
- if (!ctx->rsrc_node) {
- ctx->rsrc_node = ctx->rsrc_backup_node;
- ctx->rsrc_backup_node = NULL;
- }
-}
-
-static int io_rsrc_node_switch_start(struct io_ring_ctx *ctx)
-{
- if (ctx->rsrc_backup_node)
- return 0;
- ctx->rsrc_backup_node = io_rsrc_node_alloc();
- return ctx->rsrc_backup_node ? 0 : -ENOMEM;
-}
-
-static __cold int io_rsrc_ref_quiesce(struct io_rsrc_data *data,
- struct io_ring_ctx *ctx)
-{
- int ret;
-
- /* As we may drop ->uring_lock, other task may have started quiesce */
- if (data->quiesce)
- return -ENXIO;
-
- data->quiesce = true;
- do {
- ret = io_rsrc_node_switch_start(ctx);
- if (ret)
- break;
- io_rsrc_node_switch(ctx, data);
-
- /* kill initial ref, already quiesced if zero */
- if (atomic_dec_and_test(&data->refs))
- break;
- mutex_unlock(&ctx->uring_lock);
- flush_delayed_work(&ctx->rsrc_put_work);
- ret = wait_for_completion_interruptible(&data->done);
- if (!ret) {
- mutex_lock(&ctx->uring_lock);
- if (atomic_read(&data->refs) > 0) {
- /*
- * it has been revived by another thread while
- * we were unlocked
- */
- mutex_unlock(&ctx->uring_lock);
- } else {
- break;
- }
- }
-
- atomic_inc(&data->refs);
- /* wait for all works potentially completing data->done */
- flush_delayed_work(&ctx->rsrc_put_work);
- reinit_completion(&data->done);
-
- ret = io_run_task_work_sig();
- mutex_lock(&ctx->uring_lock);
- } while (ret >= 0);
- data->quiesce = false;
-
- return ret;
-}
-
-static u64 *io_get_tag_slot(struct io_rsrc_data *data, unsigned int idx)
-{
- unsigned int off = idx & IO_RSRC_TAG_TABLE_MASK;
- unsigned int table_idx = idx >> IO_RSRC_TAG_TABLE_SHIFT;
-
- return &data->tags[table_idx][off];
-}
-
-static void io_rsrc_data_free(struct io_rsrc_data *data)
-{
- size_t size = data->nr * sizeof(data->tags[0][0]);
-
- if (data->tags)
- io_free_page_table((void **)data->tags, size);
- kfree(data);
-}
-
-static __cold int io_rsrc_data_alloc(struct io_ring_ctx *ctx, rsrc_put_fn *do_put,
- u64 __user *utags, unsigned nr,
- struct io_rsrc_data **pdata)
-{
- struct io_rsrc_data *data;
- int ret = -ENOMEM;
- unsigned i;
-
- data = kzalloc(sizeof(*data), GFP_KERNEL);
- if (!data)
- return -ENOMEM;
- data->tags = (u64 **)io_alloc_page_table(nr * sizeof(data->tags[0][0]));
- if (!data->tags) {
- kfree(data);
- return -ENOMEM;
- }
-
- data->nr = nr;
- data->ctx = ctx;
- data->do_put = do_put;
- if (utags) {
- ret = -EFAULT;
- for (i = 0; i < nr; i++) {
- u64 *tag_slot = io_get_tag_slot(data, i);
-
- if (copy_from_user(tag_slot, &utags[i],
- sizeof(*tag_slot)))
- goto fail;
- }
- }
-
- atomic_set(&data->refs, 1);
- init_completion(&data->done);
- *pdata = data;
- return 0;
-fail:
- io_rsrc_data_free(data);
- return ret;
-}
-
-static bool io_alloc_file_tables(struct io_file_table *table, unsigned nr_files)
-{
- table->files = kvcalloc(nr_files, sizeof(table->files[0]),
- GFP_KERNEL_ACCOUNT);
- return !!table->files;
-}
-
-static void io_free_file_tables(struct io_file_table *table)
-{
- kvfree(table->files);
- table->files = NULL;
-}
-
-static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
-{
-#if defined(CONFIG_UNIX)
- if (ctx->ring_sock) {
- struct sock *sock = ctx->ring_sock->sk;
- struct sk_buff *skb;
-
- while ((skb = skb_dequeue(&sock->sk_receive_queue)) != NULL)
- kfree_skb(skb);
- }
-#else
- int i;
-
- for (i = 0; i < ctx->nr_user_files; i++) {
- struct file *file;
-
- file = io_file_from_index(ctx, i);
- if (file)
- fput(file);
- }
-#endif
- io_free_file_tables(&ctx->file_table);
- io_rsrc_data_free(ctx->file_data);
- ctx->file_data = NULL;
- ctx->nr_user_files = 0;
-}
-
-static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
-{
- unsigned nr = ctx->nr_user_files;
- int ret;
-
- if (!ctx->file_data)
- return -ENXIO;
-
- /*
- * Quiesce may unlock ->uring_lock, and while it's not held
- * prevent new requests using the table.
- */
- ctx->nr_user_files = 0;
- ret = io_rsrc_ref_quiesce(ctx->file_data, ctx);
- ctx->nr_user_files = nr;
- if (!ret)
- __io_sqe_files_unregister(ctx);
- return ret;
-}
-
-static void io_sq_thread_unpark(struct io_sq_data *sqd)
- __releases(&sqd->lock)
-{
- WARN_ON_ONCE(sqd->thread == current);
-
- /*
- * Do the dance but not conditional clear_bit() because it'd race with
- * other threads incrementing park_pending and setting the bit.
- */
- clear_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
- if (atomic_dec_return(&sqd->park_pending))
- set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
- mutex_unlock(&sqd->lock);
-}
-
-static void io_sq_thread_park(struct io_sq_data *sqd)
- __acquires(&sqd->lock)
-{
- WARN_ON_ONCE(sqd->thread == current);
-
- atomic_inc(&sqd->park_pending);
- set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
- mutex_lock(&sqd->lock);
- if (sqd->thread)
- wake_up_process(sqd->thread);
-}
-
-static void io_sq_thread_stop(struct io_sq_data *sqd)
-{
- WARN_ON_ONCE(sqd->thread == current);
- WARN_ON_ONCE(test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state));
-
- set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
- mutex_lock(&sqd->lock);
- if (sqd->thread)
- wake_up_process(sqd->thread);
- mutex_unlock(&sqd->lock);
- wait_for_completion(&sqd->exited);
-}
-
-static void io_put_sq_data(struct io_sq_data *sqd)
-{
- if (refcount_dec_and_test(&sqd->refs)) {
- WARN_ON_ONCE(atomic_read(&sqd->park_pending));
-
- io_sq_thread_stop(sqd);
- kfree(sqd);
- }
-}
-
-static void io_sq_thread_finish(struct io_ring_ctx *ctx)
-{
- struct io_sq_data *sqd = ctx->sq_data;
-
- if (sqd) {
- io_sq_thread_park(sqd);
- list_del_init(&ctx->sqd_list);
- io_sqd_update_thread_idle(sqd);
- io_sq_thread_unpark(sqd);
-
- io_put_sq_data(sqd);
- ctx->sq_data = NULL;
- }
-}
-
-static struct io_sq_data *io_attach_sq_data(struct io_uring_params *p)
-{
- struct io_ring_ctx *ctx_attach;
- struct io_sq_data *sqd;
- struct fd f;
-
- f = fdget(p->wq_fd);
- if (!f.file)
- return ERR_PTR(-ENXIO);
- if (f.file->f_op != &io_uring_fops) {
- fdput(f);
- return ERR_PTR(-EINVAL);
- }
-
- ctx_attach = f.file->private_data;
- sqd = ctx_attach->sq_data;
- if (!sqd) {
- fdput(f);
- return ERR_PTR(-EINVAL);
- }
- if (sqd->task_tgid != current->tgid) {
- fdput(f);
- return ERR_PTR(-EPERM);
- }
-
- refcount_inc(&sqd->refs);
- fdput(f);
- return sqd;
-}
-
-static struct io_sq_data *io_get_sq_data(struct io_uring_params *p,
- bool *attached)
-{
- struct io_sq_data *sqd;
-
- *attached = false;
- if (p->flags & IORING_SETUP_ATTACH_WQ) {
- sqd = io_attach_sq_data(p);
- if (!IS_ERR(sqd)) {
- *attached = true;
- return sqd;
- }
- /* fall through for EPERM case, setup new sqd/task */
- if (PTR_ERR(sqd) != -EPERM)
- return sqd;
- }
-
- sqd = kzalloc(sizeof(*sqd), GFP_KERNEL);
- if (!sqd)
- return ERR_PTR(-ENOMEM);
-
- atomic_set(&sqd->park_pending, 0);
- refcount_set(&sqd->refs, 1);
- INIT_LIST_HEAD(&sqd->ctx_list);
- mutex_init(&sqd->lock);
- init_waitqueue_head(&sqd->wait);
- init_completion(&sqd->exited);
- return sqd;
-}
-
-#if defined(CONFIG_UNIX)
-/*
- * Ensure the UNIX gc is aware of our file set, so we are certain that
- * the io_uring can be safely unregistered on process exit, even if we have
- * loops in the file referencing.
- */
-static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset)
-{
- struct sock *sk = ctx->ring_sock->sk;
- struct scm_fp_list *fpl;
- struct sk_buff *skb;
- int i, nr_files;
-
- fpl = kzalloc(sizeof(*fpl), GFP_KERNEL);
- if (!fpl)
- return -ENOMEM;
-
- skb = alloc_skb(0, GFP_KERNEL);
- if (!skb) {
- kfree(fpl);
- return -ENOMEM;
- }
-
- skb->sk = sk;
-
- nr_files = 0;
- fpl->user = get_uid(current_user());
- for (i = 0; i < nr; i++) {
- struct file *file = io_file_from_index(ctx, i + offset);
-
- if (!file)
- continue;
- fpl->fp[nr_files] = get_file(file);
- unix_inflight(fpl->user, fpl->fp[nr_files]);
- nr_files++;
- }
-
- if (nr_files) {
- fpl->max = SCM_MAX_FD;
- fpl->count = nr_files;
- UNIXCB(skb).fp = fpl;
- skb->destructor = unix_destruct_scm;
- refcount_add(skb->truesize, &sk->sk_wmem_alloc);
- skb_queue_head(&sk->sk_receive_queue, skb);
-
- for (i = 0; i < nr; i++) {
- struct file *file = io_file_from_index(ctx, i + offset);
-
- if (file)
- fput(file);
- }
- } else {
- kfree_skb(skb);
- free_uid(fpl->user);
- kfree(fpl);
- }
-
- return 0;
-}
-
-/*
- * If UNIX sockets are enabled, fd passing can cause a reference cycle which
- * causes regular reference counting to break down. We rely on the UNIX
- * garbage collection to take care of this problem for us.
- */
-static int io_sqe_files_scm(struct io_ring_ctx *ctx)
-{
- unsigned left, total;
- int ret = 0;
-
- total = 0;
- left = ctx->nr_user_files;
- while (left) {
- unsigned this_files = min_t(unsigned, left, SCM_MAX_FD);
-
- ret = __io_sqe_files_scm(ctx, this_files, total);
- if (ret)
- break;
- left -= this_files;
- total += this_files;
- }
-
- if (!ret)
- return 0;
-
- while (total < ctx->nr_user_files) {
- struct file *file = io_file_from_index(ctx, total);
-
- if (file)
- fput(file);
- total++;
- }
-
- return ret;
-}
-#else
-static int io_sqe_files_scm(struct io_ring_ctx *ctx)
-{
- return 0;
-}
-#endif
-
-static void io_rsrc_file_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc)
-{
- struct file *file = prsrc->file;
-#if defined(CONFIG_UNIX)
- struct sock *sock = ctx->ring_sock->sk;
- struct sk_buff_head list, *head = &sock->sk_receive_queue;
- struct sk_buff *skb;
- int i;
-
- __skb_queue_head_init(&list);
-
- /*
- * Find the skb that holds this file in its SCM_RIGHTS. When found,
- * remove this entry and rearrange the file array.
- */
- skb = skb_dequeue(head);
- while (skb) {
- struct scm_fp_list *fp;
-
- fp = UNIXCB(skb).fp;
- for (i = 0; i < fp->count; i++) {
- int left;
-
- if (fp->fp[i] != file)
- continue;
-
- unix_notinflight(fp->user, fp->fp[i]);
- left = fp->count - 1 - i;
- if (left) {
- memmove(&fp->fp[i], &fp->fp[i + 1],
- left * sizeof(struct file *));
- }
- fp->count--;
- if (!fp->count) {
- kfree_skb(skb);
- skb = NULL;
- } else {
- __skb_queue_tail(&list, skb);
- }
- fput(file);
- file = NULL;
- break;
- }
-
- if (!file)
- break;
-
- __skb_queue_tail(&list, skb);
-
- skb = skb_dequeue(head);
- }
-
- if (skb_peek(&list)) {
- spin_lock_irq(&head->lock);
- while ((skb = __skb_dequeue(&list)) != NULL)
- __skb_queue_tail(head, skb);
- spin_unlock_irq(&head->lock);
- }
-#else
- fput(file);
-#endif
-}
-
-static void __io_rsrc_put_work(struct io_rsrc_node *ref_node)
-{
- struct io_rsrc_data *rsrc_data = ref_node->rsrc_data;
- struct io_ring_ctx *ctx = rsrc_data->ctx;
- struct io_rsrc_put *prsrc, *tmp;
-
- list_for_each_entry_safe(prsrc, tmp, &ref_node->rsrc_list, list) {
- list_del(&prsrc->list);
-
- if (prsrc->tag) {
- bool lock_ring = ctx->flags & IORING_SETUP_IOPOLL;
-
- io_ring_submit_lock(ctx, lock_ring);
- spin_lock(&ctx->completion_lock);
- io_fill_cqe_aux(ctx, prsrc->tag, 0, 0);
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- io_cqring_ev_posted(ctx);
- io_ring_submit_unlock(ctx, lock_ring);
- }
-
- rsrc_data->do_put(ctx, prsrc);
- kfree(prsrc);
- }
-
- io_rsrc_node_destroy(ref_node);
- if (atomic_dec_and_test(&rsrc_data->refs))
- complete(&rsrc_data->done);
-}
-
-static void io_rsrc_put_work(struct work_struct *work)
-{
- struct io_ring_ctx *ctx;
- struct llist_node *node;
-
- ctx = container_of(work, struct io_ring_ctx, rsrc_put_work.work);
- node = llist_del_all(&ctx->rsrc_put_llist);
-
- while (node) {
- struct io_rsrc_node *ref_node;
- struct llist_node *next = node->next;
-
- ref_node = llist_entry(node, struct io_rsrc_node, llist);
- __io_rsrc_put_work(ref_node);
- node = next;
- }
-}
-
-static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
- unsigned nr_args, u64 __user *tags)
-{
- __s32 __user *fds = (__s32 __user *) arg;
- struct file *file;
- int fd, ret;
- unsigned i;
-
- if (ctx->file_data)
- return -EBUSY;
- if (!nr_args)
- return -EINVAL;
- if (nr_args > IORING_MAX_FIXED_FILES)
- return -EMFILE;
- if (nr_args > rlimit(RLIMIT_NOFILE))
- return -EMFILE;
- ret = io_rsrc_node_switch_start(ctx);
- if (ret)
- return ret;
- ret = io_rsrc_data_alloc(ctx, io_rsrc_file_put, tags, nr_args,
- &ctx->file_data);
- if (ret)
- return ret;
-
- ret = -ENOMEM;
- if (!io_alloc_file_tables(&ctx->file_table, nr_args))
- goto out_free;
-
- for (i = 0; i < nr_args; i++, ctx->nr_user_files++) {
- if (copy_from_user(&fd, &fds[i], sizeof(fd))) {
- ret = -EFAULT;
- goto out_fput;
- }
- /* allow sparse sets */
- if (fd == -1) {
- ret = -EINVAL;
- if (unlikely(*io_get_tag_slot(ctx->file_data, i)))
- goto out_fput;
- continue;
- }
-
- file = fget(fd);
- ret = -EBADF;
- if (unlikely(!file))
- goto out_fput;
-
- /*
- * Don't allow io_uring instances to be registered. If UNIX
- * isn't enabled, then this causes a reference cycle and this
- * instance can never get freed. If UNIX is enabled we'll
- * handle it just fine, but there's still no point in allowing
- * a ring fd as it doesn't support regular read/write anyway.
- */
- if (file->f_op == &io_uring_fops) {
- fput(file);
- goto out_fput;
- }
- io_fixed_file_set(io_fixed_file_slot(&ctx->file_table, i), file);
- }
-
- ret = io_sqe_files_scm(ctx);
- if (ret) {
- __io_sqe_files_unregister(ctx);
- return ret;
- }
-
- io_rsrc_node_switch(ctx, NULL);
- return ret;
-out_fput:
- for (i = 0; i < ctx->nr_user_files; i++) {
- file = io_file_from_index(ctx, i);
- if (file)
- fput(file);
- }
- io_free_file_tables(&ctx->file_table);
- ctx->nr_user_files = 0;
-out_free:
- io_rsrc_data_free(ctx->file_data);
- ctx->file_data = NULL;
- return ret;
-}
-
-static int io_sqe_file_register(struct io_ring_ctx *ctx, struct file *file,
- int index)
-{
-#if defined(CONFIG_UNIX)
- struct sock *sock = ctx->ring_sock->sk;
- struct sk_buff_head *head = &sock->sk_receive_queue;
- struct sk_buff *skb;
-
- /*
- * See if we can merge this file into an existing skb SCM_RIGHTS
- * file set. If there's no room, fall back to allocating a new skb
- * and filling it in.
- */
- spin_lock_irq(&head->lock);
- skb = skb_peek(head);
- if (skb) {
- struct scm_fp_list *fpl = UNIXCB(skb).fp;
-
- if (fpl->count < SCM_MAX_FD) {
- __skb_unlink(skb, head);
- spin_unlock_irq(&head->lock);
- fpl->fp[fpl->count] = get_file(file);
- unix_inflight(fpl->user, fpl->fp[fpl->count]);
- fpl->count++;
- spin_lock_irq(&head->lock);
- __skb_queue_head(head, skb);
- } else {
- skb = NULL;
- }
- }
- spin_unlock_irq(&head->lock);
-
- if (skb) {
- fput(file);
- return 0;
- }
-
- return __io_sqe_files_scm(ctx, 1, index);
-#else
- return 0;
-#endif
-}
-
-static int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx,
- struct io_rsrc_node *node, void *rsrc)
-{
- u64 *tag_slot = io_get_tag_slot(data, idx);
- struct io_rsrc_put *prsrc;
-
- prsrc = kzalloc(sizeof(*prsrc), GFP_KERNEL);
- if (!prsrc)
- return -ENOMEM;
-
- prsrc->tag = *tag_slot;
- *tag_slot = 0;
- prsrc->rsrc = rsrc;
- list_add(&prsrc->list, &node->rsrc_list);
- return 0;
-}
-
-static int io_install_fixed_file(struct io_kiocb *req, struct file *file,
- unsigned int issue_flags, u32 slot_index)
-{
- struct io_ring_ctx *ctx = req->ctx;
- bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
- bool needs_switch = false;
- struct io_fixed_file *file_slot;
- int ret = -EBADF;
-
- io_ring_submit_lock(ctx, needs_lock);
- if (file->f_op == &io_uring_fops)
- goto err;
- ret = -ENXIO;
- if (!ctx->file_data)
- goto err;
- ret = -EINVAL;
- if (slot_index >= ctx->nr_user_files)
- goto err;
-
- slot_index = array_index_nospec(slot_index, ctx->nr_user_files);
- file_slot = io_fixed_file_slot(&ctx->file_table, slot_index);
-
- if (file_slot->file_ptr) {
- struct file *old_file;
-
- ret = io_rsrc_node_switch_start(ctx);
- if (ret)
- goto err;
-
- old_file = (struct file *)(file_slot->file_ptr & FFS_MASK);
- ret = io_queue_rsrc_removal(ctx->file_data, slot_index,
- ctx->rsrc_node, old_file);
- if (ret)
- goto err;
- file_slot->file_ptr = 0;
- needs_switch = true;
- }
-
- *io_get_tag_slot(ctx->file_data, slot_index) = 0;
- io_fixed_file_set(file_slot, file);
- ret = io_sqe_file_register(ctx, file, slot_index);
- if (ret) {
- file_slot->file_ptr = 0;
- goto err;
- }
-
- ret = 0;
-err:
- if (needs_switch)
- io_rsrc_node_switch(ctx, ctx->file_data);
- io_ring_submit_unlock(ctx, needs_lock);
- if (ret)
- fput(file);
- return ret;
-}
-
-static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags)
-{
- unsigned int offset = req->close.file_slot - 1;
- struct io_ring_ctx *ctx = req->ctx;
- bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
- struct io_fixed_file *file_slot;
- struct file *file;
- int ret;
-
- io_ring_submit_lock(ctx, needs_lock);
- ret = -ENXIO;
- if (unlikely(!ctx->file_data))
- goto out;
- ret = -EINVAL;
- if (offset >= ctx->nr_user_files)
- goto out;
- ret = io_rsrc_node_switch_start(ctx);
- if (ret)
- goto out;
-
- offset = array_index_nospec(offset, ctx->nr_user_files);
- file_slot = io_fixed_file_slot(&ctx->file_table, offset);
- ret = -EBADF;
- if (!file_slot->file_ptr)
- goto out;
-
- file = (struct file *)(file_slot->file_ptr & FFS_MASK);
- ret = io_queue_rsrc_removal(ctx->file_data, offset, ctx->rsrc_node, file);
- if (ret)
- goto out;
-
- file_slot->file_ptr = 0;
- io_rsrc_node_switch(ctx, ctx->file_data);
- ret = 0;
-out:
- io_ring_submit_unlock(ctx, needs_lock);
- return ret;
-}
-
-static int __io_sqe_files_update(struct io_ring_ctx *ctx,
- struct io_uring_rsrc_update2 *up,
- unsigned nr_args)
-{
- u64 __user *tags = u64_to_user_ptr(up->tags);
- __s32 __user *fds = u64_to_user_ptr(up->data);
- struct io_rsrc_data *data = ctx->file_data;
- struct io_fixed_file *file_slot;
- struct file *file;
- int fd, i, err = 0;
- unsigned int done;
- bool needs_switch = false;
-
- if (!ctx->file_data)
- return -ENXIO;
- if (up->offset + nr_args > ctx->nr_user_files)
- return -EINVAL;
-
- for (done = 0; done < nr_args; done++) {
- u64 tag = 0;
-
- if ((tags && copy_from_user(&tag, &tags[done], sizeof(tag))) ||
- copy_from_user(&fd, &fds[done], sizeof(fd))) {
- err = -EFAULT;
- break;
- }
- if ((fd == IORING_REGISTER_FILES_SKIP || fd == -1) && tag) {
- err = -EINVAL;
- break;
- }
- if (fd == IORING_REGISTER_FILES_SKIP)
- continue;
-
- i = array_index_nospec(up->offset + done, ctx->nr_user_files);
- file_slot = io_fixed_file_slot(&ctx->file_table, i);
-
- if (file_slot->file_ptr) {
- file = (struct file *)(file_slot->file_ptr & FFS_MASK);
- err = io_queue_rsrc_removal(data, i, ctx->rsrc_node, file);
- if (err)
- break;
- file_slot->file_ptr = 0;
- needs_switch = true;
- }
- if (fd != -1) {
- file = fget(fd);
- if (!file) {
- err = -EBADF;
- break;
- }
- /*
- * Don't allow io_uring instances to be registered. If
- * UNIX isn't enabled, then this causes a reference
- * cycle and this instance can never get freed. If UNIX
- * is enabled we'll handle it just fine, but there's
- * still no point in allowing a ring fd as it doesn't
- * support regular read/write anyway.
- */
- if (file->f_op == &io_uring_fops) {
- fput(file);
- err = -EBADF;
- break;
- }
- *io_get_tag_slot(data, i) = tag;
- io_fixed_file_set(file_slot, file);
- err = io_sqe_file_register(ctx, file, i);
- if (err) {
- file_slot->file_ptr = 0;
- fput(file);
- break;
- }
- }
- }
-
- if (needs_switch)
- io_rsrc_node_switch(ctx, data);
- return done ? done : err;
-}
-
-static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx,
- struct task_struct *task)
-{
- struct io_wq_hash *hash;
- struct io_wq_data data;
- unsigned int concurrency;
-
- mutex_lock(&ctx->uring_lock);
- hash = ctx->hash_map;
- if (!hash) {
- hash = kzalloc(sizeof(*hash), GFP_KERNEL);
- if (!hash) {
- mutex_unlock(&ctx->uring_lock);
- return ERR_PTR(-ENOMEM);
- }
- refcount_set(&hash->refs, 1);
- init_waitqueue_head(&hash->wait);
- ctx->hash_map = hash;
- }
- mutex_unlock(&ctx->uring_lock);
-
- data.hash = hash;
- data.task = task;
- data.free_work = io_wq_free_work;
- data.do_work = io_wq_submit_work;
-
-	/* Do QD, or 4 * CPUS, whichever is smaller */
- concurrency = min(ctx->sq_entries, 4 * num_online_cpus());
-
- return io_wq_create(concurrency, &data);
-}
-
-static __cold int io_uring_alloc_task_context(struct task_struct *task,
- struct io_ring_ctx *ctx)
-{
- struct io_uring_task *tctx;
- int ret;
-
- tctx = kzalloc(sizeof(*tctx), GFP_KERNEL);
- if (unlikely(!tctx))
- return -ENOMEM;
-
- tctx->registered_rings = kcalloc(IO_RINGFD_REG_MAX,
- sizeof(struct file *), GFP_KERNEL);
- if (unlikely(!tctx->registered_rings)) {
- kfree(tctx);
- return -ENOMEM;
- }
-
- ret = percpu_counter_init(&tctx->inflight, 0, GFP_KERNEL);
- if (unlikely(ret)) {
- kfree(tctx->registered_rings);
- kfree(tctx);
- return ret;
- }
-
- tctx->io_wq = io_init_wq_offload(ctx, task);
- if (IS_ERR(tctx->io_wq)) {
- ret = PTR_ERR(tctx->io_wq);
- percpu_counter_destroy(&tctx->inflight);
- kfree(tctx->registered_rings);
- kfree(tctx);
- return ret;
- }
-
- xa_init(&tctx->xa);
- init_waitqueue_head(&tctx->wait);
- atomic_set(&tctx->in_idle, 0);
- atomic_set(&tctx->inflight_tracked, 0);
- task->io_uring = tctx;
- spin_lock_init(&tctx->task_lock);
- INIT_WQ_LIST(&tctx->task_list);
- INIT_WQ_LIST(&tctx->prior_task_list);
- init_task_work(&tctx->task_work, tctx_task_work);
- return 0;
-}
-
-void __io_uring_free(struct task_struct *tsk)
-{
- struct io_uring_task *tctx = tsk->io_uring;
-
- WARN_ON_ONCE(!xa_empty(&tctx->xa));
- WARN_ON_ONCE(tctx->io_wq);
- WARN_ON_ONCE(tctx->cached_refs);
-
- kfree(tctx->registered_rings);
- percpu_counter_destroy(&tctx->inflight);
- kfree(tctx);
- tsk->io_uring = NULL;
-}
-
-static __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
- struct io_uring_params *p)
-{
- int ret;
-
- /* Retain compatibility with failing for an invalid attach attempt */
- if ((ctx->flags & (IORING_SETUP_ATTACH_WQ | IORING_SETUP_SQPOLL)) ==
- IORING_SETUP_ATTACH_WQ) {
- struct fd f;
-
- f = fdget(p->wq_fd);
- if (!f.file)
- return -ENXIO;
- if (f.file->f_op != &io_uring_fops) {
- fdput(f);
- return -EINVAL;
- }
- fdput(f);
- }
- if (ctx->flags & IORING_SETUP_SQPOLL) {
- struct task_struct *tsk;
- struct io_sq_data *sqd;
- bool attached;
-
- ret = security_uring_sqpoll();
- if (ret)
- return ret;
-
- sqd = io_get_sq_data(p, &attached);
- if (IS_ERR(sqd)) {
- ret = PTR_ERR(sqd);
- goto err;
- }
-
- ctx->sq_creds = get_current_cred();
- ctx->sq_data = sqd;
- ctx->sq_thread_idle = msecs_to_jiffies(p->sq_thread_idle);
- if (!ctx->sq_thread_idle)
- ctx->sq_thread_idle = HZ;
-
- io_sq_thread_park(sqd);
- list_add(&ctx->sqd_list, &sqd->ctx_list);
- io_sqd_update_thread_idle(sqd);
- /* don't attach to a dying SQPOLL thread, would be racy */
- ret = (attached && !sqd->thread) ? -ENXIO : 0;
- io_sq_thread_unpark(sqd);
-
- if (ret < 0)
- goto err;
- if (attached)
- return 0;
-
- if (p->flags & IORING_SETUP_SQ_AFF) {
- int cpu = p->sq_thread_cpu;
-
- ret = -EINVAL;
- if (cpu >= nr_cpu_ids || !cpu_online(cpu))
- goto err_sqpoll;
- sqd->sq_cpu = cpu;
- } else {
- sqd->sq_cpu = -1;
- }
-
- sqd->task_pid = current->pid;
- sqd->task_tgid = current->tgid;
- tsk = create_io_thread(io_sq_thread, sqd, NUMA_NO_NODE);
- if (IS_ERR(tsk)) {
- ret = PTR_ERR(tsk);
- goto err_sqpoll;
- }
-
- sqd->thread = tsk;
- ret = io_uring_alloc_task_context(tsk, ctx);
- wake_up_new_task(tsk);
- if (ret)
- goto err;
- } else if (p->flags & IORING_SETUP_SQ_AFF) {
- /* Can't have SQ_AFF without SQPOLL */
- ret = -EINVAL;
- goto err;
- }
-
- return 0;
-err_sqpoll:
- complete(&ctx->sq_data->exited);
-err:
- io_sq_thread_finish(ctx);
- return ret;
-}
-
-static inline void __io_unaccount_mem(struct user_struct *user,
- unsigned long nr_pages)
-{
- atomic_long_sub(nr_pages, &user->locked_vm);
-}
-
-static inline int __io_account_mem(struct user_struct *user,
- unsigned long nr_pages)
-{
- unsigned long page_limit, cur_pages, new_pages;
-
- /* Don't allow more pages than we can safely lock */
- page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
- do {
- cur_pages = atomic_long_read(&user->locked_vm);
- new_pages = cur_pages + nr_pages;
- if (new_pages > page_limit)
- return -ENOMEM;
- } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
- new_pages) != cur_pages);
-
- return 0;
-}
-
-static void io_unaccount_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
-{
- if (ctx->user)
- __io_unaccount_mem(ctx->user, nr_pages);
-
- if (ctx->mm_account)
- atomic64_sub(nr_pages, &ctx->mm_account->pinned_vm);
-}
-
-static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
-{
- int ret;
-
- if (ctx->user) {
- ret = __io_account_mem(ctx->user, nr_pages);
- if (ret)
- return ret;
- }
-
- if (ctx->mm_account)
- atomic64_add(nr_pages, &ctx->mm_account->pinned_vm);
-
- return 0;
-}
-
-static void io_mem_free(void *ptr)
-{
- struct page *page;
-
- if (!ptr)
- return;
-
- page = virt_to_head_page(ptr);
- if (put_page_testzero(page))
- free_compound_page(page);
-}
-
-static void *io_mem_alloc(size_t size)
-{
- gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
-
- return (void *) __get_free_pages(gfp, get_order(size));
-}
-
-static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
- size_t *sq_offset)
-{
- struct io_rings *rings;
- size_t off, sq_array_size;
-
- off = struct_size(rings, cqes, cq_entries);
- if (off == SIZE_MAX)
- return SIZE_MAX;
-
-#ifdef CONFIG_SMP
- off = ALIGN(off, SMP_CACHE_BYTES);
- if (off == 0)
- return SIZE_MAX;
-#endif
-
- if (sq_offset)
- *sq_offset = off;
-
- sq_array_size = array_size(sizeof(u32), sq_entries);
- if (sq_array_size == SIZE_MAX)
- return SIZE_MAX;
-
- if (check_add_overflow(off, sq_array_size, &off))
- return SIZE_MAX;
-
- return off;
-}
-
-static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_mapped_ubuf **slot)
-{
- struct io_mapped_ubuf *imu = *slot;
- unsigned int i;
-
- if (imu != ctx->dummy_ubuf) {
- for (i = 0; i < imu->nr_bvecs; i++)
- unpin_user_page(imu->bvec[i].bv_page);
- if (imu->acct_pages)
- io_unaccount_mem(ctx, imu->acct_pages);
- kvfree(imu);
- }
- *slot = NULL;
-}
-
-static void io_rsrc_buf_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc)
-{
- io_buffer_unmap(ctx, &prsrc->buf);
- prsrc->buf = NULL;
-}
-
-static void __io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
-{
- unsigned int i;
-
- for (i = 0; i < ctx->nr_user_bufs; i++)
- io_buffer_unmap(ctx, &ctx->user_bufs[i]);
- kfree(ctx->user_bufs);
- io_rsrc_data_free(ctx->buf_data);
- ctx->user_bufs = NULL;
- ctx->buf_data = NULL;
- ctx->nr_user_bufs = 0;
-}
-
-static int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
-{
- unsigned nr = ctx->nr_user_bufs;
- int ret;
-
- if (!ctx->buf_data)
- return -ENXIO;
-
- /*
- * Quiesce may unlock ->uring_lock, and while it's not held
-	 * prevent new requests from using the table.
- */
- ctx->nr_user_bufs = 0;
- ret = io_rsrc_ref_quiesce(ctx->buf_data, ctx);
- ctx->nr_user_bufs = nr;
- if (!ret)
- __io_sqe_buffers_unregister(ctx);
- return ret;
-}
-
-static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
- void __user *arg, unsigned index)
-{
- struct iovec __user *src;
-
-#ifdef CONFIG_COMPAT
- if (ctx->compat) {
- struct compat_iovec __user *ciovs;
- struct compat_iovec ciov;
-
- ciovs = (struct compat_iovec __user *) arg;
- if (copy_from_user(&ciov, &ciovs[index], sizeof(ciov)))
- return -EFAULT;
-
- dst->iov_base = u64_to_user_ptr((u64)ciov.iov_base);
- dst->iov_len = ciov.iov_len;
- return 0;
- }
-#endif
- src = (struct iovec __user *) arg;
- if (copy_from_user(dst, &src[index], sizeof(*dst)))
- return -EFAULT;
- return 0;
-}
-
-/*
- * Not super efficient, but this is only done at registration time. And we do cache
- * the last compound head, so generally we'll only do a full search if we don't
- * match that one.
- *
- * We check if the given compound head page has already been accounted, to
- * avoid double accounting it. This allows us to account the full size of the
- * page, not just the constituent pages of a huge page.
- */
-static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
- int nr_pages, struct page *hpage)
-{
- int i, j;
-
- /* check current page array */
- for (i = 0; i < nr_pages; i++) {
- if (!PageCompound(pages[i]))
- continue;
- if (compound_head(pages[i]) == hpage)
- return true;
- }
-
- /* check previously registered pages */
- for (i = 0; i < ctx->nr_user_bufs; i++) {
- struct io_mapped_ubuf *imu = ctx->user_bufs[i];
-
- for (j = 0; j < imu->nr_bvecs; j++) {
- if (!PageCompound(imu->bvec[j].bv_page))
- continue;
- if (compound_head(imu->bvec[j].bv_page) == hpage)
- return true;
- }
- }
-
- return false;
-}
-
-static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
- int nr_pages, struct io_mapped_ubuf *imu,
- struct page **last_hpage)
-{
- int i, ret;
-
- imu->acct_pages = 0;
- for (i = 0; i < nr_pages; i++) {
- if (!PageCompound(pages[i])) {
- imu->acct_pages++;
- } else {
- struct page *hpage;
-
- hpage = compound_head(pages[i]);
- if (hpage == *last_hpage)
- continue;
- *last_hpage = hpage;
- if (headpage_already_acct(ctx, pages, i, hpage))
- continue;
- imu->acct_pages += page_size(hpage) >> PAGE_SHIFT;
- }
- }
-
- if (!imu->acct_pages)
- return 0;
-
- ret = io_account_mem(ctx, imu->acct_pages);
- if (ret)
- imu->acct_pages = 0;
- return ret;
-}
-
-static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
- struct io_mapped_ubuf **pimu,
- struct page **last_hpage)
-{
- struct io_mapped_ubuf *imu = NULL;
- struct vm_area_struct **vmas = NULL;
- struct page **pages = NULL;
- unsigned long off, start, end, ubuf;
- size_t size;
- int ret, pret, nr_pages, i;
-
- if (!iov->iov_base) {
- *pimu = ctx->dummy_ubuf;
- return 0;
- }
-
- ubuf = (unsigned long) iov->iov_base;
- end = (ubuf + iov->iov_len + PAGE_SIZE - 1) >> PAGE_SHIFT;
- start = ubuf >> PAGE_SHIFT;
- nr_pages = end - start;
-
- *pimu = NULL;
- ret = -ENOMEM;
-
- pages = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
- if (!pages)
- goto done;
-
- vmas = kvmalloc_array(nr_pages, sizeof(struct vm_area_struct *),
- GFP_KERNEL);
- if (!vmas)
- goto done;
-
- imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
- if (!imu)
- goto done;
-
- ret = 0;
- mmap_read_lock(current->mm);
- pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
- pages, vmas);
- if (pret == nr_pages) {
- /* don't support file backed memory */
- for (i = 0; i < nr_pages; i++) {
- struct vm_area_struct *vma = vmas[i];
-
- if (vma_is_shmem(vma))
- continue;
- if (vma->vm_file &&
- !is_file_hugepages(vma->vm_file)) {
- ret = -EOPNOTSUPP;
- break;
- }
- }
- } else {
- ret = pret < 0 ? pret : -EFAULT;
- }
- mmap_read_unlock(current->mm);
- if (ret) {
- /*
- * if we did partial map, or found file backed vmas,
- * release any pages we did get
- */
- if (pret > 0)
- unpin_user_pages(pages, pret);
- goto done;
- }
-
- ret = io_buffer_account_pin(ctx, pages, pret, imu, last_hpage);
- if (ret) {
- unpin_user_pages(pages, pret);
- goto done;
- }
-
- off = ubuf & ~PAGE_MASK;
- size = iov->iov_len;
- for (i = 0; i < nr_pages; i++) {
- size_t vec_len;
-
- vec_len = min_t(size_t, size, PAGE_SIZE - off);
- imu->bvec[i].bv_page = pages[i];
- imu->bvec[i].bv_len = vec_len;
- imu->bvec[i].bv_offset = off;
- off = 0;
- size -= vec_len;
- }
- /* store original address for later verification */
- imu->ubuf = ubuf;
- imu->ubuf_end = ubuf + iov->iov_len;
- imu->nr_bvecs = nr_pages;
- *pimu = imu;
- ret = 0;
-done:
- if (ret)
- kvfree(imu);
- kvfree(pages);
- kvfree(vmas);
- return ret;
-}
-
-static int io_buffers_map_alloc(struct io_ring_ctx *ctx, unsigned int nr_args)
-{
- ctx->user_bufs = kcalloc(nr_args, sizeof(*ctx->user_bufs), GFP_KERNEL);
- return ctx->user_bufs ? 0 : -ENOMEM;
-}
-
-static int io_buffer_validate(struct iovec *iov)
-{
- unsigned long tmp, acct_len = iov->iov_len + (PAGE_SIZE - 1);
-
- /*
- * Don't impose further limits on the size and buffer
-	 * constraints here; we'll -EINVAL later when IO is
- * submitted if they are wrong.
- */
- if (!iov->iov_base)
- return iov->iov_len ? -EFAULT : 0;
- if (!iov->iov_len)
- return -EFAULT;
-
- /* arbitrary limit, but we need something */
- if (iov->iov_len > SZ_1G)
- return -EFAULT;
-
- if (check_add_overflow((unsigned long)iov->iov_base, acct_len, &tmp))
- return -EOVERFLOW;
-
- return 0;
-}
-
-static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
- unsigned int nr_args, u64 __user *tags)
-{
- struct page *last_hpage = NULL;
- struct io_rsrc_data *data;
- int i, ret;
- struct iovec iov;
-
- if (ctx->user_bufs)
- return -EBUSY;
- if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
- return -EINVAL;
- ret = io_rsrc_node_switch_start(ctx);
- if (ret)
- return ret;
- ret = io_rsrc_data_alloc(ctx, io_rsrc_buf_put, tags, nr_args, &data);
- if (ret)
- return ret;
- ret = io_buffers_map_alloc(ctx, nr_args);
- if (ret) {
- io_rsrc_data_free(data);
- return ret;
- }
-
- for (i = 0; i < nr_args; i++, ctx->nr_user_bufs++) {
- ret = io_copy_iov(ctx, &iov, arg, i);
- if (ret)
- break;
- ret = io_buffer_validate(&iov);
- if (ret)
- break;
- if (!iov.iov_base && *io_get_tag_slot(data, i)) {
- ret = -EINVAL;
- break;
- }
-
- ret = io_sqe_buffer_register(ctx, &iov, &ctx->user_bufs[i],
- &last_hpage);
- if (ret)
- break;
- }
-
- WARN_ON_ONCE(ctx->buf_data);
-
- ctx->buf_data = data;
- if (ret)
- __io_sqe_buffers_unregister(ctx);
- else
- io_rsrc_node_switch(ctx, NULL);
- return ret;
-}
-
-static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
- struct io_uring_rsrc_update2 *up,
- unsigned int nr_args)
-{
- u64 __user *tags = u64_to_user_ptr(up->tags);
- struct iovec iov, __user *iovs = u64_to_user_ptr(up->data);
- struct page *last_hpage = NULL;
- bool needs_switch = false;
- __u32 done;
- int i, err;
-
- if (!ctx->buf_data)
- return -ENXIO;
- if (up->offset + nr_args > ctx->nr_user_bufs)
- return -EINVAL;
-
- for (done = 0; done < nr_args; done++) {
- struct io_mapped_ubuf *imu;
- int offset = up->offset + done;
- u64 tag = 0;
-
- err = io_copy_iov(ctx, &iov, iovs, done);
- if (err)
- break;
- if (tags && copy_from_user(&tag, &tags[done], sizeof(tag))) {
- err = -EFAULT;
- break;
- }
- err = io_buffer_validate(&iov);
- if (err)
- break;
- if (!iov.iov_base && tag) {
- err = -EINVAL;
- break;
- }
- err = io_sqe_buffer_register(ctx, &iov, &imu, &last_hpage);
- if (err)
- break;
-
- i = array_index_nospec(offset, ctx->nr_user_bufs);
- if (ctx->user_bufs[i] != ctx->dummy_ubuf) {
- err = io_queue_rsrc_removal(ctx->buf_data, i,
- ctx->rsrc_node, ctx->user_bufs[i]);
- if (unlikely(err)) {
- io_buffer_unmap(ctx, &imu);
- break;
- }
- ctx->user_bufs[i] = NULL;
- needs_switch = true;
- }
-
- ctx->user_bufs[i] = imu;
- *io_get_tag_slot(ctx->buf_data, offset) = tag;
- }
-
- if (needs_switch)
- io_rsrc_node_switch(ctx, ctx->buf_data);
- return done ? done : err;
-}
-
-static int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg,
- unsigned int eventfd_async)
-{
- struct io_ev_fd *ev_fd;
- __s32 __user *fds = arg;
- int fd;
-
- ev_fd = rcu_dereference_protected(ctx->io_ev_fd,
- lockdep_is_held(&ctx->uring_lock));
- if (ev_fd)
- return -EBUSY;
-
- if (copy_from_user(&fd, fds, sizeof(*fds)))
- return -EFAULT;
-
- ev_fd = kmalloc(sizeof(*ev_fd), GFP_KERNEL);
- if (!ev_fd)
- return -ENOMEM;
-
- ev_fd->cq_ev_fd = eventfd_ctx_fdget(fd);
- if (IS_ERR(ev_fd->cq_ev_fd)) {
- int ret = PTR_ERR(ev_fd->cq_ev_fd);
- kfree(ev_fd);
- return ret;
- }
- ev_fd->eventfd_async = eventfd_async;
- ctx->has_evfd = true;
- rcu_assign_pointer(ctx->io_ev_fd, ev_fd);
- return 0;
-}
-
-static void io_eventfd_put(struct rcu_head *rcu)
-{
- struct io_ev_fd *ev_fd = container_of(rcu, struct io_ev_fd, rcu);
-
- eventfd_ctx_put(ev_fd->cq_ev_fd);
- kfree(ev_fd);
-}
-
-static int io_eventfd_unregister(struct io_ring_ctx *ctx)
-{
- struct io_ev_fd *ev_fd;
-
- ev_fd = rcu_dereference_protected(ctx->io_ev_fd,
- lockdep_is_held(&ctx->uring_lock));
- if (ev_fd) {
- ctx->has_evfd = false;
- rcu_assign_pointer(ctx->io_ev_fd, NULL);
- call_rcu(&ev_fd->rcu, io_eventfd_put);
- return 0;
- }
-
- return -ENXIO;
-}
-
-static void io_destroy_buffers(struct io_ring_ctx *ctx)
-{
- int i;
-
- for (i = 0; i < (1U << IO_BUFFERS_HASH_BITS); i++) {
- struct list_head *list = &ctx->io_buffers[i];
-
- while (!list_empty(list)) {
- struct io_buffer_list *bl;
-
- bl = list_first_entry(list, struct io_buffer_list, list);
- __io_remove_buffers(ctx, bl, -1U);
- list_del(&bl->list);
- kfree(bl);
- }
- }
-
- while (!list_empty(&ctx->io_buffers_pages)) {
- struct page *page;
-
- page = list_first_entry(&ctx->io_buffers_pages, struct page, lru);
- list_del_init(&page->lru);
- __free_page(page);
- }
-}
-
-static void io_req_caches_free(struct io_ring_ctx *ctx)
-{
- struct io_submit_state *state = &ctx->submit_state;
- int nr = 0;
-
- mutex_lock(&ctx->uring_lock);
- io_flush_cached_locked_reqs(ctx, state);
-
- while (state->free_list.next) {
- struct io_wq_work_node *node;
- struct io_kiocb *req;
-
- node = wq_stack_extract(&state->free_list);
- req = container_of(node, struct io_kiocb, comp_list);
- kmem_cache_free(req_cachep, req);
- nr++;
- }
- if (nr)
- percpu_ref_put_many(&ctx->refs, nr);
- mutex_unlock(&ctx->uring_lock);
-}
-
-static void io_wait_rsrc_data(struct io_rsrc_data *data)
-{
- if (data && !atomic_dec_and_test(&data->refs))
- wait_for_completion(&data->done);
-}
-
-static void io_flush_apoll_cache(struct io_ring_ctx *ctx)
-{
- struct async_poll *apoll;
-
- while (!list_empty(&ctx->apoll_cache)) {
- apoll = list_first_entry(&ctx->apoll_cache, struct async_poll,
- poll.wait.entry);
- list_del(&apoll->poll.wait.entry);
- kfree(apoll);
- }
-}
-
-static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
-{
- io_sq_thread_finish(ctx);
-
- if (ctx->mm_account) {
- mmdrop(ctx->mm_account);
- ctx->mm_account = NULL;
- }
-
- io_rsrc_refs_drop(ctx);
- /* __io_rsrc_put_work() may need uring_lock to progress, wait w/o it */
- io_wait_rsrc_data(ctx->buf_data);
- io_wait_rsrc_data(ctx->file_data);
-
- mutex_lock(&ctx->uring_lock);
- if (ctx->buf_data)
- __io_sqe_buffers_unregister(ctx);
- if (ctx->file_data)
- __io_sqe_files_unregister(ctx);
- if (ctx->rings)
- __io_cqring_overflow_flush(ctx, true);
- io_eventfd_unregister(ctx);
- io_flush_apoll_cache(ctx);
- mutex_unlock(&ctx->uring_lock);
- io_destroy_buffers(ctx);
- if (ctx->sq_creds)
- put_cred(ctx->sq_creds);
-
- /* there are no registered resources left, nobody uses it */
- if (ctx->rsrc_node)
- io_rsrc_node_destroy(ctx->rsrc_node);
- if (ctx->rsrc_backup_node)
- io_rsrc_node_destroy(ctx->rsrc_backup_node);
- flush_delayed_work(&ctx->rsrc_put_work);
- flush_delayed_work(&ctx->fallback_work);
-
- WARN_ON_ONCE(!list_empty(&ctx->rsrc_ref_list));
- WARN_ON_ONCE(!llist_empty(&ctx->rsrc_put_llist));
-
-#if defined(CONFIG_UNIX)
- if (ctx->ring_sock) {
- ctx->ring_sock->file = NULL; /* so that iput() is called */
- sock_release(ctx->ring_sock);
- }
-#endif
- WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));
-
- io_mem_free(ctx->rings);
- io_mem_free(ctx->sq_sqes);
-
- percpu_ref_exit(&ctx->refs);
- free_uid(ctx->user);
- io_req_caches_free(ctx);
- if (ctx->hash_map)
- io_wq_put_hash(ctx->hash_map);
- kfree(ctx->cancel_hash);
- kfree(ctx->dummy_ubuf);
- kfree(ctx->io_buffers);
- kfree(ctx);
-}
-
-static __poll_t io_uring_poll(struct file *file, poll_table *wait)
-{
- struct io_ring_ctx *ctx = file->private_data;
- __poll_t mask = 0;
-
- poll_wait(file, &ctx->cq_wait, wait);
- /*
- * synchronizes with barrier from wq_has_sleeper call in
- * io_commit_cqring
- */
- smp_rmb();
- if (!io_sqring_full(ctx))
- mask |= EPOLLOUT | EPOLLWRNORM;
-
- /*
- * Don't flush cqring overflow list here, just do a simple check.
-	 * Otherwise there could possibly be an ABBA deadlock:
- * CPU0 CPU1
- * ---- ----
- * lock(&ctx->uring_lock);
- * lock(&ep->mtx);
- * lock(&ctx->uring_lock);
- * lock(&ep->mtx);
- *
-	 * Users may get EPOLLIN while seeing nothing in the cqring; this
-	 * pushes them to do the flush.
- */
- if (io_cqring_events(ctx) || test_bit(0, &ctx->check_cq_overflow))
- mask |= EPOLLIN | EPOLLRDNORM;
-
- return mask;
-}
-
-static int io_unregister_personality(struct io_ring_ctx *ctx, unsigned id)
-{
- const struct cred *creds;
-
- creds = xa_erase(&ctx->personalities, id);
- if (creds) {
- put_cred(creds);
- return 0;
- }
-
- return -EINVAL;
-}
-
-struct io_tctx_exit {
- struct callback_head task_work;
- struct completion completion;
- struct io_ring_ctx *ctx;
-};
-
-static __cold void io_tctx_exit_cb(struct callback_head *cb)
-{
- struct io_uring_task *tctx = current->io_uring;
- struct io_tctx_exit *work;
-
- work = container_of(cb, struct io_tctx_exit, task_work);
- /*
- * When @in_idle, we're in cancellation and it's racy to remove the
- * node. It'll be removed by the end of cancellation, just ignore it.
- */
- if (!atomic_read(&tctx->in_idle))
- io_uring_del_tctx_node((unsigned long)work->ctx);
- complete(&work->completion);
-}
-
-static __cold bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
-{
- struct io_kiocb *req = container_of(work, struct io_kiocb, work);
-
- return req->ctx == data;
-}
-
-static __cold void io_ring_exit_work(struct work_struct *work)
-{
- struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
- unsigned long timeout = jiffies + HZ * 60 * 5;
- unsigned long interval = HZ / 20;
- struct io_tctx_exit exit;
- struct io_tctx_node *node;
- int ret;
-
- /*
- * If we're doing polled IO and end up having requests being
- * submitted async (out-of-line), then completions can come in while
- * we're waiting for refs to drop. We need to reap these manually,
- * as nobody else will be looking for them.
- */
- do {
- io_uring_try_cancel_requests(ctx, NULL, true);
- if (ctx->sq_data) {
- struct io_sq_data *sqd = ctx->sq_data;
- struct task_struct *tsk;
-
- io_sq_thread_park(sqd);
- tsk = sqd->thread;
- if (tsk && tsk->io_uring && tsk->io_uring->io_wq)
- io_wq_cancel_cb(tsk->io_uring->io_wq,
- io_cancel_ctx_cb, ctx, true);
- io_sq_thread_unpark(sqd);
- }
-
- io_req_caches_free(ctx);
-
- if (WARN_ON_ONCE(time_after(jiffies, timeout))) {
- /* there is little hope left, don't run it too often */
- interval = HZ * 60;
- }
- } while (!wait_for_completion_timeout(&ctx->ref_comp, interval));
-
- init_completion(&exit.completion);
- init_task_work(&exit.task_work, io_tctx_exit_cb);
- exit.ctx = ctx;
- /*
- * Some may use context even when all refs and requests have been put,
- * and they are free to do so while still holding uring_lock or
- * completion_lock, see io_req_task_submit(). Apart from other work,
-	 * this lock/unlock section also waits for them to finish.
- */
- mutex_lock(&ctx->uring_lock);
- while (!list_empty(&ctx->tctx_list)) {
- WARN_ON_ONCE(time_after(jiffies, timeout));
-
- node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
- ctx_node);
- /* don't spin on a single task if cancellation failed */
- list_rotate_left(&ctx->tctx_list);
- ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
- if (WARN_ON_ONCE(ret))
- continue;
-
- mutex_unlock(&ctx->uring_lock);
- wait_for_completion(&exit.completion);
- mutex_lock(&ctx->uring_lock);
- }
- mutex_unlock(&ctx->uring_lock);
- spin_lock(&ctx->completion_lock);
- spin_unlock(&ctx->completion_lock);
-
- io_ring_ctx_free(ctx);
-}
-
-/* Returns true if we found and killed one or more timeouts */
-static __cold bool io_kill_timeouts(struct io_ring_ctx *ctx,
- struct task_struct *tsk, bool cancel_all)
-{
- struct io_kiocb *req, *tmp;
- int canceled = 0;
-
- spin_lock(&ctx->completion_lock);
- spin_lock_irq(&ctx->timeout_lock);
- list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) {
- if (io_match_task(req, tsk, cancel_all)) {
- io_kill_timeout(req, -ECANCELED);
- canceled++;
- }
- }
- spin_unlock_irq(&ctx->timeout_lock);
- if (canceled != 0)
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- if (canceled != 0)
- io_cqring_ev_posted(ctx);
- return canceled != 0;
-}
-
-static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
-{
- unsigned long index;
- struct creds *creds;
-
- mutex_lock(&ctx->uring_lock);
- percpu_ref_kill(&ctx->refs);
- if (ctx->rings)
- __io_cqring_overflow_flush(ctx, true);
- xa_for_each(&ctx->personalities, index, creds)
- io_unregister_personality(ctx, index);
- mutex_unlock(&ctx->uring_lock);
-
- io_kill_timeouts(ctx, NULL, true);
- io_poll_remove_all(ctx, NULL, true);
-
- /* if we failed setting up the ctx, we might not have any rings */
- io_iopoll_try_reap_events(ctx);
-
- INIT_WORK(&ctx->exit_work, io_ring_exit_work);
- /*
- * Use system_unbound_wq to avoid spawning tons of event kworkers
- * if we're exiting a ton of rings at the same time. It just adds
-	 * noise and overhead, there's no discernible change in runtime
- * over using system_wq.
- */
- queue_work(system_unbound_wq, &ctx->exit_work);
-}
-
-static int io_uring_release(struct inode *inode, struct file *file)
-{
- struct io_ring_ctx *ctx = file->private_data;
-
- file->private_data = NULL;
- io_ring_ctx_wait_and_kill(ctx);
- return 0;
-}
-
-struct io_task_cancel {
- struct task_struct *task;
- bool all;
-};
-
-static bool io_cancel_task_cb(struct io_wq_work *work, void *data)
-{
- struct io_kiocb *req = container_of(work, struct io_kiocb, work);
- struct io_task_cancel *cancel = data;
-
- return io_match_task_safe(req, cancel->task, cancel->all);
-}
-
-static __cold bool io_cancel_defer_files(struct io_ring_ctx *ctx,
- struct task_struct *task,
- bool cancel_all)
-{
- struct io_defer_entry *de;
- LIST_HEAD(list);
-
- spin_lock(&ctx->completion_lock);
- list_for_each_entry_reverse(de, &ctx->defer_list, list) {
- if (io_match_task_safe(de->req, task, cancel_all)) {
- list_cut_position(&list, &ctx->defer_list, &de->list);
- break;
- }
- }
- spin_unlock(&ctx->completion_lock);
- if (list_empty(&list))
- return false;
-
- while (!list_empty(&list)) {
- de = list_first_entry(&list, struct io_defer_entry, list);
- list_del_init(&de->list);
- io_req_complete_failed(de->req, -ECANCELED);
- kfree(de);
- }
- return true;
-}
-
-static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
-{
- struct io_tctx_node *node;
- enum io_wq_cancel cret;
- bool ret = false;
-
- mutex_lock(&ctx->uring_lock);
- list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
- struct io_uring_task *tctx = node->task->io_uring;
-
- /*
- * io_wq will stay alive while we hold uring_lock, because it's
-		 * killed after ctx nodes, which requires taking the lock.
- */
- if (!tctx || !tctx->io_wq)
- continue;
- cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true);
- ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
- }
- mutex_unlock(&ctx->uring_lock);
-
- return ret;
-}
-
-static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
- struct task_struct *task,
- bool cancel_all)
-{
- struct io_task_cancel cancel = { .task = task, .all = cancel_all, };
- struct io_uring_task *tctx = task ? task->io_uring : NULL;
-
- while (1) {
- enum io_wq_cancel cret;
- bool ret = false;
-
- if (!task) {
- ret |= io_uring_try_cancel_iowq(ctx);
- } else if (tctx && tctx->io_wq) {
- /*
- * Cancels requests of all rings, not only @ctx, but
- * it's fine as the task is in exit/exec.
- */
- cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_task_cb,
- &cancel, true);
- ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
- }
-
- /* SQPOLL thread does its own polling */
- if ((!(ctx->flags & IORING_SETUP_SQPOLL) && cancel_all) ||
- (ctx->sq_data && ctx->sq_data->thread == current)) {
- while (!wq_list_empty(&ctx->iopoll_list)) {
- io_iopoll_try_reap_events(ctx);
- ret = true;
- }
- }
-
- ret |= io_cancel_defer_files(ctx, task, cancel_all);
- ret |= io_poll_remove_all(ctx, task, cancel_all);
- ret |= io_kill_timeouts(ctx, task, cancel_all);
- if (task)
- ret |= io_run_task_work();
- if (!ret)
- break;
- cond_resched();
- }
-}
-
-static int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
-{
- struct io_uring_task *tctx = current->io_uring;
- struct io_tctx_node *node;
- int ret;
-
- if (unlikely(!tctx)) {
- ret = io_uring_alloc_task_context(current, ctx);
- if (unlikely(ret))
- return ret;
-
- tctx = current->io_uring;
- if (ctx->iowq_limits_set) {
- unsigned int limits[2] = { ctx->iowq_limits[0],
- ctx->iowq_limits[1], };
-
- ret = io_wq_max_workers(tctx->io_wq, limits);
- if (ret)
- return ret;
- }
- }
- if (!xa_load(&tctx->xa, (unsigned long)ctx)) {
- node = kmalloc(sizeof(*node), GFP_KERNEL);
- if (!node)
- return -ENOMEM;
- node->ctx = ctx;
- node->task = current;
-
- ret = xa_err(xa_store(&tctx->xa, (unsigned long)ctx,
- node, GFP_KERNEL));
- if (ret) {
- kfree(node);
- return ret;
- }
-
- mutex_lock(&ctx->uring_lock);
- list_add(&node->ctx_node, &ctx->tctx_list);
- mutex_unlock(&ctx->uring_lock);
- }
- tctx->last = ctx;
- return 0;
-}
-
-/*
- * Note that this task has used io_uring. We use it for cancelation purposes.
- */
-static inline int io_uring_add_tctx_node(struct io_ring_ctx *ctx)
-{
- struct io_uring_task *tctx = current->io_uring;
-
- if (likely(tctx && tctx->last == ctx))
- return 0;
- return __io_uring_add_tctx_node(ctx);
-}
-
-/*
- * Remove this io_uring_file -> task mapping.
- */
-static __cold void io_uring_del_tctx_node(unsigned long index)
-{
- struct io_uring_task *tctx = current->io_uring;
- struct io_tctx_node *node;
-
- if (!tctx)
- return;
- node = xa_erase(&tctx->xa, index);
- if (!node)
- return;
-
- WARN_ON_ONCE(current != node->task);
- WARN_ON_ONCE(list_empty(&node->ctx_node));
-
- mutex_lock(&node->ctx->uring_lock);
- list_del(&node->ctx_node);
- mutex_unlock(&node->ctx->uring_lock);
-
- if (tctx->last == node->ctx)
- tctx->last = NULL;
- kfree(node);
-}
-
-static __cold void io_uring_clean_tctx(struct io_uring_task *tctx)
-{
- struct io_wq *wq = tctx->io_wq;
- struct io_tctx_node *node;
- unsigned long index;
-
- xa_for_each(&tctx->xa, index, node) {
- io_uring_del_tctx_node(index);
- cond_resched();
- }
- if (wq) {
- /*
- * Must be after io_uring_del_tctx_node() (removes nodes under
- * uring_lock) to avoid race with io_uring_try_cancel_iowq().
- */
- io_wq_put_and_exit(wq);
- tctx->io_wq = NULL;
- }
-}
-
-static s64 tctx_inflight(struct io_uring_task *tctx, bool tracked)
-{
- if (tracked)
- return atomic_read(&tctx->inflight_tracked);
- return percpu_counter_sum(&tctx->inflight);
-}
-
-/*
- * Find any io_uring ctx that this task has registered or done IO on, and cancel
- * requests. @sqd should be not-null IFF it's an SQPOLL thread cancellation.
- */
-static __cold void io_uring_cancel_generic(bool cancel_all,
- struct io_sq_data *sqd)
-{
- struct io_uring_task *tctx = current->io_uring;
- struct io_ring_ctx *ctx;
- s64 inflight;
- DEFINE_WAIT(wait);
-
- WARN_ON_ONCE(sqd && sqd->thread != current);
-
- if (!current->io_uring)
- return;
- if (tctx->io_wq)
- io_wq_exit_start(tctx->io_wq);
-
- atomic_inc(&tctx->in_idle);
- do {
- io_uring_drop_tctx_refs(current);
- /* read completions before cancelations */
- inflight = tctx_inflight(tctx, !cancel_all);
- if (!inflight)
- break;
-
- if (!sqd) {
- struct io_tctx_node *node;
- unsigned long index;
-
- xa_for_each(&tctx->xa, index, node) {
- /* sqpoll task will cancel all its requests */
- if (node->ctx->sq_data)
- continue;
- io_uring_try_cancel_requests(node->ctx, current,
- cancel_all);
- }
- } else {
- list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_uring_try_cancel_requests(ctx, current,
- cancel_all);
- }
-
- prepare_to_wait(&tctx->wait, &wait, TASK_INTERRUPTIBLE);
- io_run_task_work();
- io_uring_drop_tctx_refs(current);
-
- /*
- * If we've seen completions, retry without waiting. This
- * avoids a race where a completion comes in before we did
- * prepare_to_wait().
- */
- if (inflight == tctx_inflight(tctx, !cancel_all))
- schedule();
- finish_wait(&tctx->wait, &wait);
- } while (1);
-
- io_uring_clean_tctx(tctx);
- if (cancel_all) {
- /*
- * We shouldn't run task_works after cancel, so just leave
- * ->in_idle set for normal exit.
- */
- atomic_dec(&tctx->in_idle);
- /* for exec all current's requests should be gone, kill tctx */
- __io_uring_free(current);
- }
-}
-
-void __io_uring_cancel(bool cancel_all)
-{
- io_uring_cancel_generic(cancel_all, NULL);
-}
-
-void io_uring_unreg_ringfd(void)
-{
- struct io_uring_task *tctx = current->io_uring;
- int i;
-
- for (i = 0; i < IO_RINGFD_REG_MAX; i++) {
- if (tctx->registered_rings[i]) {
- fput(tctx->registered_rings[i]);
- tctx->registered_rings[i] = NULL;
- }
- }
-}
-
-static int io_ring_add_registered_fd(struct io_uring_task *tctx, int fd,
- int start, int end)
-{
- struct file *file;
- int offset;
-
- for (offset = start; offset < end; offset++) {
- offset = array_index_nospec(offset, IO_RINGFD_REG_MAX);
- if (tctx->registered_rings[offset])
- continue;
-
- file = fget(fd);
- if (!file) {
- return -EBADF;
- } else if (file->f_op != &io_uring_fops) {
- fput(file);
- return -EOPNOTSUPP;
- }
- tctx->registered_rings[offset] = file;
- return offset;
- }
-
- return -EBUSY;
-}
-
-/*
- * Register a ring fd to avoid fdget/fdput for each io_uring_enter()
- * invocation. User passes in an array of struct io_uring_rsrc_update
- * with ->data set to the ring_fd, and ->offset given for the desired
- * index. If no index is desired, application may set ->offset == -1U
- * and we'll find an available index. Returns number of entries
- * successfully processed, or < 0 on error if none were processed.
- */
-static int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
- unsigned nr_args)
-{
- struct io_uring_rsrc_update __user *arg = __arg;
- struct io_uring_rsrc_update reg;
- struct io_uring_task *tctx;
- int ret, i;
-
- if (!nr_args || nr_args > IO_RINGFD_REG_MAX)
- return -EINVAL;
-
- mutex_unlock(&ctx->uring_lock);
- ret = io_uring_add_tctx_node(ctx);
- mutex_lock(&ctx->uring_lock);
- if (ret)
- return ret;
-
- tctx = current->io_uring;
- for (i = 0; i < nr_args; i++) {
- int start, end;
-
- if (copy_from_user(&reg, &arg[i], sizeof(reg))) {
- ret = -EFAULT;
- break;
- }
-
- if (reg.resv) {
- ret = -EINVAL;
- break;
- }
-
- if (reg.offset == -1U) {
- start = 0;
- end = IO_RINGFD_REG_MAX;
- } else {
- if (reg.offset >= IO_RINGFD_REG_MAX) {
- ret = -EINVAL;
- break;
- }
- start = reg.offset;
- end = start + 1;
- }
-
- ret = io_ring_add_registered_fd(tctx, reg.data, start, end);
- if (ret < 0)
- break;
-
- reg.offset = ret;
- if (copy_to_user(&arg[i], &reg, sizeof(reg))) {
- fput(tctx->registered_rings[reg.offset]);
- tctx->registered_rings[reg.offset] = NULL;
- ret = -EFAULT;
- break;
- }
- }
-
- return i ? i : ret;
-}
-
-static int io_ringfd_unregister(struct io_ring_ctx *ctx, void __user *__arg,
- unsigned nr_args)
-{
- struct io_uring_rsrc_update __user *arg = __arg;
- struct io_uring_task *tctx = current->io_uring;
- struct io_uring_rsrc_update reg;
- int ret = 0, i;
-
- if (!nr_args || nr_args > IO_RINGFD_REG_MAX)
- return -EINVAL;
- if (!tctx)
- return 0;
-
- for (i = 0; i < nr_args; i++) {
- if (copy_from_user(&reg, &arg[i], sizeof(reg))) {
- ret = -EFAULT;
- break;
- }
- if (reg.resv || reg.data || reg.offset >= IO_RINGFD_REG_MAX) {
- ret = -EINVAL;
- break;
- }
-
- reg.offset = array_index_nospec(reg.offset, IO_RINGFD_REG_MAX);
- if (tctx->registered_rings[reg.offset]) {
- fput(tctx->registered_rings[reg.offset]);
- tctx->registered_rings[reg.offset] = NULL;
- }
- }
-
- return i ? i : ret;
-}
-
-static void *io_uring_validate_mmap_request(struct file *file,
- loff_t pgoff, size_t sz)
-{
- struct io_ring_ctx *ctx = file->private_data;
- loff_t offset = pgoff << PAGE_SHIFT;
- struct page *page;
- void *ptr;
-
- switch (offset) {
- case IORING_OFF_SQ_RING:
- case IORING_OFF_CQ_RING:
- ptr = ctx->rings;
- break;
- case IORING_OFF_SQES:
- ptr = ctx->sq_sqes;
- break;
- default:
- return ERR_PTR(-EINVAL);
- }
-
- page = virt_to_head_page(ptr);
- if (sz > page_size(page))
- return ERR_PTR(-EINVAL);
-
- return ptr;
-}
-
-#ifdef CONFIG_MMU
-
-static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
-{
- size_t sz = vma->vm_end - vma->vm_start;
- unsigned long pfn;
- void *ptr;
-
- ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz);
- if (IS_ERR(ptr))
- return PTR_ERR(ptr);
-
- pfn = virt_to_phys(ptr) >> PAGE_SHIFT;
- return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
-}
-
-#else /* !CONFIG_MMU */
-
-static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
-{
- return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -EINVAL;
-}
-
-static unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
-{
- return NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_WRITE;
-}
-
-static unsigned long io_uring_nommu_get_unmapped_area(struct file *file,
- unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags)
-{
- void *ptr;
-
- ptr = io_uring_validate_mmap_request(file, pgoff, len);
- if (IS_ERR(ptr))
- return PTR_ERR(ptr);
-
- return (unsigned long) ptr;
-}
-
-#endif /* !CONFIG_MMU */
-
-static int io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
-{
- DEFINE_WAIT(wait);
-
- do {
- if (!io_sqring_full(ctx))
- break;
- prepare_to_wait(&ctx->sqo_sq_wait, &wait, TASK_INTERRUPTIBLE);
-
- if (!io_sqring_full(ctx))
- break;
- schedule();
- } while (!signal_pending(current));
-
- finish_wait(&ctx->sqo_sq_wait, &wait);
- return 0;
-}
-
-static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz,
- struct __kernel_timespec __user **ts,
- const sigset_t __user **sig)
-{
- struct io_uring_getevents_arg arg;
-
- /*
- * If EXT_ARG isn't set, then we have no timespec and the argp pointer
- * is just a pointer to the sigset_t.
- */
- if (!(flags & IORING_ENTER_EXT_ARG)) {
- *sig = (const sigset_t __user *) argp;
- *ts = NULL;
- return 0;
- }
-
- /*
- * EXT_ARG is set - ensure we agree on the size of it and copy in our
- * timespec and sigset_t pointers if good.
- */
- if (*argsz != sizeof(arg))
- return -EINVAL;
- if (copy_from_user(&arg, argp, sizeof(arg)))
- return -EFAULT;
- if (arg.pad)
- return -EINVAL;
- *sig = u64_to_user_ptr(arg.sigmask);
- *argsz = arg.sigmask_sz;
- *ts = u64_to_user_ptr(arg.ts);
- return 0;
-}
-
-SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
- u32, min_complete, u32, flags, const void __user *, argp,
- size_t, argsz)
-{
- struct io_ring_ctx *ctx;
- int submitted = 0;
- struct fd f;
- long ret;
-
- io_run_task_work();
-
- if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
- IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
- IORING_ENTER_REGISTERED_RING)))
- return -EINVAL;
-
- /*
- * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
- * need only dereference our task private array to find it.
- */
- if (flags & IORING_ENTER_REGISTERED_RING) {
- struct io_uring_task *tctx = current->io_uring;
-
- if (!tctx || fd >= IO_RINGFD_REG_MAX)
- return -EINVAL;
- fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
- f.file = tctx->registered_rings[fd];
- if (unlikely(!f.file))
- return -EBADF;
- } else {
- f = fdget(fd);
- if (unlikely(!f.file))
- return -EBADF;
- }
-
- ret = -EOPNOTSUPP;
- if (unlikely(f.file->f_op != &io_uring_fops))
- goto out_fput;
-
- ret = -ENXIO;
- ctx = f.file->private_data;
- if (unlikely(!percpu_ref_tryget(&ctx->refs)))
- goto out_fput;
-
- ret = -EBADFD;
- if (unlikely(ctx->flags & IORING_SETUP_R_DISABLED))
- goto out;
-
- /*
- * For SQ polling, the thread will do all submissions and completions.
- * Just return the requested submit count, and wake the thread if
- * we were asked to.
- */
- ret = 0;
- if (ctx->flags & IORING_SETUP_SQPOLL) {
- io_cqring_overflow_flush(ctx);
-
- if (unlikely(ctx->sq_data->thread == NULL)) {
- ret = -EOWNERDEAD;
- goto out;
- }
- if (flags & IORING_ENTER_SQ_WAKEUP)
- wake_up(&ctx->sq_data->wait);
- if (flags & IORING_ENTER_SQ_WAIT) {
- ret = io_sqpoll_wait_sq(ctx);
- if (ret)
- goto out;
- }
- submitted = to_submit;
- } else if (to_submit) {
- ret = io_uring_add_tctx_node(ctx);
- if (unlikely(ret))
- goto out;
- mutex_lock(&ctx->uring_lock);
- submitted = io_submit_sqes(ctx, to_submit);
- mutex_unlock(&ctx->uring_lock);
-
- if (submitted != to_submit)
- goto out;
- }
- if (flags & IORING_ENTER_GETEVENTS) {
- const sigset_t __user *sig;
- struct __kernel_timespec __user *ts;
-
- ret = io_get_ext_arg(flags, argp, &argsz, &ts, &sig);
- if (unlikely(ret))
- goto out;
-
- min_complete = min(min_complete, ctx->cq_entries);
-
- /*
-		 * When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user
-		 * space applications don't need to poll for io completion events
-		 * themselves; they can rely on io_sq_thread to do the polling
-		 * work, which can reduce cpu usage and uring_lock contention.
- */
- if (ctx->flags & IORING_SETUP_IOPOLL &&
- !(ctx->flags & IORING_SETUP_SQPOLL)) {
- ret = io_iopoll_check(ctx, min_complete);
- } else {
- ret = io_cqring_wait(ctx, min_complete, sig, argsz, ts);
- }
- }
-
-out:
- percpu_ref_put(&ctx->refs);
-out_fput:
- if (!(flags & IORING_ENTER_REGISTERED_RING))
- fdput(f);
- return submitted ? submitted : ret;
-}
-
-#ifdef CONFIG_PROC_FS
-static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
- const struct cred *cred)
-{
- struct user_namespace *uns = seq_user_ns(m);
- struct group_info *gi;
- kernel_cap_t cap;
- unsigned __capi;
- int g;
-
- seq_printf(m, "%5d\n", id);
- seq_put_decimal_ull(m, "\tUid:\t", from_kuid_munged(uns, cred->uid));
- seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->euid));
- seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->suid));
- seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->fsuid));
- seq_put_decimal_ull(m, "\n\tGid:\t", from_kgid_munged(uns, cred->gid));
- seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->egid));
- seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->sgid));
- seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->fsgid));
- seq_puts(m, "\n\tGroups:\t");
- gi = cred->group_info;
- for (g = 0; g < gi->ngroups; g++) {
- seq_put_decimal_ull(m, g ? " " : "",
- from_kgid_munged(uns, gi->gid[g]));
- }
- seq_puts(m, "\n\tCapEff:\t");
- cap = cred->cap_effective;
- CAP_FOR_EACH_U32(__capi)
- seq_put_hex_ll(m, NULL, cap.cap[CAP_LAST_U32 - __capi], 8);
- seq_putc(m, '\n');
- return 0;
-}
-
-static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
- struct seq_file *m)
-{
- struct io_sq_data *sq = NULL;
- struct io_overflow_cqe *ocqe;
- struct io_rings *r = ctx->rings;
- unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
- unsigned int sq_head = READ_ONCE(r->sq.head);
- unsigned int sq_tail = READ_ONCE(r->sq.tail);
- unsigned int cq_head = READ_ONCE(r->cq.head);
- unsigned int cq_tail = READ_ONCE(r->cq.tail);
- unsigned int sq_entries, cq_entries;
- bool has_lock;
- unsigned int i;
-
- /*
- * we may get imprecise sqe and cqe info if uring is actively running
- * since we get cached_sq_head and cached_cq_tail without uring_lock
- * and sq_tail and cq_head are changed by userspace. But it's ok since
-	 * we usually use this info when the ring is stuck.
- */
- seq_printf(m, "SqMask:\t0x%x\n", sq_mask);
- seq_printf(m, "SqHead:\t%u\n", sq_head);
- seq_printf(m, "SqTail:\t%u\n", sq_tail);
- seq_printf(m, "CachedSqHead:\t%u\n", ctx->cached_sq_head);
- seq_printf(m, "CqMask:\t0x%x\n", cq_mask);
- seq_printf(m, "CqHead:\t%u\n", cq_head);
- seq_printf(m, "CqTail:\t%u\n", cq_tail);
- seq_printf(m, "CachedCqTail:\t%u\n", ctx->cached_cq_tail);
- seq_printf(m, "SQEs:\t%u\n", sq_tail - ctx->cached_sq_head);
- sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
- for (i = 0; i < sq_entries; i++) {
- unsigned int entry = i + sq_head;
- unsigned int sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
- struct io_uring_sqe *sqe;
-
- if (sq_idx > sq_mask)
- continue;
- sqe = &ctx->sq_sqes[sq_idx];
- seq_printf(m, "%5u: opcode:%d, fd:%d, flags:%x, user_data:%llu\n",
- sq_idx, sqe->opcode, sqe->fd, sqe->flags,
- sqe->user_data);
- }
- seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
- cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
- for (i = 0; i < cq_entries; i++) {
- unsigned int entry = i + cq_head;
- struct io_uring_cqe *cqe = &r->cqes[entry & cq_mask];
-
- seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
- entry & cq_mask, cqe->user_data, cqe->res,
- cqe->flags);
- }
-
- /*
- * Avoid ABBA deadlock between the seq lock and the io_uring mutex,
- * since fdinfo case grabs it in the opposite direction of normal use
- * cases. If we fail to get the lock, we just don't iterate any
- * structures that could be going away outside the io_uring mutex.
- */
- has_lock = mutex_trylock(&ctx->uring_lock);
-
- if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
- sq = ctx->sq_data;
- if (!sq->thread)
- sq = NULL;
- }
-
- seq_printf(m, "SqThread:\t%d\n", sq ? task_pid_nr(sq->thread) : -1);
- seq_printf(m, "SqThreadCpu:\t%d\n", sq ? task_cpu(sq->thread) : -1);
- seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);
- for (i = 0; has_lock && i < ctx->nr_user_files; i++) {
- struct file *f = io_file_from_index(ctx, i);
-
- if (f)
- seq_printf(m, "%5u: %s\n", i, file_dentry(f)->d_iname);
- else
- seq_printf(m, "%5u: <none>\n", i);
- }
- seq_printf(m, "UserBufs:\t%u\n", ctx->nr_user_bufs);
- for (i = 0; has_lock && i < ctx->nr_user_bufs; i++) {
- struct io_mapped_ubuf *buf = ctx->user_bufs[i];
- unsigned int len = buf->ubuf_end - buf->ubuf;
-
- seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, len);
- }
- if (has_lock && !xa_empty(&ctx->personalities)) {
- unsigned long index;
- const struct cred *cred;
-
- seq_printf(m, "Personalities:\n");
- xa_for_each(&ctx->personalities, index, cred)
- io_uring_show_cred(m, index, cred);
- }
- if (has_lock)
- mutex_unlock(&ctx->uring_lock);
-
- seq_puts(m, "PollList:\n");
- spin_lock(&ctx->completion_lock);
- for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct hlist_head *list = &ctx->cancel_hash[i];
- struct io_kiocb *req;
-
- hlist_for_each_entry(req, list, hash_node)
- seq_printf(m, " op=%d, task_works=%d\n", req->opcode,
- task_work_pending(req->task));
- }
-
- seq_puts(m, "CqOverflowList:\n");
- list_for_each_entry(ocqe, &ctx->cq_overflow_list, list) {
- struct io_uring_cqe *cqe = &ocqe->cqe;
-
- seq_printf(m, " user_data=%llu, res=%d, flags=%x\n",
- cqe->user_data, cqe->res, cqe->flags);
-
- }
-
- spin_unlock(&ctx->completion_lock);
-}
-
-static __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
-{
- struct io_ring_ctx *ctx = f->private_data;
-
- if (percpu_ref_tryget(&ctx->refs)) {
- __io_uring_show_fdinfo(ctx, m);
- percpu_ref_put(&ctx->refs);
- }
-}
-#endif
-
-static const struct file_operations io_uring_fops = {
- .release = io_uring_release,
- .mmap = io_uring_mmap,
-#ifndef CONFIG_MMU
- .get_unmapped_area = io_uring_nommu_get_unmapped_area,
- .mmap_capabilities = io_uring_nommu_mmap_capabilities,
-#endif
- .poll = io_uring_poll,
-#ifdef CONFIG_PROC_FS
- .show_fdinfo = io_uring_show_fdinfo,
-#endif
-};
-
-static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
- struct io_uring_params *p)
-{
- struct io_rings *rings;
- size_t size, sq_array_offset;
-
- /* make sure these are sane, as we already accounted them */
- ctx->sq_entries = p->sq_entries;
- ctx->cq_entries = p->cq_entries;
-
- size = rings_size(p->sq_entries, p->cq_entries, &sq_array_offset);
- if (size == SIZE_MAX)
- return -EOVERFLOW;
-
- rings = io_mem_alloc(size);
- if (!rings)
- return -ENOMEM;
-
- ctx->rings = rings;
- ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
- rings->sq_ring_mask = p->sq_entries - 1;
- rings->cq_ring_mask = p->cq_entries - 1;
- rings->sq_ring_entries = p->sq_entries;
- rings->cq_ring_entries = p->cq_entries;
-
- size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
- if (size == SIZE_MAX) {
- io_mem_free(ctx->rings);
- ctx->rings = NULL;
- return -EOVERFLOW;
- }
-
- ctx->sq_sqes = io_mem_alloc(size);
- if (!ctx->sq_sqes) {
- io_mem_free(ctx->rings);
- ctx->rings = NULL;
- return -ENOMEM;
- }
-
- return 0;
-}
-
-static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
-{
- int ret, fd;
-
- fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
- if (fd < 0)
- return fd;
-
- ret = io_uring_add_tctx_node(ctx);
- if (ret) {
- put_unused_fd(fd);
- return ret;
- }
- fd_install(fd, file);
- return fd;
-}
-
-/*
- * Allocate an anonymous fd, this is what constitutes the application
- * visible backing of an io_uring instance. The application mmaps this
- * fd to gain access to the SQ/CQ ring details. If UNIX sockets are enabled,
- * we have to tie this fd to a socket for file garbage collection purposes.
- */
-static struct file *io_uring_get_file(struct io_ring_ctx *ctx)
-{
- struct file *file;
-#if defined(CONFIG_UNIX)
- int ret;
-
- ret = sock_create_kern(&init_net, PF_UNIX, SOCK_RAW, IPPROTO_IP,
- &ctx->ring_sock);
- if (ret)
- return ERR_PTR(ret);
-#endif
-
- file = anon_inode_getfile_secure("[io_uring]", &io_uring_fops, ctx,
- O_RDWR | O_CLOEXEC, NULL);
-#if defined(CONFIG_UNIX)
- if (IS_ERR(file)) {
- sock_release(ctx->ring_sock);
- ctx->ring_sock = NULL;
- } else {
- ctx->ring_sock->file = file;
- }
-#endif
- return file;
-}
-
-static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
- struct io_uring_params __user *params)
-{
- struct io_ring_ctx *ctx;
- struct file *file;
- int ret;
-
- if (!entries)
- return -EINVAL;
- if (entries > IORING_MAX_ENTRIES) {
- if (!(p->flags & IORING_SETUP_CLAMP))
- return -EINVAL;
- entries = IORING_MAX_ENTRIES;
- }
-
- /*
- * Use twice as many entries for the CQ ring. It's possible for the
- * application to drive a higher depth than the size of the SQ ring,
- * since the sqes are only used at submission time. This allows for
- * some flexibility in overcommitting a bit. If the application has
- * set IORING_SETUP_CQSIZE, it will have passed in the desired number
- * of CQ ring entries manually.
- */
- p->sq_entries = roundup_pow_of_two(entries);
- if (p->flags & IORING_SETUP_CQSIZE) {
- /*
- * If IORING_SETUP_CQSIZE is set, we do the same roundup
- * to a power-of-two, if it isn't already. We do NOT impose
- * any cq vs sq ring sizing.
- */
- if (!p->cq_entries)
- return -EINVAL;
- if (p->cq_entries > IORING_MAX_CQ_ENTRIES) {
- if (!(p->flags & IORING_SETUP_CLAMP))
- return -EINVAL;
- p->cq_entries = IORING_MAX_CQ_ENTRIES;
- }
- p->cq_entries = roundup_pow_of_two(p->cq_entries);
- if (p->cq_entries < p->sq_entries)
- return -EINVAL;
- } else {
- p->cq_entries = 2 * p->sq_entries;
- }
-
- ctx = io_ring_ctx_alloc(p);
- if (!ctx)
- return -ENOMEM;
- ctx->compat = in_compat_syscall();
- if (!capable(CAP_IPC_LOCK))
- ctx->user = get_uid(current_user());
-
- /*
- * This is just grabbed for accounting purposes. When a process exits,
- * the mm is exited and dropped before the files, hence we need to hang
- * on to this mm purely for the purposes of being able to unaccount
- * memory (locked/pinned vm). It's not used for anything else.
- */
- mmgrab(current->mm);
- ctx->mm_account = current->mm;
-
- ret = io_allocate_scq_urings(ctx, p);
- if (ret)
- goto err;
-
- ret = io_sq_offload_create(ctx, p);
- if (ret)
- goto err;
- /* always set a rsrc node */
- ret = io_rsrc_node_switch_start(ctx);
- if (ret)
- goto err;
- io_rsrc_node_switch(ctx, NULL);
-
- memset(&p->sq_off, 0, sizeof(p->sq_off));
- p->sq_off.head = offsetof(struct io_rings, sq.head);
- p->sq_off.tail = offsetof(struct io_rings, sq.tail);
- p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
- p->sq_off.ring_entries = offsetof(struct io_rings, sq_ring_entries);
- p->sq_off.flags = offsetof(struct io_rings, sq_flags);
- p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
- p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
-
- memset(&p->cq_off, 0, sizeof(p->cq_off));
- p->cq_off.head = offsetof(struct io_rings, cq.head);
- p->cq_off.tail = offsetof(struct io_rings, cq.tail);
- p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
- p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
- p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
- p->cq_off.cqes = offsetof(struct io_rings, cqes);
- p->cq_off.flags = offsetof(struct io_rings, cq_flags);
-
- p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
- IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |
- IORING_FEAT_CUR_PERSONALITY | IORING_FEAT_FAST_POLL |
- IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
- IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
- IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
- IORING_FEAT_LINKED_FILE;
-
- if (copy_to_user(params, p, sizeof(*p))) {
- ret = -EFAULT;
- goto err;
- }
-
- file = io_uring_get_file(ctx);
- if (IS_ERR(file)) {
- ret = PTR_ERR(file);
- goto err;
- }
-
- /*
- * Install ring fd as the very last thing, so we don't risk someone
- * having closed it before we finish setup
- */
- ret = io_uring_install_fd(ctx, file);
- if (ret < 0) {
- /* fput will clean it up */
- fput(file);
- return ret;
- }
-
- trace_io_uring_create(ret, ctx, p->sq_entries, p->cq_entries, p->flags);
- return ret;
-err:
- io_ring_ctx_wait_and_kill(ctx);
- return ret;
-}
-
-/*
- * Sets up an aio uring context, and returns the fd. Applications asks for a
- * ring size, we return the actual sq/cq ring sizes (among other things) in the
- * params structure passed in.
- */
-static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
-{
- struct io_uring_params p;
- int i;
-
- if (copy_from_user(&p, params, sizeof(p)))
- return -EFAULT;
- for (i = 0; i < ARRAY_SIZE(p.resv); i++) {
- if (p.resv[i])
- return -EINVAL;
- }
-
- if (p.flags & ~(IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL |
- IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
- IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
- IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL))
- return -EINVAL;
-
- return io_uring_create(entries, &p, params);
-}
-
-SYSCALL_DEFINE2(io_uring_setup, u32, entries,
- struct io_uring_params __user *, params)
-{
- return io_uring_setup(entries, params);
-}
-
-static __cold int io_probe(struct io_ring_ctx *ctx, void __user *arg,
- unsigned nr_args)
-{
- struct io_uring_probe *p;
- size_t size;
- int i, ret;
-
- size = struct_size(p, ops, nr_args);
- if (size == SIZE_MAX)
- return -EOVERFLOW;
- p = kzalloc(size, GFP_KERNEL);
- if (!p)
- return -ENOMEM;
-
- ret = -EFAULT;
- if (copy_from_user(p, arg, size))
- goto out;
- ret = -EINVAL;
- if (memchr_inv(p, 0, size))
- goto out;
-
- p->last_op = IORING_OP_LAST - 1;
- if (nr_args > IORING_OP_LAST)
- nr_args = IORING_OP_LAST;
-
- for (i = 0; i < nr_args; i++) {
- p->ops[i].op = i;
- if (!io_op_defs[i].not_supported)
- p->ops[i].flags = IO_URING_OP_SUPPORTED;
- }
- p->ops_len = i;
-
- ret = 0;
- if (copy_to_user(arg, p, size))
- ret = -EFAULT;
-out:
- kfree(p);
- return ret;
-}
-
-static int io_register_personality(struct io_ring_ctx *ctx)
-{
- const struct cred *creds;
- u32 id;
- int ret;
-
- creds = get_current_cred();
-
- ret = xa_alloc_cyclic(&ctx->personalities, &id, (void *)creds,
- XA_LIMIT(0, USHRT_MAX), &ctx->pers_next, GFP_KERNEL);
- if (ret < 0) {
- put_cred(creds);
- return ret;
- }
- return id;
-}
-
-static __cold int io_register_restrictions(struct io_ring_ctx *ctx,
- void __user *arg, unsigned int nr_args)
-{
- struct io_uring_restriction *res;
- size_t size;
- int i, ret;
-
- /* Restrictions allowed only if rings started disabled */
- if (!(ctx->flags & IORING_SETUP_R_DISABLED))
- return -EBADFD;
-
- /* We allow only a single restrictions registration */
- if (ctx->restrictions.registered)
- return -EBUSY;
-
- if (!arg || nr_args > IORING_MAX_RESTRICTIONS)
- return -EINVAL;
-
- size = array_size(nr_args, sizeof(*res));
- if (size == SIZE_MAX)
- return -EOVERFLOW;
-
- res = memdup_user(arg, size);
- if (IS_ERR(res))
- return PTR_ERR(res);
-
- ret = 0;
-
- for (i = 0; i < nr_args; i++) {
- switch (res[i].opcode) {
- case IORING_RESTRICTION_REGISTER_OP:
- if (res[i].register_op >= IORING_REGISTER_LAST) {
- ret = -EINVAL;
- goto out;
- }
-
- __set_bit(res[i].register_op,
- ctx->restrictions.register_op);
- break;
- case IORING_RESTRICTION_SQE_OP:
- if (res[i].sqe_op >= IORING_OP_LAST) {
- ret = -EINVAL;
- goto out;
- }
-
- __set_bit(res[i].sqe_op, ctx->restrictions.sqe_op);
- break;
- case IORING_RESTRICTION_SQE_FLAGS_ALLOWED:
- ctx->restrictions.sqe_flags_allowed = res[i].sqe_flags;
- break;
- case IORING_RESTRICTION_SQE_FLAGS_REQUIRED:
- ctx->restrictions.sqe_flags_required = res[i].sqe_flags;
- break;
- default:
- ret = -EINVAL;
- goto out;
- }
- }
-
-out:
- /* Reset all restrictions if an error happened */
- if (ret != 0)
- memset(&ctx->restrictions, 0, sizeof(ctx->restrictions));
- else
- ctx->restrictions.registered = true;
-
- kfree(res);
- return ret;
-}
-
-static int io_register_enable_rings(struct io_ring_ctx *ctx)
-{
- if (!(ctx->flags & IORING_SETUP_R_DISABLED))
- return -EBADFD;
-
- if (ctx->restrictions.registered)
- ctx->restricted = 1;
-
- ctx->flags &= ~IORING_SETUP_R_DISABLED;
- if (ctx->sq_data && wq_has_sleeper(&ctx->sq_data->wait))
- wake_up(&ctx->sq_data->wait);
- return 0;
-}
-
-static int __io_register_rsrc_update(struct io_ring_ctx *ctx, unsigned type,
- struct io_uring_rsrc_update2 *up,
- unsigned nr_args)
-{
- __u32 tmp;
- int err;
-
- if (check_add_overflow(up->offset, nr_args, &tmp))
- return -EOVERFLOW;
- err = io_rsrc_node_switch_start(ctx);
- if (err)
- return err;
-
- switch (type) {
- case IORING_RSRC_FILE:
- return __io_sqe_files_update(ctx, up, nr_args);
- case IORING_RSRC_BUFFER:
- return __io_sqe_buffers_update(ctx, up, nr_args);
- }
- return -EINVAL;
-}
-
-static int io_register_files_update(struct io_ring_ctx *ctx, void __user *arg,
- unsigned nr_args)
-{
- struct io_uring_rsrc_update2 up;
-
- if (!nr_args)
- return -EINVAL;
- memset(&up, 0, sizeof(up));
- if (copy_from_user(&up, arg, sizeof(struct io_uring_rsrc_update)))
- return -EFAULT;
- if (up.resv || up.resv2)
- return -EINVAL;
- return __io_register_rsrc_update(ctx, IORING_RSRC_FILE, &up, nr_args);
-}
-
-static int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg,
- unsigned size, unsigned type)
-{
- struct io_uring_rsrc_update2 up;
-
- if (size != sizeof(up))
- return -EINVAL;
- if (copy_from_user(&up, arg, sizeof(up)))
- return -EFAULT;
- if (!up.nr || up.resv || up.resv2)
- return -EINVAL;
- return __io_register_rsrc_update(ctx, type, &up, up.nr);
-}
-
-static __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
- unsigned int size, unsigned int type)
-{
- struct io_uring_rsrc_register rr;
-
- /* keep it extendible */
- if (size != sizeof(rr))
- return -EINVAL;
-
- memset(&rr, 0, sizeof(rr));
- if (copy_from_user(&rr, arg, size))
- return -EFAULT;
- if (!rr.nr || rr.resv || rr.resv2)
- return -EINVAL;
-
- switch (type) {
- case IORING_RSRC_FILE:
- return io_sqe_files_register(ctx, u64_to_user_ptr(rr.data),
- rr.nr, u64_to_user_ptr(rr.tags));
- case IORING_RSRC_BUFFER:
- return io_sqe_buffers_register(ctx, u64_to_user_ptr(rr.data),
- rr.nr, u64_to_user_ptr(rr.tags));
- }
- return -EINVAL;
-}
-
-static __cold int io_register_iowq_aff(struct io_ring_ctx *ctx,
- void __user *arg, unsigned len)
-{
- struct io_uring_task *tctx = current->io_uring;
- cpumask_var_t new_mask;
- int ret;
-
- if (!tctx || !tctx->io_wq)
- return -EINVAL;
-
- if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
- return -ENOMEM;
-
- cpumask_clear(new_mask);
- if (len > cpumask_size())
- len = cpumask_size();
-
- if (in_compat_syscall()) {
- ret = compat_get_bitmap(cpumask_bits(new_mask),
- (const compat_ulong_t __user *)arg,
- len * 8 /* CHAR_BIT */);
- } else {
- ret = copy_from_user(new_mask, arg, len);
- }
-
- if (ret) {
- free_cpumask_var(new_mask);
- return -EFAULT;
- }
-
- ret = io_wq_cpu_affinity(tctx->io_wq, new_mask);
- free_cpumask_var(new_mask);
- return ret;
-}
-
-static __cold int io_unregister_iowq_aff(struct io_ring_ctx *ctx)
-{
- struct io_uring_task *tctx = current->io_uring;
-
- if (!tctx || !tctx->io_wq)
- return -EINVAL;
-
- return io_wq_cpu_affinity(tctx->io_wq, NULL);
-}
-
-static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
- void __user *arg)
- __must_hold(&ctx->uring_lock)
-{
- struct io_tctx_node *node;
- struct io_uring_task *tctx = NULL;
- struct io_sq_data *sqd = NULL;
- __u32 new_count[2];
- int i, ret;
-
- if (copy_from_user(new_count, arg, sizeof(new_count)))
- return -EFAULT;
- for (i = 0; i < ARRAY_SIZE(new_count); i++)
- if (new_count[i] > INT_MAX)
- return -EINVAL;
-
- if (ctx->flags & IORING_SETUP_SQPOLL) {
- sqd = ctx->sq_data;
- if (sqd) {
- /*
- * Observe the correct sqd->lock -> ctx->uring_lock
- * ordering. Fine to drop uring_lock here, we hold
- * a ref to the ctx.
- */
- refcount_inc(&sqd->refs);
- mutex_unlock(&ctx->uring_lock);
- mutex_lock(&sqd->lock);
- mutex_lock(&ctx->uring_lock);
- if (sqd->thread)
- tctx = sqd->thread->io_uring;
- }
- } else {
- tctx = current->io_uring;
- }
-
- BUILD_BUG_ON(sizeof(new_count) != sizeof(ctx->iowq_limits));
-
- for (i = 0; i < ARRAY_SIZE(new_count); i++)
- if (new_count[i])
- ctx->iowq_limits[i] = new_count[i];
- ctx->iowq_limits_set = true;
-
- if (tctx && tctx->io_wq) {
- ret = io_wq_max_workers(tctx->io_wq, new_count);
- if (ret)
- goto err;
- } else {
- memset(new_count, 0, sizeof(new_count));
- }
-
- if (sqd) {
- mutex_unlock(&sqd->lock);
- io_put_sq_data(sqd);
- }
-
- if (copy_to_user(arg, new_count, sizeof(new_count)))
- return -EFAULT;
-
- /* that's it for SQPOLL, only the SQPOLL task creates requests */
- if (sqd)
- return 0;
-
- /* now propagate the restriction to all registered users */
- list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
- struct io_uring_task *tctx = node->task->io_uring;
-
- if (WARN_ON_ONCE(!tctx->io_wq))
- continue;
-
- for (i = 0; i < ARRAY_SIZE(new_count); i++)
- new_count[i] = ctx->iowq_limits[i];
- /* ignore errors, it always returns zero anyway */
- (void)io_wq_max_workers(tctx->io_wq, new_count);
- }
- return 0;
-err:
- if (sqd) {
- mutex_unlock(&sqd->lock);
- io_put_sq_data(sqd);
- }
- return ret;
-}
-
-static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
- void __user *arg, unsigned nr_args)
- __releases(ctx->uring_lock)
- __acquires(ctx->uring_lock)
-{
- int ret;
-
- /*
- * We're inside the ring mutex, if the ref is already dying, then
- * someone else killed the ctx or is already going through
- * io_uring_register().
- */
- if (percpu_ref_is_dying(&ctx->refs))
- return -ENXIO;
-
- if (ctx->restricted) {
- if (opcode >= IORING_REGISTER_LAST)
- return -EINVAL;
- opcode = array_index_nospec(opcode, IORING_REGISTER_LAST);
- if (!test_bit(opcode, ctx->restrictions.register_op))
- return -EACCES;
- }
-
- switch (opcode) {
- case IORING_REGISTER_BUFFERS:
- ret = io_sqe_buffers_register(ctx, arg, nr_args, NULL);
- break;
- case IORING_UNREGISTER_BUFFERS:
- ret = -EINVAL;
- if (arg || nr_args)
- break;
- ret = io_sqe_buffers_unregister(ctx);
- break;
- case IORING_REGISTER_FILES:
- ret = io_sqe_files_register(ctx, arg, nr_args, NULL);
- break;
- case IORING_UNREGISTER_FILES:
- ret = -EINVAL;
- if (arg || nr_args)
- break;
- ret = io_sqe_files_unregister(ctx);
- break;
- case IORING_REGISTER_FILES_UPDATE:
- ret = io_register_files_update(ctx, arg, nr_args);
- break;
- case IORING_REGISTER_EVENTFD:
- ret = -EINVAL;
- if (nr_args != 1)
- break;
- ret = io_eventfd_register(ctx, arg, 0);
- break;
- case IORING_REGISTER_EVENTFD_ASYNC:
- ret = -EINVAL;
- if (nr_args != 1)
- break;
- ret = io_eventfd_register(ctx, arg, 1);
- break;
- case IORING_UNREGISTER_EVENTFD:
- ret = -EINVAL;
- if (arg || nr_args)
- break;
- ret = io_eventfd_unregister(ctx);
- break;
- case IORING_REGISTER_PROBE:
- ret = -EINVAL;
- if (!arg || nr_args > 256)
- break;
- ret = io_probe(ctx, arg, nr_args);
- break;
- case IORING_REGISTER_PERSONALITY:
- ret = -EINVAL;
- if (arg || nr_args)
- break;
- ret = io_register_personality(ctx);
- break;
- case IORING_UNREGISTER_PERSONALITY:
- ret = -EINVAL;
- if (arg)
- break;
- ret = io_unregister_personality(ctx, nr_args);
- break;
- case IORING_REGISTER_ENABLE_RINGS:
- ret = -EINVAL;
- if (arg || nr_args)
- break;
- ret = io_register_enable_rings(ctx);
- break;
- case IORING_REGISTER_RESTRICTIONS:
- ret = io_register_restrictions(ctx, arg, nr_args);
- break;
- case IORING_REGISTER_FILES2:
- ret = io_register_rsrc(ctx, arg, nr_args, IORING_RSRC_FILE);
- break;
- case IORING_REGISTER_FILES_UPDATE2:
- ret = io_register_rsrc_update(ctx, arg, nr_args,
- IORING_RSRC_FILE);
- break;
- case IORING_REGISTER_BUFFERS2:
- ret = io_register_rsrc(ctx, arg, nr_args, IORING_RSRC_BUFFER);
- break;
- case IORING_REGISTER_BUFFERS_UPDATE:
- ret = io_register_rsrc_update(ctx, arg, nr_args,
- IORING_RSRC_BUFFER);
- break;
- case IORING_REGISTER_IOWQ_AFF:
- ret = -EINVAL;
- if (!arg || !nr_args)
- break;
- ret = io_register_iowq_aff(ctx, arg, nr_args);
- break;
- case IORING_UNREGISTER_IOWQ_AFF:
- ret = -EINVAL;
- if (arg || nr_args)
- break;
- ret = io_unregister_iowq_aff(ctx);
- break;
- case IORING_REGISTER_IOWQ_MAX_WORKERS:
- ret = -EINVAL;
- if (!arg || nr_args != 2)
- break;
- ret = io_register_iowq_max_workers(ctx, arg);
- break;
- case IORING_REGISTER_RING_FDS:
- ret = io_ringfd_register(ctx, arg, nr_args);
- break;
- case IORING_UNREGISTER_RING_FDS:
- ret = io_ringfd_unregister(ctx, arg, nr_args);
- break;
- default:
- ret = -EINVAL;
- break;
- }
-
- return ret;
-}
-
-SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
- void __user *, arg, unsigned int, nr_args)
-{
- struct io_ring_ctx *ctx;
- long ret = -EBADF;
- struct fd f;
-
- f = fdget(fd);
- if (!f.file)
- return -EBADF;
-
- ret = -EOPNOTSUPP;
- if (f.file->f_op != &io_uring_fops)
- goto out_fput;
-
- ctx = f.file->private_data;
-
- io_run_task_work();
-
- mutex_lock(&ctx->uring_lock);
- ret = __io_uring_register(ctx, opcode, arg, nr_args);
- mutex_unlock(&ctx->uring_lock);
- trace_io_uring_register(ctx, opcode, ctx->nr_user_files, ctx->nr_user_bufs, ret);
-out_fput:
- fdput(f);
- return ret;
-}
-
-static int __init io_uring_init(void)
-{
-#define __BUILD_BUG_VERIFY_ELEMENT(stype, eoffset, etype, ename) do { \
- BUILD_BUG_ON(offsetof(stype, ename) != eoffset); \
- BUILD_BUG_ON(sizeof(etype) != sizeof_field(stype, ename)); \
-} while (0)
-
-#define BUILD_BUG_SQE_ELEM(eoffset, etype, ename) \
- __BUILD_BUG_VERIFY_ELEMENT(struct io_uring_sqe, eoffset, etype, ename)
- BUILD_BUG_ON(sizeof(struct io_uring_sqe) != 64);
- BUILD_BUG_SQE_ELEM(0, __u8, opcode);
- BUILD_BUG_SQE_ELEM(1, __u8, flags);
- BUILD_BUG_SQE_ELEM(2, __u16, ioprio);
- BUILD_BUG_SQE_ELEM(4, __s32, fd);
- BUILD_BUG_SQE_ELEM(8, __u64, off);
- BUILD_BUG_SQE_ELEM(8, __u64, addr2);
- BUILD_BUG_SQE_ELEM(16, __u64, addr);
- BUILD_BUG_SQE_ELEM(16, __u64, splice_off_in);
- BUILD_BUG_SQE_ELEM(24, __u32, len);
- BUILD_BUG_SQE_ELEM(28, __kernel_rwf_t, rw_flags);
- BUILD_BUG_SQE_ELEM(28, /* compat */ int, rw_flags);
- BUILD_BUG_SQE_ELEM(28, /* compat */ __u32, rw_flags);
- BUILD_BUG_SQE_ELEM(28, __u32, fsync_flags);
- BUILD_BUG_SQE_ELEM(28, /* compat */ __u16, poll_events);
- BUILD_BUG_SQE_ELEM(28, __u32, poll32_events);
- BUILD_BUG_SQE_ELEM(28, __u32, sync_range_flags);
- BUILD_BUG_SQE_ELEM(28, __u32, msg_flags);
- BUILD_BUG_SQE_ELEM(28, __u32, timeout_flags);
- BUILD_BUG_SQE_ELEM(28, __u32, accept_flags);
- BUILD_BUG_SQE_ELEM(28, __u32, cancel_flags);
- BUILD_BUG_SQE_ELEM(28, __u32, open_flags);
- BUILD_BUG_SQE_ELEM(28, __u32, statx_flags);
- BUILD_BUG_SQE_ELEM(28, __u32, fadvise_advice);
- BUILD_BUG_SQE_ELEM(28, __u32, splice_flags);
- BUILD_BUG_SQE_ELEM(32, __u64, user_data);
- BUILD_BUG_SQE_ELEM(40, __u16, buf_index);
- BUILD_BUG_SQE_ELEM(40, __u16, buf_group);
- BUILD_BUG_SQE_ELEM(42, __u16, personality);
- BUILD_BUG_SQE_ELEM(44, __s32, splice_fd_in);
- BUILD_BUG_SQE_ELEM(44, __u32, file_index);
-
- BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
- sizeof(struct io_uring_rsrc_update));
- BUILD_BUG_ON(sizeof(struct io_uring_rsrc_update) >
- sizeof(struct io_uring_rsrc_update2));
-
- /* ->buf_index is u16 */
- BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
-
- /* should fit into one byte */
- BUILD_BUG_ON(SQE_VALID_FLAGS >= (1 << 8));
- BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
- BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
-
- BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST);
- BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
-
- req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
- SLAB_ACCOUNT);
- return 0;
-};
-__initcall(io_uring_init);
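The ring-setup code removed above rounds the requested SQ size up to a power of two and derives the CQ size from it (double by default, or the caller's rounded-up value, never smaller than the SQ). A standalone userspace sketch of those sizing rules, with illustrative limits in place of the kernel's IORING_MAX_* constants and the CLAMP-flag handling omitted:

#include <stdint.h>
#include <stdio.h>

/* Illustrative limits only; the real ones live in the io_uring headers. */
#define MAX_SQ_ENTRIES 32768u
#define MAX_CQ_ENTRIES (2 * MAX_SQ_ENTRIES)

/* Round up to the next power of two (32-bit), as roundup_pow_of_two() does. */
static uint32_t roundup_pow2(uint32_t v)
{
	v--;
	v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16;
	return v + 1;
}

/* SQ rounded up; CQ defaults to 2*SQ; an explicit CQ size is rounded up
 * and must not end up smaller than the SQ ring. */
static int size_rings(uint32_t entries, uint32_t cq_wanted,
		      uint32_t *sq, uint32_t *cq)
{
	if (!entries || entries > MAX_SQ_ENTRIES)
		return -1;
	*sq = roundup_pow2(entries);
	if (cq_wanted) {
		*cq = roundup_pow2(cq_wanted);
		if (*cq > MAX_CQ_ENTRIES || *cq < *sq)
			return -1;
	} else {
		*cq = 2 * *sq;
	}
	return 0;
}

int main(void)
{
	uint32_t sq, cq;

	if (size_rings(100, 0, &sq, &cq) == 0)
		printf("sq=%u cq=%u\n", sq, cq);	/* sq=128 cq=256 */
	return 0;
}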
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index ac7f067b7bdd..f306b52b8e0b 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -551,13 +551,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
*/
jbd2_journal_switch_revoke_table(journal);

+ write_lock(&journal->j_state_lock);
/*
* Reserved credits cannot be claimed anymore, free them
*/
atomic_sub(atomic_read(&journal->j_reserved_credits),
&commit_transaction->t_outstanding_credits);

- write_lock(&journal->j_state_lock);
trace_jbd2_commit_flushing(journal, commit_transaction);
stats.run.rs_flushing = jiffies;
stats.run.rs_locked = jbd2_time_diff(stats.run.rs_locked,
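The fs/jbd2/commit.c hunk above moves the write_lock so the reserved-credit subtraction happens with j_state_lock already held. A minimal sketch of the general pattern, assuming the point is that another locker should never observe the phase change without the matching credit adjustment; the types here are illustrative, not jbd2's:

#include <pthread.h>

struct journal_like {
	pthread_rwlock_t state_lock;
	int phase;			/* e.g. "commit is flushing" */
	int outstanding_credits;	/* adjusted together with the phase change */
	int reserved_credits;
};

static void begin_flush(struct journal_like *j)
{
	pthread_rwlock_wrlock(&j->state_lock);
	/* Do the bookkeeping inside the same critical section that publishes
	 * the new phase, so readers of the lock see both or neither. */
	j->outstanding_credits -= j->reserved_credits;
	j->phase = 1;
	pthread_rwlock_unlock(&j->state_lock);
}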
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index fcb9175016a5..49f109699933 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1486,8 +1486,6 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
struct journal_head *jh;
int ret = 0;

- if (is_handle_aborted(handle))
- return -EROFS;
if (!buffer_jbd(bh))
return -EUCLEAN;

@@ -1534,6 +1532,18 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
journal = transaction->t_journal;
spin_lock(&jh->b_state_lock);

+ if (is_handle_aborted(handle)) {
+ /*
+ * Check journal aborting with @jh->b_state_lock locked,
+ * since 'jh->b_transaction' could be replaced with
+ * 'jh->b_next_transaction' during old transaction
+ * committing if journal aborted, which may fail
+ * assertion on 'jh->b_frozen_data == NULL'.
+ */
+ ret = -EROFS;
+ goto out_unlock_bh;
+ }
+
if (jh->b_modified == 0) {
/*
* This buffer's got modified and becoming part
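The transaction.c change re-does the aborted-handle check after b_state_lock has been taken, for the reason spelled out in the new comment. A small standalone sketch of that check-under-lock pattern, with placeholder types in place of jbd2's:

#include <pthread.h>
#include <errno.h>
#include <stdbool.h>

struct buffer_like {
	pthread_mutex_t state_lock;
	bool frozen;	/* field that may change under the lock */
};

/* Re-check the abort condition only after taking the lock that protects the
 * fields the rest of the function relies on, instead of checking it early
 * and racing with a concurrent state change. */
static int dirty_metadata(struct buffer_like *b, bool (*aborted)(void))
{
	int ret = 0;

	pthread_mutex_lock(&b->state_lock);
	if (aborted()) {		/* checked with the lock held, as in the patch */
		ret = -EROFS;
		goto out_unlock;
	}
	/* ... safe to inspect b->frozen and friends here ... */
out_unlock:
	pthread_mutex_unlock(&b->state_lock);
	return ret;
}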
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 6eca72cfa1f2..1cc88ba6de90 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1343,14 +1343,17 @@ static void __kernfs_remove(struct kernfs_node *kn)
{
struct kernfs_node *pos;

+ /* Short-circuit if non-root @kn has already finished removal. */
+ if (!kn)
+ return;
+
lockdep_assert_held_write(&kernfs_root(kn)->kernfs_rwsem);

/*
- * Short-circuit if non-root @kn has already finished removal.
* This is for kernfs_remove_self() which plays with active ref
* after removal.
*/
- if (!kn || (kn->parent && RB_EMPTY_NODE(&kn->rb)))
+ if (kn->parent && RB_EMPTY_NODE(&kn->rb))
return;

pr_debug("kernfs %s: removing\n", kn->name);
diff --git a/fs/ksmbd/connection.c b/fs/ksmbd/connection.c
index bc6050b67256..e8f476c5f189 100644
--- a/fs/ksmbd/connection.c
+++ b/fs/ksmbd/connection.c
@@ -205,31 +205,31 @@ int ksmbd_conn_write(struct ksmbd_work *work)
return 0;
}

-int ksmbd_conn_rdma_read(struct ksmbd_conn *conn, void *buf,
- unsigned int buflen, u32 remote_key, u64 remote_offset,
- u32 remote_len)
+int ksmbd_conn_rdma_read(struct ksmbd_conn *conn,
+ void *buf, unsigned int buflen,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len)
{
int ret = -EINVAL;

if (conn->transport->ops->rdma_read)
ret = conn->transport->ops->rdma_read(conn->transport,
buf, buflen,
- remote_key, remote_offset,
- remote_len);
+ desc, desc_len);
return ret;
}

-int ksmbd_conn_rdma_write(struct ksmbd_conn *conn, void *buf,
- unsigned int buflen, u32 remote_key,
- u64 remote_offset, u32 remote_len)
+int ksmbd_conn_rdma_write(struct ksmbd_conn *conn,
+ void *buf, unsigned int buflen,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len)
{
int ret = -EINVAL;

if (conn->transport->ops->rdma_write)
ret = conn->transport->ops->rdma_write(conn->transport,
buf, buflen,
- remote_key, remote_offset,
- remote_len);
+ desc, desc_len);
return ret;
}

diff --git a/fs/ksmbd/connection.h b/fs/ksmbd/connection.h
index 7a59aacb5daa..98c1cbe45ec9 100644
--- a/fs/ksmbd/connection.h
+++ b/fs/ksmbd/connection.h
@@ -122,11 +122,14 @@ struct ksmbd_transport_ops {
int (*writev)(struct ksmbd_transport *t, struct kvec *iovs, int niov,
int size, bool need_invalidate_rkey,
unsigned int remote_key);
- int (*rdma_read)(struct ksmbd_transport *t, void *buf, unsigned int len,
- u32 remote_key, u64 remote_offset, u32 remote_len);
- int (*rdma_write)(struct ksmbd_transport *t, void *buf,
- unsigned int len, u32 remote_key, u64 remote_offset,
- u32 remote_len);
+ int (*rdma_read)(struct ksmbd_transport *t,
+ void *buf, unsigned int len,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len);
+ int (*rdma_write)(struct ksmbd_transport *t,
+ void *buf, unsigned int len,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len);
};

struct ksmbd_transport {
@@ -148,12 +151,14 @@ struct ksmbd_conn *ksmbd_conn_alloc(void);
void ksmbd_conn_free(struct ksmbd_conn *conn);
bool ksmbd_conn_lookup_dialect(struct ksmbd_conn *c);
int ksmbd_conn_write(struct ksmbd_work *work);
-int ksmbd_conn_rdma_read(struct ksmbd_conn *conn, void *buf,
- unsigned int buflen, u32 remote_key, u64 remote_offset,
- u32 remote_len);
-int ksmbd_conn_rdma_write(struct ksmbd_conn *conn, void *buf,
- unsigned int buflen, u32 remote_key, u64 remote_offset,
- u32 remote_len);
+int ksmbd_conn_rdma_read(struct ksmbd_conn *conn,
+ void *buf, unsigned int buflen,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len);
+int ksmbd_conn_rdma_write(struct ksmbd_conn *conn,
+ void *buf, unsigned int buflen,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len);
void ksmbd_conn_enqueue_request(struct ksmbd_work *work);
int ksmbd_conn_try_dequeue_request(struct ksmbd_work *work);
void ksmbd_conn_init_server_callbacks(struct ksmbd_conn_ops *ops);
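The connection.{c,h} changes switch the RDMA read/write ops from a single (key, offset, length) triple to the raw channel-info descriptor array, so the transport can validate and iterate the descriptors itself. A standalone sketch of walking such an array; the field order (offset, token, length) mirrors smb2_buffer_desc_v1, but this struct and helper are illustrative only, not the wire definition:

#include <stdint.h>
#include <stddef.h>

struct buf_desc {
	uint64_t offset;
	uint32_t token;
	uint32_t length;
};

/* Sum the remote lengths described by the array; a desc_len too small to
 * hold even one descriptor yields zero. */
static uint64_t total_remote_len(const struct buf_desc *desc, unsigned int desc_len)
{
	unsigned int count = desc_len / sizeof(*desc);
	uint64_t total = 0;

	for (unsigned int i = 0; i < count; i++)
		total += desc[i].length;
	return total;
}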
diff --git a/fs/ksmbd/ksmbd_netlink.h b/fs/ksmbd/ksmbd_netlink.h
index ebe6ca08467a..52aa0adeb951 100644
--- a/fs/ksmbd/ksmbd_netlink.h
+++ b/fs/ksmbd/ksmbd_netlink.h
@@ -104,7 +104,8 @@ struct ksmbd_startup_request {
*/
__u32 sub_auth[3]; /* Subauth value for Security ID */
__u32 smb2_max_credits; /* MAX credits */
- __u32 reserved[128]; /* Reserved room */
+ __u32 smbd_max_io_size; /* smbd read write size */
+ __u32 reserved[127]; /* Reserved room */
__u32 ifc_list_sz; /* interfaces list size */
__s8 ____payload[];
};
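In ksmbd_netlink.h the new smbd_max_io_size field is carved out of the reserved[] pad (128 -> 127 entries), so the startup-request layout exchanged with userspace keeps its size. A sketch of how that invariant can be checked, using illustrative structs rather than the real ksmbd_startup_request:

#include <stdint.h>

struct startup_req_old { uint32_t max_credits; uint32_t reserved[128]; };
struct startup_req_new { uint32_t max_credits; uint32_t max_io_size;
			 uint32_t reserved[127]; };

/* Carving a field out of the reserved pad must not change the ABI size. */
_Static_assert(sizeof(struct startup_req_old) == sizeof(struct startup_req_new),
	       "reserved-pad carve-out changed the structure size");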
diff --git a/fs/ksmbd/smb2misc.c b/fs/ksmbd/smb2misc.c
index f8f456377a51..6e25ace36568 100644
--- a/fs/ksmbd/smb2misc.c
+++ b/fs/ksmbd/smb2misc.c
@@ -90,11 +90,6 @@ static int smb2_get_data_area_len(unsigned int *off, unsigned int *len,
*off = 0;
*len = 0;

- /* error reqeusts do not have data area */
- if (hdr->Status && hdr->Status != STATUS_MORE_PROCESSING_REQUIRED &&
- (((struct smb2_err_rsp *)hdr)->StructureSize) == SMB2_ERROR_STRUCTURE_SIZE2_LE)
- return ret;
-
/*
* Following commands have data areas so we have to get the location
* of the data buffer offset and data buffer length for the particular
@@ -136,8 +131,11 @@ static int smb2_get_data_area_len(unsigned int *off, unsigned int *len,
*len = le16_to_cpu(((struct smb2_read_req *)hdr)->ReadChannelInfoLength);
break;
case SMB2_WRITE:
- if (((struct smb2_write_req *)hdr)->DataOffset) {
- *off = le16_to_cpu(((struct smb2_write_req *)hdr)->DataOffset);
+ if (((struct smb2_write_req *)hdr)->DataOffset ||
+ ((struct smb2_write_req *)hdr)->Length) {
+ *off = max_t(unsigned int,
+ le16_to_cpu(((struct smb2_write_req *)hdr)->DataOffset),
+ offsetof(struct smb2_write_req, Buffer));
*len = le32_to_cpu(((struct smb2_write_req *)hdr)->Length);
break;
}
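The smb2misc.c hunk clamps the write request's DataOffset so it can never point inside the fixed header, which keeps the later length checks honest. A standalone sketch of that clamp, with an illustrative struct in place of struct smb2_write_req:

#include <stdint.h>
#include <stddef.h>

struct write_req_like {
	uint8_t  hdr[48];	/* stand-in for the fixed part of the PDU */
	uint16_t data_offset;
	uint32_t length;
	uint8_t  buffer[1];	/* variable data area starts here */
};

/* A client-supplied offset smaller than the fixed header is treated as if it
 * pointed at the start of the variable part, mirroring the max_t() clamp. */
static unsigned int data_area_off(const struct write_req_like *req)
{
	unsigned int off = req->data_offset;
	unsigned int min_off = offsetof(struct write_req_like, buffer);

	return off > min_off ? off : min_off;
}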
diff --git a/fs/ksmbd/smb2pdu.c b/fs/ksmbd/smb2pdu.c
index 31138e9be1dc..85a9ed7156ea 100644
--- a/fs/ksmbd/smb2pdu.c
+++ b/fs/ksmbd/smb2pdu.c
@@ -535,9 +535,10 @@ int smb2_allocate_rsp_buf(struct ksmbd_work *work)
struct smb2_query_info_req *req;

req = smb2_get_msg(work->request_buf);
- if (req->InfoType == SMB2_O_INFO_FILE &&
- (req->FileInfoClass == FILE_FULL_EA_INFORMATION ||
- req->FileInfoClass == FILE_ALL_INFORMATION))
+ if ((req->InfoType == SMB2_O_INFO_FILE &&
+ (req->FileInfoClass == FILE_FULL_EA_INFORMATION ||
+ req->FileInfoClass == FILE_ALL_INFORMATION)) ||
+ req->InfoType == SMB2_O_INFO_SECURITY)
sz = large_sz;
}

@@ -1139,12 +1140,16 @@ int smb2_handle_negotiate(struct ksmbd_work *work)
status);
rsp->hdr.Status = status;
rc = -EINVAL;
+ kfree(conn->preauth_info);
+ conn->preauth_info = NULL;
goto err_out;
}

rc = init_smb3_11_server(conn);
if (rc < 0) {
rsp->hdr.Status = STATUS_INVALID_PARAMETER;
+ kfree(conn->preauth_info);
+ conn->preauth_info = NULL;
goto err_out;
}

@@ -2039,6 +2044,7 @@ int smb2_tree_disconnect(struct ksmbd_work *work)

ksmbd_close_tree_conn_fds(work);
ksmbd_tree_conn_disconnect(sess, tcon);
+ work->tcon = NULL;
return 0;
}

@@ -2969,7 +2975,7 @@ int smb2_open(struct ksmbd_work *work)
goto err_out;

rc = build_sec_desc(user_ns,
- pntsd, NULL,
+ pntsd, NULL, 0,
OWNER_SECINFO |
GROUP_SECINFO |
DACL_SECINFO,
@@ -3814,6 +3820,15 @@ static int verify_info_level(int info_level)
return 0;
}

+static int smb2_resp_buf_len(struct ksmbd_work *work, unsigned short hdr2_len)
+{
+ int free_len;
+
+ free_len = (int)(work->response_sz -
+ (get_rfc1002_len(work->response_buf) + 4)) - hdr2_len;
+ return free_len;
+}
+
static int smb2_calc_max_out_buf_len(struct ksmbd_work *work,
unsigned short hdr2_len,
unsigned int out_buf_len)
@@ -3823,9 +3838,7 @@ static int smb2_calc_max_out_buf_len(struct ksmbd_work *work,
if (out_buf_len > work->conn->vals->max_trans_size)
return -EINVAL;

- free_len = (int)(work->response_sz -
- (get_rfc1002_len(work->response_buf) + 4)) -
- hdr2_len;
+ free_len = smb2_resp_buf_len(work, hdr2_len);
if (free_len < 0)
return -EINVAL;

@@ -5080,10 +5093,10 @@ static int smb2_get_info_sec(struct ksmbd_work *work,
struct smb_ntsd *pntsd = (struct smb_ntsd *)rsp->Buffer, *ppntsd = NULL;
struct smb_fattr fattr = {{0}};
struct inode *inode;
- __u32 secdesclen;
+ __u32 secdesclen = 0;
unsigned int id = KSMBD_NO_FID, pid = KSMBD_NO_FID;
int addition_info = le32_to_cpu(req->AdditionalInformation);
- int rc;
+ int rc = 0, ppntsd_size = 0;

if (addition_info & ~(OWNER_SECINFO | GROUP_SECINFO | DACL_SECINFO |
PROTECTED_DACL_SECINFO |
@@ -5129,11 +5142,14 @@ static int smb2_get_info_sec(struct ksmbd_work *work,

if (test_share_config_flag(work->tcon->share_conf,
KSMBD_SHARE_FLAG_ACL_XATTR))
- ksmbd_vfs_get_sd_xattr(work->conn, user_ns,
- fp->filp->f_path.dentry, &ppntsd);
-
- rc = build_sec_desc(user_ns, pntsd, ppntsd, addition_info,
- &secdesclen, &fattr);
+ ppntsd_size = ksmbd_vfs_get_sd_xattr(work->conn, user_ns,
+ fp->filp->f_path.dentry,
+ &ppntsd);
+
+ /* Check if sd buffer size exceeds response buffer size */
+ if (smb2_resp_buf_len(work, 8) > ppntsd_size)
+ rc = build_sec_desc(user_ns, pntsd, ppntsd, ppntsd_size,
+ addition_info, &secdesclen, &fattr);
posix_acl_release(fattr.cf_acls);
posix_acl_release(fattr.cf_dacls);
kfree(ppntsd);
@@ -6116,7 +6132,6 @@ static noinline int smb2_read_pipe(struct ksmbd_work *work)
static int smb2_set_remote_key_for_rdma(struct ksmbd_work *work,
struct smb2_buffer_desc_v1 *desc,
__le32 Channel,
- __le16 ChannelInfoOffset,
__le16 ChannelInfoLength)
{
unsigned int i, ch_count;
@@ -6142,7 +6157,8 @@ static int smb2_set_remote_key_for_rdma(struct ksmbd_work *work,

work->need_invalidate_rkey =
(Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE);
- work->remote_key = le32_to_cpu(desc->token);
+ if (Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE)
+ work->remote_key = le32_to_cpu(desc->token);
return 0;
}

@@ -6150,14 +6166,12 @@ static ssize_t smb2_read_rdma_channel(struct ksmbd_work *work,
struct smb2_read_req *req, void *data_buf,
size_t length)
{
- struct smb2_buffer_desc_v1 *desc =
- (struct smb2_buffer_desc_v1 *)&req->Buffer[0];
int err;

err = ksmbd_conn_rdma_write(work->conn, data_buf, length,
- le32_to_cpu(desc->token),
- le64_to_cpu(desc->offset),
- le32_to_cpu(desc->length));
+ (struct smb2_buffer_desc_v1 *)
+ ((char *)req + le16_to_cpu(req->ReadChannelInfoOffset)),
+ le16_to_cpu(req->ReadChannelInfoLength));
if (err)
return err;

@@ -6180,6 +6194,8 @@ int smb2_read(struct ksmbd_work *work)
size_t length, mincount;
ssize_t nbytes = 0, remain_bytes = 0;
int err = 0;
+ bool is_rdma_channel = false;
+ unsigned int max_read_size = conn->vals->max_read_size;

WORK_BUFFERS(work, req, rsp);

@@ -6191,6 +6207,11 @@ int smb2_read(struct ksmbd_work *work)

if (req->Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE ||
req->Channel == SMB2_CHANNEL_RDMA_V1) {
+ is_rdma_channel = true;
+ max_read_size = get_smbd_max_read_write_size();
+ }
+
+ if (is_rdma_channel == true) {
unsigned int ch_offset = le16_to_cpu(req->ReadChannelInfoOffset);

if (ch_offset < offsetof(struct smb2_read_req, Buffer)) {
@@ -6201,7 +6222,6 @@ int smb2_read(struct ksmbd_work *work)
(struct smb2_buffer_desc_v1 *)
((char *)req + ch_offset),
req->Channel,
- req->ReadChannelInfoOffset,
req->ReadChannelInfoLength);
if (err)
goto out;
@@ -6223,9 +6243,9 @@ int smb2_read(struct ksmbd_work *work)
length = le32_to_cpu(req->Length);
mincount = le32_to_cpu(req->MinimumCount);

- if (length > conn->vals->max_read_size) {
+ if (length > max_read_size) {
ksmbd_debug(SMB, "limiting read size to max size(%u)\n",
- conn->vals->max_read_size);
+ max_read_size);
err = -EINVAL;
goto out;
}
@@ -6257,8 +6277,7 @@ int smb2_read(struct ksmbd_work *work)
ksmbd_debug(SMB, "nbytes %zu, offset %lld mincount %zu\n",
nbytes, offset, mincount);

- if (req->Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE ||
- req->Channel == SMB2_CHANNEL_RDMA_V1) {
+ if (is_rdma_channel == true) {
/* write data to the client using rdma channel */
remain_bytes = smb2_read_rdma_channel(work, req,
work->aux_payload_buf,
@@ -6328,23 +6347,18 @@ static noinline int smb2_write_pipe(struct ksmbd_work *work)
length = le32_to_cpu(req->Length);
id = req->VolatileFileId;

- if (le16_to_cpu(req->DataOffset) ==
- offsetof(struct smb2_write_req, Buffer)) {
- data_buf = (char *)&req->Buffer[0];
- } else {
- if ((u64)le16_to_cpu(req->DataOffset) + length >
- get_rfc1002_len(work->request_buf)) {
- pr_err("invalid write data offset %u, smb_len %u\n",
- le16_to_cpu(req->DataOffset),
- get_rfc1002_len(work->request_buf));
- err = -EINVAL;
- goto out;
- }
-
- data_buf = (char *)(((char *)&req->hdr.ProtocolId) +
- le16_to_cpu(req->DataOffset));
+ if ((u64)le16_to_cpu(req->DataOffset) + length >
+ get_rfc1002_len(work->request_buf)) {
+ pr_err("invalid write data offset %u, smb_len %u\n",
+ le16_to_cpu(req->DataOffset),
+ get_rfc1002_len(work->request_buf));
+ err = -EINVAL;
+ goto out;
}

+ data_buf = (char *)(((char *)&req->hdr.ProtocolId) +
+ le16_to_cpu(req->DataOffset));
+
rpc_resp = ksmbd_rpc_write(work->sess, id, data_buf, length);
if (rpc_resp) {
if (rpc_resp->flags == KSMBD_RPC_ENOTIMPLEMENTED) {
@@ -6384,21 +6398,18 @@ static ssize_t smb2_write_rdma_channel(struct ksmbd_work *work,
struct ksmbd_file *fp,
loff_t offset, size_t length, bool sync)
{
- struct smb2_buffer_desc_v1 *desc;
char *data_buf;
int ret;
ssize_t nbytes;

- desc = (struct smb2_buffer_desc_v1 *)&req->Buffer[0];
-
data_buf = kvmalloc(length, GFP_KERNEL | __GFP_ZERO);
if (!data_buf)
return -ENOMEM;

ret = ksmbd_conn_rdma_read(work->conn, data_buf, length,
- le32_to_cpu(desc->token),
- le64_to_cpu(desc->offset),
- le32_to_cpu(desc->length));
+ (struct smb2_buffer_desc_v1 *)
+ ((char *)req + le16_to_cpu(req->WriteChannelInfoOffset)),
+ le16_to_cpu(req->WriteChannelInfoLength));
if (ret < 0) {
kvfree(data_buf);
return ret;
@@ -6427,8 +6438,9 @@ int smb2_write(struct ksmbd_work *work)
size_t length;
ssize_t nbytes;
char *data_buf;
- bool writethrough = false;
+ bool writethrough = false, is_rdma_channel = false;
int err = 0;
+ unsigned int max_write_size = work->conn->vals->max_write_size;

WORK_BUFFERS(work, req, rsp);

@@ -6437,8 +6449,17 @@ int smb2_write(struct ksmbd_work *work)
return smb2_write_pipe(work);
}

+ offset = le64_to_cpu(req->Offset);
+ length = le32_to_cpu(req->Length);
+
if (req->Channel == SMB2_CHANNEL_RDMA_V1 ||
req->Channel == SMB2_CHANNEL_RDMA_V1_INVALIDATE) {
+ is_rdma_channel = true;
+ max_write_size = get_smbd_max_read_write_size();
+ length = le32_to_cpu(req->RemainingBytes);
+ }
+
+ if (is_rdma_channel == true) {
unsigned int ch_offset = le16_to_cpu(req->WriteChannelInfoOffset);

if (req->Length != 0 || req->DataOffset != 0 ||
@@ -6450,7 +6471,6 @@ int smb2_write(struct ksmbd_work *work)
(struct smb2_buffer_desc_v1 *)
((char *)req + ch_offset),
req->Channel,
- req->WriteChannelInfoOffset,
req->WriteChannelInfoLength);
if (err)
goto out;
@@ -6474,12 +6494,9 @@ int smb2_write(struct ksmbd_work *work)
goto out;
}

- offset = le64_to_cpu(req->Offset);
- length = le32_to_cpu(req->Length);
-
- if (length > work->conn->vals->max_write_size) {
+ if (length > max_write_size) {
ksmbd_debug(SMB, "limiting write size to max size(%u)\n",
- work->conn->vals->max_write_size);
+ max_write_size);
err = -EINVAL;
goto out;
}
@@ -6487,25 +6504,16 @@ int smb2_write(struct ksmbd_work *work)
if (le32_to_cpu(req->Flags) & SMB2_WRITEFLAG_WRITE_THROUGH)
writethrough = true;

- if (req->Channel != SMB2_CHANNEL_RDMA_V1 &&
- req->Channel != SMB2_CHANNEL_RDMA_V1_INVALIDATE) {
- if (le16_to_cpu(req->DataOffset) ==
+ if (is_rdma_channel == false) {
+ if (le16_to_cpu(req->DataOffset) <
offsetof(struct smb2_write_req, Buffer)) {
- data_buf = (char *)&req->Buffer[0];
- } else {
- if ((u64)le16_to_cpu(req->DataOffset) + length >
- get_rfc1002_len(work->request_buf)) {
- pr_err("invalid write data offset %u, smb_len %u\n",
- le16_to_cpu(req->DataOffset),
- get_rfc1002_len(work->request_buf));
- err = -EINVAL;
- goto out;
- }
-
- data_buf = (char *)(((char *)&req->hdr.ProtocolId) +
- le16_to_cpu(req->DataOffset));
+ err = -EINVAL;
+ goto out;
}

+ data_buf = (char *)(((char *)&req->hdr.ProtocolId) +
+ le16_to_cpu(req->DataOffset));
+
ksmbd_debug(SMB, "flags %u\n", le32_to_cpu(req->Flags));
if (le32_to_cpu(req->Flags) & SMB2_WRITEFLAG_WRITE_THROUGH)
writethrough = true;
@@ -6520,8 +6528,7 @@ int smb2_write(struct ksmbd_work *work)
/* read data from the client using rdma channel, and
* write the data.
*/
- nbytes = smb2_write_rdma_channel(work, req, fp, offset,
- le32_to_cpu(req->RemainingBytes),
+ nbytes = smb2_write_rdma_channel(work, req, fp, offset, length,
writethrough);
if (nbytes < 0) {
err = (int)nbytes;
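Among the smb2pdu.c changes above, the non-RDMA write paths now reject a DataOffset that falls inside the fixed header and a payload that runs past the received PDU. A minimal sketch of those two checks, with an illustrative header size standing in for offsetof(struct smb2_write_req, Buffer):

#include <stdint.h>
#include <stdbool.h>

#define FIXED_WRITE_HDR 112u	/* illustrative; not the real offset */

static bool write_payload_ok(uint16_t data_offset, uint32_t length,
			     uint32_t pdu_len)
{
	if (data_offset < FIXED_WRITE_HDR)
		return false;			/* offset points inside the header */
	if ((uint64_t)data_offset + length > pdu_len)
		return false;			/* payload runs past the request */
	return true;
}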
diff --git a/fs/ksmbd/smbacl.c b/fs/ksmbd/smbacl.c
index 38f23bf981ac..3781bca2c8fc 100644
--- a/fs/ksmbd/smbacl.c
+++ b/fs/ksmbd/smbacl.c
@@ -690,6 +690,7 @@ static void set_posix_acl_entries_dacl(struct user_namespace *user_ns,
static void set_ntacl_dacl(struct user_namespace *user_ns,
struct smb_acl *pndacl,
struct smb_acl *nt_dacl,
+ unsigned int aces_size,
const struct smb_sid *pownersid,
const struct smb_sid *pgrpsid,
struct smb_fattr *fattr)
@@ -703,9 +704,19 @@ static void set_ntacl_dacl(struct user_namespace *user_ns,
if (nt_num_aces) {
ntace = (struct smb_ace *)((char *)nt_dacl + sizeof(struct smb_acl));
for (i = 0; i < nt_num_aces; i++) {
- memcpy((char *)pndace + size, ntace, le16_to_cpu(ntace->size));
- size += le16_to_cpu(ntace->size);
- ntace = (struct smb_ace *)((char *)ntace + le16_to_cpu(ntace->size));
+ unsigned short nt_ace_size;
+
+ if (offsetof(struct smb_ace, access_req) > aces_size)
+ break;
+
+ nt_ace_size = le16_to_cpu(ntace->size);
+ if (nt_ace_size > aces_size)
+ break;
+
+ memcpy((char *)pndace + size, ntace, nt_ace_size);
+ size += nt_ace_size;
+ aces_size -= nt_ace_size;
+ ntace = (struct smb_ace *)((char *)ntace + nt_ace_size);
num_aces++;
}
}
@@ -878,7 +889,7 @@ int parse_sec_desc(struct user_namespace *user_ns, struct smb_ntsd *pntsd,
/* Convert permission bits from mode to equivalent CIFS ACL */
int build_sec_desc(struct user_namespace *user_ns,
struct smb_ntsd *pntsd, struct smb_ntsd *ppntsd,
- int addition_info, __u32 *secdesclen,
+ int ppntsd_size, int addition_info, __u32 *secdesclen,
struct smb_fattr *fattr)
{
int rc = 0;
@@ -938,15 +949,25 @@ int build_sec_desc(struct user_namespace *user_ns,

if (!ppntsd) {
set_mode_dacl(user_ns, dacl_ptr, fattr);
- } else if (!ppntsd->dacloffset) {
- goto out;
} else {
struct smb_acl *ppdacl_ptr;
+ unsigned int dacl_offset = le32_to_cpu(ppntsd->dacloffset);
+ int ppdacl_size, ntacl_size = ppntsd_size - dacl_offset;
+
+ if (!dacl_offset ||
+ (dacl_offset + sizeof(struct smb_acl) > ppntsd_size))
+ goto out;
+
+ ppdacl_ptr = (struct smb_acl *)((char *)ppntsd + dacl_offset);
+ ppdacl_size = le16_to_cpu(ppdacl_ptr->size);
+ if (ppdacl_size > ntacl_size ||
+ ppdacl_size < sizeof(struct smb_acl))
+ goto out;

- ppdacl_ptr = (struct smb_acl *)((char *)ppntsd +
- le32_to_cpu(ppntsd->dacloffset));
set_ntacl_dacl(user_ns, dacl_ptr, ppdacl_ptr,
- nowner_sid_ptr, ngroup_sid_ptr, fattr);
+ ntacl_size - sizeof(struct smb_acl),
+ nowner_sid_ptr, ngroup_sid_ptr,
+ fattr);
}
pntsd->dacloffset = cpu_to_le32(offset);
offset += le16_to_cpu(dacl_ptr->size);
@@ -980,24 +1001,31 @@ int smb_inherit_dacl(struct ksmbd_conn *conn,
struct smb_sid owner_sid, group_sid;
struct dentry *parent = path->dentry->d_parent;
struct user_namespace *user_ns = mnt_user_ns(path->mnt);
- int inherited_flags = 0, flags = 0, i, ace_cnt = 0, nt_size = 0;
- int rc = 0, num_aces, dacloffset, pntsd_type, acl_len;
+ int inherited_flags = 0, flags = 0, i, ace_cnt = 0, nt_size = 0, pdacl_size;
+ int rc = 0, num_aces, dacloffset, pntsd_type, pntsd_size, acl_len, aces_size;
char *aces_base;
bool is_dir = S_ISDIR(d_inode(path->dentry)->i_mode);

- acl_len = ksmbd_vfs_get_sd_xattr(conn, user_ns,
- parent, &parent_pntsd);
- if (acl_len <= 0)
+ pntsd_size = ksmbd_vfs_get_sd_xattr(conn, user_ns,
+ parent, &parent_pntsd);
+ if (pntsd_size <= 0)
return -ENOENT;
dacloffset = le32_to_cpu(parent_pntsd->dacloffset);
- if (!dacloffset) {
+ if (!dacloffset || (dacloffset + sizeof(struct smb_acl) > pntsd_size)) {
rc = -EINVAL;
goto free_parent_pntsd;
}

parent_pdacl = (struct smb_acl *)((char *)parent_pntsd + dacloffset);
+ acl_len = pntsd_size - dacloffset;
num_aces = le32_to_cpu(parent_pdacl->num_aces);
pntsd_type = le16_to_cpu(parent_pntsd->type);
+ pdacl_size = le16_to_cpu(parent_pdacl->size);
+
+ if (pdacl_size > acl_len || pdacl_size < sizeof(struct smb_acl)) {
+ rc = -EINVAL;
+ goto free_parent_pntsd;
+ }

aces_base = kmalloc(sizeof(struct smb_ace) * num_aces * 2, GFP_KERNEL);
if (!aces_base) {
@@ -1008,11 +1036,23 @@ int smb_inherit_dacl(struct ksmbd_conn *conn,
aces = (struct smb_ace *)aces_base;
parent_aces = (struct smb_ace *)((char *)parent_pdacl +
sizeof(struct smb_acl));
+ aces_size = acl_len - sizeof(struct smb_acl);

if (pntsd_type & DACL_AUTO_INHERITED)
inherited_flags = INHERITED_ACE;

for (i = 0; i < num_aces; i++) {
+ int pace_size;
+
+ if (offsetof(struct smb_ace, access_req) > aces_size)
+ break;
+
+ pace_size = le16_to_cpu(parent_aces->size);
+ if (pace_size > aces_size)
+ break;
+
+ aces_size -= pace_size;
+
flags = parent_aces->flags;
if (!smb_inherit_flags(flags, is_dir))
goto pass;
@@ -1057,8 +1097,7 @@ int smb_inherit_dacl(struct ksmbd_conn *conn,
aces = (struct smb_ace *)((char *)aces + le16_to_cpu(aces->size));
ace_cnt++;
pass:
- parent_aces =
- (struct smb_ace *)((char *)parent_aces + le16_to_cpu(parent_aces->size));
+ parent_aces = (struct smb_ace *)((char *)parent_aces + pace_size);
}

if (nt_size > 0) {
@@ -1153,7 +1192,7 @@ int smb_check_perm_dacl(struct ksmbd_conn *conn, struct path *path,
struct smb_ntsd *pntsd = NULL;
struct smb_acl *pdacl;
struct posix_acl *posix_acls;
- int rc = 0, acl_size;
+ int rc = 0, pntsd_size, acl_size, aces_size, pdacl_size, dacl_offset;
struct smb_sid sid;
int granted = le32_to_cpu(*pdaccess & ~FILE_MAXIMAL_ACCESS_LE);
struct smb_ace *ace;
@@ -1162,37 +1201,33 @@ int smb_check_perm_dacl(struct ksmbd_conn *conn, struct path *path,
struct smb_ace *others_ace = NULL;
struct posix_acl_entry *pa_entry;
unsigned int sid_type = SIDOWNER;
- char *end_of_acl;
+ unsigned short ace_size;

ksmbd_debug(SMB, "check permission using windows acl\n");
- acl_size = ksmbd_vfs_get_sd_xattr(conn, user_ns,
- path->dentry, &pntsd);
- if (acl_size <= 0 || !pntsd || !pntsd->dacloffset) {
- kfree(pntsd);
- return 0;
- }
+ pntsd_size = ksmbd_vfs_get_sd_xattr(conn, user_ns,
+ path->dentry, &pntsd);
+ if (pntsd_size <= 0 || !pntsd)
+ goto err_out;
+
+ dacl_offset = le32_to_cpu(pntsd->dacloffset);
+ if (!dacl_offset ||
+ (dacl_offset + sizeof(struct smb_acl) > pntsd_size))
+ goto err_out;

pdacl = (struct smb_acl *)((char *)pntsd + le32_to_cpu(pntsd->dacloffset));
- end_of_acl = ((char *)pntsd) + acl_size;
- if (end_of_acl <= (char *)pdacl) {
- kfree(pntsd);
- return 0;
- }
+ acl_size = pntsd_size - dacl_offset;
+ pdacl_size = le16_to_cpu(pdacl->size);

- if (end_of_acl < (char *)pdacl + le16_to_cpu(pdacl->size) ||
- le16_to_cpu(pdacl->size) < sizeof(struct smb_acl)) {
- kfree(pntsd);
- return 0;
- }
+ if (pdacl_size > acl_size || pdacl_size < sizeof(struct smb_acl))
+ goto err_out;

if (!pdacl->num_aces) {
- if (!(le16_to_cpu(pdacl->size) - sizeof(struct smb_acl)) &&
+ if (!(pdacl_size - sizeof(struct smb_acl)) &&
*pdaccess & ~(FILE_READ_CONTROL_LE | FILE_WRITE_DAC_LE)) {
rc = -EACCES;
goto err_out;
}
- kfree(pntsd);
- return 0;
+ goto err_out;
}

if (*pdaccess & FILE_MAXIMAL_ACCESS_LE) {
@@ -1200,11 +1235,16 @@ int smb_check_perm_dacl(struct ksmbd_conn *conn, struct path *path,
DELETE;

ace = (struct smb_ace *)((char *)pdacl + sizeof(struct smb_acl));
+ aces_size = acl_size - sizeof(struct smb_acl);
for (i = 0; i < le32_to_cpu(pdacl->num_aces); i++) {
+ if (offsetof(struct smb_ace, access_req) > aces_size)
+ break;
+ ace_size = le16_to_cpu(ace->size);
+ if (ace_size > aces_size)
+ break;
+ aces_size -= ace_size;
granted |= le32_to_cpu(ace->access_req);
ace = (struct smb_ace *)((char *)ace + le16_to_cpu(ace->size));
- if (end_of_acl < (char *)ace)
- goto err_out;
}

if (!pdacl->num_aces)
@@ -1216,7 +1256,15 @@ int smb_check_perm_dacl(struct ksmbd_conn *conn, struct path *path,
id_to_sid(uid, sid_type, &sid);

ace = (struct smb_ace *)((char *)pdacl + sizeof(struct smb_acl));
+ aces_size = acl_size - sizeof(struct smb_acl);
for (i = 0; i < le32_to_cpu(pdacl->num_aces); i++) {
+ if (offsetof(struct smb_ace, access_req) > aces_size)
+ break;
+ ace_size = le16_to_cpu(ace->size);
+ if (ace_size > aces_size)
+ break;
+ aces_size -= ace_size;
+
if (!compare_sids(&sid, &ace->sid) ||
!compare_sids(&sid_unix_NFS_mode, &ace->sid)) {
found = 1;
@@ -1226,8 +1274,6 @@ int smb_check_perm_dacl(struct ksmbd_conn *conn, struct path *path,
others_ace = ace;

ace = (struct smb_ace *)((char *)ace + le16_to_cpu(ace->size));
- if (end_of_acl < (char *)ace)
- goto err_out;
}

if (*pdaccess & FILE_MAXIMAL_ACCESS_LE && found) {
diff --git a/fs/ksmbd/smbacl.h b/fs/ksmbd/smbacl.h
index 811af3309429..fcb2c83f2992 100644
--- a/fs/ksmbd/smbacl.h
+++ b/fs/ksmbd/smbacl.h
@@ -193,7 +193,7 @@ struct posix_acl_state {
int parse_sec_desc(struct user_namespace *user_ns, struct smb_ntsd *pntsd,
int acl_len, struct smb_fattr *fattr);
int build_sec_desc(struct user_namespace *user_ns, struct smb_ntsd *pntsd,
- struct smb_ntsd *ppntsd, int addition_info,
+ struct smb_ntsd *ppntsd, int ppntsd_size, int addition_info,
__u32 *secdesclen, struct smb_fattr *fattr);
int init_acl_state(struct posix_acl_state *state, int cnt);
void free_acl_state(struct posix_acl_state *state);
diff --git a/fs/ksmbd/transport_ipc.c b/fs/ksmbd/transport_ipc.c
index 3ad6881e0f7e..7cb0eeb07c80 100644
--- a/fs/ksmbd/transport_ipc.c
+++ b/fs/ksmbd/transport_ipc.c
@@ -26,6 +26,7 @@
#include "mgmt/ksmbd_ida.h"
#include "connection.h"
#include "transport_tcp.h"
+#include "transport_rdma.h"

#define IPC_WAIT_TIMEOUT (2 * HZ)

@@ -303,6 +304,8 @@ static int ipc_server_config_on_startup(struct ksmbd_startup_request *req)
init_smb2_max_trans_size(req->smb2_max_trans);
if (req->smb2_max_credits)
init_smb2_max_credits(req->smb2_max_credits);
+ if (req->smbd_max_io_size)
+ init_smbd_max_io_size(req->smbd_max_io_size);

ret = ksmbd_set_netbios_name(req->netbios_name);
ret |= ksmbd_set_server_string(req->server_string);
diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
index 3f5d13571694..c6af8d89b7f7 100644
--- a/fs/ksmbd/transport_rdma.c
+++ b/fs/ksmbd/transport_rdma.c
@@ -80,9 +80,7 @@ static int smb_direct_max_fragmented_recv_size = 1024 * 1024;
/* The maximum single-message size which can be received */
static int smb_direct_max_receive_size = 8192;

-static int smb_direct_max_read_write_size = 524224;
-
-static int smb_direct_max_outstanding_rw_ops = 8;
+static int smb_direct_max_read_write_size = SMBD_DEFAULT_IOSIZE;

static LIST_HEAD(smb_direct_device_list);
static DEFINE_RWLOCK(smb_direct_device_lock);
@@ -147,10 +145,12 @@ struct smb_direct_transport {
atomic_t send_credits;
spinlock_t lock_new_recv_credits;
int new_recv_credits;
- atomic_t rw_avail_ops;
+ int max_rw_credits;
+ int pages_per_rw_credit;
+ atomic_t rw_credits;

wait_queue_head_t wait_send_credits;
- wait_queue_head_t wait_rw_avail_ops;
+ wait_queue_head_t wait_rw_credits;

mempool_t *sendmsg_mempool;
struct kmem_cache *sendmsg_cache;
@@ -214,6 +214,17 @@ struct smb_direct_rdma_rw_msg {
struct scatterlist sg_list[];
};

+void init_smbd_max_io_size(unsigned int sz)
+{
+ sz = clamp_val(sz, SMBD_MIN_IOSIZE, SMBD_MAX_IOSIZE);
+ smb_direct_max_read_write_size = sz;
+}
+
+unsigned int get_smbd_max_read_write_size(void)
+{
+ return smb_direct_max_read_write_size;
+}
+
static inline int get_buf_page_count(void *buf, int size)
{
return DIV_ROUND_UP((uintptr_t)buf + size, PAGE_SIZE) -
@@ -377,7 +388,7 @@ static struct smb_direct_transport *alloc_transport(struct rdma_cm_id *cm_id)
t->reassembly_queue_length = 0;
init_waitqueue_head(&t->wait_reassembly_queue);
init_waitqueue_head(&t->wait_send_credits);
- init_waitqueue_head(&t->wait_rw_avail_ops);
+ init_waitqueue_head(&t->wait_rw_credits);

spin_lock_init(&t->receive_credit_lock);
spin_lock_init(&t->recvmsg_queue_lock);
@@ -984,18 +995,19 @@ static int smb_direct_flush_send_list(struct smb_direct_transport *t,
}

static int wait_for_credits(struct smb_direct_transport *t,
- wait_queue_head_t *waitq, atomic_t *credits)
+ wait_queue_head_t *waitq, atomic_t *total_credits,
+ int needed)
{
int ret;

do {
- if (atomic_dec_return(credits) >= 0)
+ if (atomic_sub_return(needed, total_credits) >= 0)
return 0;

- atomic_inc(credits);
+ atomic_add(needed, total_credits);
ret = wait_event_interruptible(*waitq,
- atomic_read(credits) > 0 ||
- t->status != SMB_DIRECT_CS_CONNECTED);
+ atomic_read(total_credits) >= needed ||
+ t->status != SMB_DIRECT_CS_CONNECTED);

if (t->status != SMB_DIRECT_CS_CONNECTED)
return -ENOTCONN;
@@ -1016,7 +1028,19 @@ static int wait_for_send_credits(struct smb_direct_transport *t,
return ret;
}

- return wait_for_credits(t, &t->wait_send_credits, &t->send_credits);
+ return wait_for_credits(t, &t->wait_send_credits, &t->send_credits, 1);
+}
+
+static int wait_for_rw_credits(struct smb_direct_transport *t, int credits)
+{
+ return wait_for_credits(t, &t->wait_rw_credits, &t->rw_credits, credits);
+}
+
+static int calc_rw_credits(struct smb_direct_transport *t,
+ char *buf, unsigned int len)
+{
+ return DIV_ROUND_UP(get_buf_page_count(buf, len),
+ t->pages_per_rw_credit);
}

static int smb_direct_create_header(struct smb_direct_transport *t,
@@ -1332,8 +1356,8 @@ static void read_write_done(struct ib_cq *cq, struct ib_wc *wc,
smb_direct_disconnect_rdma_connection(t);
}

- if (atomic_inc_return(&t->rw_avail_ops) > 0)
- wake_up(&t->wait_rw_avail_ops);
+ if (atomic_inc_return(&t->rw_credits) > 0)
+ wake_up(&t->wait_rw_credits);

rdma_rw_ctx_destroy(&msg->rw_ctx, t->qp, t->qp->port,
msg->sg_list, msg->sgt.nents, dir);
@@ -1352,16 +1376,22 @@ static void write_done(struct ib_cq *cq, struct ib_wc *wc)
read_write_done(cq, wc, DMA_TO_DEVICE);
}

-static int smb_direct_rdma_xmit(struct smb_direct_transport *t, void *buf,
- int buf_len, u32 remote_key, u64 remote_offset,
- u32 remote_len, bool is_read)
+static int smb_direct_rdma_xmit(struct smb_direct_transport *t,
+ void *buf, int buf_len,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len,
+ bool is_read)
{
struct smb_direct_rdma_rw_msg *msg;
int ret;
DECLARE_COMPLETION_ONSTACK(completion);
struct ib_send_wr *first_wr = NULL;
+ u32 remote_key = le32_to_cpu(desc[0].token);
+ u64 remote_offset = le64_to_cpu(desc[0].offset);
+ int credits_needed;

- ret = wait_for_credits(t, &t->wait_rw_avail_ops, &t->rw_avail_ops);
+ credits_needed = calc_rw_credits(t, buf, buf_len);
+ ret = wait_for_rw_credits(t, credits_needed);
if (ret < 0)
return ret;

@@ -1369,7 +1399,7 @@ static int smb_direct_rdma_xmit(struct smb_direct_transport *t, void *buf,
msg = kmalloc(offsetof(struct smb_direct_rdma_rw_msg, sg_list) +
sizeof(struct scatterlist) * SG_CHUNK_SIZE, GFP_KERNEL);
if (!msg) {
- atomic_inc(&t->rw_avail_ops);
+ atomic_add(credits_needed, &t->rw_credits);
return -ENOMEM;
}

@@ -1378,7 +1408,7 @@ static int smb_direct_rdma_xmit(struct smb_direct_transport *t, void *buf,
get_buf_page_count(buf, buf_len),
msg->sg_list, SG_CHUNK_SIZE);
if (ret) {
- atomic_inc(&t->rw_avail_ops);
+ atomic_add(credits_needed, &t->rw_credits);
kfree(msg);
return -ENOMEM;
}
@@ -1414,7 +1444,7 @@ static int smb_direct_rdma_xmit(struct smb_direct_transport *t, void *buf,
return 0;

err:
- atomic_inc(&t->rw_avail_ops);
+ atomic_add(credits_needed, &t->rw_credits);
if (first_wr)
rdma_rw_ctx_destroy(&msg->rw_ctx, t->qp, t->qp->port,
msg->sg_list, msg->sgt.nents,
@@ -1424,22 +1454,22 @@ static int smb_direct_rdma_xmit(struct smb_direct_transport *t, void *buf,
return ret;
}

-static int smb_direct_rdma_write(struct ksmbd_transport *t, void *buf,
- unsigned int buflen, u32 remote_key,
- u64 remote_offset, u32 remote_len)
+static int smb_direct_rdma_write(struct ksmbd_transport *t,
+ void *buf, unsigned int buflen,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len)
{
return smb_direct_rdma_xmit(smb_trans_direct_transfort(t), buf, buflen,
- remote_key, remote_offset,
- remote_len, false);
+ desc, desc_len, false);
}

-static int smb_direct_rdma_read(struct ksmbd_transport *t, void *buf,
- unsigned int buflen, u32 remote_key,
- u64 remote_offset, u32 remote_len)
+static int smb_direct_rdma_read(struct ksmbd_transport *t,
+ void *buf, unsigned int buflen,
+ struct smb2_buffer_desc_v1 *desc,
+ unsigned int desc_len)
{
return smb_direct_rdma_xmit(smb_trans_direct_transfort(t), buf, buflen,
- remote_key, remote_offset,
- remote_len, true);
+ desc, desc_len, true);
}

static void smb_direct_disconnect(struct ksmbd_transport *t)
@@ -1639,11 +1669,19 @@ static int smb_direct_prepare_negotiation(struct smb_direct_transport *t)
return ret;
}

+static unsigned int smb_direct_get_max_fr_pages(struct smb_direct_transport *t)
+{
+ return min_t(unsigned int,
+ t->cm_id->device->attrs.max_fast_reg_page_list_len,
+ 256);
+}
+
static int smb_direct_init_params(struct smb_direct_transport *t,
struct ib_qp_cap *cap)
{
struct ib_device *device = t->cm_id->device;
- int max_send_sges, max_pages, max_rw_wrs, max_send_wrs;
+ int max_send_sges, max_rw_wrs, max_send_wrs;
+ unsigned int max_sge_per_wr, wrs_per_credit;

/* need 2 more sge. because a SMB_DIRECT header will be mapped,
* and maybe a send buffer could be not page aligned.
@@ -1655,25 +1693,31 @@ static int smb_direct_init_params(struct smb_direct_transport *t,
return -EINVAL;
}

- /*
- * allow smb_direct_max_outstanding_rw_ops of in-flight RDMA
- * read/writes. HCA guarantees at least max_send_sge of sges for
- * a RDMA read/write work request, and if memory registration is used,
- * we need reg_mr, local_inv wrs for each read/write.
+ /* Calculate the number of work requests for RDMA R/W.
+ * The maximum number of pages which can be registered
+ * with one Memory region can be transferred with one
+ * R/W credit. And at least 4 work requests for each credit
+ * are needed for MR registration, RDMA R/W, local & remote
+ * MR invalidation.
*/
t->max_rdma_rw_size = smb_direct_max_read_write_size;
- max_pages = DIV_ROUND_UP(t->max_rdma_rw_size, PAGE_SIZE) + 1;
- max_rw_wrs = DIV_ROUND_UP(max_pages, SMB_DIRECT_MAX_SEND_SGES);
- max_rw_wrs += rdma_rw_mr_factor(device, t->cm_id->port_num,
- max_pages) * 2;
- max_rw_wrs *= smb_direct_max_outstanding_rw_ops;
+ t->pages_per_rw_credit = smb_direct_get_max_fr_pages(t);
+ t->max_rw_credits = DIV_ROUND_UP(t->max_rdma_rw_size,
+ (t->pages_per_rw_credit - 1) *
+ PAGE_SIZE);
+
+ max_sge_per_wr = min_t(unsigned int, device->attrs.max_send_sge,
+ device->attrs.max_sge_rd);
+ wrs_per_credit = max_t(unsigned int, 4,
+ DIV_ROUND_UP(t->pages_per_rw_credit,
+ max_sge_per_wr) + 1);
+ max_rw_wrs = t->max_rw_credits * wrs_per_credit;

max_send_wrs = smb_direct_send_credit_target + max_rw_wrs;
if (max_send_wrs > device->attrs.max_cqe ||
max_send_wrs > device->attrs.max_qp_wr) {
- pr_err("consider lowering send_credit_target = %d, or max_outstanding_rw_ops = %d\n",
- smb_direct_send_credit_target,
- smb_direct_max_outstanding_rw_ops);
+ pr_err("consider lowering send_credit_target = %d\n",
+ smb_direct_send_credit_target);
pr_err("Possible CQE overrun, device reporting max_cqe %d max_qp_wr %d\n",
device->attrs.max_cqe, device->attrs.max_qp_wr);
return -EINVAL;
@@ -1708,7 +1752,7 @@ static int smb_direct_init_params(struct smb_direct_transport *t,

t->send_credit_target = smb_direct_send_credit_target;
atomic_set(&t->send_credits, 0);
- atomic_set(&t->rw_avail_ops, smb_direct_max_outstanding_rw_ops);
+ atomic_set(&t->rw_credits, t->max_rw_credits);

t->max_send_size = smb_direct_max_send_size;
t->max_recv_size = smb_direct_max_receive_size;
@@ -1716,12 +1760,10 @@ static int smb_direct_init_params(struct smb_direct_transport *t,

cap->max_send_wr = max_send_wrs;
cap->max_recv_wr = t->recv_credit_max;
- cap->max_send_sge = SMB_DIRECT_MAX_SEND_SGES;
+ cap->max_send_sge = max_sge_per_wr;
cap->max_recv_sge = SMB_DIRECT_MAX_RECV_SGES;
cap->max_inline_data = 0;
- cap->max_rdma_ctxs =
- rdma_rw_mr_factor(device, t->cm_id->port_num, max_pages) *
- smb_direct_max_outstanding_rw_ops;
+ cap->max_rdma_ctxs = t->max_rw_credits;
return 0;
}

@@ -1814,7 +1856,8 @@ static int smb_direct_create_qpair(struct smb_direct_transport *t,
}

t->send_cq = ib_alloc_cq(t->cm_id->device, t,
- t->send_credit_target, 0, IB_POLL_WORKQUEUE);
+ smb_direct_send_credit_target + cap->max_rdma_ctxs,
+ 0, IB_POLL_WORKQUEUE);
if (IS_ERR(t->send_cq)) {
pr_err("Can't create RDMA send CQ\n");
ret = PTR_ERR(t->send_cq);
@@ -1823,8 +1866,7 @@ static int smb_direct_create_qpair(struct smb_direct_transport *t,
}

t->recv_cq = ib_alloc_cq(t->cm_id->device, t,
- cap->max_send_wr + cap->max_rdma_ctxs,
- 0, IB_POLL_WORKQUEUE);
+ t->recv_credit_max, 0, IB_POLL_WORKQUEUE);
if (IS_ERR(t->recv_cq)) {
pr_err("Can't create RDMA recv CQ\n");
ret = PTR_ERR(t->recv_cq);
@@ -1853,17 +1895,12 @@ static int smb_direct_create_qpair(struct smb_direct_transport *t,

pages_per_rw = DIV_ROUND_UP(t->max_rdma_rw_size, PAGE_SIZE) + 1;
if (pages_per_rw > t->cm_id->device->attrs.max_sgl_rd) {
- int pages_per_mr, mr_count;
-
- pages_per_mr = min_t(int, pages_per_rw,
- t->cm_id->device->attrs.max_fast_reg_page_list_len);
- mr_count = DIV_ROUND_UP(pages_per_rw, pages_per_mr) *
- atomic_read(&t->rw_avail_ops);
- ret = ib_mr_pool_init(t->qp, &t->qp->rdma_mrs, mr_count,
- IB_MR_TYPE_MEM_REG, pages_per_mr, 0);
+ ret = ib_mr_pool_init(t->qp, &t->qp->rdma_mrs,
+ t->max_rw_credits, IB_MR_TYPE_MEM_REG,
+ t->pages_per_rw_credit, 0);
if (ret) {
pr_err("failed to init mr pool count %d pages %d\n",
- mr_count, pages_per_mr);
+ t->max_rw_credits, t->pages_per_rw_credit);
goto err;
}
}
diff --git a/fs/ksmbd/transport_rdma.h b/fs/ksmbd/transport_rdma.h
index 5567d93a6f96..77aee4e5c9dc 100644
--- a/fs/ksmbd/transport_rdma.h
+++ b/fs/ksmbd/transport_rdma.h
@@ -7,6 +7,10 @@
#ifndef __KSMBD_TRANSPORT_RDMA_H__
#define __KSMBD_TRANSPORT_RDMA_H__

+#define SMBD_DEFAULT_IOSIZE (8 * 1024 * 1024)
+#define SMBD_MIN_IOSIZE (512 * 1024)
+#define SMBD_MAX_IOSIZE (16 * 1024 * 1024)
+
/* SMB DIRECT negotiation request packet [MS-SMBD] 2.2.1 */
struct smb_direct_negotiate_req {
__le16 min_version;
@@ -52,10 +56,14 @@ struct smb_direct_data_transfer {
int ksmbd_rdma_init(void);
void ksmbd_rdma_destroy(void);
bool ksmbd_rdma_capable_netdev(struct net_device *netdev);
+void init_smbd_max_io_size(unsigned int sz);
+unsigned int get_smbd_max_read_write_size(void);
#else
static inline int ksmbd_rdma_init(void) { return 0; }
static inline int ksmbd_rdma_destroy(void) { return 0; }
static inline bool ksmbd_rdma_capable_netdev(struct net_device *netdev) { return false; }
+static inline void init_smbd_max_io_size(unsigned int sz) { }
+static inline unsigned int get_smbd_max_read_write_size(void) { return 0; }
#endif

#endif /* __KSMBD_TRANSPORT_RDMA_H__ */
diff --git a/fs/ksmbd/vfs.c b/fs/ksmbd/vfs.c
index 05efcdf7a4a7..201962f03772 100644
--- a/fs/ksmbd/vfs.c
+++ b/fs/ksmbd/vfs.c
@@ -1540,6 +1540,11 @@ int ksmbd_vfs_get_sd_xattr(struct ksmbd_conn *conn,
}

*pntsd = acl.sd_buf;
+ if (acl.sd_size < sizeof(struct smb_ntsd)) {
+ pr_err("sd size is invalid\n");
+ goto out_free;
+ }
+
(*pntsd)->osidoffset = cpu_to_le32(le32_to_cpu((*pntsd)->osidoffset) -
NDR_NTSD_OFFSETOF);
(*pntsd)->gsidoffset = cpu_to_le32(le32_to_cpu((*pntsd)->gsidoffset) -
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
index 176b468a61c7..e5adb524a445 100644
--- a/fs/lockd/svc4proc.c
+++ b/fs/lockd/svc4proc.c
@@ -32,6 +32,10 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
if (!nlmsvc_ops)
return nlm_lck_denied_nolocks;

+ if (lock->lock_start > OFFSET_MAX ||
+ (lock->lock_len && ((lock->lock_len - 1) > (OFFSET_MAX - lock->lock_start))))
+ return nlm4_fbig;
+
/* Obtain host handle */
if (!(host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len))
|| (argp->monitor && nsm_monitor(host) < 0))
@@ -50,6 +54,10 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
/* Set up the missing parts of the file_lock structure */
lock->fl.fl_file = file->f_file[mode];
lock->fl.fl_pid = current->tgid;
+ lock->fl.fl_start = (loff_t)lock->lock_start;
+ lock->fl.fl_end = lock->lock_len ?
+ (loff_t)(lock->lock_start + lock->lock_len - 1) :
+ OFFSET_MAX;
lock->fl.fl_lmops = &nlmsvc_lock_operations;
nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid);
if (!lock->fl.fl_owner) {
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
index 856267c0864b..712fdfeb8ef0 100644
--- a/fs/lockd/xdr4.c
+++ b/fs/lockd/xdr4.c
@@ -20,13 +20,6 @@

#include "svcxdr.h"

-static inline loff_t
-s64_to_loff_t(__s64 offset)
-{
- return (loff_t)offset;
-}
-
-
static inline s64
loff_t_to_s64(loff_t offset)
{
@@ -70,8 +63,6 @@ static bool
svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock)
{
struct file_lock *fl = &lock->fl;
- u64 len, start;
- s64 end;

if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len))
return false;
@@ -81,20 +72,14 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock)
return false;
if (xdr_stream_decode_u32(xdr, &lock->svid) < 0)
return false;
- if (xdr_stream_decode_u64(xdr, &start) < 0)
+ if (xdr_stream_decode_u64(xdr, &lock->lock_start) < 0)
return false;
- if (xdr_stream_decode_u64(xdr, &len) < 0)
+ if (xdr_stream_decode_u64(xdr, &lock->lock_len) < 0)
return false;

locks_init_lock(fl);
fl->fl_flags = FL_POSIX;
fl->fl_type = F_RDLCK;
- end = start + len - 1;
- fl->fl_start = s64_to_loff_t(start);
- if (len == 0 || end < 0)
- fl->fl_end = OFFSET_MAX;
- else
- fl->fl_end = s64_to_loff_t(end);

return true;
}
diff --git a/fs/mbcache.c b/fs/mbcache.c
index 97c54d3a2227..2010bc80a3f2 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -11,7 +11,7 @@
/*
* Mbcache is a simple key-value store. Keys need not be unique, however
* key-value pairs are expected to be unique (we use this fact in
- * mb_cache_entry_delete()).
+ * mb_cache_entry_delete_or_get()).
*
* Ext2 and ext4 use this cache for deduplication of extended attribute blocks.
* Ext4 also uses it for deduplication of xattr values stored in inodes.
@@ -125,6 +125,19 @@ void __mb_cache_entry_free(struct mb_cache_entry *entry)
}
EXPORT_SYMBOL(__mb_cache_entry_free);

+/*
+ * mb_cache_entry_wait_unused - wait to be the last user of the entry
+ *
+ * @entry - entry to work on
+ *
+ * Wait to be the last user of the entry.
+ */
+void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
+{
+ wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3);
+}
+EXPORT_SYMBOL(mb_cache_entry_wait_unused);
+
static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
struct mb_cache_entry *entry,
u32 key)
@@ -217,7 +230,7 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
}
EXPORT_SYMBOL(mb_cache_entry_get);

-/* mb_cache_entry_delete - remove a cache entry
+/* mb_cache_entry_delete - try to remove a cache entry
* @cache - cache we work with
* @key - key
* @value - value
@@ -254,6 +267,55 @@ void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
}
EXPORT_SYMBOL(mb_cache_entry_delete);

+/* mb_cache_entry_delete_or_get - remove a cache entry if it has no users
+ * @cache - cache we work with
+ * @key - key
+ * @value - value
+ *
+ * Remove entry from cache @cache with key @key and value @value. The removal
+ * happens only if the entry is unused. The function returns NULL in case the
+ * entry was successfully removed or there's no entry in cache. Otherwise the
+ * function grabs a reference to the entry that we failed to delete because it
+ * still has users and returns it.
+ */
+struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
+ u32 key, u64 value)
+{
+ struct hlist_bl_node *node;
+ struct hlist_bl_head *head;
+ struct mb_cache_entry *entry;
+
+ head = mb_cache_entry_head(cache, key);
+ hlist_bl_lock(head);
+ hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
+ if (entry->e_key == key && entry->e_value == value) {
+ if (atomic_read(&entry->e_refcnt) > 2) {
+ atomic_inc(&entry->e_refcnt);
+ hlist_bl_unlock(head);
+ return entry;
+ }
+ /* We keep hash list reference to keep entry alive */
+ hlist_bl_del_init(&entry->e_hash_list);
+ hlist_bl_unlock(head);
+ spin_lock(&cache->c_list_lock);
+ if (!list_empty(&entry->e_list)) {
+ list_del_init(&entry->e_list);
+ if (!WARN_ONCE(cache->c_entry_count == 0,
+ "mbcache: attempt to decrement c_entry_count past zero"))
+ cache->c_entry_count--;
+ atomic_dec(&entry->e_refcnt);
+ }
+ spin_unlock(&cache->c_list_lock);
+ mb_cache_entry_put(cache, entry);
+ return NULL;
+ }
+ }
+ hlist_bl_unlock(head);
+
+ return NULL;
+}
+EXPORT_SYMBOL(mb_cache_entry_delete_or_get);
+
/* mb_cache_entry_touch - cache entry got used
* @cache - cache the entry belongs to
* @entry - entry that got used
@@ -288,7 +350,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
while (nr_to_scan-- && !list_empty(&cache->c_list)) {
entry = list_first_entry(&cache->c_list,
struct mb_cache_entry, e_list);
- if (entry->e_referenced) {
+ if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
entry->e_referenced = 0;
list_move_tail(&entry->e_list, &cache->c_list);
continue;
@@ -302,6 +364,14 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
spin_unlock(&cache->c_list_lock);
head = mb_cache_entry_head(cache, entry->e_key);
hlist_bl_lock(head);
+ /* Now a reliable check if the entry didn't get used... */
+ if (atomic_read(&entry->e_refcnt) > 2) {
+ hlist_bl_unlock(head);
+ spin_lock(&cache->c_list_lock);
+ list_add_tail(&entry->e_list, &cache->c_list);
+ cache->c_entry_count++;
+ continue;
+ }
if (!hlist_bl_unhashed(&entry->e_hash_list)) {
hlist_bl_del_init(&entry->e_hash_list);
atomic_dec(&entry->e_refcnt);
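The new mb_cache_entry_delete_or_get()/mb_cache_entry_wait_unused() pair lets a filesystem make sure no other user still holds a cached entry before it frees or reuses the underlying block. A minimal caller sketch, assuming a hypothetical cache handle, hash key and block number (none of these names come from the patch):

	struct mb_cache_entry *oe;

	oe = mb_cache_entry_delete_or_get(ea_block_cache, hash, blocknr);
	if (oe) {
		/* The entry still has users: wait until we are the last one. */
		mb_cache_entry_wait_unused(oe);
		/* Drop the reference taken by mb_cache_entry_delete_or_get(). */
		mb_cache_entry_put(ea_block_cache, oe);
	}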
diff --git a/fs/namei.c b/fs/namei.c
index fd3c95ac261b..2fa412c5a082 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1511,6 +1511,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
* becoming unpinned.
*/
flags = dentry->d_flags;
+ if (read_seqretry(&mount_lock, nd->m_seq))
+ return false;
continue;
}
if (read_seqretry(&mount_lock, nd->m_seq))
@@ -3571,6 +3573,8 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
child = d_alloc(dentry, &slash_name);
if (unlikely(!child))
goto out_err;
+ if (!IS_POSIXACL(dir))
+ mode &= ~current_umask();
error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
if (error)
goto out_err;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 604be402ae13..7d285561e59f 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1131,6 +1131,8 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
case -EIO:
case -ETIMEDOUT:
case -EPIPE:
+ case -EPROTO:
+ case -ENODEV:
dprintk("%s DS connection error %d\n", __func__,
task->tk_status);
nfs4_delete_deviceid(devid->ld, devid->nfs_client,
@@ -1236,6 +1238,8 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
case -ENOBUFS:
case -EPIPE:
case -EPERM:
+ case -EPROTO:
+ case -ENODEV:
*op_status = status = NFS4ERR_NXIO;
break;
case -EACCES:
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 5601e47360c2..b49359afac88 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -108,7 +108,6 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
__set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);

- __set_bit(NFS_CS_NOPING, &cl_init.init_flags);
__set_bit(NFS_CS_DS, &cl_init.init_flags);

/* Use the MDS nfs_client cl_ipaddr. */
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 0326bdec5de7..a46260965cf1 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -183,12 +183,6 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval,
nf->nf_hashval = hashval;
refcount_set(&nf->nf_ref, 1);
nf->nf_may = may & NFSD_FILE_MAY_MASK;
- if (may & NFSD_MAY_NOT_BREAK_LEASE) {
- if (may & NFSD_MAY_WRITE)
- __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
- if (may & NFSD_MAY_READ)
- __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
- }
nf->nf_mark = NULL;
trace_nfsd_file_alloc(nf);
}
@@ -954,21 +948,7 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,

this_cpu_inc(nfsd_file_cache_hits);

- if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
- bool write = (may_flags & NFSD_MAY_WRITE);
-
- if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) ||
- (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) {
- status = nfserrno(nfsd_open_break_lease(
- file_inode(nf->nf_file), may_flags));
- if (status == nfs_ok) {
- clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
- if (write)
- clear_bit(NFSD_FILE_BREAK_WRITE,
- &nf->nf_flags);
- }
- }
- }
+ status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags));
out:
if (status == nfs_ok) {
*pnf = nf;
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 435ceab27897..63104be2865c 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -37,9 +37,7 @@ struct nfsd_file {
struct net *nf_net;
#define NFSD_FILE_HASHED (0)
#define NFSD_FILE_PENDING (1)
-#define NFSD_FILE_BREAK_READ (2)
-#define NFSD_FILE_BREAK_WRITE (3)
-#define NFSD_FILE_REFERENCED (4)
+#define NFSD_FILE_REFERENCED (2)
unsigned long nf_flags;
struct inode *nf_inode;
unsigned int nf_hashval;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 242fa123e0e9..61ee50843f5c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -692,18 +692,10 @@ DEFINE_CLID_EVENT(confirmed_r);
/*
* from fs/nfsd/filecache.h
*/
-TRACE_DEFINE_ENUM(NFSD_FILE_HASHED);
-TRACE_DEFINE_ENUM(NFSD_FILE_PENDING);
-TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_READ);
-TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_WRITE);
-TRACE_DEFINE_ENUM(NFSD_FILE_REFERENCED);
-
#define show_nf_flags(val) \
__print_flags(val, "|", \
{ 1 << NFSD_FILE_HASHED, "HASHED" }, \
{ 1 << NFSD_FILE_PENDING, "PENDING" }, \
- { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \
- { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \
{ 1 << NFSD_FILE_REFERENCED, "REFERENCED"})

DECLARE_EVENT_CLASS(nfsd_file_class,
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index ebde05c9cf62..dbb944b5f81e 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -259,7 +259,7 @@ static int ovl_encode_fh(struct inode *inode, u32 *fid, int *max_len,
return FILEID_INVALID;

dentry = d_find_any_alias(inode);
- if (WARN_ON(!dentry))
+ if (!dentry)
return FILEID_INVALID;

bytes = ovl_dentry_to_fid(ofs, dentry, fid, buflen);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index c1031843cc6a..2f0595e8ec4a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1885,7 +1885,7 @@ void proc_pid_evict_inode(struct proc_inode *ei)
put_pid(pid);
}

-struct inode *proc_pid_make_inode(struct super_block * sb,
+struct inode *proc_pid_make_inode(struct super_block *sb,
struct task_struct *task, umode_t mode)
{
struct inode * inode;
@@ -1914,11 +1914,6 @@ struct inode *proc_pid_make_inode(struct super_block * sb,

/* Let the pid remember us for quick removal */
ei->pid = pid;
- if (S_ISDIR(mode)) {
- spin_lock(&pid->lock);
- hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
- spin_unlock(&pid->lock);
- }

task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
security_task_to_inode(task, inode);
@@ -1931,6 +1926,39 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
return NULL;
}

+/*
+ * Generate an inode and add it to @pid->inodes, so that the task will
+ * invalidate the inode's dentry before it is released.
+ *
+ * This helper is used for creating dir-type entries under '/proc' and
+ * '/proc/<tgid>/task'. Other entries (e.g. fd, stat) under '/proc/<tgid>'
+ * can be released by invalidating the '/proc/<tgid>' dentry.
+ * In theory, dentries under '/proc/<tgid>/task' could also be released by
+ * invalidating the '/proc/<tgid>' dentry, but we keep this to handle the
+ * single-thread-exit case: any thread should invalidate its own
+ * '/proc/<tgid>/task/<pid>' dentry before it is released.
+ */
+static struct inode *proc_pid_make_base_inode(struct super_block *sb,
+ struct task_struct *task, umode_t mode)
+{
+ struct inode *inode;
+ struct proc_inode *ei;
+ struct pid *pid;
+
+ inode = proc_pid_make_inode(sb, task, mode);
+ if (!inode)
+ return NULL;
+
+ /* Let proc_flush_pid find this directory inode */
+ ei = PROC_I(inode);
+ pid = ei->pid;
+ spin_lock(&pid->lock);
+ hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+ spin_unlock(&pid->lock);
+
+ return inode;
+}
+
int pid_getattr(struct user_namespace *mnt_userns, const struct path *path,
struct kstat *stat, u32 request_mask, unsigned int query_flags)
{
@@ -3350,7 +3378,8 @@ static struct dentry *proc_pid_instantiate(struct dentry * dentry,
{
struct inode *inode;

- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task,
+ S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);

@@ -3649,7 +3678,8 @@ static struct dentry *proc_task_instantiate(struct dentry *dentry,
struct task_struct *task, const void *ptr)
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task,
+ S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);

diff --git a/fs/splice.c b/fs/splice.c
index 047b79db8eb5..93a2c9bf6249 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -814,17 +814,15 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
{
struct pipe_inode_info *pipe;
long ret, bytes;
- umode_t i_mode;
size_t len;
int i, flags, more;

/*
- * We require the input being a regular file, as we don't want to
- * randomly drop data for eg socket -> socket splicing. Use the
- * piped splicing for that!
+ * We require the input to be seekable, as we don't want to randomly
+ * drop data for e.g. socket -> socket splicing. Use the piped splicing
+ * for that!
*/
- i_mode = file_inode(in)->i_mode;
- if (unlikely(!S_ISREG(i_mode) && !S_ISBLK(i_mode)))
+ if (unlikely(!(in->f_mode & FMODE_LSEEK)))
return -EINVAL;

/*
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 15a4c7c07a3b..b68798a572fc 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -723,13 +723,12 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file_inode(iocb->ki_filp);
struct zonefs_inode_info *zi = ZONEFS_I(inode);
struct block_device *bdev = inode->i_sb->s_bdev;
- unsigned int max;
+ unsigned int max = bdev_max_zone_append_sectors(bdev);
struct bio *bio;
ssize_t size;
int nr_pages;
ssize_t ret;

- max = queue_max_zone_append_sectors(bdev_get_queue(bdev));
max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
iov_iter_truncate(from, max);

diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 181907349b49..a76f8c6b732d 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -17,7 +17,7 @@
#include <acpi/pcc.h>
#include <acpi/processor.h>

-/* Support CPPCv2 and CPPCv3 */
+/* CPPCv2 and CPPCv3 support */
#define CPPC_V2_REV 2
#define CPPC_V3_REV 3
#define CPPC_V2_NUM_ENT 21
diff --git a/include/crypto/internal/blake2s.h b/include/crypto/internal/blake2s.h
index 52363eee2b20..506d56530ca9 100644
--- a/include/crypto/internal/blake2s.h
+++ b/include/crypto/internal/blake2s.h
@@ -8,7 +8,6 @@
#define _CRYPTO_INTERNAL_BLAKE2S_H

#include <crypto/blake2s.h>
-#include <crypto/internal/hash.h>
#include <linux/string.h>

void blake2s_compress_generic(struct blake2s_state *state, const u8 *block,
@@ -19,111 +18,4 @@ void blake2s_compress(struct blake2s_state *state, const u8 *block,

bool blake2s_selftest(void);

-static inline void blake2s_set_lastblock(struct blake2s_state *state)
-{
- state->f[0] = -1;
-}
-
-/* Helper functions for BLAKE2s shared by the library and shash APIs */
-
-static __always_inline void
-__blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen,
- bool force_generic)
-{
- const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen;
-
- if (unlikely(!inlen))
- return;
- if (inlen > fill) {
- memcpy(state->buf + state->buflen, in, fill);
- if (force_generic)
- blake2s_compress_generic(state, state->buf, 1,
- BLAKE2S_BLOCK_SIZE);
- else
- blake2s_compress(state, state->buf, 1,
- BLAKE2S_BLOCK_SIZE);
- state->buflen = 0;
- in += fill;
- inlen -= fill;
- }
- if (inlen > BLAKE2S_BLOCK_SIZE) {
- const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE);
- /* Hash one less (full) block than strictly possible */
- if (force_generic)
- blake2s_compress_generic(state, in, nblocks - 1,
- BLAKE2S_BLOCK_SIZE);
- else
- blake2s_compress(state, in, nblocks - 1,
- BLAKE2S_BLOCK_SIZE);
- in += BLAKE2S_BLOCK_SIZE * (nblocks - 1);
- inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1);
- }
- memcpy(state->buf + state->buflen, in, inlen);
- state->buflen += inlen;
-}
-
-static __always_inline void
-__blake2s_final(struct blake2s_state *state, u8 *out, bool force_generic)
-{
- blake2s_set_lastblock(state);
- memset(state->buf + state->buflen, 0,
- BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */
- if (force_generic)
- blake2s_compress_generic(state, state->buf, 1, state->buflen);
- else
- blake2s_compress(state, state->buf, 1, state->buflen);
- cpu_to_le32_array(state->h, ARRAY_SIZE(state->h));
- memcpy(out, state->h, state->outlen);
-}
-
-/* Helper functions for shash implementations of BLAKE2s */
-
-struct blake2s_tfm_ctx {
- u8 key[BLAKE2S_KEY_SIZE];
- unsigned int keylen;
-};
-
-static inline int crypto_blake2s_setkey(struct crypto_shash *tfm,
- const u8 *key, unsigned int keylen)
-{
- struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(tfm);
-
- if (keylen == 0 || keylen > BLAKE2S_KEY_SIZE)
- return -EINVAL;
-
- memcpy(tctx->key, key, keylen);
- tctx->keylen = keylen;
-
- return 0;
-}
-
-static inline int crypto_blake2s_init(struct shash_desc *desc)
-{
- const struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
- struct blake2s_state *state = shash_desc_ctx(desc);
- unsigned int outlen = crypto_shash_digestsize(desc->tfm);
-
- __blake2s_init(state, outlen, tctx->key, tctx->keylen);
- return 0;
-}
-
-static inline int crypto_blake2s_update(struct shash_desc *desc,
- const u8 *in, unsigned int inlen,
- bool force_generic)
-{
- struct blake2s_state *state = shash_desc_ctx(desc);
-
- __blake2s_update(state, in, inlen, force_generic);
- return 0;
-}
-
-static inline int crypto_blake2s_final(struct shash_desc *desc, u8 *out,
- bool force_generic)
-{
- struct blake2s_state *state = shash_desc_ctx(desc);
-
- __blake2s_final(state, out, force_generic);
- return 0;
-}
-
#endif /* _CRYPTO_INTERNAL_BLAKE2S_H */
diff --git a/include/dt-bindings/clock/qcom,gcc-msm8939.h b/include/dt-bindings/clock/qcom,gcc-msm8939.h
index 0634467c4ce5..2d545ed0d35a 100644
--- a/include/dt-bindings/clock/qcom,gcc-msm8939.h
+++ b/include/dt-bindings/clock/qcom,gcc-msm8939.h
@@ -192,6 +192,7 @@
#define GCC_VENUS0_CORE0_VCODEC0_CLK 183
#define GCC_VENUS0_CORE1_VCODEC0_CLK 184
#define GCC_OXILI_TIMER_CLK 185
+#define SYSTEM_MM_NOC_BFDCD_CLK_SRC 186

/* Indexes for GDSCs */
#define BIMC_GDSC 0
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
index 1eb8ee5b0e5f..a5a122431563 100644
--- a/include/linux/acpi_viot.h
+++ b/include/linux/acpi_viot.h
@@ -6,9 +6,11 @@
#include <linux/acpi.h>

#ifdef CONFIG_ACPI_VIOT
+void __init acpi_viot_early_init(void);
void __init acpi_viot_init(void);
int viot_iommu_configure(struct device *dev);
#else
+static inline void acpi_viot_early_init(void) {}
static inline void acpi_viot_init(void) {}
static inline int viot_iommu_configure(struct device *dev)
{
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 108e3d114bfc..7927480b9cf7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -466,7 +466,6 @@ struct request_queue {
#endif /* CONFIG_BLK_DEV_ZONED */

int node;
- struct mutex debugfs_mutex;
#ifdef CONFIG_BLK_DEV_IO_TRACE
struct blk_trace __rcu *blk_trace;
#endif
@@ -510,11 +509,12 @@ struct request_queue {
struct bio_set bio_split;

struct dentry *debugfs_dir;
-
-#ifdef CONFIG_BLK_DEBUG_FS
struct dentry *sched_debugfs_dir;
struct dentry *rqos_debugfs_dir;
-#endif
+ /*
+ * Serializes all debugfs metadata operations using the above dentries.
+ */
+ struct mutex debugfs_mutex;

bool mq_sysfs_init_done;

@@ -1190,6 +1190,17 @@ static inline unsigned int queue_max_zone_append_sectors(const struct request_qu
return min(l->max_zone_append_sectors, l->max_sectors);
}

+static inline unsigned int
+bdev_max_zone_append_sectors(struct block_device *bdev)
+{
+ return queue_max_zone_append_sectors(bdev_get_queue(bdev));
+}
+
+static inline unsigned int bdev_max_segments(struct block_device *bdev)
+{
+ return queue_max_segments(bdev_get_queue(bdev));
+}
+
static inline unsigned queue_logical_block_size(const struct request_queue *q)
{
int retval = 512;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 83bd5598ec4d..a5f5172eb52c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@
#include <linux/slab.h>
#include <linux/percpu-refcount.h>
#include <linux/bpfptr.h>
+#include <linux/btf.h>

struct bpf_verifier_env;
struct bpf_verifier_log;
@@ -155,6 +156,41 @@ struct bpf_map_ops {
const struct bpf_iter_seq_info *iter_seq_info;
};

+enum {
+ /* Support at most 8 pointers in a BPF map value */
+ BPF_MAP_VALUE_OFF_MAX = 8,
+ BPF_MAP_OFF_ARR_MAX = BPF_MAP_VALUE_OFF_MAX +
+ 1 + /* for bpf_spin_lock */
+ 1, /* for bpf_timer */
+};
+
+enum bpf_kptr_type {
+ BPF_KPTR_UNREF,
+ BPF_KPTR_REF,
+};
+
+struct bpf_map_value_off_desc {
+ u32 offset;
+ enum bpf_kptr_type type;
+ struct {
+ struct btf *btf;
+ struct module *module;
+ btf_dtor_kfunc_t dtor;
+ u32 btf_id;
+ } kptr;
+};
+
+struct bpf_map_value_off {
+ u32 nr_off;
+ struct bpf_map_value_off_desc off[];
+};
+
+struct bpf_map_off_arr {
+ u32 cnt;
+ u32 field_off[BPF_MAP_OFF_ARR_MAX];
+ u8 field_sz[BPF_MAP_OFF_ARR_MAX];
+};
+
struct bpf_map {
/* The first two cachelines with read-mostly members of which some
* are also accessed in fast-path (e.g. ops, max_entries).
@@ -171,6 +207,7 @@ struct bpf_map {
u64 map_extra; /* any per-map-type extra fields */
u32 map_flags;
int spin_lock_off; /* >=0 valid offset, <0 error */
+ struct bpf_map_value_off *kptr_off_tab;
int timer_off; /* >=0 valid offset, <0 error */
u32 id;
int numa_node;
@@ -182,10 +219,7 @@ struct bpf_map {
struct mem_cgroup *memcg;
#endif
char name[BPF_OBJ_NAME_LEN];
- bool bypass_spec_v1;
- bool frozen; /* write-once; write-protected by freeze_mutex */
- /* 14 bytes hole */
-
+ struct bpf_map_off_arr *off_arr;
/* The 3rd and 4th cacheline with misc members to avoid false sharing
* particularly with refcounting.
*/
@@ -205,6 +239,8 @@ struct bpf_map {
bool jited;
bool xdp_has_frags;
} owner;
+ bool bypass_spec_v1;
+ bool frozen; /* write-once; write-protected by freeze_mutex */
};

static inline bool map_value_has_spin_lock(const struct bpf_map *map)
@@ -217,43 +253,44 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
return map->timer_off >= 0;
}

+static inline bool map_value_has_kptrs(const struct bpf_map *map)
+{
+ return !IS_ERR_OR_NULL(map->kptr_off_tab);
+}
+
static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
{
if (unlikely(map_value_has_spin_lock(map)))
memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
if (unlikely(map_value_has_timer(map)))
memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
+ if (unlikely(map_value_has_kptrs(map))) {
+ struct bpf_map_value_off *tab = map->kptr_off_tab;
+ int i;
+
+ for (i = 0; i < tab->nr_off; i++)
+ *(u64 *)(dst + tab->off[i].offset) = 0;
+ }
}

/* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
{
- u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
+ u32 curr_off = 0;
+ int i;

- if (unlikely(map_value_has_spin_lock(map))) {
- s_off = map->spin_lock_off;
- s_sz = sizeof(struct bpf_spin_lock);
- }
- if (unlikely(map_value_has_timer(map))) {
- t_off = map->timer_off;
- t_sz = sizeof(struct bpf_timer);
+ if (likely(!map->off_arr)) {
+ memcpy(dst, src, map->value_size);
+ return;
}

- if (unlikely(s_sz || t_sz)) {
- if (s_off < t_off || !s_sz) {
- swap(s_off, t_off);
- swap(s_sz, t_sz);
- }
- memcpy(dst, src, t_off);
- memcpy(dst + t_off + t_sz,
- src + t_off + t_sz,
- s_off - t_off - t_sz);
- memcpy(dst + s_off + s_sz,
- src + s_off + s_sz,
- map->value_size - s_off - s_sz);
- } else {
- memcpy(dst, src, map->value_size);
+ for (i = 0; i < map->off_arr->cnt; i++) {
+ u32 next_off = map->off_arr->field_off[i];
+
+ memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
+ curr_off += map->off_arr->field_sz[i];
}
+ memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off);
}
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
bool lock_src);
@@ -342,7 +379,10 @@ enum bpf_type_flag {
*/
MEM_PERCPU = BIT(4 + BPF_BASE_TYPE_BITS),

- __BPF_TYPE_LAST_FLAG = MEM_PERCPU,
+ /* Indicates that the argument will be released. */
+ OBJ_RELEASE = BIT(5 + BPF_BASE_TYPE_BITS),
+
+ __BPF_TYPE_LAST_FLAG = OBJ_RELEASE,
};

/* Max number of base types. */
@@ -391,6 +431,7 @@ enum bpf_arg_type {
ARG_PTR_TO_STACK, /* pointer to stack */
ARG_PTR_TO_CONST_STR, /* pointer to a null terminated read-only string */
ARG_PTR_TO_TIMER, /* pointer to bpf_timer */
+ ARG_PTR_TO_KPTR, /* pointer to referenced kptr */
__BPF_ARG_TYPE_MAX,

/* Extended arg_types. */
@@ -400,6 +441,7 @@ enum bpf_arg_type {
ARG_PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
ARG_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
ARG_PTR_TO_STACK_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
+ ARG_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,

/* This must be the last entry. Its purpose is to ensure the enum is
* wide enough to hold the higher bits reserved for bpf_type_flag.
@@ -674,11 +716,11 @@ struct btf_func_model {
/* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
* bytes on x86.
*/
-#define BPF_MAX_TRAMP_PROGS 38
+#define BPF_MAX_TRAMP_LINKS 38

-struct bpf_tramp_progs {
- struct bpf_prog *progs[BPF_MAX_TRAMP_PROGS];
- int nr_progs;
+struct bpf_tramp_links {
+ struct bpf_tramp_link *links[BPF_MAX_TRAMP_LINKS];
+ int nr_links;
};

/* Different use cases for BPF trampoline:
@@ -704,7 +746,7 @@ struct bpf_tramp_progs {
struct bpf_tramp_image;
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_progs *tprogs,
+ struct bpf_tramp_links *tlinks,
void *orig_call);
/* these two functions are called from generated trampoline */
u64 notrace __bpf_prog_enter(struct bpf_prog *prog);
@@ -803,9 +845,10 @@ static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func(
{
return bpf_func(ctx, insnsi);
}
+
#ifdef CONFIG_BPF_JIT
-int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
-int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
+int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
+int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
struct bpf_trampoline *bpf_trampoline_get(u64 key,
struct bpf_attach_target_info *tgt_info);
void bpf_trampoline_put(struct bpf_trampoline *tr);
@@ -856,12 +899,12 @@ int bpf_jit_charge_modmem(u32 size);
void bpf_jit_uncharge_modmem(u32 size);
bool bpf_prog_has_trampoline(const struct bpf_prog *prog);
#else
-static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,
+static inline int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr)
{
return -ENOTSUPP;
}
-static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog,
+static inline int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr)
{
return -ENOTSUPP;
@@ -959,8 +1002,6 @@ struct bpf_prog_aux {
bool sleepable;
bool tail_call_reachable;
bool xdp_has_frags;
- bool use_bpf_prog_pack;
- struct hlist_node tramp_hlist;
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
const struct btf_type *attach_func_proto;
/* function name for valid attach_btf_id */
@@ -1047,6 +1088,18 @@ struct bpf_link_ops {
struct bpf_link_info *info);
};

+struct bpf_tramp_link {
+ struct bpf_link link;
+ struct hlist_node tramp_hlist;
+};
+
+struct bpf_tracing_link {
+ struct bpf_tramp_link link;
+ enum bpf_attach_type attach_type;
+ struct bpf_trampoline *trampoline;
+ struct bpf_prog *tgt_prog;
+};
+
struct bpf_link_primer {
struct bpf_link *link;
struct file *file;
@@ -1084,8 +1137,8 @@ bool bpf_struct_ops_get(const void *kdata);
void bpf_struct_ops_put(const void *kdata);
int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
void *value);
-int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_progs *tprogs,
- struct bpf_prog *prog,
+int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_link *link,
const struct btf_func_model *model,
void *image, void *image_end);
static inline bool bpf_try_module_get(const void *data, struct module *owner)
@@ -1396,6 +1449,12 @@ void bpf_prog_put(struct bpf_prog *prog);
void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);

+struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset);
+void bpf_map_free_kptr_off_tab(struct bpf_map *map);
+struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
+bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
+void bpf_map_free_kptrs(struct bpf_map *map, void *map_value);
+
struct bpf_map *bpf_map_get(u32 ufd);
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
struct bpf_map *__bpf_map_get(struct fd f);
@@ -2169,6 +2228,7 @@ extern const struct bpf_func_proto bpf_find_vma_proto;
extern const struct bpf_func_proto bpf_loop_proto;
extern const struct bpf_func_proto bpf_strncmp_proto;
extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
+extern const struct bpf_func_proto bpf_kptr_xchg_proto;

const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 3e24ad0c4b3c..2b9112b80171 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -141,3 +141,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
#endif
BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
+BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 3a9d2d7cc6b7..1f1e7f2ea967 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno);
int check_func_arg_reg_off(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno,
- enum bpf_arg_type arg_type,
- bool is_release_func);
+ enum bpf_arg_type arg_type);
int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno);
int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 36bc09b8e890..f70625dd5bb4 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -40,6 +40,13 @@ struct btf_kfunc_id_set {
};
};

+struct btf_id_dtor_kfunc {
+ u32 btf_id;
+ u32 kfunc_btf_id;
+};
+
+typedef void (*btf_dtor_kfunc_t)(void *);
+
extern const struct file_operations btf_fops;

void btf_get(struct btf *btf);
@@ -123,6 +130,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
u32 expected_offset, u32 expected_size);
int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
int btf_find_timer(const struct btf *btf, const struct btf_type *t);
+struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
+ const struct btf_type *t);
bool btf_type_is_void(const struct btf_type *t);
s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
@@ -344,6 +353,9 @@ bool btf_kfunc_id_set_contains(const struct btf *btf,
enum btf_kfunc_type type, u32 kfunc_btf_id);
int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
const struct btf_kfunc_id_set *s);
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
+int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
+ struct module *owner);
#else
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
u32 type_id)
@@ -367,6 +379,15 @@ static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
{
return 0;
}
+static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+{
+ return -ENOENT;
+}
+static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors,
+ u32 add_cnt, struct module *owner)
+{
+ return 0;
+}
#endif

#endif
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index bcb4fe9b8575..8e4552460910 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -117,7 +117,6 @@ static __always_inline int test_clear_buffer_##name(struct buffer_head *bh) \
* of the form "mark_buffer_foo()". These are higher-level functions which
* do something in addition to setting a b_state bit.
*/
-BUFFER_FNS(Uptodate, uptodate)
BUFFER_FNS(Dirty, dirty)
TAS_BUFFER_FNS(Dirty, dirty)
BUFFER_FNS(Lock, locked)
@@ -135,6 +134,30 @@ BUFFER_FNS(Meta, meta)
BUFFER_FNS(Prio, prio)
BUFFER_FNS(Defer_Completion, defer_completion)

+static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
+{
+ /*
+ * make it consistent with folio_mark_uptodate
+ * pairs with smp_load_acquire in buffer_uptodate
+ */
+ smp_mb__before_atomic();
+ set_bit(BH_Uptodate, &bh->b_state);
+}
+
+static __always_inline void clear_buffer_uptodate(struct buffer_head *bh)
+{
+ clear_bit(BH_Uptodate, &bh->b_state);
+}
+
+static __always_inline int buffer_uptodate(const struct buffer_head *bh)
+{
+ /*
+ * make it consistent with folio_test_uptodate
+ * pairs with smp_mb__before_atomic in set_buffer_uptodate
+ */
+ return (smp_load_acquire(&bh->b_state) & (1UL << BH_Uptodate)) != 0;
+}
+
#define bh_offset(bh) ((unsigned long)(bh)->b_data & ~PAGE_MASK)

/* If we *know* page->private refers to buffer_heads */
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index fe29ac7cc469..4592d0845941 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -1071,4 +1071,22 @@ cpumap_print_list_to_buf(char *buf, const struct cpumask *mask,
[0] = 1UL \
} }

+/*
+ * Provide a valid theoretical max size for cpumap and cpulist sysfs files
+ * to avoid breaking userspace which may allocate a buffer based on the size
+ * reported by e.g. fstat.
+ *
+ * For cpumap, NR_CPUS * 9/32 - 1 should be an exact length.
+ *
+ * For cpulist, 7 is (ceil(log10(NR_CPUS)) + 1), allowing for NR_CPUS to be up
+ * to 2 orders of magnitude larger than 8192. And then we divide by 2 to
+ * cover a worst-case of every other cpu being on one of two nodes for a
+ * very large NR_CPUS.
+ *
+ * Use PAGE_SIZE as a minimum for smaller configurations.
+ */
+#define CPUMAP_FILE_MAX_BYTES ((((NR_CPUS * 9)/32 - 1) > PAGE_SIZE) \
+ ? (NR_CPUS * 9)/32 - 1 : PAGE_SIZE)
+#define CPULIST_FILE_MAX_BYTES (((NR_CPUS * 7)/2 > PAGE_SIZE) ? (NR_CPUS * 7)/2 : PAGE_SIZE)
+
#endif /* __LINUX_CPUMASK_H */
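As a worked example, assuming a 4 KiB PAGE_SIZE: with NR_CPUS = 8192 the cpumap form needs (8192 * 9)/32 - 1 = 2303 bytes, which is below PAGE_SIZE, so CPUMAP_FILE_MAX_BYTES stays at PAGE_SIZE; with NR_CPUS = 16384 it needs (16384 * 9)/32 - 1 = 4607 bytes, which exceeds PAGE_SIZE, so 4607 is used.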
diff --git a/include/linux/filter.h b/include/linux/filter.h
index ed0c0ff42ad5..8fd2e2f58eeb 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -948,6 +948,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
void bpf_jit_compile(struct bpf_prog *prog);
bool bpf_jit_needs_zext(void);
+bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_kfunc_call(void);
bool bpf_helper_changes_pkt_data(void *func);

@@ -1060,6 +1061,14 @@ u64 bpf_jit_alloc_exec_limit(void);
void *bpf_jit_alloc_exec(unsigned long size);
void bpf_jit_free_exec(void *addr);
void bpf_jit_free(struct bpf_prog *fp);
+struct bpf_binary_header *
+bpf_jit_binary_pack_hdr(const struct bpf_prog *fp);
+
+static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
+{
+ return list_empty(&fp->aux->ksym.lnode) ||
+ fp->aux->ksym.lnode.prev == LIST_POISON2;
+}

struct bpf_binary_header *
bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image,
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 39bb9b47fa9c..7037c6bed55c 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -218,6 +218,16 @@ static inline void clear_highpage(struct page *page)
kunmap_local(kaddr);
}

+static inline void clear_highpage_kasan_tagged(struct page *page)
+{
+ u8 tag;
+
+ tag = page_kasan_tag(page);
+ page_kasan_tag_reset(page);
+ clear_highpage(page);
+ page_kasan_tag_set(page, tag);
+}
+
#ifndef __HAVE_ARCH_TAG_CLEAR_HIGHPAGE

static inline void tag_clear_highpage(struct page *page)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ac2a1d758a80..fbf4a6509fb3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -167,7 +167,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
vm_flags_t vm_flags);
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
long freed);
-bool isolate_huge_page(struct page *page, struct list_head *list);
+int isolate_hugetlb(struct page *page, struct list_head *list);
int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
void putback_active_hugepage(struct page *page);
@@ -369,9 +369,9 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
return NULL;
}

-static inline bool isolate_huge_page(struct page *page, struct list_head *list)
+static inline int isolate_hugetlb(struct page *page, struct list_head *list)
{
- return false;
+ return -EBUSY;
}

static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
diff --git a/include/linux/iio/common/cros_ec_sensors_core.h b/include/linux/iio/common/cros_ec_sensors_core.h
index c582e1a14232..7b5dbd749995 100644
--- a/include/linux/iio/common/cros_ec_sensors_core.h
+++ b/include/linux/iio/common/cros_ec_sensors_core.h
@@ -95,8 +95,11 @@ int cros_ec_sensors_read_cmd(struct iio_dev *indio_dev, unsigned long scan_mask,
struct platform_device;
int cros_ec_sensors_core_init(struct platform_device *pdev,
struct iio_dev *indio_dev, bool physical_device,
- cros_ec_sensors_capture_t trigger_capture,
- cros_ec_sensorhub_push_data_cb_t push_data);
+ cros_ec_sensors_capture_t trigger_capture);
+
+int cros_ec_sensors_core_register(struct device *dev,
+ struct iio_dev *indio_dev,
+ cros_ec_sensorhub_push_data_cb_t push_data);

irqreturn_t cros_ec_sensors_capture(int irq, void *p);
int cros_ec_sensors_push_data(struct iio_dev *indio_dev,
diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
index faf00f2c0be6..c4ce02293f1f 100644
--- a/include/linux/iio/iio.h
+++ b/include/linux/iio/iio.h
@@ -9,6 +9,7 @@

#include <linux/device.h>
#include <linux/cdev.h>
+#include <linux/slab.h>
#include <linux/iio/types.h>
#include <linux/of.h>
/* IIO TODO LIST */
@@ -657,8 +658,13 @@ static inline void *iio_device_get_drvdata(const struct iio_dev *indio_dev)
return dev_get_drvdata(&indio_dev->dev);
}

-/* Can we make this smaller? */
-#define IIO_ALIGN L1_CACHE_BYTES
+/*
+ * Used to ensure the iio_priv() structure is aligned so that it can in
+ * turn contain IIO_DMA_MINALIGN'd elements such as buffers, which must not
+ * share cachelines with the rest of the structure and are therefore safe
+ * for use with non-coherent DMA.
+ */
+#define IIO_DMA_MINALIGN ARCH_KMALLOC_MINALIGN
struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv);

/* The information at the returned address is guaranteed to be cacheline aligned */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 8d573baaab29..f3e7680befcc 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -188,21 +188,48 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name,
void *buf, unsigned int size,
bool get_value);
void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name);
+void *kexec_image_load_default(struct kimage *image);

-/* Architectures may override the below functions */
-int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
- unsigned long buf_len);
-void *arch_kexec_kernel_image_load(struct kimage *image);
-int arch_kimage_file_post_load_cleanup(struct kimage *image);
-#ifdef CONFIG_KEXEC_SIG
-int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
- unsigned long buf_len);
+#ifndef arch_kexec_kernel_image_probe
+static inline int
+arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long buf_len)
+{
+ return kexec_image_probe_default(image, buf, buf_len);
+}
+#endif
+
+#ifndef arch_kimage_file_post_load_cleanup
+static inline int arch_kimage_file_post_load_cleanup(struct kimage *image)
+{
+ return kexec_image_post_load_cleanup_default(image);
+}
+#endif
+
+#ifndef arch_kexec_kernel_image_load
+static inline void *arch_kexec_kernel_image_load(struct kimage *image)
+{
+ return kexec_image_load_default(image);
+}
#endif
-int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);

extern int kexec_add_buffer(struct kexec_buf *kbuf);
int kexec_locate_mem_hole(struct kexec_buf *kbuf);

+#ifndef arch_kexec_locate_mem_hole
+/**
+ * arch_kexec_locate_mem_hole - Find free memory to place the segments.
+ * @kbuf: Parameters for the memory search.
+ *
+ * On success, kbuf->mem will have the start address of the memory region found.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
+{
+ return kexec_locate_mem_hole(kbuf);
+}
+#endif
+
/* Alignment required for elf header segment */
#define ELF_CORE_HEADER_ALIGN 4096

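With the __weak arch hooks replaced by #ifndef-guarded static inline fallbacks, an architecture overrides a hook by declaring its own function and defining the matching macro before this header is seen (typically in its asm/kexec.h), then implementing the function elsewhere. A hypothetical override sketch (the arch name and file paths are illustrative):

	/* arch/xyz/include/asm/kexec.h */
	struct kexec_buf;
	int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
	#define arch_kexec_locate_mem_hole arch_kexec_locate_mem_hole

	/* arch/xyz/kernel/machine_kexec_file.c */
	int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
	{
		/* Apply arch-specific placement constraints, then fall back. */
		return kexec_locate_mem_hole(kbuf);
	}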
diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h
index 86249476b57f..0b35a41440ff 100644
--- a/include/linux/kfifo.h
+++ b/include/linux/kfifo.h
@@ -688,7 +688,7 @@ __kfifo_uint_must_check_helper( \
* writer, you don't need extra locking to use these macro.
*/
#define kfifo_to_user(fifo, to, len, copied) \
-__kfifo_uint_must_check_helper( \
+__kfifo_int_must_check_helper( \
({ \
typeof((fifo) + 1) __tmp = (fifo); \
void __user *__to = (to); \
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index ac1ebb37a0ff..f328a01db4fe 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -19,6 +19,7 @@ struct kvm_memslots;
enum kvm_mr_change;

#include <linux/bits.h>
+#include <linux/mutex.h>
#include <linux/types.h>
#include <linux/spinlock_types.h>

@@ -69,6 +70,7 @@ struct gfn_to_pfn_cache {
struct kvm_vcpu *vcpu;
struct list_head list;
rwlock_t lock;
+ struct mutex refresh_lock;
void *khva;
kvm_pfn_t pfn;
enum pfn_cache_usage usage;
diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h
index 398f70093cd3..67e4a2c5500b 100644
--- a/include/linux/lockd/xdr.h
+++ b/include/linux/lockd/xdr.h
@@ -41,6 +41,8 @@ struct nlm_lock {
struct nfs_fh fh;
struct xdr_netobj oh;
u32 svid;
+ u64 lock_start;
+ u64 lock_len;
struct file_lock fl;
};

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 467b94257105..0a9afe84ea44 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -192,7 +192,7 @@ static inline void
lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass, u8 inner, u8 outer)
{
- lockdep_init_map_type(lock, name, key, subclass, inner, LD_WAIT_INV, LD_LOCK_NORMAL);
+ lockdep_init_map_type(lock, name, key, subclass, inner, outer, LD_LOCK_NORMAL);
}

static inline void
@@ -215,24 +215,28 @@ static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
* or they are too narrow (they suffer from a false class-split):
*/
#define lockdep_set_class(lock, key) \
- lockdep_init_map_waits(&(lock)->dep_map, #key, key, 0, \
- (lock)->dep_map.wait_type_inner, \
- (lock)->dep_map.wait_type_outer)
+ lockdep_init_map_type(&(lock)->dep_map, #key, key, 0, \
+ (lock)->dep_map.wait_type_inner, \
+ (lock)->dep_map.wait_type_outer, \
+ (lock)->dep_map.lock_type)

#define lockdep_set_class_and_name(lock, key, name) \
- lockdep_init_map_waits(&(lock)->dep_map, name, key, 0, \
- (lock)->dep_map.wait_type_inner, \
- (lock)->dep_map.wait_type_outer)
+ lockdep_init_map_type(&(lock)->dep_map, name, key, 0, \
+ (lock)->dep_map.wait_type_inner, \
+ (lock)->dep_map.wait_type_outer, \
+ (lock)->dep_map.lock_type)

#define lockdep_set_class_and_subclass(lock, key, sub) \
- lockdep_init_map_waits(&(lock)->dep_map, #key, key, sub,\
- (lock)->dep_map.wait_type_inner, \
- (lock)->dep_map.wait_type_outer)
+ lockdep_init_map_type(&(lock)->dep_map, #key, key, sub, \
+ (lock)->dep_map.wait_type_inner, \
+ (lock)->dep_map.wait_type_outer, \
+ (lock)->dep_map.lock_type)

#define lockdep_set_subclass(lock, sub) \
- lockdep_init_map_waits(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
- (lock)->dep_map.wait_type_inner, \
- (lock)->dep_map.wait_type_outer)
+ lockdep_init_map_type(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+ (lock)->dep_map.wait_type_inner, \
+ (lock)->dep_map.wait_type_outer, \
+ (lock)->dep_map.lock_type)

#define lockdep_set_novalidate_class(lock) \
lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock)
diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h
index 20f1e3ff6013..8eca7f25c432 100644
--- a/include/linux/mbcache.h
+++ b/include/linux/mbcache.h
@@ -30,15 +30,23 @@ void mb_cache_destroy(struct mb_cache *cache);
int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
u64 value, bool reusable);
void __mb_cache_entry_free(struct mb_cache_entry *entry);
+void mb_cache_entry_wait_unused(struct mb_cache_entry *entry);
static inline int mb_cache_entry_put(struct mb_cache *cache,
struct mb_cache_entry *entry)
{
- if (!atomic_dec_and_test(&entry->e_refcnt))
+ unsigned int cnt = atomic_dec_return(&entry->e_refcnt);
+
+ if (cnt > 0) {
+ if (cnt <= 3)
+ wake_up_var(&entry->e_refcnt);
return 0;
+ }
__mb_cache_entry_free(entry);
return 1;
}

+struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
+ u32 key, u64 value);
void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value);
struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
u64 value);
diff --git a/include/linux/mfd/t7l66xb.h b/include/linux/mfd/t7l66xb.h
index 69632c1b07bd..ae3e7a5c5219 100644
--- a/include/linux/mfd/t7l66xb.h
+++ b/include/linux/mfd/t7l66xb.h
@@ -12,7 +12,6 @@

struct t7l66xb_platform_data {
int (*enable)(struct platform_device *dev);
- int (*disable)(struct platform_device *dev);
int (*suspend)(struct platform_device *dev);
int (*resume)(struct platform_device *dev);

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 9424503eb8d3..3d1594bad4ec 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -445,6 +445,11 @@ struct mlx5_qp_table {
struct radix_tree_root tree;
};

+enum {
+ MLX5_PF_NOTIFY_DISABLE_VF,
+ MLX5_PF_NOTIFY_ENABLE_VF,
+};
+
struct mlx5_vf_context {
int enabled;
u64 port_guid;
@@ -455,6 +460,7 @@ struct mlx5_vf_context {
u8 port_guid_valid:1;
u8 node_guid_valid:1;
enum port_state_policy policy;
+ struct blocking_notifier_head notifier;
};

struct mlx5_core_sriov {
@@ -1155,6 +1161,12 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type
struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev);
void mlx5_vf_put_core_dev(struct mlx5_core_dev *mdev);

+int mlx5_sriov_blocking_notifier_register(struct mlx5_core_dev *mdev,
+ int vf_id,
+ struct notifier_block *nb);
+void mlx5_sriov_blocking_notifier_unregister(struct mlx5_core_dev *mdev,
+ int vf_id,
+ struct notifier_block *nb);
#ifdef CONFIG_MLX5_CORE_IPOIB
struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
struct ib_device *ibdev,
diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h
index 5358a5facdee..fa092b9be2fd 100644
--- a/include/linux/nvme-fc-driver.h
+++ b/include/linux/nvme-fc-driver.h
@@ -564,6 +564,15 @@ int nvme_fc_rcv_ls_req(struct nvme_fc_remote_port *remoteport,
void *lsreqbuf, u32 lsreqbuf_len);


+/*
+ * Routine called by the LLDD to get the appid field associated with a request
+ *
+ * If the return value is NULL: the user/libvirt has not set an appid for the VM
+ * If the return value is non-NULL: the appid associated with the VM
+ *
+ * @req: IO request from nvme-fc to the driver
+ */
+char *nvme_fc_io_getuuid(struct nvmefc_fcp_req *req);

/*
* *************** LLDD FC-NVME Target/Subsystem API ***************
@@ -1048,5 +1057,10 @@ int nvmet_fc_rcv_fcp_req(struct nvmet_fc_target_port *tgtport,

void nvmet_fc_rcv_fcp_abort(struct nvmet_fc_target_port *tgtport,
struct nvmefc_tgt_fcp_req *fcpreq);
+/*
+ * Add a define, visible to the compiler, that indicates support
+ * for the feature. This allows for conditional compilation in LLDDs.
+ */
+#define NVME_FC_FEAT_UUID 0x0001

#endif /* _NVME_FC_DRIVER_H */
diff --git a/include/linux/once_lite.h b/include/linux/once_lite.h
index 861e606b820f..b7bce4983638 100644
--- a/include/linux/once_lite.h
+++ b/include/linux/once_lite.h
@@ -9,15 +9,27 @@
*/
#define DO_ONCE_LITE(func, ...) \
DO_ONCE_LITE_IF(true, func, ##__VA_ARGS__)
-#define DO_ONCE_LITE_IF(condition, func, ...) \
+
+#define __ONCE_LITE_IF(condition) \
({ \
static bool __section(".data.once") __already_done; \
- bool __ret_do_once = !!(condition); \
+ bool __ret_cond = !!(condition); \
+ bool __ret_once = false; \
\
- if (unlikely(__ret_do_once && !__already_done)) { \
+ if (unlikely(__ret_cond && !__already_done)) { \
__already_done = true; \
- func(__VA_ARGS__); \
+ __ret_once = true; \
} \
+ unlikely(__ret_once); \
+ })
+
+#define DO_ONCE_LITE_IF(condition, func, ...) \
+ ({ \
+ bool __ret_do_once = !!(condition); \
+ \
+ if (__ONCE_LITE_IF(__ret_do_once)) \
+ func(__VA_ARGS__); \
+ \
unlikely(__ret_do_once); \
})

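DO_ONCE_LITE_IF() keeps its existing interface; the refactor only splits the one-shot test out into __ONCE_LITE_IF() so it can guard things other than a single function call. A minimal usage sketch (the function and message are illustrative):

	#include <linux/once_lite.h>
	#include <linux/printk.h>

	static void handle_io_error(int err)
	{
		/* pr_warn() runs only the first time the condition evaluates true. */
		DO_ONCE_LITE_IF(err == -EIO, pr_warn, "first EIO observed: %d\n", err);
	}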
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index af97dd427501..a411080d5169 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1063,6 +1063,22 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->txn = 0;
}

+/*
+ * Clear all bitfields in the perf_branch_entry.
+ * The to and from fields are not cleared because they are
+ * systematically modified by the caller.
+ */
+static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *br)
+{
+ br->mispred = 0;
+ br->predicted = 0;
+ br->in_tx = 0;
+ br->abort = 0;
+ br->cycles = 0;
+ br->type = 0;
+ br->reserved = 0;
+}
+
extern void perf_output_sample(struct perf_output_handle *handle,
struct perf_event_header *header,
struct perf_sample_data *data,
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index cb0fd633a610..4ea496924106 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -229,6 +229,15 @@ static inline bool pipe_buf_try_steal(struct pipe_inode_info *pipe,
return buf->ops->try_steal(pipe, buf);
}

+static inline void pipe_discard_from(struct pipe_inode_info *pipe,
+ unsigned int old_head)
+{
+ unsigned int mask = pipe->ring_size - 1;
+
+ while (pipe->head > old_head)
+ pipe_buf_release(pipe, &pipe->bufs[--pipe->head & mask]);
+}
+
/* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
memory allocation, whereas PIPE_BUF makes atomicity guarantees. */
#define PIPE_SIZE PAGE_SIZE
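pipe_discard_from() releases every buffer queued at or after a remembered head index, which gives a producer a simple way to roll back a partially completed multi-buffer insert. A minimal sketch, assuming a hypothetical queue_buffers() helper that appends buffers and advances pipe->head:

	unsigned int old_head = pipe->head;	/* snapshot before queuing */
	int err;

	err = queue_buffers(pipe);		/* hypothetical, may fail part-way */
	if (err)
		pipe_discard_from(pipe, old_head);	/* drop everything queued since */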
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 17230c458341..a0c4a870bb48 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -220,8 +220,8 @@ struct page_vma_mapped_walk {
#define DEFINE_PAGE_VMA_WALK(name, _page, _vma, _address, _flags) \
struct page_vma_mapped_walk name = { \
.pfn = page_to_pfn(_page), \
- .nr_pages = compound_nr(page), \
- .pgoff = page_to_pgoff(page), \
+ .nr_pages = compound_nr(_page), \
+ .pgoff = page_to_pgoff(_page), \
.vma = _vma, \
.address = _address, \
.flags = _flags, \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a8911b1f35aa..d438e39fffe5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1812,7 +1812,7 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
}

extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed);
+extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
#ifdef CONFIG_SMP
extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index e5af028c08b4..994c25640e15 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -39,20 +39,12 @@ static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *p)
}
extern void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task);
extern void rt_mutex_adjust_pi(struct task_struct *p);
-static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
-{
- return tsk->pi_blocked_on != NULL;
-}
#else
static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
{
return NULL;
}
# define rt_mutex_adjust_pi(p) do { } while (0)
-static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
-{
- return false;
-}
#endif

extern void normalize_rt_tasks(void);
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 56cffe42abbc..816df6cc444e 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -81,6 +81,7 @@ struct sched_domain_shared {
atomic_t ref;
atomic_t nr_busy_cpus;
int has_idle_cores;
+ int nr_idle_scan;
};

struct sched_domain {
diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h
index 76ce3f3ac0f2..bf6f0decb3f6 100644
--- a/include/linux/soundwire/sdw.h
+++ b/include/linux/soundwire/sdw.h
@@ -646,9 +646,6 @@ struct sdw_slave_ops {
* @dev_num: Current Device Number, values can be 0 or dev_num_sticky
* @dev_num_sticky: one-time static Device Number assigned by Bus
* @probed: boolean tracking driver state
- * @probe_complete: completion utility to control potential races
- * on startup between driver probe/initialization and SoundWire
- * Slave state changes/implementation-defined interrupts
* @enumeration_complete: completion utility to control potential races
* on startup between device enumeration and read/write access to the
* Slave device
@@ -663,6 +660,7 @@ struct sdw_slave_ops {
* for a Slave happens for the first time after enumeration
* @is_mockup_device: status flag used to squelch errors in the command/control
* protocol for SoundWire mockup devices
+ * @sdw_dev_lock: mutex used to protect callbacks/remove races
*/
struct sdw_slave {
struct sdw_slave_id id;
@@ -680,12 +678,12 @@ struct sdw_slave {
u16 dev_num;
u16 dev_num_sticky;
bool probed;
- struct completion probe_complete;
struct completion enumeration_complete;
struct completion initialization_complete;
u32 unattach_request;
bool first_interrupt_done;
bool is_mockup_device;
+ struct mutex sdw_dev_lock; /* protect callbacks/remove races */
};

#define dev_to_sdw_dev(_dev) container_of(_dev, struct sdw_slave, dev)
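The new sdw_dev_lock above is meant to be held around driver callbacks so they cannot race with driver removal; schematically (simplified, from memory rather than from this patch):

    mutex_lock(&slave->sdw_dev_lock);
    if (slave->probed && slave->dev.driver) {
        struct sdw_driver *drv = drv_to_sdw_driver(slave->dev.driver);

        if (drv->ops && drv->ops->update_status)
            drv->ops->update_status(slave, status);
    }
    mutex_unlock(&slave->sdw_dev_lock);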
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d356ab4047f7..f99f0bd85724 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -216,8 +216,10 @@ extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
spinlock_t *ptl);
extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
unsigned long address);
-extern void migration_entry_wait_huge(struct vm_area_struct *vma,
- struct mm_struct *mm, pte_t *pte);
+#ifdef CONFIG_HUGETLB_PAGE
+extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl);
+extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
+#endif
#else
static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
{
@@ -238,8 +240,10 @@ static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
spinlock_t *ptl) { }
static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
unsigned long address) { }
-static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
- struct mm_struct *mm, pte_t *pte) { }
+#ifdef CONFIG_HUGETLB_PAGE
+static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { }
+static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
+#endif
static inline int is_writable_migration_entry(swp_entry_t entry)
{
return 0;
diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
index 739ba9a03ec1..20c0ff54b7a0 100644
--- a/include/linux/tpm_eventlog.h
+++ b/include/linux/tpm_eventlog.h
@@ -157,7 +157,7 @@ struct tcg_algorithm_info {
* Return: size of the event on success, 0 on failure
*/

-static inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
+static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
struct tcg_pcr_event *event_header,
bool do_mapping)
{
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index e6e95a9f07a5..b18759a673c6 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -916,6 +916,24 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,

#endif

+#define TRACE_EVENT_STR_MAX 512
+
+/*
+ * gcc warns that a va_list cannot be used in an inlined
+ * function, so this is done as a macro instead. :-/
+ */
+#define __trace_event_vstr_len(fmt, va) \
+({ \
+ va_list __ap; \
+ int __ret; \
+ \
+ va_copy(__ap, *(va)); \
+ __ret = vsnprintf(NULL, 0, fmt, __ap) + 1; \
+ va_end(__ap); \
+ \
+ min(__ret, TRACE_EVENT_STR_MAX); \
+})
+
#endif /* _LINUX_TRACE_EVENT_H */

/*
diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
index 2c1fc9212cf2..98d1921f02b1 100644
--- a/include/linux/usb/hcd.h
+++ b/include/linux/usb/hcd.h
@@ -66,6 +66,7 @@

struct giveback_urb_bh {
bool running;
+ bool high_prio;
spinlock_t lock;
struct list_head head;
struct tasklet_struct bh;
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 851e07da2583..58cfbf81447c 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -544,10 +544,11 @@ do { \
\
hrtimer_init_sleeper_on_stack(&__t, CLOCK_MONOTONIC, \
HRTIMER_MODE_REL); \
- if ((timeout) != KTIME_MAX) \
- hrtimer_start_range_ns(&__t.timer, timeout, \
- current->timer_slack_ns, \
- HRTIMER_MODE_REL); \
+ if ((timeout) != KTIME_MAX) { \
+ hrtimer_set_expires_range_ns(&__t.timer, timeout, \
+ current->timer_slack_ns); \
+ hrtimer_sleeper_start_expires(&__t, HRTIMER_MODE_REL); \
+ } \
\
__ret = ___wait_event(wq_head, condition, state, 0, 0, \
if (!__t.task) { \
diff --git a/include/media/hevc-ctrls.h b/include/media/hevc-ctrls.h
index 01ccda48d8c5..88e804578cb1 100644
--- a/include/media/hevc-ctrls.h
+++ b/include/media/hevc-ctrls.h
@@ -135,7 +135,7 @@ struct v4l2_hevc_dpb_entry {
__u64 timestamp;
__u8 flags;
__u8 field_pic;
- __u16 pic_order_cnt[2];
+ __s32 pic_order_cnt_val;
__u8 padding[2];
};

@@ -178,7 +178,7 @@ struct v4l2_ctrl_hevc_slice_params {
/* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
__u8 slice_type;
__u8 colour_plane_id;
- __u16 slice_pic_order_cnt;
+ __s32 slice_pic_order_cnt;
__u8 num_ref_idx_l0_active_minus1;
__u8 num_ref_idx_l1_active_minus1;
__u8 collocated_ref_idx;
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index ec1d1706f43c..cb78e0e33332 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -76,7 +76,7 @@ enum p9_req_status_t {
struct p9_req_t {
int status;
int t_err;
- struct kref refcount;
+ refcount_t refcount;
wait_queue_head_t wq;
struct p9_fcall tc;
struct p9_fcall rc;
@@ -227,15 +227,15 @@ struct p9_req_t *p9_tag_lookup(struct p9_client *c, u16 tag);

static inline void p9_req_get(struct p9_req_t *r)
{
- kref_get(&r->refcount);
+ refcount_inc(&r->refcount);
}

static inline int p9_req_try_get(struct p9_req_t *r)
{
- return kref_get_unless_zero(&r->refcount);
+ return refcount_inc_not_zero(&r->refcount);
}

-int p9_req_put(struct p9_req_t *r);
+int p9_req_put(struct p9_client *c, struct p9_req_t *r);

void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);

diff --git a/include/net/ax25.h b/include/net/ax25.h
index a427a05672e2..f8cf3629a419 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -236,6 +236,7 @@ typedef struct ax25_cb {
ax25_address source_addr, dest_addr;
ax25_digi *digipeat;
ax25_dev *ax25_dev;
+ netdevice_tracker dev_tracker;
unsigned char iamdigi;
unsigned char state, modulus, pidincl;
unsigned short vs, vr, va;
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 62a9bb022aed..5f2342205ae1 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -361,6 +361,7 @@ enum {
HCI_QUALITY_REPORT,
HCI_OFFLOAD_CODECS_ENABLED,
HCI_LE_SIMULTANEOUS_ROLES,
+ HCI_CMD_DRAIN_WORKQUEUE,

__HCI_NUM_FLAGS,
};
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 81b965953036..56f1286583d3 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -103,15 +103,24 @@ struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
const int dif);

int inet6_hash(struct sock *sk);
-#endif /* IS_ENABLED(CONFIG_IPV6) */

-#define INET6_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif, __sdif) \
- (((__sk)->sk_portpair == (__ports)) && \
- ((__sk)->sk_family == AF_INET6) && \
- ipv6_addr_equal(&(__sk)->sk_v6_daddr, (__saddr)) && \
- ipv6_addr_equal(&(__sk)->sk_v6_rcv_saddr, (__daddr)) && \
- (((__sk)->sk_bound_dev_if == (__dif)) || \
- ((__sk)->sk_bound_dev_if == (__sdif))) && \
- net_eq(sock_net(__sk), (__net)))
+static inline bool inet6_match(struct net *net, const struct sock *sk,
+ const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ const __portpair ports,
+ const int dif, const int sdif)
+{
+ if (!net_eq(sock_net(sk), net) ||
+ sk->sk_family != AF_INET6 ||
+ sk->sk_portpair != ports ||
+ !ipv6_addr_equal(&sk->sk_v6_daddr, saddr) ||
+ !ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr))
+ return false;
+
+ /* READ_ONCE() paired with WRITE_ONCE() in sock_bindtoindex_locked() */
+ return inet_sk_bound_dev_eq(net, READ_ONCE(sk->sk_bound_dev_if), dif,
+ sdif);
+}
+#endif /* IS_ENABLED(CONFIG_IPV6) */

#endif /* _INET6_HASHTABLES_H */
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 749bb1e46087..53c22b64e972 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -203,17 +203,6 @@ static inline void inet_ehash_locks_free(struct inet_hashinfo *hashinfo)
hashinfo->ehash_locks = NULL;
}

-static inline bool inet_sk_bound_dev_eq(struct net *net, int bound_dev_if,
- int dif, int sdif)
-{
-#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
- return inet_bound_dev_eq(!!READ_ONCE(net->ipv4.sysctl_tcp_l3mdev_accept),
- bound_dev_if, dif, sdif);
-#else
- return inet_bound_dev_eq(true, bound_dev_if, dif, sdif);
-#endif
-}
-
struct inet_bind_bucket *
inet_bind_bucket_create(struct kmem_cache *cachep, struct net *net,
struct inet_bind_hashbucket *head,
@@ -295,7 +284,6 @@ static inline struct sock *inet_lookup_listener(struct net *net,
((__force __portpair)(((__u32)(__dport) << 16) | (__force __u32)(__be16)(__sport)))
#endif

-#if (BITS_PER_LONG == 64)
#ifdef __BIG_ENDIAN
#define INET_ADDR_COOKIE(__name, __saddr, __daddr) \
const __addrpair __name = (__force __addrpair) ( \
@@ -307,24 +295,20 @@ static inline struct sock *inet_lookup_listener(struct net *net,
(((__force __u64)(__be32)(__daddr)) << 32) | \
((__force __u64)(__be32)(__saddr)))
#endif /* __BIG_ENDIAN */
-#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif, __sdif) \
- (((__sk)->sk_portpair == (__ports)) && \
- ((__sk)->sk_addrpair == (__cookie)) && \
- (((__sk)->sk_bound_dev_if == (__dif)) || \
- ((__sk)->sk_bound_dev_if == (__sdif))) && \
- net_eq(sock_net(__sk), (__net)))
-#else /* 32-bit arch */
-#define INET_ADDR_COOKIE(__name, __saddr, __daddr) \
- const int __name __deprecated __attribute__((unused))
-
-#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif, __sdif) \
- (((__sk)->sk_portpair == (__ports)) && \
- ((__sk)->sk_daddr == (__saddr)) && \
- ((__sk)->sk_rcv_saddr == (__daddr)) && \
- (((__sk)->sk_bound_dev_if == (__dif)) || \
- ((__sk)->sk_bound_dev_if == (__sdif))) && \
- net_eq(sock_net(__sk), (__net)))
-#endif /* 64-bit arch */
+
+static inline bool INET_MATCH(struct net *net, const struct sock *sk,
+ const __addrpair cookie, const __portpair ports,
+ int dif, int sdif)
+{
+ if (!net_eq(sock_net(sk), net) ||
+ sk->sk_portpair != ports ||
+ sk->sk_addrpair != cookie)
+ return false;
+
+ /* READ_ONCE() paired with WRITE_ONCE() in sock_bindtoindex_locked() */
+ return inet_sk_bound_dev_eq(net, READ_ONCE(sk->sk_bound_dev_if), dif,
+ sdif);
+}

/* Sockets in TCP_CLOSE state are _always_ taken out of the hash, so we need
* not check it for lookups anymore, thanks Alexey. -DaveM
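With INET_MATCH() now a real function taking net/sk/cookie/ports/dif/sdif, an established-hash lookup reads roughly like the sketch below (loop details simplified, refcounting elided):

    INET_ADDR_COOKIE(acookie, saddr, daddr);
    const __portpair ports = INET_COMBINED_PORTS(sport, hnum);
    struct sock *sk;
    const struct hlist_nulls_node *node;

    sk_nulls_for_each_rcu(sk, node, &head->chain) {
        if (sk->sk_hash != hash)
            continue;
        if (likely(INET_MATCH(net, sk, acookie, ports, dif, sdif)))
            return sk;
    }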
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 6395f6b9a5d2..bf5654ce711e 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -149,6 +149,17 @@ static inline bool inet_bound_dev_eq(bool l3mdev_accept, int bound_dev_if,
return bound_dev_if == dif || bound_dev_if == sdif;
}

+static inline bool inet_sk_bound_dev_eq(struct net *net, int bound_dev_if,
+ int dif, int sdif)
+{
+#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
+ return inet_bound_dev_eq(!!READ_ONCE(net->ipv4.sysctl_tcp_l3mdev_accept),
+ bound_dev_if, dif, sdif);
+#else
+ return inet_bound_dev_eq(true, bound_dev_if, dif, sdif);
+#endif
+}
+
struct inet_cork {
unsigned int flags;
__be32 addr;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 44a35531952e..3372a1f67cf4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -173,11 +173,28 @@ struct tc_taprio_qopt_offload {
struct tc_taprio_sched_entry entries[];
};

+#if IS_ENABLED(CONFIG_NET_SCH_TAPRIO)
+
/* Reference counting */
struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload
*offload);
void taprio_offload_free(struct tc_taprio_qopt_offload *offload);

+#else
+
+/* Reference counting */
+static inline struct tc_taprio_qopt_offload *
+taprio_offload_get(struct tc_taprio_qopt_offload *offload)
+{
+ return NULL;
+}
+
+static inline void taprio_offload_free(struct tc_taprio_qopt_offload *offload)
+{
+}
+
+#endif
+
/* Ensure skb_mstamp_ns, which might have been populated with the txtime, is
* not mistaken for a software timestamp, because this will otherwise prevent
* the dispatch of hardware timestamps to the socket.
diff --git a/include/net/raw.h b/include/net/raw.h
index c51a635671a7..537d9d1df890 100644
--- a/include/net/raw.h
+++ b/include/net/raw.h
@@ -20,9 +20,8 @@
extern struct proto raw_prot;

extern struct raw_hashinfo raw_v4_hashinfo;
-struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
- unsigned short num, __be32 raddr,
- __be32 laddr, int dif, int sdif);
+bool raw_v4_match(struct net *net, struct sock *sk, unsigned short num,
+ __be32 raddr, __be32 laddr, int dif, int sdif);

int raw_abort(struct sock *sk, int err);
void raw_icmp_error(struct sk_buff *, int, u32);
@@ -34,9 +33,18 @@ int raw_rcv(struct sock *, struct sk_buff *);

struct raw_hashinfo {
rwlock_t lock;
- struct hlist_head ht[RAW_HTABLE_SIZE];
+ struct hlist_nulls_head ht[RAW_HTABLE_SIZE];
};

+static inline void raw_hashinfo_init(struct raw_hashinfo *hashinfo)
+{
+ int i;
+
+ rwlock_init(&hashinfo->lock);
+ for (i = 0; i < RAW_HTABLE_SIZE; i++)
+ INIT_HLIST_NULLS_HEAD(&hashinfo->ht[i], i);
+}
+
#ifdef CONFIG_PROC_FS
int raw_proc_init(void);
void raw_proc_exit(void);
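raw_v4_match() turns the old lookup helper into a predicate, so the delivery path can walk the (now nulls-based) hash chain itself; roughly, with the delivery step left hypothetical:

    struct sock *sk;
    struct hlist_nulls_node *hnode;

    sk_nulls_for_each(sk, hnode, hlist) {
        if (raw_v4_match(net, sk, iph->protocol,
                         iph->saddr, iph->daddr, dif, sdif))
            raw_deliver(sk, skb);       /* hypothetical delivery step */
    }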
diff --git a/include/net/rawv6.h b/include/net/rawv6.h
index 53d86b6055e8..bc70909625f6 100644
--- a/include/net/rawv6.h
+++ b/include/net/rawv6.h
@@ -3,11 +3,12 @@
#define _NET_RAWV6_H

#include <net/protocol.h>
+#include <net/raw.h>

extern struct raw_hashinfo raw_v6_hashinfo;
-struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
- unsigned short num, const struct in6_addr *loc_addr,
- const struct in6_addr *rmt_addr, int dif, int sdif);
+bool raw_v6_match(struct net *net, struct sock *sk, unsigned short num,
+ const struct in6_addr *loc_addr,
+ const struct in6_addr *rmt_addr, int dif, int sdif);

int raw_abort(struct sock *sk, int err);

diff --git a/include/net/sock.h b/include/net/sock.h
index 9563a093fdfc..810cb96b7f39 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -161,9 +161,6 @@ typedef __u64 __bitwise __addrpair;
* for struct sock and struct inet_timewait_sock.
*/
struct sock_common {
- /* skc_daddr and skc_rcv_saddr must be grouped on a 8 bytes aligned
- * address on 64bit arches : cf INET_MATCH()
- */
union {
__addrpair skc_addrpair;
struct {
@@ -1557,19 +1554,23 @@ static inline bool sk_has_account(struct sock *sk)

static inline bool sk_wmem_schedule(struct sock *sk, int size)
{
+ int delta;
+
if (!sk_has_account(sk))
return true;
- return size <= sk->sk_forward_alloc ||
- __sk_mem_schedule(sk, size, SK_MEM_SEND);
+ delta = size - sk->sk_forward_alloc;
+ return delta <= 0 || __sk_mem_schedule(sk, delta, SK_MEM_SEND);
}

static inline bool
sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size)
{
+ int delta;
+
if (!sk_has_account(sk))
return true;
- return size <= sk->sk_forward_alloc ||
- __sk_mem_schedule(sk, size, SK_MEM_RECV) ||
+ delta = size - sk->sk_forward_alloc;
+ return delta <= 0 || __sk_mem_schedule(sk, delta, SK_MEM_RECV) ||
skb_pfmemalloc(skb);
}

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 4aa031849668..0774ce97c2f1 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -95,6 +95,13 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
xp_free(xskb);
}

+static inline void xsk_buff_discard(struct xdp_buff *xdp)
+{
+ struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
+
+ xp_release(xskb);
+}
+
static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
{
xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM;
@@ -238,6 +245,10 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
{
}

+static inline void xsk_buff_discard(struct xdp_buff *xdp)
+{
+}
+
static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
{
}
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index c0703cd20a99..9758a4a9923f 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -411,7 +411,7 @@ extern int iscsi_host_add(struct Scsi_Host *shost, struct device *pdev);
extern struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
int dd_data_size,
bool xmit_can_sleep);
-extern void iscsi_host_remove(struct Scsi_Host *shost);
+extern void iscsi_host_remove(struct Scsi_Host *shost, bool is_shutdown);
extern void iscsi_host_free(struct Scsi_Host *shost);
extern int iscsi_target_alloc(struct scsi_target *starget);
extern int iscsi_host_get_max_scsi_cmds(struct Scsi_Host *shost,
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index 9acb8422f680..d6eab7cb221a 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -442,6 +442,7 @@ extern struct iscsi_cls_session *iscsi_create_session(struct Scsi_Host *shost,
struct iscsi_transport *t,
int dd_size,
unsigned int target_id);
+extern void iscsi_force_destroy_session(struct iscsi_cls_session *session);
extern void iscsi_remove_session(struct iscsi_cls_session *session);
extern void iscsi_free_session(struct iscsi_cls_session *session);
extern struct iscsi_cls_conn *iscsi_alloc_conn(struct iscsi_cls_session *sess,
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 9b4e6c78d0f4..c90a9a2f77a9 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -568,6 +568,7 @@ struct ocelot_ops {
int (*psfp_stats_get)(struct ocelot *ocelot, struct flow_cls_offload *f,
struct flow_stats *stats);
void (*cut_through_fwd)(struct ocelot *ocelot);
+ void (*tas_clock_adjust)(struct ocelot *ocelot);
};

struct ocelot_vcap_policer {
@@ -652,29 +653,32 @@ struct ocelot_port {

struct regmap *target;

- bool vlan_aware;
+ struct net_device *bond;
+ struct net_device *bridge;
+
/* VLAN that untagged frames are classified to, on ingress */
const struct ocelot_bridge_vlan *pvid_vlan;

+ struct tc_taprio_qopt_offload *taprio;
+
+ phy_interface_t phy_mode;
+
unsigned int ptp_skbs_in_flight;
- u8 ptp_cmd;
struct sk_buff_head tx_skbs;
- u8 ts_id;

- phy_interface_t phy_mode;
+ u16 mrp_ring_id;

- u8 *xmit_template;
+ u8 ptp_cmd;
+ u8 ts_id;
+
+ u8 stp_state;
+ bool vlan_aware;
bool is_dsa_8021q_cpu;
bool learn_ena;

- struct net_device *bond;
bool lag_tx_active;

- u16 mrp_ring_id;
-
- struct net_device *bridge;
int bridge_num;
- u8 stp_state;

int speed;
};
@@ -743,6 +747,9 @@ struct ocelot {
/* Lock for serializing forwarding domain changes */
struct mutex fwd_domain_lock;

+ /* Lock for serializing Time-Aware Shaper changes */
+ struct mutex tas_lock;
+
struct workqueue_struct *owq;

u8 ptp:1;
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 1779e133cea0..d91370377523 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -15,10 +15,6 @@ TRACE_DEFINE_ENUM(NODE);
TRACE_DEFINE_ENUM(DATA);
TRACE_DEFINE_ENUM(META);
TRACE_DEFINE_ENUM(META_FLUSH);
-TRACE_DEFINE_ENUM(INMEM);
-TRACE_DEFINE_ENUM(INMEM_DROP);
-TRACE_DEFINE_ENUM(INMEM_INVALIDATE);
-TRACE_DEFINE_ENUM(INMEM_REVOKE);
TRACE_DEFINE_ENUM(IPU);
TRACE_DEFINE_ENUM(OPU);
TRACE_DEFINE_ENUM(HOT);
@@ -59,10 +55,6 @@ TRACE_DEFINE_ENUM(CP_RESIZE);
{ DATA, "DATA" }, \
{ META, "META" }, \
{ META_FLUSH, "META_FLUSH" }, \
- { INMEM, "INMEM" }, \
- { INMEM_DROP, "INMEM_DROP" }, \
- { INMEM_INVALIDATE, "INMEM_INVALIDATE" }, \
- { INMEM_REVOKE, "INMEM_REVOKE" }, \
{ IPU, "IN-PLACE" }, \
{ OPU, "OUT-OF-PLACE" })

@@ -1289,20 +1281,6 @@ DEFINE_EVENT(f2fs__page, f2fs_vm_page_mkwrite,
TP_ARGS(page, type)
);

-DEFINE_EVENT(f2fs__page, f2fs_register_inmem_page,
-
- TP_PROTO(struct page *page, int type),
-
- TP_ARGS(page, type)
-);
-
-DEFINE_EVENT(f2fs__page, f2fs_commit_inmem_page,
-
- TP_PROTO(struct page *page, int type),
-
- TP_ARGS(page, type)
-);
-
TRACE_EVENT(f2fs_filemap_fault,

TP_PROTO(struct inode *inode, pgoff_t index, unsigned long ret),
diff --git a/include/trace/events/spmi.h b/include/trace/events/spmi.h
index 8b60efe18ba6..a6819fd85cdf 100644
--- a/include/trace/events/spmi.h
+++ b/include/trace/events/spmi.h
@@ -21,15 +21,15 @@ TRACE_EVENT(spmi_write_begin,
__field ( u8, sid )
__field ( u16, addr )
__field ( u8, len )
- __dynamic_array ( u8, buf, len + 1 )
+ __dynamic_array ( u8, buf, len )
),

TP_fast_assign(
__entry->opcode = opcode;
__entry->sid = sid;
__entry->addr = addr;
- __entry->len = len + 1;
- memcpy(__get_dynamic_array(buf), buf, len + 1);
+ __entry->len = len;
+ memcpy(__get_dynamic_array(buf), buf, len);
),

TP_printk("opc=%d sid=%02d addr=0x%04x len=%d buf=0x[%*phD]",
@@ -92,7 +92,7 @@ TRACE_EVENT(spmi_read_end,
__field ( u16, addr )
__field ( int, ret )
__field ( u8, len )
- __dynamic_array ( u8, buf, len + 1 )
+ __dynamic_array ( u8, buf, len )
),

TP_fast_assign(
@@ -100,8 +100,8 @@ TRACE_EVENT(spmi_read_end,
__entry->sid = sid;
__entry->addr = addr;
__entry->ret = ret;
- __entry->len = len + 1;
- memcpy(__get_dynamic_array(buf), buf, len + 1);
+ __entry->len = len;
+ memcpy(__get_dynamic_array(buf), buf, len);
),

TP_printk("opc=%d sid=%02d addr=0x%04x ret=%d len=%02d buf=0x[%*phD]",
diff --git a/include/trace/stages/stage1_struct_define.h b/include/trace/stages/stage1_struct_define.h
index a16783419687..1b7bab60434c 100644
--- a/include/trace/stages/stage1_struct_define.h
+++ b/include/trace/stages/stage1_struct_define.h
@@ -26,6 +26,9 @@
#undef __string_len
#define __string_len(item, src, len) __dynamic_array(char, item, -1)

+#undef __vstring
+#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+
#undef __bitmask
#define __bitmask(item, nr_bits) __dynamic_array(char, item, -1)

diff --git a/include/trace/stages/stage2_data_offsets.h b/include/trace/stages/stage2_data_offsets.h
index 42fd1e8813ec..1b7a8f764fdd 100644
--- a/include/trace/stages/stage2_data_offsets.h
+++ b/include/trace/stages/stage2_data_offsets.h
@@ -32,6 +32,9 @@
#undef __string_len
#define __string_len(item, src, len) __dynamic_array(char, item, -1)

+#undef __vstring
+#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+
#undef __bitmask
#define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)

diff --git a/include/trace/stages/stage4_event_fields.h b/include/trace/stages/stage4_event_fields.h
index e80cdc397a43..80d34f396555 100644
--- a/include/trace/stages/stage4_event_fields.h
+++ b/include/trace/stages/stage4_event_fields.h
@@ -2,16 +2,18 @@

/* Stage 4 definitions for creating trace events */

+#define ALIGN_STRUCTFIELD(type) ((int)(offsetof(struct {char a; type b;}, b)))
+
#undef __field_ext
#define __field_ext(_type, _item, _filter_type) { \
.type = #_type, .name = #_item, \
- .size = sizeof(_type), .align = __alignof__(_type), \
+ .size = sizeof(_type), .align = ALIGN_STRUCTFIELD(_type), \
.is_signed = is_signed_type(_type), .filter_type = _filter_type },

#undef __field_struct_ext
#define __field_struct_ext(_type, _item, _filter_type) { \
.type = #_type, .name = #_item, \
- .size = sizeof(_type), .align = __alignof__(_type), \
+ .size = sizeof(_type), .align = ALIGN_STRUCTFIELD(_type), \
0, .filter_type = _filter_type },

#undef __field
@@ -23,7 +25,7 @@
#undef __array
#define __array(_type, _item, _len) { \
.type = #_type"["__stringify(_len)"]", .name = #_item, \
- .size = sizeof(_type[_len]), .align = __alignof__(_type), \
+ .size = sizeof(_type[_len]), .align = ALIGN_STRUCTFIELD(_type), \
.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __dynamic_array
@@ -38,6 +40,9 @@
#undef __string_len
#define __string_len(item, src, len) __dynamic_array(char, item, -1)

+#undef __vstring
+#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+
#undef __bitmask
#define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)
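ALIGN_STRUCTFIELD() above reports the alignment a type actually receives as a struct member, which on some 32-bit ABIs differs from __alignof__(); for example, assuming 32-bit x86:

    __alignof__(u64)                          /* 8 */
    offsetof(struct { char a; u64 b; }, b)    /* 4: the ABI aligns u64 struct members to 4 bytes */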

diff --git a/include/trace/stages/stage5_get_offsets.h b/include/trace/stages/stage5_get_offsets.h
index 7ee5931300e6..fba4c24ed9e6 100644
--- a/include/trace/stages/stage5_get_offsets.h
+++ b/include/trace/stages/stage5_get_offsets.h
@@ -39,6 +39,10 @@
#undef __string_len
#define __string_len(item, src, len) __dynamic_array(char, item, (len) + 1)

+#undef __vstring
+#define __vstring(item, fmt, ap) __dynamic_array(char, item, \
+ __trace_event_vstr_len(fmt, ap))
+
#undef __rel_dynamic_array
#define __rel_dynamic_array(type, item, len) \
__item_length = (len) * sizeof(type); \
diff --git a/include/trace/stages/stage6_event_callback.h b/include/trace/stages/stage6_event_callback.h
index e1724f73594b..3c554a585320 100644
--- a/include/trace/stages/stage6_event_callback.h
+++ b/include/trace/stages/stage6_event_callback.h
@@ -24,6 +24,9 @@
#undef __string_len
#define __string_len(item, src, len) __dynamic_array(char, item, -1)

+#undef __vstring
+#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+
#undef __assign_str
#define __assign_str(dst, src) \
strcpy(__get_str(dst), (src) ? (const char *)(src) : "(null)");
@@ -35,6 +38,15 @@
__get_str(dst)[len] = '\0'; \
} while(0)

+#undef __assign_vstr
+#define __assign_vstr(dst, fmt, va) \
+ do { \
+ va_list __cp_va; \
+ va_copy(__cp_va, *(va)); \
+ vsnprintf(__get_str(dst), TRACE_EVENT_STR_MAX, fmt, __cp_va); \
+ va_end(__cp_va); \
+ } while (0)
+
#undef __bitmask
#define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d14b10b85e51..ff9af73859ca 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1013,6 +1013,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
BPF_LINK_TYPE_KPROBE_MULTI = 8,
+ BPF_LINK_TYPE_STRUCT_OPS = 9,

MAX_BPF_LINK_TYPE,
};
@@ -5143,6 +5144,17 @@ union bpf_attr {
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed.
+ *
+ * void *bpf_kptr_xchg(void *map_value, void *ptr)
+ * Description
+ * Exchange kptr at pointer *map_value* with *ptr*, and return the
+ * old value. *ptr* can be NULL, otherwise it must be a referenced
+ * pointer which will be released when this helper is called.
+ * Return
+ * The old value of kptr (which can be NULL). The returned pointer
+ * if not NULL, is a reference which must be released using its
+ * corresponding release function, or moved into a BPF map before
+ * program exit.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5339,6 +5351,7 @@ union bpf_attr {
FN(copy_from_user_task), \
FN(skb_set_tstamp), \
FN(ima_file_hash), \
+ FN(kptr_xchg), \
/* */

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
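From program context the helper documented above moves a referenced pointer in or out of a map value; a heavily simplified sketch, where the kptr-annotated field and the release kfunc are hypothetical:

    struct map_value {
        struct foo __kptr_ref *ptr;       /* hypothetical kptr field */
    };

    old = bpf_kptr_xchg(&v->ptr, p);      /* v: map value, p: acquired ref or NULL */
    if (old)
        foo_release(old);                 /* hypothetical matching release kfunc */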
diff --git a/include/uapi/linux/can/error.h b/include/uapi/linux/can/error.h
index 34633283de64..a1000cb63063 100644
--- a/include/uapi/linux/can/error.h
+++ b/include/uapi/linux/can/error.h
@@ -120,6 +120,9 @@
#define CAN_ERR_TRX_CANL_SHORT_TO_GND 0x70 /* 0111 0000 */
#define CAN_ERR_TRX_CANL_SHORT_TO_CANH 0x80 /* 1000 0000 */

-/* controller specific additional information / data[5..7] */
+/* data[5] is reserved (do not use) */
+
+/* TX error counter / data[6] */
+/* RX error counter / data[7] */

#endif /* _UAPI_CAN_ERROR_H */
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index 352a822d4370..3121d127d5aa 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -13,7 +13,7 @@
#define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)
#define F2FS_IOC_START_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 3)
#define F2FS_IOC_RELEASE_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 4)
-#define F2FS_IOC_ABORT_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 5)
+#define F2FS_IOC_ABORT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 5)
#define F2FS_IOC_GARBAGE_COLLECT _IOW(F2FS_IOCTL_MAGIC, 6, __u32)
#define F2FS_IOC_WRITE_CHECKPOINT _IO(F2FS_IOCTL_MAGIC, 7)
#define F2FS_IOC_DEFRAGMENT _IOWR(F2FS_IOCTL_MAGIC, 8, \
diff --git a/include/uapi/linux/netfilter/xt_IDLETIMER.h b/include/uapi/linux/netfilter/xt_IDLETIMER.h
index 49ddcdc61c09..7bfb31a66fc9 100644
--- a/include/uapi/linux/netfilter/xt_IDLETIMER.h
+++ b/include/uapi/linux/netfilter/xt_IDLETIMER.h
@@ -1,6 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
/*
- * linux/include/linux/netfilter/xt_IDLETIMER.h
- *
* Header file for Xtables timer target module.
*
* Copyright (C) 2004, 2010 Nokia Corporation
@@ -10,20 +9,6 @@
* by Luciano Coelho <luciano.coelho@xxxxxxxxx>
*
* Contact: Luciano Coelho <luciano.coelho@xxxxxxxxx>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
- * 02110-1301 USA
*/

#ifndef _XT_IDLETIMER_H
diff --git a/init/main.c b/init/main.c
index 80bd217dd35e..1e866b31bf19 100644
--- a/init/main.c
+++ b/init/main.c
@@ -99,6 +99,7 @@
#include <linux/kcsan.h>
#include <linux/init_syscalls.h>
#include <linux/stackdepot.h>
+#include <linux/randomize_kstack.h>
#include <net/net_namespace.h>

#include <asm/io.h>
diff --git a/io_uring/Makefile b/io_uring/Makefile
new file mode 100644
index 000000000000..3680425df947
--- /dev/null
+++ b/io_uring/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for io_uring
+
+obj-$(CONFIG_IO_URING) += io_uring.o
+obj-$(CONFIG_IO_WQ) += io-wq.o
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
new file mode 100644
index 000000000000..32aeb2c581c5
--- /dev/null
+++ b/io_uring/io-wq.c
@@ -0,0 +1,1424 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Basic worker thread pool for io_uring
+ *
+ * Copyright (C) 2019 Jens Axboe
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/sched/signal.h>
+#include <linux/percpu.h>
+#include <linux/slab.h>
+#include <linux/rculist_nulls.h>
+#include <linux/cpu.h>
+#include <linux/task_work.h>
+#include <linux/audit.h>
+#include <uapi/linux/io_uring.h>
+
+#include "io-wq.h"
+
+#define WORKER_IDLE_TIMEOUT (5 * HZ)
+
+enum {
+ IO_WORKER_F_UP = 1, /* up and active */
+ IO_WORKER_F_RUNNING = 2, /* account as running */
+ IO_WORKER_F_FREE = 4, /* worker on free list */
+ IO_WORKER_F_BOUND = 8, /* is doing bounded work */
+};
+
+enum {
+ IO_WQ_BIT_EXIT = 0, /* wq exiting */
+};
+
+enum {
+ IO_ACCT_STALLED_BIT = 0, /* stalled on hash */
+};
+
+/*
+ * One for each thread in a wqe pool
+ */
+struct io_worker {
+ refcount_t ref;
+ unsigned flags;
+ struct hlist_nulls_node nulls_node;
+ struct list_head all_list;
+ struct task_struct *task;
+ struct io_wqe *wqe;
+
+ struct io_wq_work *cur_work;
+ struct io_wq_work *next_work;
+ raw_spinlock_t lock;
+
+ struct completion ref_done;
+
+ unsigned long create_state;
+ struct callback_head create_work;
+ int create_index;
+
+ union {
+ struct rcu_head rcu;
+ struct work_struct work;
+ };
+};
+
+#if BITS_PER_LONG == 64
+#define IO_WQ_HASH_ORDER 6
+#else
+#define IO_WQ_HASH_ORDER 5
+#endif
+
+#define IO_WQ_NR_HASH_BUCKETS (1u << IO_WQ_HASH_ORDER)
+
+struct io_wqe_acct {
+ unsigned nr_workers;
+ unsigned max_workers;
+ int index;
+ atomic_t nr_running;
+ raw_spinlock_t lock;
+ struct io_wq_work_list work_list;
+ unsigned long flags;
+};
+
+enum {
+ IO_WQ_ACCT_BOUND,
+ IO_WQ_ACCT_UNBOUND,
+ IO_WQ_ACCT_NR,
+};
+
+/*
+ * Per-node worker thread pool
+ */
+struct io_wqe {
+ raw_spinlock_t lock;
+ struct io_wqe_acct acct[IO_WQ_ACCT_NR];
+
+ int node;
+
+ struct hlist_nulls_head free_list;
+ struct list_head all_list;
+
+ struct wait_queue_entry wait;
+
+ struct io_wq *wq;
+ struct io_wq_work *hash_tail[IO_WQ_NR_HASH_BUCKETS];
+
+ cpumask_var_t cpu_mask;
+};
+
+/*
+ * Per io_wq state
+ */
+struct io_wq {
+ unsigned long state;
+
+ free_work_fn *free_work;
+ io_wq_work_fn *do_work;
+
+ struct io_wq_hash *hash;
+
+ atomic_t worker_refs;
+ struct completion worker_done;
+
+ struct hlist_node cpuhp_node;
+
+ struct task_struct *task;
+
+ struct io_wqe *wqes[];
+};
+
+static enum cpuhp_state io_wq_online;
+
+struct io_cb_cancel_data {
+ work_cancel_fn *fn;
+ void *data;
+ int nr_running;
+ int nr_pending;
+ bool cancel_all;
+};
+
+static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index);
+static void io_wqe_dec_running(struct io_worker *worker);
+static bool io_acct_cancel_pending_work(struct io_wqe *wqe,
+ struct io_wqe_acct *acct,
+ struct io_cb_cancel_data *match);
+static void create_worker_cb(struct callback_head *cb);
+static void io_wq_cancel_tw_create(struct io_wq *wq);
+
+static bool io_worker_get(struct io_worker *worker)
+{
+ return refcount_inc_not_zero(&worker->ref);
+}
+
+static void io_worker_release(struct io_worker *worker)
+{
+ if (refcount_dec_and_test(&worker->ref))
+ complete(&worker->ref_done);
+}
+
+static inline struct io_wqe_acct *io_get_acct(struct io_wqe *wqe, bool bound)
+{
+ return &wqe->acct[bound ? IO_WQ_ACCT_BOUND : IO_WQ_ACCT_UNBOUND];
+}
+
+static inline struct io_wqe_acct *io_work_get_acct(struct io_wqe *wqe,
+ struct io_wq_work *work)
+{
+ return io_get_acct(wqe, !(work->flags & IO_WQ_WORK_UNBOUND));
+}
+
+static inline struct io_wqe_acct *io_wqe_get_acct(struct io_worker *worker)
+{
+ return io_get_acct(worker->wqe, worker->flags & IO_WORKER_F_BOUND);
+}
+
+static void io_worker_ref_put(struct io_wq *wq)
+{
+ if (atomic_dec_and_test(&wq->worker_refs))
+ complete(&wq->worker_done);
+}
+
+static void io_worker_cancel_cb(struct io_worker *worker)
+{
+ struct io_wqe_acct *acct = io_wqe_get_acct(worker);
+ struct io_wqe *wqe = worker->wqe;
+ struct io_wq *wq = wqe->wq;
+
+ atomic_dec(&acct->nr_running);
+ raw_spin_lock(&worker->wqe->lock);
+ acct->nr_workers--;
+ raw_spin_unlock(&worker->wqe->lock);
+ io_worker_ref_put(wq);
+ clear_bit_unlock(0, &worker->create_state);
+ io_worker_release(worker);
+}
+
+static bool io_task_worker_match(struct callback_head *cb, void *data)
+{
+ struct io_worker *worker;
+
+ if (cb->func != create_worker_cb)
+ return false;
+ worker = container_of(cb, struct io_worker, create_work);
+ return worker == data;
+}
+
+static void io_worker_exit(struct io_worker *worker)
+{
+ struct io_wqe *wqe = worker->wqe;
+ struct io_wq *wq = wqe->wq;
+
+ while (1) {
+ struct callback_head *cb = task_work_cancel_match(wq->task,
+ io_task_worker_match, worker);
+
+ if (!cb)
+ break;
+ io_worker_cancel_cb(worker);
+ }
+
+ io_worker_release(worker);
+ wait_for_completion(&worker->ref_done);
+
+ raw_spin_lock(&wqe->lock);
+ if (worker->flags & IO_WORKER_F_FREE)
+ hlist_nulls_del_rcu(&worker->nulls_node);
+ list_del_rcu(&worker->all_list);
+ raw_spin_unlock(&wqe->lock);
+ io_wqe_dec_running(worker);
+ worker->flags = 0;
+ preempt_disable();
+ current->flags &= ~PF_IO_WORKER;
+ preempt_enable();
+
+ kfree_rcu(worker, rcu);
+ io_worker_ref_put(wqe->wq);
+ do_exit(0);
+}
+
+static inline bool io_acct_run_queue(struct io_wqe_acct *acct)
+{
+ bool ret = false;
+
+ raw_spin_lock(&acct->lock);
+ if (!wq_list_empty(&acct->work_list) &&
+ !test_bit(IO_ACCT_STALLED_BIT, &acct->flags))
+ ret = true;
+ raw_spin_unlock(&acct->lock);
+
+ return ret;
+}
+
+/*
+ * Check head of free list for an available worker. If one isn't available,
+ * caller must create one.
+ */
+static bool io_wqe_activate_free_worker(struct io_wqe *wqe,
+ struct io_wqe_acct *acct)
+ __must_hold(RCU)
+{
+ struct hlist_nulls_node *n;
+ struct io_worker *worker;
+
+ /*
+ * Iterate free_list and see if we can find an idle worker to
+ * activate. If a given worker is on the free_list but in the process
+ * of exiting, keep trying.
+ */
+ hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) {
+ if (!io_worker_get(worker))
+ continue;
+ if (io_wqe_get_acct(worker) != acct) {
+ io_worker_release(worker);
+ continue;
+ }
+ if (wake_up_process(worker->task)) {
+ io_worker_release(worker);
+ return true;
+ }
+ io_worker_release(worker);
+ }
+
+ return false;
+}
+
+/*
+ * We need a worker. If we find a free one, we're good. If not, and we're
+ * below the max number of workers, create one.
+ */
+static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
+{
+ /*
+ * Most likely an attempt to queue unbounded work on an io_wq that
+ * wasn't setup with any unbounded workers.
+ */
+ if (unlikely(!acct->max_workers))
+ pr_warn_once("io-wq is not configured for unbound workers");
+
+ raw_spin_lock(&wqe->lock);
+ if (acct->nr_workers >= acct->max_workers) {
+ raw_spin_unlock(&wqe->lock);
+ return true;
+ }
+ acct->nr_workers++;
+ raw_spin_unlock(&wqe->lock);
+ atomic_inc(&acct->nr_running);
+ atomic_inc(&wqe->wq->worker_refs);
+ return create_io_worker(wqe->wq, wqe, acct->index);
+}
+
+static void io_wqe_inc_running(struct io_worker *worker)
+{
+ struct io_wqe_acct *acct = io_wqe_get_acct(worker);
+
+ atomic_inc(&acct->nr_running);
+}
+
+static void create_worker_cb(struct callback_head *cb)
+{
+ struct io_worker *worker;
+ struct io_wq *wq;
+ struct io_wqe *wqe;
+ struct io_wqe_acct *acct;
+ bool do_create = false;
+
+ worker = container_of(cb, struct io_worker, create_work);
+ wqe = worker->wqe;
+ wq = wqe->wq;
+ acct = &wqe->acct[worker->create_index];
+ raw_spin_lock(&wqe->lock);
+ if (acct->nr_workers < acct->max_workers) {
+ acct->nr_workers++;
+ do_create = true;
+ }
+ raw_spin_unlock(&wqe->lock);
+ if (do_create) {
+ create_io_worker(wq, wqe, worker->create_index);
+ } else {
+ atomic_dec(&acct->nr_running);
+ io_worker_ref_put(wq);
+ }
+ clear_bit_unlock(0, &worker->create_state);
+ io_worker_release(worker);
+}
+
+static bool io_queue_worker_create(struct io_worker *worker,
+ struct io_wqe_acct *acct,
+ task_work_func_t func)
+{
+ struct io_wqe *wqe = worker->wqe;
+ struct io_wq *wq = wqe->wq;
+
+ /* raced with exit, just ignore create call */
+ if (test_bit(IO_WQ_BIT_EXIT, &wq->state))
+ goto fail;
+ if (!io_worker_get(worker))
+ goto fail;
+ /*
+ * create_state manages ownership of create_work/index. We should
+ * only need one entry per worker, as the worker going to sleep
+ * will trigger the condition, and waking will clear it once it
+ * runs the task_work.
+ */
+ if (test_bit(0, &worker->create_state) ||
+ test_and_set_bit_lock(0, &worker->create_state))
+ goto fail_release;
+
+ atomic_inc(&wq->worker_refs);
+ init_task_work(&worker->create_work, func);
+ worker->create_index = acct->index;
+ if (!task_work_add(wq->task, &worker->create_work, TWA_SIGNAL)) {
+ /*
+ * EXIT may have been set after checking it above, check after
+ * adding the task_work and remove any creation item if it is
+ * now set. wq exit does that too, but we can have added this
+ * work item after we canceled in io_wq_exit_workers().
+ */
+ if (test_bit(IO_WQ_BIT_EXIT, &wq->state))
+ io_wq_cancel_tw_create(wq);
+ io_worker_ref_put(wq);
+ return true;
+ }
+ io_worker_ref_put(wq);
+ clear_bit_unlock(0, &worker->create_state);
+fail_release:
+ io_worker_release(worker);
+fail:
+ atomic_dec(&acct->nr_running);
+ io_worker_ref_put(wq);
+ return false;
+}
+
+static void io_wqe_dec_running(struct io_worker *worker)
+{
+ struct io_wqe_acct *acct = io_wqe_get_acct(worker);
+ struct io_wqe *wqe = worker->wqe;
+
+ if (!(worker->flags & IO_WORKER_F_UP))
+ return;
+
+ if (!atomic_dec_and_test(&acct->nr_running))
+ return;
+ if (!io_acct_run_queue(acct))
+ return;
+
+ atomic_inc(&acct->nr_running);
+ atomic_inc(&wqe->wq->worker_refs);
+ io_queue_worker_create(worker, acct, create_worker_cb);
+}
+
+/*
+ * Worker will start processing some work. Move it to the busy list, if
+ * it's currently on the freelist
+ */
+static void __io_worker_busy(struct io_wqe *wqe, struct io_worker *worker)
+{
+ if (worker->flags & IO_WORKER_F_FREE) {
+ worker->flags &= ~IO_WORKER_F_FREE;
+ raw_spin_lock(&wqe->lock);
+ hlist_nulls_del_init_rcu(&worker->nulls_node);
+ raw_spin_unlock(&wqe->lock);
+ }
+}
+
+/*
+ * No work, worker going to sleep. Move to freelist, and unuse mm if we
+ * have one attached. Dropping the mm may potentially sleep, so we drop
+ * the lock in that case and return success. Since the caller has to
+ * retry the loop in that case (we changed task state), we don't regrab
+ * the lock if we return success.
+ */
+static void __io_worker_idle(struct io_wqe *wqe, struct io_worker *worker)
+ __must_hold(wqe->lock)
+{
+ if (!(worker->flags & IO_WORKER_F_FREE)) {
+ worker->flags |= IO_WORKER_F_FREE;
+ hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list);
+ }
+}
+
+static inline unsigned int io_get_work_hash(struct io_wq_work *work)
+{
+ return work->flags >> IO_WQ_HASH_SHIFT;
+}
+
+static bool io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
+{
+ struct io_wq *wq = wqe->wq;
+ bool ret = false;
+
+ spin_lock_irq(&wq->hash->wait.lock);
+ if (list_empty(&wqe->wait.entry)) {
+ __add_wait_queue(&wq->hash->wait, &wqe->wait);
+ if (!test_bit(hash, &wq->hash->map)) {
+ __set_current_state(TASK_RUNNING);
+ list_del_init(&wqe->wait.entry);
+ ret = true;
+ }
+ }
+ spin_unlock_irq(&wq->hash->wait.lock);
+ return ret;
+}
+
+static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
+ struct io_worker *worker)
+ __must_hold(acct->lock)
+{
+ struct io_wq_work_node *node, *prev;
+ struct io_wq_work *work, *tail;
+ unsigned int stall_hash = -1U;
+ struct io_wqe *wqe = worker->wqe;
+
+ wq_list_for_each(node, prev, &acct->work_list) {
+ unsigned int hash;
+
+ work = container_of(node, struct io_wq_work, list);
+
+ /* not hashed, can run anytime */
+ if (!io_wq_is_hashed(work)) {
+ wq_list_del(&acct->work_list, node, prev);
+ return work;
+ }
+
+ hash = io_get_work_hash(work);
+ /* all items with this hash lie in [work, tail] */
+ tail = wqe->hash_tail[hash];
+
+ /* hashed, can run if not already running */
+ if (!test_and_set_bit(hash, &wqe->wq->hash->map)) {
+ wqe->hash_tail[hash] = NULL;
+ wq_list_cut(&acct->work_list, &tail->list, prev);
+ return work;
+ }
+ if (stall_hash == -1U)
+ stall_hash = hash;
+ /* fast forward to a next hash, for-each will fix up @prev */
+ node = &tail->list;
+ }
+
+ if (stall_hash != -1U) {
+ bool unstalled;
+
+ /*
+ * Set this before dropping the lock to avoid racing with new
+ * work being added and clearing the stalled bit.
+ */
+ set_bit(IO_ACCT_STALLED_BIT, &acct->flags);
+ raw_spin_unlock(&acct->lock);
+ unstalled = io_wait_on_hash(wqe, stall_hash);
+ raw_spin_lock(&acct->lock);
+ if (unstalled) {
+ clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
+ if (wq_has_sleeper(&wqe->wq->hash->wait))
+ wake_up(&wqe->wq->hash->wait);
+ }
+ }
+
+ return NULL;
+}
+
+static bool io_flush_signals(void)
+{
+ if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL))) {
+ __set_current_state(TASK_RUNNING);
+ clear_notify_signal();
+ if (task_work_pending(current))
+ task_work_run();
+ return true;
+ }
+ return false;
+}
+
+static void io_assign_current_work(struct io_worker *worker,
+ struct io_wq_work *work)
+{
+ if (work) {
+ io_flush_signals();
+ cond_resched();
+ }
+
+ raw_spin_lock(&worker->lock);
+ worker->cur_work = work;
+ worker->next_work = NULL;
+ raw_spin_unlock(&worker->lock);
+}
+
+static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work);
+
+static void io_worker_handle_work(struct io_worker *worker)
+{
+ struct io_wqe_acct *acct = io_wqe_get_acct(worker);
+ struct io_wqe *wqe = worker->wqe;
+ struct io_wq *wq = wqe->wq;
+ bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state);
+
+ do {
+ struct io_wq_work *work;
+
+ /*
+ * If we got some work, mark us as busy. If we didn't, but
+ * the list isn't empty, it means we stalled on hashed work.
+ * Mark us stalled so we don't keep looking for work when we
+ * can't make progress, any work completion or insertion will
+ * clear the stalled flag.
+ */
+ raw_spin_lock(&acct->lock);
+ work = io_get_next_work(acct, worker);
+ raw_spin_unlock(&acct->lock);
+ if (work) {
+ __io_worker_busy(wqe, worker);
+
+ /*
+ * Make sure cancelation can find this, even before
+ * it becomes the active work. That avoids a window
+ * where the work has been removed from our general
+ * work list, but isn't yet discoverable as the
+ * current work item for this worker.
+ */
+ raw_spin_lock(&worker->lock);
+ worker->next_work = work;
+ raw_spin_unlock(&worker->lock);
+ } else {
+ break;
+ }
+ io_assign_current_work(worker, work);
+ __set_current_state(TASK_RUNNING);
+
+ /* handle a whole dependent link */
+ do {
+ struct io_wq_work *next_hashed, *linked;
+ unsigned int hash = io_get_work_hash(work);
+
+ next_hashed = wq_next_work(work);
+
+ if (unlikely(do_kill) && (work->flags & IO_WQ_WORK_UNBOUND))
+ work->flags |= IO_WQ_WORK_CANCEL;
+ wq->do_work(work);
+ io_assign_current_work(worker, NULL);
+
+ linked = wq->free_work(work);
+ work = next_hashed;
+ if (!work && linked && !io_wq_is_hashed(linked)) {
+ work = linked;
+ linked = NULL;
+ }
+ io_assign_current_work(worker, work);
+ if (linked)
+ io_wqe_enqueue(wqe, linked);
+
+ if (hash != -1U && !next_hashed) {
+ /* serialize hash clear with wake_up() */
+ spin_lock_irq(&wq->hash->wait.lock);
+ clear_bit(hash, &wq->hash->map);
+ clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
+ spin_unlock_irq(&wq->hash->wait.lock);
+ if (wq_has_sleeper(&wq->hash->wait))
+ wake_up(&wq->hash->wait);
+ }
+ } while (work);
+ } while (1);
+}
+
+static int io_wqe_worker(void *data)
+{
+ struct io_worker *worker = data;
+ struct io_wqe_acct *acct = io_wqe_get_acct(worker);
+ struct io_wqe *wqe = worker->wqe;
+ struct io_wq *wq = wqe->wq;
+ bool last_timeout = false;
+ char buf[TASK_COMM_LEN];
+
+ worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING);
+
+ snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid);
+ set_task_comm(current, buf);
+
+ audit_alloc_kernel(current);
+
+ while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
+ long ret;
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ while (io_acct_run_queue(acct))
+ io_worker_handle_work(worker);
+
+ raw_spin_lock(&wqe->lock);
+ /* timed out, exit unless we're the last worker */
+ if (last_timeout && acct->nr_workers > 1) {
+ acct->nr_workers--;
+ raw_spin_unlock(&wqe->lock);
+ __set_current_state(TASK_RUNNING);
+ break;
+ }
+ last_timeout = false;
+ __io_worker_idle(wqe, worker);
+ raw_spin_unlock(&wqe->lock);
+ if (io_flush_signals())
+ continue;
+ ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
+ if (signal_pending(current)) {
+ struct ksignal ksig;
+
+ if (!get_signal(&ksig))
+ continue;
+ break;
+ }
+ last_timeout = !ret;
+ }
+
+ if (test_bit(IO_WQ_BIT_EXIT, &wq->state))
+ io_worker_handle_work(worker);
+
+ audit_free(current);
+ io_worker_exit(worker);
+ return 0;
+}
+
+/*
+ * Called when a worker is scheduled in. Mark us as currently running.
+ */
+void io_wq_worker_running(struct task_struct *tsk)
+{
+ struct io_worker *worker = tsk->worker_private;
+
+ if (!worker)
+ return;
+ if (!(worker->flags & IO_WORKER_F_UP))
+ return;
+ if (worker->flags & IO_WORKER_F_RUNNING)
+ return;
+ worker->flags |= IO_WORKER_F_RUNNING;
+ io_wqe_inc_running(worker);
+}
+
+/*
+ * Called when worker is going to sleep. If there are no workers currently
+ * running and we have work pending, wake up a free one or create a new one.
+ */
+void io_wq_worker_sleeping(struct task_struct *tsk)
+{
+ struct io_worker *worker = tsk->worker_private;
+
+ if (!worker)
+ return;
+ if (!(worker->flags & IO_WORKER_F_UP))
+ return;
+ if (!(worker->flags & IO_WORKER_F_RUNNING))
+ return;
+
+ worker->flags &= ~IO_WORKER_F_RUNNING;
+ io_wqe_dec_running(worker);
+}
+
+static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker,
+ struct task_struct *tsk)
+{
+ tsk->worker_private = worker;
+ worker->task = tsk;
+ set_cpus_allowed_ptr(tsk, wqe->cpu_mask);
+ tsk->flags |= PF_NO_SETAFFINITY;
+
+ raw_spin_lock(&wqe->lock);
+ hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list);
+ list_add_tail_rcu(&worker->all_list, &wqe->all_list);
+ worker->flags |= IO_WORKER_F_FREE;
+ raw_spin_unlock(&wqe->lock);
+ wake_up_new_task(tsk);
+}
+
+static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
+{
+ return true;
+}
+
+static inline bool io_should_retry_thread(long err)
+{
+ /*
+ * Prevent perpetual task_work retry, if the task (or its group) is
+ * exiting.
+ */
+ if (fatal_signal_pending(current))
+ return false;
+
+ switch (err) {
+ case -EAGAIN:
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
+ case -ERESTARTNOHAND:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static void create_worker_cont(struct callback_head *cb)
+{
+ struct io_worker *worker;
+ struct task_struct *tsk;
+ struct io_wqe *wqe;
+
+ worker = container_of(cb, struct io_worker, create_work);
+ clear_bit_unlock(0, &worker->create_state);
+ wqe = worker->wqe;
+ tsk = create_io_thread(io_wqe_worker, worker, wqe->node);
+ if (!IS_ERR(tsk)) {
+ io_init_new_worker(wqe, worker, tsk);
+ io_worker_release(worker);
+ return;
+ } else if (!io_should_retry_thread(PTR_ERR(tsk))) {
+ struct io_wqe_acct *acct = io_wqe_get_acct(worker);
+
+ atomic_dec(&acct->nr_running);
+ raw_spin_lock(&wqe->lock);
+ acct->nr_workers--;
+ if (!acct->nr_workers) {
+ struct io_cb_cancel_data match = {
+ .fn = io_wq_work_match_all,
+ .cancel_all = true,
+ };
+
+ raw_spin_unlock(&wqe->lock);
+ while (io_acct_cancel_pending_work(wqe, acct, &match))
+ ;
+ } else {
+ raw_spin_unlock(&wqe->lock);
+ }
+ io_worker_ref_put(wqe->wq);
+ kfree(worker);
+ return;
+ }
+
+ /* re-create attempts grab a new worker ref, drop the existing one */
+ io_worker_release(worker);
+ schedule_work(&worker->work);
+}
+
+static void io_workqueue_create(struct work_struct *work)
+{
+ struct io_worker *worker = container_of(work, struct io_worker, work);
+ struct io_wqe_acct *acct = io_wqe_get_acct(worker);
+
+ if (!io_queue_worker_create(worker, acct, create_worker_cont))
+ kfree(worker);
+}
+
+static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
+{
+ struct io_wqe_acct *acct = &wqe->acct[index];
+ struct io_worker *worker;
+ struct task_struct *tsk;
+
+ __set_current_state(TASK_RUNNING);
+
+ worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, wqe->node);
+ if (!worker) {
+fail:
+ atomic_dec(&acct->nr_running);
+ raw_spin_lock(&wqe->lock);
+ acct->nr_workers--;
+ raw_spin_unlock(&wqe->lock);
+ io_worker_ref_put(wq);
+ return false;
+ }
+
+ refcount_set(&worker->ref, 1);
+ worker->wqe = wqe;
+ raw_spin_lock_init(&worker->lock);
+ init_completion(&worker->ref_done);
+
+ if (index == IO_WQ_ACCT_BOUND)
+ worker->flags |= IO_WORKER_F_BOUND;
+
+ tsk = create_io_thread(io_wqe_worker, worker, wqe->node);
+ if (!IS_ERR(tsk)) {
+ io_init_new_worker(wqe, worker, tsk);
+ } else if (!io_should_retry_thread(PTR_ERR(tsk))) {
+ kfree(worker);
+ goto fail;
+ } else {
+ INIT_WORK(&worker->work, io_workqueue_create);
+ schedule_work(&worker->work);
+ }
+
+ return true;
+}
+
+/*
+ * Iterate the passed in list and call the specific function for each
+ * worker that isn't exiting
+ */
+static bool io_wq_for_each_worker(struct io_wqe *wqe,
+ bool (*func)(struct io_worker *, void *),
+ void *data)
+{
+ struct io_worker *worker;
+ bool ret = false;
+
+ list_for_each_entry_rcu(worker, &wqe->all_list, all_list) {
+ if (io_worker_get(worker)) {
+ /* no task if node is/was offline */
+ if (worker->task)
+ ret = func(worker, data);
+ io_worker_release(worker);
+ if (ret)
+ break;
+ }
+ }
+
+ return ret;
+}
+
+static bool io_wq_worker_wake(struct io_worker *worker, void *data)
+{
+ set_notify_signal(worker->task);
+ wake_up_process(worker->task);
+ return false;
+}
+
+static void io_run_cancel(struct io_wq_work *work, struct io_wqe *wqe)
+{
+ struct io_wq *wq = wqe->wq;
+
+ do {
+ work->flags |= IO_WQ_WORK_CANCEL;
+ wq->do_work(work);
+ work = wq->free_work(work);
+ } while (work);
+}
+
+static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work)
+{
+ struct io_wqe_acct *acct = io_work_get_acct(wqe, work);
+ unsigned int hash;
+ struct io_wq_work *tail;
+
+ if (!io_wq_is_hashed(work)) {
+append:
+ wq_list_add_tail(&work->list, &acct->work_list);
+ return;
+ }
+
+ hash = io_get_work_hash(work);
+ tail = wqe->hash_tail[hash];
+ wqe->hash_tail[hash] = work;
+ if (!tail)
+ goto append;
+
+ wq_list_add_after(&work->list, &tail->list, &acct->work_list);
+}
+
+static bool io_wq_work_match_item(struct io_wq_work *work, void *data)
+{
+ return work == data;
+}
+
+static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
+{
+ struct io_wqe_acct *acct = io_work_get_acct(wqe, work);
+ struct io_cb_cancel_data match;
+ unsigned work_flags = work->flags;
+ bool do_create;
+
+ /*
+ * If io-wq is exiting for this task, or if the request has explicitly
+ * been marked as one that should not get executed, cancel it here.
+ */
+ if (test_bit(IO_WQ_BIT_EXIT, &wqe->wq->state) ||
+ (work->flags & IO_WQ_WORK_CANCEL)) {
+ io_run_cancel(work, wqe);
+ return;
+ }
+
+ raw_spin_lock(&acct->lock);
+ io_wqe_insert_work(wqe, work);
+ clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
+ raw_spin_unlock(&acct->lock);
+
+ raw_spin_lock(&wqe->lock);
+ rcu_read_lock();
+ do_create = !io_wqe_activate_free_worker(wqe, acct);
+ rcu_read_unlock();
+
+ raw_spin_unlock(&wqe->lock);
+
+ if (do_create && ((work_flags & IO_WQ_WORK_CONCURRENT) ||
+ !atomic_read(&acct->nr_running))) {
+ bool did_create;
+
+ did_create = io_wqe_create_worker(wqe, acct);
+ if (likely(did_create))
+ return;
+
+ raw_spin_lock(&wqe->lock);
+ if (acct->nr_workers) {
+ raw_spin_unlock(&wqe->lock);
+ return;
+ }
+ raw_spin_unlock(&wqe->lock);
+
+ /* fatal condition, failed to create the first worker */
+ match.fn = io_wq_work_match_item,
+ match.data = work,
+ match.cancel_all = false,
+
+ io_acct_cancel_pending_work(wqe, acct, &match);
+ }
+}
+
+void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work)
+{
+ struct io_wqe *wqe = wq->wqes[numa_node_id()];
+
+ io_wqe_enqueue(wqe, work);
+}
+
+/*
+ * Work items that hash to the same value will not be done in parallel.
+ * Used to limit concurrent writes, generally hashed by inode.
+ */
+void io_wq_hash_work(struct io_wq_work *work, void *val)
+{
+ unsigned int bit;
+
+ bit = hash_ptr(val, IO_WQ_HASH_ORDER);
+ work->flags |= (IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT));
+}
+
+static bool __io_wq_worker_cancel(struct io_worker *worker,
+ struct io_cb_cancel_data *match,
+ struct io_wq_work *work)
+{
+ if (work && match->fn(work, match->data)) {
+ work->flags |= IO_WQ_WORK_CANCEL;
+ set_notify_signal(worker->task);
+ return true;
+ }
+
+ return false;
+}
+
+static bool io_wq_worker_cancel(struct io_worker *worker, void *data)
+{
+ struct io_cb_cancel_data *match = data;
+
+ /*
+ * Hold the lock to avoid ->cur_work going out of scope, caller
+ * may dereference the passed in work.
+ */
+ raw_spin_lock(&worker->lock);
+ if (__io_wq_worker_cancel(worker, match, worker->cur_work) ||
+ __io_wq_worker_cancel(worker, match, worker->next_work))
+ match->nr_running++;
+ raw_spin_unlock(&worker->lock);
+
+ return match->nr_running && !match->cancel_all;
+}
+
+static inline void io_wqe_remove_pending(struct io_wqe *wqe,
+ struct io_wq_work *work,
+ struct io_wq_work_node *prev)
+{
+ struct io_wqe_acct *acct = io_work_get_acct(wqe, work);
+ unsigned int hash = io_get_work_hash(work);
+ struct io_wq_work *prev_work = NULL;
+
+ if (io_wq_is_hashed(work) && work == wqe->hash_tail[hash]) {
+ if (prev)
+ prev_work = container_of(prev, struct io_wq_work, list);
+ if (prev_work && io_get_work_hash(prev_work) == hash)
+ wqe->hash_tail[hash] = prev_work;
+ else
+ wqe->hash_tail[hash] = NULL;
+ }
+ wq_list_del(&acct->work_list, &work->list, prev);
+}
+
+static bool io_acct_cancel_pending_work(struct io_wqe *wqe,
+ struct io_wqe_acct *acct,
+ struct io_cb_cancel_data *match)
+{
+ struct io_wq_work_node *node, *prev;
+ struct io_wq_work *work;
+
+ raw_spin_lock(&acct->lock);
+ wq_list_for_each(node, prev, &acct->work_list) {
+ work = container_of(node, struct io_wq_work, list);
+ if (!match->fn(work, match->data))
+ continue;
+ io_wqe_remove_pending(wqe, work, prev);
+ raw_spin_unlock(&acct->lock);
+ io_run_cancel(work, wqe);
+ match->nr_pending++;
+ /* not safe to continue after unlock */
+ return true;
+ }
+ raw_spin_unlock(&acct->lock);
+
+ return false;
+}
+
+static void io_wqe_cancel_pending_work(struct io_wqe *wqe,
+ struct io_cb_cancel_data *match)
+{
+ int i;
+retry:
+ for (i = 0; i < IO_WQ_ACCT_NR; i++) {
+ struct io_wqe_acct *acct = io_get_acct(wqe, i == 0);
+
+ if (io_acct_cancel_pending_work(wqe, acct, match)) {
+ if (match->cancel_all)
+ goto retry;
+ break;
+ }
+ }
+}
+
+static void io_wqe_cancel_running_work(struct io_wqe *wqe,
+ struct io_cb_cancel_data *match)
+{
+ rcu_read_lock();
+ io_wq_for_each_worker(wqe, io_wq_worker_cancel, match);
+ rcu_read_unlock();
+}
+
+enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
+ void *data, bool cancel_all)
+{
+ struct io_cb_cancel_data match = {
+ .fn = cancel,
+ .data = data,
+ .cancel_all = cancel_all,
+ };
+ int node;
+
+ /*
+ * First check pending list, if we're lucky we can just remove it
+ * from there. CANCEL_OK means that the work is returned as-new,
+ * no completion will be posted for it.
+ *
+ * Then check if a free (going busy) or busy worker has the work
+ * currently running. If we find it there, we'll return CANCEL_RUNNING
+ * as an indication that we attempt to signal cancellation. The
+ * completion will run normally in this case.
+ *
+ * Do both of these while holding the wqe->lock, to ensure that
+ * we'll find a work item regardless of state.
+ */
+ for_each_node(node) {
+ struct io_wqe *wqe = wq->wqes[node];
+
+ io_wqe_cancel_pending_work(wqe, &match);
+ if (match.nr_pending && !match.cancel_all)
+ return IO_WQ_CANCEL_OK;
+
+ raw_spin_lock(&wqe->lock);
+ io_wqe_cancel_running_work(wqe, &match);
+ raw_spin_unlock(&wqe->lock);
+ if (match.nr_running && !match.cancel_all)
+ return IO_WQ_CANCEL_RUNNING;
+ }
+
+ if (match.nr_running)
+ return IO_WQ_CANCEL_RUNNING;
+ if (match.nr_pending)
+ return IO_WQ_CANCEL_OK;
+ return IO_WQ_CANCEL_NOTFOUND;
+}
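
A sketch of how a caller drives this (illustrative only; the matcher and the work pointer are assumptions, mirroring io_wq_work_match_item() above):

    /* Match a single queued or running item by pointer identity. */
    static bool match_one_work(struct io_wq_work *work, void *data)
    {
            return work == data;
    }

    /*
     * enum io_wq_cancel ret = io_wq_cancel_cb(wq, match_one_work, work, false);
     *
     * IO_WQ_CANCEL_OK       - removed from the pending list, no completion posted
     * IO_WQ_CANCEL_RUNNING  - found running, cancellation was signalled
     * IO_WQ_CANCEL_NOTFOUND - no matching work found
     */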
+
+static int io_wqe_hash_wake(struct wait_queue_entry *wait, unsigned mode,
+ int sync, void *key)
+{
+ struct io_wqe *wqe = container_of(wait, struct io_wqe, wait);
+ int i;
+
+ list_del_init(&wait->entry);
+
+ rcu_read_lock();
+ for (i = 0; i < IO_WQ_ACCT_NR; i++) {
+ struct io_wqe_acct *acct = &wqe->acct[i];
+
+ if (test_and_clear_bit(IO_ACCT_STALLED_BIT, &acct->flags))
+ io_wqe_activate_free_worker(wqe, acct);
+ }
+ rcu_read_unlock();
+ return 1;
+}
+
+struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
+{
+ int ret, node, i;
+ struct io_wq *wq;
+
+ if (WARN_ON_ONCE(!data->free_work || !data->do_work))
+ return ERR_PTR(-EINVAL);
+ if (WARN_ON_ONCE(!bounded))
+ return ERR_PTR(-EINVAL);
+
+ wq = kzalloc(struct_size(wq, wqes, nr_node_ids), GFP_KERNEL);
+ if (!wq)
+ return ERR_PTR(-ENOMEM);
+ ret = cpuhp_state_add_instance_nocalls(io_wq_online, &wq->cpuhp_node);
+ if (ret)
+ goto err_wq;
+
+ refcount_inc(&data->hash->refs);
+ wq->hash = data->hash;
+ wq->free_work = data->free_work;
+ wq->do_work = data->do_work;
+
+ ret = -ENOMEM;
+ for_each_node(node) {
+ struct io_wqe *wqe;
+ int alloc_node = node;
+
+ if (!node_online(alloc_node))
+ alloc_node = NUMA_NO_NODE;
+ wqe = kzalloc_node(sizeof(struct io_wqe), GFP_KERNEL, alloc_node);
+ if (!wqe)
+ goto err;
+ if (!alloc_cpumask_var(&wqe->cpu_mask, GFP_KERNEL))
+ goto err;
+ cpumask_copy(wqe->cpu_mask, cpumask_of_node(node));
+ wq->wqes[node] = wqe;
+ wqe->node = alloc_node;
+ wqe->acct[IO_WQ_ACCT_BOUND].max_workers = bounded;
+ wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers =
+ task_rlimit(current, RLIMIT_NPROC);
+ INIT_LIST_HEAD(&wqe->wait.entry);
+ wqe->wait.func = io_wqe_hash_wake;
+ for (i = 0; i < IO_WQ_ACCT_NR; i++) {
+ struct io_wqe_acct *acct = &wqe->acct[i];
+
+ acct->index = i;
+ atomic_set(&acct->nr_running, 0);
+ INIT_WQ_LIST(&acct->work_list);
+ raw_spin_lock_init(&acct->lock);
+ }
+ wqe->wq = wq;
+ raw_spin_lock_init(&wqe->lock);
+ INIT_HLIST_NULLS_HEAD(&wqe->free_list, 0);
+ INIT_LIST_HEAD(&wqe->all_list);
+ }
+
+ wq->task = get_task_struct(data->task);
+ atomic_set(&wq->worker_refs, 1);
+ init_completion(&wq->worker_done);
+ return wq;
+err:
+ io_wq_put_hash(data->hash);
+ cpuhp_state_remove_instance_nocalls(io_wq_online, &wq->cpuhp_node);
+ for_each_node(node) {
+ if (!wq->wqes[node])
+ continue;
+ free_cpumask_var(wq->wqes[node]->cpu_mask);
+ kfree(wq->wqes[node]);
+ }
+err_wq:
+ kfree(wq);
+ return ERR_PTR(ret);
+}
+
+static bool io_task_work_match(struct callback_head *cb, void *data)
+{
+ struct io_worker *worker;
+
+ if (cb->func != create_worker_cb && cb->func != create_worker_cont)
+ return false;
+ worker = container_of(cb, struct io_worker, create_work);
+ return worker->wqe->wq == data;
+}
+
+void io_wq_exit_start(struct io_wq *wq)
+{
+ set_bit(IO_WQ_BIT_EXIT, &wq->state);
+}
+
+static void io_wq_cancel_tw_create(struct io_wq *wq)
+{
+ struct callback_head *cb;
+
+ while ((cb = task_work_cancel_match(wq->task, io_task_work_match, wq)) != NULL) {
+ struct io_worker *worker;
+
+ worker = container_of(cb, struct io_worker, create_work);
+ io_worker_cancel_cb(worker);
+ }
+}
+
+static void io_wq_exit_workers(struct io_wq *wq)
+{
+ int node;
+
+ if (!wq->task)
+ return;
+
+ io_wq_cancel_tw_create(wq);
+
+ rcu_read_lock();
+ for_each_node(node) {
+ struct io_wqe *wqe = wq->wqes[node];
+
+ io_wq_for_each_worker(wqe, io_wq_worker_wake, NULL);
+ }
+ rcu_read_unlock();
+ io_worker_ref_put(wq);
+ wait_for_completion(&wq->worker_done);
+
+ for_each_node(node) {
+ spin_lock_irq(&wq->hash->wait.lock);
+ list_del_init(&wq->wqes[node]->wait.entry);
+ spin_unlock_irq(&wq->hash->wait.lock);
+ }
+ put_task_struct(wq->task);
+ wq->task = NULL;
+}
+
+static void io_wq_destroy(struct io_wq *wq)
+{
+ int node;
+
+ cpuhp_state_remove_instance_nocalls(io_wq_online, &wq->cpuhp_node);
+
+ for_each_node(node) {
+ struct io_wqe *wqe = wq->wqes[node];
+ struct io_cb_cancel_data match = {
+ .fn = io_wq_work_match_all,
+ .cancel_all = true,
+ };
+ io_wqe_cancel_pending_work(wqe, &match);
+ free_cpumask_var(wqe->cpu_mask);
+ kfree(wqe);
+ }
+ io_wq_put_hash(wq->hash);
+ kfree(wq);
+}
+
+void io_wq_put_and_exit(struct io_wq *wq)
+{
+ WARN_ON_ONCE(!test_bit(IO_WQ_BIT_EXIT, &wq->state));
+
+ io_wq_exit_workers(wq);
+ io_wq_destroy(wq);
+}
+
+struct online_data {
+ unsigned int cpu;
+ bool online;
+};
+
+static bool io_wq_worker_affinity(struct io_worker *worker, void *data)
+{
+ struct online_data *od = data;
+
+ if (od->online)
+ cpumask_set_cpu(od->cpu, worker->wqe->cpu_mask);
+ else
+ cpumask_clear_cpu(od->cpu, worker->wqe->cpu_mask);
+ return false;
+}
+
+static int __io_wq_cpu_online(struct io_wq *wq, unsigned int cpu, bool online)
+{
+ struct online_data od = {
+ .cpu = cpu,
+ .online = online
+ };
+ int i;
+
+ rcu_read_lock();
+ for_each_node(i)
+ io_wq_for_each_worker(wq->wqes[i], io_wq_worker_affinity, &od);
+ rcu_read_unlock();
+ return 0;
+}
+
+static int io_wq_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+ struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node);
+
+ return __io_wq_cpu_online(wq, cpu, true);
+}
+
+static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+ struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node);
+
+ return __io_wq_cpu_online(wq, cpu, false);
+}
+
+int io_wq_cpu_affinity(struct io_wq *wq, cpumask_var_t mask)
+{
+ int i;
+
+ rcu_read_lock();
+ for_each_node(i) {
+ struct io_wqe *wqe = wq->wqes[i];
+
+ if (mask)
+ cpumask_copy(wqe->cpu_mask, mask);
+ else
+ cpumask_copy(wqe->cpu_mask, cpumask_of_node(i));
+ }
+ rcu_read_unlock();
+ return 0;
+}
+
+/*
+ * Set the max number of bounded and unbounded workers, returning the old
+ * values. If an entry in new_count is 0, that limit is left unchanged and
+ * only its old value is reported back.
+ */
+int io_wq_max_workers(struct io_wq *wq, int *new_count)
+{
+ int prev[IO_WQ_ACCT_NR];
+ bool first_node = true;
+ int i, node;
+
+ BUILD_BUG_ON((int) IO_WQ_ACCT_BOUND != (int) IO_WQ_BOUND);
+ BUILD_BUG_ON((int) IO_WQ_ACCT_UNBOUND != (int) IO_WQ_UNBOUND);
+ BUILD_BUG_ON((int) IO_WQ_ACCT_NR != 2);
+
+ for (i = 0; i < IO_WQ_ACCT_NR; i++) {
+ if (new_count[i] > task_rlimit(current, RLIMIT_NPROC))
+ new_count[i] = task_rlimit(current, RLIMIT_NPROC);
+ }
+
+ for (i = 0; i < IO_WQ_ACCT_NR; i++)
+ prev[i] = 0;
+
+ rcu_read_lock();
+ for_each_node(node) {
+ struct io_wqe *wqe = wq->wqes[node];
+ struct io_wqe_acct *acct;
+
+ raw_spin_lock(&wqe->lock);
+ for (i = 0; i < IO_WQ_ACCT_NR; i++) {
+ acct = &wqe->acct[i];
+ if (first_node)
+ prev[i] = max_t(int, acct->max_workers, prev[i]);
+ if (new_count[i])
+ acct->max_workers = new_count[i];
+ }
+ raw_spin_unlock(&wqe->lock);
+ first_node = false;
+ }
+ rcu_read_unlock();
+
+ for (i = 0; i < IO_WQ_ACCT_NR; i++)
+ new_count[i] = prev[i];
+
+ return 0;
+}
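
Because a zero entry leaves that limit untouched, the call doubles as a query; the io_uring register path (IORING_REGISTER_IOWQ_MAX_WORKERS) relies on the same convention. A sketch, assuming a valid wq:

    int counts[IO_WQ_ACCT_NR] = { 0, 0 };   /* 0 == keep the current limit */

    io_wq_max_workers(wq, counts);
    /*
     * counts[IO_WQ_ACCT_BOUND] and counts[IO_WQ_ACCT_UNBOUND] now hold the
     * existing limits; non-zero inputs would have updated them and still
     * reported the old values back.
     */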
+
+static __init int io_wq_init(void)
+{
+ int ret;
+
+ ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "io-wq/online",
+ io_wq_cpu_online, io_wq_cpu_offline);
+ if (ret < 0)
+ return ret;
+ io_wq_online = ret;
+ return 0;
+}
+subsys_initcall(io_wq_init);
diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
new file mode 100644
index 000000000000..dbecd27656c7
--- /dev/null
+++ b/io_uring/io-wq.h
@@ -0,0 +1,227 @@
+#ifndef INTERNAL_IO_WQ_H
+#define INTERNAL_IO_WQ_H
+
+#include <linux/refcount.h>
+
+struct io_wq;
+
+enum {
+ IO_WQ_WORK_CANCEL = 1,
+ IO_WQ_WORK_HASHED = 2,
+ IO_WQ_WORK_UNBOUND = 4,
+ IO_WQ_WORK_CONCURRENT = 16,
+
+ IO_WQ_HASH_SHIFT = 24, /* upper 8 bits are used for hash key */
+};
+
+enum io_wq_cancel {
+ IO_WQ_CANCEL_OK, /* cancelled before started */
+ IO_WQ_CANCEL_RUNNING, /* found, running, cancellation attempted */
+ IO_WQ_CANCEL_NOTFOUND, /* work not found */
+};
+
+struct io_wq_work_node {
+ struct io_wq_work_node *next;
+};
+
+struct io_wq_work_list {
+ struct io_wq_work_node *first;
+ struct io_wq_work_node *last;
+};
+
+#define wq_list_for_each(pos, prv, head) \
+ for (pos = (head)->first, prv = NULL; pos; prv = pos, pos = (pos)->next)
+
+#define wq_list_for_each_resume(pos, prv) \
+ for (; pos; prv = pos, pos = (pos)->next)
+
+#define wq_list_empty(list) (READ_ONCE((list)->first) == NULL)
+#define INIT_WQ_LIST(list) do { \
+ (list)->first = NULL; \
+} while (0)
+
+static inline void wq_list_add_after(struct io_wq_work_node *node,
+ struct io_wq_work_node *pos,
+ struct io_wq_work_list *list)
+{
+ struct io_wq_work_node *next = pos->next;
+
+ pos->next = node;
+ node->next = next;
+ if (!next)
+ list->last = node;
+}
+
+/**
+ * wq_list_merge - merge the second list to the first one.
+ * @list0: the first list
+ * @list1: the second list
+ * Return the first node after the merge.
+ */
+static inline struct io_wq_work_node *wq_list_merge(struct io_wq_work_list *list0,
+ struct io_wq_work_list *list1)
+{
+ struct io_wq_work_node *ret;
+
+ if (!list0->first) {
+ ret = list1->first;
+ } else {
+ ret = list0->first;
+ list0->last->next = list1->first;
+ }
+ INIT_WQ_LIST(list0);
+ INIT_WQ_LIST(list1);
+ return ret;
+}
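
A sketch of consuming the merged chain (illustrative only; list0 and list1 are assumed caller-owned lists, and the node-to-work conversion uses the embedding defined further down in this header):

    struct io_wq_work_node *node = wq_list_merge(&list0, &list1);

    while (node) {
            struct io_wq_work *work = container_of(node, struct io_wq_work, list);

            node = node->next;      /* advance before the work item is reused */
            /* ... process work ... */
    }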
+
+static inline void wq_list_add_tail(struct io_wq_work_node *node,
+ struct io_wq_work_list *list)
+{
+ node->next = NULL;
+ if (!list->first) {
+ list->last = node;
+ WRITE_ONCE(list->first, node);
+ } else {
+ list->last->next = node;
+ list->last = node;
+ }
+}
+
+static inline void wq_list_add_head(struct io_wq_work_node *node,
+ struct io_wq_work_list *list)
+{
+ node->next = list->first;
+ if (!node->next)
+ list->last = node;
+ WRITE_ONCE(list->first, node);
+}
+
+static inline void wq_list_cut(struct io_wq_work_list *list,
+ struct io_wq_work_node *last,
+ struct io_wq_work_node *prev)
+{
+ /* first in the list, if prev==NULL */
+ if (!prev)
+ WRITE_ONCE(list->first, last->next);
+ else
+ prev->next = last->next;
+
+ if (last == list->last)
+ list->last = prev;
+ last->next = NULL;
+}
+
+static inline void __wq_list_splice(struct io_wq_work_list *list,
+ struct io_wq_work_node *to)
+{
+ list->last->next = to->next;
+ to->next = list->first;
+ INIT_WQ_LIST(list);
+}
+
+static inline bool wq_list_splice(struct io_wq_work_list *list,
+ struct io_wq_work_node *to)
+{
+ if (!wq_list_empty(list)) {
+ __wq_list_splice(list, to);
+ return true;
+ }
+ return false;
+}
+
+static inline void wq_stack_add_head(struct io_wq_work_node *node,
+ struct io_wq_work_node *stack)
+{
+ node->next = stack->next;
+ stack->next = node;
+}
+
+static inline void wq_list_del(struct io_wq_work_list *list,
+ struct io_wq_work_node *node,
+ struct io_wq_work_node *prev)
+{
+ wq_list_cut(list, node, prev);
+}
+
+static inline
+struct io_wq_work_node *wq_stack_extract(struct io_wq_work_node *stack)
+{
+ struct io_wq_work_node *node = stack->next;
+
+ stack->next = node->next;
+ return node;
+}
+
+struct io_wq_work {
+ struct io_wq_work_node list;
+ unsigned flags;
+};
+
+static inline struct io_wq_work *wq_next_work(struct io_wq_work *work)
+{
+ if (!work->list.next)
+ return NULL;
+
+ return container_of(work->list.next, struct io_wq_work, list);
+}
+
+typedef struct io_wq_work *(free_work_fn)(struct io_wq_work *);
+typedef void (io_wq_work_fn)(struct io_wq_work *);
+
+struct io_wq_hash {
+ refcount_t refs;
+ unsigned long map;
+ struct wait_queue_head wait;
+};
+
+static inline void io_wq_put_hash(struct io_wq_hash *hash)
+{
+ if (refcount_dec_and_test(&hash->refs))
+ kfree(hash);
+}
+
+struct io_wq_data {
+ struct io_wq_hash *hash;
+ struct task_struct *task;
+ io_wq_work_fn *do_work;
+ free_work_fn *free_work;
+};
+
+struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data);
+void io_wq_exit_start(struct io_wq *wq);
+void io_wq_put_and_exit(struct io_wq *wq);
+
+void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work);
+void io_wq_hash_work(struct io_wq_work *work, void *val);
+
+int io_wq_cpu_affinity(struct io_wq *wq, cpumask_var_t mask);
+int io_wq_max_workers(struct io_wq *wq, int *new_count);
+
+static inline bool io_wq_is_hashed(struct io_wq_work *work)
+{
+ return work->flags & IO_WQ_WORK_HASHED;
+}
+
+typedef bool (work_cancel_fn)(struct io_wq_work *, void *);
+
+enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
+ void *data, bool cancel_all);
+
+#if defined(CONFIG_IO_WQ)
+extern void io_wq_worker_sleeping(struct task_struct *);
+extern void io_wq_worker_running(struct task_struct *);
+#else
+static inline void io_wq_worker_sleeping(struct task_struct *tsk)
+{
+}
+static inline void io_wq_worker_running(struct task_struct *tsk)
+{
+}
+#endif
+
+static inline bool io_wq_current_is_worker(void)
+{
+ return in_task() && (current->flags & PF_IO_WORKER) &&
+ current->worker_private;
+}
+#endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
new file mode 100644
index 000000000000..ec4e31c67e07
--- /dev/null
+++ b/io_uring/io_uring.c
@@ -0,0 +1,11915 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Shared application/kernel submission and completion ring pairs, for
+ * supporting fast/efficient IO.
+ *
+ * A note on the read/write ordering memory barriers that are matched between
+ * the application and kernel side.
+ *
+ * After the application reads the CQ ring tail, it must use an
+ * appropriate smp_rmb() to pair with the smp_wmb() the kernel uses
+ * before writing the tail (using smp_load_acquire to read the tail will
+ * do). It also needs a smp_mb() before updating CQ head (ordering the
+ * entry load(s) with the head store), pairing with an implicit barrier
+ * through a control-dependency in io_get_cqe (smp_store_release to
+ * store head will do). Failure to do so could lead to reading invalid
+ * CQ entries.
+ *
+ * Likewise, the application must use an appropriate smp_wmb() before
+ * writing the SQ tail (ordering SQ entry stores with the tail store),
+ * which pairs with smp_load_acquire in io_get_sqring (smp_store_release
+ * to store the tail will do). And it needs a barrier ordering the SQ
+ * head load before writing new SQ entries (smp_load_acquire to read
+ * head will do).
+ *
+ * When using the SQ poll thread (IORING_SETUP_SQPOLL), the application
+ * needs to check the SQ flags for IORING_SQ_NEED_WAKEUP *after*
+ * updating the SQ tail; a full memory barrier smp_mb() is needed
+ * between.
+ *
+ * Also see the examples in the liburing library:
+ *
+ * git://git.kernel.dk/liburing
+ *
+ * io_uring also uses READ/WRITE_ONCE() for _any_ store or load that happens
+ * from data shared between the kernel and application. This is done both
+ * for ordering purposes, but also to ensure that once a value is loaded from
+ * data that the application could potentially modify, it remains stable.
+ *
+ * Copyright (C) 2018-2019 Jens Axboe
+ * Copyright (c) 2018-2019 Christoph Hellwig
+ */
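
The pairing described above is what liburing implements for the application; a compressed userspace sketch of the CQ side (illustrative only, assuming khead, ktail, kring_mask and cqes point into the mmap'ed CQ ring, and consume_cqe() is a hypothetical handler):

    unsigned head = *khead;
    unsigned tail = __atomic_load_n(ktail, __ATOMIC_ACQUIRE); /* pairs with the kernel's release of the tail */

    while (head != tail) {
            struct io_uring_cqe *cqe = &cqes[head & *kring_mask];

            consume_cqe(cqe);               /* hypothetical application handler */
            head++;
    }
    __atomic_store_n(khead, head, __ATOMIC_RELEASE);   /* orders the entry loads before the head store */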
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/syscalls.h>
+#include <linux/compat.h>
+#include <net/compat.h>
+#include <linux/refcount.h>
+#include <linux/uio.h>
+#include <linux/bits.h>
+
+#include <linux/sched/signal.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/percpu.h>
+#include <linux/slab.h>
+#include <linux/blk-mq.h>
+#include <linux/bvec.h>
+#include <linux/net.h>
+#include <net/sock.h>
+#include <net/af_unix.h>
+#include <net/scm.h>
+#include <linux/anon_inodes.h>
+#include <linux/sched/mm.h>
+#include <linux/uaccess.h>
+#include <linux/nospec.h>
+#include <linux/sizes.h>
+#include <linux/hugetlb.h>
+#include <linux/highmem.h>
+#include <linux/namei.h>
+#include <linux/fsnotify.h>
+#include <linux/fadvise.h>
+#include <linux/eventpoll.h>
+#include <linux/splice.h>
+#include <linux/task_work.h>
+#include <linux/pagemap.h>
+#include <linux/io_uring.h>
+#include <linux/audit.h>
+#include <linux/security.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/io_uring.h>
+
+#include <uapi/linux/io_uring.h>
+
+#include "../fs/internal.h"
+#include "io-wq.h"
+
+#define IORING_MAX_ENTRIES 32768
+#define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES)
+#define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
+
+/* only define max */
+#define IORING_MAX_FIXED_FILES (1U << 15)
+#define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
+ IORING_REGISTER_LAST + IORING_OP_LAST)
+
+#define IO_RSRC_TAG_TABLE_SHIFT (PAGE_SHIFT - 3)
+#define IO_RSRC_TAG_TABLE_MAX (1U << IO_RSRC_TAG_TABLE_SHIFT)
+#define IO_RSRC_TAG_TABLE_MASK (IO_RSRC_TAG_TABLE_MAX - 1)
+
+#define IORING_MAX_REG_BUFFERS (1U << 14)
+
+#define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
+ IOSQE_IO_HARDLINK | IOSQE_ASYNC)
+
+#define SQE_VALID_FLAGS (SQE_COMMON_FLAGS | IOSQE_BUFFER_SELECT | \
+ IOSQE_IO_DRAIN | IOSQE_CQE_SKIP_SUCCESS)
+
+#define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \
+ REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS | \
+ REQ_F_ASYNC_DATA)
+
+#define IO_TCTX_REFS_CACHE_NR (1U << 10)
+
+struct io_uring {
+ u32 head ____cacheline_aligned_in_smp;
+ u32 tail ____cacheline_aligned_in_smp;
+};
+
+/*
+ * This data is shared with the application through the mmap at offsets
+ * IORING_OFF_SQ_RING and IORING_OFF_CQ_RING.
+ *
+ * The offsets to the member fields are published through struct
+ * io_sqring_offsets when calling io_uring_setup.
+ */
+struct io_rings {
+ /*
+ * Head and tail offsets into the ring; the offsets need to be
+ * masked to get valid indices.
+ *
+ * The kernel controls head of the sq ring and the tail of the cq ring,
+ * and the application controls tail of the sq ring and the head of the
+ * cq ring.
+ */
+ struct io_uring sq, cq;
+ /*
+ * Bitmasks to apply to head and tail offsets (constant, equals
+ * ring_entries - 1)
+ */
+ u32 sq_ring_mask, cq_ring_mask;
+ /* Ring sizes (constant, power of 2) */
+ u32 sq_ring_entries, cq_ring_entries;
+ /*
+ * Number of invalid entries dropped by the kernel due to
+ * invalid index stored in array
+ *
+ * Written by the kernel, shouldn't be modified by the
+ * application (i.e. get number of "new events" by comparing to
+ * cached value).
+ *
+ * After a new SQ head value was read by the application this
+ * counter includes all submissions that were dropped reaching
+ * the new SQ head (and possibly more).
+ */
+ u32 sq_dropped;
+ /*
+ * Runtime SQ flags
+ *
+ * Written by the kernel, shouldn't be modified by the
+ * application.
+ *
+ * The application needs a full memory barrier before checking
+ * for IORING_SQ_NEED_WAKEUP after updating the sq tail.
+ */
+ u32 sq_flags;
+ /*
+ * Runtime CQ flags
+ *
+ * Written by the application, shouldn't be modified by the
+ * kernel.
+ */
+ u32 cq_flags;
+ /*
+ * Number of completion events lost because the queue was full;
+ * this should be avoided by the application by making sure
+ * there are not more requests pending than there is space in
+ * the completion queue.
+ *
+ * Written by the kernel, shouldn't be modified by the
+ * application (i.e. get number of "new events" by comparing to
+ * cached value).
+ *
+ * As completion events come in out of order this counter is not
+ * ordered with any other data.
+ */
+ u32 cq_overflow;
+ /*
+ * Ring buffer of completion events.
+ *
+ * The kernel writes completion events fresh every time they are
+ * produced, so the application is allowed to modify pending
+ * entries.
+ */
+ struct io_uring_cqe cqes[] ____cacheline_aligned_in_smp;
+};
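
On the submission side the application drives the same shared layout: fill an SQE, publish its index through the indirection documented on ->sq_array below, then make it visible with a release store of the tail (and, per the note above, a full barrier before checking IORING_SQ_NEED_WAKEUP under SQPOLL). An illustrative userspace sketch, assuming sqes, sq_array, ksq_tail and ksq_mask map the SQ ring and my_sqe is an already prepared io_uring_sqe:

    unsigned tail = *ksq_tail;
    unsigned idx  = tail & *ksq_mask;

    sqes[idx] = my_sqe;                 /* copy in the prepared SQE */
    sq_array[idx] = idx;                /* the sq_array indirection */
    __atomic_store_n(ksq_tail, tail + 1, __ATOMIC_RELEASE); /* pairs with io_get_sqring()'s acquire */
    /* a non-SQPOLL setup would now submit via the io_uring_enter() syscall */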
+
+enum io_uring_cmd_flags {
+ IO_URING_F_COMPLETE_DEFER = 1,
+ IO_URING_F_UNLOCKED = 2,
+ /* int's last bit, sign checks are usually faster than a bit test */
+ IO_URING_F_NONBLOCK = INT_MIN,
+};
+
+struct io_mapped_ubuf {
+ u64 ubuf;
+ u64 ubuf_end;
+ unsigned int nr_bvecs;
+ unsigned long acct_pages;
+ struct bio_vec bvec[];
+};
+
+struct io_ring_ctx;
+
+struct io_overflow_cqe {
+ struct io_uring_cqe cqe;
+ struct list_head list;
+};
+
+struct io_fixed_file {
+ /* file * with additional FFS_* flags */
+ unsigned long file_ptr;
+};
+
+struct io_rsrc_put {
+ struct list_head list;
+ u64 tag;
+ union {
+ void *rsrc;
+ struct file *file;
+ struct io_mapped_ubuf *buf;
+ };
+};
+
+struct io_file_table {
+ struct io_fixed_file *files;
+};
+
+struct io_rsrc_node {
+ struct percpu_ref refs;
+ struct list_head node;
+ struct list_head rsrc_list;
+ struct io_rsrc_data *rsrc_data;
+ struct llist_node llist;
+ bool done;
+};
+
+typedef void (rsrc_put_fn)(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc);
+
+struct io_rsrc_data {
+ struct io_ring_ctx *ctx;
+
+ u64 **tags;
+ unsigned int nr;
+ rsrc_put_fn *do_put;
+ atomic_t refs;
+ struct completion done;
+ bool quiesce;
+};
+
+struct io_buffer_list {
+ struct list_head list;
+ struct list_head buf_list;
+ __u16 bgid;
+};
+
+struct io_buffer {
+ struct list_head list;
+ __u64 addr;
+ __u32 len;
+ __u16 bid;
+ __u16 bgid;
+};
+
+struct io_restriction {
+ DECLARE_BITMAP(register_op, IORING_REGISTER_LAST);
+ DECLARE_BITMAP(sqe_op, IORING_OP_LAST);
+ u8 sqe_flags_allowed;
+ u8 sqe_flags_required;
+ bool registered;
+};
+
+enum {
+ IO_SQ_THREAD_SHOULD_STOP = 0,
+ IO_SQ_THREAD_SHOULD_PARK,
+};
+
+struct io_sq_data {
+ refcount_t refs;
+ atomic_t park_pending;
+ struct mutex lock;
+
+ /* ctx's that are using this sqd */
+ struct list_head ctx_list;
+
+ struct task_struct *thread;
+ struct wait_queue_head wait;
+
+ unsigned sq_thread_idle;
+ int sq_cpu;
+ pid_t task_pid;
+ pid_t task_tgid;
+
+ unsigned long state;
+ struct completion exited;
+};
+
+#define IO_COMPL_BATCH 32
+#define IO_REQ_CACHE_SIZE 32
+#define IO_REQ_ALLOC_BATCH 8
+
+struct io_submit_link {
+ struct io_kiocb *head;
+ struct io_kiocb *last;
+};
+
+struct io_submit_state {
+ /* inline/task_work completion list, under ->uring_lock */
+ struct io_wq_work_node free_list;
+ /* batch completion logic */
+ struct io_wq_work_list compl_reqs;
+ struct io_submit_link link;
+
+ bool plug_started;
+ bool need_plug;
+ bool flush_cqes;
+ unsigned short submit_nr;
+ struct blk_plug plug;
+};
+
+struct io_ev_fd {
+ struct eventfd_ctx *cq_ev_fd;
+ unsigned int eventfd_async: 1;
+ struct rcu_head rcu;
+};
+
+#define IO_BUFFERS_HASH_BITS 5
+
+struct io_ring_ctx {
+ /* const or read-mostly hot data */
+ struct {
+ struct percpu_ref refs;
+
+ struct io_rings *rings;
+ unsigned int flags;
+ unsigned int compat: 1;
+ unsigned int drain_next: 1;
+ unsigned int restricted: 1;
+ unsigned int off_timeout_used: 1;
+ unsigned int drain_active: 1;
+ unsigned int drain_disabled: 1;
+ unsigned int has_evfd: 1;
+ } ____cacheline_aligned_in_smp;
+
+ /* submission data */
+ struct {
+ struct mutex uring_lock;
+
+ /*
+ * Ring buffer of indices into array of io_uring_sqe, which is
+ * mmapped by the application using the IORING_OFF_SQES offset.
+ *
+ * This indirection could e.g. be used to assign fixed
+ * io_uring_sqe entries to operations and only submit them to
+ * the queue when needed.
+ *
+ * The kernel modifies neither the indices array nor the entries
+ * array.
+ */
+ u32 *sq_array;
+ struct io_uring_sqe *sq_sqes;
+ unsigned cached_sq_head;
+ unsigned sq_entries;
+ struct list_head defer_list;
+
+ /*
+ * Fixed resources fast path, should be accessed only under
+ * uring_lock, and updated through io_uring_register(2)
+ */
+ struct io_rsrc_node *rsrc_node;
+ int rsrc_cached_refs;
+ struct io_file_table file_table;
+ unsigned nr_user_files;
+ unsigned nr_user_bufs;
+ struct io_mapped_ubuf **user_bufs;
+
+ struct io_submit_state submit_state;
+ struct list_head timeout_list;
+ struct list_head ltimeout_list;
+ struct list_head cq_overflow_list;
+ struct list_head *io_buffers;
+ struct list_head io_buffers_cache;
+ struct list_head apoll_cache;
+ struct xarray personalities;
+ u32 pers_next;
+ unsigned sq_thread_idle;
+ } ____cacheline_aligned_in_smp;
+
+ /* IRQ completion list, under ->completion_lock */
+ struct io_wq_work_list locked_free_list;
+ unsigned int locked_free_nr;
+
+ const struct cred *sq_creds; /* cred used for __io_sq_thread() */
+ struct io_sq_data *sq_data; /* if using sq thread polling */
+
+ struct wait_queue_head sqo_sq_wait;
+ struct list_head sqd_list;
+
+ unsigned long check_cq_overflow;
+
+ struct {
+ unsigned cached_cq_tail;
+ unsigned cq_entries;
+ struct io_ev_fd __rcu *io_ev_fd;
+ struct wait_queue_head cq_wait;
+ unsigned cq_extra;
+ atomic_t cq_timeouts;
+ unsigned cq_last_tm_flush;
+ } ____cacheline_aligned_in_smp;
+
+ struct {
+ spinlock_t completion_lock;
+
+ spinlock_t timeout_lock;
+
+ /*
+ * ->iopoll_list is protected by the ctx->uring_lock for
+ * io_uring instances that don't use IORING_SETUP_SQPOLL.
+ * For SQPOLL, only the single threaded io_sq_thread() will
+ * manipulate the list, hence no extra locking is needed there.
+ */
+ struct io_wq_work_list iopoll_list;
+ struct hlist_head *cancel_hash;
+ unsigned cancel_hash_bits;
+ bool poll_multi_queue;
+
+ struct list_head io_buffers_comp;
+ } ____cacheline_aligned_in_smp;
+
+ struct io_restriction restrictions;
+
+ /* slow path rsrc auxiliary data, used by update/register */
+ struct {
+ struct io_rsrc_node *rsrc_backup_node;
+ struct io_mapped_ubuf *dummy_ubuf;
+ struct io_rsrc_data *file_data;
+ struct io_rsrc_data *buf_data;
+
+ struct delayed_work rsrc_put_work;
+ struct llist_head rsrc_put_llist;
+ struct list_head rsrc_ref_list;
+ spinlock_t rsrc_ref_lock;
+
+ struct list_head io_buffers_pages;
+ };
+
+ /* Keep this last, we don't need it for the fast path */
+ struct {
+ #if defined(CONFIG_UNIX)
+ struct socket *ring_sock;
+ #endif
+ /* hashed buffered write serialization */
+ struct io_wq_hash *hash_map;
+
+ /* Only used for accounting purposes */
+ struct user_struct *user;
+ struct mm_struct *mm_account;
+
+ /* ctx exit and cancelation */
+ struct llist_head fallback_llist;
+ struct delayed_work fallback_work;
+ struct work_struct exit_work;
+ struct list_head tctx_list;
+ struct completion ref_comp;
+ u32 iowq_limits[2];
+ bool iowq_limits_set;
+ };
+};
+
+/*
+ * Arbitrary limit, can be raised if need be
+ */
+#define IO_RINGFD_REG_MAX 16
+
+struct io_uring_task {
+ /* submission side */
+ int cached_refs;
+ struct xarray xa;
+ struct wait_queue_head wait;
+ const struct io_ring_ctx *last;
+ struct io_wq *io_wq;
+ struct percpu_counter inflight;
+ atomic_t inflight_tracked;
+ atomic_t in_idle;
+
+ spinlock_t task_lock;
+ struct io_wq_work_list task_list;
+ struct io_wq_work_list prior_task_list;
+ struct callback_head task_work;
+ struct file **registered_rings;
+ bool task_running;
+};
+
+/*
+ * First field must be the file pointer in all the
+ * iocb unions! See also 'struct kiocb' in <linux/fs.h>
+ */
+struct io_poll_iocb {
+ struct file *file;
+ struct wait_queue_head *head;
+ __poll_t events;
+ struct wait_queue_entry wait;
+};
+
+struct io_poll_update {
+ struct file *file;
+ u64 old_user_data;
+ u64 new_user_data;
+ __poll_t events;
+ bool update_events;
+ bool update_user_data;
+};
+
+struct io_close {
+ struct file *file;
+ int fd;
+ u32 file_slot;
+};
+
+struct io_timeout_data {
+ struct io_kiocb *req;
+ struct hrtimer timer;
+ struct timespec64 ts;
+ enum hrtimer_mode mode;
+ u32 flags;
+};
+
+struct io_accept {
+ struct file *file;
+ struct sockaddr __user *addr;
+ int __user *addr_len;
+ int flags;
+ u32 file_slot;
+ unsigned long nofile;
+};
+
+struct io_sync {
+ struct file *file;
+ loff_t len;
+ loff_t off;
+ int flags;
+ int mode;
+};
+
+struct io_cancel {
+ struct file *file;
+ u64 addr;
+};
+
+struct io_timeout {
+ struct file *file;
+ u32 off;
+ u32 target_seq;
+ struct list_head list;
+ /* head of the link, used by linked timeouts only */
+ struct io_kiocb *head;
+ /* for linked completions */
+ struct io_kiocb *prev;
+};
+
+struct io_timeout_rem {
+ struct file *file;
+ u64 addr;
+
+ /* timeout update */
+ struct timespec64 ts;
+ u32 flags;
+ bool ltimeout;
+};
+
+struct io_rw {
+ /* NOTE: kiocb has the file as the first member, so don't do it here */
+ struct kiocb kiocb;
+ u64 addr;
+ u32 len;
+ u32 flags;
+};
+
+struct io_connect {
+ struct file *file;
+ struct sockaddr __user *addr;
+ int addr_len;
+};
+
+struct io_sr_msg {
+ struct file *file;
+ union {
+ struct compat_msghdr __user *umsg_compat;
+ struct user_msghdr __user *umsg;
+ void __user *buf;
+ };
+ int msg_flags;
+ int bgid;
+ size_t len;
+ size_t done_io;
+};
+
+struct io_open {
+ struct file *file;
+ int dfd;
+ u32 file_slot;
+ struct filename *filename;
+ struct open_how how;
+ unsigned long nofile;
+};
+
+struct io_rsrc_update {
+ struct file *file;
+ u64 arg;
+ u32 nr_args;
+ u32 offset;
+};
+
+struct io_fadvise {
+ struct file *file;
+ u64 offset;
+ u32 len;
+ u32 advice;
+};
+
+struct io_madvise {
+ struct file *file;
+ u64 addr;
+ u32 len;
+ u32 advice;
+};
+
+struct io_epoll {
+ struct file *file;
+ int epfd;
+ int op;
+ int fd;
+ struct epoll_event event;
+};
+
+struct io_splice {
+ struct file *file_out;
+ loff_t off_out;
+ loff_t off_in;
+ u64 len;
+ int splice_fd_in;
+ unsigned int flags;
+};
+
+struct io_provide_buf {
+ struct file *file;
+ __u64 addr;
+ __u32 len;
+ __u32 bgid;
+ __u16 nbufs;
+ __u16 bid;
+};
+
+struct io_statx {
+ struct file *file;
+ int dfd;
+ unsigned int mask;
+ unsigned int flags;
+ struct filename *filename;
+ struct statx __user *buffer;
+};
+
+struct io_shutdown {
+ struct file *file;
+ int how;
+};
+
+struct io_rename {
+ struct file *file;
+ int old_dfd;
+ int new_dfd;
+ struct filename *oldpath;
+ struct filename *newpath;
+ int flags;
+};
+
+struct io_unlink {
+ struct file *file;
+ int dfd;
+ int flags;
+ struct filename *filename;
+};
+
+struct io_mkdir {
+ struct file *file;
+ int dfd;
+ umode_t mode;
+ struct filename *filename;
+};
+
+struct io_symlink {
+ struct file *file;
+ int new_dfd;
+ struct filename *oldpath;
+ struct filename *newpath;
+};
+
+struct io_hardlink {
+ struct file *file;
+ int old_dfd;
+ int new_dfd;
+ struct filename *oldpath;
+ struct filename *newpath;
+ int flags;
+};
+
+struct io_msg {
+ struct file *file;
+ u64 user_data;
+ u32 len;
+};
+
+struct io_async_connect {
+ struct sockaddr_storage address;
+};
+
+struct io_async_msghdr {
+ struct iovec fast_iov[UIO_FASTIOV];
+ /* points to an allocated iov, if NULL we use fast_iov instead */
+ struct iovec *free_iov;
+ struct sockaddr __user *uaddr;
+ struct msghdr msg;
+ struct sockaddr_storage addr;
+};
+
+struct io_rw_state {
+ struct iov_iter iter;
+ struct iov_iter_state iter_state;
+ struct iovec fast_iov[UIO_FASTIOV];
+};
+
+struct io_async_rw {
+ struct io_rw_state s;
+ const struct iovec *free_iovec;
+ size_t bytes_done;
+ struct wait_page_queue wpq;
+};
+
+enum {
+ REQ_F_FIXED_FILE_BIT = IOSQE_FIXED_FILE_BIT,
+ REQ_F_IO_DRAIN_BIT = IOSQE_IO_DRAIN_BIT,
+ REQ_F_LINK_BIT = IOSQE_IO_LINK_BIT,
+ REQ_F_HARDLINK_BIT = IOSQE_IO_HARDLINK_BIT,
+ REQ_F_FORCE_ASYNC_BIT = IOSQE_ASYNC_BIT,
+ REQ_F_BUFFER_SELECT_BIT = IOSQE_BUFFER_SELECT_BIT,
+ REQ_F_CQE_SKIP_BIT = IOSQE_CQE_SKIP_SUCCESS_BIT,
+
+ /* first byte is taken by user flags, shift it to not overlap */
+ REQ_F_FAIL_BIT = 8,
+ REQ_F_INFLIGHT_BIT,
+ REQ_F_CUR_POS_BIT,
+ REQ_F_NOWAIT_BIT,
+ REQ_F_LINK_TIMEOUT_BIT,
+ REQ_F_NEED_CLEANUP_BIT,
+ REQ_F_POLLED_BIT,
+ REQ_F_BUFFER_SELECTED_BIT,
+ REQ_F_COMPLETE_INLINE_BIT,
+ REQ_F_REISSUE_BIT,
+ REQ_F_CREDS_BIT,
+ REQ_F_REFCOUNT_BIT,
+ REQ_F_ARM_LTIMEOUT_BIT,
+ REQ_F_ASYNC_DATA_BIT,
+ REQ_F_SKIP_LINK_CQES_BIT,
+ REQ_F_SINGLE_POLL_BIT,
+ REQ_F_DOUBLE_POLL_BIT,
+ REQ_F_PARTIAL_IO_BIT,
+ /* keep async read/write and isreg together and in order */
+ REQ_F_SUPPORT_NOWAIT_BIT,
+ REQ_F_ISREG_BIT,
+
+ /* not a real bit, just to check we're not overflowing the space */
+ __REQ_F_LAST_BIT,
+};
+
+enum {
+ /* ctx owns file */
+ REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT),
+ /* drain existing IO first */
+ REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT),
+ /* linked sqes */
+ REQ_F_LINK = BIT(REQ_F_LINK_BIT),
+ /* doesn't sever on completion < 0 */
+ REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT),
+ /* IOSQE_ASYNC */
+ REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT),
+ /* IOSQE_BUFFER_SELECT */
+ REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT),
+ /* IOSQE_CQE_SKIP_SUCCESS */
+ REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT),
+
+ /* fail rest of links */
+ REQ_F_FAIL = BIT(REQ_F_FAIL_BIT),
+ /* on inflight list, should be cancelled and waited on exit reliably */
+ REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT),
+ /* read/write uses file position */
+ REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT),
+ /* must not punt to workers */
+ REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT),
+ /* has or had linked timeout */
+ REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT),
+ /* needs cleanup */
+ REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT),
+ /* already went through poll handler */
+ REQ_F_POLLED = BIT(REQ_F_POLLED_BIT),
+ /* buffer already selected */
+ REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT),
+ /* completion is deferred through io_comp_state */
+ REQ_F_COMPLETE_INLINE = BIT(REQ_F_COMPLETE_INLINE_BIT),
+ /* caller should reissue async */
+ REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT),
+ /* supports async reads/writes */
+ REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT),
+ /* regular file */
+ REQ_F_ISREG = BIT(REQ_F_ISREG_BIT),
+ /* has creds assigned */
+ REQ_F_CREDS = BIT(REQ_F_CREDS_BIT),
+ /* skip refcounting if not set */
+ REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT),
+ /* there is a linked timeout that has to be armed */
+ REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT),
+ /* ->async_data allocated */
+ REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT),
+ /* don't post CQEs while failing linked requests */
+ REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT),
+ /* single poll may be active */
+ REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT),
+ /* double poll may be active */
+ REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT),
+ /* request has already done partial IO */
+ REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT),
+};
+
+struct async_poll {
+ struct io_poll_iocb poll;
+ struct io_poll_iocb *double_poll;
+};
+
+typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
+
+struct io_task_work {
+ union {
+ struct io_wq_work_node node;
+ struct llist_node fallback_node;
+ };
+ io_req_tw_func_t func;
+};
+
+enum {
+ IORING_RSRC_FILE = 0,
+ IORING_RSRC_BUFFER = 1,
+};
+
+/*
+ * NOTE! Each of the iocb union members has the file pointer
+ * as the first entry in their struct definition. So you can
+ * access the file pointer through any of the sub-structs,
+ * or directly as just 'file' in this struct.
+ */
+struct io_kiocb {
+ union {
+ struct file *file;
+ struct io_rw rw;
+ struct io_poll_iocb poll;
+ struct io_poll_update poll_update;
+ struct io_accept accept;
+ struct io_sync sync;
+ struct io_cancel cancel;
+ struct io_timeout timeout;
+ struct io_timeout_rem timeout_rem;
+ struct io_connect connect;
+ struct io_sr_msg sr_msg;
+ struct io_open open;
+ struct io_close close;
+ struct io_rsrc_update rsrc_update;
+ struct io_fadvise fadvise;
+ struct io_madvise madvise;
+ struct io_epoll epoll;
+ struct io_splice splice;
+ struct io_provide_buf pbuf;
+ struct io_statx statx;
+ struct io_shutdown shutdown;
+ struct io_rename rename;
+ struct io_unlink unlink;
+ struct io_mkdir mkdir;
+ struct io_symlink symlink;
+ struct io_hardlink hardlink;
+ struct io_msg msg;
+ };
+
+ u8 opcode;
+ /* polled IO has completed */
+ u8 iopoll_completed;
+ u16 buf_index;
+ unsigned int flags;
+
+ u64 user_data;
+ u32 result;
+ /* fd initially, then cflags for completion */
+ union {
+ u32 cflags;
+ int fd;
+ };
+
+ struct io_ring_ctx *ctx;
+ struct task_struct *task;
+
+ struct percpu_ref *fixed_rsrc_refs;
+ /* store used ubuf, so we can prevent reloading */
+ struct io_mapped_ubuf *imu;
+
+ union {
+ /* used by request caches, completion batching and iopoll */
+ struct io_wq_work_node comp_list;
+ /* cache ->apoll->events */
+ __poll_t apoll_events;
+ };
+ atomic_t refs;
+ atomic_t poll_refs;
+ struct io_task_work io_task_work;
+ /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
+ struct hlist_node hash_node;
+ /* internal polling, see IORING_FEAT_FAST_POLL */
+ struct async_poll *apoll;
+ /* opcode allocated if it needs to store data for async defer */
+ void *async_data;
+ /* stores selected buf, valid IFF REQ_F_BUFFER_SELECTED is set */
+ struct io_buffer *kbuf;
+ /* linked requests, IFF REQ_F_HARDLINK or REQ_F_LINK are set */
+ struct io_kiocb *link;
+ /* custom credentials, valid IFF REQ_F_CREDS is set */
+ const struct cred *creds;
+ struct io_wq_work work;
+};
+
+struct io_tctx_node {
+ struct list_head ctx_node;
+ struct task_struct *task;
+ struct io_ring_ctx *ctx;
+};
+
+struct io_defer_entry {
+ struct list_head list;
+ struct io_kiocb *req;
+ u32 seq;
+};
+
+struct io_op_def {
+ /* needs req->file assigned */
+ unsigned needs_file : 1;
+ /* should block plug */
+ unsigned plug : 1;
+ /* hash wq insertion if file is a regular file */
+ unsigned hash_reg_file : 1;
+ /* unbound wq insertion if file is a non-regular file */
+ unsigned unbound_nonreg_file : 1;
+ /* set if opcode supports polled "wait" */
+ unsigned pollin : 1;
+ unsigned pollout : 1;
+ unsigned poll_exclusive : 1;
+ /* op supports buffer selection */
+ unsigned buffer_select : 1;
+ /* do async prep if it is going to be punted */
+ unsigned needs_async_setup : 1;
+ /* opcode is not supported by this kernel */
+ unsigned not_supported : 1;
+ /* skip auditing */
+ unsigned audit_skip : 1;
+ /* size of async data needed, if any */
+ unsigned short async_size;
+};
+
+static const struct io_op_def io_op_defs[] = {
+ [IORING_OP_NOP] = {},
+ [IORING_OP_READV] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .needs_async_setup = 1,
+ .plug = 1,
+ .audit_skip = 1,
+ .async_size = sizeof(struct io_async_rw),
+ },
+ [IORING_OP_WRITEV] = {
+ .needs_file = 1,
+ .hash_reg_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .needs_async_setup = 1,
+ .plug = 1,
+ .audit_skip = 1,
+ .async_size = sizeof(struct io_async_rw),
+ },
+ [IORING_OP_FSYNC] = {
+ .needs_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_READ_FIXED] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .plug = 1,
+ .audit_skip = 1,
+ .async_size = sizeof(struct io_async_rw),
+ },
+ [IORING_OP_WRITE_FIXED] = {
+ .needs_file = 1,
+ .hash_reg_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .plug = 1,
+ .audit_skip = 1,
+ .async_size = sizeof(struct io_async_rw),
+ },
+ [IORING_OP_POLL_ADD] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_POLL_REMOVE] = {
+ .audit_skip = 1,
+ },
+ [IORING_OP_SYNC_FILE_RANGE] = {
+ .needs_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_SENDMSG] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .needs_async_setup = 1,
+ .async_size = sizeof(struct io_async_msghdr),
+ },
+ [IORING_OP_RECVMSG] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .needs_async_setup = 1,
+ .async_size = sizeof(struct io_async_msghdr),
+ },
+ [IORING_OP_TIMEOUT] = {
+ .audit_skip = 1,
+ .async_size = sizeof(struct io_timeout_data),
+ },
+ [IORING_OP_TIMEOUT_REMOVE] = {
+ /* used by timeout updates' prep() */
+ .audit_skip = 1,
+ },
+ [IORING_OP_ACCEPT] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .poll_exclusive = 1,
+ },
+ [IORING_OP_ASYNC_CANCEL] = {
+ .audit_skip = 1,
+ },
+ [IORING_OP_LINK_TIMEOUT] = {
+ .audit_skip = 1,
+ .async_size = sizeof(struct io_timeout_data),
+ },
+ [IORING_OP_CONNECT] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .needs_async_setup = 1,
+ .async_size = sizeof(struct io_async_connect),
+ },
+ [IORING_OP_FALLOCATE] = {
+ .needs_file = 1,
+ },
+ [IORING_OP_OPENAT] = {},
+ [IORING_OP_CLOSE] = {},
+ [IORING_OP_FILES_UPDATE] = {
+ .audit_skip = 1,
+ },
+ [IORING_OP_STATX] = {
+ .audit_skip = 1,
+ },
+ [IORING_OP_READ] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .plug = 1,
+ .audit_skip = 1,
+ .async_size = sizeof(struct io_async_rw),
+ },
+ [IORING_OP_WRITE] = {
+ .needs_file = 1,
+ .hash_reg_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .plug = 1,
+ .audit_skip = 1,
+ .async_size = sizeof(struct io_async_rw),
+ },
+ [IORING_OP_FADVISE] = {
+ .needs_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_MADVISE] = {},
+ [IORING_OP_SEND] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_RECV] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_OPENAT2] = {
+ },
+ [IORING_OP_EPOLL_CTL] = {
+ .unbound_nonreg_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_SPLICE] = {
+ .needs_file = 1,
+ .hash_reg_file = 1,
+ .unbound_nonreg_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_PROVIDE_BUFFERS] = {
+ .audit_skip = 1,
+ },
+ [IORING_OP_REMOVE_BUFFERS] = {
+ .audit_skip = 1,
+ },
+ [IORING_OP_TEE] = {
+ .needs_file = 1,
+ .hash_reg_file = 1,
+ .unbound_nonreg_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_SHUTDOWN] = {
+ .needs_file = 1,
+ },
+ [IORING_OP_RENAMEAT] = {},
+ [IORING_OP_UNLINKAT] = {},
+ [IORING_OP_MKDIRAT] = {},
+ [IORING_OP_SYMLINKAT] = {},
+ [IORING_OP_LINKAT] = {},
+ [IORING_OP_MSG_RING] = {
+ .needs_file = 1,
+ },
+};
+
+/* requests with any of those set should undergo io_disarm_next() */
+#define IO_DISARM_MASK (REQ_F_ARM_LTIMEOUT | REQ_F_LINK_TIMEOUT | REQ_F_FAIL)
+
+static bool io_disarm_next(struct io_kiocb *req);
+static void io_uring_del_tctx_node(unsigned long index);
+static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
+ struct task_struct *task,
+ bool cancel_all);
+static void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd);
+
+static void io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags);
+
+static void io_put_req(struct io_kiocb *req);
+static void io_put_req_deferred(struct io_kiocb *req);
+static void io_dismantle_req(struct io_kiocb *req);
+static void io_queue_linked_timeout(struct io_kiocb *req);
+static int __io_register_rsrc_update(struct io_ring_ctx *ctx, unsigned type,
+ struct io_uring_rsrc_update2 *up,
+ unsigned nr_args);
+static void io_clean_op(struct io_kiocb *req);
+static inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
+ unsigned issue_flags);
+static inline struct file *io_file_get_normal(struct io_kiocb *req, int fd);
+static void __io_queue_sqe(struct io_kiocb *req);
+static void io_rsrc_put_work(struct work_struct *work);
+
+static void io_req_task_queue(struct io_kiocb *req);
+static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
+static int io_req_prep_async(struct io_kiocb *req);
+
+static int io_install_fixed_file(struct io_kiocb *req, struct file *file,
+ unsigned int issue_flags, u32 slot_index);
+static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
+
+static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
+static void io_eventfd_signal(struct io_ring_ctx *ctx);
+
+static struct kmem_cache *req_cachep;
+
+static const struct file_operations io_uring_fops;
+
+struct sock *io_uring_get_socket(struct file *file)
+{
+#if defined(CONFIG_UNIX)
+ if (file->f_op == &io_uring_fops) {
+ struct io_ring_ctx *ctx = file->private_data;
+
+ return ctx->ring_sock->sk;
+ }
+#endif
+ return NULL;
+}
+EXPORT_SYMBOL(io_uring_get_socket);
+
+static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
+{
+ if (!*locked) {
+ mutex_lock(&ctx->uring_lock);
+ *locked = true;
+ }
+}
+
+#define io_for_each_link(pos, head) \
+ for (pos = (head); pos; pos = pos->link)
+
+/*
+ * Shamelessly stolen from the mm implementation of page reference checking,
+ * see commit f958d7b528b1 for details.
+ */
+#define req_ref_zero_or_close_to_overflow(req) \
+ ((unsigned int) atomic_read(&(req->refs)) + 127u <= 127u)
+
+static inline bool req_ref_inc_not_zero(struct io_kiocb *req)
+{
+ WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT));
+ return atomic_inc_not_zero(&req->refs);
+}
+
+static inline bool req_ref_put_and_test(struct io_kiocb *req)
+{
+ if (likely(!(req->flags & REQ_F_REFCOUNT)))
+ return true;
+
+ WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req));
+ return atomic_dec_and_test(&req->refs);
+}
+
+static inline void req_ref_get(struct io_kiocb *req)
+{
+ WARN_ON_ONCE(!(req->flags & REQ_F_REFCOUNT));
+ WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req));
+ atomic_inc(&req->refs);
+}
+
+static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
+{
+ if (!wq_list_empty(&ctx->submit_state.compl_reqs))
+ __io_submit_flush_completions(ctx);
+}
+
+static inline void __io_req_set_refcount(struct io_kiocb *req, int nr)
+{
+ if (!(req->flags & REQ_F_REFCOUNT)) {
+ req->flags |= REQ_F_REFCOUNT;
+ atomic_set(&req->refs, nr);
+ }
+}
+
+static inline void io_req_set_refcount(struct io_kiocb *req)
+{
+ __io_req_set_refcount(req, 1);
+}
+
+#define IO_RSRC_REF_BATCH 100
+
+static inline void io_req_put_rsrc_locked(struct io_kiocb *req,
+ struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ struct percpu_ref *ref = req->fixed_rsrc_refs;
+
+ if (ref) {
+ if (ref == &ctx->rsrc_node->refs)
+ ctx->rsrc_cached_refs++;
+ else
+ percpu_ref_put(ref);
+ }
+}
+
+static inline void io_req_put_rsrc(struct io_kiocb *req, struct io_ring_ctx *ctx)
+{
+ if (req->fixed_rsrc_refs)
+ percpu_ref_put(req->fixed_rsrc_refs);
+}
+
+static __cold void io_rsrc_refs_drop(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ if (ctx->rsrc_cached_refs) {
+ percpu_ref_put_many(&ctx->rsrc_node->refs, ctx->rsrc_cached_refs);
+ ctx->rsrc_cached_refs = 0;
+ }
+}
+
+static void io_rsrc_refs_refill(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ ctx->rsrc_cached_refs += IO_RSRC_REF_BATCH;
+ percpu_ref_get_many(&ctx->rsrc_node->refs, IO_RSRC_REF_BATCH);
+}
+
+static inline void io_req_set_rsrc_node(struct io_kiocb *req,
+ struct io_ring_ctx *ctx,
+ unsigned int issue_flags)
+{
+ if (!req->fixed_rsrc_refs) {
+ req->fixed_rsrc_refs = &ctx->rsrc_node->refs;
+
+ if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+ lockdep_assert_held(&ctx->uring_lock);
+ ctx->rsrc_cached_refs--;
+ if (unlikely(ctx->rsrc_cached_refs < 0))
+ io_rsrc_refs_refill(ctx);
+ } else {
+ percpu_ref_get(req->fixed_rsrc_refs);
+ }
+ }
+}
+
+static unsigned int __io_put_kbuf(struct io_kiocb *req, struct list_head *list)
+{
+ struct io_buffer *kbuf = req->kbuf;
+ unsigned int cflags;
+
+ cflags = IORING_CQE_F_BUFFER | (kbuf->bid << IORING_CQE_BUFFER_SHIFT);
+ req->flags &= ~REQ_F_BUFFER_SELECTED;
+ list_add(&kbuf->list, list);
+ req->kbuf = NULL;
+ return cflags;
+}
+
+static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
+{
+ lockdep_assert_held(&req->ctx->completion_lock);
+
+ if (likely(!(req->flags & REQ_F_BUFFER_SELECTED)))
+ return 0;
+ return __io_put_kbuf(req, &req->ctx->io_buffers_comp);
+}
+
+static inline unsigned int io_put_kbuf(struct io_kiocb *req,
+ unsigned issue_flags)
+{
+ unsigned int cflags;
+
+ if (likely(!(req->flags & REQ_F_BUFFER_SELECTED)))
+ return 0;
+
+ /*
+ * We can add this buffer back to two lists:
+ *
+ * 1) The io_buffers_cache list. This one is protected by the
+ * ctx->uring_lock. If we already hold this lock, add back to this
+ * list as we can grab it from issue as well.
+ * 2) The io_buffers_comp list. This one is protected by the
+ * ctx->completion_lock.
+ *
+ * We migrate buffers from the comp_list to the issue cache list
+ * when we need one.
+ */
+ if (issue_flags & IO_URING_F_UNLOCKED) {
+ struct io_ring_ctx *ctx = req->ctx;
+
+ spin_lock(&ctx->completion_lock);
+ cflags = __io_put_kbuf(req, &ctx->io_buffers_comp);
+ spin_unlock(&ctx->completion_lock);
+ } else {
+ lockdep_assert_held(&req->ctx->uring_lock);
+
+ cflags = __io_put_kbuf(req, &req->ctx->io_buffers_cache);
+ }
+
+ return cflags;
+}
+
+static struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
+ unsigned int bgid)
+{
+ struct list_head *hash_list;
+ struct io_buffer_list *bl;
+
+ hash_list = &ctx->io_buffers[hash_32(bgid, IO_BUFFERS_HASH_BITS)];
+ list_for_each_entry(bl, hash_list, list)
+ if (bl->bgid == bgid || bgid == -1U)
+ return bl;
+
+ return NULL;
+}
+
+static void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_buffer_list *bl;
+ struct io_buffer *buf;
+
+ if (likely(!(req->flags & REQ_F_BUFFER_SELECTED)))
+ return;
+ /* don't recycle if we already did IO to this buffer */
+ if (req->flags & REQ_F_PARTIAL_IO)
+ return;
+
+ if (issue_flags & IO_URING_F_UNLOCKED)
+ mutex_lock(&ctx->uring_lock);
+
+ lockdep_assert_held(&ctx->uring_lock);
+
+ buf = req->kbuf;
+ bl = io_buffer_get_list(ctx, buf->bgid);
+ list_add(&buf->list, &bl->buf_list);
+ req->flags &= ~REQ_F_BUFFER_SELECTED;
+ req->kbuf = NULL;
+
+ if (issue_flags & IO_URING_F_UNLOCKED)
+ mutex_unlock(&ctx->uring_lock);
+}
+
+static bool io_match_task(struct io_kiocb *head, struct task_struct *task,
+ bool cancel_all)
+ __must_hold(&req->ctx->timeout_lock)
+{
+ struct io_kiocb *req;
+
+ if (task && head->task != task)
+ return false;
+ if (cancel_all)
+ return true;
+
+ io_for_each_link(req, head) {
+ if (req->flags & REQ_F_INFLIGHT)
+ return true;
+ }
+ return false;
+}
+
+static bool io_match_linked(struct io_kiocb *head)
+{
+ struct io_kiocb *req;
+
+ io_for_each_link(req, head) {
+ if (req->flags & REQ_F_INFLIGHT)
+ return true;
+ }
+ return false;
+}
+
+/*
+ * As io_match_task() but protected against racing with linked timeouts.
+ * User must not hold timeout_lock.
+ */
+static bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
+ bool cancel_all)
+{
+ bool matched;
+
+ if (task && head->task != task)
+ return false;
+ if (cancel_all)
+ return true;
+
+ if (head->flags & REQ_F_LINK_TIMEOUT) {
+ struct io_ring_ctx *ctx = head->ctx;
+
+ /* protect against races with linked timeouts */
+ spin_lock_irq(&ctx->timeout_lock);
+ matched = io_match_linked(head);
+ spin_unlock_irq(&ctx->timeout_lock);
+ } else {
+ matched = io_match_linked(head);
+ }
+ return matched;
+}
+
+static inline bool req_has_async_data(struct io_kiocb *req)
+{
+ return req->flags & REQ_F_ASYNC_DATA;
+}
+
+static inline void req_set_fail(struct io_kiocb *req)
+{
+ req->flags |= REQ_F_FAIL;
+ if (req->flags & REQ_F_CQE_SKIP) {
+ req->flags &= ~REQ_F_CQE_SKIP;
+ req->flags |= REQ_F_SKIP_LINK_CQES;
+ }
+}
+
+static inline void req_fail_link_node(struct io_kiocb *req, int res)
+{
+ req_set_fail(req);
+ req->result = res;
+}
+
+static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
+{
+ struct io_ring_ctx *ctx = container_of(ref, struct io_ring_ctx, refs);
+
+ complete(&ctx->ref_comp);
+}
+
+static inline bool io_is_timeout_noseq(struct io_kiocb *req)
+{
+ return !req->timeout.off;
+}
+
+static __cold void io_fallback_req_func(struct work_struct *work)
+{
+ struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
+ fallback_work.work);
+ struct llist_node *node = llist_del_all(&ctx->fallback_llist);
+ struct io_kiocb *req, *tmp;
+ bool locked = false;
+
+ percpu_ref_get(&ctx->refs);
+ llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
+ req->io_task_work.func(req, &locked);
+
+ if (locked) {
+ io_submit_flush_completions(ctx);
+ mutex_unlock(&ctx->uring_lock);
+ }
+ percpu_ref_put(&ctx->refs);
+}
+
+static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
+{
+ struct io_ring_ctx *ctx;
+ int i, hash_bits;
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return NULL;
+
+ /*
+ * Use 5 bits less than the max cq entries, that should give us around
+ * 32 entries per hash list if totally full and uniformly spread.
+ */
+ hash_bits = ilog2(p->cq_entries);
+ hash_bits -= 5;
+ if (hash_bits <= 0)
+ hash_bits = 1;
+ ctx->cancel_hash_bits = hash_bits;
+ ctx->cancel_hash = kmalloc((1U << hash_bits) * sizeof(struct hlist_head),
+ GFP_KERNEL);
+ if (!ctx->cancel_hash)
+ goto err;
+ __hash_init(ctx->cancel_hash, 1U << hash_bits);
+
+ ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
+ if (!ctx->dummy_ubuf)
+ goto err;
+ /* set invalid range, so io_import_fixed() fails meeting it */
+ ctx->dummy_ubuf->ubuf = -1UL;
+
+ ctx->io_buffers = kcalloc(1U << IO_BUFFERS_HASH_BITS,
+ sizeof(struct list_head), GFP_KERNEL);
+ if (!ctx->io_buffers)
+ goto err;
+ for (i = 0; i < (1U << IO_BUFFERS_HASH_BITS); i++)
+ INIT_LIST_HEAD(&ctx->io_buffers[i]);
+
+ if (percpu_ref_init(&ctx->refs, io_ring_ctx_ref_free,
+ 0, GFP_KERNEL))
+ goto err;
+
+ ctx->flags = p->flags;
+ init_waitqueue_head(&ctx->sqo_sq_wait);
+ INIT_LIST_HEAD(&ctx->sqd_list);
+ INIT_LIST_HEAD(&ctx->cq_overflow_list);
+ INIT_LIST_HEAD(&ctx->io_buffers_cache);
+ INIT_LIST_HEAD(&ctx->apoll_cache);
+ init_completion(&ctx->ref_comp);
+ xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1);
+ mutex_init(&ctx->uring_lock);
+ init_waitqueue_head(&ctx->cq_wait);
+ spin_lock_init(&ctx->completion_lock);
+ spin_lock_init(&ctx->timeout_lock);
+ INIT_WQ_LIST(&ctx->iopoll_list);
+ INIT_LIST_HEAD(&ctx->io_buffers_pages);
+ INIT_LIST_HEAD(&ctx->io_buffers_comp);
+ INIT_LIST_HEAD(&ctx->defer_list);
+ INIT_LIST_HEAD(&ctx->timeout_list);
+ INIT_LIST_HEAD(&ctx->ltimeout_list);
+ spin_lock_init(&ctx->rsrc_ref_lock);
+ INIT_LIST_HEAD(&ctx->rsrc_ref_list);
+ INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
+ init_llist_head(&ctx->rsrc_put_llist);
+ INIT_LIST_HEAD(&ctx->tctx_list);
+ ctx->submit_state.free_list.next = NULL;
+ INIT_WQ_LIST(&ctx->locked_free_list);
+ INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
+ INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
+ return ctx;
+err:
+ kfree(ctx->dummy_ubuf);
+ kfree(ctx->cancel_hash);
+ kfree(ctx->io_buffers);
+ kfree(ctx);
+ return NULL;
+}
+
+static void io_account_cq_overflow(struct io_ring_ctx *ctx)
+{
+ struct io_rings *r = ctx->rings;
+
+ WRITE_ONCE(r->cq_overflow, READ_ONCE(r->cq_overflow) + 1);
+ ctx->cq_extra--;
+}
+
+static bool req_need_defer(struct io_kiocb *req, u32 seq)
+{
+ if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
+ struct io_ring_ctx *ctx = req->ctx;
+
+ return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail;
+ }
+
+ return false;
+}
+
+#define FFS_NOWAIT 0x1UL
+#define FFS_ISREG 0x2UL
+#define FFS_MASK ~(FFS_NOWAIT|FFS_ISREG)
+
+static inline bool io_req_ffs_set(struct io_kiocb *req)
+{
+ return req->flags & REQ_F_FIXED_FILE;
+}
+
+static inline void io_req_track_inflight(struct io_kiocb *req)
+{
+ if (!(req->flags & REQ_F_INFLIGHT)) {
+ req->flags |= REQ_F_INFLIGHT;
+ atomic_inc(&req->task->io_uring->inflight_tracked);
+ }
+}
+
+static struct io_kiocb *__io_prep_linked_timeout(struct io_kiocb *req)
+{
+ if (WARN_ON_ONCE(!req->link))
+ return NULL;
+
+ req->flags &= ~REQ_F_ARM_LTIMEOUT;
+ req->flags |= REQ_F_LINK_TIMEOUT;
+
+ /* linked timeouts should have two refs once prep'ed */
+ io_req_set_refcount(req);
+ __io_req_set_refcount(req->link, 2);
+ return req->link;
+}
+
+static inline struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
+{
+ if (likely(!(req->flags & REQ_F_ARM_LTIMEOUT)))
+ return NULL;
+ return __io_prep_linked_timeout(req);
+}
+
+static void io_prep_async_work(struct io_kiocb *req)
+{
+ const struct io_op_def *def = &io_op_defs[req->opcode];
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (!(req->flags & REQ_F_CREDS)) {
+ req->flags |= REQ_F_CREDS;
+ req->creds = get_current_cred();
+ }
+
+ req->work.list.next = NULL;
+ req->work.flags = 0;
+ if (req->flags & REQ_F_FORCE_ASYNC)
+ req->work.flags |= IO_WQ_WORK_CONCURRENT;
+
+ if (req->flags & REQ_F_ISREG) {
+ if (def->hash_reg_file || (ctx->flags & IORING_SETUP_IOPOLL))
+ io_wq_hash_work(&req->work, file_inode(req->file));
+ } else if (!req->file || !S_ISBLK(file_inode(req->file)->i_mode)) {
+ if (def->unbound_nonreg_file)
+ req->work.flags |= IO_WQ_WORK_UNBOUND;
+ }
+}
+
+static void io_prep_async_link(struct io_kiocb *req)
+{
+ struct io_kiocb *cur;
+
+ if (req->flags & REQ_F_LINK_TIMEOUT) {
+ struct io_ring_ctx *ctx = req->ctx;
+
+ spin_lock_irq(&ctx->timeout_lock);
+ io_for_each_link(cur, req)
+ io_prep_async_work(cur);
+ spin_unlock_irq(&ctx->timeout_lock);
+ } else {
+ io_for_each_link(cur, req)
+ io_prep_async_work(cur);
+ }
+}
+
+static inline void io_req_add_compl_list(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_submit_state *state = &ctx->submit_state;
+
+ if (!(req->flags & REQ_F_CQE_SKIP))
+ ctx->submit_state.flush_cqes = true;
+ wq_list_add_tail(&req->comp_list, &state->compl_reqs);
+}
+
+static void io_queue_async_work(struct io_kiocb *req, bool *dont_use)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_kiocb *link = io_prep_linked_timeout(req);
+ struct io_uring_task *tctx = req->task->io_uring;
+
+ BUG_ON(!tctx);
+ BUG_ON(!tctx->io_wq);
+
+ /* init ->work of the whole link before punting */
+ io_prep_async_link(req);
+
+ /*
+ * Not expected to happen, but if we do have a bug where this _can_
+ * happen, catch it here and ensure the request is marked as
+ * canceled. That will make io-wq go through the usual work cancel
+ * procedure rather than attempt to run this request (or create a new
+ * worker for it).
+ */
+ if (WARN_ON_ONCE(!same_thread_group(req->task, current)))
+ req->work.flags |= IO_WQ_WORK_CANCEL;
+
+ trace_io_uring_queue_async_work(ctx, req, req->user_data, req->opcode, req->flags,
+ &req->work, io_wq_is_hashed(&req->work));
+ io_wq_enqueue(tctx->io_wq, &req->work);
+ if (link)
+ io_queue_linked_timeout(link);
+}
+
+static void io_kill_timeout(struct io_kiocb *req, int status)
+ __must_hold(&req->ctx->completion_lock)
+ __must_hold(&req->ctx->timeout_lock)
+{
+ struct io_timeout_data *io = req->async_data;
+
+ if (hrtimer_try_to_cancel(&io->timer) != -1) {
+ if (status)
+ req_set_fail(req);
+ atomic_set(&req->ctx->cq_timeouts,
+ atomic_read(&req->ctx->cq_timeouts) + 1);
+ list_del_init(&req->timeout.list);
+ io_fill_cqe_req(req, status, 0);
+ io_put_req_deferred(req);
+ }
+}
+
+static __cold void io_queue_deferred(struct io_ring_ctx *ctx)
+{
+ while (!list_empty(&ctx->defer_list)) {
+ struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
+ struct io_defer_entry, list);
+
+ if (req_need_defer(de->req, de->seq))
+ break;
+ list_del_init(&de->list);
+ io_req_task_queue(de->req);
+ kfree(de);
+ }
+}
+
+static __cold void io_flush_timeouts(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->completion_lock)
+{
+ u32 seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
+ struct io_kiocb *req, *tmp;
+
+ spin_lock_irq(&ctx->timeout_lock);
+ list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) {
+ u32 events_needed, events_got;
+
+ if (io_is_timeout_noseq(req))
+ break;
+
+ /*
+ * Since seq can easily wrap around over time, subtract
+ * the last seq at which timeouts were flushed before comparing.
+ * Assuming not more than 2^31-1 events have happened since,
+ * these subtractions won't have wrapped, so we can check if
+ * target is in [last_seq, current_seq] by comparing the two.
+ */
+ events_needed = req->timeout.target_seq - ctx->cq_last_tm_flush;
+ events_got = seq - ctx->cq_last_tm_flush;
+ if (events_got < events_needed)
+ break;
+
+ io_kill_timeout(req, 0);
+ }
+ ctx->cq_last_tm_flush = seq;
+ spin_unlock_irq(&ctx->timeout_lock);
+}
+
+static inline void io_commit_cqring(struct io_ring_ctx *ctx)
+{
+ /* order cqe stores with ring update */
+ smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
+}
+
+static void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
+{
+ if (ctx->off_timeout_used || ctx->drain_active) {
+ spin_lock(&ctx->completion_lock);
+ if (ctx->off_timeout_used)
+ io_flush_timeouts(ctx);
+ if (ctx->drain_active)
+ io_queue_deferred(ctx);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ }
+ if (ctx->has_evfd)
+ io_eventfd_signal(ctx);
+}
+
+static inline bool io_sqring_full(struct io_ring_ctx *ctx)
+{
+ struct io_rings *r = ctx->rings;
+
+ return READ_ONCE(r->sq.tail) - ctx->cached_sq_head == ctx->sq_entries;
+}
+
+static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx)
+{
+ return ctx->cached_cq_tail - READ_ONCE(ctx->rings->cq.head);
+}
+
+static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
+{
+ struct io_rings *rings = ctx->rings;
+ unsigned tail, mask = ctx->cq_entries - 1;
+
+ /*
+ * writes to the cq entry need to come after reading head; the
+ * control dependency is enough as we're using WRITE_ONCE to
+ * fill the cq entry
+ */
+ if (__io_cqring_events(ctx) == ctx->cq_entries)
+ return NULL;
+
+ tail = ctx->cached_cq_tail++;
+ return &rings->cqes[tail & mask];
+}
+
+static void io_eventfd_signal(struct io_ring_ctx *ctx)
+{
+ struct io_ev_fd *ev_fd;
+
+ rcu_read_lock();
+ /*
+ * rcu_dereference ctx->io_ev_fd once and use it both for checking
+ * and eventfd_signal
+ */
+ ev_fd = rcu_dereference(ctx->io_ev_fd);
+
+ /*
+ * Check again if ev_fd exists in case an io_eventfd_unregister call
+ * completed between the NULL check of ctx->io_ev_fd at the start of
+ * the function and rcu_read_lock.
+ */
+ if (unlikely(!ev_fd))
+ goto out;
+ if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
+ goto out;
+
+ if (!ev_fd->eventfd_async || io_wq_current_is_worker())
+ eventfd_signal(ev_fd->cq_ev_fd, 1);
+out:
+ rcu_read_unlock();
+}
+
+static inline void io_cqring_wake(struct io_ring_ctx *ctx)
+{
+ /*
+ * wake_up_all() may seem excessive, but io_wake_function() and
+ * io_should_wake() handle the termination of the loop and only
+ * wake as many waiters as we need to.
+ */
+ if (wq_has_sleeper(&ctx->cq_wait))
+ wake_up_all(&ctx->cq_wait);
+}
+
+/*
+ * This should only get called when at least one event has been posted.
+ * Some applications rely on the eventfd notification count only changing
+ * IFF a new CQE has been added to the CQ ring. There's no dependency on
+ * a 1:1 relationship between how many times this function is called (and
+ * hence the eventfd count) and the number of CQEs posted to the CQ ring.
+ */
+static inline void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+{
+ if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
+ ctx->has_evfd))
+ __io_commit_cqring_flush(ctx);
+
+ io_cqring_wake(ctx);
+}
+
+static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
+{
+ if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
+ ctx->has_evfd))
+ __io_commit_cqring_flush(ctx);
+
+ if (ctx->flags & IORING_SETUP_SQPOLL)
+ io_cqring_wake(ctx);
+}
+
+/* Returns true if there are no backlogged entries after the flush */
+static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
+{
+ bool all_flushed, posted;
+
+ if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
+ return false;
+
+ posted = false;
+ spin_lock(&ctx->completion_lock);
+ while (!list_empty(&ctx->cq_overflow_list)) {
+ struct io_uring_cqe *cqe = io_get_cqe(ctx);
+ struct io_overflow_cqe *ocqe;
+
+ if (!cqe && !force)
+ break;
+ ocqe = list_first_entry(&ctx->cq_overflow_list,
+ struct io_overflow_cqe, list);
+ if (cqe)
+ memcpy(cqe, &ocqe->cqe, sizeof(*cqe));
+ else
+ io_account_cq_overflow(ctx);
+
+ posted = true;
+ list_del(&ocqe->list);
+ kfree(ocqe);
+ }
+
+ all_flushed = list_empty(&ctx->cq_overflow_list);
+ if (all_flushed) {
+ clear_bit(0, &ctx->check_cq_overflow);
+ WRITE_ONCE(ctx->rings->sq_flags,
+ ctx->rings->sq_flags & ~IORING_SQ_CQ_OVERFLOW);
+ }
+
+ if (posted)
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ if (posted)
+ io_cqring_ev_posted(ctx);
+ return all_flushed;
+}
+
+static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
+{
+ bool ret = true;
+
+ if (test_bit(0, &ctx->check_cq_overflow)) {
+ /* iopoll syncs against uring_lock, not completion_lock */
+ if (ctx->flags & IORING_SETUP_IOPOLL)
+ mutex_lock(&ctx->uring_lock);
+ ret = __io_cqring_overflow_flush(ctx, false);
+ if (ctx->flags & IORING_SETUP_IOPOLL)
+ mutex_unlock(&ctx->uring_lock);
+ }
+
+ return ret;
+}
+
+/* must be called somewhat shortly after putting a request */
+static inline void io_put_task(struct task_struct *task, int nr)
+{
+ struct io_uring_task *tctx = task->io_uring;
+
+ if (likely(task == current)) {
+ tctx->cached_refs += nr;
+ } else {
+ percpu_counter_sub(&tctx->inflight, nr);
+ if (unlikely(atomic_read(&tctx->in_idle)))
+ wake_up(&tctx->wait);
+ put_task_struct_many(task, nr);
+ }
+}
+
+static void io_task_refs_refill(struct io_uring_task *tctx)
+{
+ unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR;
+
+ percpu_counter_add(&tctx->inflight, refill);
+ refcount_add(refill, &current->usage);
+ tctx->cached_refs += refill;
+}
+
+static inline void io_get_task_refs(int nr)
+{
+ struct io_uring_task *tctx = current->io_uring;
+
+ tctx->cached_refs -= nr;
+ if (unlikely(tctx->cached_refs < 0))
+ io_task_refs_refill(tctx);
+}
+
+static __cold void io_uring_drop_tctx_refs(struct task_struct *task)
+{
+ struct io_uring_task *tctx = task->io_uring;
+ unsigned int refs = tctx->cached_refs;
+
+ if (refs) {
+ tctx->cached_refs = 0;
+ percpu_counter_sub(&tctx->inflight, refs);
+ put_task_struct_many(task, refs);
+ }
+}
+
+static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
+ s32 res, u32 cflags)
+{
+ struct io_overflow_cqe *ocqe;
+
+ ocqe = kmalloc(sizeof(*ocqe), GFP_ATOMIC | __GFP_ACCOUNT);
+ if (!ocqe) {
+ /*
+ * If we're in ring overflow flush mode, or in task cancel mode,
+ * or cannot allocate an overflow entry, then we need to drop it
+ * on the floor.
+ */
+ io_account_cq_overflow(ctx);
+ return false;
+ }
+ if (list_empty(&ctx->cq_overflow_list)) {
+ set_bit(0, &ctx->check_cq_overflow);
+ WRITE_ONCE(ctx->rings->sq_flags,
+ ctx->rings->sq_flags | IORING_SQ_CQ_OVERFLOW);
+
+ }
+ ocqe->cqe.user_data = user_data;
+ ocqe->cqe.res = res;
+ ocqe->cqe.flags = cflags;
+ list_add_tail(&ocqe->list, &ctx->cq_overflow_list);
+ return true;
+}
+
+static inline bool __io_fill_cqe(struct io_ring_ctx *ctx, u64 user_data,
+ s32 res, u32 cflags)
+{
+ struct io_uring_cqe *cqe;
+
+ /*
+ * If we can't get a cq entry, userspace overflowed the
+ * submission (by quite a lot). Increment the overflow count in
+ * the ring.
+ */
+ cqe = io_get_cqe(ctx);
+ if (likely(cqe)) {
+ WRITE_ONCE(cqe->user_data, user_data);
+ WRITE_ONCE(cqe->res, res);
+ WRITE_ONCE(cqe->flags, cflags);
+ return true;
+ }
+ return io_cqring_event_overflow(ctx, user_data, res, cflags);
+}
+
+static inline bool __io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
+{
+ trace_io_uring_complete(req->ctx, req, req->user_data, res, cflags);
+ return __io_fill_cqe(req->ctx, req->user_data, res, cflags);
+}
+
+static noinline void io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
+{
+ if (!(req->flags & REQ_F_CQE_SKIP))
+ __io_fill_cqe_req(req, res, cflags);
+}
+
+static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
+ s32 res, u32 cflags)
+{
+ ctx->cq_extra++;
+ trace_io_uring_complete(ctx, NULL, user_data, res, cflags);
+ return __io_fill_cqe(ctx, user_data, res, cflags);
+}
+
+static void __io_req_complete_post(struct io_kiocb *req, s32 res,
+ u32 cflags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (!(req->flags & REQ_F_CQE_SKIP))
+ __io_fill_cqe_req(req, res, cflags);
+ /*
+ * If we're the last reference to this request, add to our locked
+ * free_list cache.
+ */
+ if (req_ref_put_and_test(req)) {
+ if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
+ if (req->flags & IO_DISARM_MASK)
+ io_disarm_next(req);
+ if (req->link) {
+ io_req_task_queue(req->link);
+ req->link = NULL;
+ }
+ }
+ io_req_put_rsrc(req, ctx);
+ /*
+ * Selected buffer deallocation in io_clean_op() assumes that
+ * we don't hold ->completion_lock. Clean them here to avoid
+ * deadlocks.
+ */
+ io_put_kbuf_comp(req);
+ io_dismantle_req(req);
+ io_put_task(req->task, 1);
+ wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
+ ctx->locked_free_nr++;
+ }
+}
+
+static void io_req_complete_post(struct io_kiocb *req, s32 res,
+ u32 cflags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ spin_lock(&ctx->completion_lock);
+ __io_req_complete_post(req, res, cflags);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+}
+
+static inline void io_req_complete_state(struct io_kiocb *req, s32 res,
+ u32 cflags)
+{
+ req->result = res;
+ req->cflags = cflags;
+ req->flags |= REQ_F_COMPLETE_INLINE;
+}
+
+static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
+ s32 res, u32 cflags)
+{
+ if (issue_flags & IO_URING_F_COMPLETE_DEFER)
+ io_req_complete_state(req, res, cflags);
+ else
+ io_req_complete_post(req, res, cflags);
+}
+
+static inline void io_req_complete(struct io_kiocb *req, s32 res)
+{
+ __io_req_complete(req, 0, res, 0);
+}
+
+static void io_req_complete_failed(struct io_kiocb *req, s32 res)
+{
+ req_set_fail(req);
+ io_req_complete_post(req, res, io_put_kbuf(req, IO_URING_F_UNLOCKED));
+}
+
+static void io_req_complete_fail_submit(struct io_kiocb *req)
+{
+ /*
+ * We don't submit; fail them all. For that, replace hardlinks with
+ * normal links. An extra REQ_F_LINK is tolerated.
+ */
+ req->flags &= ~REQ_F_HARDLINK;
+ req->flags |= REQ_F_LINK;
+ io_req_complete_failed(req, req->result);
+}
+
+/*
+ * Don't initialise the fields below on every allocation, but do that in
+ * advance and keep them valid across allocations.
+ */
+static void io_preinit_req(struct io_kiocb *req, struct io_ring_ctx *ctx)
+{
+ req->ctx = ctx;
+ req->link = NULL;
+ req->async_data = NULL;
+ /* not necessary, but safer to zero */
+ req->result = 0;
+}
+
+static void io_flush_cached_locked_reqs(struct io_ring_ctx *ctx,
+ struct io_submit_state *state)
+{
+ spin_lock(&ctx->completion_lock);
+ wq_list_splice(&ctx->locked_free_list, &state->free_list);
+ ctx->locked_free_nr = 0;
+ spin_unlock(&ctx->completion_lock);
+}
+
+/* Returns true IFF there are requests in the cache */
+static bool io_flush_cached_reqs(struct io_ring_ctx *ctx)
+{
+ struct io_submit_state *state = &ctx->submit_state;
+
+ /*
+ * If we have more than a batch's worth of requests in our IRQ side
+ * locked cache, grab the lock and move them over to our submission
+ * side cache.
+ */
+ if (READ_ONCE(ctx->locked_free_nr) > IO_COMPL_BATCH)
+ io_flush_cached_locked_reqs(ctx, state);
+ return !!state->free_list.next;
+}
+
+/*
+ * A request might get retired back into the request caches even before opcode
+ * handlers and io_issue_sqe() are done with it, e.g. inline completion path.
+ * Because of that, io_alloc_req() should be called only under ->uring_lock
+ * and with extra caution to not get a request that is still worked on.
+ */
+static __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_submit_state *state = &ctx->submit_state;
+ gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+ void *reqs[IO_REQ_ALLOC_BATCH];
+ struct io_kiocb *req;
+ int ret, i;
+
+ if (likely(state->free_list.next || io_flush_cached_reqs(ctx)))
+ return true;
+
+ ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
+
+ /*
+ * Bulk alloc is all-or-nothing. If we fail to get a batch,
+ * retry single alloc to be on the safe side.
+ */
+ if (unlikely(ret <= 0)) {
+ reqs[0] = kmem_cache_alloc(req_cachep, gfp);
+ if (!reqs[0])
+ return false;
+ ret = 1;
+ }
+
+ percpu_ref_get_many(&ctx->refs, ret);
+ for (i = 0; i < ret; i++) {
+ req = reqs[i];
+
+ io_preinit_req(req, ctx);
+ wq_stack_add_head(&req->comp_list, &state->free_list);
+ }
+ return true;
+}
+
+static inline bool io_alloc_req_refill(struct io_ring_ctx *ctx)
+{
+ if (unlikely(!ctx->submit_state.free_list.next))
+ return __io_alloc_req_refill(ctx);
+ return true;
+}
+
+static inline struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
+{
+ struct io_wq_work_node *node;
+
+ node = wq_stack_extract(&ctx->submit_state.free_list);
+ return container_of(node, struct io_kiocb, comp_list);
+}
+
+static inline void io_put_file(struct file *file)
+{
+ if (file)
+ fput(file);
+}
+
+static inline void io_dismantle_req(struct io_kiocb *req)
+{
+ unsigned int flags = req->flags;
+
+ if (unlikely(flags & IO_REQ_CLEAN_FLAGS))
+ io_clean_op(req);
+ if (!(flags & REQ_F_FIXED_FILE))
+ io_put_file(req->file);
+}
+
+static __cold void __io_free_req(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ io_req_put_rsrc(req, ctx);
+ io_dismantle_req(req);
+ io_put_task(req->task, 1);
+
+ spin_lock(&ctx->completion_lock);
+ wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
+ ctx->locked_free_nr++;
+ spin_unlock(&ctx->completion_lock);
+}
+
+static inline void io_remove_next_linked(struct io_kiocb *req)
+{
+ struct io_kiocb *nxt = req->link;
+
+ req->link = nxt->link;
+ nxt->link = NULL;
+}
+
+static bool io_kill_linked_timeout(struct io_kiocb *req)
+ __must_hold(&req->ctx->completion_lock)
+ __must_hold(&req->ctx->timeout_lock)
+{
+ struct io_kiocb *link = req->link;
+
+ if (link && link->opcode == IORING_OP_LINK_TIMEOUT) {
+ struct io_timeout_data *io = link->async_data;
+
+ io_remove_next_linked(req);
+ link->timeout.head = NULL;
+ if (hrtimer_try_to_cancel(&io->timer) != -1) {
+ list_del(&link->timeout.list);
+ /* leave REQ_F_CQE_SKIP to io_fill_cqe_req */
+ io_fill_cqe_req(link, -ECANCELED, 0);
+ io_put_req_deferred(link);
+ return true;
+ }
+ }
+ return false;
+}
+
+static void io_fail_links(struct io_kiocb *req)
+ __must_hold(&req->ctx->completion_lock)
+{
+ struct io_kiocb *nxt, *link = req->link;
+ bool ignore_cqes = req->flags & REQ_F_SKIP_LINK_CQES;
+
+ req->link = NULL;
+ while (link) {
+ long res = -ECANCELED;
+
+ if (link->flags & REQ_F_FAIL)
+ res = link->result;
+
+ nxt = link->link;
+ link->link = NULL;
+
+ trace_io_uring_fail_link(req->ctx, req, req->user_data,
+ req->opcode, link);
+
+ if (!ignore_cqes) {
+ link->flags &= ~REQ_F_CQE_SKIP;
+ io_fill_cqe_req(link, res, 0);
+ }
+ io_put_req_deferred(link);
+ link = nxt;
+ }
+}
+
+static bool io_disarm_next(struct io_kiocb *req)
+ __must_hold(&req->ctx->completion_lock)
+{
+ bool posted = false;
+
+ if (req->flags & REQ_F_ARM_LTIMEOUT) {
+ struct io_kiocb *link = req->link;
+
+ req->flags &= ~REQ_F_ARM_LTIMEOUT;
+ if (link && link->opcode == IORING_OP_LINK_TIMEOUT) {
+ io_remove_next_linked(req);
+ /* leave REQ_F_CQE_SKIP to io_fill_cqe_req */
+ io_fill_cqe_req(link, -ECANCELED, 0);
+ io_put_req_deferred(link);
+ posted = true;
+ }
+ } else if (req->flags & REQ_F_LINK_TIMEOUT) {
+ struct io_ring_ctx *ctx = req->ctx;
+
+ spin_lock_irq(&ctx->timeout_lock);
+ posted = io_kill_linked_timeout(req);
+ spin_unlock_irq(&ctx->timeout_lock);
+ }
+ if (unlikely((req->flags & REQ_F_FAIL) &&
+ !(req->flags & REQ_F_HARDLINK))) {
+ posted |= (req->link != NULL);
+ io_fail_links(req);
+ }
+ return posted;
+}
+
+static void __io_req_find_next_prep(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ bool posted;
+
+ spin_lock(&ctx->completion_lock);
+ posted = io_disarm_next(req);
+ if (posted)
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ if (posted)
+ io_cqring_ev_posted(ctx);
+}
+
+static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
+{
+ struct io_kiocb *nxt;
+
+ if (likely(!(req->flags & (REQ_F_LINK|REQ_F_HARDLINK))))
+ return NULL;
+ /*
+ * If LINK is set, we have dependent requests in this chain. If we
+ * didn't fail this request, queue the first one up, moving any other
+ * dependencies to the next request. In case of failure, fail the rest
+ * of the chain.
+ */
+ if (unlikely(req->flags & IO_DISARM_MASK))
+ __io_req_find_next_prep(req);
+ nxt = req->link;
+ req->link = NULL;
+ return nxt;
+}
+
+static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
+{
+ if (!ctx)
+ return;
+ if (*locked) {
+ io_submit_flush_completions(ctx);
+ mutex_unlock(&ctx->uring_lock);
+ *locked = false;
+ }
+ percpu_ref_put(&ctx->refs);
+}
+
+static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
+{
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+}
+
+static void handle_prev_tw_list(struct io_wq_work_node *node,
+ struct io_ring_ctx **ctx, bool *uring_locked)
+{
+ if (*ctx && !*uring_locked)
+ spin_lock(&(*ctx)->completion_lock);
+
+ do {
+ struct io_wq_work_node *next = node->next;
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ io_task_work.node);
+
+ prefetch(container_of(next, struct io_kiocb, io_task_work.node));
+
+ if (req->ctx != *ctx) {
+ if (unlikely(!*uring_locked && *ctx))
+ ctx_commit_and_unlock(*ctx);
+
+ ctx_flush_and_put(*ctx, uring_locked);
+ *ctx = req->ctx;
+ /* if not contended, grab and improve batching */
+ *uring_locked = mutex_trylock(&(*ctx)->uring_lock);
+ percpu_ref_get(&(*ctx)->refs);
+ if (unlikely(!*uring_locked))
+ spin_lock(&(*ctx)->completion_lock);
+ }
+ if (likely(*uring_locked))
+ req->io_task_work.func(req, uring_locked);
+ else
+ __io_req_complete_post(req, req->result,
+ io_put_kbuf_comp(req));
+ node = next;
+ } while (node);
+
+ if (unlikely(!*uring_locked))
+ ctx_commit_and_unlock(*ctx);
+}
+
+static void handle_tw_list(struct io_wq_work_node *node,
+ struct io_ring_ctx **ctx, bool *locked)
+{
+ do {
+ struct io_wq_work_node *next = node->next;
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ io_task_work.node);
+
+ prefetch(container_of(next, struct io_kiocb, io_task_work.node));
+
+ if (req->ctx != *ctx) {
+ ctx_flush_and_put(*ctx, locked);
+ *ctx = req->ctx;
+ /* if not contended, grab and improve batching */
+ *locked = mutex_trylock(&(*ctx)->uring_lock);
+ percpu_ref_get(&(*ctx)->refs);
+ }
+ req->io_task_work.func(req, locked);
+ node = next;
+ } while (node);
+}
+
+static void tctx_task_work(struct callback_head *cb)
+{
+ bool uring_locked = false;
+ struct io_ring_ctx *ctx = NULL;
+ struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
+ task_work);
+
+ while (1) {
+ struct io_wq_work_node *node1, *node2;
+
+ if (!tctx->task_list.first &&
+ !tctx->prior_task_list.first && uring_locked)
+ io_submit_flush_completions(ctx);
+
+ spin_lock_irq(&tctx->task_lock);
+ node1 = tctx->prior_task_list.first;
+ node2 = tctx->task_list.first;
+ INIT_WQ_LIST(&tctx->task_list);
+ INIT_WQ_LIST(&tctx->prior_task_list);
+ if (!node2 && !node1)
+ tctx->task_running = false;
+ spin_unlock_irq(&tctx->task_lock);
+ if (!node2 && !node1)
+ break;
+
+ if (node1)
+ handle_prev_tw_list(node1, &ctx, &uring_locked);
+
+ if (node2)
+ handle_tw_list(node2, &ctx, &uring_locked);
+ cond_resched();
+ }
+
+ ctx_flush_and_put(ctx, &uring_locked);
+
+ /* relaxed read is enough as only the task itself sets ->in_idle */
+ if (unlikely(atomic_read(&tctx->in_idle)))
+ io_uring_drop_tctx_refs(current);
+}
+
+static void io_req_task_work_add(struct io_kiocb *req, bool priority)
+{
+ struct task_struct *tsk = req->task;
+ struct io_uring_task *tctx = tsk->io_uring;
+ enum task_work_notify_mode notify;
+ struct io_wq_work_node *node;
+ unsigned long flags;
+ bool running;
+
+ WARN_ON_ONCE(!tctx);
+
+ spin_lock_irqsave(&tctx->task_lock, flags);
+ if (priority)
+ wq_list_add_tail(&req->io_task_work.node, &tctx->prior_task_list);
+ else
+ wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
+ running = tctx->task_running;
+ if (!running)
+ tctx->task_running = true;
+ spin_unlock_irqrestore(&tctx->task_lock, flags);
+
+ /* task_work already pending, we're done */
+ if (running)
+ return;
+
+ /*
+ * SQPOLL kernel thread doesn't need notification, just a wakeup. For
+ * all other cases, use TWA_SIGNAL unconditionally to ensure we're
+ * processing task_work. There's no reliable way to tell if TWA_RESUME
+ * will do the job.
+ */
+ notify = (req->ctx->flags & IORING_SETUP_SQPOLL) ? TWA_NONE : TWA_SIGNAL;
+ if (likely(!task_work_add(tsk, &tctx->task_work, notify))) {
+ if (notify == TWA_NONE)
+ wake_up_process(tsk);
+ return;
+ }
+
+ spin_lock_irqsave(&tctx->task_lock, flags);
+ tctx->task_running = false;
+ node = wq_list_merge(&tctx->prior_task_list, &tctx->task_list);
+ spin_unlock_irqrestore(&tctx->task_lock, flags);
+
+ while (node) {
+ req = container_of(node, struct io_kiocb, io_task_work.node);
+ node = node->next;
+ if (llist_add(&req->io_task_work.fallback_node,
+ &req->ctx->fallback_llist))
+ schedule_delayed_work(&req->ctx->fallback_work, 1);
+ }
+}
+
+static void io_req_task_cancel(struct io_kiocb *req, bool *locked)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ /* not needed for normal modes, but SQPOLL depends on it */
+ io_tw_lock(ctx, locked);
+ io_req_complete_failed(req, req->result);
+}
+
+static void io_req_task_submit(struct io_kiocb *req, bool *locked)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ io_tw_lock(ctx, locked);
+ /* req->task == current here, checking PF_EXITING is safe */
+ if (likely(!(req->task->flags & PF_EXITING)))
+ __io_queue_sqe(req);
+ else
+ io_req_complete_failed(req, -EFAULT);
+}
+
+static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
+{
+ req->result = ret;
+ req->io_task_work.func = io_req_task_cancel;
+ io_req_task_work_add(req, false);
+}
+
+static void io_req_task_queue(struct io_kiocb *req)
+{
+ req->io_task_work.func = io_req_task_submit;
+ io_req_task_work_add(req, false);
+}
+
+static void io_req_task_queue_reissue(struct io_kiocb *req)
+{
+ req->io_task_work.func = io_queue_async_work;
+ io_req_task_work_add(req, false);
+}
+
+static inline void io_queue_next(struct io_kiocb *req)
+{
+ struct io_kiocb *nxt = io_req_find_next(req);
+
+ if (nxt)
+ io_req_task_queue(nxt);
+}
+
+static void io_free_req(struct io_kiocb *req)
+{
+ io_queue_next(req);
+ __io_free_req(req);
+}
+
+static void io_free_req_work(struct io_kiocb *req, bool *locked)
+{
+ io_free_req(req);
+}
+
+static void io_free_batch_list(struct io_ring_ctx *ctx,
+ struct io_wq_work_node *node)
+ __must_hold(&ctx->uring_lock)
+{
+ struct task_struct *task = NULL;
+ int task_refs = 0;
+
+ do {
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ comp_list);
+
+ if (unlikely(req->flags & REQ_F_REFCOUNT)) {
+ node = req->comp_list.next;
+ if (!req_ref_put_and_test(req))
+ continue;
+ }
+
+ io_req_put_rsrc_locked(req, ctx);
+ io_queue_next(req);
+ io_dismantle_req(req);
+
+ if (req->task != task) {
+ if (task)
+ io_put_task(task, task_refs);
+ task = req->task;
+ task_refs = 0;
+ }
+ task_refs++;
+ node = req->comp_list.next;
+ wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
+ } while (node);
+
+ if (task)
+ io_put_task(task, task_refs);
+}
+
+static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_wq_work_node *node, *prev;
+ struct io_submit_state *state = &ctx->submit_state;
+
+ if (state->flush_cqes) {
+ spin_lock(&ctx->completion_lock);
+ wq_list_for_each(node, prev, &state->compl_reqs) {
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ comp_list);
+
+ if (!(req->flags & REQ_F_CQE_SKIP))
+ __io_fill_cqe_req(req, req->result, req->cflags);
+ if ((req->flags & REQ_F_POLLED) && req->apoll) {
+ struct async_poll *apoll = req->apoll;
+
+ if (apoll->double_poll)
+ kfree(apoll->double_poll);
+ list_add(&apoll->poll.wait.entry,
+ &ctx->apoll_cache);
+ req->flags &= ~REQ_F_POLLED;
+ }
+ }
+
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+ state->flush_cqes = false;
+ }
+
+ io_free_batch_list(ctx, state->compl_reqs.first);
+ INIT_WQ_LIST(&state->compl_reqs);
+}
+
+/*
+ * Drop reference to request, return next in chain (if there is one) if this
+ * was the last reference to this request.
+ */
+static inline struct io_kiocb *io_put_req_find_next(struct io_kiocb *req)
+{
+ struct io_kiocb *nxt = NULL;
+
+ if (req_ref_put_and_test(req)) {
+ nxt = io_req_find_next(req);
+ __io_free_req(req);
+ }
+ return nxt;
+}
+
+static inline void io_put_req(struct io_kiocb *req)
+{
+ if (req_ref_put_and_test(req))
+ io_free_req(req);
+}
+
+static inline void io_put_req_deferred(struct io_kiocb *req)
+{
+ if (req_ref_put_and_test(req)) {
+ req->io_task_work.func = io_free_req_work;
+ io_req_task_work_add(req, false);
+ }
+}
+
+static unsigned io_cqring_events(struct io_ring_ctx *ctx)
+{
+ /* See comment at the top of this file */
+ smp_rmb();
+ return __io_cqring_events(ctx);
+}
+
+static inline unsigned int io_sqring_entries(struct io_ring_ctx *ctx)
+{
+ struct io_rings *rings = ctx->rings;
+
+ /* make sure SQ entry isn't read before tail */
+ return smp_load_acquire(&rings->sq.tail) - ctx->cached_sq_head;
+}
+
+static inline bool io_run_task_work(void)
+{
+ if (test_thread_flag(TIF_NOTIFY_SIGNAL) || task_work_pending(current)) {
+ __set_current_state(TASK_RUNNING);
+ clear_notify_signal();
+ if (task_work_pending(current))
+ task_work_run();
+ return true;
+ }
+
+ return false;
+}
+
+static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
+{
+ struct io_wq_work_node *pos, *start, *prev;
+ unsigned int poll_flags = BLK_POLL_NOSLEEP;
+ DEFINE_IO_COMP_BATCH(iob);
+ int nr_events = 0;
+
+ /*
+ * Only spin for completions if we don't have multiple devices hanging
+ * off our complete list.
+ */
+ if (ctx->poll_multi_queue || force_nonspin)
+ poll_flags |= BLK_POLL_ONESHOT;
+
+ wq_list_for_each(pos, start, &ctx->iopoll_list) {
+ struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
+ struct kiocb *kiocb = &req->rw.kiocb;
+ int ret;
+
+ /*
+ * Move completed and retryable entries to our local lists.
+ * If we find a request that requires polling, break out
+ * and complete those lists first, if we have entries there.
+ */
+ if (READ_ONCE(req->iopoll_completed))
+ break;
+
+ ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob, poll_flags);
+ if (unlikely(ret < 0))
+ return ret;
+ else if (ret)
+ poll_flags |= BLK_POLL_ONESHOT;
+
+ /* iopoll may have completed current req */
+ if (!rq_list_empty(iob.req_list) ||
+ READ_ONCE(req->iopoll_completed))
+ break;
+ }
+
+ if (!rq_list_empty(iob.req_list))
+ iob.complete(&iob);
+ else if (!pos)
+ return 0;
+
+ prev = start;
+ wq_list_for_each_resume(pos, prev) {
+ struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
+
+ /* order with io_complete_rw_iopoll(), e.g. ->result updates */
+ if (!smp_load_acquire(&req->iopoll_completed))
+ break;
+ nr_events++;
+ if (unlikely(req->flags & REQ_F_CQE_SKIP))
+ continue;
+ __io_fill_cqe_req(req, req->result, io_put_kbuf(req, 0));
+ }
+
+ if (unlikely(!nr_events))
+ return 0;
+
+ io_commit_cqring(ctx);
+ io_cqring_ev_posted_iopoll(ctx);
+ pos = start ? start->next : ctx->iopoll_list.first;
+ wq_list_cut(&ctx->iopoll_list, prev, start);
+ io_free_batch_list(ctx, pos);
+ return nr_events;
+}
+
+/*
+ * We can't just wait for polled events to come to us; we have to actively
+ * find and complete them.
+ */
+static __cold void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
+{
+ if (!(ctx->flags & IORING_SETUP_IOPOLL))
+ return;
+
+ mutex_lock(&ctx->uring_lock);
+ while (!wq_list_empty(&ctx->iopoll_list)) {
+ /* let it sleep and repeat later if can't complete a request */
+ if (io_do_iopoll(ctx, true) == 0)
+ break;
+ /*
+ * Ensure we allow local-to-the-cpu processing to take place;
+ * in this case we need to ensure that we reap all events.
+ * Also let task_work, etc. progress by releasing the mutex.
+ */
+ if (need_resched()) {
+ mutex_unlock(&ctx->uring_lock);
+ cond_resched();
+ mutex_lock(&ctx->uring_lock);
+ }
+ }
+ mutex_unlock(&ctx->uring_lock);
+}
+
+static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
+{
+ unsigned int nr_events = 0;
+ int ret = 0;
+
+ /*
+ * We disallow the app entering submit/complete with polling, but we
+ * still need to lock the ring to prevent racing with polled issue
+ * that got punted to a workqueue.
+ */
+ mutex_lock(&ctx->uring_lock);
+ /*
+ * Don't enter poll loop if we already have events pending.
+ * If we do, we can potentially be spinning for commands that
+ * already triggered a CQE (eg in error).
+ */
+ if (test_bit(0, &ctx->check_cq_overflow))
+ __io_cqring_overflow_flush(ctx, false);
+ if (io_cqring_events(ctx))
+ goto out;
+ do {
+ /*
+ * If a submit got punted to a workqueue, we can have the
+ * application entering polling for a command before it gets
+ * issued. That app will hold the uring_lock for the duration
+ * of the poll right here, so we need to take a breather every
+ * now and then to ensure that the issue has a chance to add
+ * the poll to the issued list. Otherwise we can spin here
+ * forever, while the workqueue is stuck trying to acquire the
+ * very same mutex.
+ */
+ if (wq_list_empty(&ctx->iopoll_list)) {
+ u32 tail = ctx->cached_cq_tail;
+
+ mutex_unlock(&ctx->uring_lock);
+ io_run_task_work();
+ mutex_lock(&ctx->uring_lock);
+
+ /* some requests don't go through iopoll_list */
+ if (tail != ctx->cached_cq_tail ||
+ wq_list_empty(&ctx->iopoll_list))
+ break;
+ }
+ ret = io_do_iopoll(ctx, !min);
+ if (ret < 0)
+ break;
+ nr_events += ret;
+ ret = 0;
+ } while (nr_events < min && !need_resched());
+out:
+ mutex_unlock(&ctx->uring_lock);
+ return ret;
+}
+
+static void kiocb_end_write(struct io_kiocb *req)
+{
+ /*
+ * Tell lockdep we inherited freeze protection from submission
+ * thread.
+ */
+ if (req->flags & REQ_F_ISREG) {
+ struct super_block *sb = file_inode(req->file)->i_sb;
+
+ __sb_writers_acquired(sb, SB_FREEZE_WRITE);
+ sb_end_write(sb);
+ }
+}
+
+#ifdef CONFIG_BLOCK
+static bool io_resubmit_prep(struct io_kiocb *req)
+{
+ struct io_async_rw *rw = req->async_data;
+
+ if (!req_has_async_data(req))
+ return !io_req_prep_async(req);
+ iov_iter_restore(&rw->s.iter, &rw->s.iter_state);
+ return true;
+}
+
+static bool io_rw_should_reissue(struct io_kiocb *req)
+{
+ umode_t mode = file_inode(req->file)->i_mode;
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (!S_ISBLK(mode) && !S_ISREG(mode))
+ return false;
+ if ((req->flags & REQ_F_NOWAIT) || (io_wq_current_is_worker() &&
+ !(ctx->flags & IORING_SETUP_IOPOLL)))
+ return false;
+ /*
+ * If ref is dying, we might be running poll reap from the exit work.
+ * Don't attempt to reissue from that path, just let it fail with
+ * -EAGAIN.
+ */
+ if (percpu_ref_is_dying(&ctx->refs))
+ return false;
+ /*
+ * Play it safe and assume not safe to re-import and reissue if we're
+ * not in the original thread group (or in task context).
+ */
+ if (!same_thread_group(req->task, current) || !in_task())
+ return false;
+ return true;
+}
+#else
+static bool io_resubmit_prep(struct io_kiocb *req)
+{
+ return false;
+}
+static bool io_rw_should_reissue(struct io_kiocb *req)
+{
+ return false;
+}
+#endif
+
+static bool __io_complete_rw_common(struct io_kiocb *req, long res)
+{
+ if (req->rw.kiocb.ki_flags & IOCB_WRITE) {
+ kiocb_end_write(req);
+ fsnotify_modify(req->file);
+ } else {
+ fsnotify_access(req->file);
+ }
+ if (unlikely(res != req->result)) {
+ if ((res == -EAGAIN || res == -EOPNOTSUPP) &&
+ io_rw_should_reissue(req)) {
+ req->flags |= REQ_F_REISSUE;
+ return true;
+ }
+ req_set_fail(req);
+ req->result = res;
+ }
+ return false;
+}
+
+static inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
+{
+ int res = req->result;
+
+ if (*locked) {
+ io_req_complete_state(req, res, io_put_kbuf(req, 0));
+ io_req_add_compl_list(req);
+ } else {
+ io_req_complete_post(req, res,
+ io_put_kbuf(req, IO_URING_F_UNLOCKED));
+ }
+}
+
+static void __io_complete_rw(struct io_kiocb *req, long res,
+ unsigned int issue_flags)
+{
+ if (__io_complete_rw_common(req, res))
+ return;
+ __io_req_complete(req, issue_flags, req->result,
+ io_put_kbuf(req, issue_flags));
+}
+
+static void io_complete_rw(struct kiocb *kiocb, long res)
+{
+ struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
+
+ if (__io_complete_rw_common(req, res))
+ return;
+ req->result = res;
+ req->io_task_work.func = io_req_task_complete;
+ io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));
+}
+
+static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
+{
+ struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
+
+ if (kiocb->ki_flags & IOCB_WRITE)
+ kiocb_end_write(req);
+ if (unlikely(res != req->result)) {
+ if (res == -EAGAIN && io_rw_should_reissue(req)) {
+ req->flags |= REQ_F_REISSUE;
+ return;
+ }
+ req->result = res;
+ }
+
+ /* order with io_iopoll_complete() checking ->iopoll_completed */
+ smp_store_release(&req->iopoll_completed, 1);
+}
+
+/*
+ * After the iocb has been issued, it's safe to be found on the poll list.
+ * Adding the kiocb to the list AFTER submission ensures that we don't
+ * find it from an io_do_iopoll() thread before the issuer is done
+ * accessing the kiocb cookie.
+ */
+static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ const bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
+
+ /* workqueue context doesn't hold uring_lock, grab it now */
+ if (unlikely(needs_lock))
+ mutex_lock(&ctx->uring_lock);
+
+ /*
+ * Track whether we have multiple files in our lists. This will impact
+ * how we do polling eventually, not spinning if we're on potentially
+ * different devices.
+ */
+ if (wq_list_empty(&ctx->iopoll_list)) {
+ ctx->poll_multi_queue = false;
+ } else if (!ctx->poll_multi_queue) {
+ struct io_kiocb *list_req;
+
+ list_req = container_of(ctx->iopoll_list.first, struct io_kiocb,
+ comp_list);
+ if (list_req->file != req->file)
+ ctx->poll_multi_queue = true;
+ }
+
+ /*
+ * For fast devices, IO may have already completed. If it has, add
+ * it to the front so we find it first.
+ */
+ if (READ_ONCE(req->iopoll_completed))
+ wq_list_add_head(&req->comp_list, &ctx->iopoll_list);
+ else
+ wq_list_add_tail(&req->comp_list, &ctx->iopoll_list);
+
+ if (unlikely(needs_lock)) {
+ /*
+ * If IORING_SETUP_SQPOLL is enabled, sqes are either handled
+ * in sq thread task context or in io worker task context. If
+ * the current task context is the sq thread, we don't need to
+ * check whether we should wake up the sq thread.
+ */
+ if ((ctx->flags & IORING_SETUP_SQPOLL) &&
+ wq_has_sleeper(&ctx->sq_data->wait))
+ wake_up(&ctx->sq_data->wait);
+
+ mutex_unlock(&ctx->uring_lock);
+ }
+}
+
+static bool io_bdev_nowait(struct block_device *bdev)
+{
+ return !bdev || blk_queue_nowait(bdev_get_queue(bdev));
+}
+
+/*
+ * If we tracked the file through the SCM inflight mechanism, we could support
+ * any file. For now, just ensure that anything potentially problematic is done
+ * inline.
+ */
+static bool __io_file_supports_nowait(struct file *file, umode_t mode)
+{
+ if (S_ISBLK(mode)) {
+ if (IS_ENABLED(CONFIG_BLOCK) &&
+ io_bdev_nowait(I_BDEV(file->f_mapping->host)))
+ return true;
+ return false;
+ }
+ if (S_ISSOCK(mode))
+ return true;
+ if (S_ISREG(mode)) {
+ if (IS_ENABLED(CONFIG_BLOCK) &&
+ io_bdev_nowait(file->f_inode->i_sb->s_bdev) &&
+ file->f_op != &io_uring_fops)
+ return true;
+ return false;
+ }
+
+ /* any ->read/write should understand O_NONBLOCK */
+ if (file->f_flags & O_NONBLOCK)
+ return true;
+ return file->f_mode & FMODE_NOWAIT;
+}
+
+/*
+ * If we tracked the file through the SCM inflight mechanism, we could support
+ * any file. For now, just ensure that anything potentially problematic is done
+ * inline.
+ */
+static unsigned int io_file_get_flags(struct file *file)
+{
+ umode_t mode = file_inode(file)->i_mode;
+ unsigned int res = 0;
+
+ if (S_ISREG(mode))
+ res |= FFS_ISREG;
+ if (__io_file_supports_nowait(file, mode))
+ res |= FFS_NOWAIT;
+ return res;
+}
+
+static inline bool io_file_supports_nowait(struct io_kiocb *req)
+{
+ return req->flags & REQ_F_SUPPORT_NOWAIT;
+}
+
+static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct kiocb *kiocb = &req->rw.kiocb;
+ unsigned ioprio;
+ int ret;
+
+ kiocb->ki_pos = READ_ONCE(sqe->off);
+ /* used for fixed read/write too - just read unconditionally */
+ req->buf_index = READ_ONCE(sqe->buf_index);
+ req->imu = NULL;
+
+ if (req->opcode == IORING_OP_READ_FIXED ||
+ req->opcode == IORING_OP_WRITE_FIXED) {
+ struct io_ring_ctx *ctx = req->ctx;
+ u16 index;
+
+ if (unlikely(req->buf_index >= ctx->nr_user_bufs))
+ return -EFAULT;
+ index = array_index_nospec(req->buf_index, ctx->nr_user_bufs);
+ req->imu = ctx->user_bufs[index];
+ io_req_set_rsrc_node(req, ctx, 0);
+ }
+
+ ioprio = READ_ONCE(sqe->ioprio);
+ if (ioprio) {
+ ret = ioprio_check_cap(ioprio);
+ if (ret)
+ return ret;
+
+ kiocb->ki_ioprio = ioprio;
+ } else {
+ kiocb->ki_ioprio = get_current_ioprio();
+ }
+
+ req->rw.addr = READ_ONCE(sqe->addr);
+ req->rw.len = READ_ONCE(sqe->len);
+ req->rw.flags = READ_ONCE(sqe->rw_flags);
+ return 0;
+}
+
+static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
+{
+ switch (ret) {
+ case -EIOCBQUEUED:
+ break;
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
+ case -ERESTARTNOHAND:
+ case -ERESTART_RESTARTBLOCK:
+ /*
+ * We can't just restart the syscall, since previously
+ * submitted sqes may already be in progress. Just fail this
+ * IO with EINTR.
+ */
+ ret = -EINTR;
+ fallthrough;
+ default:
+ kiocb->ki_complete(kiocb, ret);
+ }
+}
+
+static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
+{
+ struct kiocb *kiocb = &req->rw.kiocb;
+
+ if (kiocb->ki_pos != -1)
+ return &kiocb->ki_pos;
+
+ if (!(req->file->f_mode & FMODE_STREAM)) {
+ req->flags |= REQ_F_CUR_POS;
+ kiocb->ki_pos = req->file->f_pos;
+ return &kiocb->ki_pos;
+ }
+
+ kiocb->ki_pos = 0;
+ return NULL;
+}
+
+static void kiocb_done(struct io_kiocb *req, ssize_t ret,
+ unsigned int issue_flags)
+{
+ struct io_async_rw *io = req->async_data;
+
+ /* add previously done IO, if any */
+ if (req_has_async_data(req) && io->bytes_done > 0) {
+ if (ret < 0)
+ ret = io->bytes_done;
+ else
+ ret += io->bytes_done;
+ }
+
+ if (req->flags & REQ_F_CUR_POS)
+ req->file->f_pos = req->rw.kiocb.ki_pos;
+ if (ret >= 0 && (req->rw.kiocb.ki_complete == io_complete_rw))
+ __io_complete_rw(req, ret, issue_flags);
+ else
+ io_rw_done(&req->rw.kiocb, ret);
+
+ if (req->flags & REQ_F_REISSUE) {
+ req->flags &= ~REQ_F_REISSUE;
+ if (io_resubmit_prep(req))
+ io_req_task_queue_reissue(req);
+ else
+ io_req_task_queue_fail(req, ret);
+ }
+}
+
+static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter,
+ struct io_mapped_ubuf *imu)
+{
+ size_t len = req->rw.len;
+ u64 buf_end, buf_addr = req->rw.addr;
+ size_t offset;
+
+ if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
+ return -EFAULT;
+ /* not inside the mapped region */
+ if (unlikely(buf_addr < imu->ubuf || buf_end > imu->ubuf_end))
+ return -EFAULT;
+
+ /*
+ * May not be the start of the buffer; set the size appropriately
+ * and advance us to the beginning.
+ */
+ offset = buf_addr - imu->ubuf;
+ iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
+
+ if (offset) {
+ /*
+ * Don't use iov_iter_advance() here, as it's really slow for
+ * using the latter parts of a big fixed buffer - it iterates
+ * over each segment manually. We can cheat a bit here, because
+ * we know that:
+ *
+ * 1) it's a BVEC iter, we set it up
+ * 2) all bvecs are PAGE_SIZE in size, except potentially the
+ * first and last bvec
+ *
+ * So just find our index, and adjust the iterator afterwards.
+ * If the offset is within the first bvec (or is the whole first
+ * bvec), just use iov_iter_advance(). This makes it easier
+ * since we can just skip the first segment, which may not
+ * be PAGE_SIZE aligned.
+ */
+ const struct bio_vec *bvec = imu->bvec;
+
+ if (offset <= bvec->bv_len) {
+ iov_iter_advance(iter, offset);
+ } else {
+ unsigned long seg_skip;
+
+ /* skip first vec */
+ offset -= bvec->bv_len;
+ seg_skip = 1 + (offset >> PAGE_SHIFT);
+
+ iter->bvec = bvec + seg_skip;
+ iter->nr_segs -= seg_skip;
+ iter->count -= bvec->bv_len + offset;
+ iter->iov_offset = offset & ~PAGE_MASK;
+ }
+ }
+
+ return 0;
+}
+
+static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter,
+ unsigned int issue_flags)
+{
+ if (WARN_ON_ONCE(!req->imu))
+ return -EFAULT;
+ return __io_import_fixed(req, rw, iter, req->imu);
+}
+
+static void io_ring_submit_unlock(struct io_ring_ctx *ctx, bool needs_lock)
+{
+ if (needs_lock)
+ mutex_unlock(&ctx->uring_lock);
+}
+
+static void io_ring_submit_lock(struct io_ring_ctx *ctx, bool needs_lock)
+{
+ /*
+ * "Normal" inline submissions always hold the uring_lock, since we
+ * grab it from the system call. Same is true for the SQPOLL offload.
+ * The only exception is when we've detached the request and issue it
+ * from an async worker thread; grab the lock for that case.
+ */
+ if (needs_lock)
+ mutex_lock(&ctx->uring_lock);
+}
+
+static void io_buffer_add_list(struct io_ring_ctx *ctx,
+ struct io_buffer_list *bl, unsigned int bgid)
+{
+ struct list_head *list;
+
+ list = &ctx->io_buffers[hash_32(bgid, IO_BUFFERS_HASH_BITS)];
+ INIT_LIST_HEAD(&bl->buf_list);
+ bl->bgid = bgid;
+ list_add(&bl->list, list);
+}
+
+static struct io_buffer *io_buffer_select(struct io_kiocb *req, size_t *len,
+ int bgid, unsigned int issue_flags)
+{
+ struct io_buffer *kbuf = req->kbuf;
+ bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_buffer_list *bl;
+
+ if (req->flags & REQ_F_BUFFER_SELECTED)
+ return kbuf;
+
+ io_ring_submit_lock(ctx, needs_lock);
+
+ lockdep_assert_held(&ctx->uring_lock);
+
+ bl = io_buffer_get_list(ctx, bgid);
+ if (bl && !list_empty(&bl->buf_list)) {
+ kbuf = list_first_entry(&bl->buf_list, struct io_buffer, list);
+ list_del(&kbuf->list);
+ if (*len > kbuf->len)
+ *len = kbuf->len;
+ req->flags |= REQ_F_BUFFER_SELECTED;
+ req->kbuf = kbuf;
+ } else {
+ kbuf = ERR_PTR(-ENOBUFS);
+ }
+
+ io_ring_submit_unlock(req->ctx, needs_lock);
+ return kbuf;
+}
+
+static void __user *io_rw_buffer_select(struct io_kiocb *req, size_t *len,
+ unsigned int issue_flags)
+{
+ struct io_buffer *kbuf;
+ u16 bgid;
+
+ bgid = req->buf_index;
+ kbuf = io_buffer_select(req, len, bgid, issue_flags);
+ if (IS_ERR(kbuf))
+ return kbuf;
+ return u64_to_user_ptr(kbuf->addr);
+}
+
+#ifdef CONFIG_COMPAT
+static ssize_t io_compat_import(struct io_kiocb *req, struct iovec *iov,
+ unsigned int issue_flags)
+{
+ struct compat_iovec __user *uiov;
+ compat_ssize_t clen;
+ void __user *buf;
+ ssize_t len;
+
+ uiov = u64_to_user_ptr(req->rw.addr);
+ if (!access_ok(uiov, sizeof(*uiov)))
+ return -EFAULT;
+ if (__get_user(clen, &uiov->iov_len))
+ return -EFAULT;
+ if (clen < 0)
+ return -EINVAL;
+
+ len = clen;
+ buf = io_rw_buffer_select(req, &len, issue_flags);
+ if (IS_ERR(buf))
+ return PTR_ERR(buf);
+ iov[0].iov_base = buf;
+ iov[0].iov_len = (compat_size_t) len;
+ return 0;
+}
+#endif
+
+static ssize_t __io_iov_buffer_select(struct io_kiocb *req, struct iovec *iov,
+ unsigned int issue_flags)
+{
+ struct iovec __user *uiov = u64_to_user_ptr(req->rw.addr);
+ void __user *buf;
+ ssize_t len;
+
+ if (copy_from_user(iov, uiov, sizeof(*uiov)))
+ return -EFAULT;
+
+ len = iov[0].iov_len;
+ if (len < 0)
+ return -EINVAL;
+ buf = io_rw_buffer_select(req, &len, issue_flags);
+ if (IS_ERR(buf))
+ return PTR_ERR(buf);
+ iov[0].iov_base = buf;
+ iov[0].iov_len = len;
+ return 0;
+}
+
+static ssize_t io_iov_buffer_select(struct io_kiocb *req, struct iovec *iov,
+ unsigned int issue_flags)
+{
+ if (req->flags & REQ_F_BUFFER_SELECTED) {
+ struct io_buffer *kbuf = req->kbuf;
+
+ iov[0].iov_base = u64_to_user_ptr(kbuf->addr);
+ iov[0].iov_len = kbuf->len;
+ return 0;
+ }
+ if (req->rw.len != 1)
+ return -EINVAL;
+
+#ifdef CONFIG_COMPAT
+ if (req->ctx->compat)
+ return io_compat_import(req, iov, issue_flags);
+#endif
+
+ return __io_iov_buffer_select(req, iov, issue_flags);
+}
+
+static inline bool io_do_buffer_select(struct io_kiocb *req)
+{
+ if (!(req->flags & REQ_F_BUFFER_SELECT))
+ return false;
+ return !(req->flags & REQ_F_BUFFER_SELECTED);
+}
+
+static struct iovec *__io_import_iovec(int rw, struct io_kiocb *req,
+ struct io_rw_state *s,
+ unsigned int issue_flags)
+{
+ struct iov_iter *iter = &s->iter;
+ u8 opcode = req->opcode;
+ struct iovec *iovec;
+ void __user *buf;
+ size_t sqe_len;
+ ssize_t ret;
+
+ if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) {
+ ret = io_import_fixed(req, rw, iter, issue_flags);
+ if (ret)
+ return ERR_PTR(ret);
+ return NULL;
+ }
+
+ /* buffer index only valid with fixed read/write, or buffer select */
+ if (unlikely(req->buf_index && !(req->flags & REQ_F_BUFFER_SELECT)))
+ return ERR_PTR(-EINVAL);
+
+ buf = u64_to_user_ptr(req->rw.addr);
+ sqe_len = req->rw.len;
+
+ if (opcode == IORING_OP_READ || opcode == IORING_OP_WRITE) {
+ if (req->flags & REQ_F_BUFFER_SELECT) {
+ buf = io_rw_buffer_select(req, &sqe_len, issue_flags);
+ if (IS_ERR(buf))
+ return ERR_CAST(buf);
+ req->rw.len = sqe_len;
+ }
+
+ ret = import_single_range(rw, buf, sqe_len, s->fast_iov, iter);
+ if (ret)
+ return ERR_PTR(ret);
+ return NULL;
+ }
+
+ iovec = s->fast_iov;
+ if (req->flags & REQ_F_BUFFER_SELECT) {
+ ret = io_iov_buffer_select(req, iovec, issue_flags);
+ if (ret)
+ return ERR_PTR(ret);
+ iov_iter_init(iter, rw, iovec, 1, iovec->iov_len);
+ return NULL;
+ }
+
+ ret = __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, &iovec, iter,
+ req->ctx->compat);
+ if (unlikely(ret < 0))
+ return ERR_PTR(ret);
+ return iovec;
+}
+
+static inline int io_import_iovec(int rw, struct io_kiocb *req,
+ struct iovec **iovec, struct io_rw_state *s,
+ unsigned int issue_flags)
+{
+ *iovec = __io_import_iovec(rw, req, s, issue_flags);
+ if (unlikely(IS_ERR(*iovec)))
+ return PTR_ERR(*iovec);
+
+ iov_iter_save_state(&s->iter, &s->iter_state);
+ return 0;
+}
+
+static inline loff_t *io_kiocb_ppos(struct kiocb *kiocb)
+{
+ return (kiocb->ki_filp->f_mode & FMODE_STREAM) ? NULL : &kiocb->ki_pos;
+}
+
+/*
+ * For files that don't have ->read_iter() and ->write_iter(), handle them
+ * by looping over ->read() or ->write() manually.
+ */
+static ssize_t loop_rw_iter(int rw, struct io_kiocb *req, struct iov_iter *iter)
+{
+ struct kiocb *kiocb = &req->rw.kiocb;
+ struct file *file = req->file;
+ ssize_t ret = 0;
+ loff_t *ppos;
+
+ /*
+ * Don't support polled IO through this interface, and we can't
+ * support non-blocking either. For the latter, this just causes
+ * the kiocb to be handled from an async context.
+ */
+ if (kiocb->ki_flags & IOCB_HIPRI)
+ return -EOPNOTSUPP;
+ if ((kiocb->ki_flags & IOCB_NOWAIT) &&
+ !(kiocb->ki_filp->f_flags & O_NONBLOCK))
+ return -EAGAIN;
+
+ ppos = io_kiocb_ppos(kiocb);
+
+ while (iov_iter_count(iter)) {
+ struct iovec iovec;
+ ssize_t nr;
+
+ if (!iov_iter_is_bvec(iter)) {
+ iovec = iov_iter_iovec(iter);
+ } else {
+ iovec.iov_base = u64_to_user_ptr(req->rw.addr);
+ iovec.iov_len = req->rw.len;
+ }
+
+ if (rw == READ) {
+ nr = file->f_op->read(file, iovec.iov_base,
+ iovec.iov_len, ppos);
+ } else {
+ nr = file->f_op->write(file, iovec.iov_base,
+ iovec.iov_len, ppos);
+ }
+
+ if (nr < 0) {
+ if (!ret)
+ ret = nr;
+ break;
+ }
+ ret += nr;
+ if (!iov_iter_is_bvec(iter)) {
+ iov_iter_advance(iter, nr);
+ } else {
+ req->rw.addr += nr;
+ req->rw.len -= nr;
+ if (!req->rw.len)
+ break;
+ }
+ if (nr != iovec.iov_len)
+ break;
+ }
+
+ return ret;
+}
+
+static void io_req_map_rw(struct io_kiocb *req, const struct iovec *iovec,
+ const struct iovec *fast_iov, struct iov_iter *iter)
+{
+ struct io_async_rw *rw = req->async_data;
+
+ memcpy(&rw->s.iter, iter, sizeof(*iter));
+ rw->free_iovec = iovec;
+ rw->bytes_done = 0;
+ /* can only be fixed buffers, no need to do anything */
+ if (iov_iter_is_bvec(iter))
+ return;
+ if (!iovec) {
+ unsigned iov_off = 0;
+
+ rw->s.iter.iov = rw->s.fast_iov;
+ if (iter->iov != fast_iov) {
+ iov_off = iter->iov - fast_iov;
+ rw->s.iter.iov += iov_off;
+ }
+ if (rw->s.fast_iov != fast_iov)
+ memcpy(rw->s.fast_iov + iov_off, fast_iov + iov_off,
+ sizeof(struct iovec) * iter->nr_segs);
+ } else {
+ req->flags |= REQ_F_NEED_CLEANUP;
+ }
+}
+
+static inline bool io_alloc_async_data(struct io_kiocb *req)
+{
+ WARN_ON_ONCE(!io_op_defs[req->opcode].async_size);
+ req->async_data = kmalloc(io_op_defs[req->opcode].async_size, GFP_KERNEL);
+ if (req->async_data) {
+ req->flags |= REQ_F_ASYNC_DATA;
+ return false;
+ }
+ return true;
+}
+
+static int io_setup_async_rw(struct io_kiocb *req, const struct iovec *iovec,
+ struct io_rw_state *s, bool force)
+{
+ if (!force && !io_op_defs[req->opcode].needs_async_setup)
+ return 0;
+ if (!req_has_async_data(req)) {
+ struct io_async_rw *iorw;
+
+ if (io_alloc_async_data(req)) {
+ kfree(iovec);
+ return -ENOMEM;
+ }
+
+ io_req_map_rw(req, iovec, s->fast_iov, &s->iter);
+ iorw = req->async_data;
+ /* we've copied and mapped the iter, ensure state is saved */
+ iov_iter_save_state(&iorw->s.iter, &iorw->s.iter_state);
+ }
+ return 0;
+}
+
+static inline int io_rw_prep_async(struct io_kiocb *req, int rw)
+{
+ struct io_async_rw *iorw = req->async_data;
+ struct iovec *iov;
+ int ret;
+
+ /* submission path, ->uring_lock should already be taken */
+ ret = io_import_iovec(rw, req, &iov, &iorw->s, 0);
+ if (unlikely(ret < 0))
+ return ret;
+
+ iorw->bytes_done = 0;
+ iorw->free_iovec = iov;
+ if (iov)
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return 0;
+}
+
+/*
+ * This is our waitqueue callback handler, registered through __folio_lock_async()
+ * when we initially tried to do the IO with the iocb and armed our waitqueue.
+ * This gets called when the page is unlocked, and we generally expect that to
+ * happen when the page IO is completed and the page is now uptodate. This will
+ * queue a task_work based retry of the operation, attempting to copy the data
+ * again. If the latter fails because the page was NOT uptodate, then we will
+ * do a thread based blocking retry of the operation. That's the unexpected
+ * slow path.
+ */
+static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
+ int sync, void *arg)
+{
+ struct wait_page_queue *wpq;
+ struct io_kiocb *req = wait->private;
+ struct wait_page_key *key = arg;
+
+ wpq = container_of(wait, struct wait_page_queue, wait);
+
+ if (!wake_page_match(wpq, key))
+ return 0;
+
+ req->rw.kiocb.ki_flags &= ~IOCB_WAITQ;
+ list_del_init(&wait->entry);
+ io_req_task_queue(req);
+ return 1;
+}
+
+/*
+ * This controls whether a given IO request should be armed for async page
+ * based retry. If we return false here, the request is handed to the async
+ * worker threads for retry. If we're doing buffered reads on a regular file,
+ * we prepare a private wait_page_queue entry and retry the operation. This
+ * will either succeed because the page is now uptodate and unlocked, or it
+ * will register a callback when the page is unlocked at IO completion. Through
+ * that callback, io_uring uses task_work to set up a retry of the operation.
+ * That retry will attempt the buffered read again. The retry will generally
+ * succeed; in rare cases where it fails, we then fall back to using the
+ * async worker threads for a blocking retry.
+ */
+static bool io_rw_should_retry(struct io_kiocb *req)
+{
+ struct io_async_rw *rw = req->async_data;
+ struct wait_page_queue *wait = &rw->wpq;
+ struct kiocb *kiocb = &req->rw.kiocb;
+
+ /* never retry for NOWAIT, we just complete with -EAGAIN */
+ if (req->flags & REQ_F_NOWAIT)
+ return false;
+
+ /* Only for buffered IO */
+ if (kiocb->ki_flags & (IOCB_DIRECT | IOCB_HIPRI))
+ return false;
+
+ /*
+ * just use poll if we can, and don't attempt if the fs doesn't
+ * support callback based unlocks
+ */
+ if (file_can_poll(req->file) || !(req->file->f_mode & FMODE_BUF_RASYNC))
+ return false;
+
+ wait->wait.func = io_async_buf_func;
+ wait->wait.private = req;
+ wait->wait.flags = 0;
+ INIT_LIST_HEAD(&wait->wait.entry);
+ kiocb->ki_flags |= IOCB_WAITQ;
+ kiocb->ki_flags &= ~IOCB_NOWAIT;
+ kiocb->ki_waitq = wait;
+ return true;
+}
+
+static inline int io_iter_do_read(struct io_kiocb *req, struct iov_iter *iter)
+{
+ if (likely(req->file->f_op->read_iter))
+ return call_read_iter(req->file, &req->rw.kiocb, iter);
+ else if (req->file->f_op->read)
+ return loop_rw_iter(READ, req, iter);
+ else
+ return -EINVAL;
+}
+
+static bool need_read_all(struct io_kiocb *req)
+{
+ return req->flags & REQ_F_ISREG ||
+ S_ISBLK(file_inode(req->file)->i_mode);
+}
+
+static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
+{
+ struct kiocb *kiocb = &req->rw.kiocb;
+ struct io_ring_ctx *ctx = req->ctx;
+ struct file *file = req->file;
+ int ret;
+
+ if (unlikely(!file || !(file->f_mode & mode)))
+ return -EBADF;
+
+ if (!io_req_ffs_set(req))
+ req->flags |= io_file_get_flags(file) << REQ_F_SUPPORT_NOWAIT_BIT;
+
+ kiocb->ki_flags = iocb_flags(file);
+ ret = kiocb_set_rw_flags(kiocb, req->rw.flags);
+ if (unlikely(ret))
+ return ret;
+
+ /*
+ * If the file is marked O_NONBLOCK, still allow retry for it if it
+ * supports async. Otherwise it's impossible to use O_NONBLOCK files
+ * reliably. If not, or if IOCB_NOWAIT is set, don't retry.
+ */
+ if ((kiocb->ki_flags & IOCB_NOWAIT) ||
+ ((file->f_flags & O_NONBLOCK) && !io_file_supports_nowait(req)))
+ req->flags |= REQ_F_NOWAIT;
+
+ if (ctx->flags & IORING_SETUP_IOPOLL) {
+ if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll)
+ return -EOPNOTSUPP;
+
+ kiocb->private = NULL;
+ kiocb->ki_flags |= IOCB_HIPRI | IOCB_ALLOC_CACHE;
+ kiocb->ki_complete = io_complete_rw_iopoll;
+ req->iopoll_completed = 0;
+ } else {
+ if (kiocb->ki_flags & IOCB_HIPRI)
+ return -EINVAL;
+ kiocb->ki_complete = io_complete_rw;
+ }
+
+ return 0;
+}
+
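+/*
+ * Read path. Under IO_URING_F_NONBLOCK the first attempt is made with
+ * IOCB_NOWAIT; on -EAGAIN the request is either punted to io-wq or, for
+ * buffered reads on files supporting FMODE_BUF_RASYNC, retried through the
+ * IOCB_WAITQ page-unlock callback. Partial reads save and advance the iov
+ * iterator state so a later retry continues where the last attempt stopped.
+ */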
+static int io_read(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_rw_state __s, *s = &__s;
+ struct iovec *iovec;
+ struct kiocb *kiocb = &req->rw.kiocb;
+ bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+ struct io_async_rw *rw;
+ ssize_t ret, ret2;
+ loff_t *ppos;
+
+ if (!req_has_async_data(req)) {
+ ret = io_import_iovec(READ, req, &iovec, s, issue_flags);
+ if (unlikely(ret < 0))
+ return ret;
+ } else {
+ rw = req->async_data;
+ s = &rw->s;
+
+ /*
+ * Safe and required to re-import if we're using provided
+ * buffers, as we dropped the selected one before retry.
+ */
+ if (io_do_buffer_select(req)) {
+ ret = io_import_iovec(READ, req, &iovec, s, issue_flags);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ /*
+ * We come here from an earlier attempt, restore our state to
+ * match in case it doesn't. It's cheap enough that we don't
+ * need to make this conditional.
+ */
+ iov_iter_restore(&s->iter, &s->iter_state);
+ iovec = NULL;
+ }
+ ret = io_rw_init_file(req, FMODE_READ);
+ if (unlikely(ret)) {
+ kfree(iovec);
+ return ret;
+ }
+ req->result = iov_iter_count(&s->iter);
+
+ if (force_nonblock) {
+ /* If the file doesn't support async, just async punt */
+ if (unlikely(!io_file_supports_nowait(req))) {
+ ret = io_setup_async_rw(req, iovec, s, true);
+ return ret ?: -EAGAIN;
+ }
+ kiocb->ki_flags |= IOCB_NOWAIT;
+ } else {
+ /* Ensure we clear previously set non-block flag */
+ kiocb->ki_flags &= ~IOCB_NOWAIT;
+ }
+
+ ppos = io_kiocb_update_pos(req);
+
+ ret = rw_verify_area(READ, req->file, ppos, req->result);
+ if (unlikely(ret)) {
+ kfree(iovec);
+ return ret;
+ }
+
+ ret = io_iter_do_read(req, &s->iter);
+
+ if (ret == -EAGAIN || (req->flags & REQ_F_REISSUE)) {
+ req->flags &= ~REQ_F_REISSUE;
+ /* if we can poll, just do that */
+ if (req->opcode == IORING_OP_READ && file_can_poll(req->file))
+ return -EAGAIN;
+ /* IOPOLL retry should happen for io-wq threads */
+ if (!force_nonblock && !(req->ctx->flags & IORING_SETUP_IOPOLL))
+ goto done;
+ /* no retry on NONBLOCK nor RWF_NOWAIT */
+ if (req->flags & REQ_F_NOWAIT)
+ goto done;
+ ret = 0;
+ } else if (ret == -EIOCBQUEUED) {
+ goto out_free;
+ } else if (ret == req->result || ret <= 0 || !force_nonblock ||
+ (req->flags & REQ_F_NOWAIT) || !need_read_all(req)) {
+ /* read all, failed, already did sync or don't want to retry */
+ goto done;
+ }
+
+ /*
+ * Don't depend on the iter state matching what was consumed, or being
+ * untouched in case of error. Restore it and we'll advance it
+ * manually if we need to.
+ */
+ iov_iter_restore(&s->iter, &s->iter_state);
+
+ ret2 = io_setup_async_rw(req, iovec, s, true);
+ if (ret2)
+ return ret2;
+
+ iovec = NULL;
+ rw = req->async_data;
+ s = &rw->s;
+ /*
+ * Now use our persistent iterator and state, if we aren't already.
+ * We've restored and mapped the iter to match.
+ */
+
+ do {
+ /*
+ * We end up here because of a partial read, either from
+ * above or inside this loop. Advance the iter by the bytes
+ * that were consumed.
+ */
+ iov_iter_advance(&s->iter, ret);
+ if (!iov_iter_count(&s->iter))
+ break;
+ rw->bytes_done += ret;
+ iov_iter_save_state(&s->iter, &s->iter_state);
+
+ /* if we can retry, do so with the callbacks armed */
+ if (!io_rw_should_retry(req)) {
+ kiocb->ki_flags &= ~IOCB_WAITQ;
+ return -EAGAIN;
+ }
+
+ /*
+ * Now retry read with the IOCB_WAITQ parts set in the iocb. If
+ * we get -EIOCBQUEUED, then we'll get a notification when the
+ * desired page gets unlocked. We can also get a partial read
+ * here, and if we do, then just retry at the new offset.
+ */
+ ret = io_iter_do_read(req, &s->iter);
+ if (ret == -EIOCBQUEUED)
+ return 0;
+ /* we got some bytes, but not all. retry. */
+ kiocb->ki_flags &= ~IOCB_WAITQ;
+ iov_iter_restore(&s->iter, &s->iter_state);
+ } while (ret > 0);
+done:
+ kiocb_done(req, ret, issue_flags);
+out_free:
+ /* it's faster to check here than to delegate to kfree */
+ if (iovec)
+ kfree(iovec);
+ return 0;
+}
+
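+/*
+ * Write path. Under IO_URING_F_NONBLOCK, buffered writes to regular files
+ * are not attempted inline and get punted instead, since the non-direct
+ * path can't honour NOWAIT. For regular files, freeze protection is taken
+ * here and released from io_complete_rw() when the write completes.
+ */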
+static int io_write(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_rw_state __s, *s = &__s;
+ struct iovec *iovec;
+ struct kiocb *kiocb = &req->rw.kiocb;
+ bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+ ssize_t ret, ret2;
+ loff_t *ppos;
+
+ if (!req_has_async_data(req)) {
+ ret = io_import_iovec(WRITE, req, &iovec, s, issue_flags);
+ if (unlikely(ret < 0))
+ return ret;
+ } else {
+ struct io_async_rw *rw = req->async_data;
+
+ s = &rw->s;
+ iov_iter_restore(&s->iter, &s->iter_state);
+ iovec = NULL;
+ }
+ ret = io_rw_init_file(req, FMODE_WRITE);
+ if (unlikely(ret)) {
+ kfree(iovec);
+ return ret;
+ }
+ req->result = iov_iter_count(&s->iter);
+
+ if (force_nonblock) {
+ /* If the file doesn't support async, just async punt */
+ if (unlikely(!io_file_supports_nowait(req)))
+ goto copy_iov;
+
+ /* file path doesn't support NOWAIT for non-direct_IO */
+ if (force_nonblock && !(kiocb->ki_flags & IOCB_DIRECT) &&
+ (req->flags & REQ_F_ISREG))
+ goto copy_iov;
+
+ kiocb->ki_flags |= IOCB_NOWAIT;
+ } else {
+ /* Ensure we clear previously set non-block flag */
+ kiocb->ki_flags &= ~IOCB_NOWAIT;
+ }
+
+ ppos = io_kiocb_update_pos(req);
+
+ ret = rw_verify_area(WRITE, req->file, ppos, req->result);
+ if (unlikely(ret))
+ goto out_free;
+
+ /*
+ * Open-code file_start_write here to grab freeze protection,
+ * which will be released by another thread in
+ * io_complete_rw(). Fool lockdep by telling it the lock got
+ * released so that it doesn't complain about the held lock when
+ * we return to userspace.
+ */
+ if (req->flags & REQ_F_ISREG) {
+ sb_start_write(file_inode(req->file)->i_sb);
+ __sb_writers_release(file_inode(req->file)->i_sb,
+ SB_FREEZE_WRITE);
+ }
+ kiocb->ki_flags |= IOCB_WRITE;
+
+ if (likely(req->file->f_op->write_iter))
+ ret2 = call_write_iter(req->file, kiocb, &s->iter);
+ else if (req->file->f_op->write)
+ ret2 = loop_rw_iter(WRITE, req, &s->iter);
+ else
+ ret2 = -EINVAL;
+
+ if (req->flags & REQ_F_REISSUE) {
+ req->flags &= ~REQ_F_REISSUE;
+ ret2 = -EAGAIN;
+ }
+
+ /*
+ * Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just
+ * retry them without IOCB_NOWAIT.
+ */
+ if (ret2 == -EOPNOTSUPP && (kiocb->ki_flags & IOCB_NOWAIT))
+ ret2 = -EAGAIN;
+ /* no retry on NONBLOCK nor RWF_NOWAIT */
+ if (ret2 == -EAGAIN && (req->flags & REQ_F_NOWAIT))
+ goto done;
+ if (!force_nonblock || ret2 != -EAGAIN) {
+ /* IOPOLL retry should happen for io-wq threads */
+ if (ret2 == -EAGAIN && (req->ctx->flags & IORING_SETUP_IOPOLL))
+ goto copy_iov;
+done:
+ kiocb_done(req, ret2, issue_flags);
+ } else {
+copy_iov:
+ iov_iter_restore(&s->iter, &s->iter_state);
+ ret = io_setup_async_rw(req, iovec, s, false);
+ return ret ?: -EAGAIN;
+ }
+out_free:
+ /* it's reportedly faster than delegating the null check to kfree() */
+ if (iovec)
+ kfree(iovec);
+ return ret;
+}
+
+static int io_renameat_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_rename *ren = &req->rename;
+ const char __user *oldf, *newf;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->buf_index || sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->flags & REQ_F_FIXED_FILE))
+ return -EBADF;
+
+ ren->old_dfd = READ_ONCE(sqe->fd);
+ oldf = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ newf = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ ren->new_dfd = READ_ONCE(sqe->len);
+ ren->flags = READ_ONCE(sqe->rename_flags);
+
+ ren->oldpath = getname(oldf);
+ if (IS_ERR(ren->oldpath))
+ return PTR_ERR(ren->oldpath);
+
+ ren->newpath = getname(newf);
+ if (IS_ERR(ren->newpath)) {
+ putname(ren->oldpath);
+ return PTR_ERR(ren->newpath);
+ }
+
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return 0;
+}
+
+static int io_renameat(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_rename *ren = &req->rename;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ ret = do_renameat2(ren->old_dfd, ren->oldpath, ren->new_dfd,
+ ren->newpath, ren->flags);
+
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int io_unlinkat_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_unlink *un = &req->unlink;
+ const char __user *fname;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->off || sqe->len || sqe->buf_index ||
+ sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->flags & REQ_F_FIXED_FILE))
+ return -EBADF;
+
+ un->dfd = READ_ONCE(sqe->fd);
+
+ un->flags = READ_ONCE(sqe->unlink_flags);
+ if (un->flags & ~AT_REMOVEDIR)
+ return -EINVAL;
+
+ fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ un->filename = getname(fname);
+ if (IS_ERR(un->filename))
+ return PTR_ERR(un->filename);
+
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return 0;
+}
+
+static int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_unlink *un = &req->unlink;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ if (un->flags & AT_REMOVEDIR)
+ ret = do_rmdir(un->dfd, un->filename);
+ else
+ ret = do_unlinkat(un->dfd, un->filename);
+
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int io_mkdirat_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_mkdir *mkd = &req->mkdir;
+ const char __user *fname;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->off || sqe->rw_flags || sqe->buf_index ||
+ sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->flags & REQ_F_FIXED_FILE))
+ return -EBADF;
+
+ mkd->dfd = READ_ONCE(sqe->fd);
+ mkd->mode = READ_ONCE(sqe->len);
+
+ fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ mkd->filename = getname(fname);
+ if (IS_ERR(mkd->filename))
+ return PTR_ERR(mkd->filename);
+
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return 0;
+}
+
+static int io_mkdirat(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_mkdir *mkd = &req->mkdir;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ ret = do_mkdirat(mkd->dfd, mkd->filename, mkd->mode);
+
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int io_symlinkat_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_symlink *sl = &req->symlink;
+ const char __user *oldpath, *newpath;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->len || sqe->rw_flags || sqe->buf_index ||
+ sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->flags & REQ_F_FIXED_FILE))
+ return -EBADF;
+
+ sl->new_dfd = READ_ONCE(sqe->fd);
+ oldpath = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ newpath = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+
+ sl->oldpath = getname(oldpath);
+ if (IS_ERR(sl->oldpath))
+ return PTR_ERR(sl->oldpath);
+
+ sl->newpath = getname(newpath);
+ if (IS_ERR(sl->newpath)) {
+ putname(sl->oldpath);
+ return PTR_ERR(sl->newpath);
+ }
+
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return 0;
+}
+
+static int io_symlinkat(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_symlink *sl = &req->symlink;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ ret = do_symlinkat(sl->oldpath, sl->new_dfd, sl->newpath);
+
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int io_linkat_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_hardlink *lnk = &req->hardlink;
+ const char __user *oldf, *newf;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->flags & REQ_F_FIXED_FILE))
+ return -EBADF;
+
+ lnk->old_dfd = READ_ONCE(sqe->fd);
+ lnk->new_dfd = READ_ONCE(sqe->len);
+ oldf = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ newf = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ lnk->flags = READ_ONCE(sqe->hardlink_flags);
+
+ lnk->oldpath = getname(oldf);
+ if (IS_ERR(lnk->oldpath))
+ return PTR_ERR(lnk->oldpath);
+
+ lnk->newpath = getname(newf);
+ if (IS_ERR(lnk->newpath)) {
+ putname(lnk->oldpath);
+ return PTR_ERR(lnk->newpath);
+ }
+
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return 0;
+}
+
+static int io_linkat(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_hardlink *lnk = &req->hardlink;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ ret = do_linkat(lnk->old_dfd, lnk->oldpath, lnk->new_dfd,
+ lnk->newpath, lnk->flags);
+
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int io_shutdown_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+#if defined(CONFIG_NET)
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (unlikely(sqe->ioprio || sqe->off || sqe->addr || sqe->rw_flags ||
+ sqe->buf_index || sqe->splice_fd_in))
+ return -EINVAL;
+
+ req->shutdown.how = READ_ONCE(sqe->len);
+ return 0;
+#else
+ return -EOPNOTSUPP;
+#endif
+}
+
+static int io_shutdown(struct io_kiocb *req, unsigned int issue_flags)
+{
+#if defined(CONFIG_NET)
+ struct socket *sock;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ sock = sock_from_file(req->file);
+ if (unlikely(!sock))
+ return -ENOTSOCK;
+
+ ret = __sys_shutdown_sock(sock, req->shutdown.how);
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+#else
+ return -EOPNOTSUPP;
+#endif
+}
+
+static int __io_splice_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_splice *sp = &req->splice;
+ unsigned int valid_flags = SPLICE_F_FD_IN_FIXED | SPLICE_F_ALL;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+
+ sp->len = READ_ONCE(sqe->len);
+ sp->flags = READ_ONCE(sqe->splice_flags);
+ if (unlikely(sp->flags & ~valid_flags))
+ return -EINVAL;
+ sp->splice_fd_in = READ_ONCE(sqe->splice_fd_in);
+ return 0;
+}
+
+static int io_tee_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ if (READ_ONCE(sqe->splice_off_in) || READ_ONCE(sqe->off))
+ return -EINVAL;
+ return __io_splice_prep(req, sqe);
+}
+
+static int io_tee(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_splice *sp = &req->splice;
+ struct file *out = sp->file_out;
+ unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED;
+ struct file *in;
+ long ret = 0;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ if (sp->flags & SPLICE_F_FD_IN_FIXED)
+ in = io_file_get_fixed(req, sp->splice_fd_in, issue_flags);
+ else
+ in = io_file_get_normal(req, sp->splice_fd_in);
+ if (!in) {
+ ret = -EBADF;
+ goto done;
+ }
+
+ if (sp->len)
+ ret = do_tee(in, out, sp->len, flags);
+
+ if (!(sp->flags & SPLICE_F_FD_IN_FIXED))
+ io_put_file(in);
+done:
+ if (ret != sp->len)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int io_splice_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_splice *sp = &req->splice;
+
+ sp->off_in = READ_ONCE(sqe->splice_off_in);
+ sp->off_out = READ_ONCE(sqe->off);
+ return __io_splice_prep(req, sqe);
+}
+
+static int io_splice(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_splice *sp = &req->splice;
+ struct file *out = sp->file_out;
+ unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED;
+ loff_t *poff_in, *poff_out;
+ struct file *in;
+ long ret = 0;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ if (sp->flags & SPLICE_F_FD_IN_FIXED)
+ in = io_file_get_fixed(req, sp->splice_fd_in, issue_flags);
+ else
+ in = io_file_get_normal(req, sp->splice_fd_in);
+ if (!in) {
+ ret = -EBADF;
+ goto done;
+ }
+
+ poff_in = (sp->off_in == -1) ? NULL : &sp->off_in;
+ poff_out = (sp->off_out == -1) ? NULL : &sp->off_out;
+
+ if (sp->len)
+ ret = do_splice(in, poff_in, out, poff_out, sp->len, flags);
+
+ if (!(sp->flags & SPLICE_F_FD_IN_FIXED))
+ io_put_file(in);
+done:
+ if (ret != sp->len)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+/*
+ * IORING_OP_NOP just posts a completion event, nothing else.
+ */
+static int io_nop(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+
+ __io_req_complete(req, issue_flags, 0, 0);
+ return 0;
+}
+
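+/*
+ * IORING_OP_MSG_RING: post a completion into another io_uring's CQ. The
+ * request's fd must refer to the target ring; sqe->off becomes the posted
+ * CQE's user_data and sqe->len becomes its res value. Illustrative SQE
+ * setup from userspace (not part of this patch, fields per io_uring_sqe):
+ *
+ *   sqe->opcode = IORING_OP_MSG_RING;
+ *   sqe->fd     = target_ring_fd;
+ *   sqe->off    = cqe_user_data_for_target;
+ *   sqe->len    = cqe_res_for_target;
+ */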
+static int io_msg_ring_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ if (unlikely(sqe->addr || sqe->ioprio || sqe->rw_flags ||
+ sqe->splice_fd_in || sqe->buf_index || sqe->personality))
+ return -EINVAL;
+
+ req->msg.user_data = READ_ONCE(sqe->off);
+ req->msg.len = READ_ONCE(sqe->len);
+ return 0;
+}
+
+static int io_msg_ring(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_ring_ctx *target_ctx;
+ struct io_msg *msg = &req->msg;
+ bool filled;
+ int ret;
+
+ ret = -EBADFD;
+ if (req->file->f_op != &io_uring_fops)
+ goto done;
+
+ ret = -EOVERFLOW;
+ target_ctx = req->file->private_data;
+
+ spin_lock(&target_ctx->completion_lock);
+ filled = io_fill_cqe_aux(target_ctx, msg->user_data, msg->len, 0);
+ io_commit_cqring(target_ctx);
+ spin_unlock(&target_ctx->completion_lock);
+
+ if (filled) {
+ io_cqring_ev_posted(target_ctx);
+ ret = 0;
+ }
+
+done:
+ if (ret < 0)
+ req_set_fail(req);
+ __io_req_complete(req, issue_flags, ret, 0);
+ /* put file to avoid an attempt to IOPOLL the req */
+ io_put_file(req->file);
+ req->file = NULL;
+ return 0;
+}
+
+static int io_fsync_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index ||
+ sqe->splice_fd_in))
+ return -EINVAL;
+
+ req->sync.flags = READ_ONCE(sqe->fsync_flags);
+ if (unlikely(req->sync.flags & ~IORING_FSYNC_DATASYNC))
+ return -EINVAL;
+
+ req->sync.off = READ_ONCE(sqe->off);
+ req->sync.len = READ_ONCE(sqe->len);
+ return 0;
+}
+
+static int io_fsync(struct io_kiocb *req, unsigned int issue_flags)
+{
+ loff_t end = req->sync.off + req->sync.len;
+ int ret;
+
+ /* fsync always requires a blocking context */
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ ret = vfs_fsync_range(req->file, req->sync.off,
+ end > 0 ? end : LLONG_MAX,
+ req->sync.flags & IORING_FSYNC_DATASYNC);
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int io_fallocate_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ if (sqe->ioprio || sqe->buf_index || sqe->rw_flags ||
+ sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+
+ req->sync.off = READ_ONCE(sqe->off);
+ req->sync.len = READ_ONCE(sqe->addr);
+ req->sync.mode = READ_ONCE(sqe->len);
+ return 0;
+}
+
+static int io_fallocate(struct io_kiocb *req, unsigned int issue_flags)
+{
+ int ret;
+
+ /* fallocate always requires a blocking context */
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+ ret = vfs_fallocate(req->file, req->sync.mode, req->sync.off,
+ req->sync.len);
+ if (ret < 0)
+ req_set_fail(req);
+ else
+ fsnotify_modify(req->file);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ const char __user *fname;
+ int ret;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (unlikely(sqe->ioprio || sqe->buf_index))
+ return -EINVAL;
+ if (unlikely(req->flags & REQ_F_FIXED_FILE))
+ return -EBADF;
+
+ /* open.how should be already initialised */
+ if (!(req->open.how.flags & O_PATH) && force_o_largefile())
+ req->open.how.flags |= O_LARGEFILE;
+
+ req->open.dfd = READ_ONCE(sqe->fd);
+ fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ req->open.filename = getname(fname);
+ if (IS_ERR(req->open.filename)) {
+ ret = PTR_ERR(req->open.filename);
+ req->open.filename = NULL;
+ return ret;
+ }
+
+ req->open.file_slot = READ_ONCE(sqe->file_index);
+ if (req->open.file_slot && (req->open.how.flags & O_CLOEXEC))
+ return -EINVAL;
+
+ req->open.nofile = rlimit(RLIMIT_NOFILE);
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return 0;
+}
+
+static int io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ u64 mode = READ_ONCE(sqe->len);
+ u64 flags = READ_ONCE(sqe->open_flags);
+
+ req->open.how = build_open_how(flags, mode);
+ return __io_openat_prep(req, sqe);
+}
+
+static int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct open_how __user *how;
+ size_t len;
+ int ret;
+
+ how = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ len = READ_ONCE(sqe->len);
+ if (len < OPEN_HOW_SIZE_VER0)
+ return -EINVAL;
+
+ ret = copy_struct_from_user(&req->open.how, sizeof(req->open.how), how,
+ len);
+ if (ret)
+ return ret;
+
+ return __io_openat_prep(req, sqe);
+}
+
+static int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct open_flags op;
+ struct file *file;
+ bool resolve_nonblock, nonblock_set;
+ bool fixed = !!req->open.file_slot;
+ int ret;
+
+ ret = build_open_flags(&req->open.how, &op);
+ if (ret)
+ goto err;
+ nonblock_set = op.open_flag & O_NONBLOCK;
+ resolve_nonblock = req->open.how.resolve & RESOLVE_CACHED;
+ if (issue_flags & IO_URING_F_NONBLOCK) {
+ /*
+ * Don't bother trying for O_TRUNC, O_CREAT, or O_TMPFILE open,
+ * it'll always return -EAGAIN
+ */
+ if (req->open.how.flags & (O_TRUNC | O_CREAT | O_TMPFILE))
+ return -EAGAIN;
+ op.lookup_flags |= LOOKUP_CACHED;
+ op.open_flag |= O_NONBLOCK;
+ }
+
+ if (!fixed) {
+ ret = __get_unused_fd_flags(req->open.how.flags, req->open.nofile);
+ if (ret < 0)
+ goto err;
+ }
+
+ file = do_filp_open(req->open.dfd, req->open.filename, &op);
+ if (IS_ERR(file)) {
+ /*
+ * We could hang on to this 'fd' on retrying, but it seems like a
+ * marginal gain for something that is now known to be a slower
+ * path. So just put it, and we'll get a new one when we retry.
+ */
+ if (!fixed)
+ put_unused_fd(ret);
+
+ ret = PTR_ERR(file);
+ /* only retry if RESOLVE_CACHED wasn't already set by application */
+ if (ret == -EAGAIN &&
+ (!resolve_nonblock && (issue_flags & IO_URING_F_NONBLOCK)))
+ return -EAGAIN;
+ goto err;
+ }
+
+ if ((issue_flags & IO_URING_F_NONBLOCK) && !nonblock_set)
+ file->f_flags &= ~O_NONBLOCK;
+ fsnotify_open(file);
+
+ if (!fixed)
+ fd_install(ret, file);
+ else
+ ret = io_install_fixed_file(req, file, issue_flags,
+ req->open.file_slot - 1);
+err:
+ putname(req->open.filename);
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ if (ret < 0)
+ req_set_fail(req);
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static int io_openat(struct io_kiocb *req, unsigned int issue_flags)
+{
+ return io_openat2(req, issue_flags);
+}
+
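+/*
+ * IORING_OP_REMOVE_BUFFERS: sqe->fd carries the number of buffers to remove
+ * (capped at USHRT_MAX) and sqe->buf_group selects the buffer group ID.
+ */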
+static int io_remove_buffers_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_provide_buf *p = &req->pbuf;
+ u64 tmp;
+
+ if (sqe->ioprio || sqe->rw_flags || sqe->addr || sqe->len || sqe->off ||
+ sqe->splice_fd_in)
+ return -EINVAL;
+
+ tmp = READ_ONCE(sqe->fd);
+ if (!tmp || tmp > USHRT_MAX)
+ return -EINVAL;
+
+ memset(p, 0, sizeof(*p));
+ p->nbufs = tmp;
+ p->bgid = READ_ONCE(sqe->buf_group);
+ return 0;
+}
+
+static int __io_remove_buffers(struct io_ring_ctx *ctx,
+ struct io_buffer_list *bl, unsigned nbufs)
+{
+ unsigned i = 0;
+
+ /* shouldn't happen */
+ if (!nbufs)
+ return 0;
+
+ /* the head kbuf is the list itself */
+ while (!list_empty(&bl->buf_list)) {
+ struct io_buffer *nxt;
+
+ nxt = list_first_entry(&bl->buf_list, struct io_buffer, list);
+ list_del(&nxt->list);
+ if (++i == nbufs)
+ return i;
+ cond_resched();
+ }
+ i++;
+
+ return i;
+}
+
+static int io_remove_buffers(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_provide_buf *p = &req->pbuf;
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_buffer_list *bl;
+ int ret = 0;
+ bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
+
+ io_ring_submit_lock(ctx, needs_lock);
+
+ lockdep_assert_held(&ctx->uring_lock);
+
+ ret = -ENOENT;
+ bl = io_buffer_get_list(ctx, p->bgid);
+ if (bl)
+ ret = __io_remove_buffers(ctx, bl, p->nbufs);
+ if (ret < 0)
+ req_set_fail(req);
+
+ /* complete before unlock, IOPOLL may need the lock */
+ __io_req_complete(req, issue_flags, ret, 0);
+ io_ring_submit_unlock(ctx, needs_lock);
+ return 0;
+}
+
+static int io_provide_buffers_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ unsigned long size, tmp_check;
+ struct io_provide_buf *p = &req->pbuf;
+ u64 tmp;
+
+ if (sqe->ioprio || sqe->rw_flags || sqe->splice_fd_in)
+ return -EINVAL;
+
+ tmp = READ_ONCE(sqe->fd);
+ if (!tmp || tmp > USHRT_MAX)
+ return -E2BIG;
+ p->nbufs = tmp;
+ p->addr = READ_ONCE(sqe->addr);
+ p->len = READ_ONCE(sqe->len);
+
+ if (check_mul_overflow((unsigned long)p->len, (unsigned long)p->nbufs,
+ &size))
+ return -EOVERFLOW;
+ if (check_add_overflow((unsigned long)p->addr, size, &tmp_check))
+ return -EOVERFLOW;
+
+ size = (unsigned long)p->len * p->nbufs;
+ if (!access_ok(u64_to_user_ptr(p->addr), size))
+ return -EFAULT;
+
+ p->bgid = READ_ONCE(sqe->buf_group);
+ tmp = READ_ONCE(sqe->off);
+ if (tmp > USHRT_MAX)
+ return -E2BIG;
+ p->bid = tmp;
+ return 0;
+}
+
+static int io_refill_buffer_cache(struct io_ring_ctx *ctx)
+{
+ struct io_buffer *buf;
+ struct page *page;
+ int bufs_in_page;
+
+ /*
+ * Completions that don't happen inline (eg not under uring_lock) will
+ * add to ->io_buffers_comp. If we don't have any free buffers, check
+ * the completion list and splice those entries first.
+ */
+ if (!list_empty_careful(&ctx->io_buffers_comp)) {
+ spin_lock(&ctx->completion_lock);
+ if (!list_empty(&ctx->io_buffers_comp)) {
+ list_splice_init(&ctx->io_buffers_comp,
+ &ctx->io_buffers_cache);
+ spin_unlock(&ctx->completion_lock);
+ return 0;
+ }
+ spin_unlock(&ctx->completion_lock);
+ }
+
+ /*
+ * No free buffers and no completion entries either. Allocate a new
+ * page worth of buffer entries and add those to our freelist.
+ */
+ page = alloc_page(GFP_KERNEL_ACCOUNT);
+ if (!page)
+ return -ENOMEM;
+
+ list_add(&page->lru, &ctx->io_buffers_pages);
+
+ buf = page_address(page);
+ bufs_in_page = PAGE_SIZE / sizeof(*buf);
+ while (bufs_in_page) {
+ list_add_tail(&buf->list, &ctx->io_buffers_cache);
+ buf++;
+ bufs_in_page--;
+ }
+
+ return 0;
+}
+
+static int io_add_buffers(struct io_ring_ctx *ctx, struct io_provide_buf *pbuf,
+ struct io_buffer_list *bl)
+{
+ struct io_buffer *buf;
+ u64 addr = pbuf->addr;
+ int i, bid = pbuf->bid;
+
+ for (i = 0; i < pbuf->nbufs; i++) {
+ if (list_empty(&ctx->io_buffers_cache) &&
+ io_refill_buffer_cache(ctx))
+ break;
+ buf = list_first_entry(&ctx->io_buffers_cache, struct io_buffer,
+ list);
+ list_move_tail(&buf->list, &bl->buf_list);
+ buf->addr = addr;
+ buf->len = min_t(__u32, pbuf->len, MAX_RW_COUNT);
+ buf->bid = bid;
+ buf->bgid = pbuf->bgid;
+ addr += pbuf->len;
+ bid++;
+ cond_resched();
+ }
+
+ return i ? 0 : -ENOMEM;
+}
+
+static int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_provide_buf *p = &req->pbuf;
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_buffer_list *bl;
+ int ret = 0;
+ bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
+
+ io_ring_submit_lock(ctx, needs_lock);
+
+ lockdep_assert_held(&ctx->uring_lock);
+
+ bl = io_buffer_get_list(ctx, p->bgid);
+ if (unlikely(!bl)) {
+ bl = kzalloc(sizeof(*bl), GFP_KERNEL_ACCOUNT);
+ if (!bl) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ io_buffer_add_list(ctx, bl, p->bgid);
+ }
+
+ ret = io_add_buffers(ctx, p, bl);
+err:
+ if (ret < 0)
+ req_set_fail(req);
+ /* complete before unlock, IOPOLL may need the lock */
+ __io_req_complete(req, issue_flags, ret, 0);
+ io_ring_submit_unlock(ctx, needs_lock);
+ return 0;
+}
+
+static int io_epoll_ctl_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+#if defined(CONFIG_EPOLL)
+ if (sqe->ioprio || sqe->buf_index || sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+
+ req->epoll.epfd = READ_ONCE(sqe->fd);
+ req->epoll.op = READ_ONCE(sqe->len);
+ req->epoll.fd = READ_ONCE(sqe->off);
+
+ if (ep_op_has_event(req->epoll.op)) {
+ struct epoll_event __user *ev;
+
+ ev = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ if (copy_from_user(&req->epoll.event, ev, sizeof(*ev)))
+ return -EFAULT;
+ }
+
+ return 0;
+#else
+ return -EOPNOTSUPP;
+#endif
+}
+
+static int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags)
+{
+#if defined(CONFIG_EPOLL)
+ struct io_epoll *ie = &req->epoll;
+ int ret;
+ bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+
+ ret = do_epoll_ctl(ie->epfd, ie->op, ie->fd, &ie->event, force_nonblock);
+ if (force_nonblock && ret == -EAGAIN)
+ return -EAGAIN;
+
+ if (ret < 0)
+ req_set_fail(req);
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+#else
+ return -EOPNOTSUPP;
+#endif
+}
+
+static int io_madvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+#if defined(CONFIG_ADVISE_SYSCALLS) && defined(CONFIG_MMU)
+ if (sqe->ioprio || sqe->buf_index || sqe->off || sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+
+ req->madvise.addr = READ_ONCE(sqe->addr);
+ req->madvise.len = READ_ONCE(sqe->len);
+ req->madvise.advice = READ_ONCE(sqe->fadvise_advice);
+ return 0;
+#else
+ return -EOPNOTSUPP;
+#endif
+}
+
+static int io_madvise(struct io_kiocb *req, unsigned int issue_flags)
+{
+#if defined(CONFIG_ADVISE_SYSCALLS) && defined(CONFIG_MMU)
+ struct io_madvise *ma = &req->madvise;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ ret = do_madvise(current->mm, ma->addr, ma->len, ma->advice);
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+#else
+ return -EOPNOTSUPP;
+#endif
+}
+
+static int io_fadvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ if (sqe->ioprio || sqe->buf_index || sqe->addr || sqe->splice_fd_in)
+ return -EINVAL;
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+
+ req->fadvise.offset = READ_ONCE(sqe->off);
+ req->fadvise.len = READ_ONCE(sqe->len);
+ req->fadvise.advice = READ_ONCE(sqe->fadvise_advice);
+ return 0;
+}
+
+static int io_fadvise(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_fadvise *fa = &req->fadvise;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK) {
+ switch (fa->advice) {
+ case POSIX_FADV_NORMAL:
+ case POSIX_FADV_RANDOM:
+ case POSIX_FADV_SEQUENTIAL:
+ break;
+ default:
+ return -EAGAIN;
+ }
+ }
+
+ ret = vfs_fadvise(req->file, fa->offset, fa->len, fa->advice);
+ if (ret < 0)
+ req_set_fail(req);
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ const char __user *path;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->buf_index || sqe->splice_fd_in)
+ return -EINVAL;
+ if (req->flags & REQ_F_FIXED_FILE)
+ return -EBADF;
+
+ req->statx.dfd = READ_ONCE(sqe->fd);
+ req->statx.mask = READ_ONCE(sqe->len);
+ path = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ req->statx.buffer = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ req->statx.flags = READ_ONCE(sqe->statx_flags);
+
+ req->statx.filename = getname_flags(path,
+ getname_statx_lookup_flags(req->statx.flags),
+ NULL);
+
+ if (IS_ERR(req->statx.filename)) {
+ int ret = PTR_ERR(req->statx.filename);
+
+ req->statx.filename = NULL;
+ return ret;
+ }
+
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return 0;
+}
+
+static int io_statx(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_statx *ctx = &req->statx;
+ int ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ ret = do_statx(ctx->dfd, ctx->filename, ctx->flags, ctx->mask,
+ ctx->buffer);
+
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->off || sqe->addr || sqe->len ||
+ sqe->rw_flags || sqe->buf_index)
+ return -EINVAL;
+ if (req->flags & REQ_F_FIXED_FILE)
+ return -EBADF;
+
+ req->close.fd = READ_ONCE(sqe->fd);
+ req->close.file_slot = READ_ONCE(sqe->file_index);
+ if (req->close.file_slot && req->close.fd)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int io_close(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct files_struct *files = current->files;
+ struct io_close *close = &req->close;
+ struct fdtable *fdt;
+ struct file *file = NULL;
+ int ret = -EBADF;
+
+ if (req->close.file_slot) {
+ ret = io_close_fixed(req, issue_flags);
+ goto err;
+ }
+
+ spin_lock(&files->file_lock);
+ fdt = files_fdtable(files);
+ if (close->fd >= fdt->max_fds) {
+ spin_unlock(&files->file_lock);
+ goto err;
+ }
+ file = fdt->fd[close->fd];
+ if (!file || file->f_op == &io_uring_fops) {
+ spin_unlock(&files->file_lock);
+ file = NULL;
+ goto err;
+ }
+
+ /* if the file has a flush method, be safe and punt to async */
+ if (file->f_op->flush && (issue_flags & IO_URING_F_NONBLOCK)) {
+ spin_unlock(&files->file_lock);
+ return -EAGAIN;
+ }
+
+ ret = __close_fd_get_file(close->fd, &file);
+ spin_unlock(&files->file_lock);
+ if (ret < 0) {
+ if (ret == -ENOENT)
+ ret = -EBADF;
+ goto err;
+ }
+
+ /* No ->flush() or already async, safely close from here */
+ ret = filp_close(file, current->files);
+err:
+ if (ret < 0)
+ req_set_fail(req);
+ if (file)
+ fput(file);
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static int io_sfr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index ||
+ sqe->splice_fd_in))
+ return -EINVAL;
+
+ req->sync.off = READ_ONCE(sqe->off);
+ req->sync.len = READ_ONCE(sqe->len);
+ req->sync.flags = READ_ONCE(sqe->sync_range_flags);
+ return 0;
+}
+
+static int io_sync_file_range(struct io_kiocb *req, unsigned int issue_flags)
+{
+ int ret;
+
+ /* sync_file_range always requires a blocking context */
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ ret = sync_file_range(req->file, req->sync.off, req->sync.len,
+ req->sync.flags);
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete(req, ret);
+ return 0;
+}
+
+#if defined(CONFIG_NET)
+static int io_setup_async_msg(struct io_kiocb *req,
+ struct io_async_msghdr *kmsg)
+{
+ struct io_async_msghdr *async_msg = req->async_data;
+
+ if (async_msg)
+ return -EAGAIN;
+ if (io_alloc_async_data(req)) {
+ kfree(kmsg->free_iov);
+ return -ENOMEM;
+ }
+ async_msg = req->async_data;
+ req->flags |= REQ_F_NEED_CLEANUP;
+ memcpy(async_msg, kmsg, sizeof(*kmsg));
+ async_msg->msg.msg_name = &async_msg->addr;
+ /* if we were using fast_iov, set it to the new one */
+ if (!async_msg->free_iov)
+ async_msg->msg.msg_iter.iov = async_msg->fast_iov;
+
+ return -EAGAIN;
+}
+
+static int io_sendmsg_copy_hdr(struct io_kiocb *req,
+ struct io_async_msghdr *iomsg)
+{
+ iomsg->msg.msg_name = &iomsg->addr;
+ iomsg->free_iov = iomsg->fast_iov;
+ return sendmsg_copy_msghdr(&iomsg->msg, req->sr_msg.umsg,
+ req->sr_msg.msg_flags, &iomsg->free_iov);
+}
+
+static int io_sendmsg_prep_async(struct io_kiocb *req)
+{
+ int ret;
+
+ ret = io_sendmsg_copy_hdr(req, req->async_data);
+ if (!ret)
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return ret;
+}
+
+static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_sr_msg *sr = &req->sr_msg;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (unlikely(sqe->addr2 || sqe->file_index || sqe->ioprio))
+ return -EINVAL;
+
+ sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ sr->len = READ_ONCE(sqe->len);
+ sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
+ if (sr->msg_flags & MSG_DONTWAIT)
+ req->flags |= REQ_F_NOWAIT;
+
+#ifdef CONFIG_COMPAT
+ if (req->ctx->compat)
+ sr->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+ return 0;
+}
+
+static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_async_msghdr iomsg, *kmsg;
+ struct socket *sock;
+ unsigned flags;
+ int min_ret = 0;
+ int ret;
+
+ sock = sock_from_file(req->file);
+ if (unlikely(!sock))
+ return -ENOTSOCK;
+
+ if (req_has_async_data(req)) {
+ kmsg = req->async_data;
+ } else {
+ ret = io_sendmsg_copy_hdr(req, &iomsg);
+ if (ret)
+ return ret;
+ kmsg = &iomsg;
+ }
+
+ flags = req->sr_msg.msg_flags;
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ flags |= MSG_DONTWAIT;
+ if (flags & MSG_WAITALL)
+ min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
+ ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
+
+ if (ret < min_ret) {
+ if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+ return io_setup_async_msg(req, kmsg);
+ if (ret == -ERESTARTSYS)
+ ret = -EINTR;
+ req_set_fail(req);
+ }
+ /* fast path, check for non-NULL to avoid function call */
+ if (kmsg->free_iov)
+ kfree(kmsg->free_iov);
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static int io_send(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_sr_msg *sr = &req->sr_msg;
+ struct msghdr msg;
+ struct iovec iov;
+ struct socket *sock;
+ unsigned flags;
+ int min_ret = 0;
+ int ret;
+
+ sock = sock_from_file(req->file);
+ if (unlikely(!sock))
+ return -ENOTSOCK;
+
+ ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
+ if (unlikely(ret))
+ return ret;
+
+ msg.msg_name = NULL;
+ msg.msg_control = NULL;
+ msg.msg_controllen = 0;
+ msg.msg_namelen = 0;
+
+ flags = req->sr_msg.msg_flags;
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ flags |= MSG_DONTWAIT;
+ if (flags & MSG_WAITALL)
+ min_ret = iov_iter_count(&msg.msg_iter);
+
+ msg.msg_flags = flags;
+ ret = sock_sendmsg(sock, &msg);
+ if (ret < min_ret) {
+ if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+ return -EAGAIN;
+ if (ret == -ERESTARTSYS)
+ ret = -EINTR;
+ req_set_fail(req);
+ }
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
+ struct io_async_msghdr *iomsg)
+{
+ struct io_sr_msg *sr = &req->sr_msg;
+ struct iovec __user *uiov;
+ size_t iov_len;
+ int ret;
+
+ ret = __copy_msghdr_from_user(&iomsg->msg, sr->umsg,
+ &iomsg->uaddr, &uiov, &iov_len);
+ if (ret)
+ return ret;
+
+ if (req->flags & REQ_F_BUFFER_SELECT) {
+ if (iov_len > 1)
+ return -EINVAL;
+ if (copy_from_user(iomsg->fast_iov, uiov, sizeof(*uiov)))
+ return -EFAULT;
+ sr->len = iomsg->fast_iov[0].iov_len;
+ iomsg->free_iov = NULL;
+ } else {
+ iomsg->free_iov = iomsg->fast_iov;
+ ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
+ &iomsg->free_iov, &iomsg->msg.msg_iter,
+ false);
+ if (ret > 0)
+ ret = 0;
+ }
+
+ return ret;
+}
+
+#ifdef CONFIG_COMPAT
+static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
+ struct io_async_msghdr *iomsg)
+{
+ struct io_sr_msg *sr = &req->sr_msg;
+ struct compat_iovec __user *uiov;
+ compat_uptr_t ptr;
+ compat_size_t len;
+ int ret;
+
+ ret = __get_compat_msghdr(&iomsg->msg, sr->umsg_compat, &iomsg->uaddr,
+ &ptr, &len);
+ if (ret)
+ return ret;
+
+ uiov = compat_ptr(ptr);
+ if (req->flags & REQ_F_BUFFER_SELECT) {
+ compat_ssize_t clen;
+
+ if (len > 1)
+ return -EINVAL;
+ if (!access_ok(uiov, sizeof(*uiov)))
+ return -EFAULT;
+ if (__get_user(clen, &uiov->iov_len))
+ return -EFAULT;
+ if (clen < 0)
+ return -EINVAL;
+ sr->len = clen;
+ iomsg->free_iov = NULL;
+ } else {
+ iomsg->free_iov = iomsg->fast_iov;
+ ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
+ UIO_FASTIOV, &iomsg->free_iov,
+ &iomsg->msg.msg_iter, true);
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+#endif
+
+static int io_recvmsg_copy_hdr(struct io_kiocb *req,
+ struct io_async_msghdr *iomsg)
+{
+ iomsg->msg.msg_name = &iomsg->addr;
+
+#ifdef CONFIG_COMPAT
+ if (req->ctx->compat)
+ return __io_compat_recvmsg_copy_hdr(req, iomsg);
+#endif
+
+ return __io_recvmsg_copy_hdr(req, iomsg);
+}
+
+static struct io_buffer *io_recv_buffer_select(struct io_kiocb *req,
+ unsigned int issue_flags)
+{
+ struct io_sr_msg *sr = &req->sr_msg;
+
+ return io_buffer_select(req, &sr->len, sr->bgid, issue_flags);
+}
+
+static int io_recvmsg_prep_async(struct io_kiocb *req)
+{
+ int ret;
+
+ ret = io_recvmsg_copy_hdr(req, req->async_data);
+ if (!ret)
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return ret;
+}
+
+static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_sr_msg *sr = &req->sr_msg;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (unlikely(sqe->addr2 || sqe->file_index || sqe->ioprio))
+ return -EINVAL;
+
+ sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ sr->len = READ_ONCE(sqe->len);
+ sr->bgid = READ_ONCE(sqe->buf_group);
+ sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
+ if (sr->msg_flags & MSG_DONTWAIT)
+ req->flags |= REQ_F_NOWAIT;
+
+#ifdef CONFIG_COMPAT
+ if (req->ctx->compat)
+ sr->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+ sr->done_io = 0;
+ return 0;
+}
+
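+/*
+ * For MSG_WAITALL on stream or seqpacket sockets a short transfer can still
+ * make progress on a later attempt, so it's worth arming an async retry;
+ * for other socket types a partial result is treated as final.
+ */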
+static bool io_net_retry(struct socket *sock, int flags)
+{
+ if (!(flags & MSG_WAITALL))
+ return false;
+ return sock->type == SOCK_STREAM || sock->type == SOCK_SEQPACKET;
+}
+
+static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_async_msghdr iomsg, *kmsg;
+ struct io_sr_msg *sr = &req->sr_msg;
+ struct socket *sock;
+ struct io_buffer *kbuf;
+ unsigned flags;
+ int ret, min_ret = 0;
+ bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+
+ sock = sock_from_file(req->file);
+ if (unlikely(!sock))
+ return -ENOTSOCK;
+
+ if (req_has_async_data(req)) {
+ kmsg = req->async_data;
+ } else {
+ ret = io_recvmsg_copy_hdr(req, &iomsg);
+ if (ret)
+ return ret;
+ kmsg = &iomsg;
+ }
+
+ if (req->flags & REQ_F_BUFFER_SELECT) {
+ kbuf = io_recv_buffer_select(req, issue_flags);
+ if (IS_ERR(kbuf))
+ return PTR_ERR(kbuf);
+ kmsg->fast_iov[0].iov_base = u64_to_user_ptr(kbuf->addr);
+ kmsg->fast_iov[0].iov_len = req->sr_msg.len;
+ iov_iter_init(&kmsg->msg.msg_iter, READ, kmsg->fast_iov,
+ 1, req->sr_msg.len);
+ }
+
+ flags = req->sr_msg.msg_flags;
+ if (force_nonblock)
+ flags |= MSG_DONTWAIT;
+ if (flags & MSG_WAITALL)
+ min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
+ ret = __sys_recvmsg_sock(sock, &kmsg->msg, req->sr_msg.umsg,
+ kmsg->uaddr, flags);
+ if (ret < min_ret) {
+ if (ret == -EAGAIN && force_nonblock)
+ return io_setup_async_msg(req, kmsg);
+ if (ret == -ERESTARTSYS)
+ ret = -EINTR;
+ if (ret > 0 && io_net_retry(sock, flags)) {
+ sr->done_io += ret;
+ req->flags |= REQ_F_PARTIAL_IO;
+ return io_setup_async_msg(req, kmsg);
+ }
+ req_set_fail(req);
+ } else if ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
+ req_set_fail(req);
+ }
+
+ /* fast path, check for non-NULL to avoid function call */
+ if (kmsg->free_iov)
+ kfree(kmsg->free_iov);
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ if (ret >= 0)
+ ret += sr->done_io;
+ else if (sr->done_io)
+ ret = sr->done_io;
+ __io_req_complete(req, issue_flags, ret, io_put_kbuf(req, issue_flags));
+ return 0;
+}
+
+static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_buffer *kbuf;
+ struct io_sr_msg *sr = &req->sr_msg;
+ struct msghdr msg;
+ void __user *buf = sr->buf;
+ struct socket *sock;
+ struct iovec iov;
+ unsigned flags;
+ int ret, min_ret = 0;
+ bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+
+ sock = sock_from_file(req->file);
+ if (unlikely(!sock))
+ return -ENOTSOCK;
+
+ if (req->flags & REQ_F_BUFFER_SELECT) {
+ kbuf = io_recv_buffer_select(req, issue_flags);
+ if (IS_ERR(kbuf))
+ return PTR_ERR(kbuf);
+ buf = u64_to_user_ptr(kbuf->addr);
+ }
+
+ ret = import_single_range(READ, buf, sr->len, &iov, &msg.msg_iter);
+ if (unlikely(ret))
+ goto out_free;
+
+ msg.msg_name = NULL;
+ msg.msg_control = NULL;
+ msg.msg_controllen = 0;
+ msg.msg_namelen = 0;
+ msg.msg_iocb = NULL;
+ msg.msg_flags = 0;
+
+ flags = req->sr_msg.msg_flags;
+ if (force_nonblock)
+ flags |= MSG_DONTWAIT;
+ if (flags & MSG_WAITALL)
+ min_ret = iov_iter_count(&msg.msg_iter);
+
+ ret = sock_recvmsg(sock, &msg, flags);
+ if (ret < min_ret) {
+ if (ret == -EAGAIN && force_nonblock)
+ return -EAGAIN;
+ if (ret == -ERESTARTSYS)
+ ret = -EINTR;
+ if (ret > 0 && io_net_retry(sock, flags)) {
+ sr->len -= ret;
+ sr->buf += ret;
+ sr->done_io += ret;
+ req->flags |= REQ_F_PARTIAL_IO;
+ return -EAGAIN;
+ }
+ req_set_fail(req);
+ } else if ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
+out_free:
+ req_set_fail(req);
+ }
+
+ if (ret >= 0)
+ ret += sr->done_io;
+ else if (sr->done_io)
+ ret = sr->done_io;
+ __io_req_complete(req, issue_flags, ret, io_put_kbuf(req, issue_flags));
+ return 0;
+}
+
+static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_accept *accept = &req->accept;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->len || sqe->buf_index)
+ return -EINVAL;
+
+ accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ accept->flags = READ_ONCE(sqe->accept_flags);
+ accept->nofile = rlimit(RLIMIT_NOFILE);
+
+ accept->file_slot = READ_ONCE(sqe->file_index);
+ if (accept->file_slot && (accept->flags & SOCK_CLOEXEC))
+ return -EINVAL;
+ if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
+ return -EINVAL;
+ if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
+ accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
+ return 0;
+}
+
+static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_accept *accept = &req->accept;
+ bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+ unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
+ bool fixed = !!accept->file_slot;
+ struct file *file;
+ int ret, fd;
+
+ if (!fixed) {
+ fd = __get_unused_fd_flags(accept->flags, accept->nofile);
+ if (unlikely(fd < 0))
+ return fd;
+ }
+ file = do_accept(req->file, file_flags, accept->addr, accept->addr_len,
+ accept->flags);
+ if (IS_ERR(file)) {
+ if (!fixed)
+ put_unused_fd(fd);
+ ret = PTR_ERR(file);
+ if (ret == -EAGAIN && force_nonblock)
+ return -EAGAIN;
+ if (ret == -ERESTARTSYS)
+ ret = -EINTR;
+ req_set_fail(req);
+ } else if (!fixed) {
+ fd_install(fd, file);
+ ret = fd;
+ } else {
+ ret = io_install_fixed_file(req, file, issue_flags,
+ accept->file_slot - 1);
+ }
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static int io_connect_prep_async(struct io_kiocb *req)
+{
+ struct io_async_connect *io = req->async_data;
+ struct io_connect *conn = &req->connect;
+
+ return move_addr_to_kernel(conn->addr, conn->addr_len, &io->address);
+}
+
+static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_connect *conn = &req->connect;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->len || sqe->buf_index || sqe->rw_flags ||
+ sqe->splice_fd_in)
+ return -EINVAL;
+
+ conn->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ conn->addr_len = READ_ONCE(sqe->addr2);
+ return 0;
+}
+
+static int io_connect(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_async_connect __io, *io;
+ unsigned file_flags;
+ int ret;
+ bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+
+ if (req_has_async_data(req)) {
+ io = req->async_data;
+ } else {
+ ret = move_addr_to_kernel(req->connect.addr,
+ req->connect.addr_len,
+ &__io.address);
+ if (ret)
+ goto out;
+ io = &__io;
+ }
+
+ file_flags = force_nonblock ? O_NONBLOCK : 0;
+
+ ret = __sys_connect_file(req->file, &io->address,
+ req->connect.addr_len, file_flags);
+ if ((ret == -EAGAIN || ret == -EINPROGRESS) && force_nonblock) {
+ if (req_has_async_data(req))
+ return -EAGAIN;
+ if (io_alloc_async_data(req)) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ memcpy(req->async_data, &__io, sizeof(__io));
+ return -EAGAIN;
+ }
+ if (ret == -ERESTARTSYS)
+ ret = -EINTR;
+out:
+ if (ret < 0)
+ req_set_fail(req);
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+#else /* !CONFIG_NET */
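+/* With CONFIG_NET disabled, stub out every networking opcode with -EOPNOTSUPP. */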
+#define IO_NETOP_FN(op) \
+static int io_##op(struct io_kiocb *req, unsigned int issue_flags) \
+{ \
+ return -EOPNOTSUPP; \
+}
+
+#define IO_NETOP_PREP(op) \
+IO_NETOP_FN(op) \
+static int io_##op##_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) \
+{ \
+ return -EOPNOTSUPP; \
+} \
+
+#define IO_NETOP_PREP_ASYNC(op) \
+IO_NETOP_PREP(op) \
+static int io_##op##_prep_async(struct io_kiocb *req) \
+{ \
+ return -EOPNOTSUPP; \
+}
+
+IO_NETOP_PREP_ASYNC(sendmsg);
+IO_NETOP_PREP_ASYNC(recvmsg);
+IO_NETOP_PREP_ASYNC(connect);
+IO_NETOP_PREP(accept);
+IO_NETOP_FN(send);
+IO_NETOP_FN(recv);
+#endif /* CONFIG_NET */
+
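+/*
+ * State handed to the ->poll() queue_proc while arming a request: tracks the
+ * originating request, how many waitqueue entries were added, and any error
+ * hit during setup.
+ */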
+struct io_poll_table {
+ struct poll_table_struct pt;
+ struct io_kiocb *req;
+ int nr_entries;
+ int error;
+};
+
+#define IO_POLL_CANCEL_FLAG BIT(31)
+#define IO_POLL_REF_MASK GENMASK(30, 0)
+
+/*
+ * If the refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, the request is
+ * free and we can bump it to acquire ownership. Modifying a request while not
+ * owning it is disallowed; that prevents races when enqueueing task_work and
+ * between arming poll and wakeups.
+ */
+static inline bool io_poll_get_ownership(struct io_kiocb *req)
+{
+ return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
+}
+
+static void io_poll_mark_cancelled(struct io_kiocb *req)
+{
+ atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs);
+}
+
+static struct io_poll_iocb *io_poll_get_double(struct io_kiocb *req)
+{
+ /* pure poll stashes this in ->async_data, poll driven retry elsewhere */
+ if (req->opcode == IORING_OP_POLL_ADD)
+ return req->async_data;
+ return req->apoll->double_poll;
+}
+
+static struct io_poll_iocb *io_poll_get_single(struct io_kiocb *req)
+{
+ if (req->opcode == IORING_OP_POLL_ADD)
+ return &req->poll;
+ return &req->apoll->poll;
+}
+
+static void io_poll_req_insert(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct hlist_head *list;
+
+ list = &ctx->cancel_hash[hash_long(req->user_data, ctx->cancel_hash_bits)];
+ hlist_add_head(&req->hash_node, list);
+}
+
+static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
+ wait_queue_func_t wake_func)
+{
+ poll->head = NULL;
+#define IO_POLL_UNMASK (EPOLLERR|EPOLLHUP|EPOLLNVAL|EPOLLRDHUP)
+ /* mask in events that we always want/need */
+ poll->events = events | IO_POLL_UNMASK;
+ INIT_LIST_HEAD(&poll->wait.entry);
+ init_waitqueue_func_entry(&poll->wait, wake_func);
+}
+
+static inline void io_poll_remove_entry(struct io_poll_iocb *poll)
+{
+ struct wait_queue_head *head = smp_load_acquire(&poll->head);
+
+ if (head) {
+ spin_lock_irq(&head->lock);
+ list_del_init(&poll->wait.entry);
+ poll->head = NULL;
+ spin_unlock_irq(&head->lock);
+ }
+}
+
+static void io_poll_remove_entries(struct io_kiocb *req)
+{
+ /*
+ * Nothing to do if neither of those flags is set. Avoid dipping
+ * into the poll/apoll/double cachelines if we can.
+ */
+ if (!(req->flags & (REQ_F_SINGLE_POLL | REQ_F_DOUBLE_POLL)))
+ return;
+
+ /*
+ * While we hold the waitqueue lock and the waitqueue is nonempty,
+ * wake_up_pollfree() will wait for us. However, taking the waitqueue
+ * lock in the first place can race with the waitqueue being freed.
+ *
+ * We solve this as eventpoll does: by taking advantage of the fact that
+ * all users of wake_up_pollfree() will RCU-delay the actual free. If
+ * we enter rcu_read_lock() and see that the pointer to the queue is
+ * non-NULL, we can then lock it without the memory being freed out from
+ * under us.
+ *
+ * Keep holding rcu_read_lock() as long as we hold the queue lock, in
+ * case the caller deletes the entry from the queue, leaving it empty.
+ * In that case, only RCU prevents the queue memory from being freed.
+ */
+ rcu_read_lock();
+ if (req->flags & REQ_F_SINGLE_POLL)
+ io_poll_remove_entry(io_poll_get_single(req));
+ if (req->flags & REQ_F_DOUBLE_POLL)
+ io_poll_remove_entry(io_poll_get_double(req));
+ rcu_read_unlock();
+}
+
+/*
+ * All poll tw should go through this. Checks for poll events, manages
+ * references, does rewait, etc.
+ *
+ * Returns a negative error on failure. >0 when no action is required, which
+ * means either a spurious wakeup or a multishot CQE was served. 0 when it's
+ * done with the request, in which case the mask is stored in req->result.
+ */
+static int io_poll_check_events(struct io_kiocb *req, bool locked)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ int v;
+
+ /* req->task == current here, checking PF_EXITING is safe */
+ if (unlikely(req->task->flags & PF_EXITING))
+ io_poll_mark_cancelled(req);
+
+ do {
+ v = atomic_read(&req->poll_refs);
+
+ /* tw handler should be the owner, and so have some references */
+ if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))
+ return 0;
+ if (v & IO_POLL_CANCEL_FLAG)
+ return -ECANCELED;
+
+ if (!req->result) {
+ struct poll_table_struct pt = { ._key = req->apoll_events };
+ req->result = vfs_poll(req->file, &pt) & req->apoll_events;
+ }
+
+ /* multishot, just fill a CQE and proceed */
+ if (req->result && !(req->apoll_events & EPOLLONESHOT)) {
+ __poll_t mask = mangle_poll(req->result & req->apoll_events);
+ bool filled;
+
+ spin_lock(&ctx->completion_lock);
+ filled = io_fill_cqe_aux(ctx, req->user_data, mask,
+ IORING_CQE_F_MORE);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ if (unlikely(!filled))
+ return -ECANCELED;
+ io_cqring_ev_posted(ctx);
+ } else if (req->result) {
+ return 0;
+ }
+
+ /*
+ * Release all references, retry if someone tried to restart
+ * task_work while we were executing it.
+ */
+ } while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
+
+ return 1;
+}
+
+static void io_poll_task_func(struct io_kiocb *req, bool *locked)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ int ret;
+
+ ret = io_poll_check_events(req, *locked);
+ if (ret > 0)
+ return;
+
+ if (!ret) {
+ req->result = mangle_poll(req->result & req->poll.events);
+ } else {
+ req->result = ret;
+ req_set_fail(req);
+ }
+
+ io_poll_remove_entries(req);
+ spin_lock(&ctx->completion_lock);
+ hash_del(&req->hash_node);
+ __io_req_complete_post(req, req->result, 0);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+}
+
+static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ int ret;
+
+ ret = io_poll_check_events(req, *locked);
+ if (ret > 0)
+ return;
+
+ io_poll_remove_entries(req);
+ spin_lock(&ctx->completion_lock);
+ hash_del(&req->hash_node);
+ spin_unlock(&ctx->completion_lock);
+
+ if (!ret)
+ io_req_task_submit(req, locked);
+ else
+ io_req_complete_failed(req, ret);
+}
+
+static void __io_poll_execute(struct io_kiocb *req, int mask,
+ __poll_t __maybe_unused events)
+{
+ req->result = mask;
+ /*
+ * This is useful for a poll that is armed on behalf of another
+ * request, and where the wakeup path could be on a different
+ * CPU. We want to avoid pulling in req->apoll->events for that
+ * case.
+ */
+ if (req->opcode == IORING_OP_POLL_ADD)
+ req->io_task_work.func = io_poll_task_func;
+ else
+ req->io_task_work.func = io_apoll_task_func;
+
+ trace_io_uring_task_add(req->ctx, req, req->user_data, req->opcode, mask);
+ io_req_task_work_add(req, false);
+}
+
+static inline void io_poll_execute(struct io_kiocb *req, int res,
+ __poll_t events)
+{
+ if (io_poll_get_ownership(req))
+ __io_poll_execute(req, res, events);
+}
+
+static void io_poll_cancel_req(struct io_kiocb *req)
+{
+ io_poll_mark_cancelled(req);
+ /* kick tw, which should complete the request */
+ io_poll_execute(req, 0, 0);
+}
+
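+/*
+ * The low bit of a wait entry's ->private tags entries that belong to the
+ * second (double) poll waitqueue; the remaining bits hold the io_kiocb
+ * pointer itself.
+ */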
+#define wqe_to_req(wait) ((void *)((unsigned long) (wait)->private & ~1))
+#define wqe_is_double(wait) ((unsigned long) (wait)->private & 1)
+#define IO_ASYNC_POLL_COMMON (EPOLLONESHOT | POLLPRI)
+
+static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
+ void *key)
+{
+ struct io_kiocb *req = wqe_to_req(wait);
+ struct io_poll_iocb *poll = container_of(wait, struct io_poll_iocb,
+ wait);
+ __poll_t mask = key_to_poll(key);
+
+ if (unlikely(mask & POLLFREE)) {
+ io_poll_mark_cancelled(req);
+ /* we have to kick tw in case it's not already */
+ io_poll_execute(req, 0, poll->events);
+
+ /*
+ * If the waitqueue is being freed early but someone already
+ * holds ownership over it, we have to tear down the request as
+ * best we can. That means immediately removing the request from
+ * its waitqueue and preventing all further accesses to the
+ * waitqueue via the request.
+ */
+ list_del_init(&poll->wait.entry);
+
+ /*
+ * Careful: this *must* be the last step, since as soon
+ * as req->head is NULL'ed out, the request can be
+ * completed and freed, since aio_poll_complete_work()
+ * will no longer need to take the waitqueue lock.
+ */
+ smp_store_release(&poll->head, NULL);
+ return 1;
+ }
+
+ /* for instances that support it, check for an event match first */
+ if (mask && !(mask & (poll->events & ~IO_ASYNC_POLL_COMMON)))
+ return 0;
+
+ if (io_poll_get_ownership(req)) {
+ /* optional, saves extra locking for removal in tw handler */
+ if (mask && poll->events & EPOLLONESHOT) {
+ list_del_init(&poll->wait.entry);
+ poll->head = NULL;
+ if (wqe_is_double(wait))
+ req->flags &= ~REQ_F_DOUBLE_POLL;
+ else
+ req->flags &= ~REQ_F_SINGLE_POLL;
+ }
+ __io_poll_execute(req, mask, poll->events);
+ }
+ return 1;
+}
+
+static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
+ struct wait_queue_head *head,
+ struct io_poll_iocb **poll_ptr)
+{
+ struct io_kiocb *req = pt->req;
+ unsigned long wqe_private = (unsigned long) req;
+
+ /*
+ * The file being polled uses multiple waitqueues for poll handling
+ * (e.g. one for read, one for write). Set up a separate io_poll_iocb
+ * if this happens.
+ */
+ if (unlikely(pt->nr_entries)) {
+ struct io_poll_iocb *first = poll;
+
+ /* double add on the same waitqueue head, ignore */
+ if (first->head == head)
+ return;
+ /* already have a 2nd entry, fail a third attempt */
+ if (*poll_ptr) {
+ if ((*poll_ptr)->head == head)
+ return;
+ pt->error = -EINVAL;
+ return;
+ }
+
+ poll = kmalloc(sizeof(*poll), GFP_ATOMIC);
+ if (!poll) {
+ pt->error = -ENOMEM;
+ return;
+ }
+ /* mark as double wq entry */
+ wqe_private |= 1;
+ req->flags |= REQ_F_DOUBLE_POLL;
+ io_init_poll_iocb(poll, first->events, first->wait.func);
+ *poll_ptr = poll;
+ if (req->opcode == IORING_OP_POLL_ADD)
+ req->flags |= REQ_F_ASYNC_DATA;
+ }
+
+ req->flags |= REQ_F_SINGLE_POLL;
+ pt->nr_entries++;
+ poll->head = head;
+ poll->wait.private = (void *) wqe_private;
+
+ if (poll->events & EPOLLEXCLUSIVE)
+ add_wait_queue_exclusive(head, &poll->wait);
+ else
+ add_wait_queue(head, &poll->wait);
+}
+
+static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
+ struct poll_table_struct *p)
+{
+ struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
+
+ __io_queue_proc(&pt->req->poll, pt, head,
+ (struct io_poll_iocb **) &pt->req->async_data);
+}
+
+static int __io_arm_poll_handler(struct io_kiocb *req,
+ struct io_poll_iocb *poll,
+ struct io_poll_table *ipt, __poll_t mask)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ int v;
+
+ INIT_HLIST_NODE(&req->hash_node);
+ io_init_poll_iocb(poll, mask, io_poll_wake);
+ poll->file = req->file;
+
+ req->apoll_events = poll->events;
+
+ ipt->pt._key = mask;
+ ipt->req = req;
+ ipt->error = 0;
+ ipt->nr_entries = 0;
+
+ /*
+ * Take ownership to delay any tw execution until we're done
+ * with poll arming. See io_poll_get_ownership().
+ */
+ atomic_set(&req->poll_refs, 1);
+ mask = vfs_poll(req->file, &ipt->pt) & poll->events;
+
+ if (mask && (poll->events & EPOLLONESHOT)) {
+ io_poll_remove_entries(req);
+ /* no one else has access to the req, forget about the ref */
+ return mask;
+ }
+ if (!mask && unlikely(ipt->error || !ipt->nr_entries)) {
+ io_poll_remove_entries(req);
+ if (!ipt->error)
+ ipt->error = -EINVAL;
+ return 0;
+ }
+
+ spin_lock(&ctx->completion_lock);
+ io_poll_req_insert(req);
+ spin_unlock(&ctx->completion_lock);
+
+ if (mask) {
+ /* can't multishot if failed, just queue the event we've got */
+ if (unlikely(ipt->error || !ipt->nr_entries)) {
+ poll->events |= EPOLLONESHOT;
+ req->apoll_events |= EPOLLONESHOT;
+ ipt->error = 0;
+ }
+ __io_poll_execute(req, mask, poll->events);
+ return 0;
+ }
+
+ /*
+ * Release ownership. If someone tried to queue a tw while it was
+ * locked, kick it off for them.
+ */
+ v = atomic_dec_return(&req->poll_refs);
+ if (unlikely(v & IO_POLL_REF_MASK))
+ __io_poll_execute(req, 0, poll->events);
+ return 0;
+}
+
+static void io_async_queue_proc(struct file *file, struct wait_queue_head *head,
+ struct poll_table_struct *p)
+{
+ struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
+ struct async_poll *apoll = pt->req->apoll;
+
+ __io_queue_proc(&apoll->poll, pt, head, &apoll->double_poll);
+}
+
+enum {
+ IO_APOLL_OK,
+ IO_APOLL_ABORTED,
+ IO_APOLL_READY
+};
+
+static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+{
+ const struct io_op_def *def = &io_op_defs[req->opcode];
+ struct io_ring_ctx *ctx = req->ctx;
+ struct async_poll *apoll;
+ struct io_poll_table ipt;
+ __poll_t mask = IO_ASYNC_POLL_COMMON | POLLERR;
+ int ret;
+
+ if (!def->pollin && !def->pollout)
+ return IO_APOLL_ABORTED;
+ if (!file_can_poll(req->file) || (req->flags & REQ_F_POLLED))
+ return IO_APOLL_ABORTED;
+
+ if (def->pollin) {
+ mask |= POLLIN | POLLRDNORM;
+
+ /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+ if ((req->opcode == IORING_OP_RECVMSG) &&
+ (req->sr_msg.msg_flags & MSG_ERRQUEUE))
+ mask &= ~POLLIN;
+ } else {
+ mask |= POLLOUT | POLLWRNORM;
+ }
+ if (def->poll_exclusive)
+ mask |= EPOLLEXCLUSIVE;
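+ /*
+ * Reuse a cached async_poll entry if the ring lock is held (the per-ctx
+ * cache is only safe to touch under it), otherwise allocate a new one.
+ */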
+ if (!(issue_flags & IO_URING_F_UNLOCKED) &&
+ !list_empty(&ctx->apoll_cache)) {
+ apoll = list_first_entry(&ctx->apoll_cache, struct async_poll,
+ poll.wait.entry);
+ list_del_init(&apoll->poll.wait.entry);
+ } else {
+ apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
+ if (unlikely(!apoll))
+ return IO_APOLL_ABORTED;
+ }
+ apoll->double_poll = NULL;
+ req->apoll = apoll;
+ req->flags |= REQ_F_POLLED;
+ ipt.pt._qproc = io_async_queue_proc;
+
+ io_kbuf_recycle(req, issue_flags);
+
+ ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask);
+ if (ret || ipt.error)
+ return ret ? IO_APOLL_READY : IO_APOLL_ABORTED;
+
+ trace_io_uring_poll_arm(ctx, req, req->user_data, req->opcode,
+ mask, apoll->poll.events);
+ return IO_APOLL_OK;
+}
+
+/*
+ * Returns true if we found and killed one or more poll requests
+ */
+static __cold bool io_poll_remove_all(struct io_ring_ctx *ctx,
+ struct task_struct *tsk, bool cancel_all)
+{
+ struct hlist_node *tmp;
+ struct io_kiocb *req;
+ bool found = false;
+ int i;
+
+ spin_lock(&ctx->completion_lock);
+ for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
+ struct hlist_head *list;
+
+ list = &ctx->cancel_hash[i];
+ hlist_for_each_entry_safe(req, tmp, list, hash_node) {
+ if (io_match_task_safe(req, tsk, cancel_all)) {
+ hlist_del_init(&req->hash_node);
+ io_poll_cancel_req(req);
+ found = true;
+ }
+ }
+ }
+ spin_unlock(&ctx->completion_lock);
+ return found;
+}
+
+static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, __u64 sqe_addr,
+ bool poll_only)
+ __must_hold(&ctx->completion_lock)
+{
+ struct hlist_head *list;
+ struct io_kiocb *req;
+
+ list = &ctx->cancel_hash[hash_long(sqe_addr, ctx->cancel_hash_bits)];
+ hlist_for_each_entry(req, list, hash_node) {
+ if (sqe_addr != req->user_data)
+ continue;
+ if (poll_only && req->opcode != IORING_OP_POLL_ADD)
+ continue;
+ return req;
+ }
+ return NULL;
+}
+
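+/*
+ * Take ownership of the poll request so task_work can't complete it
+ * concurrently, then drop it from its waitqueues and the cancel hash.
+ * Returns false if ownership could not be taken.
+ */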
+static bool io_poll_disarm(struct io_kiocb *req)
+ __must_hold(&ctx->completion_lock)
+{
+ if (!io_poll_get_ownership(req))
+ return false;
+ io_poll_remove_entries(req);
+ hash_del(&req->hash_node);
+ return true;
+}
+
+static int io_poll_cancel(struct io_ring_ctx *ctx, __u64 sqe_addr,
+ bool poll_only)
+ __must_hold(&ctx->completion_lock)
+{
+ struct io_kiocb *req = io_poll_find(ctx, sqe_addr, poll_only);
+
+ if (!req)
+ return -ENOENT;
+ io_poll_cancel_req(req);
+ return 0;
+}
+
+static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
+ unsigned int flags)
+{
+ u32 events;
+
+ events = READ_ONCE(sqe->poll32_events);
+#ifdef __BIG_ENDIAN
+ events = swahw32(events);
+#endif
+ if (!(flags & IORING_POLL_ADD_MULTI))
+ events |= EPOLLONESHOT;
+ return demangle_poll(events) | (events & (EPOLLEXCLUSIVE|EPOLLONESHOT));
+}
+
+static int io_poll_update_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_poll_update *upd = &req->poll_update;
+ u32 flags;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->buf_index || sqe->splice_fd_in)
+ return -EINVAL;
+ flags = READ_ONCE(sqe->len);
+ if (flags & ~(IORING_POLL_UPDATE_EVENTS | IORING_POLL_UPDATE_USER_DATA |
+ IORING_POLL_ADD_MULTI))
+ return -EINVAL;
+ /* meaningless without update */
+ if (flags == IORING_POLL_ADD_MULTI)
+ return -EINVAL;
+
+ upd->old_user_data = READ_ONCE(sqe->addr);
+ upd->update_events = flags & IORING_POLL_UPDATE_EVENTS;
+ upd->update_user_data = flags & IORING_POLL_UPDATE_USER_DATA;
+
+ upd->new_user_data = READ_ONCE(sqe->off);
+ if (!upd->update_user_data && upd->new_user_data)
+ return -EINVAL;
+ if (upd->update_events)
+ upd->events = io_poll_parse_events(sqe, flags);
+ else if (sqe->poll32_events)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_poll_iocb *poll = &req->poll;
+ u32 flags;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->buf_index || sqe->off || sqe->addr)
+ return -EINVAL;
+ flags = READ_ONCE(sqe->len);
+ if (flags & ~IORING_POLL_ADD_MULTI)
+ return -EINVAL;
+ if ((flags & IORING_POLL_ADD_MULTI) && (req->flags & REQ_F_CQE_SKIP))
+ return -EINVAL;
+
+ io_req_set_refcount(req);
+ poll->events = io_poll_parse_events(sqe, flags);
+ return 0;
+}
+
+static int io_poll_add(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_poll_iocb *poll = &req->poll;
+ struct io_poll_table ipt;
+ int ret;
+
+ ipt.pt._qproc = io_poll_queue_proc;
+
+ ret = __io_arm_poll_handler(req, &req->poll, &ipt, poll->events);
+ if (!ret && ipt.error)
+ req_set_fail(req);
+ ret = ret ?: ipt.error;
+ if (ret)
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static int io_poll_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_kiocb *preq;
+ int ret2, ret = 0;
+ bool locked;
+
+ spin_lock(&ctx->completion_lock);
+ preq = io_poll_find(ctx, req->poll_update.old_user_data, true);
+ if (!preq || !io_poll_disarm(preq)) {
+ spin_unlock(&ctx->completion_lock);
+ ret = preq ? -EALREADY : -ENOENT;
+ goto out;
+ }
+ spin_unlock(&ctx->completion_lock);
+
+ if (req->poll_update.update_events || req->poll_update.update_user_data) {
+ /* only replace the event flags, keep the behavior flags */
+ if (req->poll_update.update_events) {
+ preq->poll.events &= ~0xffff;
+ preq->poll.events |= req->poll_update.events & 0xffff;
+ preq->poll.events |= IO_POLL_UNMASK;
+ }
+ if (req->poll_update.update_user_data)
+ preq->user_data = req->poll_update.new_user_data;
+
+ ret2 = io_poll_add(preq, issue_flags);
+ /* successfully updated, don't complete poll request */
+ if (!ret2)
+ goto out;
+ }
+
+ req_set_fail(preq);
+ preq->result = -ECANCELED;
+ locked = !(issue_flags & IO_URING_F_UNLOCKED);
+ io_req_task_complete(preq, &locked);
+out:
+ if (ret < 0)
+ req_set_fail(req);
+ /* complete update request, we're done with it */
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static enum hrtimer_restart io_timeout_fn(struct hrtimer *timer)
+{
+ struct io_timeout_data *data = container_of(timer,
+ struct io_timeout_data, timer);
+ struct io_kiocb *req = data->req;
+ struct io_ring_ctx *ctx = req->ctx;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ctx->timeout_lock, flags);
+ list_del_init(&req->timeout.list);
+ atomic_set(&req->ctx->cq_timeouts,
+ atomic_read(&req->ctx->cq_timeouts) + 1);
+ spin_unlock_irqrestore(&ctx->timeout_lock, flags);
+
+ if (!(data->flags & IORING_TIMEOUT_ETIME_SUCCESS))
+ req_set_fail(req);
+
+ req->result = -ETIME;
+ req->io_task_work.func = io_req_task_complete;
+ io_req_task_work_add(req, false);
+ return HRTIMER_NORESTART;
+}
+
+static struct io_kiocb *io_timeout_extract(struct io_ring_ctx *ctx,
+ __u64 user_data)
+ __must_hold(&ctx->timeout_lock)
+{
+ struct io_timeout_data *io;
+ struct io_kiocb *req;
+ bool found = false;
+
+ list_for_each_entry(req, &ctx->timeout_list, timeout.list) {
+ found = user_data == req->user_data;
+ if (found)
+ break;
+ }
+ if (!found)
+ return ERR_PTR(-ENOENT);
+
+ io = req->async_data;
+ if (hrtimer_try_to_cancel(&io->timer) == -1)
+ return ERR_PTR(-EALREADY);
+ list_del_init(&req->timeout.list);
+ return req;
+}
+
+static int io_timeout_cancel(struct io_ring_ctx *ctx, __u64 user_data)
+ __must_hold(&ctx->completion_lock)
+ __must_hold(&ctx->timeout_lock)
+{
+ struct io_kiocb *req = io_timeout_extract(ctx, user_data);
+
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+ io_req_task_queue_fail(req, -ECANCELED);
+ return 0;
+}
+
+static clockid_t io_timeout_get_clock(struct io_timeout_data *data)
+{
+ switch (data->flags & IORING_TIMEOUT_CLOCK_MASK) {
+ case IORING_TIMEOUT_BOOTTIME:
+ return CLOCK_BOOTTIME;
+ case IORING_TIMEOUT_REALTIME:
+ return CLOCK_REALTIME;
+ default:
+ /* can't happen, vetted at prep time */
+ WARN_ON_ONCE(1);
+ fallthrough;
+ case 0:
+ return CLOCK_MONOTONIC;
+ }
+}
+
+static int io_linked_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
+ struct timespec64 *ts, enum hrtimer_mode mode)
+ __must_hold(&ctx->timeout_lock)
+{
+ struct io_timeout_data *io;
+ struct io_kiocb *req;
+ bool found = false;
+
+ list_for_each_entry(req, &ctx->ltimeout_list, timeout.list) {
+ found = user_data == req->user_data;
+ if (found)
+ break;
+ }
+ if (!found)
+ return -ENOENT;
+
+ io = req->async_data;
+ if (hrtimer_try_to_cancel(&io->timer) == -1)
+ return -EALREADY;
+ hrtimer_init(&io->timer, io_timeout_get_clock(io), mode);
+ io->timer.function = io_link_timeout_fn;
+ hrtimer_start(&io->timer, timespec64_to_ktime(*ts), mode);
+ return 0;
+}
+
+static int io_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
+ struct timespec64 *ts, enum hrtimer_mode mode)
+ __must_hold(&ctx->timeout_lock)
+{
+ struct io_kiocb *req = io_timeout_extract(ctx, user_data);
+ struct io_timeout_data *data;
+
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ req->timeout.off = 0; /* noseq */
+ data = req->async_data;
+ list_add_tail(&req->timeout.list, &ctx->timeout_list);
+ hrtimer_init(&data->timer, io_timeout_get_clock(data), mode);
+ data->timer.function = io_timeout_fn;
+ hrtimer_start(&data->timer, timespec64_to_ktime(*ts), mode);
+ return 0;
+}
+
+static int io_timeout_remove_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_timeout_rem *tr = &req->timeout_rem;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->buf_index || sqe->len || sqe->splice_fd_in)
+ return -EINVAL;
+
+ tr->ltimeout = false;
+ tr->addr = READ_ONCE(sqe->addr);
+ tr->flags = READ_ONCE(sqe->timeout_flags);
+ if (tr->flags & IORING_TIMEOUT_UPDATE_MASK) {
+ if (hweight32(tr->flags & IORING_TIMEOUT_CLOCK_MASK) > 1)
+ return -EINVAL;
+ if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
+ tr->ltimeout = true;
+ if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
+ return -EINVAL;
+ if (get_timespec64(&tr->ts, u64_to_user_ptr(sqe->addr2)))
+ return -EFAULT;
+ if (tr->ts.tv_sec < 0 || tr->ts.tv_nsec < 0)
+ return -EINVAL;
+ } else if (tr->flags) {
+ /* timeout removal doesn't support flags */
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static inline enum hrtimer_mode io_translate_timeout_mode(unsigned int flags)
+{
+ return (flags & IORING_TIMEOUT_ABS) ? HRTIMER_MODE_ABS
+ : HRTIMER_MODE_REL;
+}
+
+/*
+ * Remove or update an existing timeout command
+ */
+static int io_timeout_remove(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_timeout_rem *tr = &req->timeout_rem;
+ struct io_ring_ctx *ctx = req->ctx;
+ int ret;
+
+ if (!(req->timeout_rem.flags & IORING_TIMEOUT_UPDATE)) {
+ spin_lock(&ctx->completion_lock);
+ spin_lock_irq(&ctx->timeout_lock);
+ ret = io_timeout_cancel(ctx, tr->addr);
+ spin_unlock_irq(&ctx->timeout_lock);
+ spin_unlock(&ctx->completion_lock);
+ } else {
+ enum hrtimer_mode mode = io_translate_timeout_mode(tr->flags);
+
+ spin_lock_irq(&ctx->timeout_lock);
+ if (tr->ltimeout)
+ ret = io_linked_timeout_update(ctx, tr->addr, &tr->ts, mode);
+ else
+ ret = io_timeout_update(ctx, tr->addr, &tr->ts, mode);
+ spin_unlock_irq(&ctx->timeout_lock);
+ }
+
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete_post(req, ret, 0);
+ return 0;
+}
+
+static int io_timeout_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+ bool is_timeout_link)
+{
+ struct io_timeout_data *data;
+ unsigned flags;
+ u32 off = READ_ONCE(sqe->off);
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->buf_index || sqe->len != 1 ||
+ sqe->splice_fd_in)
+ return -EINVAL;
+ if (off && is_timeout_link)
+ return -EINVAL;
+ flags = READ_ONCE(sqe->timeout_flags);
+ if (flags & ~(IORING_TIMEOUT_ABS | IORING_TIMEOUT_CLOCK_MASK |
+ IORING_TIMEOUT_ETIME_SUCCESS))
+ return -EINVAL;
+ /* more than one clock specified is invalid, obviously */
+ if (hweight32(flags & IORING_TIMEOUT_CLOCK_MASK) > 1)
+ return -EINVAL;
+
+ INIT_LIST_HEAD(&req->timeout.list);
+ req->timeout.off = off;
+ if (unlikely(off && !req->ctx->off_timeout_used))
+ req->ctx->off_timeout_used = true;
+
+ if (WARN_ON_ONCE(req_has_async_data(req)))
+ return -EFAULT;
+ if (io_alloc_async_data(req))
+ return -ENOMEM;
+
+ data = req->async_data;
+ data->req = req;
+ data->flags = flags;
+
+ if (get_timespec64(&data->ts, u64_to_user_ptr(sqe->addr)))
+ return -EFAULT;
+
+ if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0)
+ return -EINVAL;
+
+ INIT_LIST_HEAD(&req->timeout.list);
+ data->mode = io_translate_timeout_mode(flags);
+ hrtimer_init(&data->timer, io_timeout_get_clock(data), data->mode);
+
+ if (is_timeout_link) {
+ struct io_submit_link *link = &req->ctx->submit_state.link;
+
+ if (!link->head)
+ return -EINVAL;
+ if (link->last->opcode == IORING_OP_LINK_TIMEOUT)
+ return -EINVAL;
+ req->timeout.head = link->last;
+ link->last->flags |= REQ_F_ARM_LTIMEOUT;
+ }
+ return 0;
+}
+
+static int io_timeout(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_timeout_data *data = req->async_data;
+ struct list_head *entry;
+ u32 tail, off = req->timeout.off;
+
+ spin_lock_irq(&ctx->timeout_lock);
+
+ /*
+ * sqe->off holds how many events need to occur for this
+ * timeout event to be satisfied. If it isn't set, then this is
+ * a pure timeout request, sequence isn't used.
+ */
+ if (io_is_timeout_noseq(req)) {
+ entry = ctx->timeout_list.prev;
+ goto add;
+ }
+
+ tail = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
+ req->timeout.target_seq = tail + off;
+
+ /* Update the last seq here in case io_flush_timeouts() hasn't.
+ * This is safe because ->completion_lock is held, and submissions
+ * and completions are never mixed in the same ->completion_lock section.
+ */
+ ctx->cq_last_tm_flush = tail;
+
+ /*
+ * Insertion sort, ensuring the first entry in the list is always
+ * the one we need first.
+ */
+ list_for_each_prev(entry, &ctx->timeout_list) {
+ struct io_kiocb *nxt = list_entry(entry, struct io_kiocb,
+ timeout.list);
+
+ if (io_is_timeout_noseq(nxt))
+ continue;
+ /* nxt.seq is behind @tail, otherwise would've been completed */
+ if (off >= nxt->timeout.target_seq - tail)
+ break;
+ }
+add:
+ list_add(&req->timeout.list, entry);
+ data->timer.function = io_timeout_fn;
+ hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), data->mode);
+ spin_unlock_irq(&ctx->timeout_lock);
+ return 0;
+}
+
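+/* key used by io_cancel_cb() to match io-wq work items by ring and user_data */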
+struct io_cancel_data {
+ struct io_ring_ctx *ctx;
+ u64 user_data;
+};
+
+static bool io_cancel_cb(struct io_wq_work *work, void *data)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+ struct io_cancel_data *cd = data;
+
+ return req->ctx == cd->ctx && req->user_data == cd->user_data;
+}
+
+static int io_async_cancel_one(struct io_uring_task *tctx, u64 user_data,
+ struct io_ring_ctx *ctx)
+{
+ struct io_cancel_data data = { .ctx = ctx, .user_data = user_data, };
+ enum io_wq_cancel cancel_ret;
+ int ret = 0;
+
+ if (!tctx || !tctx->io_wq)
+ return -ENOENT;
+
+ cancel_ret = io_wq_cancel_cb(tctx->io_wq, io_cancel_cb, &data, false);
+ switch (cancel_ret) {
+ case IO_WQ_CANCEL_OK:
+ ret = 0;
+ break;
+ case IO_WQ_CANCEL_RUNNING:
+ ret = -EALREADY;
+ break;
+ case IO_WQ_CANCEL_NOTFOUND:
+ ret = -ENOENT;
+ break;
+ }
+
+ return ret;
+}
+
+static int io_try_cancel_userdata(struct io_kiocb *req, u64 sqe_addr)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ int ret;
+
+ WARN_ON_ONCE(!io_wq_current_is_worker() && req->task != current);
+
+ ret = io_async_cancel_one(req->task->io_uring, sqe_addr, ctx);
+ /*
+ * Fall through even for -EALREADY, as we may have a poll request
+ * armed that needs unarming.
+ */
+ if (!ret)
+ return 0;
+
+ spin_lock(&ctx->completion_lock);
+ ret = io_poll_cancel(ctx, sqe_addr, false);
+ if (ret != -ENOENT)
+ goto out;
+
+ spin_lock_irq(&ctx->timeout_lock);
+ ret = io_timeout_cancel(ctx, sqe_addr);
+ spin_unlock_irq(&ctx->timeout_lock);
+out:
+ spin_unlock(&ctx->completion_lock);
+ return ret;
+}
+
+static int io_async_cancel_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->off || sqe->len || sqe->cancel_flags ||
+ sqe->splice_fd_in)
+ return -EINVAL;
+
+ req->cancel.addr = READ_ONCE(sqe->addr);
+ return 0;
+}
+
+static int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ u64 sqe_addr = req->cancel.addr;
+ bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
+ struct io_tctx_node *node;
+ int ret;
+
+ ret = io_try_cancel_userdata(req, sqe_addr);
+ if (ret != -ENOENT)
+ goto done;
+
+ /* slow path, try all io-wq's */
+ io_ring_submit_lock(ctx, needs_lock);
+ ret = -ENOENT;
+ list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
+ struct io_uring_task *tctx = node->task->io_uring;
+
+ ret = io_async_cancel_one(tctx, req->cancel.addr, ctx);
+ if (ret != -ENOENT)
+ break;
+ }
+ io_ring_submit_unlock(ctx, needs_lock);
+done:
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_complete_post(req, ret, 0);
+ return 0;
+}
+
+static int io_rsrc_update_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->rw_flags || sqe->splice_fd_in)
+ return -EINVAL;
+
+ req->rsrc_update.offset = READ_ONCE(sqe->off);
+ req->rsrc_update.nr_args = READ_ONCE(sqe->len);
+ if (!req->rsrc_update.nr_args)
+ return -EINVAL;
+ req->rsrc_update.arg = READ_ONCE(sqe->addr);
+ return 0;
+}
+
+static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
+ struct io_uring_rsrc_update2 up;
+ int ret;
+
+ up.offset = req->rsrc_update.offset;
+ up.data = req->rsrc_update.arg;
+ up.nr = 0;
+ up.tags = 0;
+ up.resv = 0;
+ up.resv2 = 0;
+
+ io_ring_submit_lock(ctx, needs_lock);
+ ret = __io_register_rsrc_update(ctx, IORING_RSRC_FILE,
+ &up, req->rsrc_update.nr_args);
+ io_ring_submit_unlock(ctx, needs_lock);
+
+ if (ret < 0)
+ req_set_fail(req);
+ __io_req_complete(req, issue_flags, ret, 0);
+ return 0;
+}
+
+static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ switch (req->opcode) {
+ case IORING_OP_NOP:
+ return 0;
+ case IORING_OP_READV:
+ case IORING_OP_READ_FIXED:
+ case IORING_OP_READ:
+ case IORING_OP_WRITEV:
+ case IORING_OP_WRITE_FIXED:
+ case IORING_OP_WRITE:
+ return io_prep_rw(req, sqe);
+ case IORING_OP_POLL_ADD:
+ return io_poll_add_prep(req, sqe);
+ case IORING_OP_POLL_REMOVE:
+ return io_poll_update_prep(req, sqe);
+ case IORING_OP_FSYNC:
+ return io_fsync_prep(req, sqe);
+ case IORING_OP_SYNC_FILE_RANGE:
+ return io_sfr_prep(req, sqe);
+ case IORING_OP_SENDMSG:
+ case IORING_OP_SEND:
+ return io_sendmsg_prep(req, sqe);
+ case IORING_OP_RECVMSG:
+ case IORING_OP_RECV:
+ return io_recvmsg_prep(req, sqe);
+ case IORING_OP_CONNECT:
+ return io_connect_prep(req, sqe);
+ case IORING_OP_TIMEOUT:
+ return io_timeout_prep(req, sqe, false);
+ case IORING_OP_TIMEOUT_REMOVE:
+ return io_timeout_remove_prep(req, sqe);
+ case IORING_OP_ASYNC_CANCEL:
+ return io_async_cancel_prep(req, sqe);
+ case IORING_OP_LINK_TIMEOUT:
+ return io_timeout_prep(req, sqe, true);
+ case IORING_OP_ACCEPT:
+ return io_accept_prep(req, sqe);
+ case IORING_OP_FALLOCATE:
+ return io_fallocate_prep(req, sqe);
+ case IORING_OP_OPENAT:
+ return io_openat_prep(req, sqe);
+ case IORING_OP_CLOSE:
+ return io_close_prep(req, sqe);
+ case IORING_OP_FILES_UPDATE:
+ return io_rsrc_update_prep(req, sqe);
+ case IORING_OP_STATX:
+ return io_statx_prep(req, sqe);
+ case IORING_OP_FADVISE:
+ return io_fadvise_prep(req, sqe);
+ case IORING_OP_MADVISE:
+ return io_madvise_prep(req, sqe);
+ case IORING_OP_OPENAT2:
+ return io_openat2_prep(req, sqe);
+ case IORING_OP_EPOLL_CTL:
+ return io_epoll_ctl_prep(req, sqe);
+ case IORING_OP_SPLICE:
+ return io_splice_prep(req, sqe);
+ case IORING_OP_PROVIDE_BUFFERS:
+ return io_provide_buffers_prep(req, sqe);
+ case IORING_OP_REMOVE_BUFFERS:
+ return io_remove_buffers_prep(req, sqe);
+ case IORING_OP_TEE:
+ return io_tee_prep(req, sqe);
+ case IORING_OP_SHUTDOWN:
+ return io_shutdown_prep(req, sqe);
+ case IORING_OP_RENAMEAT:
+ return io_renameat_prep(req, sqe);
+ case IORING_OP_UNLINKAT:
+ return io_unlinkat_prep(req, sqe);
+ case IORING_OP_MKDIRAT:
+ return io_mkdirat_prep(req, sqe);
+ case IORING_OP_SYMLINKAT:
+ return io_symlinkat_prep(req, sqe);
+ case IORING_OP_LINKAT:
+ return io_linkat_prep(req, sqe);
+ case IORING_OP_MSG_RING:
+ return io_msg_ring_prep(req, sqe);
+ }
+
+ printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
+ req->opcode);
+ return -EINVAL;
+}
+
+static int io_req_prep_async(struct io_kiocb *req)
+{
+ const struct io_op_def *def = &io_op_defs[req->opcode];
+
+ /* assign early for deferred execution for non-fixed file */
+ if (def->needs_file && !(req->flags & REQ_F_FIXED_FILE))
+ req->file = io_file_get_normal(req, req->fd);
+ if (!def->needs_async_setup)
+ return 0;
+ if (WARN_ON_ONCE(req_has_async_data(req)))
+ return -EFAULT;
+ if (io_alloc_async_data(req))
+ return -EAGAIN;
+
+ switch (req->opcode) {
+ case IORING_OP_READV:
+ return io_rw_prep_async(req, READ);
+ case IORING_OP_WRITEV:
+ return io_rw_prep_async(req, WRITE);
+ case IORING_OP_SENDMSG:
+ return io_sendmsg_prep_async(req);
+ case IORING_OP_RECVMSG:
+ return io_recvmsg_prep_async(req);
+ case IORING_OP_CONNECT:
+ return io_connect_prep_async(req);
+ }
+ printk_once(KERN_WARNING "io_uring: prep_async() bad opcode %d\n",
+ req->opcode);
+ return -EFAULT;
+}
+
+static u32 io_get_sequence(struct io_kiocb *req)
+{
+ u32 seq = req->ctx->cached_sq_head;
+
+ /* need original cached_sq_head, but it was increased for each req */
+ io_for_each_link(req, req)
+ seq--;
+ return seq;
+}
+
+static __cold void io_drain_req(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_defer_entry *de;
+ int ret;
+ u32 seq = io_get_sequence(req);
+
+ /* Still need to defer if there is a pending req in the defer list. */
+ spin_lock(&ctx->completion_lock);
+ if (!req_need_defer(req, seq) && list_empty_careful(&ctx->defer_list)) {
+ spin_unlock(&ctx->completion_lock);
+queue:
+ ctx->drain_active = false;
+ io_req_task_queue(req);
+ return;
+ }
+ spin_unlock(&ctx->completion_lock);
+
+ ret = io_req_prep_async(req);
+ if (ret) {
+fail:
+ io_req_complete_failed(req, ret);
+ return;
+ }
+ io_prep_async_link(req);
+ de = kmalloc(sizeof(*de), GFP_KERNEL);
+ if (!de) {
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ spin_lock(&ctx->completion_lock);
+ if (!req_need_defer(req, seq) && list_empty(&ctx->defer_list)) {
+ spin_unlock(&ctx->completion_lock);
+ kfree(de);
+ goto queue;
+ }
+
+ trace_io_uring_defer(ctx, req, req->user_data, req->opcode);
+ de->req = req;
+ de->seq = seq;
+ list_add_tail(&de->list, &ctx->defer_list);
+ spin_unlock(&ctx->completion_lock);
+}
+
+static void io_clean_op(struct io_kiocb *req)
+{
+ if (req->flags & REQ_F_BUFFER_SELECTED) {
+ spin_lock(&req->ctx->completion_lock);
+ io_put_kbuf_comp(req);
+ spin_unlock(&req->ctx->completion_lock);
+ }
+
+ if (req->flags & REQ_F_NEED_CLEANUP) {
+ switch (req->opcode) {
+ case IORING_OP_READV:
+ case IORING_OP_READ_FIXED:
+ case IORING_OP_READ:
+ case IORING_OP_WRITEV:
+ case IORING_OP_WRITE_FIXED:
+ case IORING_OP_WRITE: {
+ struct io_async_rw *io = req->async_data;
+
+ kfree(io->free_iovec);
+ break;
+ }
+ case IORING_OP_RECVMSG:
+ case IORING_OP_SENDMSG: {
+ struct io_async_msghdr *io = req->async_data;
+
+ kfree(io->free_iov);
+ break;
+ }
+ case IORING_OP_OPENAT:
+ case IORING_OP_OPENAT2:
+ if (req->open.filename)
+ putname(req->open.filename);
+ break;
+ case IORING_OP_RENAMEAT:
+ putname(req->rename.oldpath);
+ putname(req->rename.newpath);
+ break;
+ case IORING_OP_UNLINKAT:
+ putname(req->unlink.filename);
+ break;
+ case IORING_OP_MKDIRAT:
+ putname(req->mkdir.filename);
+ break;
+ case IORING_OP_SYMLINKAT:
+ putname(req->symlink.oldpath);
+ putname(req->symlink.newpath);
+ break;
+ case IORING_OP_LINKAT:
+ putname(req->hardlink.oldpath);
+ putname(req->hardlink.newpath);
+ break;
+ case IORING_OP_STATX:
+ if (req->statx.filename)
+ putname(req->statx.filename);
+ break;
+ }
+ }
+ if ((req->flags & REQ_F_POLLED) && req->apoll) {
+ kfree(req->apoll->double_poll);
+ kfree(req->apoll);
+ req->apoll = NULL;
+ }
+ if (req->flags & REQ_F_INFLIGHT) {
+ struct io_uring_task *tctx = req->task->io_uring;
+
+ atomic_dec(&tctx->inflight_tracked);
+ }
+ if (req->flags & REQ_F_CREDS)
+ put_cred(req->creds);
+ if (req->flags & REQ_F_ASYNC_DATA) {
+ kfree(req->async_data);
+ req->async_data = NULL;
+ }
+ req->flags &= ~IO_REQ_CLEAN_FLAGS;
+}
+
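+/*
+ * Resolve the request's file if it hasn't been assigned yet: fixed files
+ * come from the ring's file table, everything else goes through fget().
+ */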
+static bool io_assign_file(struct io_kiocb *req, unsigned int issue_flags)
+{
+ if (req->file || !io_op_defs[req->opcode].needs_file)
+ return true;
+
+ if (req->flags & REQ_F_FIXED_FILE)
+ req->file = io_file_get_fixed(req, req->fd, issue_flags);
+ else
+ req->file = io_file_get_normal(req, req->fd);
+ if (req->file)
+ return true;
+
+ req_set_fail(req);
+ req->result = -EBADF;
+ return false;
+}
+
+static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
+{
+ const struct cred *creds = NULL;
+ int ret;
+
+ if (unlikely(!io_assign_file(req, issue_flags)))
+ return -EBADF;
+
+ if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
+ creds = override_creds(req->creds);
+
+ if (!io_op_defs[req->opcode].audit_skip)
+ audit_uring_entry(req->opcode);
+
+ switch (req->opcode) {
+ case IORING_OP_NOP:
+ ret = io_nop(req, issue_flags);
+ break;
+ case IORING_OP_READV:
+ case IORING_OP_READ_FIXED:
+ case IORING_OP_READ:
+ ret = io_read(req, issue_flags);
+ break;
+ case IORING_OP_WRITEV:
+ case IORING_OP_WRITE_FIXED:
+ case IORING_OP_WRITE:
+ ret = io_write(req, issue_flags);
+ break;
+ case IORING_OP_FSYNC:
+ ret = io_fsync(req, issue_flags);
+ break;
+ case IORING_OP_POLL_ADD:
+ ret = io_poll_add(req, issue_flags);
+ break;
+ case IORING_OP_POLL_REMOVE:
+ ret = io_poll_update(req, issue_flags);
+ break;
+ case IORING_OP_SYNC_FILE_RANGE:
+ ret = io_sync_file_range(req, issue_flags);
+ break;
+ case IORING_OP_SENDMSG:
+ ret = io_sendmsg(req, issue_flags);
+ break;
+ case IORING_OP_SEND:
+ ret = io_send(req, issue_flags);
+ break;
+ case IORING_OP_RECVMSG:
+ ret = io_recvmsg(req, issue_flags);
+ break;
+ case IORING_OP_RECV:
+ ret = io_recv(req, issue_flags);
+ break;
+ case IORING_OP_TIMEOUT:
+ ret = io_timeout(req, issue_flags);
+ break;
+ case IORING_OP_TIMEOUT_REMOVE:
+ ret = io_timeout_remove(req, issue_flags);
+ break;
+ case IORING_OP_ACCEPT:
+ ret = io_accept(req, issue_flags);
+ break;
+ case IORING_OP_CONNECT:
+ ret = io_connect(req, issue_flags);
+ break;
+ case IORING_OP_ASYNC_CANCEL:
+ ret = io_async_cancel(req, issue_flags);
+ break;
+ case IORING_OP_FALLOCATE:
+ ret = io_fallocate(req, issue_flags);
+ break;
+ case IORING_OP_OPENAT:
+ ret = io_openat(req, issue_flags);
+ break;
+ case IORING_OP_CLOSE:
+ ret = io_close(req, issue_flags);
+ break;
+ case IORING_OP_FILES_UPDATE:
+ ret = io_files_update(req, issue_flags);
+ break;
+ case IORING_OP_STATX:
+ ret = io_statx(req, issue_flags);
+ break;
+ case IORING_OP_FADVISE:
+ ret = io_fadvise(req, issue_flags);
+ break;
+ case IORING_OP_MADVISE:
+ ret = io_madvise(req, issue_flags);
+ break;
+ case IORING_OP_OPENAT2:
+ ret = io_openat2(req, issue_flags);
+ break;
+ case IORING_OP_EPOLL_CTL:
+ ret = io_epoll_ctl(req, issue_flags);
+ break;
+ case IORING_OP_SPLICE:
+ ret = io_splice(req, issue_flags);
+ break;
+ case IORING_OP_PROVIDE_BUFFERS:
+ ret = io_provide_buffers(req, issue_flags);
+ break;
+ case IORING_OP_REMOVE_BUFFERS:
+ ret = io_remove_buffers(req, issue_flags);
+ break;
+ case IORING_OP_TEE:
+ ret = io_tee(req, issue_flags);
+ break;
+ case IORING_OP_SHUTDOWN:
+ ret = io_shutdown(req, issue_flags);
+ break;
+ case IORING_OP_RENAMEAT:
+ ret = io_renameat(req, issue_flags);
+ break;
+ case IORING_OP_UNLINKAT:
+ ret = io_unlinkat(req, issue_flags);
+ break;
+ case IORING_OP_MKDIRAT:
+ ret = io_mkdirat(req, issue_flags);
+ break;
+ case IORING_OP_SYMLINKAT:
+ ret = io_symlinkat(req, issue_flags);
+ break;
+ case IORING_OP_LINKAT:
+ ret = io_linkat(req, issue_flags);
+ break;
+ case IORING_OP_MSG_RING:
+ ret = io_msg_ring(req, issue_flags);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ if (!io_op_defs[req->opcode].audit_skip)
+ audit_uring_exit(!ret, ret);
+
+ if (creds)
+ revert_creds(creds);
+ if (ret)
+ return ret;
+ /* If the op doesn't have a file, we're not polling for it */
+ if ((req->ctx->flags & IORING_SETUP_IOPOLL) && req->file)
+ io_iopoll_req_issued(req, issue_flags);
+
+ return 0;
+}
+
+static struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+
+ req = io_put_req_find_next(req);
+ return req ? &req->work : NULL;
+}
+
+static void io_wq_submit_work(struct io_wq_work *work)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+ const struct io_op_def *def = &io_op_defs[req->opcode];
+ unsigned int issue_flags = IO_URING_F_UNLOCKED;
+ bool needs_poll = false;
+ struct io_kiocb *timeout;
+ int ret = 0, err = -ECANCELED;
+
+ /* one will be dropped by ->io_free_work() after returning to io-wq */
+ if (!(req->flags & REQ_F_REFCOUNT))
+ __io_req_set_refcount(req, 2);
+ else
+ req_ref_get(req);
+
+ timeout = io_prep_linked_timeout(req);
+ if (timeout)
+ io_queue_linked_timeout(timeout);
+
+ /* either cancelled or io-wq is dying, so don't touch tctx->iowq */
+ if (work->flags & IO_WQ_WORK_CANCEL) {
+fail:
+ io_req_task_queue_fail(req, err);
+ return;
+ }
+ if (!io_assign_file(req, issue_flags)) {
+ err = -EBADF;
+ work->flags |= IO_WQ_WORK_CANCEL;
+ goto fail;
+ }
+
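+ /*
+ * For forced-async requests on pollable files, try a non-blocking
+ * issue first so we can arm poll on -EAGAIN instead of blocking
+ * this worker.
+ */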
+ if (req->flags & REQ_F_FORCE_ASYNC) {
+ bool opcode_poll = def->pollin || def->pollout;
+
+ if (opcode_poll && file_can_poll(req->file)) {
+ needs_poll = true;
+ issue_flags |= IO_URING_F_NONBLOCK;
+ }
+ }
+
+ do {
+ ret = io_issue_sqe(req, issue_flags);
+ if (ret != -EAGAIN)
+ break;
+ /*
+ * We can get EAGAIN for iopolled IO even though we're
+ * forcing a sync submission from here, since we can't
+ * wait for request slots on the block side.
+ */
+ if (!needs_poll) {
+ if (!(req->ctx->flags & IORING_SETUP_IOPOLL))
+ break;
+ cond_resched();
+ continue;
+ }
+
+ if (io_arm_poll_handler(req, issue_flags) == IO_APOLL_OK)
+ return;
+ /* aborted or ready, in either case retry blocking */
+ needs_poll = false;
+ issue_flags &= ~IO_URING_F_NONBLOCK;
+ } while (1);
+
+ /* avoid locking problems by failing it from a clean context */
+ if (ret)
+ io_req_task_queue_fail(req, ret);
+}
+
+static inline struct io_fixed_file *io_fixed_file_slot(struct io_file_table *table,
+ unsigned i)
+{
+ return &table->files[i];
+}
+
+static inline struct file *io_file_from_index(struct io_ring_ctx *ctx,
+ int index)
+{
+ struct io_fixed_file *slot = io_fixed_file_slot(&ctx->file_table, index);
+
+ return (struct file *) (slot->file_ptr & FFS_MASK);
+}
+
+static void io_fixed_file_set(struct io_fixed_file *file_slot, struct file *file)
+{
+ unsigned long file_ptr = (unsigned long) file;
+
+ file_ptr |= io_file_get_flags(file);
+ file_slot->file_ptr = file_ptr;
+}
+
+static inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
+ unsigned int issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct file *file = NULL;
+ unsigned long file_ptr;
+
+ if (issue_flags & IO_URING_F_UNLOCKED)
+ mutex_lock(&ctx->uring_lock);
+
+ if (unlikely((unsigned int)fd >= ctx->nr_user_files))
+ goto out;
+ fd = array_index_nospec(fd, ctx->nr_user_files);
+ file_ptr = io_fixed_file_slot(&ctx->file_table, fd)->file_ptr;
+ file = (struct file *) (file_ptr & FFS_MASK);
+ file_ptr &= ~FFS_MASK;
+ /* mask in overlapping REQ_F and FFS bits */
+ req->flags |= (file_ptr << REQ_F_SUPPORT_NOWAIT_BIT);
+ io_req_set_rsrc_node(req, ctx, 0);
+out:
+ if (issue_flags & IO_URING_F_UNLOCKED)
+ mutex_unlock(&ctx->uring_lock);
+ return file;
+}
+
+static struct file *io_file_get_normal(struct io_kiocb *req, int fd)
+{
+ struct file *file = fget(fd);
+
+ trace_io_uring_file_get(req->ctx, req, req->user_data, fd);
+
+ /* we don't allow fixed io_uring files */
+ if (file && file->f_op == &io_uring_fops)
+ io_req_track_inflight(req);
+ return file;
+}
+
+static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
+{
+ struct io_kiocb *prev = req->timeout.prev;
+ int ret = -ENOENT;
+
+ if (prev) {
+ if (!(req->task->flags & PF_EXITING))
+ ret = io_try_cancel_userdata(req, prev->user_data);
+ io_req_complete_post(req, ret ?: -ETIME, 0);
+ io_put_req(prev);
+ } else {
+ io_req_complete_post(req, -ETIME, 0);
+ }
+}
+
+static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
+{
+ struct io_timeout_data *data = container_of(timer,
+ struct io_timeout_data, timer);
+ struct io_kiocb *prev, *req = data->req;
+ struct io_ring_ctx *ctx = req->ctx;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ctx->timeout_lock, flags);
+ prev = req->timeout.head;
+ req->timeout.head = NULL;
+
+ /*
+ * We don't expect the list to be empty, that will only happen if we
+ * race with the completion of the linked work.
+ */
+ if (prev) {
+ io_remove_next_linked(prev);
+ if (!req_ref_inc_not_zero(prev))
+ prev = NULL;
+ }
+ list_del(&req->timeout.list);
+ req->timeout.prev = prev;
+ spin_unlock_irqrestore(&ctx->timeout_lock, flags);
+
+ req->io_task_work.func = io_req_task_link_timeout;
+ io_req_task_work_add(req, false);
+ return HRTIMER_NORESTART;
+}
+
+static void io_queue_linked_timeout(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ spin_lock_irq(&ctx->timeout_lock);
+ /*
+ * If the back reference is NULL, then our linked request finished
+ * before we got a chance to set up the timer
+ */
+ if (req->timeout.head) {
+ struct io_timeout_data *data = req->async_data;
+
+ data->timer.function = io_link_timeout_fn;
+ hrtimer_start(&data->timer, timespec64_to_ktime(data->ts),
+ data->mode);
+ list_add_tail(&req->timeout.list, &ctx->ltimeout_list);
+ }
+ spin_unlock_irq(&ctx->timeout_lock);
+ /* drop submission reference */
+ io_put_req(req);
+}
+
+static void io_queue_sqe_arm_apoll(struct io_kiocb *req)
+ __must_hold(&req->ctx->uring_lock)
+{
+ struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
+
+ switch (io_arm_poll_handler(req, 0)) {
+ case IO_APOLL_READY:
+ io_req_task_queue(req);
+ break;
+ case IO_APOLL_ABORTED:
+ /*
+ * Queued up for async execution, worker will release
+ * submit reference when the iocb is actually submitted.
+ */
+ io_queue_async_work(req, NULL);
+ break;
+ case IO_APOLL_OK:
+ break;
+ }
+
+ if (linked_timeout)
+ io_queue_linked_timeout(linked_timeout);
+}
+
+static inline void __io_queue_sqe(struct io_kiocb *req)
+ __must_hold(&req->ctx->uring_lock)
+{
+ struct io_kiocb *linked_timeout;
+ int ret;
+
+ ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+
+ if (req->flags & REQ_F_COMPLETE_INLINE) {
+ io_req_add_compl_list(req);
+ return;
+ }
+ /*
+ * We async punt it if the file wasn't marked NOWAIT, or if the file
+ * doesn't support non-blocking read/write attempts
+ */
+ if (likely(!ret)) {
+ linked_timeout = io_prep_linked_timeout(req);
+ if (linked_timeout)
+ io_queue_linked_timeout(linked_timeout);
+ } else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
+ io_queue_sqe_arm_apoll(req);
+ } else {
+ io_req_complete_failed(req, ret);
+ }
+}
+
+static void io_queue_sqe_fallback(struct io_kiocb *req)
+ __must_hold(&req->ctx->uring_lock)
+{
+ if (req->flags & REQ_F_FAIL) {
+ io_req_complete_fail_submit(req);
+ } else if (unlikely(req->ctx->drain_active)) {
+ io_drain_req(req);
+ } else {
+ int ret = io_req_prep_async(req);
+
+ if (unlikely(ret))
+ io_req_complete_failed(req, ret);
+ else
+ io_queue_async_work(req, NULL);
+ }
+}
+
+static inline void io_queue_sqe(struct io_kiocb *req)
+ __must_hold(&req->ctx->uring_lock)
+{
+ if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL))))
+ __io_queue_sqe(req);
+ else
+ io_queue_sqe_fallback(req);
+}
+
+/*
+ * Check SQE restrictions (opcode and flags).
+ *
+ * Returns 'true' if SQE is allowed, 'false' otherwise.
+ */
+static inline bool io_check_restriction(struct io_ring_ctx *ctx,
+ struct io_kiocb *req,
+ unsigned int sqe_flags)
+{
+ if (!test_bit(req->opcode, ctx->restrictions.sqe_op))
+ return false;
+
+ if ((sqe_flags & ctx->restrictions.sqe_flags_required) !=
+ ctx->restrictions.sqe_flags_required)
+ return false;
+
+ if (sqe_flags & ~(ctx->restrictions.sqe_flags_allowed |
+ ctx->restrictions.sqe_flags_required))
+ return false;
+
+ return true;
+}
+
+static void io_init_req_drain(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_kiocb *head = ctx->submit_state.link.head;
+
+ ctx->drain_active = true;
+ if (head) {
+ /*
+ * If we need to drain a request in the middle of a link, drain
+ * the head request and the next request/link after the current
+ * link. Considering sequential execution of links,
+ * REQ_F_IO_DRAIN will be maintained for every request of our
+ * link.
+ */
+ head->flags |= REQ_F_IO_DRAIN | REQ_F_FORCE_ASYNC;
+ ctx->drain_next = true;
+ }
+}
+
+static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+ __must_hold(&ctx->uring_lock)
+{
+ unsigned int sqe_flags;
+ int personality;
+ u8 opcode;
+
+ /* req is partially pre-initialised, see io_preinit_req() */
+ req->opcode = opcode = READ_ONCE(sqe->opcode);
+ /* same numerical values with corresponding REQ_F_*, safe to copy */
+ req->flags = sqe_flags = READ_ONCE(sqe->flags);
+ req->user_data = READ_ONCE(sqe->user_data);
+ req->file = NULL;
+ req->fixed_rsrc_refs = NULL;
+ req->task = current;
+
+ if (unlikely(opcode >= IORING_OP_LAST)) {
+ req->opcode = 0;
+ return -EINVAL;
+ }
+ if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) {
+ /* enforce forwards compatibility on users */
+ if (sqe_flags & ~SQE_VALID_FLAGS)
+ return -EINVAL;
+ if ((sqe_flags & IOSQE_BUFFER_SELECT) &&
+ !io_op_defs[opcode].buffer_select)
+ return -EOPNOTSUPP;
+ if (sqe_flags & IOSQE_CQE_SKIP_SUCCESS)
+ ctx->drain_disabled = true;
+ if (sqe_flags & IOSQE_IO_DRAIN) {
+ if (ctx->drain_disabled)
+ return -EOPNOTSUPP;
+ io_init_req_drain(req);
+ }
+ }
+ if (unlikely(ctx->restricted || ctx->drain_active || ctx->drain_next)) {
+ if (ctx->restricted && !io_check_restriction(ctx, req, sqe_flags))
+ return -EACCES;
+ /* knock it to the slow queue path, will be drained there */
+ if (ctx->drain_active)
+ req->flags |= REQ_F_FORCE_ASYNC;
+ /* if there is no link, we're at "next" request and need to drain */
+ if (unlikely(ctx->drain_next) && !ctx->submit_state.link.head) {
+ ctx->drain_next = false;
+ ctx->drain_active = true;
+ req->flags |= REQ_F_IO_DRAIN | REQ_F_FORCE_ASYNC;
+ }
+ }
+
+ if (io_op_defs[opcode].needs_file) {
+ struct io_submit_state *state = &ctx->submit_state;
+
+ req->fd = READ_ONCE(sqe->fd);
+
+ /*
+ * Plug now if we have more than 2 IO left after this, and the
+ * target is potentially a read/write to block based storage.
+ */
+ if (state->need_plug && io_op_defs[opcode].plug) {
+ state->plug_started = true;
+ state->need_plug = false;
+ blk_start_plug_nr_ios(&state->plug, state->submit_nr);
+ }
+ }
+
+ personality = READ_ONCE(sqe->personality);
+ if (personality) {
+ int ret;
+
+ req->creds = xa_load(&ctx->personalities, personality);
+ if (!req->creds)
+ return -EINVAL;
+ get_cred(req->creds);
+ ret = security_uring_override_creds(req->creds);
+ if (ret) {
+ put_cred(req->creds);
+ return ret;
+ }
+ req->flags |= REQ_F_CREDS;
+ }
+
+ return io_req_prep(req, sqe);
+}
+
+static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_submit_link *link = &ctx->submit_state.link;
+ int ret;
+
+ ret = io_init_req(ctx, req, sqe);
+ if (unlikely(ret)) {
+ trace_io_uring_req_failed(sqe, ctx, req, ret);
+
+ /* fail even hard links since we don't submit */
+ if (link->head) {
+ /*
+ * We can judge whether a link req failed or was cancelled
+ * by checking if REQ_F_FAIL is set, but the head is an
+ * exception since it may have REQ_F_FAIL set because of
+ * another req's failure. So leverage req->result to
+ * distinguish whether a head has REQ_F_FAIL set due to its
+ * own failure or another req's failure, so that we can set
+ * the correct ret code for it. Init result here to avoid
+ * affecting the normal path.
+ */
+ if (!(link->head->flags & REQ_F_FAIL))
+ req_fail_link_node(link->head, -ECANCELED);
+ } else if (!(req->flags & (REQ_F_LINK | REQ_F_HARDLINK))) {
+ /*
+ * the current req is a normal req, we should return
+ * an error and thus break the submission loop.
+ */
+ io_req_complete_failed(req, ret);
+ return ret;
+ }
+ req_fail_link_node(req, ret);
+ }
+
+ /* don't need @sqe from now on */
+ trace_io_uring_submit_sqe(ctx, req, req->user_data, req->opcode,
+ req->flags, true,
+ ctx->flags & IORING_SETUP_SQPOLL);
+
+ /*
+ * If we already have a head request, queue this one for async
+ * submittal once the head completes. If we don't have a head but
+ * IOSQE_IO_LINK is set in the sqe, start a new head. This one will be
+ * submitted sync once the chain is complete. If none of those
+ * conditions are true (normal request), then just queue it.
+ */
+ if (link->head) {
+ struct io_kiocb *head = link->head;
+
+ if (!(req->flags & REQ_F_FAIL)) {
+ ret = io_req_prep_async(req);
+ if (unlikely(ret)) {
+ req_fail_link_node(req, ret);
+ if (!(head->flags & REQ_F_FAIL))
+ req_fail_link_node(head, -ECANCELED);
+ }
+ }
+ trace_io_uring_link(ctx, req, head);
+ link->last->link = req;
+ link->last = req;
+
+ if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK))
+ return 0;
+ /* last request of a link, enqueue the link */
+ link->head = NULL;
+ req = head;
+ } else if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
+ link->head = req;
+ link->last = req;
+ return 0;
+ }
+
+ io_queue_sqe(req);
+ return 0;
+}
+
+/*
+ * Batched submission is done, ensure local IO is flushed out.
+ */
+static void io_submit_state_end(struct io_ring_ctx *ctx)
+{
+ struct io_submit_state *state = &ctx->submit_state;
+
+ if (state->link.head)
+ io_queue_sqe(state->link.head);
+ /* flush only after queuing links as they can generate completions */
+ io_submit_flush_completions(ctx);
+ if (state->plug_started)
+ blk_finish_plug(&state->plug);
+}
+
+/*
+ * Start submission side cache.
+ */
+static void io_submit_state_start(struct io_submit_state *state,
+ unsigned int max_ios)
+{
+ state->plug_started = false;
+ state->need_plug = max_ios > 2;
+ state->submit_nr = max_ios;
+ /* set only head, no need to init link_last in advance */
+ state->link.head = NULL;
+}
+
+static void io_commit_sqring(struct io_ring_ctx *ctx)
+{
+ struct io_rings *rings = ctx->rings;
+
+ /*
+ * Ensure any loads from the SQEs are done at this point,
+ * since once we write the new head, the application could
+ * write new data to them.
+ */
+ smp_store_release(&rings->sq.head, ctx->cached_sq_head);
+}
+
+/*
+ * Fetch an sqe, if one is available. Note this returns a pointer to memory
+ * that is mapped by userspace. This means that care needs to be taken to
+ * ensure that reads are stable, as we cannot rely on userspace always
+ * being a good citizen. If members of the sqe are validated and then later
+ * used, it's important that those reads are done through READ_ONCE() to
+ * prevent a re-load down the line.
+ */
+static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
+{
+ unsigned head, mask = ctx->sq_entries - 1;
+ unsigned sq_idx = ctx->cached_sq_head++ & mask;
+
+ /*
+ * The cached sq head (or cq tail) serves two purposes:
+ *
+ * 1) allows us to batch the cost of updating the user visible
+ *    head.
+ * 2) allows the kernel side to track the head on its own, even
+ * though the application is the one updating it.
+ */
+ head = READ_ONCE(ctx->sq_array[sq_idx]);
+ if (likely(head < ctx->sq_entries))
+ return &ctx->sq_sqes[head];
+
+ /* drop invalid entries */
+ ctx->cq_extra--;
+ WRITE_ONCE(ctx->rings->sq_dropped,
+ READ_ONCE(ctx->rings->sq_dropped) + 1);
+ return NULL;
+}
+
+static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
+ __must_hold(&ctx->uring_lock)
+{
+ unsigned int entries = io_sqring_entries(ctx);
+ int submitted = 0;
+
+ if (unlikely(!entries))
+ return 0;
+ /* make sure SQ entry isn't read before tail */
+ nr = min3(nr, ctx->sq_entries, entries);
+ io_get_task_refs(nr);
+
+ io_submit_state_start(&ctx->submit_state, nr);
+ do {
+ const struct io_uring_sqe *sqe;
+ struct io_kiocb *req;
+
+ if (unlikely(!io_alloc_req_refill(ctx))) {
+ if (!submitted)
+ submitted = -EAGAIN;
+ break;
+ }
+ req = io_alloc_req(ctx);
+ sqe = io_get_sqe(ctx);
+ if (unlikely(!sqe)) {
+ wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
+ break;
+ }
+ /* will complete beyond this point, count as submitted */
+ submitted++;
+ if (io_submit_sqe(ctx, req, sqe)) {
+ /*
+ * Continue submitting even for sqe failure if the
+ * ring was setup with IORING_SETUP_SUBMIT_ALL
+ */
+ if (!(ctx->flags & IORING_SETUP_SUBMIT_ALL))
+ break;
+ }
+ } while (submitted < nr);
+
+ if (unlikely(submitted != nr)) {
+ int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
+ int unused = nr - ref_used;
+
+ current->io_uring->cached_refs += unused;
+ }
+
+ io_submit_state_end(ctx);
+ /* Commit SQ ring head once we've consumed and submitted all SQEs */
+ io_commit_sqring(ctx);
+
+ return submitted;
+}
+
+static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
+{
+ return READ_ONCE(sqd->state);
+}
+
+static inline void io_ring_set_wakeup_flag(struct io_ring_ctx *ctx)
+{
+ /* Tell userspace we may need a wakeup call */
+ spin_lock(&ctx->completion_lock);
+ WRITE_ONCE(ctx->rings->sq_flags,
+ ctx->rings->sq_flags | IORING_SQ_NEED_WAKEUP);
+ spin_unlock(&ctx->completion_lock);
+}
+
+static inline void io_ring_clear_wakeup_flag(struct io_ring_ctx *ctx)
+{
+ spin_lock(&ctx->completion_lock);
+ WRITE_ONCE(ctx->rings->sq_flags,
+ ctx->rings->sq_flags & ~IORING_SQ_NEED_WAKEUP);
+ spin_unlock(&ctx->completion_lock);
+}
+
+static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
+{
+ unsigned int to_submit;
+ int ret = 0;
+
+ to_submit = io_sqring_entries(ctx);
+ /* if we're handling multiple rings, cap submit size for fairness */
+ if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
+ to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
+
+ if (!wq_list_empty(&ctx->iopoll_list) || to_submit) {
+ const struct cred *creds = NULL;
+
+ if (ctx->sq_creds != current_cred())
+ creds = override_creds(ctx->sq_creds);
+
+ mutex_lock(&ctx->uring_lock);
+ if (!wq_list_empty(&ctx->iopoll_list))
+ io_do_iopoll(ctx, true);
+
+ /*
+ * Don't submit if refs are dying, good for io_uring_register(),
+ * but also it is relied upon by io_ring_exit_work()
+ */
+ if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
+ !(ctx->flags & IORING_SETUP_R_DISABLED))
+ ret = io_submit_sqes(ctx, to_submit);
+ mutex_unlock(&ctx->uring_lock);
+
+ if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
+ wake_up(&ctx->sqo_sq_wait);
+ if (creds)
+ revert_creds(creds);
+ }
+
+ return ret;
+}
+
+static __cold void io_sqd_update_thread_idle(struct io_sq_data *sqd)
+{
+ struct io_ring_ctx *ctx;
+ unsigned sq_thread_idle = 0;
+
+ list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
+ sq_thread_idle = max(sq_thread_idle, ctx->sq_thread_idle);
+ sqd->sq_thread_idle = sq_thread_idle;
+}
+
+static bool io_sqd_handle_event(struct io_sq_data *sqd)
+{
+ bool did_sig = false;
+ struct ksignal ksig;
+
+ if (test_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state) ||
+ signal_pending(current)) {
+ mutex_unlock(&sqd->lock);
+ if (signal_pending(current))
+ did_sig = get_signal(&ksig);
+ cond_resched();
+ mutex_lock(&sqd->lock);
+ }
+ return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
+}
+
+static int io_sq_thread(void *data)
+{
+ struct io_sq_data *sqd = data;
+ struct io_ring_ctx *ctx;
+ unsigned long timeout = 0;
+ char buf[TASK_COMM_LEN];
+ DEFINE_WAIT(wait);
+
+ snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
+ set_task_comm(current, buf);
+
+ if (sqd->sq_cpu != -1)
+ set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
+ else
+ set_cpus_allowed_ptr(current, cpu_online_mask);
+ current->flags |= PF_NO_SETAFFINITY;
+
+ audit_alloc_kernel(current);
+
+ mutex_lock(&sqd->lock);
+ while (1) {
+ bool cap_entries, sqt_spin = false;
+
+ if (io_sqd_events_pending(sqd) || signal_pending(current)) {
+ if (io_sqd_handle_event(sqd))
+ break;
+ timeout = jiffies + sqd->sq_thread_idle;
+ }
+
+ cap_entries = !list_is_singular(&sqd->ctx_list);
+ list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
+ int ret = __io_sq_thread(ctx, cap_entries);
+
+ if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
+ sqt_spin = true;
+ }
+ if (io_run_task_work())
+ sqt_spin = true;
+
+ if (sqt_spin || !time_after(jiffies, timeout)) {
+ cond_resched();
+ if (sqt_spin)
+ timeout = jiffies + sqd->sq_thread_idle;
+ continue;
+ }
+
+ prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
+ if (!io_sqd_events_pending(sqd) && !task_work_pending(current)) {
+ bool needs_sched = true;
+
+ list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
+ io_ring_set_wakeup_flag(ctx);
+
+ if ((ctx->flags & IORING_SETUP_IOPOLL) &&
+ !wq_list_empty(&ctx->iopoll_list)) {
+ needs_sched = false;
+ break;
+ }
+
+ /*
+ * Ensure the store of the wakeup flag is not
+ * reordered with the load of the SQ tail
+ */
+ smp_mb();
+
+ if (io_sqring_entries(ctx)) {
+ needs_sched = false;
+ break;
+ }
+ }
+
+ if (needs_sched) {
+ mutex_unlock(&sqd->lock);
+ schedule();
+ mutex_lock(&sqd->lock);
+ }
+ list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
+ io_ring_clear_wakeup_flag(ctx);
+ }
+
+ finish_wait(&sqd->wait, &wait);
+ timeout = jiffies + sqd->sq_thread_idle;
+ }
+
+ io_uring_cancel_generic(true, sqd);
+ sqd->thread = NULL;
+ list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
+ io_ring_set_wakeup_flag(ctx);
+ io_run_task_work();
+ mutex_unlock(&sqd->lock);
+
+ audit_free(current);
+
+ complete(&sqd->exited);
+ do_exit(0);
+}
+
+struct io_wait_queue {
+ struct wait_queue_entry wq;
+ struct io_ring_ctx *ctx;
+ unsigned cq_tail;
+ unsigned nr_timeouts;
+};
+
+static inline bool io_should_wake(struct io_wait_queue *iowq)
+{
+ struct io_ring_ctx *ctx = iowq->ctx;
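+ /* signed distance, so the check keeps working when the CQ tail wraps */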
+ int dist = ctx->cached_cq_tail - (int) iowq->cq_tail;
+
+ /*
+ * Wake up if we have enough events, or if a timeout occurred since we
+ * started waiting. For timeouts, we always want to return to userspace,
+ * regardless of event count.
+ */
+ return dist >= 0 || atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
+}
+
+static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
+ int wake_flags, void *key)
+{
+ struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
+ wq);
+
+ /*
+ * Cannot safely flush overflowed CQEs from here, ensure we wake up
+ * the task, and the next invocation will do it.
+ */
+ if (io_should_wake(iowq) || test_bit(0, &iowq->ctx->check_cq_overflow))
+ return autoremove_wake_function(curr, mode, wake_flags, key);
+ return -1;
+}
+
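+/*
+ * Run pending task_work. Returns 1 if any work was run, 0 if there was
+ * nothing to do, or a negative error if a signal needs handling.
+ */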
+static int io_run_task_work_sig(void)
+{
+ if (io_run_task_work())
+ return 1;
+ if (test_thread_flag(TIF_NOTIFY_SIGNAL))
+ return -ERESTARTSYS;
+ if (task_sigpending(current))
+ return -EINTR;
+ return 0;
+}
+
+/* when this returns > 0, the caller should retry */
+static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
+ struct io_wait_queue *iowq,
+ ktime_t timeout)
+{
+ int ret;
+
+ /* make sure we run task_work before checking for signals */
+ ret = io_run_task_work_sig();
+ if (ret || io_should_wake(iowq))
+ return ret;
+ /* let the caller flush overflows, retry */
+ if (test_bit(0, &ctx->check_cq_overflow))
+ return 1;
+
+ if (!schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS))
+ return -ETIME;
+ return 1;
+}
+
+/*
+ * Wait until events become available, if we don't already have some. The
+ * application must reap them itself, as they reside on the shared cq ring.
+ */
+static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
+ const sigset_t __user *sig, size_t sigsz,
+ struct __kernel_timespec __user *uts)
+{
+ struct io_wait_queue iowq;
+ struct io_rings *rings = ctx->rings;
+ ktime_t timeout = KTIME_MAX;
+ int ret;
+
+ do {
+ io_cqring_overflow_flush(ctx);
+ if (io_cqring_events(ctx) >= min_events)
+ return 0;
+ if (!io_run_task_work())
+ break;
+ } while (1);
+
+ if (sig) {
+#ifdef CONFIG_COMPAT
+ if (in_compat_syscall())
+ ret = set_compat_user_sigmask((const compat_sigset_t __user *)sig,
+ sigsz);
+ else
+#endif
+ ret = set_user_sigmask(sig, sigsz);
+
+ if (ret)
+ return ret;
+ }
+
+ if (uts) {
+ struct timespec64 ts;
+
+ if (get_timespec64(&ts, uts))
+ return -EFAULT;
+ timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
+ }
+
+ init_waitqueue_func_entry(&iowq.wq, io_wake_function);
+ iowq.wq.private = current;
+ INIT_LIST_HEAD(&iowq.wq.entry);
+ iowq.ctx = ctx;
+ iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
+ iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
+
+ trace_io_uring_cqring_wait(ctx, min_events);
+ do {
+ /* if we can't even flush overflow, don't wait for more */
+ if (!io_cqring_overflow_flush(ctx)) {
+ ret = -EBUSY;
+ break;
+ }
+ prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
+ TASK_INTERRUPTIBLE);
+ ret = io_cqring_wait_schedule(ctx, &iowq, timeout);
+ finish_wait(&ctx->cq_wait, &iowq.wq);
+ cond_resched();
+ } while (ret > 0);
+
+ restore_saved_sigmask_unless(ret == -EINTR);
+
+ return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
+}
+
+static void io_free_page_table(void **table, size_t size)
+{
+ unsigned i, nr_tables = DIV_ROUND_UP(size, PAGE_SIZE);
+
+ for (i = 0; i < nr_tables; i++)
+ kfree(table[i]);
+ kfree(table);
+}
+
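+/*
+ * Allocate @size bytes split into an array of page-sized chunks, so that
+ * large tables (e.g. the rsrc tag tables) avoid high-order allocations.
+ */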
+static __cold void **io_alloc_page_table(size_t size)
+{
+ unsigned i, nr_tables = DIV_ROUND_UP(size, PAGE_SIZE);
+ size_t init_size = size;
+ void **table;
+
+ table = kcalloc(nr_tables, sizeof(*table), GFP_KERNEL_ACCOUNT);
+ if (!table)
+ return NULL;
+
+ for (i = 0; i < nr_tables; i++) {
+ unsigned int this_size = min_t(size_t, size, PAGE_SIZE);
+
+ table[i] = kzalloc(this_size, GFP_KERNEL_ACCOUNT);
+ if (!table[i]) {
+ io_free_page_table(table, init_size);
+ return NULL;
+ }
+ size -= this_size;
+ }
+ return table;
+}
+
+static void io_rsrc_node_destroy(struct io_rsrc_node *ref_node)
+{
+ percpu_ref_exit(&ref_node->refs);
+ kfree(ref_node);
+}
+
+static __cold void io_rsrc_node_ref_zero(struct percpu_ref *ref)
+{
+ struct io_rsrc_node *node = container_of(ref, struct io_rsrc_node, refs);
+ struct io_ring_ctx *ctx = node->rsrc_data->ctx;
+ unsigned long flags;
+ bool first_add = false;
+ unsigned long delay = HZ;
+
+ spin_lock_irqsave(&ctx->rsrc_ref_lock, flags);
+ node->done = true;
+
+ /* if we are mid-quiesce then do not delay */
+ if (node->rsrc_data->quiesce)
+ delay = 0;
+
+ while (!list_empty(&ctx->rsrc_ref_list)) {
+ node = list_first_entry(&ctx->rsrc_ref_list,
+ struct io_rsrc_node, node);
+ /* recycle ref nodes in order */
+ if (!node->done)
+ break;
+ list_del(&node->node);
+ first_add |= llist_add(&node->llist, &ctx->rsrc_put_llist);
+ }
+ spin_unlock_irqrestore(&ctx->rsrc_ref_lock, flags);
+
+ if (first_add)
+ mod_delayed_work(system_wq, &ctx->rsrc_put_work, delay);
+}
+
+static struct io_rsrc_node *io_rsrc_node_alloc(void)
+{
+ struct io_rsrc_node *ref_node;
+
+ ref_node = kzalloc(sizeof(*ref_node), GFP_KERNEL);
+ if (!ref_node)
+ return NULL;
+
+ if (percpu_ref_init(&ref_node->refs, io_rsrc_node_ref_zero,
+ 0, GFP_KERNEL)) {
+ kfree(ref_node);
+ return NULL;
+ }
+ INIT_LIST_HEAD(&ref_node->node);
+ INIT_LIST_HEAD(&ref_node->rsrc_list);
+ ref_node->done = false;
+ return ref_node;
+}
+
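+/*
+ * Retire the currently active rsrc node, binding it to @data_to_kill (if
+ * given) so that outstanding references keep the data alive, and install the
+ * pre-allocated backup node as the new active node.
+ */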
+static void io_rsrc_node_switch(struct io_ring_ctx *ctx,
+ struct io_rsrc_data *data_to_kill)
+ __must_hold(&ctx->uring_lock)
+{
+ WARN_ON_ONCE(!ctx->rsrc_backup_node);
+ WARN_ON_ONCE(data_to_kill && !ctx->rsrc_node);
+
+ io_rsrc_refs_drop(ctx);
+
+ if (data_to_kill) {
+ struct io_rsrc_node *rsrc_node = ctx->rsrc_node;
+
+ rsrc_node->rsrc_data = data_to_kill;
+ spin_lock_irq(&ctx->rsrc_ref_lock);
+ list_add_tail(&rsrc_node->node, &ctx->rsrc_ref_list);
+ spin_unlock_irq(&ctx->rsrc_ref_lock);
+
+ atomic_inc(&data_to_kill->refs);
+ percpu_ref_kill(&rsrc_node->refs);
+ ctx->rsrc_node = NULL;
+ }
+
+ if (!ctx->rsrc_node) {
+ ctx->rsrc_node = ctx->rsrc_backup_node;
+ ctx->rsrc_backup_node = NULL;
+ }
+}
+
+static int io_rsrc_node_switch_start(struct io_ring_ctx *ctx)
+{
+ if (ctx->rsrc_backup_node)
+ return 0;
+ ctx->rsrc_backup_node = io_rsrc_node_alloc();
+ return ctx->rsrc_backup_node ? 0 : -ENOMEM;
+}
+
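+/*
+ * Quiesce an rsrc data set: switch to a fresh node, drop the initial
+ * reference and wait for all remaining references to go away. May drop and
+ * re-take ->uring_lock while waiting.
+ */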
+static __cold int io_rsrc_ref_quiesce(struct io_rsrc_data *data,
+ struct io_ring_ctx *ctx)
+{
+ int ret;
+
+	/* As we may drop ->uring_lock, another task may have started quiesce */
+ if (data->quiesce)
+ return -ENXIO;
+
+ data->quiesce = true;
+ do {
+ ret = io_rsrc_node_switch_start(ctx);
+ if (ret)
+ break;
+ io_rsrc_node_switch(ctx, data);
+
+ /* kill initial ref, already quiesced if zero */
+ if (atomic_dec_and_test(&data->refs))
+ break;
+ mutex_unlock(&ctx->uring_lock);
+ flush_delayed_work(&ctx->rsrc_put_work);
+ ret = wait_for_completion_interruptible(&data->done);
+ if (!ret) {
+ mutex_lock(&ctx->uring_lock);
+ if (atomic_read(&data->refs) > 0) {
+ /*
+ * it has been revived by another thread while
+ * we were unlocked
+ */
+ mutex_unlock(&ctx->uring_lock);
+ } else {
+ break;
+ }
+ }
+
+ atomic_inc(&data->refs);
+ /* wait for all works potentially completing data->done */
+ flush_delayed_work(&ctx->rsrc_put_work);
+ reinit_completion(&data->done);
+
+ ret = io_run_task_work_sig();
+ mutex_lock(&ctx->uring_lock);
+ } while (ret >= 0);
+ data->quiesce = false;
+
+ return ret;
+}
+
+static u64 *io_get_tag_slot(struct io_rsrc_data *data, unsigned int idx)
+{
+ unsigned int off = idx & IO_RSRC_TAG_TABLE_MASK;
+ unsigned int table_idx = idx >> IO_RSRC_TAG_TABLE_SHIFT;
+
+ return &data->tags[table_idx][off];
+}
+
+static void io_rsrc_data_free(struct io_rsrc_data *data)
+{
+ size_t size = data->nr * sizeof(data->tags[0][0]);
+
+ if (data->tags)
+ io_free_page_table((void **)data->tags, size);
+ kfree(data);
+}
+
+static __cold int io_rsrc_data_alloc(struct io_ring_ctx *ctx, rsrc_put_fn *do_put,
+ u64 __user *utags, unsigned nr,
+ struct io_rsrc_data **pdata)
+{
+ struct io_rsrc_data *data;
+ int ret = -ENOMEM;
+ unsigned i;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+ data->tags = (u64 **)io_alloc_page_table(nr * sizeof(data->tags[0][0]));
+ if (!data->tags) {
+ kfree(data);
+ return -ENOMEM;
+ }
+
+ data->nr = nr;
+ data->ctx = ctx;
+ data->do_put = do_put;
+ if (utags) {
+ ret = -EFAULT;
+ for (i = 0; i < nr; i++) {
+ u64 *tag_slot = io_get_tag_slot(data, i);
+
+ if (copy_from_user(tag_slot, &utags[i],
+ sizeof(*tag_slot)))
+ goto fail;
+ }
+ }
+
+ atomic_set(&data->refs, 1);
+ init_completion(&data->done);
+ *pdata = data;
+ return 0;
+fail:
+ io_rsrc_data_free(data);
+ return ret;
+}
+
+static bool io_alloc_file_tables(struct io_file_table *table, unsigned nr_files)
+{
+ table->files = kvcalloc(nr_files, sizeof(table->files[0]),
+ GFP_KERNEL_ACCOUNT);
+ return !!table->files;
+}
+
+static void io_free_file_tables(struct io_file_table *table)
+{
+ kvfree(table->files);
+ table->files = NULL;
+}
+
+static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
+{
+#if defined(CONFIG_UNIX)
+ if (ctx->ring_sock) {
+ struct sock *sock = ctx->ring_sock->sk;
+ struct sk_buff *skb;
+
+ while ((skb = skb_dequeue(&sock->sk_receive_queue)) != NULL)
+ kfree_skb(skb);
+ }
+#else
+ int i;
+
+ for (i = 0; i < ctx->nr_user_files; i++) {
+ struct file *file;
+
+ file = io_file_from_index(ctx, i);
+ if (file)
+ fput(file);
+ }
+#endif
+ io_free_file_tables(&ctx->file_table);
+ io_rsrc_data_free(ctx->file_data);
+ ctx->file_data = NULL;
+ ctx->nr_user_files = 0;
+}
+
+static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
+{
+ unsigned nr = ctx->nr_user_files;
+ int ret;
+
+ if (!ctx->file_data)
+ return -ENXIO;
+
+ /*
+	 * Quiesce may unlock ->uring_lock, and while it's not held,
+	 * prevent new requests from using the table.
+ */
+ ctx->nr_user_files = 0;
+ ret = io_rsrc_ref_quiesce(ctx->file_data, ctx);
+ ctx->nr_user_files = nr;
+ if (!ret)
+ __io_sqe_files_unregister(ctx);
+ return ret;
+}
+
+static void io_sq_thread_unpark(struct io_sq_data *sqd)
+ __releases(&sqd->lock)
+{
+ WARN_ON_ONCE(sqd->thread == current);
+
+ /*
+	 * Do the dance, but don't use a conditional clear_bit() because it'd
+	 * race with other threads incrementing park_pending and setting the bit.
+ */
+ clear_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
+ if (atomic_dec_return(&sqd->park_pending))
+ set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
+ mutex_unlock(&sqd->lock);
+}
+
+static void io_sq_thread_park(struct io_sq_data *sqd)
+ __acquires(&sqd->lock)
+{
+ WARN_ON_ONCE(sqd->thread == current);
+
+ atomic_inc(&sqd->park_pending);
+ set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
+ mutex_lock(&sqd->lock);
+ if (sqd->thread)
+ wake_up_process(sqd->thread);
+}
+
+static void io_sq_thread_stop(struct io_sq_data *sqd)
+{
+ WARN_ON_ONCE(sqd->thread == current);
+ WARN_ON_ONCE(test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state));
+
+ set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
+ mutex_lock(&sqd->lock);
+ if (sqd->thread)
+ wake_up_process(sqd->thread);
+ mutex_unlock(&sqd->lock);
+ wait_for_completion(&sqd->exited);
+}
+
+static void io_put_sq_data(struct io_sq_data *sqd)
+{
+ if (refcount_dec_and_test(&sqd->refs)) {
+ WARN_ON_ONCE(atomic_read(&sqd->park_pending));
+
+ io_sq_thread_stop(sqd);
+ kfree(sqd);
+ }
+}
+
+static void io_sq_thread_finish(struct io_ring_ctx *ctx)
+{
+ struct io_sq_data *sqd = ctx->sq_data;
+
+ if (sqd) {
+ io_sq_thread_park(sqd);
+ list_del_init(&ctx->sqd_list);
+ io_sqd_update_thread_idle(sqd);
+ io_sq_thread_unpark(sqd);
+
+ io_put_sq_data(sqd);
+ ctx->sq_data = NULL;
+ }
+}
+
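+/*
+ * For IORING_SETUP_ATTACH_WQ, look up the sq_data of the ring referred to by
+ * p->wq_fd and take a reference on it. Only rings created by the same thread
+ * group may be attached to.
+ */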
+static struct io_sq_data *io_attach_sq_data(struct io_uring_params *p)
+{
+ struct io_ring_ctx *ctx_attach;
+ struct io_sq_data *sqd;
+ struct fd f;
+
+ f = fdget(p->wq_fd);
+ if (!f.file)
+ return ERR_PTR(-ENXIO);
+ if (f.file->f_op != &io_uring_fops) {
+ fdput(f);
+ return ERR_PTR(-EINVAL);
+ }
+
+ ctx_attach = f.file->private_data;
+ sqd = ctx_attach->sq_data;
+ if (!sqd) {
+ fdput(f);
+ return ERR_PTR(-EINVAL);
+ }
+ if (sqd->task_tgid != current->tgid) {
+ fdput(f);
+ return ERR_PTR(-EPERM);
+ }
+
+ refcount_inc(&sqd->refs);
+ fdput(f);
+ return sqd;
+}
+
+static struct io_sq_data *io_get_sq_data(struct io_uring_params *p,
+ bool *attached)
+{
+ struct io_sq_data *sqd;
+
+ *attached = false;
+ if (p->flags & IORING_SETUP_ATTACH_WQ) {
+ sqd = io_attach_sq_data(p);
+ if (!IS_ERR(sqd)) {
+ *attached = true;
+ return sqd;
+ }
+ /* fall through for EPERM case, setup new sqd/task */
+ if (PTR_ERR(sqd) != -EPERM)
+ return sqd;
+ }
+
+ sqd = kzalloc(sizeof(*sqd), GFP_KERNEL);
+ if (!sqd)
+ return ERR_PTR(-ENOMEM);
+
+ atomic_set(&sqd->park_pending, 0);
+ refcount_set(&sqd->refs, 1);
+ INIT_LIST_HEAD(&sqd->ctx_list);
+ mutex_init(&sqd->lock);
+ init_waitqueue_head(&sqd->wait);
+ init_completion(&sqd->exited);
+ return sqd;
+}
+
+#if defined(CONFIG_UNIX)
+/*
+ * Ensure the UNIX gc is aware of our file set, so we are certain that
+ * the io_uring can be safely unregistered on process exit, even if we have
+ * loops in the file referencing.
+ */
+static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset)
+{
+ struct sock *sk = ctx->ring_sock->sk;
+ struct scm_fp_list *fpl;
+ struct sk_buff *skb;
+ int i, nr_files;
+
+ fpl = kzalloc(sizeof(*fpl), GFP_KERNEL);
+ if (!fpl)
+ return -ENOMEM;
+
+ skb = alloc_skb(0, GFP_KERNEL);
+ if (!skb) {
+ kfree(fpl);
+ return -ENOMEM;
+ }
+
+ skb->sk = sk;
+
+ nr_files = 0;
+ fpl->user = get_uid(current_user());
+ for (i = 0; i < nr; i++) {
+ struct file *file = io_file_from_index(ctx, i + offset);
+
+ if (!file)
+ continue;
+ fpl->fp[nr_files] = get_file(file);
+ unix_inflight(fpl->user, fpl->fp[nr_files]);
+ nr_files++;
+ }
+
+ if (nr_files) {
+ fpl->max = SCM_MAX_FD;
+ fpl->count = nr_files;
+ UNIXCB(skb).fp = fpl;
+ skb->destructor = unix_destruct_scm;
+ refcount_add(skb->truesize, &sk->sk_wmem_alloc);
+ skb_queue_head(&sk->sk_receive_queue, skb);
+
+ for (i = 0; i < nr; i++) {
+ struct file *file = io_file_from_index(ctx, i + offset);
+
+ if (file)
+ fput(file);
+ }
+ } else {
+ kfree_skb(skb);
+ free_uid(fpl->user);
+ kfree(fpl);
+ }
+
+ return 0;
+}
+
+/*
+ * If UNIX sockets are enabled, fd passing can cause a reference cycle which
+ * causes regular reference counting to break down. We rely on the UNIX
+ * garbage collection to take care of this problem for us.
+ */
+static int io_sqe_files_scm(struct io_ring_ctx *ctx)
+{
+ unsigned left, total;
+ int ret = 0;
+
+ total = 0;
+ left = ctx->nr_user_files;
+ while (left) {
+ unsigned this_files = min_t(unsigned, left, SCM_MAX_FD);
+
+ ret = __io_sqe_files_scm(ctx, this_files, total);
+ if (ret)
+ break;
+ left -= this_files;
+ total += this_files;
+ }
+
+ if (!ret)
+ return 0;
+
+ while (total < ctx->nr_user_files) {
+ struct file *file = io_file_from_index(ctx, total);
+
+ if (file)
+ fput(file);
+ total++;
+ }
+
+ return ret;
+}
+#else
+static int io_sqe_files_scm(struct io_ring_ctx *ctx)
+{
+ return 0;
+}
+#endif
+
+static void io_rsrc_file_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc)
+{
+ struct file *file = prsrc->file;
+#if defined(CONFIG_UNIX)
+ struct sock *sock = ctx->ring_sock->sk;
+ struct sk_buff_head list, *head = &sock->sk_receive_queue;
+ struct sk_buff *skb;
+ int i;
+
+ __skb_queue_head_init(&list);
+
+ /*
+ * Find the skb that holds this file in its SCM_RIGHTS. When found,
+ * remove this entry and rearrange the file array.
+ */
+ skb = skb_dequeue(head);
+ while (skb) {
+ struct scm_fp_list *fp;
+
+ fp = UNIXCB(skb).fp;
+ for (i = 0; i < fp->count; i++) {
+ int left;
+
+ if (fp->fp[i] != file)
+ continue;
+
+ unix_notinflight(fp->user, fp->fp[i]);
+ left = fp->count - 1 - i;
+ if (left) {
+ memmove(&fp->fp[i], &fp->fp[i + 1],
+ left * sizeof(struct file *));
+ }
+ fp->count--;
+ if (!fp->count) {
+ kfree_skb(skb);
+ skb = NULL;
+ } else {
+ __skb_queue_tail(&list, skb);
+ }
+ fput(file);
+ file = NULL;
+ break;
+ }
+
+ if (!file)
+ break;
+
+ __skb_queue_tail(&list, skb);
+
+ skb = skb_dequeue(head);
+ }
+
+ if (skb_peek(&list)) {
+ spin_lock_irq(&head->lock);
+ while ((skb = __skb_dequeue(&list)) != NULL)
+ __skb_queue_tail(head, skb);
+ spin_unlock_irq(&head->lock);
+ }
+#else
+ fput(file);
+#endif
+}
+
+static void __io_rsrc_put_work(struct io_rsrc_node *ref_node)
+{
+ struct io_rsrc_data *rsrc_data = ref_node->rsrc_data;
+ struct io_ring_ctx *ctx = rsrc_data->ctx;
+ struct io_rsrc_put *prsrc, *tmp;
+
+ list_for_each_entry_safe(prsrc, tmp, &ref_node->rsrc_list, list) {
+ list_del(&prsrc->list);
+
+ if (prsrc->tag) {
+ bool lock_ring = ctx->flags & IORING_SETUP_IOPOLL;
+
+ io_ring_submit_lock(ctx, lock_ring);
+ spin_lock(&ctx->completion_lock);
+ io_fill_cqe_aux(ctx, prsrc->tag, 0, 0);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+ io_ring_submit_unlock(ctx, lock_ring);
+ }
+
+ rsrc_data->do_put(ctx, prsrc);
+ kfree(prsrc);
+ }
+
+ io_rsrc_node_destroy(ref_node);
+ if (atomic_dec_and_test(&rsrc_data->refs))
+ complete(&rsrc_data->done);
+}
+
+static void io_rsrc_put_work(struct work_struct *work)
+{
+ struct io_ring_ctx *ctx;
+ struct llist_node *node;
+
+ ctx = container_of(work, struct io_ring_ctx, rsrc_put_work.work);
+ node = llist_del_all(&ctx->rsrc_put_llist);
+
+ while (node) {
+ struct io_rsrc_node *ref_node;
+ struct llist_node *next = node->next;
+
+ ref_node = llist_entry(node, struct io_rsrc_node, llist);
+ __io_rsrc_put_work(ref_node);
+ node = next;
+ }
+}
+
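+/*
+ * Register a fixed file table: copy the fd array from userspace, resolve each
+ * fd (with -1 marking a sparse slot) and account the files with the UNIX
+ * socket garbage collector.
+ */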
+static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
+ unsigned nr_args, u64 __user *tags)
+{
+ __s32 __user *fds = (__s32 __user *) arg;
+ struct file *file;
+ int fd, ret;
+ unsigned i;
+
+ if (ctx->file_data)
+ return -EBUSY;
+ if (!nr_args)
+ return -EINVAL;
+ if (nr_args > IORING_MAX_FIXED_FILES)
+ return -EMFILE;
+ if (nr_args > rlimit(RLIMIT_NOFILE))
+ return -EMFILE;
+ ret = io_rsrc_node_switch_start(ctx);
+ if (ret)
+ return ret;
+ ret = io_rsrc_data_alloc(ctx, io_rsrc_file_put, tags, nr_args,
+ &ctx->file_data);
+ if (ret)
+ return ret;
+
+ ret = -ENOMEM;
+ if (!io_alloc_file_tables(&ctx->file_table, nr_args))
+ goto out_free;
+
+ for (i = 0; i < nr_args; i++, ctx->nr_user_files++) {
+ if (copy_from_user(&fd, &fds[i], sizeof(fd))) {
+ ret = -EFAULT;
+ goto out_fput;
+ }
+ /* allow sparse sets */
+ if (fd == -1) {
+ ret = -EINVAL;
+ if (unlikely(*io_get_tag_slot(ctx->file_data, i)))
+ goto out_fput;
+ continue;
+ }
+
+ file = fget(fd);
+ ret = -EBADF;
+ if (unlikely(!file))
+ goto out_fput;
+
+ /*
+ * Don't allow io_uring instances to be registered. If UNIX
+ * isn't enabled, then this causes a reference cycle and this
+ * instance can never get freed. If UNIX is enabled we'll
+ * handle it just fine, but there's still no point in allowing
+ * a ring fd as it doesn't support regular read/write anyway.
+ */
+ if (file->f_op == &io_uring_fops) {
+ fput(file);
+ goto out_fput;
+ }
+ io_fixed_file_set(io_fixed_file_slot(&ctx->file_table, i), file);
+ }
+
+ ret = io_sqe_files_scm(ctx);
+ if (ret) {
+ __io_sqe_files_unregister(ctx);
+ return ret;
+ }
+
+ io_rsrc_node_switch(ctx, NULL);
+ return ret;
+out_fput:
+ for (i = 0; i < ctx->nr_user_files; i++) {
+ file = io_file_from_index(ctx, i);
+ if (file)
+ fput(file);
+ }
+ io_free_file_tables(&ctx->file_table);
+ ctx->nr_user_files = 0;
+out_free:
+ io_rsrc_data_free(ctx->file_data);
+ ctx->file_data = NULL;
+ return ret;
+}
+
+static int io_sqe_file_register(struct io_ring_ctx *ctx, struct file *file,
+ int index)
+{
+#if defined(CONFIG_UNIX)
+ struct sock *sock = ctx->ring_sock->sk;
+ struct sk_buff_head *head = &sock->sk_receive_queue;
+ struct sk_buff *skb;
+
+ /*
+ * See if we can merge this file into an existing skb SCM_RIGHTS
+ * file set. If there's no room, fall back to allocating a new skb
+ * and filling it in.
+ */
+ spin_lock_irq(&head->lock);
+ skb = skb_peek(head);
+ if (skb) {
+ struct scm_fp_list *fpl = UNIXCB(skb).fp;
+
+ if (fpl->count < SCM_MAX_FD) {
+ __skb_unlink(skb, head);
+ spin_unlock_irq(&head->lock);
+ fpl->fp[fpl->count] = get_file(file);
+ unix_inflight(fpl->user, fpl->fp[fpl->count]);
+ fpl->count++;
+ spin_lock_irq(&head->lock);
+ __skb_queue_head(head, skb);
+ } else {
+ skb = NULL;
+ }
+ }
+ spin_unlock_irq(&head->lock);
+
+ if (skb) {
+ fput(file);
+ return 0;
+ }
+
+ return __io_sqe_files_scm(ctx, 1, index);
+#else
+ return 0;
+#endif
+}
+
+static int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx,
+ struct io_rsrc_node *node, void *rsrc)
+{
+ u64 *tag_slot = io_get_tag_slot(data, idx);
+ struct io_rsrc_put *prsrc;
+
+ prsrc = kzalloc(sizeof(*prsrc), GFP_KERNEL);
+ if (!prsrc)
+ return -ENOMEM;
+
+ prsrc->tag = *tag_slot;
+ *tag_slot = 0;
+ prsrc->rsrc = rsrc;
+ list_add(&prsrc->list, &node->rsrc_list);
+ return 0;
+}
+
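+/*
+ * Install @file into fixed file table slot @slot_index at request issue time,
+ * queueing removal of any file that currently occupies the slot.
+ */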
+static int io_install_fixed_file(struct io_kiocb *req, struct file *file,
+ unsigned int issue_flags, u32 slot_index)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
+ bool needs_switch = false;
+ struct io_fixed_file *file_slot;
+ int ret = -EBADF;
+
+ io_ring_submit_lock(ctx, needs_lock);
+ if (file->f_op == &io_uring_fops)
+ goto err;
+ ret = -ENXIO;
+ if (!ctx->file_data)
+ goto err;
+ ret = -EINVAL;
+ if (slot_index >= ctx->nr_user_files)
+ goto err;
+
+ slot_index = array_index_nospec(slot_index, ctx->nr_user_files);
+ file_slot = io_fixed_file_slot(&ctx->file_table, slot_index);
+
+ if (file_slot->file_ptr) {
+ struct file *old_file;
+
+ ret = io_rsrc_node_switch_start(ctx);
+ if (ret)
+ goto err;
+
+ old_file = (struct file *)(file_slot->file_ptr & FFS_MASK);
+ ret = io_queue_rsrc_removal(ctx->file_data, slot_index,
+ ctx->rsrc_node, old_file);
+ if (ret)
+ goto err;
+ file_slot->file_ptr = 0;
+ needs_switch = true;
+ }
+
+ *io_get_tag_slot(ctx->file_data, slot_index) = 0;
+ io_fixed_file_set(file_slot, file);
+ ret = io_sqe_file_register(ctx, file, slot_index);
+ if (ret) {
+ file_slot->file_ptr = 0;
+ goto err;
+ }
+
+ ret = 0;
+err:
+ if (needs_switch)
+ io_rsrc_node_switch(ctx, ctx->file_data);
+ io_ring_submit_unlock(ctx, needs_lock);
+ if (ret)
+ fput(file);
+ return ret;
+}
+
+static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags)
+{
+ unsigned int offset = req->close.file_slot - 1;
+ struct io_ring_ctx *ctx = req->ctx;
+ bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
+ struct io_fixed_file *file_slot;
+ struct file *file;
+ int ret;
+
+ io_ring_submit_lock(ctx, needs_lock);
+ ret = -ENXIO;
+ if (unlikely(!ctx->file_data))
+ goto out;
+ ret = -EINVAL;
+ if (offset >= ctx->nr_user_files)
+ goto out;
+ ret = io_rsrc_node_switch_start(ctx);
+ if (ret)
+ goto out;
+
+ offset = array_index_nospec(offset, ctx->nr_user_files);
+ file_slot = io_fixed_file_slot(&ctx->file_table, offset);
+ ret = -EBADF;
+ if (!file_slot->file_ptr)
+ goto out;
+
+ file = (struct file *)(file_slot->file_ptr & FFS_MASK);
+ ret = io_queue_rsrc_removal(ctx->file_data, offset, ctx->rsrc_node, file);
+ if (ret)
+ goto out;
+
+ file_slot->file_ptr = 0;
+ io_rsrc_node_switch(ctx, ctx->file_data);
+ ret = 0;
+out:
+ io_ring_submit_unlock(ctx, needs_lock);
+ return ret;
+}
+
+static int __io_sqe_files_update(struct io_ring_ctx *ctx,
+ struct io_uring_rsrc_update2 *up,
+ unsigned nr_args)
+{
+ u64 __user *tags = u64_to_user_ptr(up->tags);
+ __s32 __user *fds = u64_to_user_ptr(up->data);
+ struct io_rsrc_data *data = ctx->file_data;
+ struct io_fixed_file *file_slot;
+ struct file *file;
+ int fd, i, err = 0;
+ unsigned int done;
+ bool needs_switch = false;
+
+ if (!ctx->file_data)
+ return -ENXIO;
+ if (up->offset + nr_args > ctx->nr_user_files)
+ return -EINVAL;
+
+ for (done = 0; done < nr_args; done++) {
+ u64 tag = 0;
+
+ if ((tags && copy_from_user(&tag, &tags[done], sizeof(tag))) ||
+ copy_from_user(&fd, &fds[done], sizeof(fd))) {
+ err = -EFAULT;
+ break;
+ }
+ if ((fd == IORING_REGISTER_FILES_SKIP || fd == -1) && tag) {
+ err = -EINVAL;
+ break;
+ }
+ if (fd == IORING_REGISTER_FILES_SKIP)
+ continue;
+
+ i = array_index_nospec(up->offset + done, ctx->nr_user_files);
+ file_slot = io_fixed_file_slot(&ctx->file_table, i);
+
+ if (file_slot->file_ptr) {
+ file = (struct file *)(file_slot->file_ptr & FFS_MASK);
+ err = io_queue_rsrc_removal(data, i, ctx->rsrc_node, file);
+ if (err)
+ break;
+ file_slot->file_ptr = 0;
+ needs_switch = true;
+ }
+ if (fd != -1) {
+ file = fget(fd);
+ if (!file) {
+ err = -EBADF;
+ break;
+ }
+ /*
+ * Don't allow io_uring instances to be registered. If
+ * UNIX isn't enabled, then this causes a reference
+ * cycle and this instance can never get freed. If UNIX
+ * is enabled we'll handle it just fine, but there's
+ * still no point in allowing a ring fd as it doesn't
+ * support regular read/write anyway.
+ */
+ if (file->f_op == &io_uring_fops) {
+ fput(file);
+ err = -EBADF;
+ break;
+ }
+ *io_get_tag_slot(data, i) = tag;
+ io_fixed_file_set(file_slot, file);
+ err = io_sqe_file_register(ctx, file, i);
+ if (err) {
+ file_slot->file_ptr = 0;
+ fput(file);
+ break;
+ }
+ }
+ }
+
+ if (needs_switch)
+ io_rsrc_node_switch(ctx, data);
+ return done ? done : err;
+}
+
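+/*
+ * Create the io-wq used for async offload of this ring, reusing the ctx's
+ * hash map if one already exists. Concurrency is capped at the smaller of
+ * the SQ size and 4 * online CPUs.
+ */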
+static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx,
+ struct task_struct *task)
+{
+ struct io_wq_hash *hash;
+ struct io_wq_data data;
+ unsigned int concurrency;
+
+ mutex_lock(&ctx->uring_lock);
+ hash = ctx->hash_map;
+ if (!hash) {
+ hash = kzalloc(sizeof(*hash), GFP_KERNEL);
+ if (!hash) {
+ mutex_unlock(&ctx->uring_lock);
+ return ERR_PTR(-ENOMEM);
+ }
+ refcount_set(&hash->refs, 1);
+ init_waitqueue_head(&hash->wait);
+ ctx->hash_map = hash;
+ }
+ mutex_unlock(&ctx->uring_lock);
+
+ data.hash = hash;
+ data.task = task;
+ data.free_work = io_wq_free_work;
+ data.do_work = io_wq_submit_work;
+
+	/* Do QD, or 4 * CPUS, whichever is smaller */
+ concurrency = min(ctx->sq_entries, 4 * num_online_cpus());
+
+ return io_wq_create(concurrency, &data);
+}
+
+static __cold int io_uring_alloc_task_context(struct task_struct *task,
+ struct io_ring_ctx *ctx)
+{
+ struct io_uring_task *tctx;
+ int ret;
+
+ tctx = kzalloc(sizeof(*tctx), GFP_KERNEL);
+ if (unlikely(!tctx))
+ return -ENOMEM;
+
+ tctx->registered_rings = kcalloc(IO_RINGFD_REG_MAX,
+ sizeof(struct file *), GFP_KERNEL);
+ if (unlikely(!tctx->registered_rings)) {
+ kfree(tctx);
+ return -ENOMEM;
+ }
+
+ ret = percpu_counter_init(&tctx->inflight, 0, GFP_KERNEL);
+ if (unlikely(ret)) {
+ kfree(tctx->registered_rings);
+ kfree(tctx);
+ return ret;
+ }
+
+ tctx->io_wq = io_init_wq_offload(ctx, task);
+ if (IS_ERR(tctx->io_wq)) {
+ ret = PTR_ERR(tctx->io_wq);
+ percpu_counter_destroy(&tctx->inflight);
+ kfree(tctx->registered_rings);
+ kfree(tctx);
+ return ret;
+ }
+
+ xa_init(&tctx->xa);
+ init_waitqueue_head(&tctx->wait);
+ atomic_set(&tctx->in_idle, 0);
+ atomic_set(&tctx->inflight_tracked, 0);
+ task->io_uring = tctx;
+ spin_lock_init(&tctx->task_lock);
+ INIT_WQ_LIST(&tctx->task_list);
+ INIT_WQ_LIST(&tctx->prior_task_list);
+ init_task_work(&tctx->task_work, tctx_task_work);
+ return 0;
+}
+
+void __io_uring_free(struct task_struct *tsk)
+{
+ struct io_uring_task *tctx = tsk->io_uring;
+
+ WARN_ON_ONCE(!xa_empty(&tctx->xa));
+ WARN_ON_ONCE(tctx->io_wq);
+ WARN_ON_ONCE(tctx->cached_refs);
+
+ kfree(tctx->registered_rings);
+ percpu_counter_destroy(&tctx->inflight);
+ kfree(tctx);
+ tsk->io_uring = NULL;
+}
+
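+/*
+ * Set up SQPOLL offload: either attach to the sq_data of an existing ring
+ * (IORING_SETUP_ATTACH_WQ) or allocate a new one and spawn its io_sq_thread.
+ */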
+static __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
+ struct io_uring_params *p)
+{
+ int ret;
+
+ /* Retain compatibility with failing for an invalid attach attempt */
+ if ((ctx->flags & (IORING_SETUP_ATTACH_WQ | IORING_SETUP_SQPOLL)) ==
+ IORING_SETUP_ATTACH_WQ) {
+ struct fd f;
+
+ f = fdget(p->wq_fd);
+ if (!f.file)
+ return -ENXIO;
+ if (f.file->f_op != &io_uring_fops) {
+ fdput(f);
+ return -EINVAL;
+ }
+ fdput(f);
+ }
+ if (ctx->flags & IORING_SETUP_SQPOLL) {
+ struct task_struct *tsk;
+ struct io_sq_data *sqd;
+ bool attached;
+
+ ret = security_uring_sqpoll();
+ if (ret)
+ return ret;
+
+ sqd = io_get_sq_data(p, &attached);
+ if (IS_ERR(sqd)) {
+ ret = PTR_ERR(sqd);
+ goto err;
+ }
+
+ ctx->sq_creds = get_current_cred();
+ ctx->sq_data = sqd;
+ ctx->sq_thread_idle = msecs_to_jiffies(p->sq_thread_idle);
+ if (!ctx->sq_thread_idle)
+ ctx->sq_thread_idle = HZ;
+
+ io_sq_thread_park(sqd);
+ list_add(&ctx->sqd_list, &sqd->ctx_list);
+ io_sqd_update_thread_idle(sqd);
+ /* don't attach to a dying SQPOLL thread, would be racy */
+ ret = (attached && !sqd->thread) ? -ENXIO : 0;
+ io_sq_thread_unpark(sqd);
+
+ if (ret < 0)
+ goto err;
+ if (attached)
+ return 0;
+
+ if (p->flags & IORING_SETUP_SQ_AFF) {
+ int cpu = p->sq_thread_cpu;
+
+ ret = -EINVAL;
+ if (cpu >= nr_cpu_ids || !cpu_online(cpu))
+ goto err_sqpoll;
+ sqd->sq_cpu = cpu;
+ } else {
+ sqd->sq_cpu = -1;
+ }
+
+ sqd->task_pid = current->pid;
+ sqd->task_tgid = current->tgid;
+ tsk = create_io_thread(io_sq_thread, sqd, NUMA_NO_NODE);
+ if (IS_ERR(tsk)) {
+ ret = PTR_ERR(tsk);
+ goto err_sqpoll;
+ }
+
+ sqd->thread = tsk;
+ ret = io_uring_alloc_task_context(tsk, ctx);
+ wake_up_new_task(tsk);
+ if (ret)
+ goto err;
+ } else if (p->flags & IORING_SETUP_SQ_AFF) {
+ /* Can't have SQ_AFF without SQPOLL */
+ ret = -EINVAL;
+ goto err;
+ }
+
+ return 0;
+err_sqpoll:
+ complete(&ctx->sq_data->exited);
+err:
+ io_sq_thread_finish(ctx);
+ return ret;
+}
+
+static inline void __io_unaccount_mem(struct user_struct *user,
+ unsigned long nr_pages)
+{
+ atomic_long_sub(nr_pages, &user->locked_vm);
+}
+
+static inline int __io_account_mem(struct user_struct *user,
+ unsigned long nr_pages)
+{
+ unsigned long page_limit, cur_pages, new_pages;
+
+ /* Don't allow more pages than we can safely lock */
+ page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+ do {
+ cur_pages = atomic_long_read(&user->locked_vm);
+ new_pages = cur_pages + nr_pages;
+ if (new_pages > page_limit)
+ return -ENOMEM;
+ } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
+ new_pages) != cur_pages);
+
+ return 0;
+}
+
+static void io_unaccount_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
+{
+ if (ctx->user)
+ __io_unaccount_mem(ctx->user, nr_pages);
+
+ if (ctx->mm_account)
+ atomic64_sub(nr_pages, &ctx->mm_account->pinned_vm);
+}
+
+static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
+{
+ int ret;
+
+ if (ctx->user) {
+ ret = __io_account_mem(ctx->user, nr_pages);
+ if (ret)
+ return ret;
+ }
+
+ if (ctx->mm_account)
+ atomic64_add(nr_pages, &ctx->mm_account->pinned_vm);
+
+ return 0;
+}
+
+static void io_mem_free(void *ptr)
+{
+ struct page *page;
+
+ if (!ptr)
+ return;
+
+ page = virt_to_head_page(ptr);
+ if (put_page_testzero(page))
+ free_compound_page(page);
+}
+
+static void *io_mem_alloc(size_t size)
+{
+ gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
+
+ return (void *) __get_free_pages(gfp, get_order(size));
+}
+
+static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
+ size_t *sq_offset)
+{
+ struct io_rings *rings;
+ size_t off, sq_array_size;
+
+ off = struct_size(rings, cqes, cq_entries);
+ if (off == SIZE_MAX)
+ return SIZE_MAX;
+
+#ifdef CONFIG_SMP
+ off = ALIGN(off, SMP_CACHE_BYTES);
+ if (off == 0)
+ return SIZE_MAX;
+#endif
+
+ if (sq_offset)
+ *sq_offset = off;
+
+ sq_array_size = array_size(sizeof(u32), sq_entries);
+ if (sq_array_size == SIZE_MAX)
+ return SIZE_MAX;
+
+ if (check_add_overflow(off, sq_array_size, &off))
+ return SIZE_MAX;
+
+ return off;
+}
+
+static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_mapped_ubuf **slot)
+{
+ struct io_mapped_ubuf *imu = *slot;
+ unsigned int i;
+
+ if (imu != ctx->dummy_ubuf) {
+ for (i = 0; i < imu->nr_bvecs; i++)
+ unpin_user_page(imu->bvec[i].bv_page);
+ if (imu->acct_pages)
+ io_unaccount_mem(ctx, imu->acct_pages);
+ kvfree(imu);
+ }
+ *slot = NULL;
+}
+
+static void io_rsrc_buf_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc)
+{
+ io_buffer_unmap(ctx, &prsrc->buf);
+ prsrc->buf = NULL;
+}
+
+static void __io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
+{
+ unsigned int i;
+
+ for (i = 0; i < ctx->nr_user_bufs; i++)
+ io_buffer_unmap(ctx, &ctx->user_bufs[i]);
+ kfree(ctx->user_bufs);
+ io_rsrc_data_free(ctx->buf_data);
+ ctx->user_bufs = NULL;
+ ctx->buf_data = NULL;
+ ctx->nr_user_bufs = 0;
+}
+
+static int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
+{
+ unsigned nr = ctx->nr_user_bufs;
+ int ret;
+
+ if (!ctx->buf_data)
+ return -ENXIO;
+
+ /*
+	 * Quiesce may unlock ->uring_lock, and while it's not held,
+	 * prevent new requests from using the table.
+ */
+ ctx->nr_user_bufs = 0;
+ ret = io_rsrc_ref_quiesce(ctx->buf_data, ctx);
+ ctx->nr_user_bufs = nr;
+ if (!ret)
+ __io_sqe_buffers_unregister(ctx);
+ return ret;
+}
+
+static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
+ void __user *arg, unsigned index)
+{
+ struct iovec __user *src;
+
+#ifdef CONFIG_COMPAT
+ if (ctx->compat) {
+ struct compat_iovec __user *ciovs;
+ struct compat_iovec ciov;
+
+ ciovs = (struct compat_iovec __user *) arg;
+ if (copy_from_user(&ciov, &ciovs[index], sizeof(ciov)))
+ return -EFAULT;
+
+ dst->iov_base = u64_to_user_ptr((u64)ciov.iov_base);
+ dst->iov_len = ciov.iov_len;
+ return 0;
+ }
+#endif
+ src = (struct iovec __user *) arg;
+ if (copy_from_user(dst, &src[index], sizeof(*dst)))
+ return -EFAULT;
+ return 0;
+}
+
+/*
+ * Not super efficient, but this only happens at registration time. And we do
+ * cache
+ * the last compound head, so generally we'll only do a full search if we don't
+ * match that one.
+ *
+ * We check if the given compound head page has already been accounted, to
+ * avoid double accounting it. This allows us to account the full size of the
+ * page, not just the constituent pages of a huge page.
+ */
+static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
+ int nr_pages, struct page *hpage)
+{
+ int i, j;
+
+ /* check current page array */
+ for (i = 0; i < nr_pages; i++) {
+ if (!PageCompound(pages[i]))
+ continue;
+ if (compound_head(pages[i]) == hpage)
+ return true;
+ }
+
+ /* check previously registered pages */
+ for (i = 0; i < ctx->nr_user_bufs; i++) {
+ struct io_mapped_ubuf *imu = ctx->user_bufs[i];
+
+ for (j = 0; j < imu->nr_bvecs; j++) {
+ if (!PageCompound(imu->bvec[j].bv_page))
+ continue;
+ if (compound_head(imu->bvec[j].bv_page) == hpage)
+ return true;
+ }
+ }
+
+ return false;
+}
+
+static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
+ int nr_pages, struct io_mapped_ubuf *imu,
+ struct page **last_hpage)
+{
+ int i, ret;
+
+ imu->acct_pages = 0;
+ for (i = 0; i < nr_pages; i++) {
+ if (!PageCompound(pages[i])) {
+ imu->acct_pages++;
+ } else {
+ struct page *hpage;
+
+ hpage = compound_head(pages[i]);
+ if (hpage == *last_hpage)
+ continue;
+ *last_hpage = hpage;
+ if (headpage_already_acct(ctx, pages, i, hpage))
+ continue;
+ imu->acct_pages += page_size(hpage) >> PAGE_SHIFT;
+ }
+ }
+
+ if (!imu->acct_pages)
+ return 0;
+
+ ret = io_account_mem(ctx, imu->acct_pages);
+ if (ret)
+ imu->acct_pages = 0;
+ return ret;
+}
+
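+/*
+ * Pin the pages backing @iov with FOLL_LONGTERM, account them against the
+ * memlock limit and record them in a freshly allocated io_mapped_ubuf.
+ * File-backed mappings other than shmem and hugetlb are rejected.
+ */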
+static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
+ struct io_mapped_ubuf **pimu,
+ struct page **last_hpage)
+{
+ struct io_mapped_ubuf *imu = NULL;
+ struct vm_area_struct **vmas = NULL;
+ struct page **pages = NULL;
+ unsigned long off, start, end, ubuf;
+ size_t size;
+ int ret, pret, nr_pages, i;
+
+ if (!iov->iov_base) {
+ *pimu = ctx->dummy_ubuf;
+ return 0;
+ }
+
+ ubuf = (unsigned long) iov->iov_base;
+ end = (ubuf + iov->iov_len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ start = ubuf >> PAGE_SHIFT;
+ nr_pages = end - start;
+
+ *pimu = NULL;
+ ret = -ENOMEM;
+
+ pages = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
+ if (!pages)
+ goto done;
+
+ vmas = kvmalloc_array(nr_pages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+ if (!vmas)
+ goto done;
+
+ imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
+ if (!imu)
+ goto done;
+
+ ret = 0;
+ mmap_read_lock(current->mm);
+ pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
+ pages, vmas);
+ if (pret == nr_pages) {
+ /* don't support file backed memory */
+ for (i = 0; i < nr_pages; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma_is_shmem(vma))
+ continue;
+ if (vma->vm_file &&
+ !is_file_hugepages(vma->vm_file)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+ }
+ } else {
+ ret = pret < 0 ? pret : -EFAULT;
+ }
+ mmap_read_unlock(current->mm);
+ if (ret) {
+ /*
+		 * If we did a partial map, or found file-backed vmas,
+		 * release any pages we did get
+ */
+ if (pret > 0)
+ unpin_user_pages(pages, pret);
+ goto done;
+ }
+
+ ret = io_buffer_account_pin(ctx, pages, pret, imu, last_hpage);
+ if (ret) {
+ unpin_user_pages(pages, pret);
+ goto done;
+ }
+
+ off = ubuf & ~PAGE_MASK;
+ size = iov->iov_len;
+ for (i = 0; i < nr_pages; i++) {
+ size_t vec_len;
+
+ vec_len = min_t(size_t, size, PAGE_SIZE - off);
+ imu->bvec[i].bv_page = pages[i];
+ imu->bvec[i].bv_len = vec_len;
+ imu->bvec[i].bv_offset = off;
+ off = 0;
+ size -= vec_len;
+ }
+ /* store original address for later verification */
+ imu->ubuf = ubuf;
+ imu->ubuf_end = ubuf + iov->iov_len;
+ imu->nr_bvecs = nr_pages;
+ *pimu = imu;
+ ret = 0;
+done:
+ if (ret)
+ kvfree(imu);
+ kvfree(pages);
+ kvfree(vmas);
+ return ret;
+}
+
+static int io_buffers_map_alloc(struct io_ring_ctx *ctx, unsigned int nr_args)
+{
+ ctx->user_bufs = kcalloc(nr_args, sizeof(*ctx->user_bufs), GFP_KERNEL);
+ return ctx->user_bufs ? 0 : -ENOMEM;
+}
+
+static int io_buffer_validate(struct iovec *iov)
+{
+ unsigned long tmp, acct_len = iov->iov_len + (PAGE_SIZE - 1);
+
+ /*
+	 * Don't impose further limits on the size and buffer
+	 * constraints here; we'll return -EINVAL later when IO is
+	 * submitted if they are wrong.
+ */
+ if (!iov->iov_base)
+ return iov->iov_len ? -EFAULT : 0;
+ if (!iov->iov_len)
+ return -EFAULT;
+
+ /* arbitrary limit, but we need something */
+ if (iov->iov_len > SZ_1G)
+ return -EFAULT;
+
+ if (check_add_overflow((unsigned long)iov->iov_base, acct_len, &tmp))
+ return -EOVERFLOW;
+
+ return 0;
+}
+
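+/*
+ * Register fixed buffers: validate each iovec passed in by the application,
+ * pin it and store the optional buffer tags.
+ */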
+static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
+ unsigned int nr_args, u64 __user *tags)
+{
+ struct page *last_hpage = NULL;
+ struct io_rsrc_data *data;
+ int i, ret;
+ struct iovec iov;
+
+ if (ctx->user_bufs)
+ return -EBUSY;
+ if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
+ return -EINVAL;
+ ret = io_rsrc_node_switch_start(ctx);
+ if (ret)
+ return ret;
+ ret = io_rsrc_data_alloc(ctx, io_rsrc_buf_put, tags, nr_args, &data);
+ if (ret)
+ return ret;
+ ret = io_buffers_map_alloc(ctx, nr_args);
+ if (ret) {
+ io_rsrc_data_free(data);
+ return ret;
+ }
+
+ for (i = 0; i < nr_args; i++, ctx->nr_user_bufs++) {
+ ret = io_copy_iov(ctx, &iov, arg, i);
+ if (ret)
+ break;
+ ret = io_buffer_validate(&iov);
+ if (ret)
+ break;
+ if (!iov.iov_base && *io_get_tag_slot(data, i)) {
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = io_sqe_buffer_register(ctx, &iov, &ctx->user_bufs[i],
+ &last_hpage);
+ if (ret)
+ break;
+ }
+
+ WARN_ON_ONCE(ctx->buf_data);
+
+ ctx->buf_data = data;
+ if (ret)
+ __io_sqe_buffers_unregister(ctx);
+ else
+ io_rsrc_node_switch(ctx, NULL);
+ return ret;
+}
+
+static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
+ struct io_uring_rsrc_update2 *up,
+ unsigned int nr_args)
+{
+ u64 __user *tags = u64_to_user_ptr(up->tags);
+ struct iovec iov, __user *iovs = u64_to_user_ptr(up->data);
+ struct page *last_hpage = NULL;
+ bool needs_switch = false;
+ __u32 done;
+ int i, err;
+
+ if (!ctx->buf_data)
+ return -ENXIO;
+ if (up->offset + nr_args > ctx->nr_user_bufs)
+ return -EINVAL;
+
+ for (done = 0; done < nr_args; done++) {
+ struct io_mapped_ubuf *imu;
+ int offset = up->offset + done;
+ u64 tag = 0;
+
+ err = io_copy_iov(ctx, &iov, iovs, done);
+ if (err)
+ break;
+ if (tags && copy_from_user(&tag, &tags[done], sizeof(tag))) {
+ err = -EFAULT;
+ break;
+ }
+ err = io_buffer_validate(&iov);
+ if (err)
+ break;
+ if (!iov.iov_base && tag) {
+ err = -EINVAL;
+ break;
+ }
+ err = io_sqe_buffer_register(ctx, &iov, &imu, &last_hpage);
+ if (err)
+ break;
+
+ i = array_index_nospec(offset, ctx->nr_user_bufs);
+ if (ctx->user_bufs[i] != ctx->dummy_ubuf) {
+ err = io_queue_rsrc_removal(ctx->buf_data, i,
+ ctx->rsrc_node, ctx->user_bufs[i]);
+ if (unlikely(err)) {
+ io_buffer_unmap(ctx, &imu);
+ break;
+ }
+ ctx->user_bufs[i] = NULL;
+ needs_switch = true;
+ }
+
+ ctx->user_bufs[i] = imu;
+ *io_get_tag_slot(ctx->buf_data, offset) = tag;
+ }
+
+ if (needs_switch)
+ io_rsrc_node_switch(ctx, ctx->buf_data);
+ return done ? done : err;
+}
+
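+/*
+ * Register an eventfd to be signalled when completions are posted. With
+ * @eventfd_async set, it is only signalled for completions that come from
+ * async context.
+ */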
+static int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg,
+ unsigned int eventfd_async)
+{
+ struct io_ev_fd *ev_fd;
+ __s32 __user *fds = arg;
+ int fd;
+
+ ev_fd = rcu_dereference_protected(ctx->io_ev_fd,
+ lockdep_is_held(&ctx->uring_lock));
+ if (ev_fd)
+ return -EBUSY;
+
+ if (copy_from_user(&fd, fds, sizeof(*fds)))
+ return -EFAULT;
+
+ ev_fd = kmalloc(sizeof(*ev_fd), GFP_KERNEL);
+ if (!ev_fd)
+ return -ENOMEM;
+
+ ev_fd->cq_ev_fd = eventfd_ctx_fdget(fd);
+ if (IS_ERR(ev_fd->cq_ev_fd)) {
+ int ret = PTR_ERR(ev_fd->cq_ev_fd);
+ kfree(ev_fd);
+ return ret;
+ }
+ ev_fd->eventfd_async = eventfd_async;
+ ctx->has_evfd = true;
+ rcu_assign_pointer(ctx->io_ev_fd, ev_fd);
+ return 0;
+}
+
+static void io_eventfd_put(struct rcu_head *rcu)
+{
+ struct io_ev_fd *ev_fd = container_of(rcu, struct io_ev_fd, rcu);
+
+ eventfd_ctx_put(ev_fd->cq_ev_fd);
+ kfree(ev_fd);
+}
+
+static int io_eventfd_unregister(struct io_ring_ctx *ctx)
+{
+ struct io_ev_fd *ev_fd;
+
+ ev_fd = rcu_dereference_protected(ctx->io_ev_fd,
+ lockdep_is_held(&ctx->uring_lock));
+ if (ev_fd) {
+ ctx->has_evfd = false;
+ rcu_assign_pointer(ctx->io_ev_fd, NULL);
+ call_rcu(&ev_fd->rcu, io_eventfd_put);
+ return 0;
+ }
+
+ return -ENXIO;
+}
+
+static void io_destroy_buffers(struct io_ring_ctx *ctx)
+{
+ int i;
+
+ for (i = 0; i < (1U << IO_BUFFERS_HASH_BITS); i++) {
+ struct list_head *list = &ctx->io_buffers[i];
+
+ while (!list_empty(list)) {
+ struct io_buffer_list *bl;
+
+ bl = list_first_entry(list, struct io_buffer_list, list);
+ __io_remove_buffers(ctx, bl, -1U);
+ list_del(&bl->list);
+ kfree(bl);
+ }
+ }
+
+ while (!list_empty(&ctx->io_buffers_pages)) {
+ struct page *page;
+
+ page = list_first_entry(&ctx->io_buffers_pages, struct page, lru);
+ list_del_init(&page->lru);
+ __free_page(page);
+ }
+}
+
+static void io_req_caches_free(struct io_ring_ctx *ctx)
+{
+ struct io_submit_state *state = &ctx->submit_state;
+ int nr = 0;
+
+ mutex_lock(&ctx->uring_lock);
+ io_flush_cached_locked_reqs(ctx, state);
+
+ while (state->free_list.next) {
+ struct io_wq_work_node *node;
+ struct io_kiocb *req;
+
+ node = wq_stack_extract(&state->free_list);
+ req = container_of(node, struct io_kiocb, comp_list);
+ kmem_cache_free(req_cachep, req);
+ nr++;
+ }
+ if (nr)
+ percpu_ref_put_many(&ctx->refs, nr);
+ mutex_unlock(&ctx->uring_lock);
+}
+
+static void io_wait_rsrc_data(struct io_rsrc_data *data)
+{
+ if (data && !atomic_dec_and_test(&data->refs))
+ wait_for_completion(&data->done);
+}
+
+static void io_flush_apoll_cache(struct io_ring_ctx *ctx)
+{
+ struct async_poll *apoll;
+
+ while (!list_empty(&ctx->apoll_cache)) {
+ apoll = list_first_entry(&ctx->apoll_cache, struct async_poll,
+ poll.wait.entry);
+ list_del(&apoll->poll.wait.entry);
+ kfree(apoll);
+ }
+}
+
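+/*
+ * Final teardown of the ring context once all references are gone: unregister
+ * buffers and files, free the rings and SQE array, and release cached state.
+ */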
+static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
+{
+ io_sq_thread_finish(ctx);
+
+ if (ctx->mm_account) {
+ mmdrop(ctx->mm_account);
+ ctx->mm_account = NULL;
+ }
+
+ io_rsrc_refs_drop(ctx);
+ /* __io_rsrc_put_work() may need uring_lock to progress, wait w/o it */
+ io_wait_rsrc_data(ctx->buf_data);
+ io_wait_rsrc_data(ctx->file_data);
+
+ mutex_lock(&ctx->uring_lock);
+ if (ctx->buf_data)
+ __io_sqe_buffers_unregister(ctx);
+ if (ctx->file_data)
+ __io_sqe_files_unregister(ctx);
+ if (ctx->rings)
+ __io_cqring_overflow_flush(ctx, true);
+ io_eventfd_unregister(ctx);
+ io_flush_apoll_cache(ctx);
+ mutex_unlock(&ctx->uring_lock);
+ io_destroy_buffers(ctx);
+ if (ctx->sq_creds)
+ put_cred(ctx->sq_creds);
+
+ /* there are no registered resources left, nobody uses it */
+ if (ctx->rsrc_node)
+ io_rsrc_node_destroy(ctx->rsrc_node);
+ if (ctx->rsrc_backup_node)
+ io_rsrc_node_destroy(ctx->rsrc_backup_node);
+ flush_delayed_work(&ctx->rsrc_put_work);
+ flush_delayed_work(&ctx->fallback_work);
+
+ WARN_ON_ONCE(!list_empty(&ctx->rsrc_ref_list));
+ WARN_ON_ONCE(!llist_empty(&ctx->rsrc_put_llist));
+
+#if defined(CONFIG_UNIX)
+ if (ctx->ring_sock) {
+ ctx->ring_sock->file = NULL; /* so that iput() is called */
+ sock_release(ctx->ring_sock);
+ }
+#endif
+ WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));
+
+ io_mem_free(ctx->rings);
+ io_mem_free(ctx->sq_sqes);
+
+ percpu_ref_exit(&ctx->refs);
+ free_uid(ctx->user);
+ io_req_caches_free(ctx);
+ if (ctx->hash_map)
+ io_wq_put_hash(ctx->hash_map);
+ kfree(ctx->cancel_hash);
+ kfree(ctx->dummy_ubuf);
+ kfree(ctx->io_buffers);
+ kfree(ctx);
+}
+
+static __poll_t io_uring_poll(struct file *file, poll_table *wait)
+{
+ struct io_ring_ctx *ctx = file->private_data;
+ __poll_t mask = 0;
+
+ poll_wait(file, &ctx->cq_wait, wait);
+ /*
+ * synchronizes with barrier from wq_has_sleeper call in
+ * io_commit_cqring
+ */
+ smp_rmb();
+ if (!io_sqring_full(ctx))
+ mask |= EPOLLOUT | EPOLLWRNORM;
+
+ /*
+ * Don't flush cqring overflow list here, just do a simple check.
+	 * Otherwise there could possibly be an ABBA deadlock:
+ * CPU0 CPU1
+ * ---- ----
+ * lock(&ctx->uring_lock);
+ * lock(&ep->mtx);
+ * lock(&ctx->uring_lock);
+ * lock(&ep->mtx);
+ *
+	 * Users may get EPOLLIN while seeing nothing in the cqring; this
+	 * pushes them to do the flush.
+ */
+ if (io_cqring_events(ctx) || test_bit(0, &ctx->check_cq_overflow))
+ mask |= EPOLLIN | EPOLLRDNORM;
+
+ return mask;
+}
+
+static int io_unregister_personality(struct io_ring_ctx *ctx, unsigned id)
+{
+ const struct cred *creds;
+
+ creds = xa_erase(&ctx->personalities, id);
+ if (creds) {
+ put_cred(creds);
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+struct io_tctx_exit {
+ struct callback_head task_work;
+ struct completion completion;
+ struct io_ring_ctx *ctx;
+};
+
+static __cold void io_tctx_exit_cb(struct callback_head *cb)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ struct io_tctx_exit *work;
+
+ work = container_of(cb, struct io_tctx_exit, task_work);
+ /*
+ * When @in_idle, we're in cancellation and it's racy to remove the
+ * node. It'll be removed by the end of cancellation, just ignore it.
+ */
+ if (!atomic_read(&tctx->in_idle))
+ io_uring_del_tctx_node((unsigned long)work->ctx);
+ complete(&work->completion);
+}
+
+static __cold bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+
+ return req->ctx == data;
+}
+
+static __cold void io_ring_exit_work(struct work_struct *work)
+{
+ struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
+ unsigned long timeout = jiffies + HZ * 60 * 5;
+ unsigned long interval = HZ / 20;
+ struct io_tctx_exit exit;
+ struct io_tctx_node *node;
+ int ret;
+
+ /*
+ * If we're doing polled IO and end up having requests being
+ * submitted async (out-of-line), then completions can come in while
+ * we're waiting for refs to drop. We need to reap these manually,
+ * as nobody else will be looking for them.
+ */
+ do {
+ io_uring_try_cancel_requests(ctx, NULL, true);
+ if (ctx->sq_data) {
+ struct io_sq_data *sqd = ctx->sq_data;
+ struct task_struct *tsk;
+
+ io_sq_thread_park(sqd);
+ tsk = sqd->thread;
+ if (tsk && tsk->io_uring && tsk->io_uring->io_wq)
+ io_wq_cancel_cb(tsk->io_uring->io_wq,
+ io_cancel_ctx_cb, ctx, true);
+ io_sq_thread_unpark(sqd);
+ }
+
+ io_req_caches_free(ctx);
+
+ if (WARN_ON_ONCE(time_after(jiffies, timeout))) {
+ /* there is little hope left, don't run it too often */
+ interval = HZ * 60;
+ }
+ } while (!wait_for_completion_timeout(&ctx->ref_comp, interval));
+
+ init_completion(&exit.completion);
+ init_task_work(&exit.task_work, io_tctx_exit_cb);
+ exit.ctx = ctx;
+ /*
+ * Some may use context even when all refs and requests have been put,
+ * and they are free to do so while still holding uring_lock or
+ * completion_lock, see io_req_task_submit(). Apart from other work,
+	 * this lock/unlock section also waits for them to finish.
+ */
+ mutex_lock(&ctx->uring_lock);
+ while (!list_empty(&ctx->tctx_list)) {
+ WARN_ON_ONCE(time_after(jiffies, timeout));
+
+ node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
+ ctx_node);
+ /* don't spin on a single task if cancellation failed */
+ list_rotate_left(&ctx->tctx_list);
+ ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
+ if (WARN_ON_ONCE(ret))
+ continue;
+
+ mutex_unlock(&ctx->uring_lock);
+ wait_for_completion(&exit.completion);
+ mutex_lock(&ctx->uring_lock);
+ }
+ mutex_unlock(&ctx->uring_lock);
+ spin_lock(&ctx->completion_lock);
+ spin_unlock(&ctx->completion_lock);
+
+ io_ring_ctx_free(ctx);
+}
+
+/* Returns true if we found and killed one or more timeouts */
+static __cold bool io_kill_timeouts(struct io_ring_ctx *ctx,
+ struct task_struct *tsk, bool cancel_all)
+{
+ struct io_kiocb *req, *tmp;
+ int canceled = 0;
+
+ spin_lock(&ctx->completion_lock);
+ spin_lock_irq(&ctx->timeout_lock);
+ list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) {
+ if (io_match_task(req, tsk, cancel_all)) {
+ io_kill_timeout(req, -ECANCELED);
+ canceled++;
+ }
+ }
+ spin_unlock_irq(&ctx->timeout_lock);
+ if (canceled != 0)
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ if (canceled != 0)
+ io_cqring_ev_posted(ctx);
+ return canceled != 0;
+}
+
+static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
+{
+ unsigned long index;
+ struct creds *creds;
+
+ mutex_lock(&ctx->uring_lock);
+ percpu_ref_kill(&ctx->refs);
+ if (ctx->rings)
+ __io_cqring_overflow_flush(ctx, true);
+ xa_for_each(&ctx->personalities, index, creds)
+ io_unregister_personality(ctx, index);
+ mutex_unlock(&ctx->uring_lock);
+
+ io_kill_timeouts(ctx, NULL, true);
+ io_poll_remove_all(ctx, NULL, true);
+
+ /* if we failed setting up the ctx, we might not have any rings */
+ io_iopoll_try_reap_events(ctx);
+
+ INIT_WORK(&ctx->exit_work, io_ring_exit_work);
+ /*
+ * Use system_unbound_wq to avoid spawning tons of event kworkers
+ * if we're exiting a ton of rings at the same time. It just adds
+ * noise and overhead, there's no discernable change in runtime
+ * over using system_wq.
+ */
+ queue_work(system_unbound_wq, &ctx->exit_work);
+}
+
+static int io_uring_release(struct inode *inode, struct file *file)
+{
+ struct io_ring_ctx *ctx = file->private_data;
+
+ file->private_data = NULL;
+ io_ring_ctx_wait_and_kill(ctx);
+ return 0;
+}
+
+struct io_task_cancel {
+ struct task_struct *task;
+ bool all;
+};
+
+static bool io_cancel_task_cb(struct io_wq_work *work, void *data)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+ struct io_task_cancel *cancel = data;
+
+ return io_match_task_safe(req, cancel->task, cancel->all);
+}
+
+static __cold bool io_cancel_defer_files(struct io_ring_ctx *ctx,
+ struct task_struct *task,
+ bool cancel_all)
+{
+ struct io_defer_entry *de;
+ LIST_HEAD(list);
+
+ spin_lock(&ctx->completion_lock);
+ list_for_each_entry_reverse(de, &ctx->defer_list, list) {
+ if (io_match_task_safe(de->req, task, cancel_all)) {
+ list_cut_position(&list, &ctx->defer_list, &de->list);
+ break;
+ }
+ }
+ spin_unlock(&ctx->completion_lock);
+ if (list_empty(&list))
+ return false;
+
+ while (!list_empty(&list)) {
+ de = list_first_entry(&list, struct io_defer_entry, list);
+ list_del_init(&de->list);
+ io_req_complete_failed(de->req, -ECANCELED);
+ kfree(de);
+ }
+ return true;
+}
+
+static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
+{
+ struct io_tctx_node *node;
+ enum io_wq_cancel cret;
+ bool ret = false;
+
+ mutex_lock(&ctx->uring_lock);
+ list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
+ struct io_uring_task *tctx = node->task->io_uring;
+
+ /*
+ * io_wq will stay alive while we hold uring_lock, because it's
+		 * killed after ctx nodes, which requires taking the lock.
+ */
+ if (!tctx || !tctx->io_wq)
+ continue;
+ cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true);
+ ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
+ }
+ mutex_unlock(&ctx->uring_lock);
+
+ return ret;
+}
+
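+/*
+ * Cancel requests matching @task (or all requests if @task is NULL):
+ * io-wq work, deferred requests, poll requests and timeouts. Loops until a
+ * pass finds nothing left to cancel.
+ */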
+static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
+ struct task_struct *task,
+ bool cancel_all)
+{
+ struct io_task_cancel cancel = { .task = task, .all = cancel_all, };
+ struct io_uring_task *tctx = task ? task->io_uring : NULL;
+
+ while (1) {
+ enum io_wq_cancel cret;
+ bool ret = false;
+
+ if (!task) {
+ ret |= io_uring_try_cancel_iowq(ctx);
+ } else if (tctx && tctx->io_wq) {
+ /*
+ * Cancels requests of all rings, not only @ctx, but
+ * it's fine as the task is in exit/exec.
+ */
+ cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_task_cb,
+ &cancel, true);
+ ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
+ }
+
+ /* SQPOLL thread does its own polling */
+ if ((!(ctx->flags & IORING_SETUP_SQPOLL) && cancel_all) ||
+ (ctx->sq_data && ctx->sq_data->thread == current)) {
+ while (!wq_list_empty(&ctx->iopoll_list)) {
+ io_iopoll_try_reap_events(ctx);
+ ret = true;
+ }
+ }
+
+ ret |= io_cancel_defer_files(ctx, task, cancel_all);
+ ret |= io_poll_remove_all(ctx, task, cancel_all);
+ ret |= io_kill_timeouts(ctx, task, cancel_all);
+ if (task)
+ ret |= io_run_task_work();
+ if (!ret)
+ break;
+ cond_resched();
+ }
+}
+
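+/*
+ * Slow path of io_uring_add_tctx_node(): allocate the per-task io_uring
+ * context if this task doesn't have one yet, and link @ctx into its xarray
+ * and the ctx's tctx_list.
+ */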
+static int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ struct io_tctx_node *node;
+ int ret;
+
+ if (unlikely(!tctx)) {
+ ret = io_uring_alloc_task_context(current, ctx);
+ if (unlikely(ret))
+ return ret;
+
+ tctx = current->io_uring;
+ if (ctx->iowq_limits_set) {
+ unsigned int limits[2] = { ctx->iowq_limits[0],
+ ctx->iowq_limits[1], };
+
+ ret = io_wq_max_workers(tctx->io_wq, limits);
+ if (ret)
+ return ret;
+ }
+ }
+ if (!xa_load(&tctx->xa, (unsigned long)ctx)) {
+ node = kmalloc(sizeof(*node), GFP_KERNEL);
+ if (!node)
+ return -ENOMEM;
+ node->ctx = ctx;
+ node->task = current;
+
+ ret = xa_err(xa_store(&tctx->xa, (unsigned long)ctx,
+ node, GFP_KERNEL));
+ if (ret) {
+ kfree(node);
+ return ret;
+ }
+
+ mutex_lock(&ctx->uring_lock);
+ list_add(&node->ctx_node, &ctx->tctx_list);
+ mutex_unlock(&ctx->uring_lock);
+ }
+ tctx->last = ctx;
+ return 0;
+}
+
+/*
+ * Note that this task has used io_uring. We use it for cancelation purposes.
+ */
+static inline int io_uring_add_tctx_node(struct io_ring_ctx *ctx)
+{
+ struct io_uring_task *tctx = current->io_uring;
+
+ if (likely(tctx && tctx->last == ctx))
+ return 0;
+ return __io_uring_add_tctx_node(ctx);
+}
+
+/*
+ * Remove this io_uring_file -> task mapping.
+ */
+static __cold void io_uring_del_tctx_node(unsigned long index)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ struct io_tctx_node *node;
+
+ if (!tctx)
+ return;
+ node = xa_erase(&tctx->xa, index);
+ if (!node)
+ return;
+
+ WARN_ON_ONCE(current != node->task);
+ WARN_ON_ONCE(list_empty(&node->ctx_node));
+
+ mutex_lock(&node->ctx->uring_lock);
+ list_del(&node->ctx_node);
+ mutex_unlock(&node->ctx->uring_lock);
+
+ if (tctx->last == node->ctx)
+ tctx->last = NULL;
+ kfree(node);
+}
+
+static __cold void io_uring_clean_tctx(struct io_uring_task *tctx)
+{
+ struct io_wq *wq = tctx->io_wq;
+ struct io_tctx_node *node;
+ unsigned long index;
+
+ xa_for_each(&tctx->xa, index, node) {
+ io_uring_del_tctx_node(index);
+ cond_resched();
+ }
+ if (wq) {
+ /*
+ * Must be after io_uring_del_tctx_node() (removes nodes under
+ * uring_lock) to avoid race with io_uring_try_cancel_iowq().
+ */
+ io_wq_put_and_exit(wq);
+ tctx->io_wq = NULL;
+ }
+}
+
+static s64 tctx_inflight(struct io_uring_task *tctx, bool tracked)
+{
+ if (tracked)
+ return atomic_read(&tctx->inflight_tracked);
+ return percpu_counter_sum(&tctx->inflight);
+}
+
+/*
+ * Find any io_uring ctx that this task has registered or done IO on, and cancel
+ * requests. @sqd should be non-NULL iff it's an SQPOLL thread cancellation.
+ */
+static __cold void io_uring_cancel_generic(bool cancel_all,
+ struct io_sq_data *sqd)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ struct io_ring_ctx *ctx;
+ s64 inflight;
+ DEFINE_WAIT(wait);
+
+ WARN_ON_ONCE(sqd && sqd->thread != current);
+
+ if (!current->io_uring)
+ return;
+ if (tctx->io_wq)
+ io_wq_exit_start(tctx->io_wq);
+
+ atomic_inc(&tctx->in_idle);
+ do {
+ io_uring_drop_tctx_refs(current);
+ /* read completions before cancelations */
+ inflight = tctx_inflight(tctx, !cancel_all);
+ if (!inflight)
+ break;
+
+ if (!sqd) {
+ struct io_tctx_node *node;
+ unsigned long index;
+
+ xa_for_each(&tctx->xa, index, node) {
+ /* sqpoll task will cancel all its requests */
+ if (node->ctx->sq_data)
+ continue;
+ io_uring_try_cancel_requests(node->ctx, current,
+ cancel_all);
+ }
+ } else {
+ list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
+ io_uring_try_cancel_requests(ctx, current,
+ cancel_all);
+ }
+
+ prepare_to_wait(&tctx->wait, &wait, TASK_INTERRUPTIBLE);
+ io_run_task_work();
+ io_uring_drop_tctx_refs(current);
+
+ /*
+ * If we've seen completions, retry without waiting. This
+ * avoids a race where a completion comes in before we did
+ * prepare_to_wait().
+ */
+ if (inflight == tctx_inflight(tctx, !cancel_all))
+ schedule();
+ finish_wait(&tctx->wait, &wait);
+ } while (1);
+
+ io_uring_clean_tctx(tctx);
+ if (cancel_all) {
+ /*
+ * We shouldn't run task_works after cancel, so just leave
+ * ->in_idle set for normal exit.
+ */
+ atomic_dec(&tctx->in_idle);
+ /* for exec all current's requests should be gone, kill tctx */
+ __io_uring_free(current);
+ }
+}
+
+void __io_uring_cancel(bool cancel_all)
+{
+ io_uring_cancel_generic(cancel_all, NULL);
+}
+
+void io_uring_unreg_ringfd(void)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ int i;
+
+ for (i = 0; i < IO_RINGFD_REG_MAX; i++) {
+ if (tctx->registered_rings[i]) {
+ fput(tctx->registered_rings[i]);
+ tctx->registered_rings[i] = NULL;
+ }
+ }
+}
+
+static int io_ring_add_registered_fd(struct io_uring_task *tctx, int fd,
+ int start, int end)
+{
+ struct file *file;
+ int offset;
+
+ for (offset = start; offset < end; offset++) {
+ offset = array_index_nospec(offset, IO_RINGFD_REG_MAX);
+ if (tctx->registered_rings[offset])
+ continue;
+
+ file = fget(fd);
+ if (!file) {
+ return -EBADF;
+ } else if (file->f_op != &io_uring_fops) {
+ fput(file);
+ return -EOPNOTSUPP;
+ }
+ tctx->registered_rings[offset] = file;
+ return offset;
+ }
+
+ return -EBUSY;
+}
+
+/*
+ * Register a ring fd to avoid fdget/fdput for each io_uring_enter()
+ * invocation. User passes in an array of struct io_uring_rsrc_update
+ * with ->data set to the ring_fd, and ->offset given for the desired
+ * index. If no index is desired, application may set ->offset == -1U
+ * and we'll find an available index. Returns number of entries
+ * successfully processed, or < 0 on error if none were processed.
+ */
+static int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
+ unsigned nr_args)
+{
+ struct io_uring_rsrc_update __user *arg = __arg;
+ struct io_uring_rsrc_update reg;
+ struct io_uring_task *tctx;
+ int ret, i;
+
+ if (!nr_args || nr_args > IO_RINGFD_REG_MAX)
+ return -EINVAL;
+
+ mutex_unlock(&ctx->uring_lock);
+ ret = io_uring_add_tctx_node(ctx);
+ mutex_lock(&ctx->uring_lock);
+ if (ret)
+ return ret;
+
+ tctx = current->io_uring;
+ for (i = 0; i < nr_args; i++) {
+ int start, end;
+
+ if (copy_from_user(&reg, &arg[i], sizeof(reg))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ if (reg.resv) {
+ ret = -EINVAL;
+ break;
+ }
+
+ if (reg.offset == -1U) {
+ start = 0;
+ end = IO_RINGFD_REG_MAX;
+ } else {
+ if (reg.offset >= IO_RINGFD_REG_MAX) {
+ ret = -EINVAL;
+ break;
+ }
+ start = reg.offset;
+ end = start + 1;
+ }
+
+ ret = io_ring_add_registered_fd(tctx, reg.data, start, end);
+ if (ret < 0)
+ break;
+
+ reg.offset = ret;
+ if (copy_to_user(&arg[i], &reg, sizeof(reg))) {
+ fput(tctx->registered_rings[reg.offset]);
+ tctx->registered_rings[reg.offset] = NULL;
+ ret = -EFAULT;
+ break;
+ }
+ }
+
+ return i ? i : ret;
+}
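
For reference, the register path above is driven from userspace through
io_uring_register(2) with IORING_REGISTER_RING_FDS. A minimal sketch, assuming
the 5.18 uapi definitions in <linux/io_uring.h> and a raw syscall (error
handling trimmed):

	#include <linux/io_uring.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int register_ring_fd(int ring_fd)
	{
		struct io_uring_rsrc_update upd = {
			.offset = -1U,      /* let the kernel pick a free slot */
			.data   = ring_fd,  /* fd of the ring being registered */
		};
		long ret = syscall(__NR_io_uring_register, ring_fd,
				   IORING_REGISTER_RING_FDS, &upd, 1);

		/* on success the chosen index is written back to upd.offset and
		 * can then be passed as the fd to io_uring_enter() together with
		 * IORING_ENTER_REGISTERED_RING */
		return ret < 0 ? -1 : (int)upd.offset;
	}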
+
+static int io_ringfd_unregister(struct io_ring_ctx *ctx, void __user *__arg,
+ unsigned nr_args)
+{
+ struct io_uring_rsrc_update __user *arg = __arg;
+ struct io_uring_task *tctx = current->io_uring;
+ struct io_uring_rsrc_update reg;
+ int ret = 0, i;
+
+ if (!nr_args || nr_args > IO_RINGFD_REG_MAX)
+ return -EINVAL;
+ if (!tctx)
+ return 0;
+
+ for (i = 0; i < nr_args; i++) {
+ if (copy_from_user(&reg, &arg[i], sizeof(reg))) {
+ ret = -EFAULT;
+ break;
+ }
+ if (reg.resv || reg.data || reg.offset >= IO_RINGFD_REG_MAX) {
+ ret = -EINVAL;
+ break;
+ }
+
+ reg.offset = array_index_nospec(reg.offset, IO_RINGFD_REG_MAX);
+ if (tctx->registered_rings[reg.offset]) {
+ fput(tctx->registered_rings[reg.offset]);
+ tctx->registered_rings[reg.offset] = NULL;
+ }
+ }
+
+ return i ? i : ret;
+}
+
+static void *io_uring_validate_mmap_request(struct file *file,
+ loff_t pgoff, size_t sz)
+{
+ struct io_ring_ctx *ctx = file->private_data;
+ loff_t offset = pgoff << PAGE_SHIFT;
+ struct page *page;
+ void *ptr;
+
+ switch (offset) {
+ case IORING_OFF_SQ_RING:
+ case IORING_OFF_CQ_RING:
+ ptr = ctx->rings;
+ break;
+ case IORING_OFF_SQES:
+ ptr = ctx->sq_sqes;
+ break;
+ default:
+ return ERR_PTR(-EINVAL);
+ }
+
+ page = virt_to_head_page(ptr);
+ if (sz > page_size(page))
+ return ERR_PTR(-EINVAL);
+
+ return ptr;
+}
+
+#ifdef CONFIG_MMU
+
+static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ size_t sz = vma->vm_end - vma->vm_start;
+ unsigned long pfn;
+ void *ptr;
+
+ ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz);
+ if (IS_ERR(ptr))
+ return PTR_ERR(ptr);
+
+ pfn = virt_to_phys(ptr) >> PAGE_SHIFT;
+ return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
+}
+
+#else /* !CONFIG_MMU */
+
+static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -EINVAL;
+}
+
+static unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
+{
+ return NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_WRITE;
+}
+
+static unsigned long io_uring_nommu_get_unmapped_area(struct file *file,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ void *ptr;
+
+ ptr = io_uring_validate_mmap_request(file, pgoff, len);
+ if (IS_ERR(ptr))
+ return PTR_ERR(ptr);
+
+ return (unsigned long) ptr;
+}
+
+#endif /* !CONFIG_MMU */
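
The IORING_OFF_* offsets validated above are the same values userspace hands to
mmap(2) after io_uring_setup(2). A sketch of the conventional mappings, assuming
IORING_FEAT_SINGLE_MMAP and the offsets returned in struct io_uring_params
(error handling trimmed):

	#include <linux/io_uring.h>
	#include <sys/mman.h>

	static void *map_rings(int ring_fd, const struct io_uring_params *p,
			       struct io_uring_sqe **sqes)
	{
		/* SQ and CQ ring headers share one mapping with FEAT_SINGLE_MMAP */
		size_t sq_sz = p->sq_off.array + p->sq_entries * sizeof(__u32);
		size_t cq_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
		size_t ring_sz = sq_sz > cq_sz ? sq_sz : cq_sz;
		void *rings;

		rings = mmap(NULL, ring_sz, PROT_READ | PROT_WRITE,
			     MAP_SHARED | MAP_POPULATE, ring_fd, IORING_OFF_SQ_RING);
		*sqes = mmap(NULL, p->sq_entries * sizeof(struct io_uring_sqe),
			     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
			     ring_fd, IORING_OFF_SQES);
		return rings;
	}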
+
+static int io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
+{
+ DEFINE_WAIT(wait);
+
+ do {
+ if (!io_sqring_full(ctx))
+ break;
+ prepare_to_wait(&ctx->sqo_sq_wait, &wait, TASK_INTERRUPTIBLE);
+
+ if (!io_sqring_full(ctx))
+ break;
+ schedule();
+ } while (!signal_pending(current));
+
+ finish_wait(&ctx->sqo_sq_wait, &wait);
+ return 0;
+}
+
+static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz,
+ struct __kernel_timespec __user **ts,
+ const sigset_t __user **sig)
+{
+ struct io_uring_getevents_arg arg;
+
+ /*
+ * If EXT_ARG isn't set, then we have no timespec and the argp pointer
+ * is just a pointer to the sigset_t.
+ */
+ if (!(flags & IORING_ENTER_EXT_ARG)) {
+ *sig = (const sigset_t __user *) argp;
+ *ts = NULL;
+ return 0;
+ }
+
+ /*
+ * EXT_ARG is set - ensure we agree on the size of it and copy in our
+ * timespec and sigset_t pointers if good.
+ */
+ if (*argsz != sizeof(arg))
+ return -EINVAL;
+ if (copy_from_user(&arg, argp, sizeof(arg)))
+ return -EFAULT;
+ if (arg.pad)
+ return -EINVAL;
+ *sig = u64_to_user_ptr(arg.sigmask);
+ *argsz = arg.sigmask_sz;
+ *ts = u64_to_user_ptr(arg.ts);
+ return 0;
+}
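
On the userspace side the extended argument is a struct io_uring_getevents_arg
with the sigset and timespec pointers packed into the u64 fields read above. A
sketch of a wait with both a signal mask and a timeout, assuming raw syscalls
and the kernel-sized sigset length of _NSIG / 8 (error handling trimmed):

	#include <linux/io_uring.h>
	#include <linux/time_types.h>
	#include <signal.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static long wait_one_cqe(int ring_fd, const sigset_t *mask,
				 struct __kernel_timespec *ts)
	{
		struct io_uring_getevents_arg arg = {
			.sigmask    = (__u64)(unsigned long)mask,
			.sigmask_sz = _NSIG / 8,	/* kernel sigset size */
			.ts         = (__u64)(unsigned long)ts,
		};

		return syscall(__NR_io_uring_enter, ring_fd, 0, 1,
			       IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG,
			       &arg, sizeof(arg));
	}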
+
+SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
+ u32, min_complete, u32, flags, const void __user *, argp,
+ size_t, argsz)
+{
+ struct io_ring_ctx *ctx;
+ int submitted = 0;
+ struct fd f;
+ long ret;
+
+ io_run_task_work();
+
+ if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
+ IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
+ IORING_ENTER_REGISTERED_RING)))
+ return -EINVAL;
+
+ /*
+ * If the ring fd has been registered via IORING_REGISTER_RING_FDS, we
+ * need only dereference our task-private array to find it.
+ */
+ if (flags & IORING_ENTER_REGISTERED_RING) {
+ struct io_uring_task *tctx = current->io_uring;
+
+ if (!tctx || fd >= IO_RINGFD_REG_MAX)
+ return -EINVAL;
+ fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
+ f.file = tctx->registered_rings[fd];
+ if (unlikely(!f.file))
+ return -EBADF;
+ } else {
+ f = fdget(fd);
+ if (unlikely(!f.file))
+ return -EBADF;
+ }
+
+ ret = -EOPNOTSUPP;
+ if (unlikely(f.file->f_op != &io_uring_fops))
+ goto out_fput;
+
+ ret = -ENXIO;
+ ctx = f.file->private_data;
+ if (unlikely(!percpu_ref_tryget(&ctx->refs)))
+ goto out_fput;
+
+ ret = -EBADFD;
+ if (unlikely(ctx->flags & IORING_SETUP_R_DISABLED))
+ goto out;
+
+ /*
+ * For SQ polling, the thread will do all submissions and completions.
+ * Just return the requested submit count, and wake the thread if
+ * we were asked to.
+ */
+ ret = 0;
+ if (ctx->flags & IORING_SETUP_SQPOLL) {
+ io_cqring_overflow_flush(ctx);
+
+ if (unlikely(ctx->sq_data->thread == NULL)) {
+ ret = -EOWNERDEAD;
+ goto out;
+ }
+ if (flags & IORING_ENTER_SQ_WAKEUP)
+ wake_up(&ctx->sq_data->wait);
+ if (flags & IORING_ENTER_SQ_WAIT) {
+ ret = io_sqpoll_wait_sq(ctx);
+ if (ret)
+ goto out;
+ }
+ submitted = to_submit;
+ } else if (to_submit) {
+ ret = io_uring_add_tctx_node(ctx);
+ if (unlikely(ret))
+ goto out;
+ mutex_lock(&ctx->uring_lock);
+ submitted = io_submit_sqes(ctx, to_submit);
+ mutex_unlock(&ctx->uring_lock);
+
+ if (submitted != to_submit)
+ goto out;
+ }
+ if (flags & IORING_ENTER_GETEVENTS) {
+ const sigset_t __user *sig;
+ struct __kernel_timespec __user *ts;
+
+ ret = io_get_ext_arg(flags, argp, &argsz, &ts, &sig);
+ if (unlikely(ret))
+ goto out;
+
+ min_complete = min(min_complete, ctx->cq_entries);
+
+ /*
+ * When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user
+ * space applications don't need to poll for completion events
+ * themselves; they can rely on io_sq_thread to do that, which
+ * reduces cpu usage and uring_lock contention.
+ */
+ if (ctx->flags & IORING_SETUP_IOPOLL &&
+ !(ctx->flags & IORING_SETUP_SQPOLL)) {
+ ret = io_iopoll_check(ctx, min_complete);
+ } else {
+ ret = io_cqring_wait(ctx, min_complete, sig, argsz, ts);
+ }
+ }
+
+out:
+ percpu_ref_put(&ctx->refs);
+out_fput:
+ if (!(flags & IORING_ENTER_REGISTERED_RING))
+ fdput(f);
+ return submitted ? submitted : ret;
+}
+
+#ifdef CONFIG_PROC_FS
+static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
+ const struct cred *cred)
+{
+ struct user_namespace *uns = seq_user_ns(m);
+ struct group_info *gi;
+ kernel_cap_t cap;
+ unsigned __capi;
+ int g;
+
+ seq_printf(m, "%5d\n", id);
+ seq_put_decimal_ull(m, "\tUid:\t", from_kuid_munged(uns, cred->uid));
+ seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->euid));
+ seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->suid));
+ seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->fsuid));
+ seq_put_decimal_ull(m, "\n\tGid:\t", from_kgid_munged(uns, cred->gid));
+ seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->egid));
+ seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->sgid));
+ seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->fsgid));
+ seq_puts(m, "\n\tGroups:\t");
+ gi = cred->group_info;
+ for (g = 0; g < gi->ngroups; g++) {
+ seq_put_decimal_ull(m, g ? " " : "",
+ from_kgid_munged(uns, gi->gid[g]));
+ }
+ seq_puts(m, "\n\tCapEff:\t");
+ cap = cred->cap_effective;
+ CAP_FOR_EACH_U32(__capi)
+ seq_put_hex_ll(m, NULL, cap.cap[CAP_LAST_U32 - __capi], 8);
+ seq_putc(m, '\n');
+ return 0;
+}
+
+static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
+ struct seq_file *m)
+{
+ struct io_sq_data *sq = NULL;
+ struct io_overflow_cqe *ocqe;
+ struct io_rings *r = ctx->rings;
+ unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
+ unsigned int sq_head = READ_ONCE(r->sq.head);
+ unsigned int sq_tail = READ_ONCE(r->sq.tail);
+ unsigned int cq_head = READ_ONCE(r->cq.head);
+ unsigned int cq_tail = READ_ONCE(r->cq.tail);
+ unsigned int sq_entries, cq_entries;
+ bool has_lock;
+ unsigned int i;
+
+ /*
+ * We may get imprecise sqe and cqe info if the ring is actively running,
+ * since we read cached_sq_head and cached_cq_tail without uring_lock
+ * and sq_tail and cq_head are changed by userspace. But that's OK, since
+ * we usually only look at this info when the ring is stuck.
+ */
+ seq_printf(m, "SqMask:\t0x%x\n", sq_mask);
+ seq_printf(m, "SqHead:\t%u\n", sq_head);
+ seq_printf(m, "SqTail:\t%u\n", sq_tail);
+ seq_printf(m, "CachedSqHead:\t%u\n", ctx->cached_sq_head);
+ seq_printf(m, "CqMask:\t0x%x\n", cq_mask);
+ seq_printf(m, "CqHead:\t%u\n", cq_head);
+ seq_printf(m, "CqTail:\t%u\n", cq_tail);
+ seq_printf(m, "CachedCqTail:\t%u\n", ctx->cached_cq_tail);
+ seq_printf(m, "SQEs:\t%u\n", sq_tail - ctx->cached_sq_head);
+ sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
+ for (i = 0; i < sq_entries; i++) {
+ unsigned int entry = i + sq_head;
+ unsigned int sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
+ struct io_uring_sqe *sqe;
+
+ if (sq_idx > sq_mask)
+ continue;
+ sqe = &ctx->sq_sqes[sq_idx];
+ seq_printf(m, "%5u: opcode:%d, fd:%d, flags:%x, user_data:%llu\n",
+ sq_idx, sqe->opcode, sqe->fd, sqe->flags,
+ sqe->user_data);
+ }
+ seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
+ cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
+ for (i = 0; i < cq_entries; i++) {
+ unsigned int entry = i + cq_head;
+ struct io_uring_cqe *cqe = &r->cqes[entry & cq_mask];
+
+ seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
+ entry & cq_mask, cqe->user_data, cqe->res,
+ cqe->flags);
+ }
+
+ /*
+ * Avoid ABBA deadlock between the seq lock and the io_uring mutex,
+ * since fdinfo case grabs it in the opposite direction of normal use
+ * cases. If we fail to get the lock, we just don't iterate any
+ * structures that could be going away outside the io_uring mutex.
+ */
+ has_lock = mutex_trylock(&ctx->uring_lock);
+
+ if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
+ sq = ctx->sq_data;
+ if (!sq->thread)
+ sq = NULL;
+ }
+
+ seq_printf(m, "SqThread:\t%d\n", sq ? task_pid_nr(sq->thread) : -1);
+ seq_printf(m, "SqThreadCpu:\t%d\n", sq ? task_cpu(sq->thread) : -1);
+ seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);
+ for (i = 0; has_lock && i < ctx->nr_user_files; i++) {
+ struct file *f = io_file_from_index(ctx, i);
+
+ if (f)
+ seq_printf(m, "%5u: %s\n", i, file_dentry(f)->d_iname);
+ else
+ seq_printf(m, "%5u: <none>\n", i);
+ }
+ seq_printf(m, "UserBufs:\t%u\n", ctx->nr_user_bufs);
+ for (i = 0; has_lock && i < ctx->nr_user_bufs; i++) {
+ struct io_mapped_ubuf *buf = ctx->user_bufs[i];
+ unsigned int len = buf->ubuf_end - buf->ubuf;
+
+ seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, len);
+ }
+ if (has_lock && !xa_empty(&ctx->personalities)) {
+ unsigned long index;
+ const struct cred *cred;
+
+ seq_printf(m, "Personalities:\n");
+ xa_for_each(&ctx->personalities, index, cred)
+ io_uring_show_cred(m, index, cred);
+ }
+ if (has_lock)
+ mutex_unlock(&ctx->uring_lock);
+
+ seq_puts(m, "PollList:\n");
+ spin_lock(&ctx->completion_lock);
+ for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
+ struct hlist_head *list = &ctx->cancel_hash[i];
+ struct io_kiocb *req;
+
+ hlist_for_each_entry(req, list, hash_node)
+ seq_printf(m, " op=%d, task_works=%d\n", req->opcode,
+ task_work_pending(req->task));
+ }
+
+ seq_puts(m, "CqOverflowList:\n");
+ list_for_each_entry(ocqe, &ctx->cq_overflow_list, list) {
+ struct io_uring_cqe *cqe = &ocqe->cqe;
+
+ seq_printf(m, " user_data=%llu, res=%d, flags=%x\n",
+ cqe->user_data, cqe->res, cqe->flags);
+
+ }
+
+ spin_unlock(&ctx->completion_lock);
+}
+
+static __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
+{
+ struct io_ring_ctx *ctx = f->private_data;
+
+ if (percpu_ref_tryget(&ctx->refs)) {
+ __io_uring_show_fdinfo(ctx, m);
+ percpu_ref_put(&ctx->refs);
+ }
+}
+#endif
+
+static const struct file_operations io_uring_fops = {
+ .release = io_uring_release,
+ .mmap = io_uring_mmap,
+#ifndef CONFIG_MMU
+ .get_unmapped_area = io_uring_nommu_get_unmapped_area,
+ .mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#endif
+ .poll = io_uring_poll,
+#ifdef CONFIG_PROC_FS
+ .show_fdinfo = io_uring_show_fdinfo,
+#endif
+};
+
+static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
+ struct io_uring_params *p)
+{
+ struct io_rings *rings;
+ size_t size, sq_array_offset;
+
+ /* make sure these are sane, as we already accounted them */
+ ctx->sq_entries = p->sq_entries;
+ ctx->cq_entries = p->cq_entries;
+
+ size = rings_size(p->sq_entries, p->cq_entries, &sq_array_offset);
+ if (size == SIZE_MAX)
+ return -EOVERFLOW;
+
+ rings = io_mem_alloc(size);
+ if (!rings)
+ return -ENOMEM;
+
+ ctx->rings = rings;
+ ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
+ rings->sq_ring_mask = p->sq_entries - 1;
+ rings->cq_ring_mask = p->cq_entries - 1;
+ rings->sq_ring_entries = p->sq_entries;
+ rings->cq_ring_entries = p->cq_entries;
+
+ size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
+ if (size == SIZE_MAX) {
+ io_mem_free(ctx->rings);
+ ctx->rings = NULL;
+ return -EOVERFLOW;
+ }
+
+ ctx->sq_sqes = io_mem_alloc(size);
+ if (!ctx->sq_sqes) {
+ io_mem_free(ctx->rings);
+ ctx->rings = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
+{
+ int ret, fd;
+
+ fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
+ if (fd < 0)
+ return fd;
+
+ ret = io_uring_add_tctx_node(ctx);
+ if (ret) {
+ put_unused_fd(fd);
+ return ret;
+ }
+ fd_install(fd, file);
+ return fd;
+}
+
+/*
+ * Allocate an anonymous fd, which is what constitutes the application-
+ * visible backing of an io_uring instance. The application mmaps this
+ * fd to gain access to the SQ/CQ ring details. If UNIX sockets are enabled,
+ * we have to tie this fd to a socket for file garbage collection purposes.
+ */
+static struct file *io_uring_get_file(struct io_ring_ctx *ctx)
+{
+ struct file *file;
+#if defined(CONFIG_UNIX)
+ int ret;
+
+ ret = sock_create_kern(&init_net, PF_UNIX, SOCK_RAW, IPPROTO_IP,
+ &ctx->ring_sock);
+ if (ret)
+ return ERR_PTR(ret);
+#endif
+
+ file = anon_inode_getfile_secure("[io_uring]", &io_uring_fops, ctx,
+ O_RDWR | O_CLOEXEC, NULL);
+#if defined(CONFIG_UNIX)
+ if (IS_ERR(file)) {
+ sock_release(ctx->ring_sock);
+ ctx->ring_sock = NULL;
+ } else {
+ ctx->ring_sock->file = file;
+ }
+#endif
+ return file;
+}
+
+static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
+ struct io_uring_params __user *params)
+{
+ struct io_ring_ctx *ctx;
+ struct file *file;
+ int ret;
+
+ if (!entries)
+ return -EINVAL;
+ if (entries > IORING_MAX_ENTRIES) {
+ if (!(p->flags & IORING_SETUP_CLAMP))
+ return -EINVAL;
+ entries = IORING_MAX_ENTRIES;
+ }
+
+ /*
+ * Use twice as many entries for the CQ ring. It's possible for the
+ * application to drive a higher depth than the size of the SQ ring,
+ * since the sqes are only used at submission time. This allows for
+ * some flexibility in overcommitting a bit. If the application has
+ * set IORING_SETUP_CQSIZE, it will have passed in the desired number
+ * of CQ ring entries manually.
+ */
+ p->sq_entries = roundup_pow_of_two(entries);
+ if (p->flags & IORING_SETUP_CQSIZE) {
+ /*
+ * If IORING_SETUP_CQSIZE is set, we do the same roundup
+ * to a power-of-two, if it isn't already. We do NOT impose
+ * any cq vs sq ring sizing.
+ */
+ if (!p->cq_entries)
+ return -EINVAL;
+ if (p->cq_entries > IORING_MAX_CQ_ENTRIES) {
+ if (!(p->flags & IORING_SETUP_CLAMP))
+ return -EINVAL;
+ p->cq_entries = IORING_MAX_CQ_ENTRIES;
+ }
+ p->cq_entries = roundup_pow_of_two(p->cq_entries);
+ if (p->cq_entries < p->sq_entries)
+ return -EINVAL;
+ } else {
+ p->cq_entries = 2 * p->sq_entries;
+ }
+
+ ctx = io_ring_ctx_alloc(p);
+ if (!ctx)
+ return -ENOMEM;
+ ctx->compat = in_compat_syscall();
+ if (!capable(CAP_IPC_LOCK))
+ ctx->user = get_uid(current_user());
+
+ /*
+ * This is just grabbed for accounting purposes. When a process exits,
+ * the mm is exited and dropped before the files, hence we need to hang
+ * on to this mm purely for the purposes of being able to unaccount
+ * memory (locked/pinned vm). It's not used for anything else.
+ */
+ mmgrab(current->mm);
+ ctx->mm_account = current->mm;
+
+ ret = io_allocate_scq_urings(ctx, p);
+ if (ret)
+ goto err;
+
+ ret = io_sq_offload_create(ctx, p);
+ if (ret)
+ goto err;
+ /* always set a rsrc node */
+ ret = io_rsrc_node_switch_start(ctx);
+ if (ret)
+ goto err;
+ io_rsrc_node_switch(ctx, NULL);
+
+ memset(&p->sq_off, 0, sizeof(p->sq_off));
+ p->sq_off.head = offsetof(struct io_rings, sq.head);
+ p->sq_off.tail = offsetof(struct io_rings, sq.tail);
+ p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
+ p->sq_off.ring_entries = offsetof(struct io_rings, sq_ring_entries);
+ p->sq_off.flags = offsetof(struct io_rings, sq_flags);
+ p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
+ p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
+
+ memset(&p->cq_off, 0, sizeof(p->cq_off));
+ p->cq_off.head = offsetof(struct io_rings, cq.head);
+ p->cq_off.tail = offsetof(struct io_rings, cq.tail);
+ p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
+ p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
+ p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
+ p->cq_off.cqes = offsetof(struct io_rings, cqes);
+ p->cq_off.flags = offsetof(struct io_rings, cq_flags);
+
+ p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
+ IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |
+ IORING_FEAT_CUR_PERSONALITY | IORING_FEAT_FAST_POLL |
+ IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
+ IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
+ IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
+ IORING_FEAT_LINKED_FILE;
+
+ if (copy_to_user(params, p, sizeof(*p))) {
+ ret = -EFAULT;
+ goto err;
+ }
+
+ file = io_uring_get_file(ctx);
+ if (IS_ERR(file)) {
+ ret = PTR_ERR(file);
+ goto err;
+ }
+
+ /*
+ * Install ring fd as the very last thing, so we don't risk someone
+ * having closed it before we finish setup
+ */
+ ret = io_uring_install_fd(ctx, file);
+ if (ret < 0) {
+ /* fput will clean it up */
+ fput(file);
+ return ret;
+ }
+
+ trace_io_uring_create(ret, ctx, p->sq_entries, p->cq_entries, p->flags);
+ return ret;
+err:
+ io_ring_ctx_wait_and_kill(ctx);
+ return ret;
+}
+
+/*
+ * Sets up an io_uring context and returns the fd. The application asks for a
+ * ring size; we return the actual sq/cq ring sizes (among other things) in the
+ * params structure passed in.
+ */
+static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
+{
+ struct io_uring_params p;
+ int i;
+
+ if (copy_from_user(&p, params, sizeof(p)))
+ return -EFAULT;
+ for (i = 0; i < ARRAY_SIZE(p.resv); i++) {
+ if (p.resv[i])
+ return -EINVAL;
+ }
+
+ if (p.flags & ~(IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL |
+ IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
+ IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
+ IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL))
+ return -EINVAL;
+
+ return io_uring_create(entries, &p, params);
+}
+
+SYSCALL_DEFINE2(io_uring_setup, u32, entries,
+ struct io_uring_params __user *, params)
+{
+ return io_uring_setup(entries, params);
+}
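
Worked example of the sizing logic above: without IORING_SETUP_CQSIZE, a request
for 100 entries is rounded up to sq_entries = 128 and cq_entries = 256, and the
actual sizes come back in the params structure. A minimal sketch using the raw
syscall:

	#include <linux/io_uring.h>
	#include <string.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		struct io_uring_params p;
		int ring_fd;

		memset(&p, 0, sizeof(p));
		ring_fd = syscall(__NR_io_uring_setup, 100, &p);
		/* p.sq_entries == 128, p.cq_entries == 256 (2 * sq_entries), and
		 * p.sq_off / p.cq_off hold the offsets used for the ring mmaps */
		return ring_fd < 0;
	}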
+
+static __cold int io_probe(struct io_ring_ctx *ctx, void __user *arg,
+ unsigned nr_args)
+{
+ struct io_uring_probe *p;
+ size_t size;
+ int i, ret;
+
+ size = struct_size(p, ops, nr_args);
+ if (size == SIZE_MAX)
+ return -EOVERFLOW;
+ p = kzalloc(size, GFP_KERNEL);
+ if (!p)
+ return -ENOMEM;
+
+ ret = -EFAULT;
+ if (copy_from_user(p, arg, size))
+ goto out;
+ ret = -EINVAL;
+ if (memchr_inv(p, 0, size))
+ goto out;
+
+ p->last_op = IORING_OP_LAST - 1;
+ if (nr_args > IORING_OP_LAST)
+ nr_args = IORING_OP_LAST;
+
+ for (i = 0; i < nr_args; i++) {
+ p->ops[i].op = i;
+ if (!io_op_defs[i].not_supported)
+ p->ops[i].flags = IO_URING_OP_SUPPORTED;
+ }
+ p->ops_len = i;
+
+ ret = 0;
+ if (copy_to_user(arg, p, size))
+ ret = -EFAULT;
+out:
+ kfree(p);
+ return ret;
+}
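
A userspace sketch of consuming this: pass a zeroed probe sized for the maximum
of 256 ops (the kernel rejects larger counts and any nonzero input bytes) and
read back the per-op flags, assuming the uapi structs and a raw syscall:

	#include <linux/io_uring.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static void dump_supported_ops(int ring_fd)
	{
		size_t len = sizeof(struct io_uring_probe) +
			     256 * sizeof(struct io_uring_probe_op);
		struct io_uring_probe *probe = calloc(1, len);	/* must be zeroed */

		if (probe && syscall(__NR_io_uring_register, ring_fd,
				     IORING_REGISTER_PROBE, probe, 256) == 0) {
			for (unsigned int i = 0; i < probe->ops_len; i++)
				if (probe->ops[i].flags & IO_URING_OP_SUPPORTED)
					printf("opcode %u supported\n", probe->ops[i].op);
		}
		free(probe);
	}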
+
+static int io_register_personality(struct io_ring_ctx *ctx)
+{
+ const struct cred *creds;
+ u32 id;
+ int ret;
+
+ creds = get_current_cred();
+
+ ret = xa_alloc_cyclic(&ctx->personalities, &id, (void *)creds,
+ XA_LIMIT(0, USHRT_MAX), &ctx->pers_next, GFP_KERNEL);
+ if (ret < 0) {
+ put_cred(creds);
+ return ret;
+ }
+ return id;
+}
+
+static __cold int io_register_restrictions(struct io_ring_ctx *ctx,
+ void __user *arg, unsigned int nr_args)
+{
+ struct io_uring_restriction *res;
+ size_t size;
+ int i, ret;
+
+ /* Restrictions allowed only if rings started disabled */
+ if (!(ctx->flags & IORING_SETUP_R_DISABLED))
+ return -EBADFD;
+
+ /* We allow only a single restrictions registration */
+ if (ctx->restrictions.registered)
+ return -EBUSY;
+
+ if (!arg || nr_args > IORING_MAX_RESTRICTIONS)
+ return -EINVAL;
+
+ size = array_size(nr_args, sizeof(*res));
+ if (size == SIZE_MAX)
+ return -EOVERFLOW;
+
+ res = memdup_user(arg, size);
+ if (IS_ERR(res))
+ return PTR_ERR(res);
+
+ ret = 0;
+
+ for (i = 0; i < nr_args; i++) {
+ switch (res[i].opcode) {
+ case IORING_RESTRICTION_REGISTER_OP:
+ if (res[i].register_op >= IORING_REGISTER_LAST) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ __set_bit(res[i].register_op,
+ ctx->restrictions.register_op);
+ break;
+ case IORING_RESTRICTION_SQE_OP:
+ if (res[i].sqe_op >= IORING_OP_LAST) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ __set_bit(res[i].sqe_op, ctx->restrictions.sqe_op);
+ break;
+ case IORING_RESTRICTION_SQE_FLAGS_ALLOWED:
+ ctx->restrictions.sqe_flags_allowed = res[i].sqe_flags;
+ break;
+ case IORING_RESTRICTION_SQE_FLAGS_REQUIRED:
+ ctx->restrictions.sqe_flags_required = res[i].sqe_flags;
+ break;
+ default:
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+out:
+ /* Reset all restrictions if an error happened */
+ if (ret != 0)
+ memset(&ctx->restrictions, 0, sizeof(ctx->restrictions));
+ else
+ ctx->restrictions.registered = true;
+
+ kfree(res);
+ return ret;
+}
+
+static int io_register_enable_rings(struct io_ring_ctx *ctx)
+{
+ if (!(ctx->flags & IORING_SETUP_R_DISABLED))
+ return -EBADFD;
+
+ if (ctx->restrictions.registered)
+ ctx->restricted = 1;
+
+ ctx->flags &= ~IORING_SETUP_R_DISABLED;
+ if (ctx->sq_data && wq_has_sleeper(&ctx->sq_data->wait))
+ wake_up(&ctx->sq_data->wait);
+ return 0;
+}
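
These two register opcodes pair up for sandboxed rings: the ring is created with
IORING_SETUP_R_DISABLED, the owner registers its restrictions, then enables the
ring. A sketch assuming the uapi struct io_uring_restriction and raw syscalls:

	#include <linux/io_uring.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int restrict_and_enable(int ring_fd)
	{
		struct io_uring_restriction res[2] = {
			/* only allow IORING_OP_READV sqes ... */
			{ .opcode = IORING_RESTRICTION_SQE_OP,
			  .sqe_op = IORING_OP_READV },
			/* ... and only the buffer registration opcode */
			{ .opcode = IORING_RESTRICTION_REGISTER_OP,
			  .register_op = IORING_REGISTER_BUFFERS },
		};

		if (syscall(__NR_io_uring_register, ring_fd,
			    IORING_REGISTER_RESTRICTIONS, res, 2) < 0)
			return -1;
		return syscall(__NR_io_uring_register, ring_fd,
			       IORING_REGISTER_ENABLE_RINGS, NULL, 0);
	}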
+
+static int __io_register_rsrc_update(struct io_ring_ctx *ctx, unsigned type,
+ struct io_uring_rsrc_update2 *up,
+ unsigned nr_args)
+{
+ __u32 tmp;
+ int err;
+
+ if (check_add_overflow(up->offset, nr_args, &tmp))
+ return -EOVERFLOW;
+ err = io_rsrc_node_switch_start(ctx);
+ if (err)
+ return err;
+
+ switch (type) {
+ case IORING_RSRC_FILE:
+ return __io_sqe_files_update(ctx, up, nr_args);
+ case IORING_RSRC_BUFFER:
+ return __io_sqe_buffers_update(ctx, up, nr_args);
+ }
+ return -EINVAL;
+}
+
+static int io_register_files_update(struct io_ring_ctx *ctx, void __user *arg,
+ unsigned nr_args)
+{
+ struct io_uring_rsrc_update2 up;
+
+ if (!nr_args)
+ return -EINVAL;
+ memset(&up, 0, sizeof(up));
+ if (copy_from_user(&up, arg, sizeof(struct io_uring_rsrc_update)))
+ return -EFAULT;
+ if (up.resv || up.resv2)
+ return -EINVAL;
+ return __io_register_rsrc_update(ctx, IORING_RSRC_FILE, &up, nr_args);
+}
+
+static int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg,
+ unsigned size, unsigned type)
+{
+ struct io_uring_rsrc_update2 up;
+
+ if (size != sizeof(up))
+ return -EINVAL;
+ if (copy_from_user(&up, arg, sizeof(up)))
+ return -EFAULT;
+ if (!up.nr || up.resv || up.resv2)
+ return -EINVAL;
+ return __io_register_rsrc_update(ctx, type, &up, up.nr);
+}
+
+static __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
+ unsigned int size, unsigned int type)
+{
+ struct io_uring_rsrc_register rr;
+
+ /* keep it extendible */
+ if (size != sizeof(rr))
+ return -EINVAL;
+
+ memset(&rr, 0, sizeof(rr));
+ if (copy_from_user(&rr, arg, size))
+ return -EFAULT;
+ if (!rr.nr || rr.resv || rr.resv2)
+ return -EINVAL;
+
+ switch (type) {
+ case IORING_RSRC_FILE:
+ return io_sqe_files_register(ctx, u64_to_user_ptr(rr.data),
+ rr.nr, u64_to_user_ptr(rr.tags));
+ case IORING_RSRC_BUFFER:
+ return io_sqe_buffers_register(ctx, u64_to_user_ptr(rr.data),
+ rr.nr, u64_to_user_ptr(rr.tags));
+ }
+ return -EINVAL;
+}
+
+static __cold int io_register_iowq_aff(struct io_ring_ctx *ctx,
+ void __user *arg, unsigned len)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ cpumask_var_t new_mask;
+ int ret;
+
+ if (!tctx || !tctx->io_wq)
+ return -EINVAL;
+
+ if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ cpumask_clear(new_mask);
+ if (len > cpumask_size())
+ len = cpumask_size();
+
+ if (in_compat_syscall()) {
+ ret = compat_get_bitmap(cpumask_bits(new_mask),
+ (const compat_ulong_t __user *)arg,
+ len * 8 /* CHAR_BIT */);
+ } else {
+ ret = copy_from_user(new_mask, arg, len);
+ }
+
+ if (ret) {
+ free_cpumask_var(new_mask);
+ return -EFAULT;
+ }
+
+ ret = io_wq_cpu_affinity(tctx->io_wq, new_mask);
+ free_cpumask_var(new_mask);
+ return ret;
+}
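
Userspace passes an ordinary CPU bitmap here and nr_args is the mask length in
bytes. A sketch using cpu_set_t, assuming raw syscalls:

	#define _GNU_SOURCE
	#include <linux/io_uring.h>
	#include <sched.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int pin_iowq_to_cpu(int ring_fd, int cpu)
	{
		cpu_set_t mask;

		CPU_ZERO(&mask);
		CPU_SET(cpu, &mask);
		/* nr_args is the size of the mask in bytes */
		return syscall(__NR_io_uring_register, ring_fd,
			       IORING_REGISTER_IOWQ_AFF, &mask, sizeof(mask));
	}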
+
+static __cold int io_unregister_iowq_aff(struct io_ring_ctx *ctx)
+{
+ struct io_uring_task *tctx = current->io_uring;
+
+ if (!tctx || !tctx->io_wq)
+ return -EINVAL;
+
+ return io_wq_cpu_affinity(tctx->io_wq, NULL);
+}
+
+static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
+ void __user *arg)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_tctx_node *node;
+ struct io_uring_task *tctx = NULL;
+ struct io_sq_data *sqd = NULL;
+ __u32 new_count[2];
+ int i, ret;
+
+ if (copy_from_user(new_count, arg, sizeof(new_count)))
+ return -EFAULT;
+ for (i = 0; i < ARRAY_SIZE(new_count); i++)
+ if (new_count[i] > INT_MAX)
+ return -EINVAL;
+
+ if (ctx->flags & IORING_SETUP_SQPOLL) {
+ sqd = ctx->sq_data;
+ if (sqd) {
+ /*
+ * Observe the correct sqd->lock -> ctx->uring_lock
+ * ordering. Fine to drop uring_lock here, we hold
+ * a ref to the ctx.
+ */
+ refcount_inc(&sqd->refs);
+ mutex_unlock(&ctx->uring_lock);
+ mutex_lock(&sqd->lock);
+ mutex_lock(&ctx->uring_lock);
+ if (sqd->thread)
+ tctx = sqd->thread->io_uring;
+ }
+ } else {
+ tctx = current->io_uring;
+ }
+
+ BUILD_BUG_ON(sizeof(new_count) != sizeof(ctx->iowq_limits));
+
+ for (i = 0; i < ARRAY_SIZE(new_count); i++)
+ if (new_count[i])
+ ctx->iowq_limits[i] = new_count[i];
+ ctx->iowq_limits_set = true;
+
+ if (tctx && tctx->io_wq) {
+ ret = io_wq_max_workers(tctx->io_wq, new_count);
+ if (ret)
+ goto err;
+ } else {
+ memset(new_count, 0, sizeof(new_count));
+ }
+
+ if (sqd) {
+ mutex_unlock(&sqd->lock);
+ io_put_sq_data(sqd);
+ }
+
+ if (copy_to_user(arg, new_count, sizeof(new_count)))
+ return -EFAULT;
+
+ /* that's it for SQPOLL, only the SQPOLL task creates requests */
+ if (sqd)
+ return 0;
+
+ /* now propagate the restriction to all registered users */
+ list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
+ struct io_uring_task *tctx = node->task->io_uring;
+
+ if (WARN_ON_ONCE(!tctx->io_wq))
+ continue;
+
+ for (i = 0; i < ARRAY_SIZE(new_count); i++)
+ new_count[i] = ctx->iowq_limits[i];
+ /* ignore errors, it always returns zero anyway */
+ (void)io_wq_max_workers(tctx->io_wq, new_count);
+ }
+ return 0;
+err:
+ if (sqd) {
+ mutex_unlock(&sqd->lock);
+ io_put_sq_data(sqd);
+ }
+ return ret;
+}
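
From userspace this takes exactly two __u32 values, bounded workers first and
unbounded second; a zero entry leaves that limit untouched, and the previous
limits are copied back on return. A sketch (raw syscall, error handling trimmed):

	#include <linux/io_uring.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int cap_iowq_workers(int ring_fd, unsigned int bounded,
				    unsigned int unbounded)
	{
		__u32 counts[2] = { bounded, unbounded };

		/* nr_args must be 2; on return counts[] holds the old limits */
		return syscall(__NR_io_uring_register, ring_fd,
			       IORING_REGISTER_IOWQ_MAX_WORKERS, counts, 2);
	}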
+
+static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
+ void __user *arg, unsigned nr_args)
+ __releases(ctx->uring_lock)
+ __acquires(ctx->uring_lock)
+{
+ int ret;
+
+ /*
+ * We're inside the ring mutex, if the ref is already dying, then
+ * someone else killed the ctx or is already going through
+ * io_uring_register().
+ */
+ if (percpu_ref_is_dying(&ctx->refs))
+ return -ENXIO;
+
+ if (ctx->restricted) {
+ if (opcode >= IORING_REGISTER_LAST)
+ return -EINVAL;
+ opcode = array_index_nospec(opcode, IORING_REGISTER_LAST);
+ if (!test_bit(opcode, ctx->restrictions.register_op))
+ return -EACCES;
+ }
+
+ switch (opcode) {
+ case IORING_REGISTER_BUFFERS:
+ ret = io_sqe_buffers_register(ctx, arg, nr_args, NULL);
+ break;
+ case IORING_UNREGISTER_BUFFERS:
+ ret = -EINVAL;
+ if (arg || nr_args)
+ break;
+ ret = io_sqe_buffers_unregister(ctx);
+ break;
+ case IORING_REGISTER_FILES:
+ ret = io_sqe_files_register(ctx, arg, nr_args, NULL);
+ break;
+ case IORING_UNREGISTER_FILES:
+ ret = -EINVAL;
+ if (arg || nr_args)
+ break;
+ ret = io_sqe_files_unregister(ctx);
+ break;
+ case IORING_REGISTER_FILES_UPDATE:
+ ret = io_register_files_update(ctx, arg, nr_args);
+ break;
+ case IORING_REGISTER_EVENTFD:
+ ret = -EINVAL;
+ if (nr_args != 1)
+ break;
+ ret = io_eventfd_register(ctx, arg, 0);
+ break;
+ case IORING_REGISTER_EVENTFD_ASYNC:
+ ret = -EINVAL;
+ if (nr_args != 1)
+ break;
+ ret = io_eventfd_register(ctx, arg, 1);
+ break;
+ case IORING_UNREGISTER_EVENTFD:
+ ret = -EINVAL;
+ if (arg || nr_args)
+ break;
+ ret = io_eventfd_unregister(ctx);
+ break;
+ case IORING_REGISTER_PROBE:
+ ret = -EINVAL;
+ if (!arg || nr_args > 256)
+ break;
+ ret = io_probe(ctx, arg, nr_args);
+ break;
+ case IORING_REGISTER_PERSONALITY:
+ ret = -EINVAL;
+ if (arg || nr_args)
+ break;
+ ret = io_register_personality(ctx);
+ break;
+ case IORING_UNREGISTER_PERSONALITY:
+ ret = -EINVAL;
+ if (arg)
+ break;
+ ret = io_unregister_personality(ctx, nr_args);
+ break;
+ case IORING_REGISTER_ENABLE_RINGS:
+ ret = -EINVAL;
+ if (arg || nr_args)
+ break;
+ ret = io_register_enable_rings(ctx);
+ break;
+ case IORING_REGISTER_RESTRICTIONS:
+ ret = io_register_restrictions(ctx, arg, nr_args);
+ break;
+ case IORING_REGISTER_FILES2:
+ ret = io_register_rsrc(ctx, arg, nr_args, IORING_RSRC_FILE);
+ break;
+ case IORING_REGISTER_FILES_UPDATE2:
+ ret = io_register_rsrc_update(ctx, arg, nr_args,
+ IORING_RSRC_FILE);
+ break;
+ case IORING_REGISTER_BUFFERS2:
+ ret = io_register_rsrc(ctx, arg, nr_args, IORING_RSRC_BUFFER);
+ break;
+ case IORING_REGISTER_BUFFERS_UPDATE:
+ ret = io_register_rsrc_update(ctx, arg, nr_args,
+ IORING_RSRC_BUFFER);
+ break;
+ case IORING_REGISTER_IOWQ_AFF:
+ ret = -EINVAL;
+ if (!arg || !nr_args)
+ break;
+ ret = io_register_iowq_aff(ctx, arg, nr_args);
+ break;
+ case IORING_UNREGISTER_IOWQ_AFF:
+ ret = -EINVAL;
+ if (arg || nr_args)
+ break;
+ ret = io_unregister_iowq_aff(ctx);
+ break;
+ case IORING_REGISTER_IOWQ_MAX_WORKERS:
+ ret = -EINVAL;
+ if (!arg || nr_args != 2)
+ break;
+ ret = io_register_iowq_max_workers(ctx, arg);
+ break;
+ case IORING_REGISTER_RING_FDS:
+ ret = io_ringfd_register(ctx, arg, nr_args);
+ break;
+ case IORING_UNREGISTER_RING_FDS:
+ ret = io_ringfd_unregister(ctx, arg, nr_args);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
+ void __user *, arg, unsigned int, nr_args)
+{
+ struct io_ring_ctx *ctx;
+ long ret = -EBADF;
+ struct fd f;
+
+ f = fdget(fd);
+ if (!f.file)
+ return -EBADF;
+
+ ret = -EOPNOTSUPP;
+ if (f.file->f_op != &io_uring_fops)
+ goto out_fput;
+
+ ctx = f.file->private_data;
+
+ io_run_task_work();
+
+ mutex_lock(&ctx->uring_lock);
+ ret = __io_uring_register(ctx, opcode, arg, nr_args);
+ mutex_unlock(&ctx->uring_lock);
+ trace_io_uring_register(ctx, opcode, ctx->nr_user_files, ctx->nr_user_bufs, ret);
+out_fput:
+ fdput(f);
+ return ret;
+}
+
+static int __init io_uring_init(void)
+{
+#define __BUILD_BUG_VERIFY_ELEMENT(stype, eoffset, etype, ename) do { \
+ BUILD_BUG_ON(offsetof(stype, ename) != eoffset); \
+ BUILD_BUG_ON(sizeof(etype) != sizeof_field(stype, ename)); \
+} while (0)
+
+#define BUILD_BUG_SQE_ELEM(eoffset, etype, ename) \
+ __BUILD_BUG_VERIFY_ELEMENT(struct io_uring_sqe, eoffset, etype, ename)
+ BUILD_BUG_ON(sizeof(struct io_uring_sqe) != 64);
+ BUILD_BUG_SQE_ELEM(0, __u8, opcode);
+ BUILD_BUG_SQE_ELEM(1, __u8, flags);
+ BUILD_BUG_SQE_ELEM(2, __u16, ioprio);
+ BUILD_BUG_SQE_ELEM(4, __s32, fd);
+ BUILD_BUG_SQE_ELEM(8, __u64, off);
+ BUILD_BUG_SQE_ELEM(8, __u64, addr2);
+ BUILD_BUG_SQE_ELEM(16, __u64, addr);
+ BUILD_BUG_SQE_ELEM(16, __u64, splice_off_in);
+ BUILD_BUG_SQE_ELEM(24, __u32, len);
+ BUILD_BUG_SQE_ELEM(28, __kernel_rwf_t, rw_flags);
+ BUILD_BUG_SQE_ELEM(28, /* compat */ int, rw_flags);
+ BUILD_BUG_SQE_ELEM(28, /* compat */ __u32, rw_flags);
+ BUILD_BUG_SQE_ELEM(28, __u32, fsync_flags);
+ BUILD_BUG_SQE_ELEM(28, /* compat */ __u16, poll_events);
+ BUILD_BUG_SQE_ELEM(28, __u32, poll32_events);
+ BUILD_BUG_SQE_ELEM(28, __u32, sync_range_flags);
+ BUILD_BUG_SQE_ELEM(28, __u32, msg_flags);
+ BUILD_BUG_SQE_ELEM(28, __u32, timeout_flags);
+ BUILD_BUG_SQE_ELEM(28, __u32, accept_flags);
+ BUILD_BUG_SQE_ELEM(28, __u32, cancel_flags);
+ BUILD_BUG_SQE_ELEM(28, __u32, open_flags);
+ BUILD_BUG_SQE_ELEM(28, __u32, statx_flags);
+ BUILD_BUG_SQE_ELEM(28, __u32, fadvise_advice);
+ BUILD_BUG_SQE_ELEM(28, __u32, splice_flags);
+ BUILD_BUG_SQE_ELEM(32, __u64, user_data);
+ BUILD_BUG_SQE_ELEM(40, __u16, buf_index);
+ BUILD_BUG_SQE_ELEM(40, __u16, buf_group);
+ BUILD_BUG_SQE_ELEM(42, __u16, personality);
+ BUILD_BUG_SQE_ELEM(44, __s32, splice_fd_in);
+ BUILD_BUG_SQE_ELEM(44, __u32, file_index);
+
+ BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
+ sizeof(struct io_uring_rsrc_update));
+ BUILD_BUG_ON(sizeof(struct io_uring_rsrc_update) >
+ sizeof(struct io_uring_rsrc_update2));
+
+ /* ->buf_index is u16 */
+ BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
+
+ /* should fit into one byte */
+ BUILD_BUG_ON(SQE_VALID_FLAGS >= (1 << 8));
+ BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
+ BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
+
+ BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST);
+ BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+
+ req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
+ SLAB_ACCOUNT);
+ return 0;
+};
+__initcall(io_uring_init);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7f145aefbff8..d015fce67865 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -155,6 +155,11 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
return &array->map;
}

+static void *array_map_elem_ptr(struct bpf_array* array, u32 index)
+{
+ return array->value + (u64)array->elem_size * index;
+}
+
/* Called from syscall or from eBPF program */
static void *array_map_lookup_elem(struct bpf_map *map, void *key)
{
@@ -164,7 +169,7 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
if (unlikely(index >= array->map.max_entries))
return NULL;

- return array->value + array->elem_size * (index & array->index_mask);
+ return array->value + (u64)array->elem_size * (index & array->index_mask);
}

static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm,
@@ -287,10 +292,12 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
return 0;
}

-static void check_and_free_timer_in_array(struct bpf_array *arr, void *val)
+static void check_and_free_fields(struct bpf_array *arr, void *val)
{
- if (unlikely(map_value_has_timer(&arr->map)))
+ if (map_value_has_timer(&arr->map))
bpf_timer_cancel_and_free(val + arr->map.timer_off);
+ if (map_value_has_kptrs(&arr->map))
+ bpf_map_free_kptrs(&arr->map, val);
}

/* Called from syscall or from eBPF program */
@@ -322,12 +329,12 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
value, map->value_size);
} else {
val = array->value +
- array->elem_size * (index & array->index_mask);
+ (u64)array->elem_size * (index & array->index_mask);
if (map_flags & BPF_F_LOCK)
copy_map_value_locked(map, val, value, false);
else
copy_map_value(map, val, value);
- check_and_free_timer_in_array(array, val);
+ check_and_free_fields(array, val);
}
return 0;
}
@@ -386,18 +393,25 @@ static void array_map_free_timers(struct bpf_map *map)
struct bpf_array *array = container_of(map, struct bpf_array, map);
int i;

- if (likely(!map_value_has_timer(map)))
+ /* We don't reset or free kptrs when the user refcount drops to zero. */
+ if (!map_value_has_timer(map))
return;

for (i = 0; i < array->map.max_entries; i++)
- bpf_timer_cancel_and_free(array->value + array->elem_size * i +
- map->timer_off);
+ bpf_timer_cancel_and_free(array_map_elem_ptr(array, i) + map->timer_off);
}

/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
static void array_map_free(struct bpf_map *map)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
+ int i;
+
+ if (map_value_has_kptrs(map)) {
+ for (i = 0; i < array->map.max_entries; i++)
+ bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
+ bpf_map_free_kptr_off_tab(map);
+ }

if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
bpf_array_free_percpu(array);
@@ -531,7 +545,7 @@ static void *bpf_array_map_seq_start(struct seq_file *seq, loff_t *pos)
index = info->index & array->index_mask;
if (info->percpu_value_buf)
return array->pptrs[index];
- return array->value + array->elem_size * index;
+ return array_map_elem_ptr(array, index);
}

static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
@@ -550,7 +564,7 @@ static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
index = info->index & array->index_mask;
if (info->percpu_value_buf)
return array->pptrs[index];
- return array->value + array->elem_size * index;
+ return array_map_elem_ptr(array, index);
}

static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
@@ -665,7 +679,7 @@ static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_
if (is_percpu)
val = this_cpu_ptr(array->pptrs[i]);
else
- val = array->value + array->elem_size * i;
+ val = array_map_elem_ptr(array, i);
num_elems++;
key = i;
ret = callback_fn((u64)(long)map, (u64)(long)&key,
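
The new (u64) casts in this file matter because elem_size and the masked index
are both 32-bit, so with large values the multiply wraps before it is added to
array->value. A standalone illustration of the difference:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint32_t elem_size = 1u << 20;	/* 1 MiB per element */
		uint32_t index     = 1u << 13;	/* element 8192 */

		uint64_t wrapped  = elem_size * index;		  /* 32-bit multiply wraps to 0 */
		uint64_t promoted = (uint64_t)elem_size * index;  /* full 2^33 result */

		printf("%llu vs %llu\n", (unsigned long long)wrapped,
		       (unsigned long long)promoted);
		return 0;
	}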
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 21069dbe9138..310b0591d91f 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -32,15 +32,15 @@ struct bpf_struct_ops_map {
const struct bpf_struct_ops *st_ops;
/* protect map_update */
struct mutex lock;
- /* progs has all the bpf_prog that is populated
+ /* links has all the bpf_links that are populated
* to the func ptr of the kernel's struct
* (in kvalue.data).
*/
- struct bpf_prog **progs;
+ struct bpf_link **links;
/* image is a page that has all the trampolines
* that stores the func args before calling the bpf_prog.
* A PAGE_SIZE "image" is enough to store all trampoline for
- * "progs[]".
+ * "links[]".
*/
void *image;
/* uvalue->data stores the kernel struct
@@ -282,9 +282,9 @@ static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map)
u32 i;

for (i = 0; i < btf_type_vlen(t); i++) {
- if (st_map->progs[i]) {
- bpf_prog_put(st_map->progs[i]);
- st_map->progs[i] = NULL;
+ if (st_map->links[i]) {
+ bpf_link_put(st_map->links[i]);
+ st_map->links[i] = NULL;
}
}
}
@@ -315,18 +315,34 @@ static int check_zero_holes(const struct btf_type *t, void *data)
return 0;
}

-int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_progs *tprogs,
- struct bpf_prog *prog,
+static void bpf_struct_ops_link_release(struct bpf_link *link)
+{
+}
+
+static void bpf_struct_ops_link_dealloc(struct bpf_link *link)
+{
+ struct bpf_tramp_link *tlink = container_of(link, struct bpf_tramp_link, link);
+
+ kfree(tlink);
+}
+
+const struct bpf_link_ops bpf_struct_ops_link_lops = {
+ .release = bpf_struct_ops_link_release,
+ .dealloc = bpf_struct_ops_link_dealloc,
+};
+
+int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_link *link,
const struct btf_func_model *model,
void *image, void *image_end)
{
u32 flags;

- tprogs[BPF_TRAMP_FENTRY].progs[0] = prog;
- tprogs[BPF_TRAMP_FENTRY].nr_progs = 1;
+ tlinks[BPF_TRAMP_FENTRY].links[0] = link;
+ tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
return arch_prepare_bpf_trampoline(NULL, image, image_end,
- model, flags, tprogs, NULL);
+ model, flags, tlinks, NULL);
}

static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
@@ -337,7 +353,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
struct bpf_struct_ops_value *uvalue, *kvalue;
const struct btf_member *member;
const struct btf_type *t = st_ops->type;
- struct bpf_tramp_progs *tprogs = NULL;
+ struct bpf_tramp_links *tlinks = NULL;
void *udata, *kdata;
int prog_fd, err = 0;
void *image, *image_end;
@@ -361,8 +377,8 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
if (uvalue->state || refcount_read(&uvalue->refcnt))
return -EINVAL;

- tprogs = kcalloc(BPF_TRAMP_MAX, sizeof(*tprogs), GFP_KERNEL);
- if (!tprogs)
+ tlinks = kcalloc(BPF_TRAMP_MAX, sizeof(*tlinks), GFP_KERNEL);
+ if (!tlinks)
return -ENOMEM;

uvalue = (struct bpf_struct_ops_value *)st_map->uvalue;
@@ -385,6 +401,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
for_each_member(i, t, member) {
const struct btf_type *mtype, *ptype;
struct bpf_prog *prog;
+ struct bpf_tramp_link *link;
u32 moff;

moff = __btf_member_bit_offset(t, member) / 8;
@@ -438,16 +455,26 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
err = PTR_ERR(prog);
goto reset_unlock;
}
- st_map->progs[i] = prog;

if (prog->type != BPF_PROG_TYPE_STRUCT_OPS ||
prog->aux->attach_btf_id != st_ops->type_id ||
prog->expected_attach_type != i) {
+ bpf_prog_put(prog);
err = -EINVAL;
goto reset_unlock;
}

- err = bpf_struct_ops_prepare_trampoline(tprogs, prog,
+ link = kzalloc(sizeof(*link), GFP_USER);
+ if (!link) {
+ bpf_prog_put(prog);
+ err = -ENOMEM;
+ goto reset_unlock;
+ }
+ bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS,
+ &bpf_struct_ops_link_lops, prog);
+ st_map->links[i] = &link->link;
+
+ err = bpf_struct_ops_prepare_trampoline(tlinks, link,
&st_ops->func_models[i],
image, image_end);
if (err < 0)
@@ -490,7 +517,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
memset(uvalue, 0, map->value_size);
memset(kvalue, 0, map->value_size);
unlock:
- kfree(tprogs);
+ kfree(tlinks);
mutex_unlock(&st_map->lock);
return err;
}
@@ -545,9 +572,9 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;

- if (st_map->progs)
+ if (st_map->links)
bpf_struct_ops_map_put_progs(st_map);
- bpf_map_area_free(st_map->progs);
+ bpf_map_area_free(st_map->links);
bpf_jit_free_exec(st_map->image);
bpf_map_area_free(st_map->uvalue);
bpf_map_area_free(st_map);
@@ -596,11 +623,11 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
map = &st_map->map;

st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE);
- st_map->progs =
- bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_prog *),
+ st_map->links =
+ bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_link *),
NUMA_NO_NODE);
st_map->image = bpf_jit_alloc_exec(PAGE_SIZE);
- if (!st_map->uvalue || !st_map->progs || !st_map->image) {
+ if (!st_map->uvalue || !st_map->links || !st_map->image) {
bpf_struct_ops_map_free(map);
return ERR_PTR(-ENOMEM);
}
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index feef799884d1..7a593ecfbeec 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -207,12 +207,18 @@ enum btf_kfunc_hook {

enum {
BTF_KFUNC_SET_MAX_CNT = 32,
+ BTF_DTOR_KFUNC_MAX_CNT = 256,
};

struct btf_kfunc_set_tab {
struct btf_id_set *sets[BTF_KFUNC_HOOK_MAX][BTF_KFUNC_TYPE_MAX];
};

+struct btf_id_dtor_kfunc_tab {
+ u32 cnt;
+ struct btf_id_dtor_kfunc dtors[];
+};
+
struct btf {
void *data;
struct btf_type **types;
@@ -228,6 +234,7 @@ struct btf {
u32 id;
struct rcu_head rcu;
struct btf_kfunc_set_tab *kfunc_set_tab;
+ struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;

/* split BTF support */
struct btf *base_btf;
@@ -1616,8 +1623,19 @@ static void btf_free_kfunc_set_tab(struct btf *btf)
btf->kfunc_set_tab = NULL;
}

+static void btf_free_dtor_kfunc_tab(struct btf *btf)
+{
+ struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
+
+ if (!tab)
+ return;
+ kfree(tab);
+ btf->dtor_kfunc_tab = NULL;
+}
+
static void btf_free(struct btf *btf)
{
+ btf_free_dtor_kfunc_tab(btf);
btf_free_kfunc_set_tab(btf);
kvfree(btf->types);
kvfree(btf->resolved_sizes);
@@ -3163,24 +3181,86 @@ static void btf_struct_log(struct btf_verifier_env *env,
btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
}

+enum btf_field_type {
+ BTF_FIELD_SPIN_LOCK,
+ BTF_FIELD_TIMER,
+ BTF_FIELD_KPTR,
+};
+
+enum {
+ BTF_FIELD_IGNORE = 0,
+ BTF_FIELD_FOUND = 1,
+};
+
+struct btf_field_info {
+ u32 type_id;
+ u32 off;
+ enum bpf_kptr_type type;
+};
+
+static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
+ u32 off, int sz, struct btf_field_info *info)
+{
+ if (!__btf_type_is_struct(t))
+ return BTF_FIELD_IGNORE;
+ if (t->size != sz)
+ return BTF_FIELD_IGNORE;
+ info->off = off;
+ return BTF_FIELD_FOUND;
+}
+
+static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
+ u32 off, int sz, struct btf_field_info *info)
+{
+ enum bpf_kptr_type type;
+ u32 res_id;
+
+ /* For PTR, sz is always == 8 */
+ if (!btf_type_is_ptr(t))
+ return BTF_FIELD_IGNORE;
+ t = btf_type_by_id(btf, t->type);
+
+ if (!btf_type_is_type_tag(t))
+ return BTF_FIELD_IGNORE;
+ /* Reject extra tags */
+ if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
+ return -EINVAL;
+ if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+ type = BPF_KPTR_UNREF;
+ else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off)))
+ type = BPF_KPTR_REF;
+ else
+ return -EINVAL;
+
+ /* Get the base type */
+ t = btf_type_skip_modifiers(btf, t->type, &res_id);
+ /* Only pointer to struct is allowed */
+ if (!__btf_type_is_struct(t))
+ return -EINVAL;
+
+ info->type_id = res_id;
+ info->off = off;
+ info->type = type;
+ return BTF_FIELD_FOUND;
+}
+
static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
- const char *name, int sz, int align)
+ const char *name, int sz, int align,
+ enum btf_field_type field_type,
+ struct btf_field_info *info, int info_cnt)
{
const struct btf_member *member;
- u32 i, off = -ENOENT;
+ struct btf_field_info tmp;
+ int ret, idx = 0;
+ u32 i, off;

for_each_member(i, t, member) {
const struct btf_type *member_type = btf_type_by_id(btf,
member->type);
- if (!__btf_type_is_struct(member_type))
- continue;
- if (member_type->size != sz)
- continue;
- if (strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
+
+ if (name && strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
continue;
- if (off != -ENOENT)
- /* only one such field is allowed */
- return -E2BIG;
+
off = __btf_member_bit_offset(t, member);
if (off % 8)
/* valid C code cannot generate such BTF */
@@ -3188,46 +3268,115 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
off /= 8;
if (off % align)
return -EINVAL;
+
+ switch (field_type) {
+ case BTF_FIELD_SPIN_LOCK:
+ case BTF_FIELD_TIMER:
+ ret = btf_find_struct(btf, member_type, off, sz,
+ idx < info_cnt ? &info[idx] : &tmp);
+ if (ret < 0)
+ return ret;
+ break;
+ case BTF_FIELD_KPTR:
+ ret = btf_find_kptr(btf, member_type, off, sz,
+ idx < info_cnt ? &info[idx] : &tmp);
+ if (ret < 0)
+ return ret;
+ break;
+ default:
+ return -EFAULT;
+ }
+
+ if (ret == BTF_FIELD_IGNORE)
+ continue;
+ if (idx >= info_cnt)
+ return -E2BIG;
+ ++idx;
}
- return off;
+ return idx;
}

static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
- const char *name, int sz, int align)
+ const char *name, int sz, int align,
+ enum btf_field_type field_type,
+ struct btf_field_info *info, int info_cnt)
{
const struct btf_var_secinfo *vsi;
- u32 i, off = -ENOENT;
+ struct btf_field_info tmp;
+ int ret, idx = 0;
+ u32 i, off;

for_each_vsi(i, t, vsi) {
const struct btf_type *var = btf_type_by_id(btf, vsi->type);
const struct btf_type *var_type = btf_type_by_id(btf, var->type);

- if (!__btf_type_is_struct(var_type))
- continue;
- if (var_type->size != sz)
+ off = vsi->offset;
+
+ if (name && strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
continue;
if (vsi->size != sz)
continue;
- if (strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
- continue;
- if (off != -ENOENT)
- /* only one such field is allowed */
- return -E2BIG;
- off = vsi->offset;
if (off % align)
return -EINVAL;
+
+ switch (field_type) {
+ case BTF_FIELD_SPIN_LOCK:
+ case BTF_FIELD_TIMER:
+ ret = btf_find_struct(btf, var_type, off, sz,
+ idx < info_cnt ? &info[idx] : &tmp);
+ if (ret < 0)
+ return ret;
+ break;
+ case BTF_FIELD_KPTR:
+ ret = btf_find_kptr(btf, var_type, off, sz,
+ idx < info_cnt ? &info[idx] : &tmp);
+ if (ret < 0)
+ return ret;
+ break;
+ default:
+ return -EFAULT;
+ }
+
+ if (ret == BTF_FIELD_IGNORE)
+ continue;
+ if (idx >= info_cnt)
+ return -E2BIG;
+ ++idx;
}
- return off;
+ return idx;
}

static int btf_find_field(const struct btf *btf, const struct btf_type *t,
- const char *name, int sz, int align)
+ enum btf_field_type field_type,
+ struct btf_field_info *info, int info_cnt)
{
+ const char *name;
+ int sz, align;
+
+ switch (field_type) {
+ case BTF_FIELD_SPIN_LOCK:
+ name = "bpf_spin_lock";
+ sz = sizeof(struct bpf_spin_lock);
+ align = __alignof__(struct bpf_spin_lock);
+ break;
+ case BTF_FIELD_TIMER:
+ name = "bpf_timer";
+ sz = sizeof(struct bpf_timer);
+ align = __alignof__(struct bpf_timer);
+ break;
+ case BTF_FIELD_KPTR:
+ name = NULL;
+ sz = sizeof(u64);
+ align = 8;
+ break;
+ default:
+ return -EFAULT;
+ }

if (__btf_type_is_struct(t))
- return btf_find_struct_field(btf, t, name, sz, align);
+ return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt);
else if (btf_type_is_datasec(t))
- return btf_find_datasec_var(btf, t, name, sz, align);
+ return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt);
return -EINVAL;
}

@@ -3237,16 +3386,130 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
*/
int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
{
- return btf_find_field(btf, t, "bpf_spin_lock",
- sizeof(struct bpf_spin_lock),
- __alignof__(struct bpf_spin_lock));
+ struct btf_field_info info;
+ int ret;
+
+ ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info, 1);
+ if (ret < 0)
+ return ret;
+ if (!ret)
+ return -ENOENT;
+ return info.off;
}

int btf_find_timer(const struct btf *btf, const struct btf_type *t)
{
- return btf_find_field(btf, t, "bpf_timer",
- sizeof(struct bpf_timer),
- __alignof__(struct bpf_timer));
+ struct btf_field_info info;
+ int ret;
+
+ ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info, 1);
+ if (ret < 0)
+ return ret;
+ if (!ret)
+ return -ENOENT;
+ return info.off;
+}
+
+struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
+ const struct btf_type *t)
+{
+ struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX];
+ struct bpf_map_value_off *tab;
+ struct btf *kernel_btf = NULL;
+ struct module *mod = NULL;
+ int ret, i, nr_off;
+
+ ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
+ if (ret < 0)
+ return ERR_PTR(ret);
+ if (!ret)
+ return NULL;
+
+ nr_off = ret;
+ tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
+ if (!tab)
+ return ERR_PTR(-ENOMEM);
+
+ for (i = 0; i < nr_off; i++) {
+ const struct btf_type *t;
+ s32 id;
+
+ /* Find type in map BTF, and use it to look up the matching type
+ * in vmlinux or module BTFs, by name and kind.
+ */
+ t = btf_type_by_id(btf, info_arr[i].type_id);
+ id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
+ &kernel_btf);
+ if (id < 0) {
+ ret = id;
+ goto end;
+ }
+
+ /* Find and stash the function pointer for the destruction function that
+ * needs to be eventually invoked from the map free path.
+ */
+ if (info_arr[i].type == BPF_KPTR_REF) {
+ const struct btf_type *dtor_func;
+ const char *dtor_func_name;
+ unsigned long addr;
+ s32 dtor_btf_id;
+
+ /* This call also serves as a whitelist of allowed objects that
+ * can be used as a referenced pointer and be stored in a map at
+ * the same time.
+ */
+ dtor_btf_id = btf_find_dtor_kfunc(kernel_btf, id);
+ if (dtor_btf_id < 0) {
+ ret = dtor_btf_id;
+ goto end_btf;
+ }
+
+ dtor_func = btf_type_by_id(kernel_btf, dtor_btf_id);
+ if (!dtor_func) {
+ ret = -ENOENT;
+ goto end_btf;
+ }
+
+ if (btf_is_module(kernel_btf)) {
+ mod = btf_try_get_module(kernel_btf);
+ if (!mod) {
+ ret = -ENXIO;
+ goto end_btf;
+ }
+ }
+
+ /* We already verified dtor_func to be btf_type_is_func
+ * in register_btf_id_dtor_kfuncs.
+ */
+ dtor_func_name = __btf_name_by_offset(kernel_btf, dtor_func->name_off);
+ addr = kallsyms_lookup_name(dtor_func_name);
+ if (!addr) {
+ ret = -EINVAL;
+ goto end_mod;
+ }
+ tab->off[i].kptr.dtor = (void *)addr;
+ }
+
+ tab->off[i].offset = info_arr[i].off;
+ tab->off[i].type = info_arr[i].type;
+ tab->off[i].kptr.btf_id = id;
+ tab->off[i].kptr.btf = kernel_btf;
+ tab->off[i].kptr.module = mod;
+ }
+ tab->nr_off = nr_off;
+ return tab;
+end_mod:
+ module_put(mod);
+end_btf:
+ btf_put(kernel_btf);
+end:
+ while (i--) {
+ btf_put(tab->off[i].kptr.btf);
+ if (tab->off[i].kptr.module)
+ module_put(tab->off[i].kptr.module);
+ }
+ kfree(tab);
+ return ERR_PTR(ret);
}

static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
@@ -5811,6 +6074,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
* verifier sees.
*/
for (i = 0; i < nargs; i++) {
+ enum bpf_arg_type arg_type = ARG_DONTCARE;
u32 regno = i + 1;
struct bpf_reg_state *reg = &regs[regno];

@@ -5831,7 +6095,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
ref_tname = btf_name_by_offset(btf, ref_t->name_off);

- ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
+ if (rel && reg->ref_obj_id)
+ arg_type |= OBJ_RELEASE;
+ ret = check_func_arg_reg_off(env, reg, regno, arg_type);
if (ret < 0)
return ret;

@@ -5862,11 +6128,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
if (reg->type == PTR_TO_BTF_ID) {
reg_btf = reg->btf;
reg_ref_id = reg->btf_id;
- /* Ensure only one argument is referenced
- * PTR_TO_BTF_ID, check_func_arg_reg_off relies
- * on only one referenced register being allowed
- * for kfuncs.
- */
+ /* Ensure only one argument is referenced PTR_TO_BTF_ID */
if (reg->ref_obj_id) {
if (ref_obj_id) {
bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
@@ -6832,6 +7094,138 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
}
EXPORT_SYMBOL_GPL(register_btf_kfunc_id_set);

+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+{
+ struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
+ struct btf_id_dtor_kfunc *dtor;
+
+ if (!tab)
+ return -ENOENT;
+ /* Even though the size of tab->dtors[0] is > sizeof(u32), we only need
+ * to compare the first u32 with btf_id, so we can reuse btf_id_cmp_func.
+ */
+ BUILD_BUG_ON(offsetof(struct btf_id_dtor_kfunc, btf_id) != 0);
+ dtor = bsearch(&btf_id, tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func);
+ if (!dtor)
+ return -ENOENT;
+ return dtor->kfunc_btf_id;
+}
+
+static int btf_check_dtor_kfuncs(struct btf *btf, const struct btf_id_dtor_kfunc *dtors, u32 cnt)
+{
+ const struct btf_type *dtor_func, *dtor_func_proto, *t;
+ const struct btf_param *args;
+ s32 dtor_btf_id;
+ u32 nr_args, i;
+
+ for (i = 0; i < cnt; i++) {
+ dtor_btf_id = dtors[i].kfunc_btf_id;
+
+ dtor_func = btf_type_by_id(btf, dtor_btf_id);
+ if (!dtor_func || !btf_type_is_func(dtor_func))
+ return -EINVAL;
+
+ dtor_func_proto = btf_type_by_id(btf, dtor_func->type);
+ if (!dtor_func_proto || !btf_type_is_func_proto(dtor_func_proto))
+ return -EINVAL;
+
+ /* Make sure the prototype of the destructor kfunc is 'void func(type *)' */
+ t = btf_type_by_id(btf, dtor_func_proto->type);
+ if (!t || !btf_type_is_void(t))
+ return -EINVAL;
+
+ nr_args = btf_type_vlen(dtor_func_proto);
+ if (nr_args != 1)
+ return -EINVAL;
+ args = btf_params(dtor_func_proto);
+ t = btf_type_by_id(btf, args[0].type);
+ /* Allow any pointer type, as the width on targets Linux supports
+ * will be the same for all pointer types (i.e. sizeof(void *))
+ */
+ if (!t || !btf_type_is_ptr(t))
+ return -EINVAL;
+ }
+ return 0;
+}
+
+/* This function must be invoked only from initcalls/module init functions */
+int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
+ struct module *owner)
+{
+ struct btf_id_dtor_kfunc_tab *tab;
+ struct btf *btf;
+ u32 tab_cnt;
+ int ret;
+
+ btf = btf_get_module_btf(owner);
+ if (!btf) {
+ if (!owner && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
+ pr_err("missing vmlinux BTF, cannot register dtor kfuncs\n");
+ return -ENOENT;
+ }
+ if (owner && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)) {
+ pr_err("missing module BTF, cannot register dtor kfuncs\n");
+ return -ENOENT;
+ }
+ return 0;
+ }
+ if (IS_ERR(btf))
+ return PTR_ERR(btf);
+
+ if (add_cnt >= BTF_DTOR_KFUNC_MAX_CNT) {
+ pr_err("cannot register more than %d kfunc destructors\n", BTF_DTOR_KFUNC_MAX_CNT);
+ ret = -E2BIG;
+ goto end;
+ }
+
+ /* Ensure that the prototype of dtor kfuncs being registered is sane */
+ ret = btf_check_dtor_kfuncs(btf, dtors, add_cnt);
+ if (ret < 0)
+ goto end;
+
+ tab = btf->dtor_kfunc_tab;
+ /* Only one call allowed for modules */
+ if (WARN_ON_ONCE(tab && btf_is_module(btf))) {
+ ret = -EINVAL;
+ goto end;
+ }
+
+ tab_cnt = tab ? tab->cnt : 0;
+ if (tab_cnt > U32_MAX - add_cnt) {
+ ret = -EOVERFLOW;
+ goto end;
+ }
+ if (tab_cnt + add_cnt >= BTF_DTOR_KFUNC_MAX_CNT) {
+ pr_err("cannot register more than %d kfunc destructors\n", BTF_DTOR_KFUNC_MAX_CNT);
+ ret = -E2BIG;
+ goto end;
+ }
+
+ tab = krealloc(btf->dtor_kfunc_tab,
+ offsetof(struct btf_id_dtor_kfunc_tab, dtors[tab_cnt + add_cnt]),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!tab) {
+ ret = -ENOMEM;
+ goto end;
+ }
+
+ if (!btf->dtor_kfunc_tab)
+ tab->cnt = 0;
+ btf->dtor_kfunc_tab = tab;
+
+ memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0]));
+ tab->cnt += add_cnt;
+
+ sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL);
+
+ return 0;
+end:
+ btf_free_dtor_kfunc_tab(btf);
+ btf_put(btf);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(register_btf_id_dtor_kfuncs);
+
#define MAX_TYPES_ARE_COMPAT_DEPTH 2

static
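
For reference, a minimal sketch of how a caller is expected to use the new
register_btf_id_dtor_kfuncs() interface above, from an initcall or module init
function. The object type and destructor kfunc names here are hypothetical
placeholders, not taken from this patch:

#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/init.h>
#include <linux/module.h>

BTF_ID_LIST(my_dtor_kfunc_ids)
BTF_ID(struct, my_obj)
BTF_ID(func, my_obj_release)

static int __init my_dtor_kfunc_init(void)
{
	/* Pair each object BTF id with the BTF id of its destructor kfunc;
	 * register_btf_id_dtor_kfuncs() validates the prototypes and keeps
	 * the table sorted by btf_id for later bsearch lookups.
	 */
	const struct btf_id_dtor_kfunc my_dtor_kfuncs[] = {
		{
			.btf_id       = my_dtor_kfunc_ids[0],
			.kfunc_btf_id = my_dtor_kfunc_ids[1],
		},
	};

	return register_btf_id_dtor_kfuncs(my_dtor_kfuncs,
					   ARRAY_SIZE(my_dtor_kfuncs),
					   THIS_MODULE);
}
late_initcall(my_dtor_kfunc_init);
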
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 0cb6211fcb58..e7af3e582b4c 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -747,6 +747,60 @@ static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
return ERR_PTR(-ENOENT);
}

+/**
+ * purge_effective_progs() - After compute_effective_progs fails to alloc new
+ * cgrp->bpf.inactive table we can recover by
+ * recomputing the array in place.
+ *
+ * @cgrp: The cgroup whose descendants to traverse
+ * @prog: A program to detach or NULL
+ * @link: A link to detach or NULL
+ * @atype: Type of detach operation
+ */
+static void purge_effective_progs(struct cgroup *cgrp, struct bpf_prog *prog,
+ struct bpf_cgroup_link *link,
+ enum cgroup_bpf_attach_type atype)
+{
+ struct cgroup_subsys_state *css;
+ struct bpf_prog_array *progs;
+ struct bpf_prog_list *pl;
+ struct list_head *head;
+ struct cgroup *cg;
+ int pos;
+
+ /* recompute effective prog array in place */
+ css_for_each_descendant_pre(css, &cgrp->self) {
+ struct cgroup *desc = container_of(css, struct cgroup, self);
+
+ if (percpu_ref_is_zero(&desc->bpf.refcnt))
+ continue;
+
+ /* find position of link or prog in effective progs array */
+ for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
+ if (pos && !(cg->bpf.flags[atype] & BPF_F_ALLOW_MULTI))
+ continue;
+
+ head = &cg->bpf.progs[atype];
+ list_for_each_entry(pl, head, node) {
+ if (!prog_list_prog(pl))
+ continue;
+ if (pl->prog == prog && pl->link == link)
+ goto found;
+ pos++;
+ }
+ }
+found:
+ BUG_ON(!cg);
+ progs = rcu_dereference_protected(
+ desc->bpf.effective[atype],
+ lockdep_is_held(&cgroup_mutex));
+
+ /* Remove the program from the array */
+ WARN_ONCE(bpf_prog_array_delete_safe_at(progs, pos),
+ "Failed to purge a prog from array at index %d", pos);
+ }
+}
+
/**
* __cgroup_bpf_detach() - Detach the program or link from a cgroup, and
* propagate the change to descendants
@@ -766,7 +820,6 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
struct bpf_prog_list *pl;
struct list_head *progs;
u32 flags;
- int err;

atype = to_cgroup_bpf_attach_type(type);
if (atype < 0)
@@ -788,9 +841,12 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
pl->prog = NULL;
pl->link = NULL;

- err = update_effective_progs(cgrp, atype);
- if (err)
- goto cleanup;
+ if (update_effective_progs(cgrp, atype)) {
+ /* if updating the effective array failed, replace the prog with a dummy prog */
+ pl->prog = old_prog;
+ pl->link = link;
+ purge_effective_progs(cgrp, old_prog, link, atype);
+ }

/* now can actually delete it from this cgroup list */
list_del(&pl->node);
@@ -802,12 +858,6 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
bpf_prog_put(old_prog);
static_branch_dec(&cgroup_bpf_enabled_key[atype]);
return 0;
-
-cleanup:
- /* restore back prog or link */
- pl->prog = old_prog;
- pl->link = link;
- return err;
}

static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3adff3831c04..483bee45ead5 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -649,12 +649,6 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
return fp->jited && !bpf_prog_was_classic(fp);
}

-static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
-{
- return list_empty(&fp->aux->ksym.lnode) ||
- fp->aux->ksym.lnode.prev == LIST_POISON2;
-}
-
void bpf_prog_kallsyms_add(struct bpf_prog *fp)
{
if (!bpf_prog_kallsyms_candidate(fp) ||
@@ -1149,7 +1143,6 @@ int bpf_jit_binary_pack_finalize(struct bpf_prog *prog,
bpf_prog_pack_free(ro_header);
return PTR_ERR(ptr);
}
- prog->aux->use_bpf_prog_pack = true;
return 0;
}

@@ -1173,17 +1166,23 @@ void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header,
bpf_jit_uncharge_modmem(size);
}

+struct bpf_binary_header *
+bpf_jit_binary_pack_hdr(const struct bpf_prog *fp)
+{
+ unsigned long real_start = (unsigned long)fp->bpf_func;
+ unsigned long addr;
+
+ addr = real_start & BPF_PROG_CHUNK_MASK;
+ return (void *)addr;
+}
+
static inline struct bpf_binary_header *
bpf_jit_binary_hdr(const struct bpf_prog *fp)
{
unsigned long real_start = (unsigned long)fp->bpf_func;
unsigned long addr;

- if (fp->aux->use_bpf_prog_pack)
- addr = real_start & BPF_PROG_CHUNK_MASK;
- else
- addr = real_start & PAGE_MASK;
-
+ addr = real_start & PAGE_MASK;
return (void *)addr;
}

@@ -1196,11 +1195,7 @@ void __weak bpf_jit_free(struct bpf_prog *fp)
if (fp->jited) {
struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);

- if (fp->aux->use_bpf_prog_pack)
- bpf_jit_binary_pack_free(hdr, NULL /* rw_buffer */);
- else
- bpf_jit_binary_free(hdr);
-
+ bpf_jit_binary_free(hdr);
WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
}

@@ -2712,6 +2707,12 @@ bool __weak bpf_jit_needs_zext(void)
return false;
}

+/* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
+bool __weak bpf_jit_supports_subprog_tailcalls(void)
+{
+ return false;
+}
+
bool __weak bpf_jit_supports_kfunc_call(void)
{
return false;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 65877967f414..ea99c91f72b6 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -238,7 +238,7 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
u32 num_entries = htab->map.max_entries;
int i;

- if (likely(!map_value_has_timer(&htab->map)))
+ if (!map_value_has_timer(&htab->map))
return;
if (htab_has_extra_elems(htab))
num_entries += num_possible_cpus();
@@ -254,6 +254,25 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
}
}

+static void htab_free_prealloced_kptrs(struct bpf_htab *htab)
+{
+ u32 num_entries = htab->map.max_entries;
+ int i;
+
+ if (!map_value_has_kptrs(&htab->map))
+ return;
+ if (htab_has_extra_elems(htab))
+ num_entries += num_possible_cpus();
+
+ for (i = 0; i < num_entries; i++) {
+ struct htab_elem *elem;
+
+ elem = get_htab_elem(htab, i);
+ bpf_map_free_kptrs(&htab->map, elem->key + round_up(htab->map.key_size, 8));
+ cond_resched();
+ }
+}
+
static void htab_free_elems(struct bpf_htab *htab)
{
int i;
@@ -725,12 +744,15 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
return insn - insn_buf;
}

-static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
+static void check_and_free_fields(struct bpf_htab *htab,
+ struct htab_elem *elem)
{
- if (unlikely(map_value_has_timer(&htab->map)))
- bpf_timer_cancel_and_free(elem->key +
- round_up(htab->map.key_size, 8) +
- htab->map.timer_off);
+ void *map_value = elem->key + round_up(htab->map.key_size, 8);
+
+ if (map_value_has_timer(&htab->map))
+ bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
+ if (map_value_has_kptrs(&htab->map))
+ bpf_map_free_kptrs(&htab->map, map_value);
}

/* It is called from the bpf_lru_list when the LRU needs to delete
@@ -757,7 +779,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
if (l == tgt_l) {
hlist_nulls_del_rcu(&l->hash_node);
- check_and_free_timer(htab, l);
+ check_and_free_fields(htab, l);
break;
}

@@ -829,7 +851,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
{
if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
- check_and_free_timer(htab, l);
+ check_and_free_fields(htab, l);
kfree(l);
}

@@ -857,7 +879,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
htab_put_fd_value(htab, l);

if (htab_is_prealloc(htab)) {
- check_and_free_timer(htab, l);
+ check_and_free_fields(htab, l);
__pcpu_freelist_push(&htab->freelist, &l->fnode);
} else {
atomic_dec(&htab->count);
@@ -1104,7 +1126,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
if (!htab_is_prealloc(htab))
free_htab_elem(htab, l_old);
else
- check_and_free_timer(htab, l_old);
+ check_and_free_fields(htab, l_old);
}
ret = 0;
err:
@@ -1114,7 +1136,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,

static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
{
- check_and_free_timer(htab, elem);
+ check_and_free_fields(htab, elem);
bpf_lru_push_free(&htab->lru, &elem->lru_node);
}

@@ -1419,8 +1441,14 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
struct hlist_nulls_node *n;
struct htab_elem *l;

- hlist_nulls_for_each_entry(l, n, head, hash_node)
- check_and_free_timer(htab, l);
+ hlist_nulls_for_each_entry(l, n, head, hash_node) {
+ /* We don't reset or free kptr on uref dropping to zero,
+ * hence just free the timer.
+ */
+ bpf_timer_cancel_and_free(l->key +
+ round_up(htab->map.key_size, 8) +
+ htab->map.timer_off);
+ }
cond_resched_rcu();
}
rcu_read_unlock();
@@ -1430,7 +1458,8 @@ static void htab_map_free_timers(struct bpf_map *map)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);

- if (likely(!map_value_has_timer(&htab->map)))
+ /* We don't reset or free kptr on uref dropping to zero. */
+ if (!map_value_has_timer(&htab->map))
return;
if (!htab_is_prealloc(htab))
htab_free_malloced_timers(htab);
@@ -1453,11 +1482,14 @@ static void htab_map_free(struct bpf_map *map)
* not have executed. Wait for them.
*/
rcu_barrier();
- if (!htab_is_prealloc(htab))
+ if (!htab_is_prealloc(htab)) {
delete_all_elements(htab);
- else
+ } else {
+ htab_free_prealloced_kptrs(htab);
prealloc_destroy(htab);
+ }

+ bpf_map_free_kptr_off_tab(map);
free_percpu(htab->extra_elems);
bpf_map_area_free(htab->buckets);
for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 315053ef6a75..3e709fed5306 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1374,6 +1374,28 @@ void bpf_timer_cancel_and_free(void *val)
kfree(t);
}

+BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
+{
+ unsigned long *kptr = map_value;
+
+ return xchg(kptr, (unsigned long)ptr);
+}
+
+/* Unlike other PTR_TO_BTF_ID helpers, the btf_id used by the bpf_kptr_xchg()
+ * helper is determined dynamically by the verifier.
+ */
+#define BPF_PTR_POISON ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA))
+
+const struct bpf_func_proto bpf_kptr_xchg_proto = {
+ .func = bpf_kptr_xchg,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_PTR_TO_KPTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL | OBJ_RELEASE,
+ .arg2_btf_id = BPF_PTR_POISON,
+};
+
const struct bpf_func_proto bpf_get_current_task_proto __weak;
const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1452,6 +1474,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_timer_start_proto;
case BPF_FUNC_timer_cancel:
return &bpf_timer_cancel_proto;
+ case BPF_FUNC_kptr_xchg:
+ return &bpf_kptr_xchg_proto;
default:
break;
}
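
For reference, a rough sketch of the BPF-program side of the bpf_kptr_xchg()
helper wired up above: the map value carries a referenced kptr field declared
via a BTF type tag, and bpf_kptr_xchg() atomically stores a newly acquired
pointer while returning the old one, which must then be released. The acquire
and release kfuncs below are drawn from the kernel's BPF test infrastructure
and are used here only as illustrative assumptions:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))

extern struct prog_test_ref_kfunc *
bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) __ksym;
extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;

struct map_value {
	struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, int);
	__type(value, struct map_value);
	__uint(max_entries, 1);
} kptr_map SEC(".maps");

SEC("tc")
int store_kptr(struct __sk_buff *ctx)
{
	struct prog_test_ref_kfunc *p, *old;
	unsigned long sp = 0;
	struct map_value *v;
	int key = 0;

	v = bpf_map_lookup_elem(&kptr_map, &key);
	if (!v)
		return 0;

	/* Acquire a referenced object from a kfunc. */
	p = bpf_kfunc_call_test_acquire(&sp);
	if (!p)
		return 0;

	/* Store the new pointer in the map, take ownership of the old one. */
	old = bpf_kptr_xchg(&v->ref_ptr, p);
	if (old)
		bpf_kfunc_call_test_release(old);
	return 0;
}

char _license[] SEC("license") = "GPL";
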
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 5cd8f5277279..135205d0d560 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
inner_map_meta->max_entries = inner_map->max_entries;
inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
inner_map_meta->timer_off = inner_map->timer_off;
+ inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
if (inner_map->btf) {
btf_get(inner_map->btf);
inner_map_meta->btf = inner_map->btf;
@@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)

void bpf_map_meta_free(struct bpf_map *map_meta)
{
+ bpf_map_free_kptr_off_tab(map_meta);
btf_put(map_meta->btf);
kfree(map_meta);
}
@@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
meta0->key_size == meta1->key_size &&
meta0->value_size == meta1->value_size &&
meta0->timer_off == meta1->timer_off &&
- meta0->map_flags == meta1->map_flags;
+ meta0->map_flags == meta1->map_flags &&
+ bpf_map_equal_kptr_off_tab(meta0, meta1);
}

void *bpf_map_fd_get_ptr(struct bpf_map *map,
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 710ba9de12ce..5173fd37590f 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -404,7 +404,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
const struct bpf_func_proto bpf_ringbuf_submit_proto = {
.func = bpf_ringbuf_submit,
.ret_type = RET_VOID,
- .arg1_type = ARG_PTR_TO_ALLOC_MEM,
+ .arg1_type = ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
.arg2_type = ARG_ANYTHING,
};

@@ -417,7 +417,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
const struct bpf_func_proto bpf_ringbuf_discard_proto = {
.func = bpf_ringbuf_discard,
.ret_type = RET_VOID,
- .arg1_type = ARG_PTR_TO_ALLOC_MEM,
+ .arg1_type = ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
.arg2_type = ARG_ANYTHING,
};

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cdaa1152436a..f4d1f974a8cd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -6,6 +6,7 @@
#include <linux/bpf_trace.h>
#include <linux/bpf_lirc.h>
#include <linux/bpf_verifier.h>
+#include <linux/bsearch.h>
#include <linux/btf.h>
#include <linux/syscalls.h>
#include <linux/slab.h>
@@ -29,6 +30,7 @@
#include <linux/pgtable.h>
#include <linux/bpf_lsm.h>
#include <linux/poll.h>
+#include <linux/sort.h>
#include <linux/bpf-netns.h>
#include <linux/rcupdate_trace.h>
#include <linux/memcontrol.h>
@@ -473,14 +475,128 @@ static void bpf_map_release_memcg(struct bpf_map *map)
}
#endif

+static int bpf_map_kptr_off_cmp(const void *a, const void *b)
+{
+ const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
+
+ if (off_desc1->offset < off_desc2->offset)
+ return -1;
+ else if (off_desc1->offset > off_desc2->offset)
+ return 1;
+ return 0;
+}
+
+struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
+{
+ /* Since members are iterated in btf_find_field in increasing order,
+ * offsets appended to kptr_off_tab are in increasing order, so we can
+ * do bsearch to find exact match.
+ */
+ struct bpf_map_value_off *tab;
+
+ if (!map_value_has_kptrs(map))
+ return NULL;
+ tab = map->kptr_off_tab;
+ return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
+}
+
+void bpf_map_free_kptr_off_tab(struct bpf_map *map)
+{
+ struct bpf_map_value_off *tab = map->kptr_off_tab;
+ int i;
+
+ if (!map_value_has_kptrs(map))
+ return;
+ for (i = 0; i < tab->nr_off; i++) {
+ if (tab->off[i].kptr.module)
+ module_put(tab->off[i].kptr.module);
+ btf_put(tab->off[i].kptr.btf);
+ }
+ kfree(tab);
+ map->kptr_off_tab = NULL;
+}
+
+struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
+{
+ struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
+ int size, i;
+
+ if (!map_value_has_kptrs(map))
+ return ERR_PTR(-ENOENT);
+ size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
+ new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
+ if (!new_tab)
+ return ERR_PTR(-ENOMEM);
+ /* Do a deep copy of the kptr_off_tab */
+ for (i = 0; i < tab->nr_off; i++) {
+ btf_get(tab->off[i].kptr.btf);
+ if (tab->off[i].kptr.module && !try_module_get(tab->off[i].kptr.module)) {
+ while (i--) {
+ if (tab->off[i].kptr.module)
+ module_put(tab->off[i].kptr.module);
+ btf_put(tab->off[i].kptr.btf);
+ }
+ kfree(new_tab);
+ return ERR_PTR(-ENXIO);
+ }
+ }
+ return new_tab;
+}
+
+bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
+{
+ struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
+ bool a_has_kptr = map_value_has_kptrs(map_a), b_has_kptr = map_value_has_kptrs(map_b);
+ int size;
+
+ if (!a_has_kptr && !b_has_kptr)
+ return true;
+ if (a_has_kptr != b_has_kptr)
+ return false;
+ if (tab_a->nr_off != tab_b->nr_off)
+ return false;
+ size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
+ return !memcmp(tab_a, tab_b, size);
+}
+
+/* Caller must ensure map_value_has_kptrs is true. Note that this function can
+ * be called on a map value while the map_value is visible to BPF programs, as
+ * it ensures the correct synchronization, and we already enforce the same using
+ * the bpf_kptr_xchg helper on the BPF program side for referenced kptrs.
+ */
+void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
+{
+ struct bpf_map_value_off *tab = map->kptr_off_tab;
+ unsigned long *btf_id_ptr;
+ int i;
+
+ for (i = 0; i < tab->nr_off; i++) {
+ struct bpf_map_value_off_desc *off_desc = &tab->off[i];
+ unsigned long old_ptr;
+
+ btf_id_ptr = map_value + off_desc->offset;
+ if (off_desc->type == BPF_KPTR_UNREF) {
+ u64 *p = (u64 *)btf_id_ptr;
+
+ WRITE_ONCE(*p, 0);
+ continue;
+ }
+ old_ptr = xchg(btf_id_ptr, 0);
+ off_desc->kptr.dtor((void *)old_ptr);
+ }
+}
+
/* called from workqueue */
static void bpf_map_free_deferred(struct work_struct *work)
{
struct bpf_map *map = container_of(work, struct bpf_map, work);

security_bpf_map_free(map);
+ kfree(map->off_arr);
bpf_map_release_memcg(map);
- /* implementation dependent freeing */
+ /* implementation dependent freeing, map_free callback also does
+ * bpf_map_free_kptr_off_tab, if needed.
+ */
map->ops->map_free(map);
}

@@ -640,7 +756,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
int err;

if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
- map_value_has_timer(map))
+ map_value_has_timer(map) || map_value_has_kptrs(map))
return -ENOTSUPP;

if (!(vma->vm_flags & VM_SHARED))
@@ -767,6 +883,84 @@ int map_check_no_btf(const struct bpf_map *map,
return -ENOTSUPP;
}

+static int map_off_arr_cmp(const void *_a, const void *_b, const void *priv)
+{
+ const u32 a = *(const u32 *)_a;
+ const u32 b = *(const u32 *)_b;
+
+ if (a < b)
+ return -1;
+ else if (a > b)
+ return 1;
+ return 0;
+}
+
+static void map_off_arr_swap(void *_a, void *_b, int size, const void *priv)
+{
+ struct bpf_map *map = (struct bpf_map *)priv;
+ u32 *off_base = map->off_arr->field_off;
+ u32 *a = _a, *b = _b;
+ u8 *sz_a, *sz_b;
+
+ sz_a = map->off_arr->field_sz + (a - off_base);
+ sz_b = map->off_arr->field_sz + (b - off_base);
+
+ swap(*a, *b);
+ swap(*sz_a, *sz_b);
+}
+
+static int bpf_map_alloc_off_arr(struct bpf_map *map)
+{
+ bool has_spin_lock = map_value_has_spin_lock(map);
+ bool has_timer = map_value_has_timer(map);
+ bool has_kptrs = map_value_has_kptrs(map);
+ struct bpf_map_off_arr *off_arr;
+ u32 i;
+
+ if (!has_spin_lock && !has_timer && !has_kptrs) {
+ map->off_arr = NULL;
+ return 0;
+ }
+
+ off_arr = kmalloc(sizeof(*map->off_arr), GFP_KERNEL | __GFP_NOWARN);
+ if (!off_arr)
+ return -ENOMEM;
+ map->off_arr = off_arr;
+
+ off_arr->cnt = 0;
+ if (has_spin_lock) {
+ i = off_arr->cnt;
+
+ off_arr->field_off[i] = map->spin_lock_off;
+ off_arr->field_sz[i] = sizeof(struct bpf_spin_lock);
+ off_arr->cnt++;
+ }
+ if (has_timer) {
+ i = off_arr->cnt;
+
+ off_arr->field_off[i] = map->timer_off;
+ off_arr->field_sz[i] = sizeof(struct bpf_timer);
+ off_arr->cnt++;
+ }
+ if (has_kptrs) {
+ struct bpf_map_value_off *tab = map->kptr_off_tab;
+ u32 *off = &off_arr->field_off[off_arr->cnt];
+ u8 *sz = &off_arr->field_sz[off_arr->cnt];
+
+ for (i = 0; i < tab->nr_off; i++) {
+ *off++ = tab->off[i].offset;
+ *sz++ = sizeof(u64);
+ }
+ off_arr->cnt += tab->nr_off;
+ }
+
+ if (off_arr->cnt == 1)
+ return 0;
+ sort_r(off_arr->field_off, off_arr->cnt, sizeof(off_arr->field_off[0]),
+ map_off_arr_cmp, map_off_arr_swap, map);
+ return 0;
+}
+
static int map_check_btf(struct bpf_map *map, const struct btf *btf,
u32 btf_key_id, u32 btf_value_id)
{
@@ -820,9 +1014,33 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
return -EOPNOTSUPP;
}

- if (map->ops->map_check_btf)
+ map->kptr_off_tab = btf_parse_kptrs(btf, value_type);
+ if (map_value_has_kptrs(map)) {
+ if (!bpf_capable()) {
+ ret = -EPERM;
+ goto free_map_tab;
+ }
+ if (map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) {
+ ret = -EACCES;
+ goto free_map_tab;
+ }
+ if (map->map_type != BPF_MAP_TYPE_HASH &&
+ map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+ map->map_type != BPF_MAP_TYPE_ARRAY) {
+ ret = -EOPNOTSUPP;
+ goto free_map_tab;
+ }
+ }
+
+ if (map->ops->map_check_btf) {
ret = map->ops->map_check_btf(map, btf, key_type, value_type);
+ if (ret < 0)
+ goto free_map_tab;
+ }

+ return ret;
+free_map_tab:
+ bpf_map_free_kptr_off_tab(map);
return ret;
}

@@ -912,10 +1130,14 @@ static int map_create(union bpf_attr *attr)
attr->btf_vmlinux_value_type_id;
}

- err = security_bpf_map_alloc(map);
+ err = bpf_map_alloc_off_arr(map);
if (err)
goto free_map;

+ err = security_bpf_map_alloc(map);
+ if (err)
+ goto free_map_off_arr;
+
err = bpf_map_alloc_id(map);
if (err)
goto free_map_sec;
@@ -938,6 +1160,8 @@ static int map_create(union bpf_attr *attr)

free_map_sec:
security_bpf_map_free(map);
+free_map_off_arr:
+ kfree(map->off_arr);
free_map:
btf_put(map->btf);
map->ops->map_free(map);
@@ -1639,7 +1863,7 @@ static int map_freeze(const union bpf_attr *attr)
return PTR_ERR(map);

if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
- map_value_has_timer(map)) {
+ map_value_has_timer(map) || map_value_has_kptrs(map)) {
fdput(f);
return -ENOTSUPP;
}
@@ -2640,19 +2864,12 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd)
}
EXPORT_SYMBOL(bpf_link_get_from_fd);

-struct bpf_tracing_link {
- struct bpf_link link;
- enum bpf_attach_type attach_type;
- struct bpf_trampoline *trampoline;
- struct bpf_prog *tgt_prog;
-};
-
static void bpf_tracing_link_release(struct bpf_link *link)
{
struct bpf_tracing_link *tr_link =
- container_of(link, struct bpf_tracing_link, link);
+ container_of(link, struct bpf_tracing_link, link.link);

- WARN_ON_ONCE(bpf_trampoline_unlink_prog(link->prog,
+ WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link,
tr_link->trampoline));

bpf_trampoline_put(tr_link->trampoline);
@@ -2665,7 +2882,7 @@ static void bpf_tracing_link_release(struct bpf_link *link)
static void bpf_tracing_link_dealloc(struct bpf_link *link)
{
struct bpf_tracing_link *tr_link =
- container_of(link, struct bpf_tracing_link, link);
+ container_of(link, struct bpf_tracing_link, link.link);

kfree(tr_link);
}
@@ -2674,7 +2891,7 @@ static void bpf_tracing_link_show_fdinfo(const struct bpf_link *link,
struct seq_file *seq)
{
struct bpf_tracing_link *tr_link =
- container_of(link, struct bpf_tracing_link, link);
+ container_of(link, struct bpf_tracing_link, link.link);

seq_printf(seq,
"attach_type:\t%d\n",
@@ -2685,7 +2902,7 @@ static int bpf_tracing_link_fill_link_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
struct bpf_tracing_link *tr_link =
- container_of(link, struct bpf_tracing_link, link);
+ container_of(link, struct bpf_tracing_link, link.link);

info->tracing.attach_type = tr_link->attach_type;
bpf_trampoline_unpack_key(tr_link->trampoline->key,
@@ -2766,7 +2983,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
err = -ENOMEM;
goto out_put_prog;
}
- bpf_link_init(&link->link, BPF_LINK_TYPE_TRACING,
+ bpf_link_init(&link->link.link, BPF_LINK_TYPE_TRACING,
&bpf_tracing_link_lops, prog);
link->attach_type = prog->expected_attach_type;

@@ -2836,11 +3053,11 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
tgt_prog = prog->aux->dst_prog;
}

- err = bpf_link_prime(&link->link, &link_primer);
+ err = bpf_link_prime(&link->link.link, &link_primer);
if (err)
goto out_unlock;

- err = bpf_trampoline_link_prog(prog, tr);
+ err = bpf_trampoline_link_prog(&link->link, tr);
if (err) {
bpf_link_cleanup(&link_primer);
link = NULL;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 5d8bfb5ef239..e3bcad5a7c68 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -168,30 +168,30 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
return ret;
}

-static struct bpf_tramp_progs *
+static struct bpf_tramp_links *
bpf_trampoline_get_progs(const struct bpf_trampoline *tr, int *total, bool *ip_arg)
{
- const struct bpf_prog_aux *aux;
- struct bpf_tramp_progs *tprogs;
- struct bpf_prog **progs;
+ struct bpf_tramp_link *link;
+ struct bpf_tramp_links *tlinks;
+ struct bpf_tramp_link **links;
int kind;

*total = 0;
- tprogs = kcalloc(BPF_TRAMP_MAX, sizeof(*tprogs), GFP_KERNEL);
- if (!tprogs)
+ tlinks = kcalloc(BPF_TRAMP_MAX, sizeof(*tlinks), GFP_KERNEL);
+ if (!tlinks)
return ERR_PTR(-ENOMEM);

for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
- tprogs[kind].nr_progs = tr->progs_cnt[kind];
+ tlinks[kind].nr_links = tr->progs_cnt[kind];
*total += tr->progs_cnt[kind];
- progs = tprogs[kind].progs;
+ links = tlinks[kind].links;

- hlist_for_each_entry(aux, &tr->progs_hlist[kind], tramp_hlist) {
- *ip_arg |= aux->prog->call_get_func_ip;
- *progs++ = aux->prog;
+ hlist_for_each_entry(link, &tr->progs_hlist[kind], tramp_hlist) {
+ *ip_arg |= link->link.prog->call_get_func_ip;
+ *links++ = link;
}
}
- return tprogs;
+ return tlinks;
}

static void __bpf_tramp_image_put_deferred(struct work_struct *work)
@@ -330,14 +330,14 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key, u32 idx)
static int bpf_trampoline_update(struct bpf_trampoline *tr)
{
struct bpf_tramp_image *im;
- struct bpf_tramp_progs *tprogs;
+ struct bpf_tramp_links *tlinks;
u32 flags = BPF_TRAMP_F_RESTORE_REGS;
bool ip_arg = false;
int err, total;

- tprogs = bpf_trampoline_get_progs(tr, &total, &ip_arg);
- if (IS_ERR(tprogs))
- return PTR_ERR(tprogs);
+ tlinks = bpf_trampoline_get_progs(tr, &total, &ip_arg);
+ if (IS_ERR(tlinks))
+ return PTR_ERR(tlinks);

if (total == 0) {
err = unregister_fentry(tr, tr->cur_image->image);
@@ -353,15 +353,15 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
goto out;
}

- if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||
- tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
+ if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
+ tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;

if (ip_arg)
flags |= BPF_TRAMP_F_IP_ARG;

err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
- &tr->func.model, flags, tprogs,
+ &tr->func.model, flags, tlinks,
tr->func.addr);
if (err < 0)
goto out;
@@ -381,7 +381,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
tr->cur_image = im;
tr->selector++;
out:
- kfree(tprogs);
+ kfree(tlinks);
return err;
}

@@ -407,13 +407,14 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
}
}

-int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
+int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
{
enum bpf_tramp_prog_type kind;
+ struct bpf_tramp_link *link_exiting;
int err = 0;
int cnt = 0, i;

- kind = bpf_attach_type_to_tramp(prog);
+ kind = bpf_attach_type_to_tramp(link->link.prog);
mutex_lock(&tr->mutex);
if (tr->extension_prog) {
/* cannot attach fentry/fexit if extension prog is attached.
@@ -432,25 +433,33 @@ int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
err = -EBUSY;
goto out;
}
- tr->extension_prog = prog;
+ tr->extension_prog = link->link.prog;
err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP, NULL,
- prog->bpf_func);
+ link->link.prog->bpf_func);
goto out;
}
- if (cnt >= BPF_MAX_TRAMP_PROGS) {
+ if (cnt >= BPF_MAX_TRAMP_LINKS) {
err = -E2BIG;
goto out;
}
- if (!hlist_unhashed(&prog->aux->tramp_hlist)) {
+ if (!hlist_unhashed(&link->tramp_hlist)) {
/* prog already linked */
err = -EBUSY;
goto out;
}
- hlist_add_head(&prog->aux->tramp_hlist, &tr->progs_hlist[kind]);
+ hlist_for_each_entry(link_exiting, &tr->progs_hlist[kind], tramp_hlist) {
+ if (link_exiting->link.prog != link->link.prog)
+ continue;
+ /* prog already linked */
+ err = -EBUSY;
+ goto out;
+ }
+
+ hlist_add_head(&link->tramp_hlist, &tr->progs_hlist[kind]);
tr->progs_cnt[kind]++;
err = bpf_trampoline_update(tr);
if (err) {
- hlist_del_init(&prog->aux->tramp_hlist);
+ hlist_del_init(&link->tramp_hlist);
tr->progs_cnt[kind]--;
}
out:
@@ -459,12 +468,12 @@ int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
}

/* bpf_trampoline_unlink_prog() should never fail. */
-int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
+int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
{
enum bpf_tramp_prog_type kind;
int err;

- kind = bpf_attach_type_to_tramp(prog);
+ kind = bpf_attach_type_to_tramp(link->link.prog);
mutex_lock(&tr->mutex);
if (kind == BPF_TRAMP_REPLACE) {
WARN_ON_ONCE(!tr->extension_prog);
@@ -473,7 +482,7 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
tr->extension_prog = NULL;
goto out;
}
- hlist_del_init(&prog->aux->tramp_hlist);
+ hlist_del_init(&link->tramp_hlist);
tr->progs_cnt[kind]--;
err = bpf_trampoline_update(tr);
out:
@@ -641,7 +650,7 @@ void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr)
int __weak
arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_progs *tprogs,
+ struct bpf_tramp_links *tlinks,
void *orig_call)
{
return -ENOTSUPP;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d04147a5efa5..839340d4cc29 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -245,6 +245,7 @@ struct bpf_call_arg_meta {
struct bpf_map *map_ptr;
bool raw_mode;
bool pkt_access;
+ u8 release_regno;
int regno;
int access_size;
int mem_size;
@@ -257,6 +258,7 @@ struct bpf_call_arg_meta {
struct btf *ret_btf;
u32 ret_btf_id;
u32 subprogno;
+ struct bpf_map_value_off_desc *kptr_off_desc;
};

struct btf *btf_vmlinux;
@@ -471,17 +473,6 @@ static bool type_may_be_null(u32 type)
return type & PTR_MAYBE_NULL;
}

-/* Determine whether the function releases some resources allocated by another
- * function call. The first reference type argument will be assumed to be
- * released by release_reference().
- */
-static bool is_release_function(enum bpf_func_id func_id)
-{
- return func_id == BPF_FUNC_sk_release ||
- func_id == BPF_FUNC_ringbuf_submit ||
- func_id == BPF_FUNC_ringbuf_discard;
-}
-
static bool may_be_acquire_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_sk_lookup_tcp ||
@@ -499,7 +490,8 @@ static bool is_acquire_function(enum bpf_func_id func_id,
if (func_id == BPF_FUNC_sk_lookup_tcp ||
func_id == BPF_FUNC_sk_lookup_udp ||
func_id == BPF_FUNC_skc_lookup_tcp ||
- func_id == BPF_FUNC_ringbuf_reserve)
+ func_id == BPF_FUNC_ringbuf_reserve ||
+ func_id == BPF_FUNC_kptr_xchg)
return true;

if (func_id == BPF_FUNC_map_lookup_elem &&
@@ -3210,7 +3202,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
return 0;
}

-enum stack_access_src {
+enum bpf_access_src {
ACCESS_DIRECT = 1, /* the access is performed by an instruction */
ACCESS_HELPER = 2, /* the access is performed by a helper */
};
@@ -3218,7 +3210,7 @@ enum stack_access_src {
static int check_stack_range_initialized(struct bpf_verifier_env *env,
int regno, int off, int access_size,
bool zero_size_allowed,
- enum stack_access_src type,
+ enum bpf_access_src type,
struct bpf_call_arg_meta *meta);

static struct bpf_reg_state *reg_state(struct bpf_verifier_env *env, int regno)
@@ -3468,9 +3460,159 @@ static int check_mem_region_access(struct bpf_verifier_env *env, u32 regno,
return 0;
}

+static int __check_ptr_off_reg(struct bpf_verifier_env *env,
+ const struct bpf_reg_state *reg, int regno,
+ bool fixed_off_ok)
+{
+ /* Access to this pointer-typed register or passing it to a helper
+ * is only allowed in its original, unmodified form.
+ */
+
+ if (reg->off < 0) {
+ verbose(env, "negative offset %s ptr R%d off=%d disallowed\n",
+ reg_type_str(env, reg->type), regno, reg->off);
+ return -EACCES;
+ }
+
+ if (!fixed_off_ok && reg->off) {
+ verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
+ reg_type_str(env, reg->type), regno, reg->off);
+ return -EACCES;
+ }
+
+ if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
+ char tn_buf[48];
+
+ tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+ verbose(env, "variable %s access var_off=%s disallowed\n",
+ reg_type_str(env, reg->type), tn_buf);
+ return -EACCES;
+ }
+
+ return 0;
+}
+
+int check_ptr_off_reg(struct bpf_verifier_env *env,
+ const struct bpf_reg_state *reg, int regno)
+{
+ return __check_ptr_off_reg(env, reg, regno, false);
+}
+
+static int map_kptr_match_type(struct bpf_verifier_env *env,
+ struct bpf_map_value_off_desc *off_desc,
+ struct bpf_reg_state *reg, u32 regno)
+{
+ const char *targ_name = kernel_type_name(off_desc->kptr.btf, off_desc->kptr.btf_id);
+ const char *reg_name = "";
+
+ if (base_type(reg->type) != PTR_TO_BTF_ID || type_flag(reg->type) != PTR_MAYBE_NULL)
+ goto bad_type;
+
+ if (!btf_is_kernel(reg->btf)) {
+ verbose(env, "R%d must point to kernel BTF\n", regno);
+ return -EINVAL;
+ }
+ /* We need to verify reg->type and reg->btf, before accessing reg->btf */
+ reg_name = kernel_type_name(reg->btf, reg->btf_id);
+
+ /* For ref_ptr case, release function check should ensure we get one
+ * referenced PTR_TO_BTF_ID, and that its fixed offset is 0. For the
+ * normal store of unreferenced kptr, we must ensure var_off is zero.
+ * Since ref_ptr cannot be accessed directly by BPF insns, checks for
+ * reg->off and reg->ref_obj_id are not needed here.
+ */
+ if (__check_ptr_off_reg(env, reg, regno, true))
+ return -EACCES;
+
+ /* A full type match is needed, as BTF can be vmlinux or module BTF, and
+ * we also need to take into account the reg->off.
+ *
+ * We want to support cases like:
+ *
+ * struct foo {
+ * struct bar br;
+ * struct baz bz;
+ * };
+ *
+ * struct foo *v;
+ * v = func(); // PTR_TO_BTF_ID
+ * val->foo = v; // reg->off is zero, btf and btf_id match type
+ * val->bar = &v->br; // reg->off is still zero, but we need to retry with
+ * // first member type of struct after comparison fails
+ * val->baz = &v->bz; // reg->off is non-zero, so struct needs to be walked
+ * // to match type
+ *
+ * In the kptr_ref case, check_func_arg_reg_off already ensures reg->off
+ * is zero.
+ */
+ if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+ off_desc->kptr.btf, off_desc->kptr.btf_id))
+ goto bad_type;
+ return 0;
+bad_type:
+ verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
+ reg_type_str(env, reg->type), reg_name);
+ verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+ return -EINVAL;
+}
+
+static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
+ int value_regno, int insn_idx,
+ struct bpf_map_value_off_desc *off_desc)
+{
+ struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
+ int class = BPF_CLASS(insn->code);
+ struct bpf_reg_state *val_reg;
+
+ /* Things we already checked for in check_map_access and caller:
+ * - Reject cases where variable offset may touch kptr
+ * - size of access (must be BPF_DW)
+ * - tnum_is_const(reg->var_off)
+ * - off_desc->offset == off + reg->var_off.value
+ */
+ /* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
+ if (BPF_MODE(insn->code) != BPF_MEM) {
+ verbose(env, "kptr in map can only be accessed using BPF_MEM instruction mode\n");
+ return -EACCES;
+ }
+
+ /* We cannot directly access kptr_ref */
+ if (off_desc->type == BPF_KPTR_REF) {
+ verbose(env, "accessing referenced kptr disallowed\n");
+ return -EACCES;
+ }
+
+ if (class == BPF_LDX) {
+ val_reg = reg_state(env, value_regno);
+ /* We can simply mark the value_regno receiving the pointer
+ * value from map as PTR_TO_BTF_ID, with the correct type.
+ */
+ mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
+ off_desc->kptr.btf_id, PTR_MAYBE_NULL);
+ /* For mark_ptr_or_null_reg */
+ val_reg->id = ++env->id_gen;
+ } else if (class == BPF_STX) {
+ val_reg = reg_state(env, value_regno);
+ if (!register_is_null(val_reg) &&
+ map_kptr_match_type(env, off_desc, val_reg, value_regno))
+ return -EACCES;
+ } else if (class == BPF_ST) {
+ if (insn->imm) {
+ verbose(env, "BPF_ST imm must be 0 when storing to kptr at off=%u\n",
+ off_desc->offset);
+ return -EACCES;
+ }
+ } else {
+ verbose(env, "kptr in map can only be accessed using BPF_LDX/BPF_STX/BPF_ST\n");
+ return -EACCES;
+ }
+ return 0;
+}
+
/* check read/write into a map element with possible variable offset */
static int check_map_access(struct bpf_verifier_env *env, u32 regno,
- int off, int size, bool zero_size_allowed)
+ int off, int size, bool zero_size_allowed,
+ enum bpf_access_src src)
{
struct bpf_verifier_state *vstate = env->cur_state;
struct bpf_func_state *state = vstate->frame[vstate->curframe];
@@ -3506,6 +3648,36 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
return -EACCES;
}
}
+ if (map_value_has_kptrs(map)) {
+ struct bpf_map_value_off *tab = map->kptr_off_tab;
+ int i;
+
+ for (i = 0; i < tab->nr_off; i++) {
+ u32 p = tab->off[i].offset;
+
+ if (reg->smin_value + off < p + sizeof(u64) &&
+ p < reg->umax_value + off + size) {
+ if (src != ACCESS_DIRECT) {
+ verbose(env, "kptr cannot be accessed indirectly by helper\n");
+ return -EACCES;
+ }
+ if (!tnum_is_const(reg->var_off)) {
+ verbose(env, "kptr access cannot have variable offset\n");
+ return -EACCES;
+ }
+ if (p != off + reg->var_off.value) {
+ verbose(env, "kptr access misaligned expected=%u off=%llu\n",
+ p, off + reg->var_off.value);
+ return -EACCES;
+ }
+ if (size != bpf_size_to_bytes(BPF_DW)) {
+ verbose(env, "kptr access size must be BPF_DW\n");
+ return -EACCES;
+ }
+ break;
+ }
+ }
+ }
return err;
}

@@ -3979,44 +4151,6 @@ static int get_callee_stack_depth(struct bpf_verifier_env *env,
}
#endif

-static int __check_ptr_off_reg(struct bpf_verifier_env *env,
- const struct bpf_reg_state *reg, int regno,
- bool fixed_off_ok)
-{
- /* Access to this pointer-typed register or passing it to a helper
- * is only allowed in its original, unmodified form.
- */
-
- if (reg->off < 0) {
- verbose(env, "negative offset %s ptr R%d off=%d disallowed\n",
- reg_type_str(env, reg->type), regno, reg->off);
- return -EACCES;
- }
-
- if (!fixed_off_ok && reg->off) {
- verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
- reg_type_str(env, reg->type), regno, reg->off);
- return -EACCES;
- }
-
- if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
- char tn_buf[48];
-
- tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
- verbose(env, "variable %s access var_off=%s disallowed\n",
- reg_type_str(env, reg->type), tn_buf);
- return -EACCES;
- }
-
- return 0;
-}
-
-int check_ptr_off_reg(struct bpf_verifier_env *env,
- const struct bpf_reg_state *reg, int regno)
-{
- return __check_ptr_off_reg(env, reg, regno, false);
-}
-
static int __check_buffer_access(struct bpf_verifier_env *env,
const char *buf_info,
const struct bpf_reg_state *reg,
@@ -4315,7 +4449,7 @@ static int check_stack_slot_within_bounds(int off,
static int check_stack_access_within_bounds(
struct bpf_verifier_env *env,
int regno, int off, int access_size,
- enum stack_access_src src, enum bpf_access_type type)
+ enum bpf_access_src src, enum bpf_access_type type)
{
struct bpf_reg_state *regs = cur_regs(env);
struct bpf_reg_state *reg = regs + regno;
@@ -4411,6 +4545,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
if (value_regno >= 0)
mark_reg_unknown(env, regs, value_regno);
} else if (reg->type == PTR_TO_MAP_VALUE) {
+ struct bpf_map_value_off_desc *kptr_off_desc = NULL;
+
if (t == BPF_WRITE && value_regno >= 0 &&
is_pointer_value(env, value_regno)) {
verbose(env, "R%d leaks addr into map\n", value_regno);
@@ -4419,8 +4555,16 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
err = check_map_access_type(env, regno, off, size, t);
if (err)
return err;
- err = check_map_access(env, regno, off, size, false);
- if (!err && t == BPF_READ && value_regno >= 0) {
+ err = check_map_access(env, regno, off, size, false, ACCESS_DIRECT);
+ if (err)
+ return err;
+ if (tnum_is_const(reg->var_off))
+ kptr_off_desc = bpf_map_kptr_off_contains(reg->map_ptr,
+ off + reg->var_off.value);
+ if (kptr_off_desc) {
+ err = check_map_kptr_access(env, regno, value_regno, insn_idx,
+ kptr_off_desc);
+ } else if (t == BPF_READ && value_regno >= 0) {
struct bpf_map *map = reg->map_ptr;

/* if map is read-only, track its contents as scalars */
@@ -4723,7 +4867,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
static int check_stack_range_initialized(
struct bpf_verifier_env *env, int regno, int off,
int access_size, bool zero_size_allowed,
- enum stack_access_src type, struct bpf_call_arg_meta *meta)
+ enum bpf_access_src type, struct bpf_call_arg_meta *meta)
{
struct bpf_reg_state *reg = reg_state(env, regno);
struct bpf_func_state *state = func(env, reg);
@@ -4873,7 +5017,7 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
BPF_READ))
return -EACCES;
return check_map_access(env, regno, reg->off, access_size,
- zero_size_allowed);
+ zero_size_allowed, ACCESS_HELPER);
case PTR_TO_MEM:
if (type_is_rdonly_mem(reg->type)) {
if (meta && meta->raw_mode) {
@@ -5162,6 +5306,53 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
return 0;
}

+static int process_kptr_func(struct bpf_verifier_env *env, int regno,
+ struct bpf_call_arg_meta *meta)
+{
+ struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+ struct bpf_map_value_off_desc *off_desc;
+ struct bpf_map *map_ptr = reg->map_ptr;
+ u32 kptr_off;
+ int ret;
+
+ if (!tnum_is_const(reg->var_off)) {
+ verbose(env,
+ "R%d doesn't have constant offset. kptr has to be at the constant offset\n",
+ regno);
+ return -EINVAL;
+ }
+ if (!map_ptr->btf) {
+ verbose(env, "map '%s' has to have BTF in order to use bpf_kptr_xchg\n",
+ map_ptr->name);
+ return -EINVAL;
+ }
+ if (!map_value_has_kptrs(map_ptr)) {
+ ret = PTR_ERR_OR_ZERO(map_ptr->kptr_off_tab);
+ if (ret == -E2BIG)
+ verbose(env, "map '%s' has more than %d kptr\n", map_ptr->name,
+ BPF_MAP_VALUE_OFF_MAX);
+ else if (ret == -EEXIST)
+ verbose(env, "map '%s' has repeating kptr BTF tags\n", map_ptr->name);
+ else
+ verbose(env, "map '%s' has no valid kptr\n", map_ptr->name);
+ return -EINVAL;
+ }
+
+ meta->map_ptr = map_ptr;
+ kptr_off = reg->off + reg->var_off.value;
+ off_desc = bpf_map_kptr_off_contains(map_ptr, kptr_off);
+ if (!off_desc) {
+ verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
+ return -EACCES;
+ }
+ if (off_desc->type != BPF_KPTR_REF) {
+ verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
+ return -EACCES;
+ }
+ meta->kptr_off_desc = off_desc;
+ return 0;
+}
+
static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
{
return base_type(type) == ARG_PTR_TO_MEM ||
@@ -5185,6 +5376,11 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
type == ARG_PTR_TO_LONG;
}

+static bool arg_type_is_release(enum bpf_arg_type type)
+{
+ return type & OBJ_RELEASE;
+}
+
static int int_ptr_type_to_size(enum bpf_arg_type type)
{
if (type == ARG_PTR_TO_INT)
@@ -5297,6 +5493,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
+static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };

static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
[ARG_PTR_TO_MAP_KEY] = &map_key_value_types,
@@ -5324,11 +5521,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
[ARG_PTR_TO_STACK] = &stack_ptr_types,
[ARG_PTR_TO_CONST_STR] = &const_str_ptr_types,
[ARG_PTR_TO_TIMER] = &timer_types,
+ [ARG_PTR_TO_KPTR] = &kptr_types,
};

static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
enum bpf_arg_type arg_type,
- const u32 *arg_btf_id)
+ const u32 *arg_btf_id,
+ struct bpf_call_arg_meta *meta)
{
struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
enum bpf_reg_type expected, type = reg->type;
@@ -5381,8 +5580,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
arg_btf_id = compatible->btf_id;
}

- if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
- btf_vmlinux, *arg_btf_id)) {
+ if (meta->func_id == BPF_FUNC_kptr_xchg) {
+ if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno))
+ return -EACCES;
+ } else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+ btf_vmlinux, *arg_btf_id)) {
verbose(env, "R%d is of type %s but %s is expected\n",
regno, kernel_type_name(reg->btf, reg->btf_id),
kernel_type_name(btf_vmlinux, *arg_btf_id));
@@ -5395,11 +5597,10 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,

int check_func_arg_reg_off(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno,
- enum bpf_arg_type arg_type,
- bool is_release_func)
+ enum bpf_arg_type arg_type)
{
- bool fixed_off_ok = false, release_reg;
enum bpf_reg_type type = reg->type;
+ bool fixed_off_ok = false;

switch ((u32)type) {
case SCALAR_VALUE:
@@ -5417,7 +5618,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
/* Some of the argument types nevertheless require a
* zero register offset.
*/
- if (arg_type != ARG_PTR_TO_ALLOC_MEM)
+ if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM)
return 0;
break;
/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
@@ -5425,19 +5626,17 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
*/
case PTR_TO_BTF_ID:
/* When referenced PTR_TO_BTF_ID is passed to release function,
- * it's fixed offset must be 0. We rely on the property that
- * only one referenced register can be passed to BPF helpers and
- * kfuncs. In the other cases, fixed offset can be non-zero.
+ * its fixed offset must be 0. In the other cases, fixed offset
+ * can be non-zero.
*/
- release_reg = is_release_func && reg->ref_obj_id;
- if (release_reg && reg->off) {
+ if (arg_type_is_release(arg_type) && reg->off) {
verbose(env, "R%d must have zero offset when passed to release func\n",
regno);
return -EINVAL;
}
- /* For release_reg == true, fixed_off_ok must be false, but we
- * already checked and rejected reg->off != 0 above, so set to
- * true to allow fixed offset for all other cases.
+ /* When the arg is a release pointer, fixed_off_ok must be false, but
+ * we already checked and rejected reg->off != 0 above, so set it
+ * to true to allow a fixed offset for all other cases.
*/
fixed_off_ok = true;
break;
@@ -5492,18 +5691,28 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
*/
goto skip_type_check;

- err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg]);
+ err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg], meta);
if (err)
return err;

- err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id));
+ err = check_func_arg_reg_off(env, reg, regno, arg_type);
if (err)
return err;

skip_type_check:
- /* check_func_arg_reg_off relies on only one referenced register being
- * allowed for BPF helpers.
- */
+ if (arg_type_is_release(arg_type)) {
+ if (!reg->ref_obj_id && !register_is_null(reg)) {
+ verbose(env, "R%d must be referenced when passed to release function\n",
+ regno);
+ return -EINVAL;
+ }
+ if (meta->release_regno) {
+ verbose(env, "verifier internal error: more than one release argument\n");
+ return -EFAULT;
+ }
+ meta->release_regno = regno;
+ }
+
if (reg->ref_obj_id) {
if (meta->ref_obj_id) {
verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
@@ -5641,7 +5850,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
}

err = check_map_access(env, regno, reg->off,
- map->value_size - reg->off, false);
+ map->value_size - reg->off, false,
+ ACCESS_HELPER);
if (err)
return err;

@@ -5657,6 +5867,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
verbose(env, "string is not zero-terminated\n");
return -EINVAL;
}
+ } else if (arg_type == ARG_PTR_TO_KPTR) {
+ if (process_kptr_func(env, regno, meta))
+ return -EACCES;
}

return err;
@@ -5696,7 +5909,8 @@ static bool may_update_sockmap(struct bpf_verifier_env *env, int func_id)

static bool allow_tail_call_in_subprogs(struct bpf_verifier_env *env)
{
- return env->prog->jit_requested && IS_ENABLED(CONFIG_X86_64);
+ return env->prog->jit_requested &&
+ bpf_jit_supports_subprog_tailcalls();
}

static int check_map_func_compatibility(struct bpf_verifier_env *env,
@@ -5999,17 +6213,18 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
int i;

for (i = 0; i < ARRAY_SIZE(fn->arg_type); i++) {
- if (fn->arg_type[i] == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
+ if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
return false;

- if (fn->arg_type[i] != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i])
+ if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i])
return false;
}

return true;
}

-static int check_func_proto(const struct bpf_func_proto *fn, int func_id)
+static int check_func_proto(const struct bpf_func_proto *fn, int func_id,
+ struct bpf_call_arg_meta *meta)
{
return check_raw_mode_ok(fn) &&
check_arg_pair_ok(fn) &&
@@ -6691,7 +6906,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
memset(&meta, 0, sizeof(meta));
meta.pkt_access = fn->pkt_access;

- err = check_func_proto(fn, func_id);
+ err = check_func_proto(fn, func_id, &meta);
if (err) {
verbose(env, "kernel subsystem misconfigured func %s#%d\n",
func_id_name(func_id), func_id);
@@ -6724,8 +6939,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return err;
}

- if (is_release_function(func_id)) {
- err = release_reference(env, meta.ref_obj_id);
+ regs = cur_regs(env);
+
+ if (meta.release_regno) {
+ err = -EINVAL;
+ if (meta.ref_obj_id)
+ err = release_reference(env, meta.ref_obj_id);
+ /* meta.ref_obj_id can only be 0 if the register that is meant to be
+ * released is NULL, which must be > R0.
+ */
+ else if (register_is_null(&regs[meta.release_regno]))
+ err = 0;
if (err) {
verbose(env, "func %s#%d reference has not been acquired before\n",
func_id_name(func_id), func_id);
@@ -6733,8 +6957,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
}
}

- regs = cur_regs(env);
-
switch (func_id) {
case BPF_FUNC_tail_call:
err = check_reference_leak(env);
@@ -6858,21 +7080,25 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
regs[BPF_REG_0].btf_id = meta.ret_btf_id;
}
} else if (base_type(ret_type) == RET_PTR_TO_BTF_ID) {
+ struct btf *ret_btf;
int ret_btf_id;

mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
- ret_btf_id = *fn->ret_btf_id;
+ if (func_id == BPF_FUNC_kptr_xchg) {
+ ret_btf = meta.kptr_off_desc->kptr.btf;
+ ret_btf_id = meta.kptr_off_desc->kptr.btf_id;
+ } else {
+ ret_btf = btf_vmlinux;
+ ret_btf_id = *fn->ret_btf_id;
+ }
if (ret_btf_id == 0) {
verbose(env, "invalid return type %u of func %s#%d\n",
base_type(ret_type), func_id_name(func_id),
func_id);
return -EINVAL;
}
- /* current BPF helper definitions are only coming from
- * built-in code with type IDs from vmlinux BTF
- */
- regs[BPF_REG_0].btf = btf_vmlinux;
+ regs[BPF_REG_0].btf = ret_btf;
regs[BPF_REG_0].btf_id = ret_btf_id;
} else {
verbose(env, "unknown return type %u of func %s#%d\n",
@@ -7459,7 +7685,7 @@ static int sanitize_check_bounds(struct bpf_verifier_env *env,
return -EACCES;
break;
case PTR_TO_MAP_VALUE:
- if (check_map_access(env, dst, dst_reg->off, 1, false)) {
+ if (check_map_access(env, dst, dst_reg->off, 1, false, ACCESS_HELPER)) {
verbose(env, "R%d pointer arithmetic of map value goes out of range, "
"prohibited for !root\n", dst);
return -EACCES;
@@ -13015,6 +13241,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
/* Below members will be freed only at prog->aux */
func[i]->aux->btf = prog->aux->btf;
func[i]->aux->func_info = prog->aux->func_info;
+ func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
func[i]->aux->poke_tab = prog->aux->poke_tab;
func[i]->aux->size_poke_tab = prog->aux->size_poke_tab;

@@ -13027,9 +13254,6 @@ static int jit_subprogs(struct bpf_verifier_env *env)
poke->aux = func[i]->aux;
}

- /* Use bpf_prog_F_tag to indicate functions in stack traces.
- * Long term would need debug info to populate names
- */
func[i]->aux->name[0] = 'F';
func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
func[i]->jit_requested = 1;
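
For reference, a minimal sketch of the direct map value accesses that the
check_map_kptr_access() logic above is meant to permit for an *unreferenced*
kptr field (BPF_KPTR_UNREF): only constant-offset, BPF_DW-sized loads and
stores are allowed, and a load is marked PTR_TO_BTF_ID | PTR_MAYBE_NULL. The
map and field names are hypothetical:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#define __kptr __attribute__((btf_type_tag("kptr")))

struct map_value {
	struct task_struct __kptr *unref_task;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);
	__type(value, struct map_value);
	__uint(max_entries, 1);
} unref_map SEC(".maps");

SEC("tp_btf/sched_switch")
int touch_unref_kptr(u64 *ctx)
{
	struct task_struct *t;
	struct map_value *v;
	int key = 0;

	v = bpf_map_lookup_elem(&unref_map, &key);
	if (!v)
		return 0;

	/* BPF_LDX of the kptr slot: t becomes PTR_TO_BTF_ID | PTR_MAYBE_NULL. */
	t = v->unref_task;
	if (t)
		bpf_printk("task pid %d", t->pid);

	/* Storing NULL (or another pointer of the matching type) is allowed. */
	v->unref_task = NULL;
	return 0;
}

char _license[] SEC("license") = "GPL";
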
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 71a418858a5e..58aadfda9b8b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2239,7 +2239,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
goto out_unlock;

cgroup_taskset_for_each(task, css, tset) {
- ret = task_can_attach(task, cs->cpus_allowed);
+ ret = task_can_attach(task, cs->effective_cpus);
if (ret)
goto out_unlock;
ret = security_task_setscheduler(task);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 73a41cec9e38..90e5f5c92fdc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -588,7 +588,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
int index;
phys_addr_t tlb_addr;

- if (!mem)
+ if (!mem || !mem->nslabs)
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");

if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 10929eda9825..fc760d064a65 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -82,6 +82,7 @@ config IRQ_FASTEOI_HIERARCHY_HANDLERS
# Generic IRQ IPI support
config GENERIC_IRQ_IPI
bool
+ depends on SMP
select IRQ_DOMAIN_HIERARCHY

# Generic MSI interrupt support
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 54af0deb239b..eb921485930f 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1513,7 +1513,8 @@ int irq_chip_request_resources_parent(struct irq_data *data)
if (data->chip->irq_request_resources)
return data->chip->irq_request_resources(data);

- return -ENOSYS;
+ /* no error on missing optional irq_chip::irq_request_resources */
+ return 0;
}
EXPORT_SYMBOL_GPL(irq_chip_request_resources_parent);

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index d5ce96510549..481abb885d61 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -910,6 +910,8 @@ struct irq_desc *__irq_resolve_mapping(struct irq_domain *domain,
data = irq_domain_get_irq_data(domain, hwirq);
if (data && data->hwirq == hwirq)
desc = irq_data_to_desc(data);
+ if (irq && desc)
+ *irq = hwirq;
}

return desc;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index bb0fb63f563c..ad005cd184a4 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -62,14 +62,7 @@ int kexec_image_probe_default(struct kimage *image, void *buf,
return ret;
}

-/* Architectures can provide this probe function */
-int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
- unsigned long buf_len)
-{
- return kexec_image_probe_default(image, buf, buf_len);
-}
-
-static void *kexec_image_load_default(struct kimage *image)
+void *kexec_image_load_default(struct kimage *image)
{
if (!image->fops || !image->fops->load)
return ERR_PTR(-ENOEXEC);
@@ -80,11 +73,6 @@ static void *kexec_image_load_default(struct kimage *image)
image->cmdline_buf_len);
}

-void * __weak arch_kexec_kernel_image_load(struct kimage *image)
-{
- return kexec_image_load_default(image);
-}
-
int kexec_image_post_load_cleanup_default(struct kimage *image)
{
if (!image->fops || !image->fops->cleanup)
@@ -93,30 +81,6 @@ int kexec_image_post_load_cleanup_default(struct kimage *image)
return image->fops->cleanup(image->image_loader_data);
}

-int __weak arch_kimage_file_post_load_cleanup(struct kimage *image)
-{
- return kexec_image_post_load_cleanup_default(image);
-}
-
-#ifdef CONFIG_KEXEC_SIG
-static int kexec_image_verify_sig_default(struct kimage *image, void *buf,
- unsigned long buf_len)
-{
- if (!image->fops || !image->fops->verify_sig) {
- pr_debug("kernel loader does not support signature verification.\n");
- return -EKEYREJECTED;
- }
-
- return image->fops->verify_sig(buf, buf_len);
-}
-
-int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
- unsigned long buf_len)
-{
- return kexec_image_verify_sig_default(image, buf, buf_len);
-}
-#endif
-
/*
* Free up memory used by kernel, initrd, and command line. This is temporary
* memory allocation which is not needed any more after these buffers have
@@ -159,13 +123,24 @@ void kimage_file_post_load_cleanup(struct kimage *image)
}

#ifdef CONFIG_KEXEC_SIG
+static int kexec_image_verify_sig(struct kimage *image, void *buf,
+ unsigned long buf_len)
+{
+ if (!image->fops || !image->fops->verify_sig) {
+ pr_debug("kernel loader does not support signature verification.\n");
+ return -EKEYREJECTED;
+ }
+
+ return image->fops->verify_sig(buf, buf_len);
+}
+
static int
kimage_validate_signature(struct kimage *image)
{
int ret;

- ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
- image->kernel_buf_len);
+ ret = kexec_image_verify_sig(image, image->kernel_buf,
+ image->kernel_buf_len);
if (ret) {

if (sig_enforce) {
@@ -621,19 +596,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
return ret == 1 ? 0 : -EADDRNOTAVAIL;
}

-/**
- * arch_kexec_locate_mem_hole - Find free memory to place the segments.
- * @kbuf: Parameters for the memory search.
- *
- * On success, kbuf->mem will have the start address of the memory region found.
- *
- * Return: 0 on success, negative errno on error.
- */
-int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
-{
- return kexec_locate_mem_hole(kbuf);
-}
-
/**
* kexec_add_buffer - place a buffer in a kexec segment
* @kbuf: Buffer contents and memory parameters.
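
The kexec_file.c hunks above remove the __weak default implementations of
several arch_kexec_* hooks. As a reminder of the mechanism being retired,
here is a generic sketch of how a weak default is displaced at link time;
the names below are invented for illustration and are not the kexec symbols:

#include <linux/compiler.h>
#include <linux/errno.h>

/* kernel/foo.c: the weak definition is used only if nothing stronger exists. */
int __weak arch_foo_probe(void *buf, unsigned long len)
{
	return -ENOEXEC;		/* generic "no loader found" default */
}

/* arch/bar/kernel/foo.c: an ordinary (strong) definition of the same symbol
 * silently replaces the weak default when the kernel image is linked.
 */
int arch_foo_probe(void *buf, unsigned long len)
{
	return 0;			/* arch-specific probing would go here */
}
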
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index f214f8c088ed..80697e5e03e4 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1560,7 +1560,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
preempt_disable();

/* Ensure it is not in reserved area nor out of text */
- if (!kernel_text_address((unsigned long) p->addr) ||
+ if (!(core_kernel_text((unsigned long) p->addr) ||
+ is_module_text_address((unsigned long) p->addr)) ||
within_kprobe_blacklist((unsigned long) p->addr) ||
jump_label_text_reserved(p->addr, p->addr) ||
static_call_text_reserved(p->addr, p->addr) ||
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c06cab6546ed..e45ec82b42e6 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5214,9 +5214,10 @@ __lock_set_class(struct lockdep_map *lock, const char *name,
return 0;
}

- lockdep_init_map_waits(lock, name, key, 0,
- lock->wait_type_inner,
- lock->wait_type_outer);
+ lockdep_init_map_type(lock, name, key, 0,
+ lock->wait_type_inner,
+ lock->wait_type_outer,
+ lock->lock_type);
class = register_lock_class(lock, subclass, 0);
hlock->class_idx = class - lock_classes;

diff --git a/kernel/power/user.c b/kernel/power/user.c
index ad241b4ff64c..d43c2aa583b2 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -26,6 +26,7 @@

#include "power.h"

+static bool need_wait;

static struct snapshot_data {
struct snapshot_handle handle;
@@ -78,7 +79,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
* Resuming. We may need to wait for the image device to
* appear.
*/
- wait_for_device_probe();
+ need_wait = true;

data->swap = -1;
data->mode = O_WRONLY;
@@ -168,6 +169,11 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
ssize_t res;
loff_t pg_offp = *offp & ~PAGE_MASK;

+ if (need_wait) {
+ wait_for_device_probe();
+ need_wait = false;
+ }
+
lock_system_sleep();

data = filp->private_data;
@@ -244,6 +250,11 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
loff_t size;
sector_t offset;

+ if (need_wait) {
+ wait_for_device_probe();
+ need_wait = false;
+ }
+
if (_IOC_TYPE(cmd) != SNAPSHOT_IOC_MAGIC)
return -ENOTTY;
if (_IOC_NR(cmd) > SNAPSHOT_IOC_MAXNR)
diff --git a/kernel/profile.c b/kernel/profile.c
index 37640a0bd8a3..ae82ddfc6a68 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -109,6 +109,13 @@ int __ref profile_init(void)

/* only text is profiled */
prof_len = (_etext - _stext) >> prof_shift;
+
+ if (!prof_len) {
+ pr_warn("profiling shift: %u too large\n", prof_shift);
+ prof_on = 0;
+ return -EINVAL;
+ }
+
buffer_bytes = prof_len*sizeof(atomic_t);

if (!alloc_cpumask_var(&prof_cpu_mask, GFP_KERNEL))
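
The guard added above matters because prof_len = (_etext - _stext) >> prof_shift
silently truncates to zero once the shift exceeds log2 of the profiled text
size. With, say, 16 MiB of kernel text (2^24 bytes, an illustrative figure),
any profile= boot parameter implying a shift of 25 or more leaves prof_len at
zero; the new check now rejects such a shift up front with -EINVAL and disables
profiling instead of setting up a zero-length profile buffer.
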
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 55d049c39608..7c3a7f8c828b 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2030,6 +2030,19 @@ static int rcutorture_booster_init(unsigned int cpu)
if (boost_tasks[cpu] != NULL)
return 0; /* Already created, nothing more to do. */

+ // Testing RCU priority boosting requires rcutorture do
+ // some serious abuse. Counter this by running ksoftirqd
+ // at higher priority.
+ if (IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)) {
+ struct sched_param sp;
+ struct task_struct *t;
+
+ t = per_cpu(ksoftirqd, cpu);
+ WARN_ON_ONCE(!t);
+ sp.sched_priority = 2;
+ sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+ }
+
/* Don't allow time recalculation while creating a new task. */
mutex_lock(&boost_mutex);
rcu_torture_disable_rt_throttle();
@@ -3281,21 +3294,6 @@ rcu_torture_init(void)
rcutor_hp = firsterr;
if (torture_init_error(firsterr))
goto unwind;
-
- // Testing RCU priority boosting requires rcutorture do
- // some serious abuse. Counter this by running ksoftirqd
- // at higher priority.
- if (IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)) {
- for_each_online_cpu(cpu) {
- struct sched_param sp;
- struct task_struct *t;
-
- t = per_cpu(ksoftirqd, cpu);
- WARN_ON_ONCE(!t);
- sp.sched_priority = 2;
- sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
- }
- }
}
shutdown_jiffies = jiffies + shutdown_secs * HZ;
firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd11daa7a84b..9671796a11cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -88,7 +88,7 @@
#include "stats.h"

#include "../workqueue_internal.h"
-#include "../../fs/io-wq.h"
+#include "../../io_uring/io-wq.h"
#include "../smpboot.h"

/*
@@ -3811,7 +3811,7 @@ bool cpus_share_cache(int this_cpu, int that_cpu)
return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
}

-static inline bool ttwu_queue_cond(int cpu, int wake_flags)
+static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
{
/*
* Do not complicate things with the async wake_list while the CPU is
@@ -3820,6 +3820,10 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
if (!cpu_active(cpu))
return false;

+ /* Ensure the task will still be allowed to run on the CPU. */
+ if (!cpumask_test_cpu(cpu, p->cpus_ptr))
+ return false;
+
/*
* If the CPU does not share cache, then queue the task on the
* remote rqs wakelist to avoid accessing remote data.
@@ -3827,13 +3831,21 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
if (!cpus_share_cache(smp_processor_id(), cpu))
return true;

+ if (cpu == smp_processor_id())
+ return false;
+
/*
- * If the task is descheduling and the only running task on the
- * CPU then use the wakelist to offload the task activation to
- * the soon-to-be-idle CPU as the current CPU is likely busy.
- * nr_running is checked to avoid unnecessary task stacking.
+ * If the wakee cpu is idle, or the task is descheduling and the
+ * only running task on the CPU, then use the wakelist to offload
+ * the task activation to the idle (or soon-to-be-idle) CPU as
+ * the current CPU is likely busy. nr_running is checked to
+ * avoid unnecessary task stacking.
+ *
+ * Note that we can only get here with (wakee) p->on_rq=0,
+ * p->on_cpu can be whatever, we've done the dequeue, so
+ * the wakee has been accounted out of ->nr_running.
*/
- if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
+ if (!cpu_rq(cpu)->nr_running)
return true;

return false;
@@ -3841,10 +3853,7 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)

static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
{
- if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
- if (WARN_ON_ONCE(cpu == smp_processor_id()))
- return false;
-
+ if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(p, cpu)) {
sched_clock_cpu(cpu); /* Sync clocks across CPUs */
__ttwu_queue_wakelist(p, cpu, wake_flags);
return true;
@@ -4166,7 +4175,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* scheduling.
*/
if (smp_load_acquire(&p->on_cpu) &&
- ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags))
goto unlock;

/*
@@ -4710,7 +4719,8 @@ static inline void prepare_task(struct task_struct *next)
* Claim the task as running, we do this before switching to it
* such that any running task will have this set.
*
- * See the ttwu() WF_ON_CPU case and its ordering comment.
+ * See the smp_load_acquire(&p->on_cpu) case in ttwu() and
+ * its ordering comment.
*/
WRITE_ONCE(next->on_cpu, 1);
#endif
@@ -6460,8 +6470,12 @@ static inline void sched_submit_work(struct task_struct *tsk)
io_wq_worker_sleeping(tsk);
}

- if (tsk_is_pi_blocked(tsk))
- return;
+ /*
+ * spinlock and rwlock must not flush block requests. This will
+ * deadlock if the callback attempts to acquire a lock which is
+ * already acquired.
+ */
+ SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT);

/*
* If we are going to sleep and we have plugged IO queued,
@@ -8895,7 +8909,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
}

int task_can_attach(struct task_struct *p,
- const struct cpumask *cs_cpus_allowed)
+ const struct cpumask *cs_effective_cpus)
{
int ret = 0;

@@ -8914,9 +8928,11 @@ int task_can_attach(struct task_struct *p,
}

if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
- cs_cpus_allowed)) {
- int cpu = cpumask_any_and(cpu_active_mask, cs_cpus_allowed);
+ cs_effective_cpus)) {
+ int cpu = cpumask_any_and(cpu_active_mask, cs_effective_cpus);

+ if (unlikely(cpu >= nr_cpu_ids))
+ return -EINVAL;
ret = dl_cpu_busy(cpu, p);
}
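
Pieced together, the kernel/sched/core.c hunks above rework how try_to_wake_up()
decides to hand a wakeup to the remote wakelist: the WF_ON_CPU flag is dropped,
the wakee's affinity mask is checked explicitly, and a completely idle same-LLC
target now also qualifies. The reworked predicate reads roughly as below (a
condensed sketch assembled from the hunks, not a verbatim copy of the resulting
function):

static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
{
	/* The CPU must be online and the wakee must be allowed to run there. */
	if (!cpu_active(cpu) || !cpumask_test_cpu(cpu, p->cpus_ptr))
		return false;

	/* Crossing an LLC boundary: always use the remote wakelist. */
	if (!cpus_share_cache(smp_processor_id(), cpu))
		return true;

	/* Never queue an IPI to ourselves. */
	if (cpu == smp_processor_id())
		return false;

	/* Same LLC: only offload if the target has nothing running. */
	return !cpu_rq(cpu)->nr_running;
}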

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cc8daa3dcc8b..46f6674a0979 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6324,6 +6324,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
{
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
int i, cpu, idle_cpu = -1, nr = INT_MAX;
+ struct sched_domain_shared *sd_share;
struct rq *this_rq = this_rq();
int this = smp_processor_id();
struct sched_domain *this_sd;
@@ -6363,6 +6364,17 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
time = cpu_clock(this);
}

+ if (sched_feat(SIS_UTIL)) {
+ sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
+ if (sd_share) {
+ /* because !--nr is the condition to stop scan */
+ nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
+ /* overloaded LLC is unlikely to have idle cpu/core */
+ if (nr == 1)
+ return -1;
+ }
+ }
+
for_each_cpu_wrap(cpu, cpus, target + 1) {
if (has_idle_core) {
i = select_idle_core(p, cpu, cpus, &idle_cpu);
@@ -7616,8 +7628,8 @@ enum group_type {
*/
group_fully_busy,
/*
- * SD_ASYM_CPUCAPACITY only: One task doesn't fit with CPU's capacity
- * and must be migrated to a more powerful CPU.
+ * One task doesn't fit with CPU's capacity and must be migrated to a
+ * more powerful CPU.
*/
group_misfit_task,
/*
@@ -8700,6 +8712,19 @@ sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs
return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
}

+static inline bool
+sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)
+{
+ /*
+ * When there is more than 1 task, the group_overloaded case already
+ * takes care of cpu with reduced capacity
+ */
+ if (rq->cfs.h_nr_running != 1)
+ return false;
+
+ return check_cpu_capacity(rq, sd);
+}
+
/**
* update_sg_lb_stats - Update sched_group's statistics for load balancing.
* @env: The load balancing environment.
@@ -8722,8 +8747,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,

for_each_cpu_and(i, sched_group_span(group), env->cpus) {
struct rq *rq = cpu_rq(i);
+ unsigned long load = cpu_load(rq);

- sgs->group_load += cpu_load(rq);
+ sgs->group_load += load;
sgs->group_util += cpu_util_cfs(i);
sgs->group_runnable += cpu_runnable(rq);
sgs->sum_h_nr_running += rq->cfs.h_nr_running;
@@ -8753,11 +8779,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
if (local_group)
continue;

- /* Check for a misfit task on the cpu */
- if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
- sgs->group_misfit_task_load < rq->misfit_task_load) {
- sgs->group_misfit_task_load = rq->misfit_task_load;
- *sg_status |= SG_OVERLOAD;
+ if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
+ /* Check for a misfit task on the cpu */
+ if (sgs->group_misfit_task_load < rq->misfit_task_load) {
+ sgs->group_misfit_task_load = rq->misfit_task_load;
+ *sg_status |= SG_OVERLOAD;
+ }
+ } else if ((env->idle != CPU_NOT_IDLE) &&
+ sched_reduced_capacity(rq, env->sd)) {
+ /* Check for a task running on a CPU with reduced capacity */
+ if (sgs->group_misfit_task_load < load)
+ sgs->group_misfit_task_load = load;
}
}

@@ -8810,7 +8842,8 @@ static bool update_sd_pick_busiest(struct lb_env *env,
* CPUs in the group should either be possible to resolve
* internally or be covered by avg_load imbalance (eventually).
*/
- if (sgs->group_type == group_misfit_task &&
+ if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
+ (sgs->group_type == group_misfit_task) &&
(!capacity_greater(capacity_of(env->dst_cpu), sg->sgc->max_capacity) ||
sds->local_stat.group_type != group_has_spare))
return false;
@@ -9253,6 +9286,77 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
return idlest;
}

+static void update_idle_cpu_scan(struct lb_env *env,
+ unsigned long sum_util)
+{
+ struct sched_domain_shared *sd_share;
+ int llc_weight, pct;
+ u64 x, y, tmp;
+ /*
+ * Update the number of CPUs to scan in LLC domain, which could
+ * be used as a hint in select_idle_cpu(). The update of sd_share
+ * could be expensive because it is within a shared cache line.
+ * So the write of this hint only occurs during periodic load
+ * balancing, rather than CPU_NEWLY_IDLE, because the latter
+ * can fire way more frequently than the former.
+ */
+ if (!sched_feat(SIS_UTIL) || env->idle == CPU_NEWLY_IDLE)
+ return;
+
+ llc_weight = per_cpu(sd_llc_size, env->dst_cpu);
+ if (env->sd->span_weight != llc_weight)
+ return;
+
+ sd_share = rcu_dereference(per_cpu(sd_llc_shared, env->dst_cpu));
+ if (!sd_share)
+ return;
+
+ /*
+ * The number of CPUs to search drops as sum_util increases, when
+ * sum_util hits 85% or above, the scan stops.
+ * The reason to choose 85% as the threshold is that this is the
+ * imbalance_pct (117) when an LLC sched group is overloaded.
+ *
+ * let y = SCHED_CAPACITY_SCALE - p * x^2 [1]
+ * and y'= y / SCHED_CAPACITY_SCALE
+ *
+ * x is the ratio of sum_util compared to the CPU capacity:
+ * x = sum_util / (llc_weight * SCHED_CAPACITY_SCALE)
+ * y' is the ratio of CPUs to be scanned in the LLC domain,
+ * and the number of CPUs to scan is calculated by:
+ *
+ * nr_scan = llc_weight * y' [2]
+ *
+ * When x hits the threshold of overloaded, AKA, when
+ * x = 100 / pct, y drops to 0. According to [1],
+ * p should be SCHED_CAPACITY_SCALE * pct^2 / 10000
+ *
+ * Scale x by SCHED_CAPACITY_SCALE:
+ * x' = sum_util / llc_weight; [3]
+ *
+ * and finally [1] becomes:
+ * y = SCHED_CAPACITY_SCALE -
+ * x'^2 * pct^2 / (10000 * SCHED_CAPACITY_SCALE) [4]
+ *
+ */
+ /* equation [3] */
+ x = sum_util;
+ do_div(x, llc_weight);
+
+ /* equation [4] */
+ pct = env->sd->imbalance_pct;
+ tmp = x * x * pct * pct;
+ do_div(tmp, 10000 * SCHED_CAPACITY_SCALE);
+ tmp = min_t(long, tmp, SCHED_CAPACITY_SCALE);
+ y = SCHED_CAPACITY_SCALE - tmp;
+
+ /* equation [2] */
+ y *= llc_weight;
+ do_div(y, SCHED_CAPACITY_SCALE);
+ if ((int)y != sd_share->nr_idle_scan)
+ WRITE_ONCE(sd_share->nr_idle_scan, (int)y);
+}
+
/**
* update_sd_lb_stats - Update sched_domain's statistics for load balancing.
* @env: The load balancing environment.
@@ -9265,6 +9369,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
struct sched_group *sg = env->sd->groups;
struct sg_lb_stats *local = &sds->local_stat;
struct sg_lb_stats tmp_sgs;
+ unsigned long sum_util = 0;
int sg_status = 0;

do {
@@ -9297,6 +9402,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
sds->total_load += sgs->group_load;
sds->total_capacity += sgs->group_capacity;

+ sum_util += sgs->group_util;
sg = sg->next;
} while (sg != env->sd->groups);

@@ -9322,6 +9428,8 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
}
+
+ update_idle_cpu_scan(env, sum_util);
}

#define NUMA_IMBALANCE_MIN 2
@@ -9356,9 +9464,18 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
busiest = &sds->busiest_stat;

if (busiest->group_type == group_misfit_task) {
- /* Set imbalance to allow misfit tasks to be balanced. */
- env->migration_type = migrate_misfit;
- env->imbalance = 1;
+ if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
+ /* Set imbalance to allow misfit tasks to be balanced. */
+ env->migration_type = migrate_misfit;
+ env->imbalance = 1;
+ } else {
+ /*
+ * Set load imbalance to allow moving task from cpu
+ * with reduced capacity.
+ */
+ env->migration_type = migrate_load;
+ env->imbalance = busiest->group_misfit_task_load;
+ }
return;
}
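
The equations [1]-[4] in the update_idle_cpu_scan() comment above boil down to
a handful of integer operations. A stand-alone sketch with assumed values (a
16-CPU LLC at about 50% utilization, imbalance_pct = 117, SCHED_CAPACITY_SCALE
= 1024) gives a feel for the resulting scan depth:

#include <stdio.h>

int main(void)
{
	const unsigned long long SCALE = 1024;	/* SCHED_CAPACITY_SCALE */
	unsigned long long llc_weight = 16;	/* CPUs in the LLC domain */
	unsigned long long pct = 117;		/* sd->imbalance_pct */
	unsigned long long sum_util = 16 * 512;	/* ~50% of 16 CPUs */

	unsigned long long x = sum_util / llc_weight;			/* [3] */
	unsigned long long tmp = x * x * pct * pct / (10000 * SCALE);	/* [4] */
	if (tmp > SCALE)
		tmp = SCALE;
	unsigned long long y = SCALE - tmp;
	unsigned long long nr_scan = llc_weight * y / SCALE;		/* [2] */

	/* Prints 10: at half load, select_idle_cpu() examines ~10 of 16 CPUs. */
	printf("nr_scan = %llu\n", nr_scan);
	return 0;
}

At full utilization tmp exceeds SCHED_CAPACITY_SCALE, y clamps to zero and
nr_idle_scan becomes 0, which is exactly the "nr == 1" early return added to
select_idle_cpu() above.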

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 1cf435bbcd9c..ee7f23c76bd3 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -60,7 +60,8 @@ SCHED_FEAT(TTWU_QUEUE, true)
/*
* When doing wakeups, attempt to limit superfluous scans of the LLC domain.
*/
-SCHED_FEAT(SIS_PROP, true)
+SCHED_FEAT(SIS_PROP, false)
+SCHED_FEAT(SIS_UTIL, true)

/*
* Issue a WARN when we do multiple update_rq_clock() calls
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7891c0f0e1ff..907a5d7507a8 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -430,7 +430,7 @@ static inline void rt_queue_push_tasks(struct rq *rq)
#endif /* CONFIG_SMP */

static void enqueue_top_rt_rq(struct rt_rq *rt_rq);
-static void dequeue_top_rt_rq(struct rt_rq *rt_rq);
+static void dequeue_top_rt_rq(struct rt_rq *rt_rq, unsigned int count);

static inline int on_rt_rq(struct sched_rt_entity *rt_se)
{
@@ -551,7 +551,7 @@ static void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
rt_se = rt_rq->tg->rt_se[cpu];

if (!rt_se) {
- dequeue_top_rt_rq(rt_rq);
+ dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running);
/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
cpufreq_update_util(rq_of_rt_rq(rt_rq), 0);
}
@@ -637,7 +637,7 @@ static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)

static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
{
- dequeue_top_rt_rq(rt_rq);
+ dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running);
}

static inline int rt_rq_throttled(struct rt_rq *rt_rq)
@@ -1039,7 +1039,7 @@ static void update_curr_rt(struct rq *rq)
}

static void
-dequeue_top_rt_rq(struct rt_rq *rt_rq)
+dequeue_top_rt_rq(struct rt_rq *rt_rq, unsigned int count)
{
struct rq *rq = rq_of_rt_rq(rt_rq);

@@ -1050,7 +1050,7 @@ dequeue_top_rt_rq(struct rt_rq *rt_rq)

BUG_ON(!rq->nr_running);

- sub_nr_running(rq, rt_rq->rt_nr_running);
+ sub_nr_running(rq, count);
rt_rq->rt_queued = 0;

}
@@ -1436,18 +1436,21 @@ static void __dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
static void dequeue_rt_stack(struct sched_rt_entity *rt_se, unsigned int flags)
{
struct sched_rt_entity *back = NULL;
+ unsigned int rt_nr_running;

for_each_sched_rt_entity(rt_se) {
rt_se->back = back;
back = rt_se;
}

- dequeue_top_rt_rq(rt_rq_of_se(back));
+ rt_nr_running = rt_rq_of_se(back)->rt_nr_running;

for (rt_se = back; rt_se; rt_se = rt_se->back) {
if (on_rt_rq(rt_se))
__dequeue_rt_entity(rt_se, flags);
}
+
+ dequeue_top_rt_rq(rt_rq_of_se(back), rt_nr_running);
}

static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 84bba67c92dc..08fdb9ccd14d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2042,7 +2042,6 @@ static inline int task_on_rq_migrating(struct task_struct *p)

#define WF_SYNC 0x10 /* Waker goes to sleep after wakeup */
#define WF_MIGRATED 0x20 /* Internal use, task got migrated */
-#define WF_ON_CPU 0x40 /* Wakee is on_cpu */

#ifdef CONFIG_SMP
static_assert(WF_EXEC == SD_BALANCE_EXEC);
diff --git a/kernel/smp.c b/kernel/smp.c
index 65a630f62363..381eb15cd28f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -174,9 +174,9 @@ static int __init csdlock_debug(char *str)
if (val)
static_branch_enable(&csdlock_debug_enabled);

- return 0;
+ return 1;
}
-early_param("csdlock_debug", csdlock_debug);
+__setup("csdlock_debug=", csdlock_debug);

static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
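
Besides switching the registration macro, the kernel/smp.c hunk above also
flips the handler's return value, because the two interfaces use different
conventions: an early_param() handler signals success by returning 0, whereas
a __setup() handler must return 1 once the option has been consumed (returning
0 would cause "csdlock_debug=..." to be passed on to init as an unknown
argument). The general shape of a __setup() handler, with an invented option
name:

#include <linux/init.h>
#include <linux/kernel.h>

static bool foo_debug_enabled;

static int __init foo_debug_setup(char *str)
{
	int val = 0;

	get_option(&str, &val);		/* parse the "=N" part, if any */
	foo_debug_enabled = !!val;

	return 1;	/* consumed; do not hand "foo_debug=" to init */
}
__setup("foo_debug=", foo_debug_setup);
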
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 0ea8702eb516..23af5eca11b1 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2311,6 +2311,7 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,

return !t.task ? 0 : -EINTR;
}
+EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock);

/**
* schedule_hrtimeout_range - sleep until timeout
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 871c912860ed..d6a0ff68df41 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -23,6 +23,7 @@
#include <linux/pvclock_gtod.h>
#include <linux/compiler.h>
#include <linux/audit.h>
+#include <linux/random.h>

#include "tick-internal.h"
#include "ntp_internal.h"
@@ -1326,8 +1327,10 @@ int do_settimeofday64(const struct timespec64 *ts)
/* Signal hrtimers about time change */
clock_was_set(CLOCK_SET_WALL);

- if (!ret)
+ if (!ret) {
audit_tk_injoffset(ts_delta);
+ add_device_randomness(ts, sizeof(*ts));
+ }

return ret;
}
@@ -2413,6 +2416,7 @@ int do_adjtimex(struct __kernel_timex *txc)
ret = timekeeping_validate_timex(txc);
if (ret)
return ret;
+ add_device_randomness(txc, sizeof(*txc));

if (txc->modes & ADJ_SETOFFSET) {
struct timespec64 delta;
@@ -2430,6 +2434,7 @@ int do_adjtimex(struct __kernel_timex *txc)
audit_ntp_init(&ad);

ktime_get_real_ts64(&ts);
+ add_device_randomness(&ts, sizeof(ts));

raw_spin_lock_irqsave(&timekeeper_lock, flags);
write_seqcount_begin(&tk_core.seq);
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 4d5629196d01..f0500b5cfefe 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -770,14 +770,11 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
**/
void blk_trace_shutdown(struct request_queue *q)
{
- mutex_lock(&q->debugfs_mutex);
if (rcu_dereference_protected(q->blk_trace,
lockdep_is_held(&q->debugfs_mutex))) {
__blk_trace_startstop(q, 0);
__blk_trace_remove(q);
}
-
- mutex_unlock(&q->debugfs_mutex);
}

#ifdef CONFIG_BLK_CGROUP
@@ -1059,7 +1056,7 @@ static void blk_add_trace_rq_remap(void *ignore, struct request *rq, dev_t dev,
r.sector_from = cpu_to_be64(from);

__blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq),
- rq_data_dir(rq), 0, BLK_TA_REMAP, 0,
+ req_op(rq), rq->cmd_flags, BLK_TA_REMAP, 0,
sizeof(r), &r, blk_trace_request_get_cgid(rq));
rcu_read_unlock();
}
diff --git a/lib/crypto/blake2s-selftest.c b/lib/crypto/blake2s-selftest.c
index 409e4b728770..7d77dea15587 100644
--- a/lib/crypto/blake2s-selftest.c
+++ b/lib/crypto/blake2s-selftest.c
@@ -4,6 +4,8 @@
*/

#include <crypto/internal/blake2s.h>
+#include <linux/kernel.h>
+#include <linux/random.h>
#include <linux/string.h>

/*
@@ -587,5 +589,44 @@ bool __init blake2s_selftest(void)
}
}

+ for (i = 0; i < 32; ++i) {
+ enum { TEST_ALIGNMENT = 16 };
+ u8 unaligned_block[BLAKE2S_BLOCK_SIZE + TEST_ALIGNMENT - 1]
+ __aligned(TEST_ALIGNMENT);
+ u8 blocks[BLAKE2S_BLOCK_SIZE * 2];
+ struct blake2s_state state1, state2;
+
+ get_random_bytes(blocks, sizeof(blocks));
+ get_random_bytes(&state, sizeof(state));
+
+#if defined(CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC) && \
+ defined(CONFIG_CRYPTO_ARCH_HAVE_LIB_BLAKE2S)
+ memcpy(&state1, &state, sizeof(state1));
+ memcpy(&state2, &state, sizeof(state2));
+ blake2s_compress(&state1, blocks, 2, BLAKE2S_BLOCK_SIZE);
+ blake2s_compress_generic(&state2, blocks, 2, BLAKE2S_BLOCK_SIZE);
+ if (memcmp(&state1, &state2, sizeof(state1))) {
+ pr_err("blake2s random compress self-test %d: FAIL\n",
+ i + 1);
+ success = false;
+ }
+#endif
+
+ memcpy(&state1, &state, sizeof(state1));
+ blake2s_compress(&state1, blocks, 1, BLAKE2S_BLOCK_SIZE);
+ for (l = 1; l < TEST_ALIGNMENT; ++l) {
+ memcpy(unaligned_block + l, blocks,
+ BLAKE2S_BLOCK_SIZE);
+ memcpy(&state2, &state, sizeof(state2));
+ blake2s_compress(&state2, unaligned_block + l, 1,
+ BLAKE2S_BLOCK_SIZE);
+ if (memcmp(&state1, &state2, sizeof(state1))) {
+ pr_err("blake2s random compress align %d self-test %d: FAIL\n",
+ l, i + 1);
+ success = false;
+ }
+ }
+ }
+
return success;
}
diff --git a/lib/crypto/blake2s.c b/lib/crypto/blake2s.c
index c71c09621c09..98e688c6d891 100644
--- a/lib/crypto/blake2s.c
+++ b/lib/crypto/blake2s.c
@@ -16,16 +16,44 @@
#include <linux/init.h>
#include <linux/bug.h>

+static inline void blake2s_set_lastblock(struct blake2s_state *state)
+{
+ state->f[0] = -1;
+}
+
void blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen)
{
- __blake2s_update(state, in, inlen, false);
+ const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen;
+
+ if (unlikely(!inlen))
+ return;
+ if (inlen > fill) {
+ memcpy(state->buf + state->buflen, in, fill);
+ blake2s_compress(state, state->buf, 1, BLAKE2S_BLOCK_SIZE);
+ state->buflen = 0;
+ in += fill;
+ inlen -= fill;
+ }
+ if (inlen > BLAKE2S_BLOCK_SIZE) {
+ const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE);
+ blake2s_compress(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE);
+ in += BLAKE2S_BLOCK_SIZE * (nblocks - 1);
+ inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1);
+ }
+ memcpy(state->buf + state->buflen, in, inlen);
+ state->buflen += inlen;
}
EXPORT_SYMBOL(blake2s_update);

void blake2s_final(struct blake2s_state *state, u8 *out)
{
WARN_ON(IS_ENABLED(DEBUG) && !out);
- __blake2s_final(state, out, false);
+ blake2s_set_lastblock(state);
+ memset(state->buf + state->buflen, 0,
+ BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */
+ blake2s_compress(state, state->buf, 1, state->buflen);
+ cpu_to_le32_array(state->h, ARRAY_SIZE(state->h));
+ memcpy(out, state->h, state->outlen);
memzero_explicit(state, sizeof(*state));
}
EXPORT_SYMBOL(blake2s_final);
@@ -38,12 +66,7 @@ static int __init blake2s_mod_init(void)
return 0;
}

-static void __exit blake2s_mod_exit(void)
-{
-}
-
module_init(blake2s_mod_init);
-module_exit(blake2s_mod_exit);
MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("BLAKE2s hash function");
MODULE_AUTHOR("Jason A. Donenfeld <Jason@xxxxxxxxx>");
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 0b64695ab632..2bf20b48a04a 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -689,6 +689,7 @@ static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
struct pipe_inode_info *pipe = i->pipe;
unsigned int p_mask = pipe->ring_size - 1;
unsigned int i_head;
+ unsigned int valid = pipe->head;
size_t n, off, xfer = 0;

if (!sanity(i))
@@ -702,11 +703,17 @@ static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
rem = copy_mc_to_kernel(p + off, addr + xfer, chunk);
chunk -= rem;
kunmap_local(p);
- i->head = i_head;
- i->iov_offset = off + chunk;
- xfer += chunk;
- if (rem)
+ if (chunk) {
+ i->head = i_head;
+ i->iov_offset = off + chunk;
+ xfer += chunk;
+ valid = i_head + 1;
+ }
+ if (rem) {
+ pipe->bufs[i_head & p_mask].len -= rem;
+ pipe_discard_from(pipe, valid);
break;
+ }
n -= chunk;
off = 0;
i_head++;
diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
index 96f96e42ce06..16fb88c0aca3 100644
--- a/lib/kunit/executor.c
+++ b/lib/kunit/executor.c
@@ -76,8 +76,10 @@ kunit_filter_tests(struct kunit_suite *const suite, const char *test_glob)
memcpy(copy, suite, sizeof(*copy));

filtered = kcalloc(n + 1, sizeof(*filtered), GFP_KERNEL);
- if (!filtered)
+ if (!filtered) {
+ kfree(copy);
return ERR_PTR(-ENOMEM);
+ }

n = 0;
kunit_suite_for_each_test_case(suite, test_case) {
diff --git a/lib/livepatch/test_klp_callbacks_busy.c b/lib/livepatch/test_klp_callbacks_busy.c
index 7ac845f65be5..133929e0ce8f 100644
--- a/lib/livepatch/test_klp_callbacks_busy.c
+++ b/lib/livepatch/test_klp_callbacks_busy.c
@@ -16,10 +16,12 @@ MODULE_PARM_DESC(block_transition, "block_transition (default=false)");

static void busymod_work_func(struct work_struct *work);
static DECLARE_WORK(work, busymod_work_func);
+static DECLARE_COMPLETION(busymod_work_started);

static void busymod_work_func(struct work_struct *work)
{
pr_info("%s enter\n", __func__);
+ complete(&busymod_work_started);

while (READ_ONCE(block_transition)) {
/*
@@ -37,6 +39,12 @@ static int test_klp_callbacks_busy_init(void)
pr_info("%s\n", __func__);
schedule_work(&work);

+ /*
+ * To synchronize kernel messages, keep the init function from
+ * exiting until the work function's entry message has been printed.

+ */
+ wait_for_completion(&busymod_work_started);
+
if (!block_transition) {
/*
* Serialize output: print all messages from the work
diff --git a/lib/overflow_kunit.c b/lib/overflow_kunit.c
index 475f0c064bf6..7e3e43679b73 100644
--- a/lib/overflow_kunit.c
+++ b/lib/overflow_kunit.c
@@ -91,6 +91,7 @@ DEFINE_TEST_ARRAY(u32) = {
{-4U, 5U, 1U, -9U, -20U, true, false, true},
};

+#if BITS_PER_LONG == 64
DEFINE_TEST_ARRAY(u64) = {
{0, 0, 0, 0, 0, false, false, false},
{1, 1, 2, 0, 1, false, false, false},
@@ -114,6 +115,7 @@ DEFINE_TEST_ARRAY(u64) = {
false, true, false},
{-15ULL, 10ULL, -5ULL, -25ULL, -150ULL, false, false, true},
};
+#endif

DEFINE_TEST_ARRAY(s8) = {
{0, 0, 0, 0, 0, false, false, false},
@@ -188,6 +190,8 @@ DEFINE_TEST_ARRAY(s32) = {
{S32_MIN, S32_MIN, 0, 0, 0, true, false, true},
{S32_MAX, S32_MAX, -2, 0, 1, true, false, true},
};
+
+#if BITS_PER_LONG == 64
DEFINE_TEST_ARRAY(s64) = {
{0, 0, 0, 0, 0, false, false, false},

@@ -216,6 +220,7 @@ DEFINE_TEST_ARRAY(s64) = {
{-128, -1, -129, -127, 128, false, false, false},
{0, -S64_MAX, -S64_MAX, S64_MAX, 0, false, false, false},
};
+#endif

#define check_one_op(t, fmt, op, sym, a, b, r, of) do { \
t _r; \
@@ -650,6 +655,7 @@ static struct kunit_case overflow_test_cases[] = {
KUNIT_CASE(s16_overflow_test),
KUNIT_CASE(u32_overflow_test),
KUNIT_CASE(s32_overflow_test),
+/* Clang 13 and earlier generate unwanted libcalls on 32-bit. */
#if BITS_PER_LONG == 64
KUNIT_CASE(u64_overflow_test),
KUNIT_CASE(s64_overflow_test),
diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 046ac6297c78..a2bb7738c373 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -47,9 +47,9 @@ unsigned int check_preemption_disabled(const char *what1, const char *what2)

printk("caller is %pS\n", __builtin_return_address(0));
dump_stack();
- instrumentation_end();

out_enable:
+ instrumentation_end();
preempt_enable_no_resched_notrace();
out:
return this_cpu;
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 0c5cb2d6436a..1a00d0247f70 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -14456,9 +14456,9 @@ static struct skb_segment_test skb_segment_tests[] __initconst = {
.build_skb = build_test_skb_linear_no_head_frag,
.features = NETIF_F_SG | NETIF_F_FRAGLIST |
NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_GSO |
- NETIF_F_LLTX_BIT | NETIF_F_GRO |
+ NETIF_F_LLTX | NETIF_F_GRO |
NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM |
- NETIF_F_HW_VLAN_STAG_TX_BIT
+ NETIF_F_HW_VLAN_STAG_TX
}
};

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index cfe632047839..f2c3015c5c82 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -732,7 +732,7 @@ static int dmirror_exclusive(struct dmirror *dmirror,

mmap_read_lock(mm);
for (addr = start; addr < end; addr = next) {
- unsigned long mapped;
+ unsigned long mapped = 0;
int i;

if (end < addr + (ARRAY_SIZE(pages) << PAGE_SHIFT))
@@ -741,7 +741,13 @@ static int dmirror_exclusive(struct dmirror *dmirror,
next = addr + (ARRAY_SIZE(pages) << PAGE_SHIFT);

ret = make_device_exclusive_range(mm, addr, next, pages, NULL);
- mapped = dmirror_atomic_map(addr, next, pages, dmirror);
+ /*
+ * Do dmirror_atomic_map() iff all pages are marked for
+ * exclusive access to avoid accessing uninitialized
+ * fields of pages.
+ */
+ if (ret == (next - addr) >> PAGE_SHIFT)
+ mapped = dmirror_atomic_map(addr, next, pages, dmirror);
for (i = 0; i < ret; i++) {
if (pages[i]) {
unlock_page(pages[i]);
diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index ad880231dfa8..630e0c31d7c2 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -131,6 +131,7 @@ static void kmalloc_oob_right(struct kunit *test)
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);

+ OPTIMIZER_HIDE_VAR(ptr);
/*
* An unaligned access past the requested kmalloc size.
* Only generic KASAN can precisely detect these.
@@ -159,6 +160,7 @@ static void kmalloc_oob_left(struct kunit *test)
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);

+ OPTIMIZER_HIDE_VAR(ptr);
KUNIT_EXPECT_KASAN_FAIL(test, *ptr = *(ptr - 1));
kfree(ptr);
}
@@ -171,6 +173,7 @@ static void kmalloc_node_oob_right(struct kunit *test)
ptr = kmalloc_node(size, GFP_KERNEL, 0);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);

+ OPTIMIZER_HIDE_VAR(ptr);
KUNIT_EXPECT_KASAN_FAIL(test, ptr[0] = ptr[size]);
kfree(ptr);
}
@@ -191,6 +194,7 @@ static void kmalloc_pagealloc_oob_right(struct kunit *test)
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);

+ OPTIMIZER_HIDE_VAR(ptr);
KUNIT_EXPECT_KASAN_FAIL(test, ptr[size + OOB_TAG_OFF] = 0);

kfree(ptr);
@@ -271,6 +275,7 @@ static void kmalloc_large_oob_right(struct kunit *test)
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);

+ OPTIMIZER_HIDE_VAR(ptr);
KUNIT_EXPECT_KASAN_FAIL(test, ptr[size] = 0);
kfree(ptr);
}
@@ -410,6 +415,8 @@ static void kmalloc_oob_16(struct kunit *test)
ptr2 = kmalloc(sizeof(*ptr2), GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr2);

+ OPTIMIZER_HIDE_VAR(ptr1);
+ OPTIMIZER_HIDE_VAR(ptr2);
KUNIT_EXPECT_KASAN_FAIL(test, *ptr1 = *ptr2);
kfree(ptr1);
kfree(ptr2);
@@ -756,6 +763,8 @@ static void ksize_unpoisons_memory(struct kunit *test)
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
real_size = ksize(ptr);

+ OPTIMIZER_HIDE_VAR(ptr);
+
/* This access shouldn't trigger a KASAN report. */
ptr[size] = 'x';

@@ -778,6 +787,7 @@ static void ksize_uaf(struct kunit *test)
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
kfree(ptr);

+ OPTIMIZER_HIDE_VAR(ptr);
KUNIT_EXPECT_KASAN_FAIL(test, ksize(ptr));
KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[0]);
KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size]);
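
The OPTIMIZER_HIDE_VAR() calls added to the KASAN tests above keep the compiler
from tracking the just-allocated pointer well enough to elide (or warn at build
time about) the intentionally out-of-bounds accesses, which would otherwise
never reach KASAN at run time. The macro is defined in include/linux/compiler.h;
its generic form is essentially an empty asm statement that launders the value
through a register, roughly:

/* Approximate shape of the generic definition; see include/linux/compiler.h. */
#define OPTIMIZER_HIDE_VAR(var)					\
	__asm__ ("" : "=r" (var) : "0" (var))
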
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index e34c4d0c4d93..11982685508e 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -384,8 +384,10 @@ static int __init damon_reclaim_init(void)
if (!ctx)
return -ENOMEM;

- if (damon_select_ops(ctx, DAMON_OPS_PADDR))
+ if (damon_select_ops(ctx, DAMON_OPS_PADDR)) {
+ damon_destroy_ctx(ctx);
return -EINVAL;
+ }

ctx->callback.after_aggregation = damon_reclaim_after_aggregation;

diff --git a/mm/gup.c b/mm/gup.c
index c5d076d43d9b..414f0c4b4caa 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1795,7 +1795,7 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
* Try to move out any movable page before pinning the range.
*/
if (folio_test_hugetlb(folio)) {
- if (!isolate_huge_page(&folio->page,
+ if (isolate_hugetlb(&folio->page,
&movable_page_list))
isolation_error_count++;
continue;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7d0697b1f26..e0c1cf3168d7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -18,6 +18,7 @@
#include <linux/shrinker.h>
#include <linux/mm_inline.h>
#include <linux/swapops.h>
+#include <linux/backing-dev.h>
#include <linux/dax.h>
#include <linux/khugepaged.h>
#include <linux/freezer.h>
@@ -2389,11 +2390,15 @@ static void __split_huge_page(struct page *page, struct list_head *list,
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond EOF: drop them from page cache */
if (head[i].index >= end) {
- ClearPageDirty(head + i);
- __delete_from_page_cache(head + i, NULL);
+ struct folio *tail = page_folio(head + i);
+
if (shmem_mapping(head->mapping))
shmem_uncharge(head->mapping->host, 1);
- put_page(head + i);
+ else if (folio_test_clear_dirty(tail))
+ folio_account_cleaned(tail,
+ inode_to_wb(folio->mapping->host));
+ __filemap_remove_folio(tail, NULL);
+ folio_put(tail);
} else if (!PageAnon(page)) {
__xa_store(&head->mapping->i_pages, head[i].index,
head + i, 0);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 859cfcaecddb..19063dbed332 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2758,8 +2758,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
* Fail with -EBUSY if not possible.
*/
spin_unlock_irq(&hugetlb_lock);
- if (!isolate_huge_page(old_page, list))
- ret = -EBUSY;
+ ret = isolate_hugetlb(old_page, list);
spin_lock_irq(&hugetlb_lock);
goto free_new;
} else if (!HPageFreed(old_page)) {
@@ -2835,7 +2834,7 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
if (hstate_is_gigantic(h))
return -ENOMEM;

- if (page_count(head) && isolate_huge_page(head, list))
+ if (page_count(head) && !isolate_hugetlb(head, list))
ret = 0;
else if (!page_count(head))
ret = alloc_and_dissolve_huge_page(h, head, list);
@@ -5603,7 +5602,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
*/
entry = huge_ptep_get(ptep);
if (unlikely(is_hugetlb_entry_migration(entry))) {
- migration_entry_wait_huge(vma, mm, ptep);
+ migration_entry_wait_huge(vma, ptep);
return 0;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
@@ -6726,7 +6725,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
} else {
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
- __migration_entry_wait(mm, (pte_t *)pmd, ptl);
+ __migration_entry_wait_huge((pte_t *)pmd, ptl);
goto retry;
}
/*
@@ -6758,15 +6757,15 @@ follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int fla
return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
}

-bool isolate_huge_page(struct page *page, struct list_head *list)
+int isolate_hugetlb(struct page *page, struct list_head *list)
{
- bool ret = true;
+ int ret = 0;

spin_lock_irq(&hugetlb_lock);
if (!PageHeadHuge(page) ||
!HPageMigratable(page) ||
!get_page_unless_zero(page)) {
- ret = false;
+ ret = -EBUSY;
goto unlock;
}
ClearHPageMigratable(page);
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index f9942841df18..c86691c431fd 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -772,6 +772,7 @@ static void __init __hugetlb_cgroup_file_dfl_init(int idx)
/* Add the numa stat file */
cft = &h->cgroup_files_dfl[6];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.numa_stat", buf);
+ cft->private = MEMFILE_PRIVATE(idx, 0);
cft->seq_show = hugetlb_cgroup_read_numa_stat;
cft->flags = CFTYPE_NOT_ON_ROOT;

diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c
index 9e1b6544bfa8..9ad8eff71b28 100644
--- a/mm/kasan/hw_tags.c
+++ b/mm/kasan/hw_tags.c
@@ -257,27 +257,37 @@ static void unpoison_vmalloc_pages(const void *addr, u8 tag)
}
}

+static void init_vmalloc_pages(const void *start, unsigned long size)
+{
+ const void *addr;
+
+ for (addr = start; addr < start + size; addr += PAGE_SIZE) {
+ struct page *page = virt_to_page(addr);
+
+ clear_highpage_kasan_tagged(page);
+ }
+}
+
void *__kasan_unpoison_vmalloc(const void *start, unsigned long size,
kasan_vmalloc_flags_t flags)
{
u8 tag;
unsigned long redzone_start, redzone_size;

- if (!kasan_vmalloc_enabled())
- return (void *)start;
-
- if (!is_vmalloc_or_module_addr(start))
+ if (!kasan_vmalloc_enabled() || !is_vmalloc_or_module_addr(start)) {
+ if (flags & KASAN_VMALLOC_INIT)
+ init_vmalloc_pages(start, size);
return (void *)start;
+ }

/*
- * Skip unpoisoning and assigning a pointer tag for non-VM_ALLOC
- * mappings as:
+ * Don't tag non-VM_ALLOC mappings, as:
*
* 1. Unlike the software KASAN modes, hardware tag-based KASAN only
* supports tagging physical memory. Therefore, it can only tag a
* single mapping of normal physical pages.
* 2. Hardware tag-based KASAN can only tag memory mapped with special
- * mapping protection bits, see arch_vmalloc_pgprot_modify().
+ * mapping protection bits, see arch_vmap_pgprot_tagged().
* As non-VM_ALLOC mappings can be mapped outside of vmalloc code,
* providing these bits would require tracking all non-VM_ALLOC
* mappers.
@@ -289,15 +299,19 @@ void *__kasan_unpoison_vmalloc(const void *start, unsigned long size,
*
* For non-VM_ALLOC allocations, page_alloc memory is tagged as usual.
*/
- if (!(flags & KASAN_VMALLOC_VM_ALLOC))
+ if (!(flags & KASAN_VMALLOC_VM_ALLOC)) {
+ WARN_ON(flags & KASAN_VMALLOC_INIT);
return (void *)start;
+ }

/*
* Don't tag executable memory.
* The kernel doesn't tolerate having the PC register tagged.
*/
- if (!(flags & KASAN_VMALLOC_PROT_NORMAL))
+ if (!(flags & KASAN_VMALLOC_PROT_NORMAL)) {
+ WARN_ON(flags & KASAN_VMALLOC_INIT);
return (void *)start;
+ }

tag = kasan_random_tag();
start = set_tag(start, tag);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 94dac77f5eba..fb8c9b0b53ab 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2211,7 +2211,7 @@ static bool isolate_page(struct page *page, struct list_head *pagelist)
bool lru = PageLRU(page);

if (PageHuge(page)) {
- isolated = isolate_huge_page(page, pagelist);
+ isolated = !isolate_hugetlb(page, pagelist);
} else {
if (lru)
isolated = !isolate_lru_page(page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 416b38ca8def..3a60c1444bc7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1627,7 +1627,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)

if (PageHuge(page)) {
pfn = page_to_pfn(head) + compound_nr(head) - 1;
- isolate_huge_page(head, &source);
+ isolate_hugetlb(head, &source);
continue;
} else if (PageTransHuge(page))
pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ea6dee61bc9d..c1ccc89845f4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,7 +607,7 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
if (flags & (MPOL_MF_MOVE_ALL) ||
(flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
- if (!isolate_huge_page(page, qp->pagelist) &&
+ if (isolate_hugetlb(page, qp->pagelist) &&
(flags & MPOL_MF_STRICT))
/*
* Failed to isolate page but allow migrating pages
@@ -1387,7 +1387,7 @@ static int get_nodes(nodemask_t *nodes, const unsigned long __user *nmask,
unsigned long bits = min_t(unsigned long, maxnode, BITS_PER_LONG);
unsigned long t;

- if (get_bitmap(&t, &nmask[maxnode / BITS_PER_LONG], bits))
+ if (get_bitmap(&t, &nmask[(maxnode - 1) / BITS_PER_LONG], bits))
return -EFAULT;

if (maxnode - bits >= MAX_NUMNODES) {
diff --git a/mm/memremap.c b/mm/memremap.c
index e11653fd348c..2c1130486d28 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -141,10 +141,10 @@ void memunmap_pages(struct dev_pagemap *pgmap)
for (i = 0; i < pgmap->nr_range; i++)
percpu_ref_put_many(&pgmap->ref, pfn_len(pgmap, i));
wait_for_completion(&pgmap->done);
- percpu_ref_exit(&pgmap->ref);

for (i = 0; i < pgmap->nr_range; i++)
pageunmap_range(pgmap, i);
+ percpu_ref_exit(&pgmap->ref);

WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
devmap_managed_enable_put(pgmap);
diff --git a/mm/migrate.c b/mm/migrate.c
index 6c31ee1e1c9b..5aba3fb612d4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -133,7 +133,7 @@ static void putback_movable_page(struct page *page)
*
* This function shall be used whenever the isolated pageset has been
* built from lru, balloon, hugetlbfs page. See isolate_migratepages_range()
- * and isolate_huge_page().
+ * and isolate_hugetlb().
*/
void putback_movable_pages(struct list_head *l)
{
@@ -309,13 +309,28 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
__migration_entry_wait(mm, ptep, ptl);
}

-void migration_entry_wait_huge(struct vm_area_struct *vma,
- struct mm_struct *mm, pte_t *pte)
+#ifdef CONFIG_HUGETLB_PAGE
+void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl)
{
- spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
- __migration_entry_wait(mm, pte, ptl);
+ pte_t pte;
+
+ spin_lock(ptl);
+ pte = huge_ptep_get(ptep);
+
+ if (unlikely(!is_hugetlb_entry_migration(pte)))
+ spin_unlock(ptl);
+ else
+ migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
}

+void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
+{
+ spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
+
+ __migration_entry_wait_huge(pte, ptl);
+}
+#endif
+
#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
{
@@ -1631,8 +1646,9 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,

if (PageHuge(page)) {
if (PageHead(page)) {
- isolate_huge_page(page, pagelist);
- err = 1;
+ err = isolate_hugetlb(page, pagelist);
+ if (!err)
+ err = 1;
}
} else {
struct page *head;
diff --git a/mm/mmap.c b/mm/mmap.c
index 313b57d55a63..6d9bfd9c94ca 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1882,7 +1882,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

/* Undo any partial mapping done by a device driver. */
unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);
- charged = 0;
if (vm_flags & VM_SHARED)
mapping_unmap_writable(file->f_mapping);
free_vma:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 135a081edb82..a8bd356a4d1e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1287,12 +1287,8 @@ static void kernel_init_free_pages(struct page *page, int numpages)

/* s390's use of memset() could override KASAN redzones. */
kasan_disable_current();
- for (i = 0; i < numpages; i++) {
- u8 tag = page_kasan_tag(page + i);
- page_kasan_tag_reset(page + i);
- clear_highpage(page + i);
- page_kasan_tag_set(page + i, tag);
- }
+ for (i = 0; i < numpages; i++)
+ clear_highpage_kasan_tagged(page + i);
kasan_enable_current();
}

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cadfbb5155ea..06c555fc5be0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3170,15 +3170,15 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,

/*
* Mark the pages as accessible, now that they are mapped.
- * The init condition should match the one in post_alloc_hook()
- * (except for the should_skip_init() check) to make sure that memory
- * is initialized under the same conditions regardless of the enabled
- * KASAN mode.
+ * The condition for setting KASAN_VMALLOC_INIT should complement the
+ * one in post_alloc_hook() with regards to the __GFP_SKIP_ZERO check
+ * to make sure that memory is initialized under the same conditions.
* Tag-based KASAN modes only assign tags to normal non-executable
* allocations, see __kasan_unpoison_vmalloc().
*/
kasan_flags |= KASAN_VMALLOC_VM_ALLOC;
- if (!want_init_on_free() && want_init_on_alloc(gfp_mask))
+ if (!want_init_on_free() && want_init_on_alloc(gfp_mask) &&
+ (gfp_mask & __GFP_SKIP_ZERO))
kasan_flags |= KASAN_VMALLOC_INIT;
/* KASAN_VMALLOC_PROT_NORMAL already set if required. */
area->addr = kasan_unpoison_vmalloc(area->addr, real_size, kasan_flags);
diff --git a/net/9p/client.c b/net/9p/client.c
index 8bba0d9cf975..87cde948f628 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -305,7 +305,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
* callback), so p9_client_cb eats the second ref there
* as the pointer is duplicated directly by virtqueue_add_sgs()
*/
- refcount_set(&req->refcount.refcount, 2);
+ refcount_set(&req->refcount, 2);

return req;

@@ -341,7 +341,7 @@ struct p9_req_t *p9_tag_lookup(struct p9_client *c, u16 tag)
if (!p9_req_try_get(req))
goto again;
if (req->tc.tag != tag) {
- p9_req_put(req);
+ p9_req_put(c, req);
goto again;
}
}
@@ -367,21 +367,18 @@ static int p9_tag_remove(struct p9_client *c, struct p9_req_t *r)
spin_lock_irqsave(&c->lock, flags);
idr_remove(&c->reqs, tag);
spin_unlock_irqrestore(&c->lock, flags);
- return p9_req_put(r);
+ return p9_req_put(c, r);
}

-static void p9_req_free(struct kref *ref)
+int p9_req_put(struct p9_client *c, struct p9_req_t *r)
{
- struct p9_req_t *r = container_of(ref, struct p9_req_t, refcount);
-
- p9_fcall_fini(&r->tc);
- p9_fcall_fini(&r->rc);
- kmem_cache_free(p9_req_cache, r);
-}
-
-int p9_req_put(struct p9_req_t *r)
-{
- return kref_put(&r->refcount, p9_req_free);
+ if (refcount_dec_and_test(&r->refcount)) {
+ p9_fcall_fini(&r->tc);
+ p9_fcall_fini(&r->rc);
+ kmem_cache_free(p9_req_cache, r);
+ return 1;
+ }
+ return 0;
}
EXPORT_SYMBOL(p9_req_put);

@@ -426,7 +423,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)

wake_up(&req->wq);
p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
- p9_req_put(req);
+ p9_req_put(c, req);
}
EXPORT_SYMBOL(p9_client_cb);

@@ -709,7 +706,7 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
reterr:
p9_tag_remove(c, req);
/* We have to put also the 2nd reference as it won't be used */
- p9_req_put(req);
+ p9_req_put(c, req);
return ERR_PTR(err);
}

@@ -746,7 +743,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
err = c->trans_mod->request(c, req);
if (err < 0) {
/* write won't happen */
- p9_req_put(req);
+ p9_req_put(c, req);
if (err != -ERESTARTSYS && err != -EFAULT)
c->status = Disconnected;
goto recalc_sigpending;
@@ -889,16 +886,13 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
struct p9_fid *fid;

p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
- fid = kmalloc(sizeof(*fid), GFP_KERNEL);
+ fid = kzalloc(sizeof(*fid), GFP_KERNEL);
if (!fid)
return NULL;

- memset(&fid->qid, 0, sizeof(fid->qid));
fid->mode = -1;
fid->uid = current_fsuid();
fid->clnt = clnt;
- fid->rdir = NULL;
- fid->fid = 0;
refcount_set(&fid->count, 1);

idr_preload(GFP_KERNEL);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 8f8f95e39b03..e758978b44be 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -343,6 +343,7 @@ static void p9_read_work(struct work_struct *work)
p9_debug(P9_DEBUG_ERROR,
"No recv fcall for tag %d (req %p), disconnecting!\n",
m->rc.tag, m->rreq);
+ p9_req_put(m->client, m->rreq);
m->rreq = NULL;
err = -EIO;
goto error;
@@ -378,7 +379,7 @@ static void p9_read_work(struct work_struct *work)
m->rc.sdata = NULL;
m->rc.offset = 0;
m->rc.capacity = 0;
- p9_req_put(m->rreq);
+ p9_req_put(m->client, m->rreq);
m->rreq = NULL;
}

@@ -492,7 +493,7 @@ static void p9_write_work(struct work_struct *work)
m->wpos += err;
if (m->wpos == m->wsize) {
m->wpos = m->wsize = 0;
- p9_req_put(m->wreq);
+ p9_req_put(m->client, m->wreq);
m->wreq = NULL;
}

@@ -695,7 +696,7 @@ static int p9_fd_cancel(struct p9_client *client, struct p9_req_t *req)
if (req->status == REQ_STATUS_UNSENT) {
list_del(&req->req_list);
req->status = REQ_STATUS_FLSHD;
- p9_req_put(req);
+ p9_req_put(client, req);
ret = 0;
}
spin_unlock(&client->lock);
@@ -722,7 +723,7 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
list_del(&req->req_list);
req->status = REQ_STATUS_FLSHD;
spin_unlock(&client->lock);
- p9_req_put(req);
+ p9_req_put(client, req);

return 0;
}
@@ -883,12 +884,12 @@ static void p9_conn_destroy(struct p9_conn *m)
p9_mux_poll_stop(m);
cancel_work_sync(&m->rq);
if (m->rreq) {
- p9_req_put(m->rreq);
+ p9_req_put(m->client, m->rreq);
m->rreq = NULL;
}
cancel_work_sync(&m->wq);
if (m->wreq) {
- p9_req_put(m->wreq);
+ p9_req_put(m->client, m->wreq);
m->wreq = NULL;
}

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 88e563826674..d817d3745238 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -350,7 +350,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
c->busa, c->req->tc.size,
DMA_TO_DEVICE);
up(&rdma->sq_sem);
- p9_req_put(c->req);
+ p9_req_put(client, c->req);
kfree(c);
}

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index b24a4fb0f0a2..147972bf2e79 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -199,7 +199,7 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
/* Reply won't come, so drop req ref */
static int p9_virtio_cancelled(struct p9_client *client, struct p9_req_t *req)
{
- p9_req_put(req);
+ p9_req_put(client, req);
return 0;
}

@@ -523,7 +523,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
kvfree(out_pages);
if (!kicked) {
/* reply won't come */
- p9_req_put(req);
+ p9_req_put(client, req);
}
return err;
}
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 77883b6788cd..4cf0c78d4d22 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -163,7 +163,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
ring->intf->out_prod = prod;
spin_unlock_irqrestore(&ring->lock, flags);
notify_remote_via_irq(ring->irq);
- p9_req_put(p9_req);
+ p9_req_put(client, p9_req);

return 0;
}
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 4c7030ed8d33..5b5363c99ed5 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1065,7 +1065,7 @@ static int ax25_release(struct socket *sock)
del_timer_sync(&ax25->t3timer);
del_timer_sync(&ax25->idletimer);
}
- dev_put_track(ax25_dev->dev, &ax25_dev->dev_tracker);
+ dev_put_track(ax25_dev->dev, &ax25->dev_tracker);
ax25_dev_put(ax25_dev);
}

@@ -1146,7 +1146,7 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)

if (ax25_dev) {
ax25_fillin_cb(ax25, ax25_dev);
- dev_hold_track(ax25_dev->dev, &ax25_dev->dev_tracker, GFP_ATOMIC);
+ dev_hold_track(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC);
}

done:
diff --git a/net/batman-adv/trace.h b/net/batman-adv/trace.h
index d673ebdd0426..31c8f922651d 100644
--- a/net/batman-adv/trace.h
+++ b/net/batman-adv/trace.h
@@ -28,8 +28,6 @@

#endif /* CONFIG_BATMAN_ADV_TRACING */

-#define BATADV_MAX_MSG_LEN 256
-
TRACE_EVENT(batadv_dbg,

TP_PROTO(struct batadv_priv *bat_priv,
@@ -40,16 +38,13 @@ TRACE_EVENT(batadv_dbg,
TP_STRUCT__entry(
__string(device, bat_priv->soft_iface->name)
__string(driver, KBUILD_MODNAME)
- __dynamic_array(char, msg, BATADV_MAX_MSG_LEN)
+ __vstring(msg, vaf->fmt, vaf->va)
),

TP_fast_assign(
__assign_str(device, bat_priv->soft_iface->name);
__assign_str(driver, KBUILD_MODNAME);
- WARN_ON_ONCE(vsnprintf(__get_dynamic_array(msg),
- BATADV_MAX_MSG_LEN,
- vaf->fmt,
- *vaf->va) >= BATADV_MAX_MSG_LEN);
+ __assign_vstr(msg, vaf->fmt, vaf->va);
),

TP_printk(
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 19df3905c5f8..2b8b51b89452 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -593,6 +593,11 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
skb_queue_purge(&hdev->rx_q);
skb_queue_purge(&hdev->cmd_q);

+ /* Cancel these to avoid queueing non-chained pending work */
+ hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
+ cancel_delayed_work(&hdev->cmd_timer);
+ cancel_delayed_work(&hdev->ncmd_timer);
+
/* Avoid potential lockdep warnings from the *_flush() calls by
* ensuring the workqueue is empty up front.
*/
@@ -606,6 +611,8 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
if (hdev->flush)
hdev->flush(hdev);

+ hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
+
atomic_set(&hdev->cmd_cnt, 1);
hdev->acl_cnt = 0; hdev->sco_cnt = 0; hdev->le_cnt = 0;

@@ -3863,7 +3870,8 @@ static void hci_cmd_work(struct work_struct *work)
if (res < 0)
__hci_cmd_sync_cancel(hdev, -res);

- if (test_bit(HCI_RESET, &hdev->flags))
+ if (test_bit(HCI_RESET, &hdev->flags) ||
+ hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
cancel_delayed_work(&hdev->cmd_timer);
else
schedule_delayed_work(&hdev->cmd_timer,
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index af17dfb20e01..7cb956d3abb2 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -3768,8 +3768,9 @@ static inline void handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd)
cancel_delayed_work(&hdev->ncmd_timer);
atomic_set(&hdev->cmd_cnt, 1);
} else {
- schedule_delayed_work(&hdev->ncmd_timer,
- HCI_NCMD_TIMEOUT);
+ if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
+ schedule_delayed_work(&hdev->ncmd_timer,
+ HCI_NCMD_TIMEOUT);
}
}
}
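
The hci_core.c and hci_event.c hunks above introduce a drain flag: the reset path sets HCI_CMD_DRAIN_WORKQUEUE, cancels the command timers, empties the workqueue, and only clears the flag once the flush is done, while the completion path refuses to re-arm the ncmd timer as long as the flag is set. A condensed sketch of that ordering; it assumes the surrounding reset logic (queue purges, cache and connection cleanup, hdev->flush()) stays as in the full function.

	/* Condensed from hci_dev_do_reset(); not the complete reset path. */
	hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
	cancel_delayed_work(&hdev->cmd_timer);
	cancel_delayed_work(&hdev->ncmd_timer);

	drain_workqueue(hdev->workqueue);	/* nothing may re-queue now */

	/* ... cache/connection cleanup and hdev->flush() as in the full function ... */

	hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
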
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 9e2a42299fc0..6f901398132e 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -1612,6 +1612,9 @@ static int hci_le_add_resolve_list_sync(struct hci_dev *hdev,
bacpy(&cp.bdaddr, &params->addr);
memcpy(cp.peer_irk, irk->val, 16);

+ /* Default privacy mode is always Network */
+ params->privacy_mode = HCI_NETWORK_PRIVACY;
+
done:
if (hci_dev_test_flag(hdev, HCI_PRIVACY))
memcpy(cp.local_irk, hdev->irk, 16);
@@ -5008,13 +5011,13 @@ static int hci_resume_scan_sync(struct hci_dev *hdev)
if (!hdev->scanning_paused)
return 0;

+ hdev->scanning_paused = false;
+
hci_update_scan_sync(hdev);

/* Reset passive scanning to normal */
hci_update_passive_scan_sync(hdev);

- hdev->scanning_paused = false;
-
return 0;
}

@@ -5033,7 +5036,6 @@ int hci_resume_sync(struct hci_dev *hdev)
return 0;

hdev->suspended = false;
- hdev->scanning_paused = false;

/* Restore event mask */
hci_set_event_mask_sync(hdev);
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 52668662ae8d..f18d0c72713f 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -1969,11 +1969,11 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm,
bdaddr_t *dst,
u8 link_type)
{
- struct l2cap_chan *c, *c1 = NULL;
+ struct l2cap_chan *c, *tmp, *c1 = NULL;

read_lock(&chan_list_lock);

- list_for_each_entry(c, &chan_list, global_l) {
+ list_for_each_entry_safe(c, tmp, &chan_list, global_l) {
if (state && c->state != state)
continue;

@@ -1992,11 +1992,10 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm,
dst_match = !bacmp(&c->dst, dst);
if (src_match && dst_match) {
c = l2cap_chan_hold_unless_zero(c);
- if (!c)
- continue;
-
- read_unlock(&chan_list_lock);
- return c;
+ if (c) {
+ read_unlock(&chan_list_lock);
+ return c;
+ }
}

/* Closest match */
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index ae758ab1b558..12c1ecdba3f3 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -6821,11 +6821,14 @@ static int get_conn_info(struct sock *sk, struct hci_dev *hdev, void *data,

cmd = mgmt_pending_new(sk, MGMT_OP_GET_CONN_INFO, hdev, data,
len);
- if (!cmd)
+ if (!cmd) {
err = -ENOMEM;
- else
+ } else {
+ hci_conn_hold(conn);
+ cmd->user_data = hci_conn_get(conn);
err = hci_cmd_sync_queue(hdev, get_conn_info_sync,
cmd, get_conn_info_complete);
+ }

if (err < 0) {
mgmt_cmd_complete(sk, hdev->id, MGMT_OP_GET_CONN_INFO,
@@ -6837,9 +6840,6 @@ static int get_conn_info(struct sock *sk, struct hci_dev *hdev, void *data,
goto unlock;
}

- hci_conn_hold(conn);
- cmd->user_data = hci_conn_get(conn);
-
conn->conn_info_timestamp = jiffies;
} else {
/* Cache is valid, just reply with values cached in hci_conn */
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index d0e54e30658a..e78dadfc5829 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -72,13 +72,16 @@ static int dummy_ops_call_op(void *image, struct bpf_dummy_ops_test_args *args)
args->args[3], args->args[4]);
}

+extern const struct bpf_link_ops bpf_struct_ops_link_lops;
+
int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
union bpf_attr __user *uattr)
{
const struct bpf_struct_ops *st_ops = &bpf_bpf_dummy_ops;
const struct btf_type *func_proto;
struct bpf_dummy_ops_test_args *args;
- struct bpf_tramp_progs *tprogs;
+ struct bpf_tramp_links *tlinks;
+ struct bpf_tramp_link *link = NULL;
void *image = NULL;
unsigned int op_idx;
int prog_ret;
@@ -92,8 +95,8 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
if (IS_ERR(args))
return PTR_ERR(args);

- tprogs = kcalloc(BPF_TRAMP_MAX, sizeof(*tprogs), GFP_KERNEL);
- if (!tprogs) {
+ tlinks = kcalloc(BPF_TRAMP_MAX, sizeof(*tlinks), GFP_KERNEL);
+ if (!tlinks) {
err = -ENOMEM;
goto out;
}
@@ -105,8 +108,17 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
}
set_vm_flush_reset_perms(image);

+ link = kzalloc(sizeof(*link), GFP_USER);
+ if (!link) {
+ err = -ENOMEM;
+ goto out;
+ }
+ /* prog doesn't take the ownership of the reference from caller */
+ bpf_prog_inc(prog);
+ bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_link_lops, prog);
+
op_idx = prog->expected_attach_type;
- err = bpf_struct_ops_prepare_trampoline(tprogs, prog,
+ err = bpf_struct_ops_prepare_trampoline(tlinks, link,
&st_ops->func_models[op_idx],
image, image + PAGE_SIZE);
if (err < 0)
@@ -124,7 +136,9 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
out:
kfree(args);
bpf_jit_free_exec(image);
- kfree(tprogs);
+ if (link)
+ bpf_link_put(&link->link);
+ kfree(tlinks);
return err;
}

diff --git a/net/core/filter.c b/net/core/filter.c
index d0b0c163d3f3..a98f34cb5aee 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3917,7 +3917,7 @@ static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len)
offset -= frag_size;
}
out:
- return offset + len < size ? addr + offset : NULL;
+ return offset + len <= size ? addr + offset : NULL;
}

BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset,
@@ -6642,7 +6642,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
.func = bpf_sk_release,
.gpl_only = false,
.ret_type = RET_INTEGER,
- .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+ .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON | OBJ_RELEASE,
};

BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index ede0af308f40..f50f8d95b628 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -462,7 +462,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,

if (copied == len)
break;
- } while (i != msg_rx->sg.end);
+ } while (!sg_is_last(sge));

if (unlikely(peek)) {
msg_rx = sk_psock_next_msg(psock, msg_rx);
@@ -472,7 +472,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
}

msg_rx->sg.start = i;
- if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) {
+ if (!sge->length && sg_is_last(sge)) {
msg_rx = sk_psock_dequeue_msg(psock);
kfree_sk_msg(msg_rx);
}
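
The skmsg.c hunks above stop the receive loop at the scatterlist element carrying the end mark (sg_is_last()) rather than comparing ring indices, so the walk terminates on the entry that is actually last. A small kernel-style sketch of walking a scatterlist the same way; it assumes a populated list and is not a standalone unit.

/* Sum the lengths of a scatterlist, stopping at the end-marked entry. */
static size_t sgl_total_len(struct scatterlist *sgl)
{
	struct scatterlist *sge = sgl;
	size_t len = 0;

	for (;;) {
		len += sge->length;
		if (sg_is_last(sge))	/* same termination test as the fix */
			break;
		sge = sg_next(sge);
	}
	return len;
}
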
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index a976b4d29892..0ee62506105a 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -736,11 +736,6 @@ int dccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)

lock_sock(sk);

- if (dccp_qpolicy_full(sk)) {
- rc = -EAGAIN;
- goto out_release;
- }
-
timeo = sock_sndtimeo(sk, noblock);

/*
@@ -759,6 +754,11 @@ int dccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (skb == NULL)
goto out_release;

+ if (dccp_qpolicy_full(sk)) {
+ rc = -EAGAIN;
+ goto out_discard;
+ }
+
if (sk->sk_state == DCCP_CLOSED) {
rc = -ENOTCONN;
goto out_discard;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5c207367b3b4..9a70032625af 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1920,6 +1920,8 @@ static int __init inet_init(void)

sock_skb_cb_check_size(sizeof(struct inet_skb_parm));

+ raw_hashinfo_init(&raw_v4_hashinfo);
+
rc = proto_register(&tcp_prot, 1);
if (rc)
goto out;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 55654e335d43..d631198e0938 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -410,13 +410,11 @@ struct sock *__inet_lookup_established(struct net *net,
sk_nulls_for_each_rcu(sk, node, &head->chain) {
if (sk->sk_hash != hash)
continue;
- if (likely(INET_MATCH(sk, net, acookie,
- saddr, daddr, ports, dif, sdif))) {
+ if (likely(INET_MATCH(net, sk, acookie, ports, dif, sdif))) {
if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
goto out;
- if (unlikely(!INET_MATCH(sk, net, acookie,
- saddr, daddr, ports,
- dif, sdif))) {
+ if (unlikely(!INET_MATCH(net, sk, acookie,
+ ports, dif, sdif))) {
sock_gen_put(sk);
goto begin;
}
@@ -465,8 +463,7 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
if (sk2->sk_hash != hash)
continue;

- if (likely(INET_MATCH(sk2, net, acookie,
- saddr, daddr, ports, dif, sdif))) {
+ if (likely(INET_MATCH(net, sk2, acookie, ports, dif, sdif))) {
if (sk2->sk_state == TCP_TIME_WAIT) {
tw = inet_twsk(sk2);
if (twsk_unique(sk, sk2, twp))
@@ -532,16 +529,14 @@ static bool inet_ehash_lookup_by_sk(struct sock *sk,
if (esk->sk_hash != sk->sk_hash)
continue;
if (sk->sk_family == AF_INET) {
- if (unlikely(INET_MATCH(esk, net, acookie,
- sk->sk_daddr,
- sk->sk_rcv_saddr,
+ if (unlikely(INET_MATCH(net, esk, acookie,
ports, dif, sdif))) {
return true;
}
}
#if IS_ENABLED(CONFIG_IPV6)
else if (sk->sk_family == AF_INET6) {
- if (unlikely(INET6_MATCH(esk, net,
+ if (unlikely(inet6_match(net, esk,
&sk->sk_v6_daddr,
&sk->sk_v6_rcv_saddr,
ports, dif, sdif))) {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index c9dd9603f2e7..d269b85630b2 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -85,20 +85,19 @@ struct raw_frag_vec {
int hlen;
};

-struct raw_hashinfo raw_v4_hashinfo = {
- .lock = __RW_LOCK_UNLOCKED(raw_v4_hashinfo.lock),
-};
+struct raw_hashinfo raw_v4_hashinfo;
EXPORT_SYMBOL_GPL(raw_v4_hashinfo);

int raw_hash_sk(struct sock *sk)
{
struct raw_hashinfo *h = sk->sk_prot->h.raw_hash;
- struct hlist_head *head;
+ struct hlist_nulls_head *hlist;

- head = &h->ht[inet_sk(sk)->inet_num & (RAW_HTABLE_SIZE - 1)];
+ hlist = &h->ht[inet_sk(sk)->inet_num & (RAW_HTABLE_SIZE - 1)];

write_lock_bh(&h->lock);
- sk_add_node(sk, head);
+ hlist_nulls_add_head_rcu(&sk->sk_nulls_node, hlist);
+ sock_set_flag(sk, SOCK_RCU_FREE);
write_unlock_bh(&h->lock);
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);

@@ -111,30 +110,25 @@ void raw_unhash_sk(struct sock *sk)
struct raw_hashinfo *h = sk->sk_prot->h.raw_hash;

write_lock_bh(&h->lock);
- if (sk_del_node_init(sk))
+ if (__sk_nulls_del_node_init_rcu(sk))
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
write_unlock_bh(&h->lock);
}
EXPORT_SYMBOL_GPL(raw_unhash_sk);

-struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
- unsigned short num, __be32 raddr, __be32 laddr,
- int dif, int sdif)
+bool raw_v4_match(struct net *net, struct sock *sk, unsigned short num,
+ __be32 raddr, __be32 laddr, int dif, int sdif)
{
- sk_for_each_from(sk) {
- struct inet_sock *inet = inet_sk(sk);
-
- if (net_eq(sock_net(sk), net) && inet->inet_num == num &&
- !(inet->inet_daddr && inet->inet_daddr != raddr) &&
- !(inet->inet_rcv_saddr && inet->inet_rcv_saddr != laddr) &&
- raw_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif))
- goto found; /* gotcha */
- }
- sk = NULL;
-found:
- return sk;
+ struct inet_sock *inet = inet_sk(sk);
+
+ if (net_eq(sock_net(sk), net) && inet->inet_num == num &&
+ !(inet->inet_daddr && inet->inet_daddr != raddr) &&
+ !(inet->inet_rcv_saddr && inet->inet_rcv_saddr != laddr) &&
+ raw_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif))
+ return true;
+ return false;
}
-EXPORT_SYMBOL_GPL(__raw_v4_lookup);
+EXPORT_SYMBOL_GPL(raw_v4_match);

/*
* 0 - deliver
@@ -168,23 +162,20 @@ static int icmp_filter(const struct sock *sk, const struct sk_buff *skb)
*/
static int raw_v4_input(struct sk_buff *skb, const struct iphdr *iph, int hash)
{
+ struct net *net = dev_net(skb->dev);
+ struct hlist_nulls_head *hlist;
+ struct hlist_nulls_node *hnode;
int sdif = inet_sdif(skb);
int dif = inet_iif(skb);
- struct sock *sk;
- struct hlist_head *head;
int delivered = 0;
- struct net *net;
-
- read_lock(&raw_v4_hashinfo.lock);
- head = &raw_v4_hashinfo.ht[hash];
- if (hlist_empty(head))
- goto out;
-
- net = dev_net(skb->dev);
- sk = __raw_v4_lookup(net, __sk_head(head), iph->protocol,
- iph->saddr, iph->daddr, dif, sdif);
+ struct sock *sk;

- while (sk) {
+ hlist = &raw_v4_hashinfo.ht[hash];
+ rcu_read_lock();
+ hlist_nulls_for_each_entry(sk, hnode, hlist, sk_nulls_node) {
+ if (!raw_v4_match(net, sk, iph->protocol,
+ iph->saddr, iph->daddr, dif, sdif))
+ continue;
delivered = 1;
if ((iph->protocol != IPPROTO_ICMP || !icmp_filter(sk, skb)) &&
ip_mc_sf_allow(sk, iph->daddr, iph->saddr,
@@ -195,31 +186,16 @@ static int raw_v4_input(struct sk_buff *skb, const struct iphdr *iph, int hash)
if (clone)
raw_rcv(sk, clone);
}
- sk = __raw_v4_lookup(net, sk_next(sk), iph->protocol,
- iph->saddr, iph->daddr,
- dif, sdif);
}
-out:
- read_unlock(&raw_v4_hashinfo.lock);
+ rcu_read_unlock();
return delivered;
}

int raw_local_deliver(struct sk_buff *skb, int protocol)
{
- int hash;
- struct sock *raw_sk;
-
- hash = protocol & (RAW_HTABLE_SIZE - 1);
- raw_sk = sk_head(&raw_v4_hashinfo.ht[hash]);
-
- /* If there maybe a raw socket we must check - if not we
- * don't care less
- */
- if (raw_sk && !raw_v4_input(skb, ip_hdr(skb), hash))
- raw_sk = NULL;
-
- return raw_sk != NULL;
+ int hash = protocol & (RAW_HTABLE_SIZE - 1);

+ return raw_v4_input(skb, ip_hdr(skb), hash);
}

static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)
@@ -286,31 +262,27 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)

void raw_icmp_error(struct sk_buff *skb, int protocol, u32 info)
{
- int hash;
- struct sock *raw_sk;
+ struct net *net = dev_net(skb->dev);
+ struct hlist_nulls_head *hlist;
+ struct hlist_nulls_node *hnode;
+ int dif = skb->dev->ifindex;
+ int sdif = inet_sdif(skb);
const struct iphdr *iph;
- struct net *net;
+ struct sock *sk;
+ int hash;

hash = protocol & (RAW_HTABLE_SIZE - 1);
+ hlist = &raw_v4_hashinfo.ht[hash];

- read_lock(&raw_v4_hashinfo.lock);
- raw_sk = sk_head(&raw_v4_hashinfo.ht[hash]);
- if (raw_sk) {
- int dif = skb->dev->ifindex;
- int sdif = inet_sdif(skb);
-
+ rcu_read_lock();
+ hlist_nulls_for_each_entry(sk, hnode, hlist, sk_nulls_node) {
iph = (const struct iphdr *)skb->data;
- net = dev_net(skb->dev);
-
- while ((raw_sk = __raw_v4_lookup(net, raw_sk, protocol,
- iph->daddr, iph->saddr,
- dif, sdif)) != NULL) {
- raw_err(raw_sk, skb, info);
- raw_sk = sk_next(raw_sk);
- iph = (const struct iphdr *)skb->data;
- }
+ if (!raw_v4_match(net, sk, iph->protocol,
+ iph->daddr, iph->saddr, dif, sdif))
+ continue;
+ raw_err(sk, skb, info);
}
- read_unlock(&raw_v4_hashinfo.lock);
+ rcu_read_unlock();
}

static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)
@@ -972,44 +944,41 @@ struct proto raw_prot = {
};

#ifdef CONFIG_PROC_FS
-static struct sock *raw_get_first(struct seq_file *seq)
+static struct sock *raw_get_first(struct seq_file *seq, int bucket)
{
- struct sock *sk;
struct raw_hashinfo *h = pde_data(file_inode(seq->file));
struct raw_iter_state *state = raw_seq_private(seq);
+ struct hlist_nulls_head *hlist;
+ struct hlist_nulls_node *hnode;
+ struct sock *sk;

- for (state->bucket = 0; state->bucket < RAW_HTABLE_SIZE;
+ for (state->bucket = bucket; state->bucket < RAW_HTABLE_SIZE;
++state->bucket) {
- sk_for_each(sk, &h->ht[state->bucket])
+ hlist = &h->ht[state->bucket];
+ hlist_nulls_for_each_entry(sk, hnode, hlist, sk_nulls_node) {
if (sock_net(sk) == seq_file_net(seq))
- goto found;
+ return sk;
+ }
}
- sk = NULL;
-found:
- return sk;
+ return NULL;
}

static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk)
{
- struct raw_hashinfo *h = pde_data(file_inode(seq->file));
struct raw_iter_state *state = raw_seq_private(seq);

do {
- sk = sk_next(sk);
-try_again:
- ;
+ sk = sk_nulls_next(sk);
} while (sk && sock_net(sk) != seq_file_net(seq));

- if (!sk && ++state->bucket < RAW_HTABLE_SIZE) {
- sk = sk_head(&h->ht[state->bucket]);
- goto try_again;
- }
+ if (!sk)
+ return raw_get_first(seq, state->bucket + 1);
return sk;
}

static struct sock *raw_get_idx(struct seq_file *seq, loff_t pos)
{
- struct sock *sk = raw_get_first(seq);
+ struct sock *sk = raw_get_first(seq, 0);

if (sk)
while (pos && (sk = raw_get_next(seq, sk)) != NULL)
@@ -1018,11 +987,9 @@ static struct sock *raw_get_idx(struct seq_file *seq, loff_t pos)
}

void *raw_seq_start(struct seq_file *seq, loff_t *pos)
- __acquires(&h->lock)
+ __acquires(RCU)
{
- struct raw_hashinfo *h = pde_data(file_inode(seq->file));
-
- read_lock(&h->lock);
+ rcu_read_lock();
return *pos ? raw_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
}
EXPORT_SYMBOL_GPL(raw_seq_start);
@@ -1032,7 +999,7 @@ void *raw_seq_next(struct seq_file *seq, void *v, loff_t *pos)
struct sock *sk;

if (v == SEQ_START_TOKEN)
- sk = raw_get_first(seq);
+ sk = raw_get_first(seq, 0);
else
sk = raw_get_next(seq, v);
++*pos;
@@ -1041,11 +1008,9 @@ void *raw_seq_next(struct seq_file *seq, void *v, loff_t *pos)
EXPORT_SYMBOL_GPL(raw_seq_next);

void raw_seq_stop(struct seq_file *seq, void *v)
- __releases(&h->lock)
+ __releases(RCU)
{
- struct raw_hashinfo *h = pde_data(file_inode(seq->file));
-
- read_unlock(&h->lock);
+ rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(raw_seq_stop);

@@ -1107,6 +1072,7 @@ static __net_initdata struct pernet_operations raw_net_ops = {

int __init raw_proc_init(void)
{
+
return register_pernet_subsys(&raw_net_ops);
}
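
The raw.c hunks above convert the raw socket hash from a rwlock-protected hlist to an RCU-protected hlist_nulls plus a boolean match helper, so the receive and error paths walk the bucket locklessly. The delivery loop, condensed from the added lines (kernel context assumed, not a standalone unit):

static int raw_v4_deliver_sketch(struct sk_buff *skb, int hash)
{
	struct net *net = dev_net(skb->dev);
	struct hlist_nulls_head *hlist = &raw_v4_hashinfo.ht[hash];
	struct hlist_nulls_node *hnode;
	const struct iphdr *iph = ip_hdr(skb);
	struct sock *sk;
	int delivered = 0;

	rcu_read_lock();
	hlist_nulls_for_each_entry(sk, hnode, hlist, sk_nulls_node) {
		if (!raw_v4_match(net, sk, iph->protocol,
				  iph->saddr, iph->daddr,
				  inet_iif(skb), inet_sdif(skb)))
			continue;
		delivered = 1;
		/* clone the skb and hand it to raw_rcv(), as in the full hunk */
	}
	rcu_read_unlock();
	return delivered;
}
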

diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
index ccacbde30a2c..5f208e840d85 100644
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -34,57 +34,57 @@ raw_get_hashinfo(const struct inet_diag_req_v2 *r)
* use helper to figure it out.
*/

-static struct sock *raw_lookup(struct net *net, struct sock *from,
- const struct inet_diag_req_v2 *req)
+static bool raw_lookup(struct net *net, struct sock *sk,
+ const struct inet_diag_req_v2 *req)
{
struct inet_diag_req_raw *r = (void *)req;
- struct sock *sk = NULL;

if (r->sdiag_family == AF_INET)
- sk = __raw_v4_lookup(net, from, r->sdiag_raw_protocol,
- r->id.idiag_dst[0],
- r->id.idiag_src[0],
- r->id.idiag_if, 0);
+ return raw_v4_match(net, sk, r->sdiag_raw_protocol,
+ r->id.idiag_dst[0],
+ r->id.idiag_src[0],
+ r->id.idiag_if, 0);
#if IS_ENABLED(CONFIG_IPV6)
else
- sk = __raw_v6_lookup(net, from, r->sdiag_raw_protocol,
- (const struct in6_addr *)r->id.idiag_src,
- (const struct in6_addr *)r->id.idiag_dst,
- r->id.idiag_if, 0);
+ return raw_v6_match(net, sk, r->sdiag_raw_protocol,
+ (const struct in6_addr *)r->id.idiag_src,
+ (const struct in6_addr *)r->id.idiag_dst,
+ r->id.idiag_if, 0);
#endif
- return sk;
+ return false;
}

static struct sock *raw_sock_get(struct net *net, const struct inet_diag_req_v2 *r)
{
struct raw_hashinfo *hashinfo = raw_get_hashinfo(r);
- struct sock *sk = NULL, *s;
+ struct hlist_nulls_head *hlist;
+ struct hlist_nulls_node *hnode;
+ struct sock *sk;
int slot;

if (IS_ERR(hashinfo))
return ERR_CAST(hashinfo);

- read_lock(&hashinfo->lock);
+ rcu_read_lock();
for (slot = 0; slot < RAW_HTABLE_SIZE; slot++) {
- sk_for_each(s, &hashinfo->ht[slot]) {
- sk = raw_lookup(net, s, r);
- if (sk) {
+ hlist = &hashinfo->ht[slot];
+ hlist_nulls_for_each_entry(sk, hnode, hlist, sk_nulls_node) {
+ if (raw_lookup(net, sk, r)) {
/*
* Grab it and keep until we fill
- * diag meaage to be reported, so
+ * diag message to be reported, so
* caller should call sock_put then.
- * We can do that because we're keeping
- * hashinfo->lock here.
*/
- sock_hold(sk);
- goto out_unlock;
+ if (refcount_inc_not_zero(&sk->sk_refcnt))
+ goto out_unlock;
}
}
}
+ sk = ERR_PTR(-ENOENT);
out_unlock:
- read_unlock(&hashinfo->lock);
+ rcu_read_unlock();

- return sk ? sk : ERR_PTR(-ENOENT);
+ return sk;
}

static int raw_diag_dump_one(struct netlink_callback *cb,
@@ -142,6 +142,8 @@ static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
struct raw_hashinfo *hashinfo = raw_get_hashinfo(r);
struct net *net = sock_net(skb->sk);
struct inet_diag_dump_data *cb_data;
+ struct hlist_nulls_head *hlist;
+ struct hlist_nulls_node *hnode;
int num, s_num, slot, s_slot;
struct sock *sk = NULL;
struct nlattr *bc;
@@ -158,7 +160,8 @@ static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
for (slot = s_slot; slot < RAW_HTABLE_SIZE; s_num = 0, slot++) {
num = 0;

- sk_for_each(sk, &hashinfo->ht[slot]) {
+ hlist = &hashinfo->ht[slot];
+ hlist_nulls_for_each_entry(sk, hnode, hlist, sk_nulls_node) {
struct inet_sock *inet = inet_sk(sk);

if (!net_eq(sock_net(sk), net))
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 91735d631a28..51116166e3d2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -953,6 +953,23 @@ static int tcp_downgrade_zcopy_pure(struct sock *sk, struct sk_buff *skb)
return 0;
}

+static int tcp_wmem_schedule(struct sock *sk, int copy)
+{
+ int left;
+
+ if (likely(sk_wmem_schedule(sk, copy)))
+ return copy;
+
+ /* We could be in trouble if we have nothing queued.
+ * Use whatever is left in sk->sk_forward_alloc and tcp_wmem[0]
+ * to guarantee some progress.
+ */
+ left = sock_net(sk)->ipv4.sysctl_tcp_wmem[0] - sk->sk_wmem_queued;
+ if (left > 0)
+ sk_forced_mem_schedule(sk, min(left, copy));
+ return min(copy, sk->sk_forward_alloc);
+}
+
static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
struct page *page, int offset, size_t *size)
{
@@ -988,7 +1005,11 @@ static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
tcp_mark_push(tp, skb);
goto new_segment;
}
- if (tcp_downgrade_zcopy_pure(sk, skb) || !sk_wmem_schedule(sk, copy))
+ if (tcp_downgrade_zcopy_pure(sk, skb))
+ return NULL;
+
+ copy = tcp_wmem_schedule(sk, copy);
+ if (!copy)
return NULL;

if (can_coalesce) {
@@ -1337,8 +1358,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)

copy = min_t(int, copy, pfrag->size - pfrag->offset);

- if (tcp_downgrade_zcopy_pure(sk, skb) ||
- !sk_wmem_schedule(sk, copy))
+ if (tcp_downgrade_zcopy_pure(sk, skb))
+ goto wait_for_space;
+
+ copy = tcp_wmem_schedule(sk, copy);
+ if (!copy)
goto wait_for_space;

err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb,
@@ -1365,7 +1389,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
skb_shinfo(skb)->flags |= SKBFL_PURE_ZEROCOPY;

if (!skb_zcopy_pure(skb)) {
- if (!sk_wmem_schedule(sk, copy))
+ copy = tcp_wmem_schedule(sk, copy);
+ if (!copy)
goto wait_for_space;
}
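
The tcp.c hunks above route the send paths through tcp_wmem_schedule(), which clamps the copy size instead of failing outright: when the normal charge does not go through, only the slack below tcp_wmem[0] is force-scheduled and the caller proceeds with whatever was granted. A plain-C sketch of that arithmetic; the sk_wmem_schedule() path is reduced to a simple forward_alloc check, the page rounding of sk_forced_mem_schedule() is ignored, and the values are made up.

#include <stdio.h>

static int clamp_copy(int copy, int forward_alloc, int wmem_queued,
		      int tcp_wmem0)
{
	if (copy <= forward_alloc)	/* normal charge succeeds */
		return copy;
	/* forced path: grab at most the slack below the tcp_wmem[0] floor */
	int left = tcp_wmem0 - wmem_queued;
	if (left > 0)
		forward_alloc += left < copy ? left : copy;
	return copy < forward_alloc ? copy : forward_alloc;
}

int main(void)
{
	/* 16 KiB requested under pressure, nothing queued, 4 KiB floor */
	printf("%d\n", clamp_copy(16384, 0, 0, 4096));	/* prints 4096 */
	return 0;
}
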

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a7f0a1f0c2a3..b19090b81ff0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3140,7 +3140,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
struct tcp_sock *tp = tcp_sk(sk);
unsigned int cur_mss;
int diff, len, err;
-
+ int avail_wnd;

/* Inconclusive MTU probe */
if (icsk->icsk_mtup.probe_size)
@@ -3162,17 +3162,25 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
return -EHOSTUNREACH; /* Routing failure or similar. */

cur_mss = tcp_current_mss(sk);
+ avail_wnd = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;

/* If receiver has shrunk his window, and skb is out of
* new window, do not retransmit it. The exception is the
* case, when window is shrunk to zero. In this case
- * our retransmit serves as a zero window probe.
+ * our retransmit of one segment serves as a zero window probe.
*/
- if (!before(TCP_SKB_CB(skb)->seq, tcp_wnd_end(tp)) &&
- TCP_SKB_CB(skb)->seq != tp->snd_una)
- return -EAGAIN;
+ if (avail_wnd <= 0) {
+ if (TCP_SKB_CB(skb)->seq != tp->snd_una)
+ return -EAGAIN;
+ avail_wnd = cur_mss;
+ }

len = cur_mss * segs;
+ if (len > avail_wnd) {
+ len = rounddown(avail_wnd, cur_mss);
+ if (!len)
+ len = avail_wnd;
+ }
if (skb->len > len) {
if (tcp_fragment(sk, TCP_FRAG_IN_RTX_QUEUE, skb, len,
cur_mss, GFP_ATOMIC))
@@ -3186,8 +3194,9 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
diff -= tcp_skb_pcount(skb);
if (diff)
tcp_adjust_pcount(sk, skb, diff);
- if (skb->len < cur_mss)
- tcp_retrans_try_collapse(sk, skb, cur_mss);
+ avail_wnd = min_t(int, avail_wnd, cur_mss);
+ if (skb->len < avail_wnd)
+ tcp_retrans_try_collapse(sk, skb, avail_wnd);
}

/* RFC3168, section 6.1.1.1. ECN fallback */
@@ -3358,11 +3367,12 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
*/
void sk_forced_mem_schedule(struct sock *sk, int size)
{
- int amt;
+ int delta, amt;

- if (size <= sk->sk_forward_alloc)
+ delta = size - sk->sk_forward_alloc;
+ if (delta <= 0)
return;
- amt = sk_mem_pages(size);
+ amt = sk_mem_pages(delta);
sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
sk_memory_allocated_add(sk, amt);

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6b4d8361560f..201e2533acb2 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2564,8 +2564,7 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net,
struct sock *sk;

udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
- if (INET_MATCH(sk, net, acookie, rmt_addr,
- loc_addr, ports, dif, sdif))
+ if (INET_MATCH(net, sk, acookie, ports, dif, sdif))
return sk;
/* Only check first socket in chain */
break;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index ef1e6545d869..095298c5727a 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -63,6 +63,7 @@
#include <net/compat.h>
#include <net/xfrm.h>
#include <net/ioam6.h>
+#include <net/rawv6.h>

#include <linux/uaccess.h>
#include <linux/mroute6.h>
@@ -1074,6 +1075,8 @@ static int __init inet6_init(void)
goto out;
}

+ raw_hashinfo_init(&raw_v6_hashinfo);
+
err = proto_register(&tcpv6_prot, 1);
if (err)
goto out;
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 32ccac10bd62..bb4939e36840 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -71,12 +71,12 @@ struct sock *__inet6_lookup_established(struct net *net,
sk_nulls_for_each_rcu(sk, node, &head->chain) {
if (sk->sk_hash != hash)
continue;
- if (!INET6_MATCH(sk, net, saddr, daddr, ports, dif, sdif))
+ if (!inet6_match(net, sk, saddr, daddr, ports, dif, sdif))
continue;
if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
goto out;

- if (unlikely(!INET6_MATCH(sk, net, saddr, daddr, ports, dif, sdif))) {
+ if (unlikely(!inet6_match(net, sk, saddr, daddr, ports, dif, sdif))) {
sock_gen_put(sk);
goto begin;
}
@@ -269,7 +269,7 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
if (sk2->sk_hash != hash)
continue;

- if (likely(INET6_MATCH(sk2, net, saddr, daddr, ports,
+ if (likely(inet6_match(net, sk2, saddr, daddr, ports,
dif, sdif))) {
if (sk2->sk_state == TCP_TIME_WAIT) {
tw = inet_twsk(sk2);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 8bb41f3b246a..62f0a3d08d4d 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -61,46 +61,30 @@

#define ICMPV6_HDRLEN 4 /* ICMPv6 header, RFC 4443 Section 2.1 */

-struct raw_hashinfo raw_v6_hashinfo = {
- .lock = __RW_LOCK_UNLOCKED(raw_v6_hashinfo.lock),
-};
+struct raw_hashinfo raw_v6_hashinfo;
EXPORT_SYMBOL_GPL(raw_v6_hashinfo);

-struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
- unsigned short num, const struct in6_addr *loc_addr,
- const struct in6_addr *rmt_addr, int dif, int sdif)
+bool raw_v6_match(struct net *net, struct sock *sk, unsigned short num,
+ const struct in6_addr *loc_addr,
+ const struct in6_addr *rmt_addr, int dif, int sdif)
{
- bool is_multicast = ipv6_addr_is_multicast(loc_addr);
-
- sk_for_each_from(sk)
- if (inet_sk(sk)->inet_num == num) {
-
- if (!net_eq(sock_net(sk), net))
- continue;
-
- if (!ipv6_addr_any(&sk->sk_v6_daddr) &&
- !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr))
- continue;
-
- if (!raw_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
- dif, sdif))
- continue;
-
- if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) {
- if (ipv6_addr_equal(&sk->sk_v6_rcv_saddr, loc_addr))
- goto found;
- if (is_multicast &&
- inet6_mc_check(sk, loc_addr, rmt_addr))
- goto found;
- continue;
- }
- goto found;
- }
- sk = NULL;
-found:
- return sk;
+ if (inet_sk(sk)->inet_num != num ||
+ !net_eq(sock_net(sk), net) ||
+ (!ipv6_addr_any(&sk->sk_v6_daddr) &&
+ !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr)) ||
+ !raw_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+ dif, sdif))
+ return false;
+
+ if (ipv6_addr_any(&sk->sk_v6_rcv_saddr) ||
+ ipv6_addr_equal(&sk->sk_v6_rcv_saddr, loc_addr) ||
+ (ipv6_addr_is_multicast(loc_addr) &&
+ inet6_mc_check(sk, loc_addr, rmt_addr)))
+ return true;
+
+ return false;
}
-EXPORT_SYMBOL_GPL(__raw_v6_lookup);
+EXPORT_SYMBOL_GPL(raw_v6_match);

/*
* 0 - deliver
@@ -156,31 +140,27 @@ EXPORT_SYMBOL(rawv6_mh_filter_unregister);
*/
static bool ipv6_raw_deliver(struct sk_buff *skb, int nexthdr)
{
+ struct net *net = dev_net(skb->dev);
+ struct hlist_nulls_head *hlist;
+ struct hlist_nulls_node *hnode;
const struct in6_addr *saddr;
const struct in6_addr *daddr;
struct sock *sk;
bool delivered = false;
__u8 hash;
- struct net *net;

saddr = &ipv6_hdr(skb)->saddr;
daddr = saddr + 1;

hash = nexthdr & (RAW_HTABLE_SIZE - 1);
-
- read_lock(&raw_v6_hashinfo.lock);
- sk = sk_head(&raw_v6_hashinfo.ht[hash]);
-
- if (!sk)
- goto out;
-
- net = dev_net(skb->dev);
- sk = __raw_v6_lookup(net, sk, nexthdr, daddr, saddr,
- inet6_iif(skb), inet6_sdif(skb));
-
- while (sk) {
+ hlist = &raw_v6_hashinfo.ht[hash];
+ rcu_read_lock();
+ hlist_nulls_for_each_entry(sk, hnode, hlist, sk_nulls_node) {
int filtered;

+ if (!raw_v6_match(net, sk, nexthdr, daddr, saddr,
+ inet6_iif(skb), inet6_sdif(skb)))
+ continue;
delivered = true;
switch (nexthdr) {
case IPPROTO_ICMPV6:
@@ -219,23 +199,14 @@ static bool ipv6_raw_deliver(struct sk_buff *skb, int nexthdr)
rawv6_rcv(sk, clone);
}
}
- sk = __raw_v6_lookup(net, sk_next(sk), nexthdr, daddr, saddr,
- inet6_iif(skb), inet6_sdif(skb));
}
-out:
- read_unlock(&raw_v6_hashinfo.lock);
+ rcu_read_unlock();
return delivered;
}

bool raw6_local_deliver(struct sk_buff *skb, int nexthdr)
{
- struct sock *raw_sk;
-
- raw_sk = sk_head(&raw_v6_hashinfo.ht[nexthdr & (RAW_HTABLE_SIZE - 1)]);
- if (raw_sk && !ipv6_raw_deliver(skb, nexthdr))
- raw_sk = NULL;
-
- return raw_sk != NULL;
+ return ipv6_raw_deliver(skb, nexthdr);
}

/* This cleans up af_inet6 a bit. -DaveM */
@@ -361,30 +332,25 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
void raw6_icmp_error(struct sk_buff *skb, int nexthdr,
u8 type, u8 code, int inner_offset, __be32 info)
{
+ struct net *net = dev_net(skb->dev);
+ struct hlist_nulls_head *hlist;
+ struct hlist_nulls_node *hnode;
struct sock *sk;
int hash;
- const struct in6_addr *saddr, *daddr;
- struct net *net;

hash = nexthdr & (RAW_HTABLE_SIZE - 1);
-
- read_lock(&raw_v6_hashinfo.lock);
- sk = sk_head(&raw_v6_hashinfo.ht[hash]);
- if (sk) {
+ hlist = &raw_v6_hashinfo.ht[hash];
+ rcu_read_lock();
+ hlist_nulls_for_each_entry(sk, hnode, hlist, sk_nulls_node) {
/* Note: ipv6_hdr(skb) != skb->data */
const struct ipv6hdr *ip6h = (const struct ipv6hdr *)skb->data;
- saddr = &ip6h->saddr;
- daddr = &ip6h->daddr;
- net = dev_net(skb->dev);
-
- while ((sk = __raw_v6_lookup(net, sk, nexthdr, saddr, daddr,
- inet6_iif(skb), inet6_iif(skb)))) {
- rawv6_err(sk, skb, NULL, type, code,
- inner_offset, info);
- sk = sk_next(sk);
- }
+
+ if (!raw_v6_match(net, sk, nexthdr, &ip6h->saddr, &ip6h->daddr,
+ inet6_iif(skb), inet6_iif(skb)))
+ continue;
+ rawv6_err(sk, skb, NULL, type, code, inner_offset, info);
}
- read_unlock(&raw_v6_hashinfo.lock);
+ rcu_read_unlock();
}

static inline int rawv6_rcv_skb(struct sock *sk, struct sk_buff *skb)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index aea28bf701be..544842fc831e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1044,7 +1044,7 @@ static struct sock *__udp6_lib_demux_lookup(struct net *net,

udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
if (sk->sk_state == TCP_ESTABLISHED &&
- INET6_MATCH(sk, net, rmt_addr, loc_addr, ports, dif, sdif))
+ inet6_match(net, sk, rmt_addr, loc_addr, ports, dif, sdif))
return sk;
/* Only check first socket in chain */
break;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 07b5a2044cab..9c25e6a9a279 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -323,9 +323,10 @@ static bool mptcp_rmem_schedule(struct sock *sk, struct sock *ssk, int size)
struct mptcp_sock *msk = mptcp_sk(sk);
int amt, amount;

- if (size < msk->rmem_fwd_alloc)
+ if (size <= msk->rmem_fwd_alloc)
return true;

+ size -= msk->rmem_fwd_alloc;
amt = sk_mem_pages(size);
amount = amt << SK_MEM_QUANTUM_SHIFT;
msk->rmem_fwd_alloc += amount;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index de3dc35ce609..2b60923cf87b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -153,6 +153,7 @@ static struct nft_trans *nft_trans_alloc_gfp(const struct nft_ctx *ctx,
if (trans == NULL)
return NULL;

+ INIT_LIST_HEAD(&trans->list);
trans->msg_type = msg_type;
trans->ctx = *ctx;

@@ -2472,6 +2473,7 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy,
}

static struct nft_chain *nft_chain_lookup_byid(const struct net *net,
+ const struct nft_table *table,
const struct nlattr *nla)
{
struct nftables_pernet *nft_net = nft_pernet(net);
@@ -2482,6 +2484,7 @@ static struct nft_chain *nft_chain_lookup_byid(const struct net *net,
struct nft_chain *chain = trans->ctx.chain;

if (trans->msg_type == NFT_MSG_NEWCHAIN &&
+ chain->table == table &&
id == nft_trans_chain_id(trans))
return chain;
}
@@ -3369,6 +3372,7 @@ static int nft_table_validate(struct net *net, const struct nft_table *table)
}

static struct nft_rule *nft_rule_lookup_byid(const struct net *net,
+ const struct nft_chain *chain,
const struct nlattr *nla);

#define NFT_RULE_MAXEXPRS 128
@@ -3415,7 +3419,7 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info,
return -EOPNOTSUPP;

} else if (nla[NFTA_RULE_CHAIN_ID]) {
- chain = nft_chain_lookup_byid(net, nla[NFTA_RULE_CHAIN_ID]);
+ chain = nft_chain_lookup_byid(net, table, nla[NFTA_RULE_CHAIN_ID]);
if (IS_ERR(chain)) {
NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN_ID]);
return PTR_ERR(chain);
@@ -3457,7 +3461,7 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info,
return PTR_ERR(old_rule);
}
} else if (nla[NFTA_RULE_POSITION_ID]) {
- old_rule = nft_rule_lookup_byid(net, nla[NFTA_RULE_POSITION_ID]);
+ old_rule = nft_rule_lookup_byid(net, chain, nla[NFTA_RULE_POSITION_ID]);
if (IS_ERR(old_rule)) {
NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_POSITION_ID]);
return PTR_ERR(old_rule);
@@ -3602,6 +3606,7 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info,
}

static struct nft_rule *nft_rule_lookup_byid(const struct net *net,
+ const struct nft_chain *chain,
const struct nlattr *nla)
{
struct nftables_pernet *nft_net = nft_pernet(net);
@@ -3612,6 +3617,7 @@ static struct nft_rule *nft_rule_lookup_byid(const struct net *net,
struct nft_rule *rule = nft_trans_rule(trans);

if (trans->msg_type == NFT_MSG_NEWRULE &&
+ trans->ctx.chain == chain &&
id == nft_trans_rule_id(trans))
return rule;
}
@@ -3661,7 +3667,7 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info,

err = nft_delrule(&ctx, rule);
} else if (nla[NFTA_RULE_ID]) {
- rule = nft_rule_lookup_byid(net, nla[NFTA_RULE_ID]);
+ rule = nft_rule_lookup_byid(net, chain, nla[NFTA_RULE_ID]);
if (IS_ERR(rule)) {
NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_ID]);
return PTR_ERR(rule);
@@ -3840,6 +3846,7 @@ static struct nft_set *nft_set_lookup_byhandle(const struct nft_table *table,
}

static struct nft_set *nft_set_lookup_byid(const struct net *net,
+ const struct nft_table *table,
const struct nlattr *nla, u8 genmask)
{
struct nftables_pernet *nft_net = nft_pernet(net);
@@ -3851,6 +3858,7 @@ static struct nft_set *nft_set_lookup_byid(const struct net *net,
struct nft_set *set = nft_trans_set(trans);

if (id == nft_trans_set_id(trans) &&
+ set->table == table &&
nft_active_genmask(set, genmask))
return set;
}
@@ -3871,7 +3879,7 @@ struct nft_set *nft_set_lookup_global(const struct net *net,
if (!nla_set_id)
return set;

- set = nft_set_lookup_byid(net, nla_set_id, genmask);
+ set = nft_set_lookup_byid(net, table, nla_set_id, genmask);
}
return set;
}
@@ -9595,7 +9603,7 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
tb[NFTA_VERDICT_CHAIN],
genmask);
} else if (tb[NFTA_VERDICT_CHAIN_ID]) {
- chain = nft_chain_lookup_byid(ctx->net,
+ chain = nft_chain_lookup_byid(ctx->net, ctx->table,
tb[NFTA_VERDICT_CHAIN_ID]);
if (IS_ERR(chain))
return PTR_ERR(chain);
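
The nf_tables hunks above make the lookup-by-transaction-id helpers also verify that the pending object belongs to the table (or chain) the caller is operating on, so an id from one table can no longer resolve to a chain, rule or set created in another. A condensed form of the chain variant, taking the id directly instead of the netlink attribute and assuming the kernel context of the full function:

static struct nft_chain *chain_lookup_byid_sketch(const struct net *net,
						  const struct nft_table *table,
						  u32 id)
{
	struct nftables_pernet *nft_net = nft_pernet(net);
	struct nft_trans *trans;

	list_for_each_entry(trans, &nft_net->commit_list, list) {
		struct nft_chain *chain = trans->ctx.chain;

		if (trans->msg_type == NFT_MSG_NEWCHAIN &&
		    chain->table == table &&		/* new ownership check */
		    id == nft_trans_chain_id(trans))
			return chain;
	}
	return ERR_PTR(-ENOENT);
}
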
diff --git a/net/netfilter/nft_queue.c b/net/netfilter/nft_queue.c
index 15e4b7640dc0..da29e92c03e2 100644
--- a/net/netfilter/nft_queue.c
+++ b/net/netfilter/nft_queue.c
@@ -68,6 +68,31 @@ static void nft_queue_sreg_eval(const struct nft_expr *expr,
regs->verdict.code = ret;
}

+static int nft_queue_validate(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nft_data **data)
+{
+ static const unsigned int supported_hooks = ((1 << NF_INET_PRE_ROUTING) |
+ (1 << NF_INET_LOCAL_IN) |
+ (1 << NF_INET_FORWARD) |
+ (1 << NF_INET_LOCAL_OUT) |
+ (1 << NF_INET_POST_ROUTING));
+
+ switch (ctx->family) {
+ case NFPROTO_IPV4:
+ case NFPROTO_IPV6:
+ case NFPROTO_INET:
+ case NFPROTO_BRIDGE:
+ break;
+ case NFPROTO_NETDEV: /* lacks okfn */
+ fallthrough;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return nft_chain_validate_hooks(ctx->chain, supported_hooks);
+}
+
static const struct nla_policy nft_queue_policy[NFTA_QUEUE_MAX + 1] = {
[NFTA_QUEUE_NUM] = { .type = NLA_U16 },
[NFTA_QUEUE_TOTAL] = { .type = NLA_U16 },
@@ -164,6 +189,7 @@ static const struct nft_expr_ops nft_queue_ops = {
.eval = nft_queue_eval,
.init = nft_queue_init,
.dump = nft_queue_dump,
+ .validate = nft_queue_validate,
.reduce = NFT_REDUCE_READONLY,
};

@@ -173,6 +199,7 @@ static const struct nft_expr_ops nft_queue_sreg_ops = {
.eval = nft_queue_sreg_eval,
.init = nft_queue_sreg_init,
.dump = nft_queue_sreg_dump,
+ .validate = nft_queue_validate,
.reduce = NFT_REDUCE_READONLY,
};

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index bf2d986a6bc3..a8e3ec800a9c 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -192,6 +192,7 @@ static void rose_kill_by_device(struct net_device *dev)
rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
if (rose->neighbour)
rose->neighbour->use--;
+ dev_put(rose->device);
rose->device = NULL;
}
}
@@ -592,6 +593,8 @@ static struct sock *rose_make_new(struct sock *osk)
rose->idle = orose->idle;
rose->defer = orose->defer;
rose->device = orose->device;
+ if (rose->device)
+ dev_hold(rose->device);
rose->qbitincl = orose->qbitincl;

return sk;
@@ -645,6 +648,7 @@ static int rose_release(struct socket *sock)
break;
}

+ dev_put(rose->device);
sock->sk = NULL;
release_sock(sk);
sock_put(sk);
@@ -721,7 +725,6 @@ static int rose_connect(struct socket *sock, struct sockaddr *uaddr, int addr_le
struct rose_sock *rose = rose_sk(sk);
struct sockaddr_rose *addr = (struct sockaddr_rose *)uaddr;
unsigned char cause, diagnostic;
- struct net_device *dev;
ax25_uid_assoc *user;
int n, err = 0;

@@ -778,9 +781,12 @@ static int rose_connect(struct socket *sock, struct sockaddr *uaddr, int addr_le
}

if (sock_flag(sk, SOCK_ZAPPED)) { /* Must bind first - autobinding in this may or may not work */
+ struct net_device *dev;
+
sock_reset_flag(sk, SOCK_ZAPPED);

- if ((dev = rose_dev_first()) == NULL) {
+ dev = rose_dev_first();
+ if (!dev) {
err = -ENETUNREACH;
goto out_release;
}
@@ -788,6 +794,7 @@ static int rose_connect(struct socket *sock, struct sockaddr *uaddr, int addr_le
user = ax25_findbyuid(current_euid());
if (!user) {
err = -EINVAL;
+ dev_put(dev);
goto out_release;
}

diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index 5e93510fa3d2..09ad2d9e63c9 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -615,6 +615,8 @@ struct net_device *rose_dev_first(void)
if (first == NULL || strncmp(dev->name, first->name, 3) < 0)
first = dev;
}
+ if (first)
+ dev_hold(first);
rcu_read_unlock();

return first;
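
The rose hunks above pair every stored net_device pointer with a reference: rose_dev_first() now returns a held device, rose_make_new() takes its own hold on the copied device, and the release, kill and connect-error paths drop theirs. The lookup side, condensed from the patch; the device filtering of the full function is elided behind a comment.

struct net_device *rose_dev_first_sketch(void)
{
	struct net_device *dev, *first = NULL;

	rcu_read_lock();
	for_each_netdev_rcu(&init_net, dev) {
		/* full function also checks IFF_UP and the device type */
		if (first == NULL || strncmp(dev->name, first->name, 3) < 0)
			first = dev;
	}
	if (first)
		dev_hold(first);	/* caller now owns a reference */
	rcu_read_unlock();

	return first;			/* caller must dev_put() it */
}
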
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index a35ab8c27866..3f935cbbaff6 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -526,7 +526,7 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
rcu_assign_pointer(f->next, f1);
rcu_assign_pointer(*fp, f);

- if (fold && fold->handle && f->handle != fold->handle) {
+ if (fold) {
th = to_hash(fold->handle);
h = from_hash(fold->handle >> 16);
b = rtnl_dereference(head->table[th]);
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 48585c4d04ad..0273bf7375e2 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -87,8 +87,7 @@ obj := $(KBUILD_EXTMOD)
src := $(obj)

# Include the module's Makefile to find KBUILD_EXTRA_SYMBOLS
-include $(if $(wildcard $(KBUILD_EXTMOD)/Kbuild), \
- $(KBUILD_EXTMOD)/Kbuild, $(KBUILD_EXTMOD)/Makefile)
+include $(if $(wildcard $(src)/Kbuild), $(src)/Kbuild, $(src)/Makefile)

# modpost option for external modules
MODPOST += -e
diff --git a/scripts/faddr2line b/scripts/faddr2line
index 94ed98dd899f..57099687e5e1 100755
--- a/scripts/faddr2line
+++ b/scripts/faddr2line
@@ -112,7 +112,9 @@ __faddr2line() {
# section offsets.
local file_type=$(${READELF} --file-header $objfile |
${AWK} '$1 == "Type:" { print $2; exit }')
- [[ $file_type = "EXEC" ]] && is_vmlinux=1
+ if [[ $file_type = "EXEC" ]] || [[ $file_type == "DYN" ]]; then
+ is_vmlinux=1
+ fi

# Go through each of the object's symbols which match the func name.
# In rare cases there might be duplicates, in which case we print all
diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
index d5983cf3db7d..c771831eb077 100644
--- a/scripts/gdb/linux/dmesg.py
+++ b/scripts/gdb/linux/dmesg.py
@@ -22,7 +22,6 @@ prb_desc_type = utils.CachedType("struct prb_desc")
prb_desc_ring_type = utils.CachedType("struct prb_desc_ring")
prb_data_ring_type = utils.CachedType("struct prb_data_ring")
printk_ringbuffer_type = utils.CachedType("struct printk_ringbuffer")
-atomic_long_type = utils.CachedType("atomic_long_t")

class LxDmesg(gdb.Command):
"""Print Linux kernel log buffer."""
@@ -68,8 +67,6 @@ class LxDmesg(gdb.Command):
off = prb_data_ring_type.get_type()['data'].bitpos // 8
text_data_addr = utils.read_ulong(text_data_ring, off)

- counter_off = atomic_long_type.get_type()['counter'].bitpos // 8
-
sv_off = prb_desc_type.get_type()['state_var'].bitpos // 8

off = prb_desc_type.get_type()['text_blk_lpos'].bitpos // 8
@@ -89,9 +86,9 @@ class LxDmesg(gdb.Command):

# read in tail and head descriptor ids
off = prb_desc_ring_type.get_type()['tail_id'].bitpos // 8
- tail_id = utils.read_u64(desc_ring, off + counter_off)
+ tail_id = utils.read_atomic_long(desc_ring, off)
off = prb_desc_ring_type.get_type()['head_id'].bitpos // 8
- head_id = utils.read_u64(desc_ring, off + counter_off)
+ head_id = utils.read_atomic_long(desc_ring, off)

did = tail_id
while True:
@@ -102,7 +99,7 @@ class LxDmesg(gdb.Command):
desc = utils.read_memoryview(inf, desc_addr + desc_off, desc_sz).tobytes()

# skip non-committed record
- state = 3 & (utils.read_u64(desc, sv_off + counter_off) >> desc_flags_shift)
+ state = 3 & (utils.read_atomic_long(desc, sv_off) >> desc_flags_shift)
if state != desc_committed and state != desc_finalized:
if did == head_id:
break
diff --git a/scripts/gdb/linux/utils.py b/scripts/gdb/linux/utils.py
index ff7c1799d588..1553f68716cc 100644
--- a/scripts/gdb/linux/utils.py
+++ b/scripts/gdb/linux/utils.py
@@ -35,13 +35,12 @@ class CachedType:


long_type = CachedType("long")
-
+atomic_long_type = CachedType("atomic_long_t")

def get_long_type():
global long_type
return long_type.get_type()

-
def offset_of(typeobj, field):
element = gdb.Value(0).cast(typeobj)
return int(str(element[field].address).split()[0], 16)
@@ -129,6 +128,17 @@ def read_ulong(buffer, offset):
else:
return read_u32(buffer, offset)

+atomic_long_counter_offset = atomic_long_type.get_type()['counter'].bitpos
+atomic_long_counter_sizeof = atomic_long_type.get_type()['counter'].type.sizeof
+
+def read_atomic_long(buffer, offset):
+ global atomic_long_counter_offset
+ global atomic_long_counter_sizeof
+
+ if atomic_long_counter_sizeof == 8:
+ return read_u64(buffer, offset + atomic_long_counter_offset)
+ else:
+ return read_u32(buffer, offset + atomic_long_counter_offset)

target_arch = None

diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h
index c24d4e1063ea..ffc4e7bad205 100644
--- a/security/selinux/ss/policydb.h
+++ b/security/selinux/ss/policydb.h
@@ -370,6 +370,8 @@ static inline int put_entry(const void *buf, size_t bytes, int num, struct polic
{
size_t len = bytes * num;

+ if (len > fp->len)
+ return -EINVAL;
memcpy(fp->data, buf, len);
fp->data += len;
fp->len -= len;
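
The put_entry() change above bounds every write against the space left in the output buffer instead of trusting the precomputed policy length, and the services.c change below frees the buffer again if the read fails part way. A small plain-C sketch of the same bounded-write shape:

#include <errno.h>
#include <stddef.h>
#include <string.h>

struct outbuf {
	char   *data;
	size_t  len;		/* bytes remaining in the buffer */
};

static int put_bytes(struct outbuf *fp, const void *buf, size_t bytes)
{
	if (bytes > fp->len)
		return -EINVAL;	/* would overrun the destination */
	memcpy(fp->data, buf, bytes);
	fp->data += bytes;
	fp->len  -= bytes;
	return 0;
}
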
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 6901dc07680d..cad54f454d01 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -4049,6 +4049,7 @@ int security_read_policy(struct selinux_state *state,
int security_read_state_kernel(struct selinux_state *state,
void **data, size_t *len)
{
+ int err;
struct selinux_policy *policy;

policy = rcu_dereference_protected(
@@ -4061,5 +4062,11 @@ int security_read_state_kernel(struct selinux_state *state,
if (!*data)
return -ENOMEM;

- return __security_read_policy(policy, *data, len);
+ err = __security_read_policy(policy, *data, len);
+ if (err) {
+ vfree(*data);
+ *data = NULL;
+ *len = 0;
+ }
+ return err;
}
diff --git a/sound/pci/hda/patch_cirrus.c b/sound/pci/hda/patch_cirrus.c
index 678fbcaf2a3b..6807b4708a17 100644
--- a/sound/pci/hda/patch_cirrus.c
+++ b/sound/pci/hda/patch_cirrus.c
@@ -395,6 +395,7 @@ static const struct snd_pci_quirk cs420x_fixup_tbl[] = {

/* codec SSID */
SND_PCI_QUIRK(0x106b, 0x0600, "iMac 14,1", CS420X_IMAC27_122),
+ SND_PCI_QUIRK(0x106b, 0x0900, "iMac 12,1", CS420X_IMAC27_122),
SND_PCI_QUIRK(0x106b, 0x1c00, "MacBookPro 8,1", CS420X_MBP81),
SND_PCI_QUIRK(0x106b, 0x2000, "iMac 12,2", CS420X_IMAC27_122),
SND_PCI_QUIRK(0x106b, 0x2800, "MacBookPro 10,1", CS420X_MBP101),
diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c
index 8c68e6e1387e..2bc9274e0960 100644
--- a/sound/pci/hda/patch_conexant.c
+++ b/sound/pci/hda/patch_conexant.c
@@ -222,6 +222,7 @@ enum {
CXT_PINCFG_LEMOTE_A1205,
CXT_PINCFG_COMPAQ_CQ60,
CXT_FIXUP_STEREO_DMIC,
+ CXT_PINCFG_LENOVO_NOTEBOOK,
CXT_FIXUP_INC_MIC_BOOST,
CXT_FIXUP_HEADPHONE_MIC_PIN,
CXT_FIXUP_HEADPHONE_MIC,
@@ -772,6 +773,14 @@ static const struct hda_fixup cxt_fixups[] = {
.type = HDA_FIXUP_FUNC,
.v.func = cxt_fixup_stereo_dmic,
},
+ [CXT_PINCFG_LENOVO_NOTEBOOK] = {
+ .type = HDA_FIXUP_PINS,
+ .v.pins = (const struct hda_pintbl[]) {
+ { 0x1a, 0x05d71030 },
+ { }
+ },
+ .chain_id = CXT_FIXUP_STEREO_DMIC,
+ },
[CXT_FIXUP_INC_MIC_BOOST] = {
.type = HDA_FIXUP_FUNC,
.v.func = cxt5066_increase_mic_boost,
@@ -971,7 +980,7 @@ static const struct snd_pci_quirk cxt5066_fixups[] = {
SND_PCI_QUIRK(0x17aa, 0x3905, "Lenovo G50-30", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x390b, "Lenovo G50-80", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x3975, "Lenovo U300s", CXT_FIXUP_STEREO_DMIC),
- SND_PCI_QUIRK(0x17aa, 0x3977, "Lenovo IdeaPad U310", CXT_FIXUP_STEREO_DMIC),
+ SND_PCI_QUIRK(0x17aa, 0x3977, "Lenovo IdeaPad U310", CXT_PINCFG_LENOVO_NOTEBOOK),
SND_PCI_QUIRK(0x17aa, 0x3978, "Lenovo G50-70", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x397b, "Lenovo S205", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK_VENDOR(0x17aa, "Thinkpad", CXT_FIXUP_THINKPAD_ACPI),
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index b7f1f2fb60cb..9c7335721cd0 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6839,6 +6839,43 @@ static void alc_fixup_dell4_mic_no_presence_quiet(struct hda_codec *codec,
}
}

+static void alc287_fixup_yoga9_14iap7_bass_spk_pin(struct hda_codec *codec,
+ const struct hda_fixup *fix, int action)
+{
+ /*
+ * The Pin Complex 0x17 for the bass speakers is wrongly reported as
+ * unconnected.
+ */
+ static const struct hda_pintbl pincfgs[] = {
+ { 0x17, 0x90170121 },
+ { }
+ };
+ /*
+ * Avoid DAC 0x06 and 0x08, as they have no volume controls.
+ * DAC 0x02 and 0x03 would be fine.
+ */
+ static const hda_nid_t conn[] = { 0x02, 0x03 };
+ /*
+ * Prefer both speakerbar (0x14) and bass speakers (0x17) connected to DAC 0x02.
+ * Headphones (0x21) are connected to DAC 0x03.
+ */
+ static const hda_nid_t preferred_pairs[] = {
+ 0x14, 0x02,
+ 0x17, 0x02,
+ 0x21, 0x03,
+ 0
+ };
+ struct alc_spec *spec = codec->spec;
+
+ switch (action) {
+ case HDA_FIXUP_ACT_PRE_PROBE:
+ snd_hda_apply_pincfgs(codec, pincfgs);
+ snd_hda_override_conn_list(codec, 0x17, ARRAY_SIZE(conn), conn);
+ spec->gen.preferred_dacs = preferred_pairs;
+ break;
+ }
+}
+
enum {
ALC269_FIXUP_GPIO2,
ALC269_FIXUP_SONY_VAIO,
@@ -6894,6 +6931,7 @@ enum {
ALC269_FIXUP_LIMIT_INT_MIC_BOOST,
ALC269VB_FIXUP_ASUS_ZENBOOK,
ALC269VB_FIXUP_ASUS_ZENBOOK_UX31A,
+ ALC269VB_FIXUP_ASUS_MIC_NO_PRESENCE,
ALC269_FIXUP_LIMIT_INT_MIC_BOOST_MUTE_LED,
ALC269VB_FIXUP_ORDISSIMO_EVE2,
ALC283_FIXUP_CHROME_BOOK,
@@ -7075,6 +7113,8 @@ enum {
ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED,
ALC285_FIXUP_HP_SPEAKERS_MICMUTE_LED,
ALC295_FIXUP_FRAMEWORK_LAPTOP_MIC_NO_PRESENCE,
+ ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK,
+ ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN,
};

/* A special fixup for Lenovo C940 and Yoga Duet 7;
@@ -7479,6 +7519,15 @@ static const struct hda_fixup alc269_fixups[] = {
.chained = true,
.chain_id = ALC269VB_FIXUP_ASUS_ZENBOOK,
},
+ [ALC269VB_FIXUP_ASUS_MIC_NO_PRESENCE] = {
+ .type = HDA_FIXUP_PINS,
+ .v.pins = (const struct hda_pintbl[]) {
+ { 0x18, 0x01a110f0 }, /* use as headset mic */
+ { }
+ },
+ .chained = true,
+ .chain_id = ALC269_FIXUP_HEADSET_MIC
+ },
[ALC269_FIXUP_LIMIT_INT_MIC_BOOST_MUTE_LED] = {
.type = HDA_FIXUP_FUNC,
.v.func = alc269_fixup_limit_int_mic_boost,
@@ -8917,6 +8966,74 @@ static const struct hda_fixup alc269_fixups[] = {
.chained = true,
.chain_id = ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC
},
+ [ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK] = {
+ .type = HDA_FIXUP_VERBS,
+ .v.verbs = (const struct hda_verb[]) {
+ // enable left speaker
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x24 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x41 },
+
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x26 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xc },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x1a },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xb020 },
+
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x26 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xf },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x42 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xb020 },
+
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x26 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x10 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x40 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xb020 },
+
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x26 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x2 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xb020 },
+
+ // enable right speaker
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x24 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x46 },
+
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x26 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xc },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x2a },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xb020 },
+
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x26 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xf },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x46 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xb020 },
+
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x26 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x10 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x44 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xb020 },
+
+ { 0x20, AC_VERB_SET_COEF_INDEX, 0x26 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x2 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0x0 },
+ { 0x20, AC_VERB_SET_PROC_COEF, 0xb020 },
+
+ { },
+ },
+ },
+ [ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN] = {
+ .type = HDA_FIXUP_FUNC,
+ .v.func = alc287_fixup_yoga9_14iap7_bass_spk_pin,
+ .chained = true,
+ .chain_id = ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK,
+ },
};

static const struct snd_pci_quirk alc269_fixup_tbl[] = {
@@ -9096,6 +9213,8 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x861f, "HP Elite Dragonfly G1", ALC285_FIXUP_HP_GPIO_AMP_INIT),
SND_PCI_QUIRK(0x103c, 0x869d, "HP", ALC236_FIXUP_HP_MUTE_LED),
SND_PCI_QUIRK(0x103c, 0x86c7, "HP Envy AiO 32", ALC274_FIXUP_HP_ENVY_GPIO),
+ SND_PCI_QUIRK(0x103c, 0x86e7, "HP Spectre x360 15-eb0xxx", ALC285_FIXUP_HP_SPECTRE_X360_EB1),
+ SND_PCI_QUIRK(0x103c, 0x86e8, "HP Spectre x360 15-eb0xxx", ALC285_FIXUP_HP_SPECTRE_X360_EB1),
SND_PCI_QUIRK(0x103c, 0x8716, "HP Elite Dragonfly G2 Notebook PC", ALC285_FIXUP_HP_GPIO_AMP_INIT),
SND_PCI_QUIRK(0x103c, 0x8720, "HP EliteBook x360 1040 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_AMP_INIT),
SND_PCI_QUIRK(0x103c, 0x8724, "HP EliteBook 850 G7", ALC285_FIXUP_HP_GPIO_LED),
@@ -9111,6 +9230,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
ALC285_FIXUP_HP_GPIO_AMP_INIT),
SND_PCI_QUIRK(0x103c, 0x8783, "HP ZBook Fury 15 G7 Mobile Workstation",
ALC285_FIXUP_HP_GPIO_AMP_INIT),
+ SND_PCI_QUIRK(0x103c, 0x8786, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED),
SND_PCI_QUIRK(0x103c, 0x8787, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED),
SND_PCI_QUIRK(0x103c, 0x8788, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED),
SND_PCI_QUIRK(0x103c, 0x87c8, "HP", ALC287_FIXUP_HP_GPIO_LED),
@@ -9180,6 +9300,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1043, 0x12a0, "ASUS X441UV", ALC233_FIXUP_EAPD_COEF_AND_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1043, 0x12e0, "ASUS X541SA", ALC256_FIXUP_ASUS_MIC),
SND_PCI_QUIRK(0x1043, 0x12f0, "ASUS X541UV", ALC256_FIXUP_ASUS_MIC),
+ SND_PCI_QUIRK(0x1043, 0x1313, "Asus K42JZ", ALC269VB_FIXUP_ASUS_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1043, 0x13b0, "ASUS Z550SA", ALC256_FIXUP_ASUS_MIC),
SND_PCI_QUIRK(0x1043, 0x1427, "Asus Zenbook UX31E", ALC269VB_FIXUP_ASUS_ZENBOOK),
SND_PCI_QUIRK(0x1043, 0x1517, "Asus Zenbook UX31A", ALC269VB_FIXUP_ASUS_ZENBOOK_UX31A),
@@ -9255,6 +9376,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1558, 0x4018, "Clevo NV40M[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0x4019, "Clevo NV40MZ", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0x4020, "Clevo NV40MB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1558, 0x4041, "Clevo NV4[15]PZ", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0x40a1, "Clevo NL40GU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0x40c1, "Clevo NL40[CZ]U", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0x40d1, "Clevo NL41DU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
@@ -9367,6 +9489,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x17aa, 0x3176, "ThinkCentre Station", ALC283_FIXUP_HEADSET_MIC),
SND_PCI_QUIRK(0x17aa, 0x3178, "ThinkCentre Station", ALC283_FIXUP_HEADSET_MIC),
SND_PCI_QUIRK(0x17aa, 0x31af, "ThinkCentre Station", ALC623_FIXUP_LENOVO_THINKSTATION_P340),
+ SND_PCI_QUIRK(0x17aa, 0x3801, "Lenovo Yoga9 14IAP7", ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN),
SND_PCI_QUIRK(0x17aa, 0x3802, "Lenovo Yoga DuetITL 2021", ALC287_FIXUP_YOGA7_14ITL_SPEAKERS),
SND_PCI_QUIRK(0x17aa, 0x3813, "Legion 7i 15IMHG05", ALC287_FIXUP_LEGION_15IMHG05_SPEAKERS),
SND_PCI_QUIRK(0x17aa, 0x3818, "Lenovo C940 / Yoga Duet 7", ALC298_FIXUP_LENOVO_C940_DUET7),
@@ -9612,6 +9735,7 @@ static const struct hda_model_fixup alc269_fixup_models[] = {
{.id = ALC285_FIXUP_HP_SPECTRE_X360, .name = "alc285-hp-spectre-x360"},
{.id = ALC285_FIXUP_HP_SPECTRE_X360_EB1, .name = "alc285-hp-spectre-x360-eb1"},
{.id = ALC287_FIXUP_IDEAPAD_BASS_SPK_AMP, .name = "alc287-ideapad-bass-spk-amp"},
+ {.id = ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN, .name = "alc287-yoga9-bass-spk-pin"},
{.id = ALC623_FIXUP_LENOVO_THINKSTATION_P340, .name = "alc623-lenovo-thinkstation-p340"},
{.id = ALC255_FIXUP_ACER_HEADPHONE_AND_MIC, .name = "alc255-acer-headphone-and-mic"},
{.id = ALC285_FIXUP_HP_GPIO_AMP_INIT, .name = "alc285-hp-amp-init"},
diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c
index 959b70e8baf2..7d3d7d33fa20 100644
--- a/sound/soc/amd/yc/acp6x-mach.c
+++ b/sound/soc/amd/yc/acp6x-mach.c
@@ -104,28 +104,14 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
.driver_data = &acp6x_card,
.matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_NAME, "21AW"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "21CM"),
}
},
{
.driver_data = &acp6x_card,
.matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_NAME, "21AX"),
- }
- },
- {
- .driver_data = &acp6x_card,
- .matches = {
- DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_NAME, "21BN"),
- }
- },
- {
- .driver_data = &acp6x_card,
- .matches = {
- DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_NAME, "21BQ"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "21CN"),
}
},
{
@@ -156,20 +142,6 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
DMI_MATCH(DMI_PRODUCT_NAME, "21CL"),
}
},
- {
- .driver_data = &acp6x_card,
- .matches = {
- DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_NAME, "21D8"),
- }
- },
- {
- .driver_data = &acp6x_card,
- .matches = {
- DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_NAME, "21D9"),
- }
- },
{}
};

diff --git a/sound/soc/atmel/mchp-spdifrx.c b/sound/soc/atmel/mchp-spdifrx.c
index 5fc968483f2c..a7baa0385ec5 100644
--- a/sound/soc/atmel/mchp-spdifrx.c
+++ b/sound/soc/atmel/mchp-spdifrx.c
@@ -288,15 +288,17 @@ static void mchp_spdifrx_isr_blockend_en(struct mchp_spdifrx_dev *dev)
spin_unlock_irqrestore(&dev->blockend_lock, flags);
}

-/* called from atomic context only */
+/* called from atomic/non-atomic context */
static void mchp_spdifrx_isr_blockend_dis(struct mchp_spdifrx_dev *dev)
{
- spin_lock(&dev->blockend_lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&dev->blockend_lock, flags);
dev->blockend_refcount--;
/* don't enable BLOCKEND interrupt if it's already enabled */
if (dev->blockend_refcount == 0)
regmap_write(dev->regmap, SPDIFRX_IDR, SPDIFRX_IR_BLOCKEND);
- spin_unlock(&dev->blockend_lock);
+ spin_unlock_irqrestore(&dev->blockend_lock, flags);
}

static irqreturn_t mchp_spdif_interrupt(int irq, void *dev_id)
@@ -575,6 +577,7 @@ static int mchp_spdifrx_subcode_ch_get(struct mchp_spdifrx_dev *dev,
if (ret <= 0) {
dev_dbg(dev->dev, "user data for channel %d timeout\n",
channel);
+ mchp_spdifrx_isr_blockend_dis(dev);
return ret;
}
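
With the change above the blockend helper can be reached from both interrupt and process context, so it has to use spin_lock_irqsave()/spin_unlock_irqrestore(), which save and restore the caller's interrupt state. A minimal sketch of the pattern, using a made-up device structure and helper:

#include <linux/spinlock.h>
#include <linux/printk.h>

struct example_dev {
	spinlock_t lock;
	int refcount;
};

/* Safe to call from an IRQ handler or from process context. */
static void example_refcount_dec(struct example_dev *dev)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->lock, flags);
	dev->refcount--;
	if (dev->refcount == 0)
		pr_debug("last user gone, disable the interrupt source here\n");
	spin_unlock_irqrestore(&dev->lock, flags);
}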

diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
index 9b92e1a0d1a3..43561ff1bb8d 100644
--- a/sound/soc/codecs/cros_ec_codec.c
+++ b/sound/soc/codecs/cros_ec_codec.c
@@ -994,6 +994,7 @@ static int cros_ec_codec_platform_probe(struct platform_device *pdev)
dev_dbg(dev, "ap_shm_phys_addr=%#llx len=%#x\n",
priv->ap_shm_phys_addr, priv->ap_shm_len);
}
+ of_node_put(node);
}
#endif
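
Several fixes in this release follow the same rule illustrated above: a node returned by of_parse_phandle() or of_get_child_by_name() carries a reference that must be dropped with of_node_put() on every exit path. A short sketch with made-up property names:

#include <linux/device.h>
#include <linux/of.h>

static int example_parse_codec(struct device *dev)
{
	struct device_node *node;
	u32 rate;
	int ret;

	node = of_parse_phandle(dev->of_node, "example,codec", 0);
	if (!node)
		return -EINVAL;

	ret = of_property_read_u32(node, "example,rate", &rate);
	of_node_put(node);	/* drop the reference on success and on error */
	return ret;
}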

diff --git a/sound/soc/codecs/da7210.c b/sound/soc/codecs/da7210.c
index 8af344b2fdbf..d75d15006f64 100644
--- a/sound/soc/codecs/da7210.c
+++ b/sound/soc/codecs/da7210.c
@@ -1336,6 +1336,8 @@ static int __init da7210_modinit(void)
int ret = 0;
#if IS_ENABLED(CONFIG_I2C)
ret = i2c_add_driver(&da7210_i2c_driver);
+ if (ret)
+ return ret;
#endif
#if defined(CONFIG_SPI_MASTER)
ret = spi_register_driver(&da7210_spi_driver);
diff --git a/sound/soc/codecs/msm8916-wcd-digital.c b/sound/soc/codecs/msm8916-wcd-digital.c
index 20a07c92b2fc..098a58990f07 100644
--- a/sound/soc/codecs/msm8916-wcd-digital.c
+++ b/sound/soc/codecs/msm8916-wcd-digital.c
@@ -328,8 +328,8 @@ static const struct snd_kcontrol_new rx1_mix2_inp1_mux = SOC_DAPM_ENUM(
static const struct snd_kcontrol_new rx2_mix2_inp1_mux = SOC_DAPM_ENUM(
"RX2 MIX2 INP1 Mux", rx2_mix2_inp1_chain_enum);

-/* Digital Gain control -38.4 dB to +38.4 dB in 0.3 dB steps */
-static const DECLARE_TLV_DB_SCALE(digital_gain, -3840, 30, 0);
+/* Digital Gain control -84 dB to +40 dB in 1 dB steps */
+static const DECLARE_TLV_DB_SCALE(digital_gain, -8400, 100, -8400);

/* Cutoff Freq for High Pass Filter at -3dB */
static const char * const hpf_cutoff_text[] = {
@@ -510,15 +510,15 @@ static int wcd_iir_filter_info(struct snd_kcontrol *kcontrol,

static const struct snd_kcontrol_new msm8916_wcd_digital_snd_controls[] = {
SOC_SINGLE_S8_TLV("RX1 Digital Volume", LPASS_CDC_RX1_VOL_CTL_B2_CTL,
- -128, 127, digital_gain),
+ -84, 40, digital_gain),
SOC_SINGLE_S8_TLV("RX2 Digital Volume", LPASS_CDC_RX2_VOL_CTL_B2_CTL,
- -128, 127, digital_gain),
+ -84, 40, digital_gain),
SOC_SINGLE_S8_TLV("RX3 Digital Volume", LPASS_CDC_RX3_VOL_CTL_B2_CTL,
- -128, 127, digital_gain),
+ -84, 40, digital_gain),
SOC_SINGLE_S8_TLV("TX1 Digital Volume", LPASS_CDC_TX1_VOL_CTL_GAIN,
- -128, 127, digital_gain),
+ -84, 40, digital_gain),
SOC_SINGLE_S8_TLV("TX2 Digital Volume", LPASS_CDC_TX2_VOL_CTL_GAIN,
- -128, 127, digital_gain),
+ -84, 40, digital_gain),
SOC_ENUM("TX1 HPF Cutoff", tx1_hpf_cutoff_enum),
SOC_ENUM("TX2 HPF Cutoff", tx2_hpf_cutoff_enum),
SOC_SINGLE("TX1 HPF Switch", LPASS_CDC_TX1_MUX_CTL, 3, 1, 0),
@@ -553,22 +553,22 @@ static const struct snd_kcontrol_new msm8916_wcd_digital_snd_controls[] = {
WCD_IIR_FILTER_CTL("IIR2 Band3", IIR2, BAND3),
WCD_IIR_FILTER_CTL("IIR2 Band4", IIR2, BAND4),
WCD_IIR_FILTER_CTL("IIR2 Band5", IIR2, BAND5),
- SOC_SINGLE_SX_TLV("IIR1 INP1 Volume", LPASS_CDC_IIR1_GAIN_B1_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("IIR1 INP2 Volume", LPASS_CDC_IIR1_GAIN_B2_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("IIR1 INP3 Volume", LPASS_CDC_IIR1_GAIN_B3_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("IIR1 INP4 Volume", LPASS_CDC_IIR1_GAIN_B4_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("IIR2 INP1 Volume", LPASS_CDC_IIR2_GAIN_B1_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("IIR2 INP2 Volume", LPASS_CDC_IIR2_GAIN_B2_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("IIR2 INP3 Volume", LPASS_CDC_IIR2_GAIN_B3_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("IIR2 INP4 Volume", LPASS_CDC_IIR2_GAIN_B4_CTL,
- 0, -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("IIR1 INP1 Volume", LPASS_CDC_IIR1_GAIN_B1_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("IIR1 INP2 Volume", LPASS_CDC_IIR1_GAIN_B2_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("IIR1 INP3 Volume", LPASS_CDC_IIR1_GAIN_B3_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("IIR1 INP4 Volume", LPASS_CDC_IIR1_GAIN_B4_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("IIR2 INP1 Volume", LPASS_CDC_IIR2_GAIN_B1_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("IIR2 INP2 Volume", LPASS_CDC_IIR2_GAIN_B2_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("IIR2 INP3 Volume", LPASS_CDC_IIR2_GAIN_B3_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("IIR2 INP4 Volume", LPASS_CDC_IIR2_GAIN_B4_CTL,
+ -84, 40, digital_gain),

};

diff --git a/sound/soc/codecs/mt6359-accdet.c b/sound/soc/codecs/mt6359-accdet.c
index 6d3d170144a0..c190628e2905 100644
--- a/sound/soc/codecs/mt6359-accdet.c
+++ b/sound/soc/codecs/mt6359-accdet.c
@@ -675,6 +675,7 @@ static int mt6359_accdet_parse_dt(struct mt6359_accdet *priv)
sizeof(struct three_key_threshold));
}

+ of_node_put(node);
dev_warn(priv->dev, "accdet caps=%x\n", priv->caps);

return 0;
diff --git a/sound/soc/codecs/mt6359.c b/sound/soc/codecs/mt6359.c
index f8532aa7e4aa..9a9c8555f720 100644
--- a/sound/soc/codecs/mt6359.c
+++ b/sound/soc/codecs/mt6359.c
@@ -2780,6 +2780,7 @@ static int mt6359_parse_dt(struct mt6359_priv *priv)

ret = of_property_read_u32(np, "mediatek,mic-type-2",
&priv->mux_select[MUX_MIC_TYPE_2]);
+ of_node_put(np);
if (ret) {
dev_info(priv->dev,
"%s() failed to read mic-type-2, use default (%d)\n",
diff --git a/sound/soc/codecs/wcd9335.c b/sound/soc/codecs/wcd9335.c
index aa685980a97b..b7c5bfc44127 100644
--- a/sound/soc/codecs/wcd9335.c
+++ b/sound/soc/codecs/wcd9335.c
@@ -2259,51 +2259,42 @@ static int wcd9335_rx_hph_mode_put(struct snd_kcontrol *kc,

static const struct snd_kcontrol_new wcd9335_snd_controls[] = {
/* -84dB min - 40dB max */
- SOC_SINGLE_SX_TLV("RX0 Digital Volume", WCD9335_CDC_RX0_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX1 Digital Volume", WCD9335_CDC_RX1_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX2 Digital Volume", WCD9335_CDC_RX2_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX3 Digital Volume", WCD9335_CDC_RX3_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX4 Digital Volume", WCD9335_CDC_RX4_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX5 Digital Volume", WCD9335_CDC_RX5_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX6 Digital Volume", WCD9335_CDC_RX6_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX7 Digital Volume", WCD9335_CDC_RX7_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX8 Digital Volume", WCD9335_CDC_RX8_RX_VOL_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX0 Mix Digital Volume",
- WCD9335_CDC_RX0_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX1 Mix Digital Volume",
- WCD9335_CDC_RX1_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX2 Mix Digital Volume",
- WCD9335_CDC_RX2_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX3 Mix Digital Volume",
- WCD9335_CDC_RX3_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX4 Mix Digital Volume",
- WCD9335_CDC_RX4_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX5 Mix Digital Volume",
- WCD9335_CDC_RX5_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX6 Mix Digital Volume",
- WCD9335_CDC_RX6_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX7 Mix Digital Volume",
- WCD9335_CDC_RX7_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
- SOC_SINGLE_SX_TLV("RX8 Mix Digital Volume",
- WCD9335_CDC_RX8_RX_VOL_MIX_CTL,
- 0, -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX0 Digital Volume", WCD9335_CDC_RX0_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX1 Digital Volume", WCD9335_CDC_RX1_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX2 Digital Volume", WCD9335_CDC_RX2_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX3 Digital Volume", WCD9335_CDC_RX3_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX4 Digital Volume", WCD9335_CDC_RX4_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX5 Digital Volume", WCD9335_CDC_RX5_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX6 Digital Volume", WCD9335_CDC_RX6_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX7 Digital Volume", WCD9335_CDC_RX7_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX8 Digital Volume", WCD9335_CDC_RX8_RX_VOL_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX0 Mix Digital Volume", WCD9335_CDC_RX0_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX1 Mix Digital Volume", WCD9335_CDC_RX1_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX2 Mix Digital Volume", WCD9335_CDC_RX2_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX3 Mix Digital Volume", WCD9335_CDC_RX3_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX4 Mix Digital Volume", WCD9335_CDC_RX4_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX5 Mix Digital Volume", WCD9335_CDC_RX5_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX6 Mix Digital Volume", WCD9335_CDC_RX6_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX7 Mix Digital Volume", WCD9335_CDC_RX7_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
+ SOC_SINGLE_S8_TLV("RX8 Mix Digital Volume", WCD9335_CDC_RX8_RX_VOL_MIX_CTL,
+ -84, 40, digital_gain),
SOC_ENUM("RX INT0_1 HPF cut off", cf_int0_1_enum),
SOC_ENUM("RX INT0_2 HPF cut off", cf_int0_2_enum),
SOC_ENUM("RX INT1_1 HPF cut off", cf_int1_1_enum),
diff --git a/sound/soc/codecs/wsa881x.c b/sound/soc/codecs/wsa881x.c
index 616b26c70c3b..649c7e73f774 100644
--- a/sound/soc/codecs/wsa881x.c
+++ b/sound/soc/codecs/wsa881x.c
@@ -1174,11 +1174,17 @@ static int __maybe_unused wsa881x_runtime_resume(struct device *dev)
struct sdw_slave *slave = dev_to_sdw_dev(dev);
struct regmap *regmap = dev_get_regmap(dev, NULL);
struct wsa881x_priv *wsa881x = dev_get_drvdata(dev);
+ unsigned long time;

gpiod_direction_output(wsa881x->sd_n, 1);

- wait_for_completion_timeout(&slave->initialization_complete,
- msecs_to_jiffies(WSA881X_PROBE_TIMEOUT));
+ time = wait_for_completion_timeout(&slave->initialization_complete,
+ msecs_to_jiffies(WSA881X_PROBE_TIMEOUT));
+ if (!time) {
+ dev_err(dev, "Initialization not complete, timed out\n");
+ gpiod_direction_output(wsa881x->sd_n, 0);
+ return -ETIMEDOUT;
+ }

regcache_cache_only(regmap, false);
regcache_sync(regmap);
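
wait_for_completion_timeout() returns 0 on timeout and the number of remaining jiffies otherwise, which is why the resume path above now checks its result instead of ignoring it. A hedged sketch of the same check, with illustrative names:

#include <linux/completion.h>
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/jiffies.h>

static int example_wait_ready(struct device *dev, struct completion *done)
{
	unsigned long time;

	time = wait_for_completion_timeout(done, msecs_to_jiffies(1000));
	if (!time) {
		dev_err(dev, "device not ready, timed out\n");
		return -ETIMEDOUT;
	}
	return 0;
}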
diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index d9a0d4768c4d..c836848ef0a6 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -537,6 +537,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
struct device *codec_dev = NULL;
const char *codec_dai_name;
const char *codec_dev_name;
+ u32 asrc_fmt = 0;
u32 width;
int ret;

@@ -829,8 +830,8 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
goto asrc_fail;
}

- ret = of_property_read_u32(asrc_np, "fsl,asrc-format",
- &priv->asrc_format);
+ ret = of_property_read_u32(asrc_np, "fsl,asrc-format", &asrc_fmt);
+ priv->asrc_format = (__force snd_pcm_format_t)asrc_fmt;
if (ret) {
/* Fallback to old binding; translate to asrc_format */
ret = of_property_read_u32(asrc_np, "fsl,asrc-width",
diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index d7d1536a4f37..44dcbf49456c 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -1066,6 +1066,7 @@ static int fsl_asrc_probe(struct platform_device *pdev)
struct resource *res;
void __iomem *regs;
int irq, ret, i;
+ u32 asrc_fmt = 0;
u32 map_idx;
char tmp[16];
u32 width;
@@ -1174,7 +1175,8 @@ static int fsl_asrc_probe(struct platform_device *pdev)
return ret;
}

- ret = of_property_read_u32(np, "fsl,asrc-format", &asrc->asrc_format);
+ ret = of_property_read_u32(np, "fsl,asrc-format", &asrc_fmt);
+ asrc->asrc_format = (__force snd_pcm_format_t)asrc_fmt;
if (ret) {
ret = of_property_read_u32(np, "fsl,asrc-width", &width);
if (ret) {
@@ -1197,7 +1199,7 @@ static int fsl_asrc_probe(struct platform_device *pdev)
}
}

- if (!(FSL_ASRC_FORMATS & (1ULL << asrc->asrc_format))) {
+ if (!(FSL_ASRC_FORMATS & pcm_format_to_bits(asrc->asrc_format))) {
dev_warn(&pdev->dev, "unsupported width, use default S24_LE\n");
asrc->asrc_format = SNDRV_PCM_FORMAT_S24_LE;
}
diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index be14f84796cb..cf0e10d17dbe 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -476,7 +476,8 @@ static int fsl_easrc_prefilter_config(struct fsl_asrc *easrc,
struct fsl_asrc_pair *ctx;
struct device *dev;
u32 inrate, outrate, offset = 0;
- u32 in_s_rate, out_s_rate, in_s_fmt, out_s_fmt;
+ u32 in_s_rate, out_s_rate;
+ snd_pcm_format_t in_s_fmt, out_s_fmt;
int ret, i;

if (!easrc)
@@ -1873,6 +1874,7 @@ static int fsl_easrc_probe(struct platform_device *pdev)
struct resource *res;
struct device_node *np;
void __iomem *regs;
+ u32 asrc_fmt = 0;
int ret, irq;

easrc = devm_kzalloc(dev, sizeof(*easrc), GFP_KERNEL);
@@ -1933,13 +1935,14 @@ static int fsl_easrc_probe(struct platform_device *pdev)
return ret;
}

- ret = of_property_read_u32(np, "fsl,asrc-format", &easrc->asrc_format);
+ ret = of_property_read_u32(np, "fsl,asrc-format", &asrc_fmt);
+ easrc->asrc_format = (__force snd_pcm_format_t)asrc_fmt;
if (ret) {
dev_err(dev, "failed to asrc format\n");
return ret;
}

- if (!(FSL_EASRC_FORMATS & (1ULL << easrc->asrc_format))) {
+ if (!(FSL_EASRC_FORMATS & (pcm_format_to_bits(easrc->asrc_format)))) {
dev_warn(dev, "unsupported format, switching to S24_LE\n");
easrc->asrc_format = SNDRV_PCM_FORMAT_S24_LE;
}
diff --git a/sound/soc/fsl/fsl_easrc.h b/sound/soc/fsl/fsl_easrc.h
index 30620d56252c..5b8469757c12 100644
--- a/sound/soc/fsl/fsl_easrc.h
+++ b/sound/soc/fsl/fsl_easrc.h
@@ -569,7 +569,7 @@ struct fsl_easrc_io_params {
unsigned int access_len;
unsigned int fifo_wtmk;
unsigned int sample_rate;
- unsigned int sample_format;
+ snd_pcm_format_t sample_format;
unsigned int norm_rate;
};

diff --git a/sound/soc/fsl/imx-audmux.c b/sound/soc/fsl/imx-audmux.c
index dfa05d40b276..a8e5e0f57faf 100644
--- a/sound/soc/fsl/imx-audmux.c
+++ b/sound/soc/fsl/imx-audmux.c
@@ -298,7 +298,7 @@ static int imx_audmux_probe(struct platform_device *pdev)
audmux_clk = NULL;
}

- audmux_type = (enum imx_audmux_type)of_device_get_match_data(&pdev->dev);
+ audmux_type = (uintptr_t)of_device_get_match_data(&pdev->dev);

switch (audmux_type) {
case IMX31_AUDMUX:
diff --git a/sound/soc/fsl/imx-card.c b/sound/soc/fsl/imx-card.c
index 6f8efd838fcc..4a8609b0d700 100644
--- a/sound/soc/fsl/imx-card.c
+++ b/sound/soc/fsl/imx-card.c
@@ -17,6 +17,9 @@

#include "fsl_sai.h"

+#define IMX_CARD_MCLK_22P5792MHZ 22579200
+#define IMX_CARD_MCLK_24P576MHZ 24576000
+
enum codec_type {
CODEC_DUMMY = 0,
CODEC_AK5558 = 1,
@@ -115,7 +118,7 @@ struct imx_card_data {
struct snd_soc_card card;
int num_dapm_routes;
u32 asrc_rate;
- u32 asrc_format;
+ snd_pcm_format_t asrc_format;
};

static struct imx_akcodec_fs_mul ak4458_fs_mul[] = {
@@ -353,9 +356,14 @@ static int imx_aif_hw_params(struct snd_pcm_substream *substream,
mclk_freq = akcodec_get_mclk_rate(substream, params, slots, slot_width);
else
mclk_freq = params_rate(params) * slots * slot_width;
- /* Use the maximum freq from DSD512 (512*44100 = 22579200) */
- if (format_is_dsd(params))
- mclk_freq = 22579200;
+
+ if (format_is_dsd(params)) {
+ /* Use the maximum freq from DSD512 (512*44100 = 22579200) */
+ if (!(params_rate(params) % 11025))
+ mclk_freq = IMX_CARD_MCLK_22P5792MHZ;
+ else
+ mclk_freq = IMX_CARD_MCLK_24P576MHZ;
+ }

ret = snd_soc_dai_set_sysclk(cpu_dai, link_data->cpu_sysclk_id, mclk_freq,
SND_SOC_CLOCK_OUT);
@@ -466,7 +474,7 @@ static int be_hw_params_fixup(struct snd_soc_pcm_runtime *rtd,

mask = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT);
snd_mask_none(mask);
- snd_mask_set(mask, data->asrc_format);
+ snd_mask_set(mask, (__force unsigned int)data->asrc_format);

return 0;
}
@@ -485,6 +493,7 @@ static int imx_card_parse_of(struct imx_card_data *data)
struct dai_link_data *link_data;
struct of_phandle_args args;
int ret, num_links;
+ u32 asrc_fmt = 0;
u32 width;

ret = snd_soc_of_parse_card_name(card, "model");
@@ -631,7 +640,8 @@ static int imx_card_parse_of(struct imx_card_data *data)
goto err;
}

- ret = of_property_read_u32(args.np, "fsl,asrc-format", &data->asrc_format);
+ ret = of_property_read_u32(args.np, "fsl,asrc-format", &asrc_fmt);
+ data->asrc_format = (__force snd_pcm_format_t)asrc_fmt;
if (ret) {
/* Fallback to old binding; translate to asrc_format */
ret = of_property_read_u32(args.np, "fsl,asrc-width", &width);
diff --git a/sound/soc/generic/audio-graph-card.c b/sound/soc/generic/audio-graph-card.c
index 2b598af8feef..b327372f2e4a 100644
--- a/sound/soc/generic/audio-graph-card.c
+++ b/sound/soc/generic/audio-graph-card.c
@@ -158,8 +158,10 @@ static int asoc_simple_parse_dai(struct device_node *ep,
* if he unbinded CPU or Codec.
*/
ret = snd_soc_get_dai_name(&args, &dlc->dai_name);
- if (ret < 0)
+ if (ret < 0) {
+ of_node_put(node);
return ret;
+ }

dlc->of_node = node;

diff --git a/sound/soc/generic/audio-graph-card2.c b/sound/soc/generic/audio-graph-card2.c
index c0f3907a01fd..da782403316f 100644
--- a/sound/soc/generic/audio-graph-card2.c
+++ b/sound/soc/generic/audio-graph-card2.c
@@ -229,7 +229,8 @@ enum graph_type {

static enum graph_type __graph_get_type(struct device_node *lnk)
{
- struct device_node *np;
+ struct device_node *np, *parent_np;
+ enum graph_type ret;

/*
* target {
@@ -240,19 +241,33 @@ static enum graph_type __graph_get_type(struct device_node *lnk)
* };
*/
np = of_get_parent(lnk);
- if (of_node_name_eq(np, "ports"))
- np = of_get_parent(np);
+ if (of_node_name_eq(np, "ports")) {
+ parent_np = of_get_parent(np);
+ of_node_put(np);
+ np = parent_np;
+ }
+
+ if (of_node_name_eq(np, GRAPH_NODENAME_MULTI)) {
+ ret = GRAPH_MULTI;
+ goto out_put;
+ }
+
+ if (of_node_name_eq(np, GRAPH_NODENAME_DPCM)) {
+ ret = GRAPH_DPCM;
+ goto out_put;
+ }

- if (of_node_name_eq(np, GRAPH_NODENAME_MULTI))
- return GRAPH_MULTI;
+ if (of_node_name_eq(np, GRAPH_NODENAME_C2C)) {
+ ret = GRAPH_C2C;
+ goto out_put;
+ }

- if (of_node_name_eq(np, GRAPH_NODENAME_DPCM))
- return GRAPH_DPCM;
+ ret = GRAPH_NORMAL;

- if (of_node_name_eq(np, GRAPH_NODENAME_C2C))
- return GRAPH_C2C;
+out_put:
+ of_node_put(np);
+ return ret;

- return GRAPH_NORMAL;
}

static enum graph_type graph_get_type(struct asoc_simple_priv *priv,
@@ -430,8 +445,10 @@ static int asoc_simple_parse_dai(struct device_node *ep,
* if he unbinded CPU or Codec.
*/
ret = snd_soc_get_dai_name(&args, &dlc->dai_name);
- if (ret < 0)
+ if (ret < 0) {
+ of_node_put(node);
return ret;
+ }

dlc->of_node = node;

@@ -856,7 +873,7 @@ int audio_graph2_link_c2c(struct asoc_simple_priv *priv,
struct device_node *port0, *port1, *ports;
struct device_node *codec0_port, *codec1_port;
struct device_node *ep0, *ep1;
- u32 val;
+ u32 val = 0;
int ret = -EINVAL;

/*
@@ -880,7 +897,8 @@ int audio_graph2_link_c2c(struct asoc_simple_priv *priv,
ports = of_get_parent(port0);
port1 = of_get_next_child(ports, lnk);

- if (!of_get_property(ports, "rate", &val)) {
+ of_property_read_u32(ports, "rate", &val);
+ if (!val) {
struct device *dev = simple_priv_to_dev(priv);

dev_err(dev, "Codec2Codec needs rate settings\n");
diff --git a/sound/soc/intel/boards/sof_rt5682.c b/sound/soc/intel/boards/sof_rt5682.c
index 7126fcb63d90..d9f83b04be87 100644
--- a/sound/soc/intel/boards/sof_rt5682.c
+++ b/sound/soc/intel/boards/sof_rt5682.c
@@ -435,6 +435,15 @@ static int sof_card_late_probe(struct snd_soc_card *card)
int err;
int i = 0;

+ if (sof_rt5682_quirk & SOF_MAX98373_SPEAKER_AMP_PRESENT) {
+ /* Disable Left and Right Spk pin after boot */
+ snd_soc_dapm_disable_pin(dapm, "Left Spk");
+ snd_soc_dapm_disable_pin(dapm, "Right Spk");
+ err = snd_soc_dapm_sync(dapm);
+ if (err < 0)
+ return err;
+ }
+
/* HDMI is not supported by SOF on Baytrail/CherryTrail */
if (is_legacy_cpu || !ctx->idisp_codec)
return 0;
@@ -468,15 +477,6 @@ static int sof_card_late_probe(struct snd_soc_card *card)
i++;
}

- if (sof_rt5682_quirk & SOF_MAX98373_SPEAKER_AMP_PRESENT) {
- /* Disable Left and Right Spk pin after boot */
- snd_soc_dapm_disable_pin(dapm, "Left Spk");
- snd_soc_dapm_disable_pin(dapm, "Right Spk");
- err = snd_soc_dapm_sync(dapm);
- if (err < 0)
- return err;
- }
-
return hdac_hdmi_jack_port_init(component, &card->dapm);
}

diff --git a/sound/soc/mediatek/mt6797/mt6797-mt6351.c b/sound/soc/mediatek/mt6797/mt6797-mt6351.c
index 496f32bcfb5e..d2f6213a6bfc 100644
--- a/sound/soc/mediatek/mt6797/mt6797-mt6351.c
+++ b/sound/soc/mediatek/mt6797/mt6797-mt6351.c
@@ -217,7 +217,8 @@ static int mt6797_mt6351_dev_probe(struct platform_device *pdev)
if (!codec_node) {
dev_err(&pdev->dev,
"Property 'audio-codec' missing or invalid\n");
- return -EINVAL;
+ ret = -EINVAL;
+ goto put_platform_node;
}
for_each_card_prelinks(card, i, dai_link) {
if (dai_link->codecs->name)
@@ -230,6 +231,9 @@ static int mt6797_mt6351_dev_probe(struct platform_device *pdev)
dev_err(&pdev->dev, "%s snd_soc_register_card fail %d\n",
__func__, ret);

+ of_node_put(codec_node);
+put_platform_node:
+ of_node_put(platform_node);
return ret;
}

diff --git a/sound/soc/mediatek/mt8173/mt8173-rt5650-rt5676.c b/sound/soc/mediatek/mt8173/mt8173-rt5650-rt5676.c
index 5716d9299066..c1e8633a74a7 100644
--- a/sound/soc/mediatek/mt8173/mt8173-rt5650-rt5676.c
+++ b/sound/soc/mediatek/mt8173/mt8173-rt5650-rt5676.c
@@ -256,14 +256,16 @@ static int mt8173_rt5650_rt5676_dev_probe(struct platform_device *pdev)
if (!mt8173_rt5650_rt5676_dais[DAI_LINK_CODEC_I2S].codecs[0].of_node) {
dev_err(&pdev->dev,
"Property 'audio-codec' missing or invalid\n");
- return -EINVAL;
+ ret = -EINVAL;
+ goto put_node;
}
mt8173_rt5650_rt5676_dais[DAI_LINK_CODEC_I2S].codecs[1].of_node =
of_parse_phandle(pdev->dev.of_node, "mediatek,audio-codec", 1);
if (!mt8173_rt5650_rt5676_dais[DAI_LINK_CODEC_I2S].codecs[1].of_node) {
dev_err(&pdev->dev,
"Property 'audio-codec' missing or invalid\n");
- return -EINVAL;
+ ret = -EINVAL;
+ goto put_node;
}
mt8173_rt5650_rt5676_codec_conf[0].dlc.of_node =
mt8173_rt5650_rt5676_dais[DAI_LINK_CODEC_I2S].codecs[1].of_node;
@@ -276,13 +278,15 @@ static int mt8173_rt5650_rt5676_dev_probe(struct platform_device *pdev)
if (!mt8173_rt5650_rt5676_dais[DAI_LINK_HDMI_I2S].codecs->of_node) {
dev_err(&pdev->dev,
"Property 'audio-codec' missing or invalid\n");
- return -EINVAL;
+ ret = -EINVAL;
+ goto put_node;
}

card->dev = &pdev->dev;

ret = devm_snd_soc_register_card(&pdev->dev, card);

+put_node:
of_node_put(platform_node);
return ret;
}
diff --git a/sound/soc/mediatek/mt8173/mt8173-rt5650.c b/sound/soc/mediatek/mt8173/mt8173-rt5650.c
index fc164f4f95f8..487807ee7019 100644
--- a/sound/soc/mediatek/mt8173/mt8173-rt5650.c
+++ b/sound/soc/mediatek/mt8173/mt8173-rt5650.c
@@ -280,7 +280,8 @@ static int mt8173_rt5650_dev_probe(struct platform_device *pdev)
if (!mt8173_rt5650_dais[DAI_LINK_CODEC_I2S].codecs[0].of_node) {
dev_err(&pdev->dev,
"Property 'audio-codec' missing or invalid\n");
- return -EINVAL;
+ ret = -EINVAL;
+ goto put_platform_node;
}
mt8173_rt5650_dais[DAI_LINK_CODEC_I2S].codecs[1].of_node =
mt8173_rt5650_dais[DAI_LINK_CODEC_I2S].codecs[0].of_node;
@@ -293,7 +294,7 @@ static int mt8173_rt5650_dev_probe(struct platform_device *pdev)
dev_err(&pdev->dev,
"%s codec_capture_dai name fail %d\n",
__func__, ret);
- return ret;
+ goto put_platform_node;
}
mt8173_rt5650_dais[DAI_LINK_CODEC_I2S].codecs[1].dai_name =
codec_capture_dai;
@@ -315,12 +316,14 @@ static int mt8173_rt5650_dev_probe(struct platform_device *pdev)
if (!mt8173_rt5650_dais[DAI_LINK_HDMI_I2S].codecs->of_node) {
dev_err(&pdev->dev,
"Property 'audio-codec' missing or invalid\n");
- return -EINVAL;
+ ret = -EINVAL;
+ goto put_platform_node;
}
card->dev = &pdev->dev;

ret = devm_snd_soc_register_card(&pdev->dev, card);

+put_platform_node:
of_node_put(platform_node);
return ret;
}
diff --git a/sound/soc/qcom/lpass-cpu.c b/sound/soc/qcom/lpass-cpu.c
index e6846ad2b5fa..964eb07f46d6 100644
--- a/sound/soc/qcom/lpass-cpu.c
+++ b/sound/soc/qcom/lpass-cpu.c
@@ -1090,6 +1090,7 @@ int asoc_qcom_lpass_cpu_platform_probe(struct platform_device *pdev)
dsp_of_node = of_parse_phandle(pdev->dev.of_node, "qcom,adsp", 0);
if (dsp_of_node) {
dev_err(dev, "DSP exists and holds audio resources\n");
+ of_node_put(dsp_of_node);
return -EBUSY;
}

diff --git a/sound/soc/qcom/qdsp6/q6adm.c b/sound/soc/qcom/qdsp6/q6adm.c
index 72c5719f1d25..a0678e8cf20a 100644
--- a/sound/soc/qcom/qdsp6/q6adm.c
+++ b/sound/soc/qcom/qdsp6/q6adm.c
@@ -217,7 +217,7 @@ static struct q6copp *q6adm_alloc_copp(struct q6adm *adm, int port_idx)
idx = find_first_zero_bit(&adm->copp_bitmap[port_idx],
MAX_COPPS_PER_PORT);

- if (idx > MAX_COPPS_PER_PORT)
+ if (idx >= MAX_COPPS_PER_PORT)
return ERR_PTR(-EBUSY);

c = kzalloc(sizeof(*c), GFP_ATOMIC);
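
find_first_zero_bit() returns the bitmap size when no zero bit is found, so a full bitmap has to be detected with ">=" rather than ">", which is what the one-character fix above corrects. A small sketch under that assumption (locking omitted for brevity):

#include <linux/bitops.h>
#include <linux/errno.h>

static int example_alloc_slot(unsigned long *bitmap, unsigned int size)
{
	unsigned int idx = find_first_zero_bit(bitmap, size);

	if (idx >= size)
		return -EBUSY;	/* every slot is already in use */
	set_bit(idx, bitmap);
	return idx;
}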
diff --git a/sound/soc/samsung/aries_wm8994.c b/sound/soc/samsung/aries_wm8994.c
index 83acbe57b248..a0825da9fff9 100644
--- a/sound/soc/samsung/aries_wm8994.c
+++ b/sound/soc/samsung/aries_wm8994.c
@@ -628,8 +628,10 @@ static int aries_audio_probe(struct platform_device *pdev)
return -EINVAL;

codec = of_get_child_by_name(dev->of_node, "codec");
- if (!codec)
- return -EINVAL;
+ if (!codec) {
+ ret = -EINVAL;
+ goto out;
+ }

for_each_card_prelinks(card, i, dai_link) {
dai_link->codecs->of_node = of_parse_phandle(codec,
diff --git a/sound/soc/samsung/h1940_uda1380.c b/sound/soc/samsung/h1940_uda1380.c
index c994e67d1eaf..ca086243fcfd 100644
--- a/sound/soc/samsung/h1940_uda1380.c
+++ b/sound/soc/samsung/h1940_uda1380.c
@@ -8,7 +8,7 @@
// Based on version from Arnaud Patard <arnaud.patard@xxxxxxxxxxx>

#include <linux/types.h>
-#include <linux/gpio.h>
+#include <linux/gpio/consumer.h>
#include <linux/module.h>

#include <sound/soc.h>
diff --git a/sound/soc/samsung/rx1950_uda1380.c b/sound/soc/samsung/rx1950_uda1380.c
index 6ea1c8cc9167..2820097b00b9 100644
--- a/sound/soc/samsung/rx1950_uda1380.c
+++ b/sound/soc/samsung/rx1950_uda1380.c
@@ -128,7 +128,7 @@ static int rx1950_startup(struct snd_pcm_substream *substream)
&hw_rates);
}

-struct gpio_desc *gpiod_speaker_power;
+static struct gpio_desc *gpiod_speaker_power;

static int rx1950_spk_power(struct snd_soc_dapm_widget *w,
struct snd_kcontrol *kcontrol, int event)
@@ -227,7 +227,7 @@ static int rx1950_probe(struct platform_device *pdev)
return devm_snd_soc_register_card(dev, &rx1950_asoc);
}

-struct platform_driver rx1950_audio = {
+static struct platform_driver rx1950_audio = {
.driver = {
.name = "rx1950-audio",
.pm = &snd_soc_pm_ops,
diff --git a/sound/soc/sof/ipc3-topology.c b/sound/soc/sof/ipc3-topology.c
index 80fb82ece38d..74c230fbcf68 100644
--- a/sound/soc/sof/ipc3-topology.c
+++ b/sound/soc/sof/ipc3-topology.c
@@ -1629,6 +1629,7 @@ static int sof_ipc3_control_load_bytes(struct snd_sof_dev *sdev, struct snd_sof_
return 0;
err:
kfree(scontrol->ipc_control_data);
+ scontrol->ipc_control_data = NULL;
return ret;
}

diff --git a/sound/soc/sof/mediatek/mt8195/mt8195-loader.c b/sound/soc/sof/mediatek/mt8195/mt8195-loader.c
index ed18d6379e92..ef2664c3cd47 100644
--- a/sound/soc/sof/mediatek/mt8195/mt8195-loader.c
+++ b/sound/soc/sof/mediatek/mt8195/mt8195-loader.c
@@ -21,7 +21,7 @@ void sof_hifixdsp_boot_sequence(struct snd_sof_dev *sdev, u32 boot_addr)

/* pull high StatVectorSel to use AltResetVec (set bit4 to 1) */
snd_sof_dsp_update_bits(sdev, DSP_REG_BAR, DSP_RESET_SW,
- DSP_RESET_SW, DSP_RESET_SW);
+ STATVECTOR_SEL, STATVECTOR_SEL);

/* toggle DReset & BReset */
/* pull high DReset & BReset */
diff --git a/sound/soc/sof/sof-priv.h b/sound/soc/sof/sof-priv.h
index c856f0d84e49..f369242dd975 100644
--- a/sound/soc/sof/sof-priv.h
+++ b/sound/soc/sof/sof-priv.h
@@ -364,8 +364,8 @@ struct snd_sof_ipc_msg {

/**
* struct sof_ipc_pm_ops - IPC-specific PM ops
- * @ctx_save: Function pointer for context save
- * @ctx_restore: Function pointer for context restore
+ * @ctx_save: Optional function pointer for context save
+ * @ctx_restore: Optional function pointer for context restore
*/
struct sof_ipc_pm_ops {
int (*ctx_save)(struct snd_sof_dev *sdev);
diff --git a/sound/usb/bcd2000/bcd2000.c b/sound/usb/bcd2000/bcd2000.c
index cd4a0bc6d278..7aec0a95c609 100644
--- a/sound/usb/bcd2000/bcd2000.c
+++ b/sound/usb/bcd2000/bcd2000.c
@@ -348,7 +348,8 @@ static int bcd2000_init_midi(struct bcd2000 *bcd2k)
static void bcd2000_free_usb_related_resources(struct bcd2000 *bcd2k,
struct usb_interface *interface)
{
- /* usb_kill_urb not necessary, urb is aborted automatically */
+ usb_kill_urb(bcd2k->midi_out_urb);
+ usb_kill_urb(bcd2k->midi_in_urb);

usb_free_urb(bcd2k->midi_out_urb);
usb_free_urb(bcd2k->midi_in_urb);
diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c
index 968d90caeefa..168fd802d70b 100644
--- a/sound/usb/quirks.c
+++ b/sound/usb/quirks.c
@@ -1843,6 +1843,8 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
QUIRK_FLAG_SHARE_MEDIA_DEVICE | QUIRK_FLAG_ALIGN_TRANSFER),
DEVICE_FLG(0x1395, 0x740a, /* Sennheiser DECT */
QUIRK_FLAG_GET_SAMPLE_RATE),
+ DEVICE_FLG(0x1397, 0x0507, /* Behringer UMC202HD */
+ QUIRK_FLAG_PLAYBACK_FIRST | QUIRK_FLAG_GENERIC_IMPLICIT_FB),
DEVICE_FLG(0x1397, 0x0508, /* Behringer UMC204HD */
QUIRK_FLAG_PLAYBACK_FIRST | QUIRK_FLAG_GENERIC_IMPLICIT_FB),
DEVICE_FLG(0x1397, 0x0509, /* Behringer UMC404HD */
diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
index 97dec81950e5..6353a789322b 100644
--- a/tools/bpf/bpftool/link.c
+++ b/tools/bpf/bpftool/link.c
@@ -20,6 +20,10 @@ static const char * const link_type_name[] = {
[BPF_LINK_TYPE_CGROUP] = "cgroup",
[BPF_LINK_TYPE_ITER] = "iter",
[BPF_LINK_TYPE_NETNS] = "netns",
+ [BPF_LINK_TYPE_XDP] = "xdp",
+ [BPF_LINK_TYPE_PERF_EVENT] = "perf_event",
+ [BPF_LINK_TYPE_KPROBE_MULTI] = "kprobe_multi",
+ [BPF_LINK_TYPE_STRUCT_OPS] = "struct_ops",
};

static struct hashmap *link_table;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d14b10b85e51..ff9af73859ca 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1013,6 +1013,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
BPF_LINK_TYPE_KPROBE_MULTI = 8,
+ BPF_LINK_TYPE_STRUCT_OPS = 9,

MAX_BPF_LINK_TYPE,
};
@@ -5143,6 +5144,17 @@ union bpf_attr {
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed.
+ *
+ * void *bpf_kptr_xchg(void *map_value, void *ptr)
+ * Description
+ * Exchange kptr at pointer *map_value* with *ptr*, and return the
+ * old value. *ptr* can be NULL, otherwise it must be a referenced
+ * pointer which will be released when this helper is called.
+ * Return
+ * The old value of kptr (which can be NULL). The returned pointer
+ * if not NULL, is a reference which must be released using its
+ * corresponding release function, or moved into a BPF map before
+ * program exit.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5339,6 +5351,7 @@ union bpf_attr {
FN(copy_from_user_task), \
FN(skb_set_tstamp), \
FN(ima_file_hash), \
+ FN(kptr_xchg), \
/* */

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
index e3a8c947e89f..479f514f7411 100644
--- a/tools/lib/bpf/bpf_tracing.h
+++ b/tools/lib/bpf/bpf_tracing.h
@@ -227,7 +227,7 @@ struct pt_regs___arm64 {
#define __PT_PARM5_REG a4
#define __PT_RET_REG ra
#define __PT_FP_REG s0
-#define __PT_RC_REG a5
+#define __PT_RC_REG a0
#define __PT_SP_REG sp
#define __PT_IP_REG pc
/* riscv does not select ARCH_HAS_SYSCALL_WRAPPER. */
diff --git a/tools/lib/bpf/gen_loader.c b/tools/lib/bpf/gen_loader.c
index 927745b08014..23f5c46708f8 100644
--- a/tools/lib/bpf/gen_loader.c
+++ b/tools/lib/bpf/gen_loader.c
@@ -533,7 +533,7 @@ void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *attach_name,
gen->attach_kind = kind;
ret = snprintf(gen->attach_target, sizeof(gen->attach_target), "%s%s",
prefix, attach_name);
- if (ret == sizeof(gen->attach_target))
+ if (ret >= sizeof(gen->attach_target))
gen->error = -ENOSPC;
}
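
snprintf() returns the length the fully formatted string would have had, so a value greater than or equal to the buffer size means the output was truncated; the ">=" check above and the similar selftest change further down depend on that. An illustrative userspace sketch:

#include <errno.h>
#include <stdio.h>

static int example_build_name(char *buf, size_t size,
			      const char *prefix, const char *name)
{
	int ret = snprintf(buf, size, "%s%s", prefix, name);

	if (ret < 0 || (size_t)ret >= size)
		return -ENOSPC;	/* truncated (or an encoding error) */
	return 0;
}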

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 881ea905ca81..2323d3e38a8e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4271,7 +4271,7 @@ static int bpf_get_map_info_from_fdinfo(int fd, struct bpf_map_info *info)
int bpf_map__reuse_fd(struct bpf_map *map, int fd)
{
struct bpf_map_info info = {};
- __u32 len = sizeof(info);
+ __u32 len = sizeof(info), name_len;
int new_fd, err;
char *new_name;

@@ -4281,7 +4281,12 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
if (err)
return libbpf_err(err);

- new_name = strdup(info.name);
+ name_len = strlen(info.name);
+ if (name_len == BPF_OBJ_NAME_LEN - 1 && strncmp(map->name, info.name, name_len) == 0)
+ new_name = strdup(map->name);
+ else
+ new_name = strdup(info.name);
+
if (!new_name)
return libbpf_err(-errno);

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index af136f73b09d..67dc010e9fe3 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -1147,8 +1147,6 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
goto out_mmap_tx;
}

- ctx->prog_fd = -1;
-
if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
err = __xsk_setup_xdp_prog(xsk, NULL);
if (err)
@@ -1229,7 +1227,10 @@ void xsk_socket__delete(struct xsk_socket *xsk)

ctx = xsk->ctx;
umem = ctx->umem;
- if (ctx->prog_fd != -1) {
+
+ xsk_put_ctx(ctx, true);
+
+ if (!ctx->refcount) {
xsk_delete_bpf_maps(xsk);
close(ctx->prog_fd);
if (ctx->has_bpf_link)
@@ -1248,8 +1249,6 @@ void xsk_socket__delete(struct xsk_socket *xsk)
}
}

- xsk_put_ctx(ctx, true);
-
umem->refcount--;
/* Do not close an fd that also has an associated umem connected
* to it.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f058e8cddfa8..e15fd6f9f7ea 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1663,12 +1663,6 @@ static int add_default_attributes(void)
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

-};
- struct perf_event_attr default_sw_attrs[] = {
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
};

/*
@@ -1910,30 +1904,6 @@ static int add_default_attributes(void)
}

if (!evsel_list->core.nr_entries) {
- if (perf_pmu__has_hybrid()) {
- struct parse_events_error errinfo;
- const char *hybrid_str = "cycles,instructions,branches,branch-misses";
-
- if (target__has_cpu(&target))
- default_sw_attrs[0].config = PERF_COUNT_SW_CPU_CLOCK;
-
- if (evlist__add_default_attrs(evsel_list,
- default_sw_attrs) < 0) {
- return -1;
- }
-
- parse_events_error__init(&errinfo);
- err = parse_events(evsel_list, hybrid_str, &errinfo);
- if (err) {
- fprintf(stderr,
- "Cannot set up hybrid events %s: %d\n",
- hybrid_str, err);
- parse_events_error__print(&errinfo, hybrid_str);
- }
- parse_events_error__exit(&errinfo);
- return err ? -1 : 0;
- }
-
if (target__has_cpu(&target))
default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;

diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index b97366f77bbf..2bd23e4cf19e 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -23,8 +23,19 @@ static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
if (a->ino > b->ino) return -1;
if (a->ino < b->ino) return 1;

- if (a->ino_generation > b->ino_generation) return -1;
- if (a->ino_generation < b->ino_generation) return 1;
+ /*
+ * Synthesized MMAP events have zero ino_generation, avoid comparing
+ * them with MMAP events with actual ino_generation.
+ *
+ * I found it harmful because the mismatch resulted in a new
+ * dso that did not have a build ID whereas the original dso did have a
+ * build ID. The build ID was essential because the object was not found
+ * otherwise. - Adrian
+ */
+ if (a->ino_generation && b->ino_generation) {
+ if (a->ino_generation > b->ino_generation) return -1;
+ if (a->ino_generation < b->ino_generation) return 1;
+ }

return 0;
}
diff --git a/tools/perf/util/genelf.c b/tools/perf/util/genelf.c
index aed49806a09b..953338b9e887 100644
--- a/tools/perf/util/genelf.c
+++ b/tools/perf/util/genelf.c
@@ -30,7 +30,11 @@

#define BUILD_ID_URANDOM /* different uuid for each run */

-#ifdef HAVE_LIBCRYPTO
+// FIXME, remove this and fix the deprecation warnings before its removed and
+// We'll break for good here...
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+
+#ifdef HAVE_LIBCRYPTO_SUPPORT

#define BUILD_ID_MD5
#undef BUILD_ID_SHA /* does not seem to work well when linked with Java */
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index ef6ced5c5746..cb7b24493782 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1294,16 +1294,29 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,

if (elf_read_program_header(syms_ss->elf,
(u64)sym.st_value, &phdr)) {
- pr_warning("%s: failed to find program header for "
+ pr_debug4("%s: failed to find program header for "
"symbol: %s st_value: %#" PRIx64 "\n",
__func__, elf_name, (u64)sym.st_value);
- continue;
+ pr_debug4("%s: adjusting symbol: st_value: %#" PRIx64 " "
+ "sh_addr: %#" PRIx64 " sh_offset: %#" PRIx64 "\n",
+ __func__, (u64)sym.st_value, (u64)shdr.sh_addr,
+ (u64)shdr.sh_offset);
+ /*
+ * Fail to find program header, let's rollback
+ * to use shdr.sh_addr and shdr.sh_offset to
+ * calibrate symbol's file address, though this
+ * is not necessary for normal C ELF file, we
+ * still need to handle java JIT symbols in this
+ * case.
+ */
+ sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+ } else {
+ pr_debug4("%s: adjusting symbol: st_value: %#" PRIx64 " "
+ "p_vaddr: %#" PRIx64 " p_offset: %#" PRIx64 "\n",
+ __func__, (u64)sym.st_value, (u64)phdr.p_vaddr,
+ (u64)phdr.p_offset);
+ sym.st_value -= phdr.p_vaddr - phdr.p_offset;
}
- pr_debug4("%s: adjusting symbol: st_value: %#" PRIx64 " "
- "p_vaddr: %#" PRIx64 " p_offset: %#" PRIx64 "\n",
- __func__, (u64)sym.st_value, (u64)phdr.p_vaddr,
- (u64)phdr.p_offset);
- sym.st_value -= phdr.p_vaddr - phdr.p_offset;
}

demangled = demangle_sym(dso, kmodule, elf_name);
diff --git a/tools/power/x86/intel-speed-select/isst-daemon.c b/tools/power/x86/intel-speed-select/isst-daemon.c
index dd372924bc82..d0400c6684ba 100644
--- a/tools/power/x86/intel-speed-select/isst-daemon.c
+++ b/tools/power/x86/intel-speed-select/isst-daemon.c
@@ -41,7 +41,7 @@ void process_level_change(int cpu)
time_t tm;
int ret;

- if (pkg_id >= MAX_PACKAGE_COUNT || die_id > MAX_DIE_PER_PACKAGE) {
+ if (pkg_id >= MAX_PACKAGE_COUNT || die_id >= MAX_DIE_PER_PACKAGE) {
debug_printf("Invalid package/die info for cpu:%d\n", cpu);
return;
}
diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
index ec823561b912..a294176f8a9d 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf.c
@@ -5226,7 +5226,7 @@ static void do_test_pprint(int test_num)
ret = snprintf(pin_path, sizeof(pin_path), "%s/%s",
"/sys/fs/bpf", test->map_name);

- if (CHECK(ret == sizeof(pin_path), "pin_path %s/%s is too long",
+ if (CHECK(ret >= sizeof(pin_path), "pin_path %s/%s is too long",
"/sys/fs/bpf", test->map_name)) {
err = -1;
goto done;
diff --git a/tools/testing/selftests/bpf/prog_tests/fexit_stress.c b/tools/testing/selftests/bpf/prog_tests/fexit_stress.c
index 3ee2107bbf7a..58b03d1a70c8 100644
--- a/tools/testing/selftests/bpf/prog_tests/fexit_stress.c
+++ b/tools/testing/selftests/bpf/prog_tests/fexit_stress.c
@@ -7,11 +7,9 @@

void test_fexit_stress(void)
{
- char test_skb[128] = {};
int fexit_fd[CNT] = {};
int link_fd[CNT] = {};
- char error[4096];
- int err, i, filter_fd;
+ int err, i;

const struct bpf_insn trace_program[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -20,25 +18,9 @@ void test_fexit_stress(void)

LIBBPF_OPTS(bpf_prog_load_opts, trace_opts,
.expected_attach_type = BPF_TRACE_FEXIT,
- .log_buf = error,
- .log_size = sizeof(error),
);

- const struct bpf_insn skb_program[] = {
- BPF_MOV64_IMM(BPF_REG_0, 0),
- BPF_EXIT_INSN(),
- };
-
- LIBBPF_OPTS(bpf_prog_load_opts, skb_opts,
- .log_buf = error,
- .log_size = sizeof(error),
- );
-
- LIBBPF_OPTS(bpf_test_run_opts, topts,
- .data_in = test_skb,
- .data_size_in = sizeof(test_skb),
- .repeat = 1,
- );
+ LIBBPF_OPTS(bpf_test_run_opts, topts);

err = libbpf_find_vmlinux_btf_id("bpf_fentry_test1",
trace_opts.expected_attach_type);
@@ -58,15 +40,9 @@ void test_fexit_stress(void)
goto out;
}

- filter_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL",
- skb_program, sizeof(skb_program) / sizeof(struct bpf_insn),
- &skb_opts);
- if (!ASSERT_GE(filter_fd, 0, "test_program_loaded"))
- goto out;
+ err = bpf_prog_test_run_opts(fexit_fd[0], &topts);
+ ASSERT_OK(err, "bpf_prog_test_run_opts");

- err = bpf_prog_test_run_opts(filter_fd, &topts);
- close(filter_fd);
- CHECK_FAIL(err);
out:
for (i = 0; i < CNT; i++) {
if (link_fd[i])
diff --git a/tools/testing/selftests/bpf/prog_tests/sock_fields.c b/tools/testing/selftests/bpf/prog_tests/sock_fields.c
index 9d211b5c22c4..7d23166c77af 100644
--- a/tools/testing/selftests/bpf/prog_tests/sock_fields.c
+++ b/tools/testing/selftests/bpf/prog_tests/sock_fields.c
@@ -394,7 +394,6 @@ void serial_test_sock_fields(void)
test();

done:
- test_sock_fields__detach(skel);
test_sock_fields__destroy(skel);
if (child_cg_fd >= 0)
close(child_cg_fd);
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
index 7ad66a247c02..b2e415647bd7 100644
--- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
@@ -646,7 +646,7 @@ static void test_tcp_clear_dtime(struct test_tc_dtime *skel)
__u32 *errs = skel->bss->errs[t];

skel->bss->test = t;
- test_inet_dtime(AF_INET6, SOCK_STREAM, IP6_DST, 0);
+ test_inet_dtime(AF_INET6, SOCK_STREAM, IP6_DST, 50000 + t);

ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
dtime_cnt_str(t, INGRESS_FWDNS_P100));
@@ -683,7 +683,7 @@ static void test_tcp_dtime(struct test_tc_dtime *skel, int family, bool bpf_fwd)
errs = skel->bss->errs[t];

skel->bss->test = t;
- test_inet_dtime(family, SOCK_STREAM, addr, 0);
+ test_inet_dtime(family, SOCK_STREAM, addr, 50000 + t);

/* fwdns_prio100 prog does not read delivery_time_type, so
* kernel puts the (rcv) timetamp in __sk_buff->tstamp
@@ -715,13 +715,13 @@ static void test_udp_dtime(struct test_tc_dtime *skel, int family, bool bpf_fwd)
errs = skel->bss->errs[t];

skel->bss->test = t;
- test_inet_dtime(family, SOCK_DGRAM, addr, 0);
+ test_inet_dtime(family, SOCK_DGRAM, addr, 50000 + t);

ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
dtime_cnt_str(t, INGRESS_FWDNS_P100));
/* non mono delivery time is not forwarded */
ASSERT_EQ(dtimes[INGRESS_FWDNS_P101], 0,
- dtime_cnt_str(t, INGRESS_FWDNS_P100));
+ dtime_cnt_str(t, INGRESS_FWDNS_P101));
for (i = EGRESS_FWDNS_P100; i < SET_DTIME; i++)
ASSERT_GT(dtimes[i], 0, dtime_cnt_str(t, i));

diff --git a/tools/testing/selftests/bpf/progs/test_tc_dtime.c b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
index 06f300d06dbd..b596479a9ebe 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_dtime.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
@@ -11,6 +11,8 @@
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include <sys/socket.h>
@@ -115,6 +117,19 @@ static bool bpf_fwd(void)
return test < TCP_IP4_RT_FWD;
}

+static __u8 get_proto(void)
+{
+ switch (test) {
+ case UDP_IP4:
+ case UDP_IP6:
+ case UDP_IP4_RT_FWD:
+ case UDP_IP6_RT_FWD:
+ return IPPROTO_UDP;
+ default:
+ return IPPROTO_TCP;
+ }
+}
+
/* -1: parse error: TC_ACT_SHOT
* 0: not testing traffic: TC_ACT_OK
* >0: first byte is the inet_proto, second byte has the netns
@@ -122,11 +137,16 @@ static bool bpf_fwd(void)
*/
static int skb_get_type(struct __sk_buff *skb)
{
+ __u16 dst_ns_port = __bpf_htons(50000 + test);
void *data_end = ctx_ptr(skb->data_end);
void *data = ctx_ptr(skb->data);
__u8 inet_proto = 0, ns = 0;
struct ipv6hdr *ip6h;
+ __u16 sport, dport;
struct iphdr *iph;
+ struct tcphdr *th;
+ struct udphdr *uh;
+ void *trans;

switch (skb->protocol) {
case __bpf_htons(ETH_P_IP):
@@ -138,6 +158,7 @@ static int skb_get_type(struct __sk_buff *skb)
else if (iph->saddr == ip4_dst)
ns = DST_NS;
inet_proto = iph->protocol;
+ trans = iph + 1;
break;
case __bpf_htons(ETH_P_IPV6):
ip6h = data + sizeof(struct ethhdr);
@@ -148,15 +169,43 @@ static int skb_get_type(struct __sk_buff *skb)
else if (v6_equal(ip6h->saddr, (struct in6_addr)ip6_dst))
ns = DST_NS;
inet_proto = ip6h->nexthdr;
+ trans = ip6h + 1;
break;
default:
return 0;
}

- if ((inet_proto != IPPROTO_TCP && inet_proto != IPPROTO_UDP) || !ns)
+ /* skb is not from src_ns or dst_ns.
+ * skb is not the testing IPPROTO.
+ */
+ if (!ns || inet_proto != get_proto())
return 0;

- return (ns << 8 | inet_proto);
+ switch (inet_proto) {
+ case IPPROTO_TCP:
+ th = trans;
+ if (th + 1 > data_end)
+ return -1;
+ sport = th->source;
+ dport = th->dest;
+ break;
+ case IPPROTO_UDP:
+ uh = trans;
+ if (uh + 1 > data_end)
+ return -1;
+ sport = uh->source;
+ dport = uh->dest;
+ break;
+ default:
+ return 0;
+ }
+
+ /* The skb is the testing traffic */
+ if ((ns == SRC_NS && dport == dst_ns_port) ||
+ (ns == DST_NS && sport == dst_ns_port))
+ return (ns << 8 | inet_proto);
+
+ return 0;
}

/* format: direction@iface@netns
diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
index fbd682520e47..57a83d763ec1 100644
--- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
+++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
@@ -796,7 +796,7 @@
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
- .errstr = "reference has not been acquired before",
+ .errstr = "R1 must be referenced when passed to release function",
},
{
/* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */
diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
index 86b24cad27a7..d11d0b28be41 100644
--- a/tools/testing/selftests/bpf/verifier/sock.c
+++ b/tools/testing/selftests/bpf/verifier/sock.c
@@ -417,7 +417,7 @@
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
- .errstr = "reference has not been acquired before",
+ .errstr = "R1 must be referenced when passed to release function",
},
{
"bpf_sk_release(bpf_sk_fullsock(skb->sk))",
@@ -436,7 +436,7 @@
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
- .errstr = "reference has not been acquired before",
+ .errstr = "R1 must be referenced when passed to release function",
},
{
"bpf_sk_release(bpf_tcp_sock(skb->sk))",
@@ -455,7 +455,7 @@
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
- .errstr = "reference has not been acquired before",
+ .errstr = "R1 must be referenced when passed to release function",
},
{
"sk_storage_get(map, skb->sk, NULL, 0): value == NULL",
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 33ea5e9955d9..b9984124a294 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -1423,7 +1423,7 @@ uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,

asm volatile("vmcall"
: "=a"(r)
- : "b"(a0), "c"(a1), "d"(a2), "S"(a3));
+ : "a"(nr), "b"(a0), "c"(a1), "d"(a2), "S"(a3));
return r;
}

diff --git a/tools/testing/selftests/powerpc/papr_attributes/attr_test.c b/tools/testing/selftests/powerpc/papr_attributes/attr_test.c
index bab0dc06e90b..9b655be641c9 100644
--- a/tools/testing/selftests/powerpc/papr_attributes/attr_test.c
+++ b/tools/testing/selftests/powerpc/papr_attributes/attr_test.c
@@ -7,6 +7,7 @@
* Copyright 2022, Pratik Rajesh Sampat, IBM Corp.
*/

+#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <dirent.h>
@@ -32,7 +33,7 @@ enum type {
NUM_VAL
};

-int value_type(int id)
+static int value_type(int id)
{
int val_type;

@@ -54,15 +55,21 @@ int value_type(int id)
return val_type;
}

-int verify_energy_info(void)
+static int verify_energy_info(void)
{
const char *path = "/sys/firmware/papr/energy_scale_info";
struct dirent *entry;
struct stat s;
DIR *dirp;

- if (stat(path, &s) || !S_ISDIR(s.st_mode))
- return -1;
+ errno = 0;
+ if (stat(path, &s)) {
+ SKIP_IF(errno == ENOENT);
+ FAIL_IF(errno);
+ }
+
+ FAIL_IF(!S_ISDIR(s.st_mode));
+
dirp = opendir(path);

while ((entry = readdir(dirp)) != NULL) {
@@ -76,25 +83,24 @@ int verify_energy_info(void)

id = atoi(entry->d_name);
attr_type = value_type(id);
- if (attr_type == INVALID)
- return -1;
+ FAIL_IF(attr_type == INVALID);

/* Check if the files exist and have data in them */
sprintf(file_name, "%s/%d/desc", path, id);
f = fopen(file_name, "r");
- if (!f || fgetc(f) == EOF)
- return -1;
+ FAIL_IF(!f);
+ FAIL_IF(fgetc(f) == EOF);

sprintf(file_name, "%s/%d/value", path, id);
f = fopen(file_name, "r");
- if (!f || fgetc(f) == EOF)
- return -1;
+ FAIL_IF(!f);
+ FAIL_IF(fgetc(f) == EOF);

if (attr_type == STR_VAL) {
sprintf(file_name, "%s/%d/value_desc", path, id);
f = fopen(file_name, "r");
- if (!f || fgetc(f) == EOF)
- return -1;
+ FAIL_IF(!f);
+ FAIL_IF(fgetc(f) == EOF);
}
}

diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh
index 55b2c1533282..f2768994d861 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm.sh
@@ -44,6 +44,7 @@ TORTURE_KCONFIG_KASAN_ARG=""
TORTURE_KCONFIG_KCSAN_ARG=""
TORTURE_KMAKE_ARG=""
TORTURE_QEMU_MEM=512
+torture_qemu_mem_default=1
TORTURE_REMOTE=
TORTURE_SHUTDOWN_GRACE=180
TORTURE_SUITE=rcu
@@ -163,7 +164,7 @@ do
shift
;;
--gdb)
- TORTURE_KCONFIG_GDB_ARG="CONFIG_DEBUG_INFO=y"; export TORTURE_KCONFIG_GDB_ARG
+ TORTURE_KCONFIG_GDB_ARG="CONFIG_DEBUG_INFO_NONE=n CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y"; export TORTURE_KCONFIG_GDB_ARG
TORTURE_BOOT_GDB_ARG="nokaslr"; export TORTURE_BOOT_GDB_ARG
TORTURE_QEMU_GDB_ARG="-s -S"; export TORTURE_QEMU_GDB_ARG
;;
@@ -179,7 +180,11 @@ do
shift
;;
--kasan)
- TORTURE_KCONFIG_KASAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KASAN=y"; export TORTURE_KCONFIG_KASAN_ARG
+ TORTURE_KCONFIG_KASAN_ARG="CONFIG_DEBUG_INFO_NONE=n CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y CONFIG_KASAN=y"; export TORTURE_KCONFIG_KASAN_ARG
+ if test -n "$torture_qemu_mem_default"
+ then
+ TORTURE_QEMU_MEM=2G
+ fi
;;
--kconfig|--kconfigs)
checkarg --kconfig "(Kconfig options)" $# "$2" '^CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\( CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\)*$' '^error$'
@@ -187,7 +192,7 @@ do
shift
;;
--kcsan)
- TORTURE_KCONFIG_KCSAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_STRICT=y CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y"; export TORTURE_KCONFIG_KCSAN_ARG
+ TORTURE_KCONFIG_KCSAN_ARG="CONFIG_DEBUG_INFO_NONE=n CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y CONFIG_KCSAN=y CONFIG_KCSAN_STRICT=y CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y"; export TORTURE_KCONFIG_KCSAN_ARG
;;
--kmake-arg|--kmake-args)
checkarg --kmake-arg "(kernel make arguments)" $# "$2" '.*' '^error$'
@@ -202,6 +207,7 @@ do
--memory)
checkarg --memory "(memory size)" $# "$2" '^[0-9]\+[MG]\?$' error
TORTURE_QEMU_MEM=$2
+ torture_qemu_mem_default=
shift
;;
--no-initrd)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 313bb0cbfb1e..6f65eeb6a3dd 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -802,7 +802,7 @@ void kill_thread_or_group(struct __test_metadata *_metadata,
.len = (unsigned short)ARRAY_SIZE(filter_thread),
.filter = filter_thread,
};
- int kill = kill_how == KILL_PROCESS ? SECCOMP_RET_KILL_PROCESS : 0xAAAAAAAAA;
+ int kill = kill_how == KILL_PROCESS ? SECCOMP_RET_KILL_PROCESS : 0xAAAAAAAA;
struct sock_filter filter_process[] = {
BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
offsetof(struct seccomp_data, nr)),
diff --git a/tools/testing/selftests/timers/clocksource-switch.c b/tools/testing/selftests/timers/clocksource-switch.c
index ef8eb3604595..b57f0a9be490 100644
--- a/tools/testing/selftests/timers/clocksource-switch.c
+++ b/tools/testing/selftests/timers/clocksource-switch.c
@@ -110,10 +110,10 @@ int run_tests(int secs)

sprintf(buf, "./inconsistency-check -t %i", secs);
ret = system(buf);
- if (ret)
- return ret;
+ if (WIFEXITED(ret) && WEXITSTATUS(ret))
+ return WEXITSTATUS(ret);
ret = system("./nanosleep");
- return ret;
+ return WIFEXITED(ret) ? WEXITSTATUS(ret) : 0;
}
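
The clocksource-switch change decodes the wait status that system() returns instead of passing it through raw. A small standalone sketch of that pattern (the command names are only for demonstration):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int run_checked(const char *cmd)
{
	int ret = system(cmd);

	if (ret == -1)
		return -1;			/* could not run the child at all */
	if (WIFEXITED(ret))
		return WEXITSTATUS(ret);	/* the child's real exit code */
	return 0;				/* e.g. terminated by a signal */
}

int main(void)
{
	/* "false" exits with status 1; the raw system() value is 256. */
	printf("exit status: %d\n", run_checked("false"));
	return 0;
}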


diff --git a/tools/testing/selftests/timers/valid-adjtimex.c b/tools/testing/selftests/timers/valid-adjtimex.c
index 5397de708d3c..48b9a803235a 100644
--- a/tools/testing/selftests/timers/valid-adjtimex.c
+++ b/tools/testing/selftests/timers/valid-adjtimex.c
@@ -40,7 +40,7 @@
#define ADJ_SETOFFSET 0x0100

#include <sys/syscall.h>
-static int clock_adjtime(clockid_t id, struct timex *tx)
+int clock_adjtime(clockid_t id, struct timex *tx)
{
return syscall(__NR_clock_adjtime, id, tx);
}
diff --git a/tools/testing/selftests/vm/hugepage-mremap.c b/tools/testing/selftests/vm/hugepage-mremap.c
index 1d689084a54b..22546f279534 100644
--- a/tools/testing/selftests/vm/hugepage-mremap.c
+++ b/tools/testing/selftests/vm/hugepage-mremap.c
@@ -107,7 +107,7 @@ static void register_region_with_uffd(char *addr, size_t len)

int main(int argc, char *argv[])
{
- size_t length;
+ size_t length = 0;

if (argc != 2 && argc != 3) {
printf("Usage: %s [length_in_MB] <hugetlb_file>\n", argv[0]);
diff --git a/tools/testing/selftests/vm/hugetlb-madvise.c b/tools/testing/selftests/vm/hugetlb-madvise.c
index 6c6af40f5747..3c9943131881 100644
--- a/tools/testing/selftests/vm/hugetlb-madvise.c
+++ b/tools/testing/selftests/vm/hugetlb-madvise.c
@@ -89,10 +89,11 @@ void write_fault_pages(void *addr, unsigned long nr_pages)

void read_fault_pages(void *addr, unsigned long nr_pages)
{
- unsigned long i, tmp;
+ unsigned long dummy = 0;
+ unsigned long i;

for (i = 0; i < nr_pages; i++)
- tmp += *((unsigned long *)(addr + (i * huge_page_size)));
+ dummy += *((unsigned long *)(addr + (i * huge_page_size)));
}

int main(int argc, char **argv)
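
The hugetlb-madvise hunk reads each page through a throwaway accumulator purely to fault it in. A standalone sketch of the same idea using ordinary anonymous pages; the selftest walks hugetlb pages, and the volatile cast here is an extra precaution added so this toy program's loads are not optimised away:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned long read_fault_pages(void *addr, unsigned long nr_pages,
				      unsigned long page_size)
{
	unsigned long dummy = 0, i;

	for (i = 0; i < nr_pages; i++)
		dummy += *(volatile unsigned long *)((char *)addr + i * page_size);

	return dummy;	/* returning the sum keeps the loads observable */
}

int main(void)
{
	unsigned long page_size = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, 4 * page_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* Anonymous pages are zero-filled, so the sum is 0 here. */
	printf("sum = %lu\n", read_fault_pages(p, 4, page_size));
	munmap(p, 4 * page_size);
	return 0;
}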
diff --git a/tools/testing/selftests/wireguard/qemu/arch/riscv32.config b/tools/testing/selftests/wireguard/qemu/arch/riscv32.config
index 0bd0e72d95d4..2fc36efb166d 100644
--- a/tools/testing/selftests/wireguard/qemu/arch/riscv32.config
+++ b/tools/testing/selftests/wireguard/qemu/arch/riscv32.config
@@ -1,3 +1,4 @@
+CONFIG_NONPORTABLE=y
CONFIG_ARCH_RV32I=y
CONFIG_MMU=y
CONFIG_FPU=y
diff --git a/tools/thermal/tmon/sysfs.c b/tools/thermal/tmon/sysfs.c
index b00b1bfd9d8e..cb1108bc9249 100644
--- a/tools/thermal/tmon/sysfs.c
+++ b/tools/thermal/tmon/sysfs.c
@@ -13,6 +13,7 @@
#include <stdint.h>
#include <dirent.h>
#include <libintl.h>
+#include <limits.h>
#include <ctype.h>
#include <time.h>
#include <syslog.h>
@@ -33,9 +34,9 @@ int sysfs_set_ulong(char *path, char *filename, unsigned long val)
{
FILE *fd;
int ret = -1;
- char filepath[256];
+ char filepath[PATH_MAX + 2]; /* NUL and '/' */

- snprintf(filepath, 256, "%s/%s", path, filename);
+ snprintf(filepath, sizeof(filepath), "%s/%s", path, filename);

fd = fopen(filepath, "w");
if (!fd) {
@@ -57,9 +58,9 @@ static int sysfs_get_ulong(char *path, char *filename, unsigned long *p_ulong)
{
FILE *fd;
int ret = -1;
- char filepath[256];
+ char filepath[PATH_MAX + 2]; /* NUL and '/' */

- snprintf(filepath, 256, "%s/%s", path, filename);
+ snprintf(filepath, sizeof(filepath), "%s/%s", path, filename);

fd = fopen(filepath, "r");
if (!fd) {
@@ -76,9 +77,9 @@ static int sysfs_get_string(char *path, char *filename, char *str)
{
FILE *fd;
int ret = -1;
- char filepath[256];
+ char filepath[PATH_MAX + 2]; /* NUL and '/' */

- snprintf(filepath, 256, "%s/%s", path, filename);
+ snprintf(filepath, sizeof(filepath), "%s/%s", path, filename);

fd = fopen(filepath, "r");
if (!fd) {
@@ -199,8 +200,8 @@ static int find_tzone_cdev(struct dirent *nl, char *tz_name,
{
unsigned long trip_instance = 0;
char cdev_name_linked[256];
- char cdev_name[256];
- char cdev_trip_name[256];
+ char cdev_name[PATH_MAX];
+ char cdev_trip_name[PATH_MAX];
int cdev_id;

if (nl->d_type == DT_LNK) {
@@ -213,7 +214,8 @@ static int find_tzone_cdev(struct dirent *nl, char *tz_name,
return -EINVAL;
}
/* find the link to real cooling device record binding */
- snprintf(cdev_name, 256, "%s/%s", tz_name, nl->d_name);
+ snprintf(cdev_name, sizeof(cdev_name) - 2, "%s/%s",
+ tz_name, nl->d_name);
memset(cdev_name_linked, 0, sizeof(cdev_name_linked));
if (readlink(cdev_name, cdev_name_linked,
sizeof(cdev_name_linked) - 1) != -1) {
@@ -226,8 +228,8 @@ static int find_tzone_cdev(struct dirent *nl, char *tz_name,
/* find the trip point in which the cdev is binded to
* in this tzone
*/
- snprintf(cdev_trip_name, 256, "%s%s", nl->d_name,
- "_trip_point");
+ snprintf(cdev_trip_name, sizeof(cdev_trip_name) - 1,
+ "%s%s", nl->d_name, "_trip_point");
sysfs_get_ulong(tz_name, cdev_trip_name,
&trip_instance);
/* validate trip point range, e.g. trip could return -1
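
The tmon changes size every path buffer from PATH_MAX and pass sizeof(buf) to snprintf(), so the bound always tracks the declaration. A minimal sketch of the pattern (the sysfs path is only an example):

#include <limits.h>
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/class/thermal/thermal_zone0";	/* example only */
	const char *file = "temp";
	char filepath[PATH_MAX + 2];	/* room for the joining '/' and the NUL */

	/* sizeof(filepath) tracks the array, so resizing the buffer can
	 * never silently desynchronise the snprintf() bound again. */
	snprintf(filepath, sizeof(filepath), "%s/%s", path, file);
	printf("%s\n", filepath);
	return 0;
}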
diff --git a/tools/thermal/tmon/tmon.h b/tools/thermal/tmon/tmon.h
index c9066ec104dd..44d16d778f04 100644
--- a/tools/thermal/tmon/tmon.h
+++ b/tools/thermal/tmon/tmon.h
@@ -27,6 +27,9 @@
#define NR_LINES_TZDATA 1
#define TMON_LOG_FILE "/var/tmp/tmon.log"

+#include <sys/time.h>
+#include <pthread.h>
+
extern unsigned long ticktime;
extern double time_elapsed;
extern unsigned long target_temp_user;
diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile
index 3822f4ea5f49..1bea2d16d4c1 100644
--- a/tools/tracing/rtla/Makefile
+++ b/tools/tracing/rtla/Makefile
@@ -1,6 +1,6 @@
NAME := rtla
# Follow the kernel version
-VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. kernelversion)
+VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. kernelversion | grep -v make)

# From libtracefs:
# Makefiles suck: This macro sets a default value of $(2) for the
diff --git a/tools/tracing/rtla/src/trace.c b/tools/tracing/rtla/src/trace.c
index 5784c9f9e570..e1ba6d9f4265 100644
--- a/tools/tracing/rtla/src/trace.c
+++ b/tools/tracing/rtla/src/trace.c
@@ -134,13 +134,18 @@ void trace_instance_destroy(struct trace_instance *trace)
if (trace->inst) {
disable_tracer(trace->inst);
destroy_instance(trace->inst);
+ trace->inst = NULL;
}

- if (trace->seq)
+ if (trace->seq) {
free(trace->seq);
+ trace->seq = NULL;
+ }

- if (trace->tep)
+ if (trace->tep) {
tep_free(trace->tep);
+ trace->tep = NULL;
+ }
}

/*
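
The rtla trace.c hunk clears each pointer after releasing it, which makes trace_instance_destroy() safe to call twice or after a partially failed setup. A simplified sketch of the pattern with plain malloc()/free() and illustrative names (the real code releases its members through the tracefs/libtraceevent destructors):

#include <stdlib.h>

struct trace_ctx {		/* illustrative stand-in for struct trace_instance */
	void *seq;
	void *tep;
};

static void trace_ctx_destroy(struct trace_ctx *ctx)
{
	if (ctx->seq) {
		free(ctx->seq);
		ctx->seq = NULL;	/* a second destroy is now a no-op */
	}

	if (ctx->tep) {
		free(ctx->tep);
		ctx->tep = NULL;
	}
}

int main(void)
{
	struct trace_ctx ctx = { malloc(16), malloc(16) };

	trace_ctx_destroy(&ctx);
	trace_ctx_destroy(&ctx);	/* safe: no double free, no stale pointers */
	return 0;
}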
diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index 5352167a1e75..5ae2fa96fde1 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -106,8 +106,9 @@ int parse_cpu_list(char *cpu_list, char **monitored_cpus)

nr_cpus = sysconf(_SC_NPROCESSORS_CONF);

- mon_cpus = malloc(nr_cpus * sizeof(char));
- memset(mon_cpus, 0, (nr_cpus * sizeof(char)));
+ mon_cpus = calloc(nr_cpus, sizeof(char));
+ if (!mon_cpus)
+ goto err;

for (p = cpu_list; *p; ) {
cpu = atoi(p);
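
The utils.c hunk replaces malloc()+memset() with calloc(), which zeroes the buffer, checks the size multiplication, and finally gains the NULL check the original pairing lacked. A standalone sketch (variable names mirror the rtla helper, but the program is illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	long nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
	char *mon_cpus;

	if (nr_cpus <= 0)
		return 1;

	/* calloc() zeroes the buffer and checks the nr_cpus * size
	 * multiplication for overflow; the NULL check is the part the
	 * old malloc()+memset() sequence was missing. */
	mon_cpus = calloc(nr_cpus, sizeof(*mon_cpus));
	if (!mon_cpus)
		return 1;

	printf("tracking %ld cpus\n", nr_cpus);
	free(mon_cpus);
	return 0;
}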
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7f1d19689701..843396ed4ad3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -724,6 +724,15 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
kvm->mn_active_invalidate_count++;
spin_unlock(&kvm->mn_invalidate_lock);

+ /*
+ * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
+ * before acquiring mmu_lock, to avoid holding mmu_lock while acquiring
+ * each cache's lock. There are relatively few caches in existence at
+ * any given time, and the caches themselves can check for hva overlap,
+ * i.e. don't need to rely on memslot overlap checks for performance.
+ * Because this runs without holding mmu_lock, the pfn caches must use
+ * mn_active_invalidate_count (see above) instead of mmu_notifier_count.
+ */
gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
hva_range.may_block);

@@ -2843,16 +2852,28 @@ void kvm_release_pfn_dirty(kvm_pfn_t pfn)
}
EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);

+static bool kvm_is_ad_tracked_pfn(kvm_pfn_t pfn)
+{
+ if (!pfn_valid(pfn))
+ return false;
+
+ /*
+ * Per page-flags.h, pages tagged PG_reserved "should in general not be
+ * touched (e.g. set dirty) except by its owner".
+ */
+ return !PageReserved(pfn_to_page(pfn));
+}
+
void kvm_set_pfn_dirty(kvm_pfn_t pfn)
{
- if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
+ if (kvm_is_ad_tracked_pfn(pfn))
SetPageDirty(pfn_to_page(pfn));
}
EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);

void kvm_set_pfn_accessed(kvm_pfn_t pfn)
{
- if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
+ if (kvm_is_ad_tracked_pfn(pfn))
mark_page_accessed(pfn_to_page(pfn));
}
EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index dd84676615f1..b0b678367376 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -95,7 +95,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
}
EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);

-static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
+static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
{
/* Unmap the old page if it was mapped before, and release it */
if (!is_error_noslot_pfn(pfn)) {
@@ -112,31 +112,122 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
}
}

-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
+static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
{
+ /*
+ * mn_active_invalidate_count acts for all intents and purposes
+ * like mmu_notifier_count here; but the latter cannot be used
+ * here because the invalidation of caches in the mmu_notifier
+ * event occurs _before_ mmu_notifier_count is elevated.
+ *
+ * Note, it does not matter that mn_active_invalidate_count
+ * is not protected by gpc->lock. It is guaranteed to
+ * be elevated before the mmu_notifier acquires gpc->lock, and
+ * isn't dropped until after mmu_notifier_seq is updated.
+ */
+ if (kvm->mn_active_invalidate_count)
+ return true;
+
+ /*
+ * Ensure mn_active_invalidate_count is read before
+ * mmu_notifier_seq. This pairs with the smp_wmb() in
+ * mmu_notifier_invalidate_range_end() to guarantee either the
+ * old (non-zero) value of mn_active_invalidate_count or the
+ * new (incremented) value of mmu_notifier_seq is observed.
+ */
+ smp_rmb();
+ return kvm->mmu_notifier_seq != mmu_seq;
+}
+
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
+{
+ /* Note, the new page offset may be different than the old! */
+ void *old_khva = gpc->khva - offset_in_page(gpc->khva);
+ kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
+ void *new_khva = NULL;
unsigned long mmu_seq;
- kvm_pfn_t new_pfn;
- int retry;
+
+ lockdep_assert_held(&gpc->refresh_lock);
+
+ lockdep_assert_held_write(&gpc->lock);
+
+ /*
+ * Invalidate the cache prior to dropping gpc->lock, the gpa=>uhva
+ * assets have already been updated and so a concurrent check() from a
+ * different task may not fail the gpa/uhva/generation checks.
+ */
+ gpc->valid = false;

do {
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();

+ write_unlock_irq(&gpc->lock);
+
+ /*
+ * If the previous iteration "failed" due to an mmu_notifier
+ * event, release the pfn and unmap the kernel virtual address
+ * from the previous attempt. Unmapping might sleep, so this
+ * needs to be done after dropping the lock. Opportunistically
+ * check for resched while the lock isn't held.
+ */
+ if (new_pfn != KVM_PFN_ERR_FAULT) {
+ /*
+ * Keep the mapping if the previous iteration reused
+ * the existing mapping and didn't create a new one.
+ */
+ if (new_khva == old_khva)
+ new_khva = NULL;
+
+ gpc_release_pfn_and_khva(kvm, new_pfn, new_khva);
+
+ cond_resched();
+ }
+
/* We always request a writeable mapping */
- new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+ new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
if (is_error_noslot_pfn(new_pfn))
- break;
+ goto out_error;
+
+ /*
+ * Obtain a new kernel mapping if KVM itself will access the
+ * pfn. Note, kmap() and memremap() can both sleep, so this
+ * too must be done outside of gpc->lock!
+ */
+ if (gpc->usage & KVM_HOST_USES_PFN) {
+ if (new_pfn == gpc->pfn) {
+ new_khva = old_khva;
+ } else if (pfn_valid(new_pfn)) {
+ new_khva = kmap(pfn_to_page(new_pfn));
+#ifdef CONFIG_HAS_IOMEM
+ } else {
+ new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
+#endif
+ }
+ if (!new_khva) {
+ kvm_release_pfn_clean(new_pfn);
+ goto out_error;
+ }
+ }

- KVM_MMU_READ_LOCK(kvm);
- retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
- KVM_MMU_READ_UNLOCK(kvm);
- if (!retry)
- break;
+ write_lock_irq(&gpc->lock);

- cond_resched();
- } while (1);
+ /*
+ * Other tasks must wait for _this_ refresh to complete before
+ * attempting to refresh.
+ */
+ WARN_ON_ONCE(gpc->valid);
+ } while (mmu_notifier_retry_cache(kvm, mmu_seq));

- return new_pfn;
+ gpc->valid = true;
+ gpc->pfn = new_pfn;
+ gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+ return 0;
+
+out_error:
+ write_lock_irq(&gpc->lock);
+
+ return -EFAULT;
}

int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
@@ -146,9 +237,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
unsigned long page_offset = gpa & ~PAGE_MASK;
kvm_pfn_t old_pfn, new_pfn;
unsigned long old_uhva;
- gpa_t old_gpa;
void *old_khva;
- bool old_valid;
int ret = 0;

/*
@@ -158,13 +247,18 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
if (page_offset + len > PAGE_SIZE)
return -EINVAL;

+ /*
+ * If another task is refreshing the cache, wait for it to complete.
+ * There is no guarantee that concurrent refreshes will see the same
+ * gpa, memslots generation, etc..., so they must be fully serialized.
+ */
+ mutex_lock(&gpc->refresh_lock);
+
write_lock_irq(&gpc->lock);

- old_gpa = gpc->gpa;
old_pfn = gpc->pfn;
old_khva = gpc->khva - offset_in_page(gpc->khva);
old_uhva = gpc->uhva;
- old_valid = gpc->valid;

/* If the userspace HVA is invalid, refresh that first */
if (gpc->gpa != gpa || gpc->generation != slots->generation ||
@@ -177,64 +271,17 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);

if (kvm_is_error_hva(gpc->uhva)) {
- gpc->pfn = KVM_PFN_ERR_FAULT;
ret = -EFAULT;
goto out;
}
-
- gpc->uhva += page_offset;
}

/*
* If the userspace HVA changed or the PFN was already invalid,
* drop the lock and do the HVA to PFN lookup again.
*/
- if (!old_valid || old_uhva != gpc->uhva) {
- unsigned long uhva = gpc->uhva;
- void *new_khva = NULL;
-
- /* Placeholders for "hva is valid but not yet mapped" */
- gpc->pfn = KVM_PFN_ERR_FAULT;
- gpc->khva = NULL;
- gpc->valid = true;
-
- write_unlock_irq(&gpc->lock);
-
- new_pfn = hva_to_pfn_retry(kvm, uhva);
- if (is_error_noslot_pfn(new_pfn)) {
- ret = -EFAULT;
- goto map_done;
- }
-
- if (gpc->usage & KVM_HOST_USES_PFN) {
- if (new_pfn == old_pfn) {
- new_khva = old_khva;
- old_pfn = KVM_PFN_ERR_FAULT;
- old_khva = NULL;
- } else if (pfn_valid(new_pfn)) {
- new_khva = kmap(pfn_to_page(new_pfn));
-#ifdef CONFIG_HAS_IOMEM
- } else {
- new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
-#endif
- }
- if (new_khva)
- new_khva += page_offset;
- else
- ret = -EFAULT;
- }
-
- map_done:
- write_lock_irq(&gpc->lock);
- if (ret) {
- gpc->valid = false;
- gpc->pfn = KVM_PFN_ERR_FAULT;
- gpc->khva = NULL;
- } else {
- /* At this point, gpc->valid may already have been cleared */
- gpc->pfn = new_pfn;
- gpc->khva = new_khva;
- }
+ if (!gpc->valid || old_uhva != gpc->uhva) {
+ ret = hva_to_pfn_retry(kvm, gpc);
} else {
/* If the HVA→PFN mapping was already valid, don't unmap it. */
old_pfn = KVM_PFN_ERR_FAULT;
@@ -242,9 +289,26 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
}

out:
+ /*
+ * Invalidate the cache and purge the pfn/khva if the refresh failed.
+ * Some/all of the uhva, gpa, and memslot generation info may still be
+ * valid, leave it as is.
+ */
+ if (ret) {
+ gpc->valid = false;
+ gpc->pfn = KVM_PFN_ERR_FAULT;
+ gpc->khva = NULL;
+ }
+
+ /* Snapshot the new pfn before dropping the lock! */
+ new_pfn = gpc->pfn;
+
write_unlock_irq(&gpc->lock);

- __release_gpc(kvm, old_pfn, old_khva, old_gpa);
+ mutex_unlock(&gpc->refresh_lock);
+
+ if (old_pfn != new_pfn)
+ gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);

return ret;
}
@@ -254,14 +318,13 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
{
void *old_khva;
kvm_pfn_t old_pfn;
- gpa_t old_gpa;

+ mutex_lock(&gpc->refresh_lock);
write_lock_irq(&gpc->lock);

gpc->valid = false;

old_khva = gpc->khva - offset_in_page(gpc->khva);
- old_gpa = gpc->gpa;
old_pfn = gpc->pfn;

/*
@@ -272,8 +335,9 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
gpc->pfn = KVM_PFN_ERR_FAULT;

write_unlock_irq(&gpc->lock);
+ mutex_unlock(&gpc->refresh_lock);

- __release_gpc(kvm, old_pfn, old_khva, old_gpa);
+ gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
}
EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);

@@ -286,6 +350,7 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,

if (!gpc->active) {
rwlock_init(&gpc->lock);
+ mutex_init(&gpc->refresh_lock);

gpc->khva = NULL;
gpc->pfn = KVM_PFN_ERR_FAULT;