Re: [PATCH v2] nvdimm: Avoid race between probe and reading device attributes

From: Dan Williams
Date: Mon Feb 01 2021 - 18:20:48 EST


Yikes, sorry this languished so long, comments below:

On Mon, Jun 15, 2020 at 12:48 AM Richard Palethorpe
<rpalethorpe@xxxxxxxx> wrote:
>
> It is possible to cause a division error and use-after-free by querying the
> nmem device before the driver data is fully initialised in nvdimm_probe. E.g.
> by doing
>
> (while true; do
> cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
> done) &
>
> while true; do
> for i in $(seq 0 4); do
> echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
> done
> for i in $(seq 0 4); do
> echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
> done
> done
>
> On 5.7-rc3 this causes:
>
> [ 12.711578] divide error: 0000 [#1] SMP KASAN PTI
> [ 12.714857] RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[..]
> [ 12.725308] CR2: 00007fd16f1ec000 CR3: 0000000064322006 CR4: 0000000000160ef0
> [ 12.726268] Call Trace:
> [ 12.726633] available_slots_show+0x4e/0x120 [libnvdimm]
> [ 12.727380] dev_attr_show+0x42/0x80
> [ 12.727891] ? memset+0x20/0x40
> [ 12.728341] sysfs_kf_seq_show+0x218/0x410
> [ 12.728923] seq_read+0x389/0xe10
> [ 12.729415] vfs_read+0x101/0x2d0
> [ 12.729891] ksys_read+0xf9/0x1d0
> [ 12.730361] ? kernel_write+0x120/0x120
> [ 12.730915] do_syscall_64+0x95/0x4a0
> [ 12.731435] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[..]
> Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
> Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
> Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
> Cc: linux-nvdimm@xxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: Coly Li <colyli@xxxxxxxx>
> Signed-off-by: Richard Palethorpe <rpalethorpe@xxxxxxxx>
> ---
>
> V2:
> + Reviewed by Coly and removed unnecessary lock
>
> drivers/nvdimm/dimm.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
> index 7d4ddc4d9322..3d3988e1d9a0 100644
> --- a/drivers/nvdimm/dimm.c
> +++ b/drivers/nvdimm/dimm.c
> @@ -43,7 +43,6 @@ static int nvdimm_probe(struct device *dev)
> if (!ndd)
> return -ENOMEM;
>
> - dev_set_drvdata(dev, ndd);
> ndd->dpa.name = dev_name(dev);
> ndd->ns_current = -1;
> ndd->ns_next = -1;
> @@ -106,6 +105,8 @@ static int nvdimm_probe(struct device *dev)
> if (rc)
> goto err;
>
> + dev_set_drvdata(dev, ndd);
> +

I see why this works, but I think the bug is in
available_slots_show(). It is a bug for a sysfs attribute to reference
driver-data without synchronizing against bind. So it should be
possible for probe to set that pointer whenever it wants. In other
words, this fix (forgive the whitespace damage from pasting):

diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b59032e0859b..e68b17bc7aab 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -335,10 +335,8 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(state);

-static ssize_t available_slots_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t __available_slots_show(struct nvdimm_drvdata *ndd, char *buf)
{
- struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
ssize_t rc;
u32 nfree;

@@ -356,6 +354,18 @@ static ssize_t available_slots_show(struct device *dev,
nvdimm_bus_unlock(dev);
return rc;
}
+
+static ssize_t available_slots_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t rc;
+
+ nd_device_lock(dev);
+ rc = __available_slots_show(dev_get_drvdata(dev), buf);
+ nd_device_unlock(dev);
+
+ return rc;
+}
static DEVICE_ATTR_RO(available_slots);

__weak ssize_t security_show(struct device *dev,
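
For anyone who wants to poke at the locking pattern outside the kernel,
here is a rough userspace sketch of the idea in the diff above: the
attribute reader takes the same lock that probe/remove run under, so
probe is free to publish the driver-data pointer before it has finished
initialising it. Every name here (fake_device, fake_probe,
fake_slots_show, ...) is hypothetical, and a pthread mutex stands in
for the device lock. The NULL check in the locked helper is an
assumption of this sketch, since the corresponding hunk of the diff is
elided.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are kernel APIs. */
struct fake_drvdata {
	unsigned int nslot;	/* zero until initialisation completes */
	unsigned int nused;
};

struct fake_device {
	pthread_mutex_t lock;		/* plays the role of the device lock */
	struct fake_drvdata *drvdata;	/* plays the role of driver-data */
};

static struct fake_device fake_dev = { PTHREAD_MUTEX_INITIALIZER, NULL };

/*
 * Probe publishes the pointer *before* the data is fully set up.
 * That is only safe because the whole function runs under dev->lock.
 */
static int fake_probe(struct fake_device *dev, unsigned int nslot)
{
	struct fake_drvdata *ndd;
	int rc = 0;

	pthread_mutex_lock(&dev->lock);
	/* Probing an already-bound device is a caller bug in this sketch. */
	assert(dev->drvdata == NULL);
	ndd = calloc(1, sizeof(*ndd));
	if (!ndd) {
		rc = -ENOMEM;
		goto out;
	}
	dev->drvdata = ndd;	/* published early */
	ndd->nslot = nslot;	/* initialised afterwards */
out:
	pthread_mutex_unlock(&dev->lock);
	return rc;
}

/* Remove frees and clears the pointer under the same lock. */
static void fake_remove(struct fake_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	free(dev->drvdata);
	dev->drvdata = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/* Inner helper: the caller must hold dev->lock. */
static int fake_slots_show_locked(const struct fake_drvdata *ndd,
				  char *buf, size_t len)
{
	if (!ndd)	/* not bound; assumption of this sketch */
		return -ENXIO;
	return snprintf(buf, len, "%u\n", ndd->nslot - ndd->nused);
}

/* Attribute reader: fetch and use the pointer only under the lock. */
static int fake_slots_show(struct fake_device *dev, char *buf, size_t len)
{
	int rc;

	pthread_mutex_lock(&dev->lock);
	rc = fake_slots_show_locked(dev->drvdata, buf, len);
	pthread_mutex_unlock(&dev->lock);
	return rc;
}
```

With this shape a reader sees either no driver data at all or fully
initialised driver data, never the half-built state in between, and
the pointer cannot be freed underneath it by a concurrent remove.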