Re: [PATCH -next 3/4] block/rq_qos: use a global mutex to protect rq_qos apis

From: Yu Kuai
Date: Wed Jan 04 2023 - 20:35:40 EST


Hi,

On 2023/01/05 8:28, Tejun Heo wrote:
Hello, again.

On Wed, Jan 04, 2023 at 11:39:47AM -1000, Tejun Heo wrote:
2) rq_qos_add() and blkcg_activate_policy() are not atomic, if
rq_qos_exit() is done before blkcg_activate_policy(),
null-ptr-dereference can be triggered.

I'm not sure this part does. I think it'd be better to guarantee that device
destruction is blocked while these configuration operations are in progress
which can be built into blkg_conf helpers.

A bit more explanation:

Usually, this would be handled in the core - when a device goes away, its
sysfs files get shut down before stuff gets freed and the sysfs file removal
waits for in-flight operations to finish and prevents new ones from
starting, so we don't have to worry about in-flight config file operations
racing against device removal.

Here, the problem isn't solved by that because the config files live on
cgroupfs and their lifetimes are not coupled with the block devices'. So, we
need to synchronize manually. And, given that, the right place to do it is
the blkg config helpers cuz they're the ones which establish the connection
between cgroup and block layer.

Thanks for the explanation, I agree with that.

Can you please take a look at the following patchset I just posted:

https://lkml.kernel.org/r/20230105002007.157497-1-tj@xxxxxxxxxx

After that, all these configuration operations are wrapped between
blkg_conf_init() and blkg_conf_exit() which probably are the right place to
implement the synchronization.
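
To illustrate the bracket idea above, here is a minimal userspace sketch, not
kernel code and not what the linked patchset implements. It assumes a
per-device mutex and a "dead" flag; struct sketch_dev and all sketch_*
functions are hypothetical names, and pthread mutexes stand in for kernel
locking. A config operation either pins the device for its whole duration or
fails because removal has already started.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block device; not a kernel structure. */
struct sketch_dev {
	pthread_mutex_t conf_lock;	/* held for the whole config operation */
	bool dead;			/* set once removal has started */
	int policy_enabled;
};

/* Hypothetical analogue of an "init" helper: pin the device or fail. */
static int sketch_conf_init(struct sketch_dev *dev)
{
	pthread_mutex_lock(&dev->conf_lock);
	if (dev->dead) {
		pthread_mutex_unlock(&dev->conf_lock);
		return -ENODEV;
	}
	return 0;	/* lock stays held until sketch_conf_exit() */
}

/* Hypothetical analogue of an "exit" helper: let removal proceed. */
static void sketch_conf_exit(struct sketch_dev *dev)
{
	pthread_mutex_unlock(&dev->conf_lock);
}

/* Removal waits for an in-flight config operation and rejects later ones. */
static void sketch_dev_remove(struct sketch_dev *dev)
{
	pthread_mutex_lock(&dev->conf_lock);
	dev->dead = true;
	dev->policy_enabled = 0;
	pthread_mutex_unlock(&dev->conf_lock);
}

/* A config write: everything between init and exit cannot race removal. */
static int sketch_enable_policy(struct sketch_dev *dev)
{
	int ret = sketch_conf_init(dev);

	if (ret)
		return ret;
	dev->policy_enabled = 1;
	sketch_conf_exit(dev);
	return 0;
}
```

In this sketch a config write that arrives after sketch_dev_remove() gets
-ENODEV instead of touching freed state, which is the property the sysfs core
normally provides for free.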

I see that, blkg_conf_init() and blkg_conf_exit() look good, however there
are some details I want to confirm:

1) rq_qos_add() can be called from iocost/iolatency, where
blkg_conf_init() will be called first, while rq_qos_add() can also be
called from wbt, where there is no blkg_conf_init(). Hence it seems to
me we need two locks here, one to protect rq_qos apis; one to
synchronize policy configuration and device removal.

2) If you agree with 1), it seems better to make the other lock per-device,
considering that there is no need to synchronize configuration across
different devices.
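
The two-lock split in 1) and 2) could look roughly like the following
userspace sketch. This is only an illustration under assumptions: struct
sketch_queue, struct sketch_rqos and the sketch_* functions are hypothetical,
pthread mutexes stand in for kernel mutexes, and the duplicate-id check is
part of the sketch rather than a claim about the real rq_qos_add(). A global
mutex protects the rq_qos list for every caller, while a per-device lock is
taken only on the policy configuration path.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical rq_qos entry, identified by a small integer id. */
struct sketch_rqos {
	int id;
	struct sketch_rqos *next;
};

/* Hypothetical per-device state. */
struct sketch_queue {
	pthread_mutex_t conf_lock;	/* per-device: config vs. removal */
	struct sketch_rqos *rq_qos;	/* list protected by the global mutex */
};

/* Global lock protecting the rq_qos list of every device. */
static pthread_mutex_t sketch_rq_qos_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Add an entry under the global lock; reject a duplicate id. */
static int sketch_rq_qos_add(struct sketch_queue *q, struct sketch_rqos *rqos)
{
	struct sketch_rqos *p;
	int ret = 0;

	pthread_mutex_lock(&sketch_rq_qos_mutex);
	for (p = q->rq_qos; p; p = p->next) {
		if (p->id == rqos->id) {
			ret = -EBUSY;
			break;
		}
	}
	if (!ret) {
		rqos->next = q->rq_qos;
		q->rq_qos = rqos;
	}
	pthread_mutex_unlock(&sketch_rq_qos_mutex);
	return ret;
}

/* wbt-like path: no config bracket, so only the global lock is involved. */
static int sketch_wbt_init(struct sketch_queue *q, struct sketch_rqos *rqos)
{
	return sketch_rq_qos_add(q, rqos);
}

/* iocost-like path: per-device lock first, then the global lock inside. */
static int sketch_iocost_configure(struct sketch_queue *q,
				   struct sketch_rqos *rqos)
{
	int ret;

	pthread_mutex_lock(&q->conf_lock);
	ret = sketch_rq_qos_add(q, rqos);
	pthread_mutex_unlock(&q->conf_lock);
	return ret;
}
```

The lock order here is always per-device lock first, global lock second, so
the two paths cannot deadlock against each other, and configuration of one
device never waits on the per-device lock of another.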

Thanks,
Kuai

Thanks.