Re: [PATCH v3 2/2] tpm: in tpm2_del_space check if ops pointer is still valid

From: James Bottomley
Date: Fri Feb 05 2021 - 15:17:14 EST


On Fri, 2021-02-05 at 04:18 +0200, Jarkko Sakkinen wrote:
> On Thu, Feb 04, 2021 at 04:34:11PM -0800, James Bottomley wrote:
> > On Fri, 2021-02-05 at 00:50 +0100, Lino Sanfilippo wrote:
> > > From: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx>
> > >
> > > In tpm2_del_space() chip->ops is used for flushing the sessions.
> > > However
> > > this function may be called after tpm_chip_unregister() which
> > > sets
> > > the chip->ops pointer to NULL.
> > > Avoid a possible NULL pointer dereference by checking if chip-
> > > >ops is
> > > still
> > > valid before accessing it.
> > >
> > > Fixes: a3fbfae82b4c ("tpm: take TPM chip power gating out of
> > > tpm_transmit()")
> > > Signed-off-by: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx>
> > > ---
> > > drivers/char/tpm/tpm2-space.c | 15 ++++++++++-----
> > > 1 file changed, 10 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/char/tpm/tpm2-space.c
> > > b/drivers/char/tpm/tpm2-
> > > space.c
> > > index 784b8b3..9a29a40 100644
> > > --- a/drivers/char/tpm/tpm2-space.c
> > > +++ b/drivers/char/tpm/tpm2-space.c
> > > @@ -58,12 +58,17 @@ int tpm2_init_space(struct tpm_space *space,
> > > unsigned int buf_size)
> > >
> > > void tpm2_del_space(struct tpm_chip *chip, struct tpm_space
> > > *space)
> > > {
> > > - mutex_lock(&chip->tpm_mutex);
> > > - if (!tpm_chip_start(chip)) {
> > > - tpm2_flush_sessions(chip, space);
> > > - tpm_chip_stop(chip);
> > > + down_read(&chip->ops_sem);
> > > + if (chip->ops) {
> > > + mutex_lock(&chip->tpm_mutex);
> > > + if (!tpm_chip_start(chip)) {
> > > + tpm2_flush_sessions(chip, space);
> > > + tpm_chip_stop(chip);
> > > + }
> > > + mutex_unlock(&chip->tpm_mutex);
> > > }
> > > - mutex_unlock(&chip->tpm_mutex);
> > > + up_read(&chip->ops_sem);
> > > +
> > > kfree(space->context_buf);
> > > kfree(space->session_buf);
> > > }
> >
> > Actually, this still isn't right. As I said to the last person who
> > reported this, we should be doing a get/put on the ops, not rolling
> > our
> > own here:
> >
> > https://lore.kernel.org/linux-integrity/e7566e1e48f5be9dca034b4bfb67683b5d3cb88f.camel@xxxxxxxxxxxxxxxxxxxxx/
> >
> > The reporter went silent before we could get this tested, but could
> > you
> > try, please, because your patch is still hand rolling the ops
> > get/put,
> > just slightly better than it had been done previously.
> >
> > James
>
> Thanks for pointing this out. I'd strongly support Jason's proposal:
>
> https://lore.kernel.org/linux-integrity/20201215175624.GG5487@xxxxxxxx/
>
> It's the best long-term way to fix this.

Really, no it's not. It introduces extra mechanism we don't need.

To recap the issue: character devices already have an automatic
mechanism which holds a reference to the struct device while the
character node is open so the default is to release resources on final
put of the struct device.

tpm 2 is special because we have two character device nodes: /dev/tpm0
and /dev/tpmrm0. The way we make this work is that tpm0 is the master
and tpmrm0 the slave, so the slave holds an extra reference on the
master which is put when the slave final put happens. This means that
our resources aren't freed until the final puts of both devices, which
is the model we're using.

The practical consequence of this model is that if you allocate a chip
structure with tpm_chip_alloc() you have to release it again by doing a
put of *both* devices.

However, patch fdc915f7f719 ("tpm: expose spaces via a device link
/dev/tpmrm<n>") contains two bugs: firstly it didn't add a devm action
release for devs and secondly it didn't update the only non-devm user
ibm vtpm to do the double put.

Stefan noticed the latter, so we got the bogus patch 8979b02aaf1d
("tpm: Fix reference count to main device") applied which simply breaks
the master/slave model by not taking a reference on the master for the
slave. I'm not sure why I didn't notice the problem with this fix at
the time, but attention must have been elsewhere.

Subsequently we got ftpm added which copied the ibm vtpm put bug.

So I think 1/2 is the correct fix for all three bugs. I just need to
find a way to verify it.

James