[GIT PULL] afs: Fix cell management, add tracepoint, downgrade assert

From: David Howells
Date: Fri Oct 16 2020 - 10:40:24 EST


Hi Linus,

Here is a collection of fixes for afs_cell struct refcounting, thereby
fixing a slew of related syzbot bugs:

(1) Fix the cell tree in the netns to use an rwsem rather than RCU.

There seem to be some problems deriving from the use of RCU and a
seqlock to walk the rbtree, but it's not entirely clear what since
there are several different failures being seen.

Changing things to use an rwsem instead makes it more robust. The
extra performance derived from using RCU isn't necessary in this case
since the only time we're looking up a cell is during mount or when
cells are being manually added.

(2) Fix the refcounting by splitting the usage counter into a memory
refcount and an active users counter. The usage counter was doing
double duty, keeping track of whether a cell is still in use and
keeping track of when it needs to be destroyed, but this makes
cleanup tricky. Separating these out simplifies the logic.

(3) Fix purging a cell that has an alias. A cell alias pins the cell it's
an alias of, but the alias is always later in the list. Trying to
purge in a single pass causes rmmod to hang in such a case.

(4) Fix cell removal. If a cell's manager is requeued whilst it's
removing itself, the manager will run again and re-remove itself,
causing problems in various places. Follow Hillf Danton's suggestion
to insert a more terminal state that causes the manager to do nothing
post-removal.
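
To illustrate the counter split described in (2), here is a minimal userspace sketch. It is not the code from the patches: the struct, the field names and the cell_*() helpers are hypothetical stand-ins, and it uses plain ints with no locking, whereas real kernel code would need atomic counters. The idea it shows is that "ref" only governs when the memory is freed, while "active" only governs when the cell may be torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cell record; not the kernel's struct afs_cell. */
struct cell {
	int ref;	/* memory refcount: the struct is freed when this hits 0 */
	int active;	/* active users: the cell may be torn down at 0 */
	bool removed;	/* set once the last active user has gone away */
};

/* Allocate a cell holding one memory reference and no active users. */
struct cell *cell_alloc(void)
{
	struct cell *c = calloc(1, sizeof(*c));

	if (c)
		c->ref = 1;
	return c;
}

/* Pin the memory only; this does not mark the cell as in use. */
void cell_get(struct cell *c)
{
	assert(c->ref > 0);
	c->ref++;
}

/* Drop a memory reference; returns true if the struct was freed. */
bool cell_put(struct cell *c)
{
	assert(c->ref > 0);
	if (--c->ref > 0)
		return false;
	free(c);
	return true;
}

/* Begin using the cell: pin the memory and count an active user. */
void cell_use(struct cell *c)
{
	cell_get(c);
	c->active++;
}

/* Stop using the cell; returns true if the struct was freed. */
bool cell_unuse(struct cell *c)
{
	assert(c->active > 0);
	if (--c->active == 0)
		c->removed = true;	/* now eligible for teardown */
	return cell_put(c);
}
```

With this split, a holder that only needs the memory to stay valid (a manager work item, say) takes cell_get(), while a mount would take cell_use(). Teardown can then key off the active count alone without racing against the final free.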

In addition to the above, I've included two more patches:

(1) Add a tracepoint for the cell refcount and active users count. This
helped with debugging the above and may be useful again in future.

(2) Downgrade an assertion to a print when a still-active server is seen
during purging. This was happening as a consequence of incomplete
cell removal before the servers were cleaned up.
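
As a rough illustration of what downgrading the assertion in (2) means, the userspace sketch below reports still-active server records during a purge instead of aborting. Everything in it is an assumption made for illustration: the struct server layout, the purge_servers() name and the message text are hypothetical and do not come from fs/afs/server.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a server record. */
struct server {
	int id;
	int active;	/* number of users still holding the server active */
};

/*
 * Walk the records during a purge.  A still-active record is logged and
 * counted rather than asserted on, so the purge can run to completion.
 * Returns the number of still-active records seen.
 */
int purge_servers(const struct server *servers, int n, FILE *log)
{
	int stuck = 0;

	assert(servers != NULL || n == 0);
	for (int i = 0; i < n; i++) {
		if (servers[i].active != 0) {
			fprintf(log, "server %d still active (%d) during purge\n",
				servers[i].id, servers[i].active);
			stuck++;
		}
	}
	return stuck;
}
```

The caller can still act on a non-zero return value, but an unexpected leftover record no longer brings everything down.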

David
---
The following changes since commit bbf5c979011a099af5dc76498918ed7df445635b:

Linux 5.9 (2020-10-11 14:15:50 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/afs-fixes-20201016

for you to fetch changes up to 7530d3eb3dcf1a30750e8e7f1f88b782b96b72b8:

afs: Don't assert on unpurgeable server records (2020-10-16 14:39:34 +0100)

----------------------------------------------------------------
afs fixes

----------------------------------------------------------------
David Howells (6):
afs: Fix rapid cell addition/removal by not using RCU on cells tree
afs: Fix cell refcounting by splitting the usage counter
afs: Fix cell purging with aliases
afs: Fix cell removal
afs: Add tracing for cell refcount and active user count
afs: Don't assert on unpurgeable server records

 fs/afs/cell.c              | 328 +++++++++++++++++++++++++++++----------------
 fs/afs/dynroot.c           |  23 ++--
 fs/afs/internal.h          |  20 ++-
 fs/afs/main.c              |   2 +-
 fs/afs/mntpt.c             |   4 +-
 fs/afs/proc.c              |  23 ++--
 fs/afs/server.c            |   7 +-
 fs/afs/super.c             |  18 +--
 fs/afs/vl_alias.c          |   8 +-
 fs/afs/vl_rotate.c         |   2 +-
 fs/afs/volume.c            |   6 +-
 include/trace/events/afs.h | 109 +++++++++++++++
 12 files changed, 378 insertions(+), 172 deletions(-)