Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

From: Steve French
Date: Tue Sep 19 2023 - 01:36:27 EST


Does the attached patch help in your case? It avoids starting the
laundromat thread for IPC shares (which cuts the number of the threads
in half for many cases) and also avoids starting them if the server
does not support directory leases (e.g. if Samba server instead of
Windows server).


On Mon, Sep 18, 2023 at 10:00 PM Steve French <smfrench@xxxxxxxxx> wrote:
>
> Paulo and I were discussing the laundromat thread at the SMB3.1.1 test
> event (at SDC this week) which is now going on - will let you know
> what we find.
>
> One obvious thing is that it probably isn't necessary for cases when
> the server does not support directory leases, but we noticed another
> problem as well.
>
>
> On Mon, Sep 18, 2023 at 9:20 PM Brian Pardy <brian.pardy@xxxxxxxxx> wrote:
> >
> > [RS removed from CC due to bounce message]
> >
> > On Wed, Sep 6, 2023 at 5:03 PM Brian Pardy <brian.pardy@xxxxxxxxx> wrote:
> > > On Tue, Sep 5, 2023 at 9:01 PM Bagas Sanjaya <bagasdotme@xxxxxxxxx> wrote:
> > > > Thanks for the regression report. But if you want to get it fixed,
> > > > you have to do your part: perform bisection. See Documentation/admin-guide/bug-bisect.rst in the kernel sources for how to do that.
> > > >
> > > > Anyway, I'm adding it to regzbot:
> > > >
> > > > #regzbot ^introduced: v6.4..v6.5
> > > > #regzbot title: incorrect CPU utilization report (multiplied) when mounting CIFS
> > >
> > > Thank you for directing me to the bug-bisect documentation. Results below:
> > >
> > > # git bisect bad
> > > d14de8067e3f9653cdef5a094176d00f3260ab20 is the first bad commit
> > > commit d14de8067e3f9653cdef5a094176d00f3260ab20
> > > Author: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
> > > Date: Thu Jul 6 12:32:24 2023 +1000
> > >
> > > cifs: Add a laundromat thread for cached directories
> > >
> > > and drop cached directories after 30 seconds
> > >
> > > Signed-off-by: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
> > > Signed-off-by: Steve French <stfrench@xxxxxxxxxxxxx>
> > >
> > > fs/smb/client/cached_dir.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
> > > fs/smb/client/cached_dir.h | 1 +
> > > 2 files changed, 68 insertions(+)
> >
> > Is there any further information I can provide to aid in debugging
> > this issue? Should I just expect incorrect load average reporting when
> > a CIFS share is mounted on any kernel >6.5.0?
> >
> > I'm not clear on the value or necessity of this "laundromat thread" -
> > everything worked as expected before it was added - shall I just patch
> > it out of my kernel builds going forward if there is no interest in
> > fixing it? Is a .config option to disable it possible?
>
>
>
> --
> Thanks,
>
> Steve



--
Thanks,

Steve
From a63586b4b0b12265097e874b42c7a2cdbbf4138b Mon Sep 17 00:00:00 2001
From: Steve French <stfrench@xxxxxxxxxxxxx>
Date: Tue, 19 Sep 2023 00:28:02 -0500
Subject: [PATCH] smb3: do not start laundromat thread when dir leases disabled

When no directory lease support, or for IPC shares where directories
can not be opened, do not start an unneeded laundromat thread for
that mount (it wastes resources).

Fixes: d14de8067e3f ("cifs: Add a laundromat thread for cached directories")
Signed-off-by: Steve French <stfrench@xxxxxxxxxxxxx>
---
fs/smb/client/cached_dir.c | 6 ++++++
fs/smb/client/cifsglob.h | 2 +-
fs/smb/client/cifsproto.h | 2 +-
fs/smb/client/connect.c | 9 +++++++--
fs/smb/client/misc.c | 16 ++++++++++------
fs/smb/client/smb2pdu.c | 5 ++++-
6 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index b17f067e4ada..a7cbaa32005c 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -450,6 +450,9 @@ void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
{
struct cached_fids *cfids = tcon->cfids;
struct cached_fid *cfid, *q;
+
+ if (cfids == NULL)
+ return;
LIST_HEAD(entry);

spin_lock(&cfids->cfid_list_lock);
@@ -651,6 +654,9 @@ void free_cached_dirs(struct cached_fids *cfids)
struct cached_fid *cfid, *q;
LIST_HEAD(entry);

+ if (cfids == NULL)
+ return;
+
if (cfids->laundromat) {
kthread_stop(cfids->laundromat);
cfids->laundromat = NULL;
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 032d8716f671..f594fcc0e889 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1943,7 +1943,7 @@ require use of the stronger protocol */
* cifsInodeInfo->lock_sem cifsInodeInfo->llist cifs_init_once
* ->can_cache_brlcks
* cifsInodeInfo->deferred_lock cifsInodeInfo->deferred_closes cifsInodeInfo_alloc
- * cached_fid->fid_mutex cifs_tcon->crfid tconInfoAlloc
+ * cached_fid->fid_mutex cifs_tcon->crfid tcon_info_alloc
* cifsFileInfo->fh_mutex cifsFileInfo cifs_new_fileinfo
* cifsFileInfo->file_info_lock cifsFileInfo->count cifs_new_fileinfo
* ->invalidHandle initiate_cifs_search
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h
index 7d8035846680..74c428d3c916 100644
--- a/fs/smb/client/cifsproto.h
+++ b/fs/smb/client/cifsproto.h
@@ -512,7 +512,7 @@ extern int CIFSSMBLogoff(const unsigned int xid, struct cifs_ses *ses);

extern struct cifs_ses *sesInfoAlloc(void);
extern void sesInfoFree(struct cifs_ses *);
-extern struct cifs_tcon *tconInfoAlloc(void);
+extern struct cifs_tcon *tcon_info_alloc(bool init_cached_dirs);
extern void tconInfoFree(struct cifs_tcon *);

extern int cifs_sign_rqst(struct smb_rqst *rqst, struct TCP_Server_Info *server,
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 687754791bf0..45e60c0d262c 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -1882,7 +1882,9 @@ cifs_setup_ipc(struct cifs_ses *ses, struct smb3_fs_context *ctx)
}
}

- tcon = tconInfoAlloc();
+ /* no need to setup directory caching on IPC share, so pass in false */
+ tcon = tcon_info_alloc(false);
+
if (tcon == NULL)
return -ENOMEM;

@@ -2492,7 +2494,10 @@ cifs_get_tcon(struct cifs_ses *ses, struct smb3_fs_context *ctx)
goto out_fail;
}

- tcon = tconInfoAlloc();
+ if (ses->server->capabilities & SMB2_GLOBAL_CAP_DIRECTORY_LEASING)
+ tcon = tcon_info_alloc(true);
+ else
+ tcon = tcon_info_alloc(false);
if (tcon == NULL) {
rc = -ENOMEM;
goto out_fail;
diff --git a/fs/smb/client/misc.c b/fs/smb/client/misc.c
index 366b755ca913..ec23b9989b63 100644
--- a/fs/smb/client/misc.c
+++ b/fs/smb/client/misc.c
@@ -113,18 +113,22 @@ sesInfoFree(struct cifs_ses *buf_to_free)
}

struct cifs_tcon *
-tconInfoAlloc(void)
+tcon_info_alloc(bool dir_leases_enabled)
{
struct cifs_tcon *ret_buf;

ret_buf = kzalloc(sizeof(*ret_buf), GFP_KERNEL);
if (!ret_buf)
return NULL;
- ret_buf->cfids = init_cached_dirs();
- if (!ret_buf->cfids) {
- kfree(ret_buf);
- return NULL;
- }
+
+ if (dir_leases_enabled == true) {
+ ret_buf->cfids = init_cached_dirs();
+ if (!ret_buf->cfids) {
+ kfree(ret_buf);
+ return NULL;
+ }
+ } else
+ ret_buf->cfids = NULL;

atomic_inc(&tconInfoAllocCount);
ret_buf->status = TID_NEW;
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index 44d4943e9c56..c63b72ed7fe9 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -3878,7 +3878,10 @@ void smb2_reconnect_server(struct work_struct *work)
goto done;

/* allocate a dummy tcon struct used for reconnect */
- tcon = tconInfoAlloc();
+ if (pserver->capabilities & SMB2_GLOBAL_CAP_LEASING)
+ tcon = tcon_info_alloc(true);
+ else
+ tcon = tcon_info_alloc(false);
if (!tcon) {
resched = true;
list_for_each_entry_safe(ses, ses2, &tmp_ses_list, rlist) {
--
2.39.2