Re: [PATCH v2 0/4] dedupe smb unicode files

From: Tom Talpey
Date: Fri Jul 21 2023 - 09:19:49 EST


On 7/20/2023 7:57 PM, Dr. David Alan Gilbert wrote:
* Tom Talpey (tom@xxxxxxxxxx) wrote:
On 7/19/2023 6:06 PM, Dave Kleikamp wrote:
On 7/19/23 4:58PM, Dr. David Alan Gilbert wrote:
* Steve French (smfrench@xxxxxxxxx) wrote:
The related question is which tree to send it from, if no problems
reported (presumably mine since it mostly affect cifs.ko and ksmbd.ko,
and because there hasn't been activity in fs/nls for years)

That was my hope, given that ~half of the patches are directly on that
code, and it's the only very active tree this touches as far as I can
tell.

On Wed, Jul 19, 2023 at 12:56 PM Steve French
<smfrench@xxxxxxxxx> wrote:

No objections to this on my part.  If Shaggy is ok with the JFS
change, we could target it for 6.6-rc1 if it tests out ok

For the series:
Reviewed-by: Dave Kleikamp <dave.kleikamp@xxxxxxxxxx>

Steve,
Feel free to pull in even the 4th patch into your tree with my consent.
Or if you're more comfortable, I could submit it after yours hits
mainline.

Shaggy

The changes look good to me but there is one quirk with the
copyrights and SPDX in patch 2.

In the new fs/nls/nls_ucs2_utils.c, the SPDX line changes from
a "/* ... */" form to "// ...", which may be a proper update, but
then partway down, adds the same SPDX in "/* ... */ form. These
should at least be consistent.

+++ b/fs/nls/nls_ucs2_utils.c
@@ -1,19 +1,25 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
+// SPDX-License-Identifier: GPL-2.0-or-later

vs

+++ b/fs/nls/nls_ucs2_utils.h
@@ -0,0 +1,297 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */

Yeh that's an easy fix - so that's just the fact the .h has
the older /* where I'd fixed up the .c ?

Yep, for consistency that sounds good.

Second, the copyright in fs/nls/nls_ucs2_utils.c is a bit of
a mash-up (adding 2009 especially).

I think it's better to keep the exact text of both copyrights,
perhaps with a note as to which files had them previously, and
adding some new note/blank line to separate the recent contributions
from Namjae and you from the ancient history.

How about the following;

* This file has taken chunks from a few other files
* smb/server/uniupr.h had the declaration:

These two lines above aren't needed, because the lines below
contain the copyright and where they originated. So just omit
the two above.

*
* Some of the source code in this file came from fs/cifs/uniupr.h
* Copyright (c) International Business Machines Corp., 2000,2002
*
* fs/smb/server/unicode.c had the declaration:

And this one above - not needed.

*
* Some of the source code in this file came from fs/cifs/cifs_unicode.c
*
* Copyright (c) International Business Machines Corp., 2000,2009
* Modified by Steve French (sfrench@xxxxxxxxxx)
* Modified by Namjae Jeon (linkinjeon@xxxxxxxxxx)
*

I haven't added the extra line above Namjae's line, since it's now
a straight copy from the unicode.c entry.

Straight copy is what's important. No deletion, no edit in a copyright.
So, ok.

I'm not particularly fussed about adding my own line unless you think
it's needed; git keeps better history!

In fact, since you technically didn't add any code, just deleted,
moved or renamed, I think it might be best to leave yourself out.
But, totally your choice.

Tom.

+++ b/fs/nls/nls_ucs2_utils.c
...
- * Some of the source code in this file came from fs/cifs/uniupr.h
- * Copyright (c) International Business Machines Corp., 2000,2002
- *
- * uniupr.h - Unicode compressed case ranges
+ * Some of the source code in this file came from fs/cifs/cifs_unicode.c
+ * via fs/smb/unicode.c and fs/smb/uniupr.h and fs/cifs/uniupr.h
+ * Copyright (c) International Business Machines Corp., 2000,2002,2009
+ * Modified by Steve French (sfrench@xxxxxxxxxx)
+ * Modified by Namjae Jeon (linkinjeon@xxxxxxxxxx)
+ * Modified by Dr. David Alan Gilbert <linux@xxxxxxxxxxx>

Apart from considering these:

Reviewed-by: Tom Talpey <tom@xxxxxxxxxx>

Thanks!

Dave

Nice work!


Thanks.

Dave

On Wed, Jul 12, 2023 at 6:28 PM Dr. David Alan Gilbert
<dave@xxxxxxxxxxx> wrote:

* linux@xxxxxxxxxxx (linux@xxxxxxxxxxx) wrote:
From: "Dr. David Alan Gilbert" <linux@xxxxxxxxxxx>

The smb client and server code have (mostly) duplicated code
for unicode manipulation, in particular upper case handling.

Flatten this lot into shared code.

Gentle two week ping on this please.

Dave

(Apologies to the 3 of you who already got a copy of this ping,
recent due to a missing header ',' )

There's some code that's slightly different between the two, and
I've not attempted to share that - this should be strictly a no
behaviour change set.

In addition, the same tables and code are shared in jfs, however
there's very little testing available for the unicode in there,
so just share the raw data tables.

I suspect there's more UCS-2 code that can be shared, in the NLS code
and in the UCS-2 code used by the EFI interfaces.

Lightly tested with a module and a monolithic build,
and just mounting
itself.

This dupe was found using PMD:
   https://pmd.github.io/pmd/pmd_userdocs_cpd.html

Dave

Version 2
   Moved the shared code to fs/nls after v1 feedback.
   Renamed shared tables from Smb to Nls prefix
   Move UniStrcat as well
   Share the JFS tables

Dr. David Alan Gilbert (4):
   fs/smb: Remove unicode 'lower' tables
   fs/smb: Swing unicode common code from smb->NLS
   fs/smb/client: Use common code in client
   fs/jfs: Use common ucs2 upper case table

  fs/jfs/Kconfig               |   1 +
  fs/jfs/Makefile              |   2 +-
  fs/jfs/jfs_unicode.h         |  17 +-
  fs/jfs/jfs_uniupr.c          | 121 -------------
  fs/nls/Kconfig               |   8 +
  fs/nls/Makefile              |   1 +
  fs/nls/nls_ucs2_data.h       |  15 ++
  fs/nls/nls_ucs2_utils.c      | 144 +++++++++++++++
  fs/nls/nls_ucs2_utils.h      | 285 ++++++++++++++++++++++++++++++
  fs/smb/client/Kconfig        |   1 +
  fs/smb/client/cifs_unicode.c |   1 -
  fs/smb/client/cifs_unicode.h | 330
+----------------------------------
  fs/smb/client/cifs_uniupr.h  | 239 -------------------------
  fs/smb/server/Kconfig        |   1 +
  fs/smb/server/unicode.c      |   1 -
  fs/smb/server/unicode.h      | 325
+---------------------------------
  fs/smb/server/uniupr.h       | 268 ----------------------------
  17 files changed, 467 insertions(+), 1293 deletions(-)
  delete mode 100644 fs/jfs/jfs_uniupr.c
  create mode 100644 fs/nls/nls_ucs2_data.h
  create mode 100644 fs/nls/nls_ucs2_utils.c
  create mode 100644 fs/nls/nls_ucs2_utils.h
  delete mode 100644 fs/smb/client/cifs_uniupr.h
  delete mode 100644 fs/smb/server/uniupr.h

--
2.41.0

--
  -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\        dave @ treblig.org |                               | In Hex /
  \ _________________________|_____ http://www.treblig.org  ; |_______/



--
Thanks,

Steve



--
Thanks,

Steve