Re: [PATCH 2/4] tools/nolibc: Fix strlcat() return code and size usage

From: Rodrigo Campos
Date: Mon Feb 12 2024 - 18:17:45 EST


On 2/11/24 11:48, Willy Tarreau wrote:
Hi Rodrigo,

first, thanks for the series!

Thank you, for your time and review! :)

On Mon, Jan 29, 2024 at 03:15:14PM +0100, Rodrigo Campos wrote:
The return code should always be strlen(src) + strlen(dst), but dst is
considered shorter if size is less than strlen(dst).

While we are there, make sure to copy at most size-1 bytes and
null-terminate the dst buffer.

Signed-off-by: Rodrigo Campos <rodrigo@xxxxxxxxxxx>
---
tools/include/nolibc/string.h | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index ed15c22b1b2a..b2149e1342a8 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -187,23 +187,23 @@ char *strndup(const char *str, size_t maxlen)
static __attribute__((unused))
size_t strlcat(char *dst, const char *src, size_t size)
{
- size_t len;
char c;
+ size_t len = strlen(dst);
+ size_t ret = strlen(src) + (size < len? size: len);

From what I'm reading in the man page, ret should *always* be the sum
of the two string lengths. I guess it helps for reallocation. It's even
explicitly mentioned:

"While this may seem somewhat confusing, it was done to make truncation
detection simple."

Yes, that was my *first* understanding of the manpage too. But it doesn't seem to be the correct interpretation.

Also, this is exactly what I tried to say in the cover letter, with:

I thought the manpage was clear, but when checking against that,
I noted a few twists (like the manpage says the return code of
strlcat is strlen(src) + strlen(dst), but it was not clear it is
not that if size < strlen(dst). When looking at the libbsd
implementation and re-reading the manpage, I understood what it
really meant).


Sorry if I wasn't clear. Let me try again.

My first interpretation of the manpage was also that, and I think it would make sense to be that one. However, it is wrong, they also say this, that is what made me add the ternary operator:

Note, however, that if strlcat() traverses size characters
without finding a NUL, the length of the string is considered
to be size and the destination string will not be NUL
terminated (since there was no space for the NUL)

So, my interpretation is that it is the sum of both, except when we can't find the NUL in that size, in which case we conside "size" to be the len of dst.

If you compare it with the output of libbsd, the return code seems to be exactly that. I was surprised too, as the manpage seem so clear... :-/

Above ret will be bound to the existing size so a realloc wouldn't work.
Thus I think the correct solution is:

Note that this implementation fails the tests added on patch 4. I've tested them (output and return code) to match the libbsd implementation.


The test inside the loop is going to make this not very efficient. Same
for the fact that we're measuring the length of src twice (once via
strlen, a second time through the loop). I've just had a look and it
compiles to 77 bytes at -Os. A simpler variant would consist in trying

How are you measuring the size?

I've added noinline to strlcat to the version I sent, so now it is shown in nm, but I have this as output:

$ nm --size -t x test.o
0000000000000004 V errno
0000000000000006 t strlcat.constprop.0
0000000000000008 V _auxv
0000000000000008 V environ
000000000000000e W strlen
000000000000000f W _start
0000000000000018 W raise
000000000000001b W abort
000000000000004c T main
000000000000005a t u64toa_r.isra.0
0000000000000095 W _start_c
00000000000002a8 t printf

How are you measuring it there?

Sorry, I'm not familiar with this :)


to copy what fits in <size> and once reached, go on just for trailing
zero and the size measurement:

size_t strlcat(char *dst, const char *src, size_t size)
{
size_t len = strlen(dst);

The thing is, we need to return always at least strlen(src). Maybe plus something else. Even if size==0 and we don't copy anything.

Maybe we can special case that, so we simplify the loop, and it's smaller?

I've been playing, but as I can't measure the size, I'm not sure what is really better. I'd like to play a little once I know how to measure it :)



Best,
Rodrigo