Re: [RFC] Printing numbers in SI units

From: Rasmus Villemoes
Date: Wed Jan 24 2024 - 17:43:31 EST


On 24/01/2024 19.58, Matthew Wilcox wrote:
> I was looking at hugetlbfs and it has several snippets of code like
> this:
>
> string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
> pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
> h->max_huge_pages_node[nid], buf, nid, i);
>
> That's not terribly ergonomic, so I wondered if I could do better.
> Unfortunately, I decided to do it using the SPECIAL flag which GCC
> warns about. But I've written the code now, so I'm sending it out in
> case anybody has a better idea for how to incorporate it.

Well, something that gcc will warn about with Wformat isn't gonna fly,
obviously. But my man page also mentions ' as a possible flag for d
conversions:

' For decimal conversion (i, d, u, f, F, g, G) the output is
to be grouped with thousands'
grouping characters if the locale information indicates any.

Obviously, our printf wouldn't implement that, but "grouping by
(roughly) 1000s" is kinda related to what you want the output to be, so
it seems apt. The man page also says "Note that many versions of gcc(1)
cannot parse this option and will issue a warning.", but I think that's
an ancient and irrelevant note. None of the gcc (or clang) versions
godbolt knows about give that warning.

I'm not really sure the implementation should imply a trailing B; that
might as well be included in the format string itself - perhaps the
quantity is bits, so a lowercase b is better, or perhaps it's some
frequency so the user wants Hz.

As for frequency, one would probably prefer the real, 1000^n, SI
prefixes and not the IEC 1024^n ones, and regardless, I think there
should be some way of doing both STRING_UNITS_2 and STRING_UNITS_10. At
first I thought one could distinguish by using either one or two ', but
gcc does warn for repeated flags. Two options I can think of: (a)
Overload the precision, so a precision p >= 10 means STRING_UNITS_10
with actual precision being p-10 (and yes, by all means cap that at 2).
(b) Currently %i and %d are completely equivalent, we could make them
different in case a ' flag is present and make one do the 2^n and other
the 10^n.

In the over-engineering department, maybe let the space flag indicate to
put a space between the number and the prefix.

>
> diff --git a/lib/test_printf.c b/lib/test_printf.c
> index 69b6a5e177f2..69af4d24a814 100644
> --- a/lib/test_printf.c
> +++ b/lib/test_printf.c
> @@ -178,6 +178,16 @@ test_number(void)
> * behaviour.
> */
> test("00|0|0|0|0", "%.2d|%.1d|%.0d|%.*d|%1.0d", 0, 0, 0, 0, 0, 0);
> +
> + /*
> + * C23 does not define the effect of "alternative form". Indeed
> + * I think it actually defines it to be Undefined Behaviour which
> + * apparently lets the compiler delete your entire source code.
> + */
> + test("2KiB", "%#d", 2048);
> + test("2MiB", "%#d", 2048 * 1024);
> + test("1GiB", "%#d", 1024 * 1024 * 1024);
> + test("1000MiB", "%#d", 1024 * 1024 * 1000);
> }
>
> static void __init
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 552738f14275..a702582c598c 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -370,6 +370,56 @@ char *put_dec(char *buf, unsigned long long n)
>
> #endif
>
> +/*
> + * precision is the number of digits after the decimal place. we limit
> + * it to two, because more would be complicated and unnecessary.
> + */
> +static noinline_for_stack
> +char *put_si(char *buf, unsigned long long n, int precision)

Please don't call the Mi etc. "SI", there's already enough confusion
about what a megabyte really is.

Rasmus