Re: [RFC] tools/nolibc: replace duplicated -ENOSYS return with single -ENOSYS return

From: Thomas Weißschuh
Date: Sun Aug 27 2023 - 05:18:08 EST


Hi Zhangjin,

thanks for the RFC discussion!

On 2023-08-27 16:32:25+0800, Zhangjin Wu wrote:
> Since we have already finished the size inflate regression task [1], to share
> and discuss the progress about the -ENOSYS return work, here launchs a new
> thread, it is split from [2].
>
> [1]: https://lore.kernel.org/lkml/ZNtszQeigYuItaKA@xxxxxx/
> [2]: https://lore.kernel.org/lkml/20230814172233.225944-1-falcon@xxxxxxxxxxx/#R
>
> This is only for brain storming, it is far from a solution ;-)
>
> >
> > > [...]
> > > > >
> > > > > /* __systry2() is used to select one of two provided low level syscalls */
> > > > > #define __systry2(a, sys_a, sys_b) \
> > > > > ((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b))
> > > >
> > > > But this supposes that all of them are manually defined as you did above.
> > > > I'd rather implement an ugly is_numeric() macro based on argument
> > > > resolution. I've done it once in another project, I don't remember
> > > > precisely where it is but I vaguely remember that it used to check
> > > > that the string resolution of the argument gave a letter (when it
> > > > does not exist) or a digit (when it does). I can look into that later
> > > > if needed. But please avoid extra macro definitions as much as possible,
> > > > they're a real pain to handle in the code. There's no error when one is
> > > > missing or has a typo, it's difficult to follow them and they don't
> > > > appear in the debugger.
> > > >
> > >
> > > Yeah, your reply inspired me to look into the IS_ENABLED() from
> > > ../include/linux/kconfig.h macro again, there was a __is_defined() there, let's
> > > throw away the ugly sysnr.h. I thought of IS_ENABLED() was only for y/n/m
> > > before, but it does return 0 when the macro is not defined, it uses the same
> > > trick in syscall() to calculate the number of arguments, if the macro is not
> > > defined, then, 0 "argument".
> > >
> >
> > The above trick is only for ""#define something 1" ;-)
> >
>
> Here shares a little progress on this, I have found it is easy to implement an
> ugly is_numeric() like macro as following:
>
> /* Imported from include/linux/stringify.h */
> #define __stringify_1(x...) #x
> #define __stringify(x...) __stringify_1(x)
>
> /*
> * Check __NR_* definition by stringizing
> *
> * - The stringizing is to silence compile error about undefined macro
> * - If defined, the result looks like "3", "(4000 + 168)", not begin with '_'
> * - If not defined, the result looks like "__NR_read", begins with '_'
> */
>
> #define __is_nr_defined(nr) ___is_nr_defined(__stringify(nr))
> #define ___is_nr_defined(str) (str[0] != '_')
>
> __is_nr_defined() is able to check if __NR_xxx is defined, but the harder part
> is getting the number of defined __NR_* without the error about undefined
> macro.
>
> Of course, we can also use the __stringify() trick to do so, but it is
> expensive (bigger size, worse performance) to unstringify and get the number
> again, the expensive atoi() 'works' for the numeric __NR_*, but not work for
> (__NR_*_base + offset) like __NR_* definitions (used by ARM and MIPS), a simple
> interpreter is required for such cases and it is more expensive than atoi().
>
> /* not for ARM and MIPS */
>
> static int atoi(const char *s);
> #define __get_nr(name) __nr_atoi(__stringify(__NR_##name))
> #define __nr_atoi(str) (str[0] == '_' ? -1L : ___nr_atoi(str))
> #define ___nr_atoi(str) (str[0] == '(' ? -1L : atoi(str))
>
> Welcome more discussion or let's simply throw away this direction ;-)
>
> But it may really help us to drop tons of duplicated code pieces like this:
>
> #ifdef __NR_xxxx
> ...
> #else
> return -ENOSYS;
> #endif
>
> David, Thomas and Arnd, any inspiration on this, or is this really impossible
> (or make things worse) in language level? ;-)
>
> What I'm thinking about is something like this or similar (As Willy commented
> before, the __sysdef() itself is not that good, please ignore itself, the core
> target here is using a single -ENOSYS return for all of the undefined
> branches):
>
> #define __sysdef(name, ...) \
> (__is_nr_defined(__NR_##name) ? my_syscall(__get_nr(name), ##__VA_ARGS__) : (long)-ENOSYS)
>
> Or as Arnd replied in an old email thread before, perhaps the whole #ifdef's
> code piece (and even the input types and return types of sys_*) above can be
> generated from .tbl or the generic unistd.h automatically in the sysroot
> installation stage?

To be honest I don't see a problem with the current aproach.
It is very obvious what is going on, the same pattern is used by other
projects and the "overhead" is very small.


It seems the macros will only work for simple cases which only test the
availability of a single syscall number.

Of these we currently only have:
gettimeofday(), lseek(), statx(), wait4()

So in it's current form we save 4 * 4 = 16 lines of code.
The proposed solution introduces 14 + 2 (empty) = 16 lines of new code,
and a bunch of mental overhead.

In case multiple underlying syscalls can be used these take different
arguments which a simple macro won't be able to encode sanely.

Thomas