Re: [PATCH v2 01/12] tools/nolibc: rename arch-<ARCH>.h to <ARCH>/arch.h

From: Thomas Weißschuh
Date: Mon Jul 10 2023 - 11:53:12 EST


On 2023-07-10 15:23:40+0800, Zhangjin Wu wrote:
> > On Sat, Jul 08, 2023 at 11:26:42PM +0800, Zhangjin Wu wrote:

> [..]

> > > As a preparation, this creates the architecture specific directory and
> > > moves tools/include/nolibc/arch-<ARCH>.h to
> > > tools/include/nolibc/<ARCH>/arch.h.
> >
> > I'm sorry but I still don't understand what it *provides*. I'm reading
> > it as "we *can* do this so let's do it". But what is the specific
> > purpose of adding this extra directory structure ? It's really unclear
> > to me and worries me that it'll only result in complicating maintenance
> > by adding even more files, thus even more "include" lines and cross
> > dependencies.
>
> Willy, I was assuming you had a look at the discussion between Thomas
> and me, so, I didn't add the link to our discussion, it is more about
> the 'clarity' of code "include" [1].
>
> I have proposed the idea in the discussion but got no response yet, so,
> sent this revision for more discussion, obviously, it is better to
> discuss more there and get more feedback from Thomas and you.

To be honest I got overwhelmed at some point and instead of figuring out
to which series' I already responded and which not I only responded to
those where I had time to do so immediately.

Keeping the amount of in-flight serieses lower would help this.

> The v0 included "crt.h" before "arch.h", Thomas suggested me include
> "crt.h" in arch_<ARCH>.h, just like the "compiler.h" did. His suggestion
> did inspire me to think about how to treat the relationship among crt.h,
> sys.h and arch.h.
>
> The idea behind is we have many directions to divide nolibc to different
> parts/modules:
>
> - one is arch specific (arch.h) and non-arch specific (the others)
>
> This method is used by us currently, It is very good to put all of the
> arch specific parts together to simplify (in the files to be
> added/maintained) the porting of a new architecture.
>
> But to be honest, It also confuse the modularity a little, for
> example, like sys.h, crt.h should be a core function/feature of
> nolibc, arch.h is not so. arch.h only provides the necessary minimal
> assembly "pieces".
>
> both sys.h and crt.h are not a sub modules of arch.h (although they
> have minimal arch specific code), so, like sys.h, crt.h should be
> included in the top-level headers, not in arch.h, reversely, the
> minimal arch specific should be included in crt.h. To do so and to
> avoid include the non-crt part, the split of arch.h is required, and
> therefore, the <ARCH>/ is created to put the divided <ARCH>/sys.h and
> <ARCH>/crt.h, otherwise, there will be many sys-<ARCH>.h and
> crt-<ARCH>.h in the top-level directory of nolibc.
>
> - another is the parallel functions/features (like crt.h, sys.h, stack protector ...)
>
> This is used by musl and glibc, before sending this proposal, I have
> taken a look at both of them, musl is simpler and clearer, we apply
> the similar method:
>
> musl:
> crt/crt1.c
> #include "crt_arch.h" /* arch/<ARCH>/crt_arch.h */

In musl crt_arch.h seems to be used in different ways. So it makes sense
to split it from syscall_arch.h. In nolibc there is no such distinction.
And everything will end up in a global namespace anyways.

> void _start_c(long *p)
> {
> int argc = p[0];
> char **argv = (void *)(p+1);
> ...
> }
>
> src/internal/syscall.h:
> ##include "syscall_arch.h" /* arch/<ARCH>/syscall_arch.h */
>
> ...
>
> glibc: (it is more complicated than musl)
>
> csu/libc-start.c, sysdeps/<ARCH>/start.S
>
> sysdeps/unix/sysv/linux/sysdep.h, sysdeps/unix/sysv/linux/<ARCH>/sysdep.h,
>
>
> With this method, the "crt_arch.h + crt.h" together provide the C
> RunTime (startup code, stack protector, environ and _auxv currently)
> function, the "sys_arch.h + sys.h" together provide the syscall
> definitions. The arch specific parts are hidden behind, and only
> require to include the crt_arch.h in crt.h and sys_arch.h in sys.h, no
> need to include the whole arch.h for all.
>
> As a summary, the core obvious reason here is, to this crt.h itself, it
> is ok for us to include crt.h in arch.h in code side, but reversely, I
> do prefer to include arch.h (and therefore the crt_arch.h) in crt.h,
> crt.h is the core function should be exported, arch.h is not, it only
> provide some low-level helpers for crt.h. If we treat sys.h as a core
> function and crt.h as a arch specific thing, it does confuse a little.
> This reorg may also help the similar future functions who require arch
> specific support, but of course, it does require to add/maintain more
> files for a new architecture, but it also allow to develop/debug at a
> smaller fineness.
>
> In current stage, include crt.h in arch.h is not that unacceptable, but

Why would it be more unacceptable in the future?

> if reorg is a better direction, why not do it currently, because we do
> have two functions (crt.h and sys.h) in <ARCH>/, if only one, it is not
> urgent ;-)

> Is this explanation better than before? welcome to discuss more ;-)

Personally I'm not convinced :-)

The arch-specific code in nolibc in mainline is currentl ~200 lines per
arch. With this series in general it will be even less.
If at some point there are many more architectures it may make sense to
introduce an arch/ directory then.

> Like musl, if required, another top-level arch/ may be required to put
> all of the <ARCH>/ directories together to clean up the top-level nolibc
> directory.

At the moment in mainline there are 26 files in nolibc.
That does not seem excessive, in fact it looks to be less than most
other kernel directories.

> > Zhangjin, very often in your series, the justification for a change is
> > missing, instead it's only explaining what is being changed, and I must
> > confess that it makes it particularly difficult to figure the benefits.
> > I'm only seeing this as an opportunity for a change ("can be split").
> > I could have missed something of course, but I can't figure what problem
> > it is trying to solve.
>
> Willy, thanks very much for pointing out this and so sorry, "commit
> message should tell why but not how" is in my mind, but sometimes, it
> may be lost especially when the change list are 'huge' (must improve).
>
> In reality, It is a little difficult to just explain it at the right
> dimension for this change, so I have wrotten several versions of the
> commit message for this change locally (and also for the other changes
> too), at last, I choose the one currently used.
>
> As explained in another reply, it is really hard to write a just ok
> commit message for every change, sometimes, the justification is
> 'obvious' to some develoers who have the background information or who
> have dicussed together, sometimes, it is not easy to just write
> precisely about the core justification, to improve this, here is the
> list I plan to do, hope it help the others too:
>
> - discuss more before send new patches, especially for 'new' or 'big'
> change

This sounds like the correct thing to do for reorganization patches.
It should be very easy to describe and very annoying to actually do.

> - reword carefully about every change before sending patches (show
> benefits to let maintainer 'buy' (merge) them)
>
> - send the patches not frequently, keep the mind conscious, not like a
> "flood"

> [..]