Re: clangd cannot handle tree_nocb.h

From: Joel Fernandes
Date: Sat Apr 29 2023 - 02:19:58 EST


Fangrui/Florent/David/All:

Sorry for the late reply, I got distracted.

On Wed, Apr 19, 2023 at 4:02 PM Fangrui Song <maskray@xxxxxxxxxx> wrote:
>
> On Wed, Apr 19, 2023 at 6:00 AM Florent Revest <revest@xxxxxxxxxx> wrote:
> >
> > On Tue, Apr 18, 2023 at 6:28 PM Nick Desaulniers
> > <ndesaulniers@xxxxxxxxxx> wrote:
> > >
> > > + Florent
> >
> > Hi there!
> >
> > > Joel, Florent is doing some cool stuff with clangd and kernels (check
> > > out the demo at go/linux-kernel-vscode). I'm pushing Florent to
> >
> > Apologies for folks outside Google, this is an internal link to a
> > kernel dev setup I originally created for myself, then for my team and
> > apparently more and more people are starting to use it internally. :)
> > If there's enough appetite for it externally too, I'll try to
> > open-source it someday. Anyway, in the context of this conversation,
> > it's just something that uses clangd. :)
> >
> > > publish some of the cool stuff he's been working on externally because
> > > it is awesome. ;)
> > >
> > > Florent, have you seen any such issues such as what Joel reported below?
> >
> > Yes, I've seen this problem a bunch of times. Afaiu, Clangd operates
> > under the assumption that every source file is a valid compilation
> > unit. My understanding is that it's generally a good practice to keep
> > things that way and I wouldn't be surprised if the userspace Chrome
> > code-base Joel saw enforces this (iirc, it's a rule for
> > Google-internal C++ too, all headers must be interpretable
> > independently).
> >
> > However, from the perspective of the C spec, anything can be included
> > anywhere and a C file can only make sense if it's included
> > after/before certain other things are defined/included. Spontaneously,
> > I would call these ".inc" rather than ".h" or ".c" because I would
> > expect a source file to be always valid and this suffix makes it
> > clearer they depend on their context, but as a matter of fact source
> > files that don't compile when interpreted individually are quite
> > common in the kernel tree. Other examples that have been reported to
> > me include a lot of kernel/sched/*, since many of these files
> > (including .c files) are included from kernel/sched/build_policy.c in
> > a specific order to form one big compilation unit.
> >
> > Unfortunately, I don't know of any solution. :( This feels like a
> > limit of C or compile_commands.json to me. "compile commands" can not
> > be enough to interpret any file, clangd would need a way to express
> > "interpret this file as if it were included in that spot of that
> > compilation unit" and maybe even need a bunch of heuristics to choose
> > one such include spot.
> >
> > I don't know if clangd has any plan to address this and so far I've
> > just lived with these error squiggles.
> >
>
> Some information about Clang based language servers
>
> It's good practice to ensure that a header file is self-contained. If
> not, for example, if a.c includes a.h, which is not self-contained,
> a.h is generally compiled on its own (as if using clang [options] a.h)
> to confine diagnostics/completion results to a.h and not to other
> headers included by a.c.
>
> However, this design choice may cause language servers to emit
> diagnostics that you don't see when building the project.
>
> For my tool ccls, the index file for an included file (e.g. a.h) is
> computed when compiling the main source file (a.c). Therefore, if the
> project builds successfully, the index is always accurate.
>
> Language servers usually have the limitation that only one
> configuration of a main source file is recorded, e.g., you only get
> index data for either clang -DA=1 a.c or clang -DA=2 a.c, not both.
> This limitation exists due to technical challenges and practicality.
> If a main source file has 10 configurations, when you edit the file,
> do you compile it 10 times to get all indexes?

Indeed regarding the 10 configs points you brought up. The .h not
being 1:1 with compilation units makes it a similar problem.

For example, if I edit a.h and a.h is included in both b.c and c.c.
Then, in order to know if edits I am making to a.h really will not
lead to build errors, clangd has to compile both b.c and c.c
separately. Why?

Because a.h's inclusion in b.c may build successfully, but in c.c may
not! This generalizes to any number of source files.

However, I was thinking of a scheme such as the following:

Have a file class Clangd.yaml in the source directory of a .h. In this
file, clearly state how to build the .h. For the above example, for
changes to a.h -- the YAML will say include a.h and then b.c , and
then build the combined thing. Then a script like the one building
compile_commands.json can use the YAML to generate something useful
for the .h That may not work for every usecase such as the one David
pointed out, but at least it will be somewhat useful for most use
cases -- versus what we currently do for the issue (nothing).

But even if you edit the .h, the compiler errors typically end up
showing up in the .c files. So even clangd says, there is a build
error in the combined result -- can it really pinpoint where in the .h
is the issue? My observation is clangd will point errors within the
thing that is being compiled (i.e. the .c file) and not the include
file.

Other Thoughts? On the process side of things, is there a place I can
file a bug report where some of this may be interesting to a team
working on this?

- Joel