Re: [RFC] A first shot at asciidoc-based formatted docs

From: Jani Nikula
Date: Wed Feb 10 2016 - 09:03:47 EST



[Sorry this turned out a long email, I didn't have the time to write a
short one.]

On Wed, 10 Feb 2016, Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> On Wed, Feb 10, 2016 at 1:09 AM, Jonathan Corbet <corbet@xxxxxxx> wrote:
>> On Tue, 26 Jan 2016 14:08:45 +0200
>> Jani Nikula <jani.nikula@xxxxxxxxx> wrote:
>>
>>> I'm afraid we've done some overlapping work in the mean time, but I'm
>>> happy we've both looked at the tool chain, and can have a more
>>> meaningful conversation now.
>>
>> [Adding Keith since you said you wanted to be a part of this - let us know
>> when you've had enough!]
>>
>> So I've spent a bit of time looking at this, and quite a bit more time
>> talking with various folks at LCA. There is pretty much universal
>> agreement that this is interesting work and the direction we'd like to
>> go. My current hope is that we can merge some version of it for 4.6 and
>> see where it goes from there.
>>
>> So naturally I have some thoughts on the whole thing...
>>
>> - I would like to format directly to HTML if at all possible. It seems
>> it should be possible to get a table of contents into the files, and
>> the feedback I got was that a TOC would be enough for navigation - it
>> would not be necessary to split the files at that point. We might
>> still want to try to figure that out too, though. In any case, this
>> isn't a show stopper, in that we can change it anytime if a better way
>> shows up. But I'd like to have it in mind.
>
> I think for 4.6 it'd be best to go with the hybrid asciidoc->docbook
> toolchain, since that's less disruptive. And with that we can also
> fully concentrate on the frontend, and how it'll look and behave.

I'd like to clarify the end goal a bit more before deciding what to do
next. In particular, is the aim to have asciidoc->HTML only or dual
asciidoc->HTML and asciidoc->XML->whatever? Or independent
asciidoc->HTML first, with the existing DocBook on the side until
everything's converted? Something else?

Direct asciidoc->HTML has the problem I mentioned that there is no
chunked output. If the source is big (as-is or via asciidoc includes)
the output is big. The current gpu.tmpl turned out way too big. We could
alleviate that by splitting the source documents into smaller pieces (in
gpu.tmpl case it's desirable no matter what), and tying them together
via cross-references and TOC rather than asciidoc includes.
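As a rough illustration of the TOC approach, here is a minimal Python sketch. It scans the split documents for their level-1 section titles and emits an asciidoc index page linking to the HTML each one would produce. The file names, the page title and the "level-1 titles only" rule are assumptions for illustration only:

```python
import re
from pathlib import PurePosixPath

# Matches a one-line asciidoc level-1 section title, e.g. "== Memory management".
# "=== Sub" does not match because a third '=' follows the first two.
SECTION_RE = re.compile(r"^==\s+(\S.*?)\s*$")


def build_toc(docs):
    """Return asciidoc source for a top-level TOC page.

    `docs` maps a (hypothetical) source path such as "gpu/drm-core.txt"
    to its text. Each level-1 section title becomes a link to the HTML
    file asciidoc would produce for that source (same name, .html suffix).
    """
    lines = ["= GPU documentation", ""]  # hypothetical page title
    for path in sorted(docs):
        html = str(PurePosixPath(path).with_suffix(".html"))
        for line in docs[path].splitlines():
            m = SECTION_RE.match(line)
            if m:
                # "link:target[caption]" is the plain asciidoc inline link macro.
                lines.append(f"* link:{html}[{m.group(1)}]")
    return "\n".join(lines) + "\n"
```

Each piece is then processed independently, and only the index knows about all of them.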

The problem with this, in turn, is that I don't really know how
automatic cross-referencing between kernel-doc comments would turn out
then (e.g. i915 kernel-doc referencing a symbol in drm core kernel-doc
after the gpu.tmpl split), as asciidoc would process the files
independently. A kernel-doc comment writer shouldn't have to know which
document the referenced symbol is in... We could do post-processing, I
guess, but I'd really like to get rid of the homebrew aspects here.

Is it acceptable to have dead links when referencing symbols outside of
the document in question, for the time being, until someone figures out
a nice way to do this?
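To make the post-processing option concrete, here is a minimal Python sketch. It assumes a symbol index (symbol name to HTML file) collected from all documents beforehand, a kernel-doc style name() reference syntax, and anchors named after the symbol. None of this exists in the current tool chain; it only shows the shape of such a pass:

```python
import re

# Assumed reference syntax: a kernel-doc style "name()" function reference.
FUNC_REF_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\(\)")


def resolve_refs(text, current_doc, symbol_index):
    """Rewrite name() references as asciidoc links.

    `symbol_index` maps a symbol name to the HTML document defining it.
    Symbols in the current document become internal cross-references,
    symbols in other documents become inter-document links, and unknown
    symbols are left as plain text.
    """
    def replace(m):
        name = m.group(1)
        doc = symbol_index.get(name)
        if doc is None:
            return m.group(0)  # unknown symbol: leave the text untouched
        if doc == current_doc:
            return f"<<{name},{name}()>>"  # same document: internal xref
        return f"link:{doc}#{name}[{name}()]"  # other document: plain link

    return FUNC_REF_RE.sub(replace, text)
```

For example, with an index of {"drm_gem_object_init": "drm-core.html", "i915_gem_init": "i915.html"}, processing i915.html turns drm_gem_object_init() into a link to drm-core.html and i915_gem_init() into an internal reference. Leaving unknown symbols as plain text instead of dead links is just one possible choice here.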

> Once that's solid we can look into the icing on the cake for later
> kernels I think.
>
>> - Asciidoc templates and processing should happen in a new directory
>> (perhaps imaginatively called "asciidoc"); having them in a directory
>> called "DocBook" seems a little weird. More importantly, though, I'd
>> like to separate them out as a fresh start, and not mess with the
>> existing DocBook templates until we decide we don't need them anymore.
>> If we could end up with a cleaner, simpler makefile in the process,
>> that would be a bonus.
>
> For the long term dream plan of including other .txt files from the
> existing pile of unstructured docs, do we really want a separate
> asciidoc directory? Or just .asciidoc as a special extension?

Also in my dream world you could have asciidoc files anywhere in the
Documentation tree, with a Makefile per directory identifying which ones
should be processed as asciidoc. I might even name them all .txt, and
you wouldn't have to rename existing "almost markup" plain text files to
have them processed, just fix the markup and update the Makefile. (FWIW
asciidoc suggests .txt extension, though asciidoctor suggests .adoc or
.asciidoc.) I think this would better promote a gradual transition to
lightweight markup, with easier to review patches. Also you mentioned
there's no structure under Documentation. Allowing asciidoc files
anywhere would, I think, help gradual restructuring.

The output could be a subdirectory (one per output format?) under
Documentation.
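A minimal Python sketch of that layout, for illustration only. The per-directory Makefile list is modeled as a plain dict, and Documentation/output/<format>/ is a hypothetical output location:

```python
from pathlib import PurePosixPath


def output_paths(declared, fmt, out_root="Documentation/output"):
    """Map declared asciidoc sources to per-format output files.

    `declared` maps a directory under Documentation/ to the .txt files its
    (hypothetical) Makefile lists as asciidoc. Undeclared .txt files are
    never processed, so plain-text docs can be converted one at a time.
    """
    ext = {"html": ".html", "xml": ".xml"}[fmt]
    result = {}
    for directory, names in declared.items():
        for name in names:
            src = PurePosixPath(directory) / name
            # Mirror the source tree below the per-format output directory.
            rel = src.relative_to("Documentation")
            result[str(src)] = str(PurePosixPath(out_root, fmt) / rel.with_suffix(ext))
    return result
```

Converting another file is then a one-line Makefile change plus the markup fixes, which keeps the patches easy to review.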

>> - I'm not sold on the new inclusion mechanism. Creating thousands of
>> little files and tracking them for dependencies and such doesn't seem
>> like a simplification or a path toward better performance. I would
>> like to at least consider keeping the direct-from-source inclusion.
>
> The motivation behind the new inclusion mechanism isn't the speed-up
> due to parallelization, but being able to use native asciidoc
> includes. With those you can pass options to e.g. shift the hierarchy.
> With that you can do subheadings in DOC: sections and then seamlessly
> include them. Or similar stuff.
>
> The speed-up due to parallelization is just a small bonus.
>
> Also generating thousands of files is totally not unheard of in the kernel:
>
> $ find include/config | wc -l
> 2623
>
> None of those are in git.

Yes, my main motivation here was to get rid of the preprocessing step
(currently tmpl->xml). I wanted to have the source documents in pure
markup which could be directly processed by asciidoc. I wanted to have
the editor markup helpers and syntax highlighting just work, with no
extra non-markup cruft to confuse it. (For example, emacs tells me the
current tmpl files are invalid XML because of the docproc directives.)
This ties back to the dream above: just have .txt files with no
preprocessing step. IMO that's less confusing when actually writing the
docs.
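For reference, the hierarchy shift that native includes make possible boils down to something like the following Python sketch. It only illustrates the effect and is not how asciidoc implements it; it assumes one-line '=' titles and '----' listing blocks only:

```python
import re

# One-line asciidoc section title: one or more '=' followed by whitespace and text.
TITLE_RE = re.compile(r"^(=+)(\s+\S.*)$")


def shift_levels(text, offset):
    """Return `text` with every one-line section title shifted by `offset` levels.

    This is the effect of including a DOC: section one level deeper, so it
    can use its own "== Subheading" titles. Lines inside "----" listing
    blocks are left alone.
    """
    out = []
    in_listing = False
    for line in text.splitlines():
        if line.strip() == "----":
            in_listing = not in_listing  # entering or leaving a listing block
        elif not in_listing:
            m = TITLE_RE.match(line)
            if m:
                level = max(1, len(m.group(1)) + offset)  # never below a single '='
                line = "=" * level + m.group(2)
        out.append(line)
    return "\n".join(out)
```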

I didn't think there'd be anything weird about having thousands of
intermediate files generated from source files, with dependencies set
and working, just like we have .o files.
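The rebuild rule is the same one make applies to .o files. A minimal Python sketch, assuming a hypothetical naming scheme where each extracted kernel-doc fragment is its own target file with the C source as its prerequisite:

```python
import os


def needs_rebuild(target, sources):
    """Return True if `target` is missing or older than any of `sources`.

    This is make's rule for a .o file and its prerequisites; an extracted
    kernel-doc fragment would be the target and its .c file the source.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

Only fragments whose source changed get regenerated, and independent fragments can be built in parallel.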

Sure, the mechanism is a proof-of-concept, rough around the edges, and
needs to stow away the intermediate files better, but I still think it's
a conceptually better approach than adding a layer of homebrew when we
have a chance to break away from that. And there's the bonus of getting
parallelization, which I think just backs the concept.

I did try to make asciidoc filters and plugins work for including
kernel-doc, which might have been a better match to what Jon wants, but
without docproc in between. I didn't quite manage to make that work, and
there's also the problem that both are incompatible with asciidoctor.

>> - Insisting on EXPORT_SYMBOL being in the same file doesn't seem like
>> it's going to work for now; that could maybe change after Al's work
>> goes in, which could be fairly soon.
>
> Hm, assuming Al gets his stuff into 4.6 could we just assume this? It
> holds true for gpu docs already I think, and most other subsystems.
> The trouble iirc is all around asm and similar stuff, and we can't
> kerneldoc asm afaik.

I'd turn this around. IMO the problem isn't insisting that EXPORT_SYMBOL
is in the same file as the definition of the symbol. The problem is
insisting that the kernel-doc comment is in the same file as the
EXPORT_SYMBOL and the definition. Particularly include/media has plenty
of kernel-doc in headers with the declarations.

If we can't insist on that, we could teach kernel-doc to scan a list of
other files for the EXPORT_SYMBOLs, instead of having that logic
externally in docproc. This should be trivial, especially if you know
perl. (Unfortunately this might get a little tricky with the include
syntax.)

This was mostly driven by the desire to get rid of the docproc
preprocessing step.
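The scanning logic is small. Here it is sketched in Python rather than perl (kernel-doc itself is perl, so this only illustrates the logic, and the function names are hypothetical). It collects EXPORT_SYMBOL* names from the listed .c files, then keeps only the documented functions that appear among them:

```python
import re

# EXPORT_SYMBOL(name) and single-argument variants such as EXPORT_SYMBOL_GPL(name).
EXPORT_RE = re.compile(r"^\s*EXPORT_SYMBOL(?:_[A-Z_]+)?\(\s*([A-Za-z_]\w*)\s*\)", re.M)


def exported_symbols(sources):
    """Collect the names exported by any of the given source texts."""
    names = set()
    for text in sources:
        names.update(EXPORT_RE.findall(text))
    return names


def filter_documented(documented, sources):
    """Keep only the documented functions that some listed file exports.

    `documented` lists the function names that have kernel-doc comments,
    e.g. in a header under include/media; `sources` are the .c files
    holding the corresponding EXPORT_SYMBOLs.
    """
    exported = exported_symbols(sources)
    return [name for name in documented if name in exported]
```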

>> Please let me know your thoughts on the above. Do you think you can find
>> some time over the next month for this? I'll try to shake loose some time
>> too, but, well, $EXCUSES...

If we can come up with a plan where I can be reasonably sure the
polished effort isn't going down the drain... ;)

> One more thing we discussed: Did you ping kbuild folks already? Or
> want to get some agreement on the overall build process first?

I think CONFIG_BUILD_DOCSRC vs. having documentation targets directly in
Documentation/Makefile (instead of top level make issuing recursive make
in Documentation/DocBook/Makefile) should be reconciled
somehow. Frankly, I find it odd that the hostprog targets under
Documentation seem to be better class citizens than the documentation
targets. I'm not saying they can't both be there, but they should
coexist on equal terms.

BR,
Jani.



--
Jani Nikula, Intel Open Source Technology Center