Re: [PATCH bpf-next v2] bpf/docs: Document kfunc lifecycle / stability expectations

From: David Vernet
Date: Fri Feb 03 2023 - 09:47:33 EST


On Fri, Feb 03, 2023 at 02:04:10PM +0100, Toke Høiland-Jørgensen wrote:
> David Vernet <void@xxxxxxxxxxxxx> writes:
>
> > BPF kernel <-> kernel API stability has been discussed at length over
> > the last several weeks and months. Now that we've largely aligned over
> > kfuncs being the way forward, and BPF helpers being considered frozen,
> > it's time to document the expectations for kfunc lifecycles and
> > stability so that everyone (BPF users, kfunc developers, and
> > maintainers) are all aligned, and have a crystal-clear understanding of
> > the expectations surrounding kfuncs.
> >
> > To do that, this patch adds that documentation to the main kfuncs
> > documentation page via a new 'kfunc lifecycle expectations' section. The
> > patch describes how decisions are made in the kernel regarding whether
> > to include, keep, deprecate, or change / remove a kfunc. As described
> > very overtly in the patch itself, but likely worth highlighting here:
> >
> > "kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
> > development elsewhere in the kernel".
> >
> > Rather, the intention and expectation is for kfuncs to be treated like
> > EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
> > safe and valuable option for maintainers and kfunc developers to extend
> > the kernel, without tying anyone's hands, or imposing any kind of
> > restrictions on maintainers in the same way that UAPI changes do.
> >
> > In addition to the 'kfunc lifecycle expectations' section, this patch
> > also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
> > authors or maintainers can choose to add to kfuncs if and when they
> > decide to deprecate them. Note that as described in the patch itself, a
> > kfunc need not be deprecated before being changed or removed -- this
> > flag is simply provided as an available deprecation mechanism for those
> > that want to provide a deprecation story / timeline to their users.
> > When necessary, kfuncs may be changed or removed to accommodate changes
> > elsewhere in the kernel without any deprecation at all.
> >
> > Signed-off-by: David Vernet <void@xxxxxxxxxxxxx>
>
> Some comments below, but otherwise please add my:
>
> Co-developed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
>
> should we Cc the next version to linux-api@vger as well just to get a
> bit more visibility in case others have comments?

Yes, good idea, will do.

>
> > ---
> > Documentation/bpf/kfuncs.rst | 138 +++++++++++++++++++++++++++++++++--
> > 1 file changed, 133 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > index 0bd07b39c2a4..4135f3111b67 100644
> > --- a/Documentation/bpf/kfuncs.rst
> > +++ b/Documentation/bpf/kfuncs.rst
> > @@ -13,7 +13,7 @@ BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
> > kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
> > kfuncs do not have a stable interface and can change from one kernel release to
> > another. Hence, BPF programs need to be updated in response to changes in the
> > -kernel.
> > +kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.
> >
> > 2. Defining a kfunc
> > ===================
> > @@ -238,6 +238,32 @@ single argument which must be a trusted argument or a MEM_RCU pointer.
> > The argument may have reference count of 0 and the kfunc must take this
> > into consideration.
> >
> > +.. _KF_deprecated_flag:
> > +
> > +2.4.9 KF_DEPRECATED flag
> > +------------------------
> > +
> > +The KF_DEPRECATED flag is used for kfuncs which are expected to be changed or
> > +removed in a subsequent kernel release. Deprecated kfuncs may be removed at any
> > +time, though if possible (and when applicable), developers are encouraged to
> > +provide users with a deprecation window to ease the burden of migrating off of
> > +the kfunc.
>
>
>
> I think the "may be removed at any time" is a bit odd here. If someone
> wants to just remove a kfunc, why bother with the deprecation flag at
> all? Besides, that whole "deprecation is optional" bit is explained

I definitely agree that the phrasing could be improved, but my intention
with that phrase was actually to answer the exact nuanced question you
posed. I think we need to clarify that KF_DEPRECATED is an optional tool
that developers can use to provide a softer deprecation story to their
users, rather than a flag that exists as part of a larger, strict
deprecation policy. Otherwise, I think it would be unclear to someone
reading this when and why they would need to use the flag. I liked your
suggestion below, and proposed a bit of extra text to try and capture
this point. Lmk what you think.

> below, in this section we're just explaining the flag. So I'd just drop
> this bit and combine the first two paragraphs as:
>
> "The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
> changed or removed in a subsequent kernel release. A kfunc that is
> marked with KF_DEPRECATED should also have any relevant information
> captured in its kernel doc. Such information typically includes the
> kfunc's expected remaining lifespan, a recommendation for new
> functionality that can replace it if any is available, and possibly a
> rationale for why it is being removed."

I like this -- are you OK with adding this in a small subsequent
paragraph to address the point I made above?

Note that KF_DEPRECATED is simply a tool that developers can choose to
use to ease their users' burden of migrating off of a kfunc. While
developers are encouraged to use KF_DEPRECATED to provide a
transitionary deprecation period to users of their kfunc, doing so is
not strictly required, and the flag does not exist as part of any larger
kfunc deprecation policy.

>
> > +Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
> > +supported and have its KF_DEPRECATED flag removed, it is likely to be far more
> > +difficult to remove a KF_DEPRECATED flag after it's been added than it is to
> > +prevent it from being added in the first place. As described in
> > +:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
> > +highly encouraged to make their use-cases known as early as possible, and
>
> nit: "highly encouraged" reads a bit like overuse of "very" - just "encouraged"?

Agreed, I'll use "encouraged" here.

>
> > +participate in upstream discussions regarding whether to keep, change,
> > +deprecate, or remove those kfuncs if and when such discussions occur.
> > +
> > 2.5 Registering the kfuncs
> > --------------------------
> >
> > @@ -304,14 +330,116 @@ In order to accommodate such requirements, the verifier will enforce strict
> > PTR_TO_BTF_ID type matching if two types have the exact same name, with one
> > being suffixed with ``___init``.
> >
> > -3. Core kfuncs
> > +.. _BPF_kfunc_lifecycle_expectations:
> > +
> > +3. kfunc lifecycle expectations
> > +===============================
> > +
> > +kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
> > +strict stability restrictions associated with kernel <-> user UAPIs. Instead,
> > +they're modeled more similarly to EXPORT_SYMBOL_GPL, and can therefore be
>
> nit: "Instead, they're modeled more similarly to" -> "This means they
> can be thought of as similar to" ? ("more similarly" is terrible :P)

Heh, I agree that it doesn't sound right, so much so that I googled it
because of how odd it sounded in my head after repeating it a few times:
[1]. Even though it's apparently correct grammar, I'll still apply your
suggestion to v3 because I think it reads more clearly.

[1]: https://english.stackexchange.com/questions/253073/is-using-more-similar-incorrect

>
> > +modified or removed by a maintainer of the subsystem they're defined in when
> > +it's deemed necessary.
> > +
> > +Like any other change to the kernel, maintainers will not change or remove a
> > +kfunc without having a reasonable justification. Whether or not they'll choose
> > +to change a kfunc will ultimately depend on a variety of factors, such as how
> > +widely used the kfunc is, how long the kfunc has been in the kernel, whether an
> > +alternative kfunc exists, what the norm is in terms of stability for the
> > +subsystem in question, and of course what the technical cost is of continuing
> > +to support the kfunc.
> > +
> > +There are several implications of this:
> > +
> > +a) kfuncs that are widely used or have been in the kernel for a long time will
> > + be more difficult to justify being changed or removed by a maintainer. Said
> > + in a different way, kfuncs that are known to have a lot of users and provide
>
> nit: "said in a different way" -> "in other words" ?

Ack

>
> > + significant value provide stronger incentives for maintainers to invest the
> > + time and complexity in supporting them. It is therefore important for
> > + developers that are using kfuncs in their BPF programs to communicate and
> > + explain how and why those kfuncs are being used, and to participate in
> > + discussions regarding those kfuncs when they occur upstream.
> > +
> > +b) Because many BPF programs are not upstreamed as part of the kernel tree, it
> > + is often not possible to change them in-place when a kfunc changes, as it is
> > + for e.g. an upstreamed driver being updated in place when an
> > + EXPORT_SYMBOL_GPL symbol is changed. Distributions that bundle BPF programs
> > + that use kfuncs must therefore ensure that those BPF programs are linking
> > + against the kfuncs that are supported by the kernel version being used for
> > + any given release. Additionally, BPF developers are encouraged to upstream
> > + their BPF programs so they can enjoy the same benefits as upstreamed
> > + modules, and avoid code churn.
> > +
> > + On the other hand, while the hope is that it will become the norm to
> > + upstream BPF programs, the reality is that most BPF programs are still
> > + out-of-tree. This means that users with out-of-tree BPF programs that use
> > + kfuncs should be considered relevant to discussions and decisions around
> > + modifying and removing kfuncs, despite that not being the norm for
> > + out-of-tree kernel modules. The BPF community will take an active role in
> > + participating in upstream discussions when necessary to ensure that the
> > + perspectives of such users are taken into account.
>
> As I said in a previous email, I really don't think encouraging people
> to upstream BPF programs are either realistic of desirable. I think we
> should drop that and change this point to something like:

Fair enough. Donald said something similar in [2] as well. I'd like to
see us eventually change this paradigm over time, but we can (and
probably should) certainly decouple that from this patch.

[2]: https://lore.kernel.org/all/m2sffnvxbw.fsf@xxxxxxxxx/

>
> b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF
> programs that call kfuncs are generally not part of the kernel tree.
> This means that refactoring can not generally change callers in-place
> when a kfunc changes, as it is done for e.g. an upstreamed driver being
> updated in place when kernel symbol is changed.
>
> Unlike with regular kernel symbols, this is expected behaviour for
> BPF symbols, and out-of-tree BPF programs that use kfuncs should be
> considered relevant to discussions and decisions around modifying and
> removing kfuncs. The BPF community will take an active role in
> participating in upstream discussions when necessary to ensure that
> the perspectives of such users are taken into account.

Well put, will apply in v3.

>
> > +c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
> > + will not ever hard-block a change in the kernel purely for stability
> > + reasons. In other words, kfuncs have the same stability guarantees as any
> > + other kernel API, such as those provided by EXPORT_SYMBOL_GPL, though with
> > + perhaps less burden than EXPORT_SYMBOL_GPL changes thanks to BPF CO-RE.
>
> I'd drop the last sentence (from "In other words..."). It's not true
> that kfuncs have "the same stability guarantees", we just said above
> that out-of-tree BPF programs are relevant. Also, other than that I
> don't think having this sentence here adds anything that's not already
> explained below, so I'd just drop it and merge the below paragraph into
> the above.

Ack, will do.

>
> > + That being said, kfuncs are features that are meant to solve problems and
> > + provide value to users. The decision of whether to change or remove a kfunc
> > + is a multivariate technical decision that is made on a case-by-case basis,
> > + and which is informed by data points such as those mentioned above. It is
> > + expected that a kfunc being removed or changed with no warning will not be a
> > + common occurrence or take place without sound justification, but it is a
> > + possibility that must be accepted if one is to use kfuncs.
> > +
> > +3.1 kfunc deprecation
> > +---------------------
> > +
> > +As described above, while sometimes a maintainer may find that a kfunc must be
> > +changed or removed immediately to accommodate some changes in their subsystem,
> > +other kfuncs may be able to accommodate a longer and more measured deprecation
> > +process. For example, if a new kfunc comes along which provides superior
> > +functionality to an existing kfunc, the existing kfunc may be deprecated for
> > +some period of time to allow users to migrate their BPF programs to use the new
> > +one. Or, if a kfunc has no known users, a decision may be made to remove the
> > +kfunc (without providing an alternative API) after some deprecation period
> > +period so as to provide users with a window to notify the kfunc maintainer if
> > +it turns out that the kfunc is actually being used.
> > +
> > +kfuncs being deprecated (rather than changed or removed with no warning) is
> > +expected to be the common case, and as described in :ref:`KF_deprecated_flag`,
>
> reword as: "It's expected that the common case will be that kfuncs will
> go through a deprecation period rather than being changed or removed
> with not warning. As described in..."

Ack

>
> > +the kfunc framework provides the KF_DEPRECATED flag to kfunc developers to
> > +signal to users that a kfunc has been deprecated. Once a kfunc has been marked
> > +with KF_DEPRECATED, the following procedure is followed for removal:
> > +
> > +1. Any relevant information for deprecated kfuncs is documented in the kfunc's
> > + kernel docs. This documentation will typically include the kfunc's expected
> > + remaining lifespan, a recommendation for new functionality that can replace
> > + the usage of the deprecated function (or an explanation as to why no such
> > + replacement exists), etc.
> > +
> > +2. The deprecated kfunc is kept in the kernel for some period of time after it
> > + was first marked as deprecated. This time period will be chosen on a
> > + case-by-case basis, and will typically depend on how widespread the use of
> > + the kfunc is, how long it has been in the kernel, and how hard it is to move
> > + to alternatives. This deprecation time period is "best effort", and as
> > + described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
> > + sometimes dictate that the kfunc be removed before the full intended
> > + deprecation period has elapsed.
> > +
> > +3. After the deprecation period, or sometimes earlier if necessary, the kfunc
>
> drop "or sometimes earlier if necessary" - the deprecation period ends
> when the kfunc is removed, that's what a deprecation period means. If
> some factor means that the deprecation period is shortened, that's still
> the end of the deprecation period, by definition :)

Fair enough, will change in v3.

Thanks for all of the suggestions. I'll send out v3 later today, along
with your edits and Co-developed-by tag.

- David