Re: [PATCH 01/21] bcachefs: KEY_TYPE_accounting

From: Brian Foster
Date: Fri Mar 01 2024 - 10:01:32 EST


On Thu, Feb 29, 2024 at 04:24:37PM -0500, Kent Overstreet wrote:
> On Thu, Feb 29, 2024 at 01:43:15PM -0500, Brian Foster wrote:
> > Hmm.. I think the connection I missed on first look is basically
> > disk_accounting_key_to_bpos(). I think what is confusing is that calling
> > this a key makes me think of bkey, which I understand to contain a bpos,
> > so then overlaying it with a bpos didn't really make a lot of sense to
> > me conceptually.
> >
> > So when I look at disk_accounting_key_to_bpos(), I see we are actually
> > using the bpos _pad field, and this structure basically _is_ the bpos
> > for a disk accounting btree bkey. So that kind of makes me wonder why
> > this isn't called something like disk_accounting_pos instead of _key,
> > but maybe that is wrong for other reasons.
>
> hmm, I didn't consider calling it disk_accounting_pos. I'll let that
> roll around in my brain.
>
> 'key' is more standard terminology to me outside bcachefs, but 'pos'
> does make more sense within bcachefs.
>

Ok, so I'm not totally crazy at least. :)

Note again that wasn't an explicit suggestion, just that it seems more
logical to me based on my current understanding. I'm just trying to put
down my initial thoughts/confusions in hopes that at least some of this
triggers ideas for improvements...

> > Either way, what I'm trying to get at is that I think this documentation
> > would be better if it explained conceptually how disk_accounting_key
> > relates to bkey/bpos, and why it exists separately from bkey vs. other
> > key types, rather than (or at least before) getting into the lower level
> > side effects of a union with bpos.
>
> Well, that gets into some fun territory - ideally bpos would not be a
> fixed thing that every btree was forced to use, we'd be able to define
> different types per btree.
>

Ok, but this starts to sound orthogonal to the accounting bits. Since I
don't really grok why this is called a key, here's how I would add to
the existing documentation:

"Here, the key has considerably more structure than a typical key
(bpos); an accounting key is 'struct disk_accounting_key', which is a
union of bpos. We do this because disk_account_key actually is bpos for
the related bkey that ends up in the accounting btree.

This btree uses nontraditional bpos semantics because accounting btree
keys are indexed differently <reasons based on the counter
structures..?>. Yadda yadda..

Unlike with other key types, <continued existing comment> ...
"

Hm?

Brian

> And we're actually going to need to be able to do that in order to do
> configurationless autotiering - i.e. tracking how hot/cold data is on an
> inode:offset basis, because LRU btree backreferences need to go in the
> key (bpos), not the value, in order to avoid collisions, and bpos isn't
> big enough for that.
>
> disk_accounting_(key|pos) is an even trickier situation, because of
> endianness issues. The trick we do with bpos of defining the field order
> differently based on endianness so that byte order matches word order -
> that really wouldn't work here, so there is at present no practical way
> that I know of to avoid the byte swabbing when going back and forth
> between bpos and disk_accounting_pos on big endian.
>
> But gcc does have an attribute now that lets you specify that an integer
> struct member is big or little endian... I if we could get them to go
> one step further and give us an attribute to control whether members are
> laid out in ascending or descending order...
>