Re: [RFC 0/2] erofs: introduce bloom filter for xattr

From: Alexander Larsson
Date: Thu Jun 22 2023 - 03:57:53 EST


On Thu, Jun 22, 2023 at 8:37 AM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 6/21/23 7:50 PM, Alexander Larsson wrote:
> > On Wed, Jun 21, 2023 at 10:32 AM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> Background
> >> ==========
> >> Filesystems with ACL enabled generally need to read
> >> "system.posix_acl_access"/"system.posix_acl_default" xattr to get the
> >> access and default ACL. When filesystem is mounted with ACL enabled
> >> while files in the system have not set access/default ACL, the getattr()
> >> will run in vain while the round trip can decrease the performance in
> >> workload like "ls -lR".
> >>
> >> For example, there's a 12% performance boost if erofs is mounted with
> >> "noacl" when running "ls -lR" workload on dataset [1] (given in [2]).
> >>
> >> We'd better offer a fastpath to boost the above workload, as well as
> >> other negative xattr lookup.
> >>
> >>
> >> Proposal
> >> ========
> >> Introduce a per-inode bloom filter for xattrs to boost the negative
> >> xattr queries.
> >>
> >> As following shows, a 32-bit bloom filter is introduced for each inode,
> >> describing if a xattr with specific name exists on this inode.
> >>
> >> ```
> >> struct erofs_xattr_ibody_header {
> >> - __le32 h_reserved;
> >> + __le32 h_map; /* bloom filter */
> >> ...
> >> }
> >> ```
> >>
> >> Following are some implementation details for bloom filter.
> >>
> >> 1. Reverse bit value
> >> --------------------
> >> The bloom filter structure describes if a given data is inside the set.
> >> It will map the given data into several bits of the bloom filter map.
> >> The data must not exist inside the set if any mapped bit is 0, while the
> >> data may be not inside the set even if all mapped bits is 1.
> >>
> >> While in our use case, as erofs_xattr_ibody_header.h_map is previously a
> >> (all zero) reserved field, the bit value for the bloom filter has a
> >> reverse semantics in consideration for compatibility. That is, for a
> >> given data, the mapped bits will be cleared to 0. Thus for a previously
> >> built image without support for bloom filter, the bloom filter is all
> >> zero and when it's mounted by the new kernel with support for bloom
> >> filter, it can not determine if the queried xattr exists on the inode and
> >> thus will fallback to the original routine of iterating all on-disk
> >> xattrs to determine if the queried xattr exists.
> >>
> >>
> >> 2. The number of hash functions
> >> -------------------------------
> >> The optimal value for the number of the hash functions (k) is (ln2 *
> >> m/n), where m stands the number of bits of the bloom filter map, while n
> >> stands the number of all candidates may be inside the set.
> >>
> >> In our use case, the number of common used xattr (n) is approximately 8,
> >> including system.[posix_acl_access|posix_acl_default],
> >> security.[capability|selinux] and
> >> security.[SMACK64|SMACK64TRANSMUTE|SMACK64EXEC|SMACK64MMAP].
> >>
> >> Given the number of bits of the bloom filter (m) is 32, the optimal value
> >> for the number of the hash functions (k) is 2 (ln2 * m/n = 2.7).
> >
> > This is indeed the optimal value in a traditional use of bloom
> > filters. However, I think it is based on a much larger set of values.
> > For this usecase it may be better to choose a different value.
> >
> > I did some research a while ago on this, and I thought about the
> > counts too. Having more than one hash function is useful because it
> > allows you to avoid problems if two values happen to hash to the same
> > bucket, but this happens at the cost of there being less "unique
> > buckets". I spent some time looking for common xattr values
> > (including some from userspace) and ended up with a list of about 30.
>
> Yeah, if the number of common used xattr (n) is 30, then the optimal
> value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).
> The optimal value in theory also matches our intuition.
>
>
> > If we can choose a single hash function that maps all (or most) of
> > these to a unique bucket (mod 32),
>
> Excellent research! Would you mind sharing the list of these
> approximately 30 commonly used xattrs, so that I could check if they are
> mapped to unique bucket with the single hash function we proposed?

This is the list I came up with:

trusted.overlay.opaque
trusted.overlay.redirect
trusted.overlay.origin
trusted.overlay.impure
trusted.overlay.nlink
trusted.overlay.upper
trusted.overlay.metacopy
trusted.overlay.protattr
user.overlay.opaque
user.overlay.redirect
user.overlay.origin
user.overlay.impure
user.overlay.nlink
user.overlay.upper
user.overlay.metacopy
user.overlay.protattr
security.evm
security.ima
security.selinux
security.SMACK64
security.SMACK64IPIN
security.SMACK64IPOUT
security.SMACK64EXEC
security.SMACK64TRANSMUTE
security.SMACK64MMAP
security.apparmor
security.capability
system.posix_acl_access
system.posix_acl_default
user.mime_type

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson Red Hat, Inc
alexl@xxxxxxxxxx alexander.larsson@xxxxxxxxx