Re: [RFC 0/2] erofs: introduce bloom filter for xattr

From: Jingbo Xu
Date: Fri Jun 23 2023 - 21:33:12 EST




On 6/22/23 3:56 PM, Alexander Larsson wrote:
> On Thu, Jun 22, 2023 at 8:37 AM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>
>>
>>
>> On 6/21/23 7:50 PM, Alexander Larsson wrote:
>>> On Wed, Jun 21, 2023 at 10:32 AM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Background
>>>> ==========
>>>> Filesystems with ACL enabled generally need to read
>>>> "system.posix_acl_access"/"system.posix_acl_default" xattr to get the
>>>> access and default ACL. When filesystem is mounted with ACL enabled
>>>> while files in the system have not set access/default ACL, the getattr()
>>>> will run in vain while the round trip can decrease the performance in
>>>> workload like "ls -lR".
>>>>
>>>> For example, there's a 12% performance boost if erofs is mounted with
>>>> "noacl" when running "ls -lR" workload on dataset [1] (given in [2]).
>>>>
>>>> We'd better offer a fastpath to boost the above workload, as well as
>>>> other negative xattr lookup.
>>>>
>>>>
>>>> Proposal
>>>> ========
>>>> Introduce a per-inode bloom filter for xattrs to boost the negative
>>>> xattr queries.
>>>>
>>>> As following shows, a 32-bit bloom filter is introduced for each inode,
>>>> describing if a xattr with specific name exists on this inode.
>>>>
>>>> ```
>>>> struct erofs_xattr_ibody_header {
>>>> - __le32 h_reserved;
>>>> + __le32 h_map; /* bloom filter */
>>>> ...
>>>> }
>>>> ```
>>>>
>>>> Following are some implementation details for bloom filter.
>>>>
>>>> 1. Reverse bit value
>>>> --------------------
>>>> The bloom filter structure describes if a given data is inside the set.
>>>> It will map the given data into several bits of the bloom filter map.
>>>> The data must not exist inside the set if any mapped bit is 0, while the
>>>> data may be not inside the set even if all mapped bits is 1.
>>>>
>>>> While in our use case, as erofs_xattr_ibody_header.h_map is previously a
>>>> (all zero) reserved field, the bit value for the bloom filter has a
>>>> reverse semantics in consideration for compatibility. That is, for a
>>>> given data, the mapped bits will be cleared to 0. Thus for a previously
>>>> built image without support for bloom filter, the bloom filter is all
>>>> zero and when it's mounted by the new kernel with support for bloom
>>>> filter, it can not determine if the queried xattr exists on the inode and
>>>> thus will fallback to the original routine of iterating all on-disk
>>>> xattrs to determine if the queried xattr exists.
>>>>
>>>>
>>>> 2. The number of hash functions
>>>> -------------------------------
>>>> The optimal value for the number of the hash functions (k) is (ln2 *
>>>> m/n), where m stands the number of bits of the bloom filter map, while n
>>>> stands the number of all candidates may be inside the set.
>>>>
>>>> In our use case, the number of common used xattr (n) is approximately 8,
>>>> including system.[posix_acl_access|posix_acl_default],
>>>> security.[capability|selinux] and
>>>> security.[SMACK64|SMACK64TRANSMUTE|SMACK64EXEC|SMACK64MMAP].
>>>>
>>>> Given the number of bits of the bloom filter (m) is 32, the optimal value
>>>> for the number of the hash functions (k) is 2 (ln2 * m/n = 2.7).
>>>
>>> This is indeed the optimal value in a traditional use of bloom
>>> filters. However, I think it is based on a much larger set of values.
>>> For this usecase it may be better to choose a different value.
>>>
>>> I did some research a while ago on this, and I thought about the
>>> counts too. Having more than one hash function is useful because it
>>> allows you to avoid problems if two values happen to hash to the same
>>> bucket, but this happens at the cost of there being less "unique
>>> buckets". I spent some time looking for common xattr values
>>> (including some from userspace) and ended up with a list of about 30.
>>
>> Yeah, if the number of common used xattr (n) is 30, then the optimal
>> value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).
>> The optimal value in theory also matches our intuition.
>>
>>
>>> If we can choose a single hash function that maps all (or most) of
>>> these to a unique bucket (mod 32),
>>
>> Excellent research! Would you mind sharing the list of these
>> approximately 30 commonly used xattrs, so that I could check if they are
>> mapped to unique bucket with the single hash function we proposed?
>
> This is the list I came up with:
>
> trusted.overlay.opaque
> trusted.overlay.redirect
> trusted.overlay.origin
> trusted.overlay.impure
> trusted.overlay.nlink
> trusted.overlay.upper
> trusted.overlay.metacopy
> trusted.overlay.protattr
> user.overlay.opaque
> user.overlay.redirect
> user.overlay.origin
> user.overlay.impure
> user.overlay.nlink
> user.overlay.upper
> user.overlay.metacopy
> user.overlay.protattr
> security.evm
> security.ima
> security.selinux
> security.SMACK64
> security.SMACK64IPIN
> security.SMACK64IPOUT
> security.SMACK64EXEC
> security.SMACK64TRANSMUTE
> security.SMACK64MMAP
> security.apparmor
> security.capability
> system.posix_acl_access
> system.posix_acl_default
> user.mime_type
>

Got it. Thanks a lot!

--
Thanks,
Jingbo