Re: [PATCH] erofs: fix wrong primary bvec selection on deduplicated extents

From: Yue Hu
Date: Wed Jul 19 2023 - 21:22:19 EST


On Wed, 19 Jul 2023 14:54:59 +0800
Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:

> When handling deduplicated compressed data, there can be multiple
> decompressed extents pointing to the same compressed data in one shot.
>
> In such cases, the bvecs which belong to the longest extent will be
> selected as the primary bvecs for real decompressors to decode and the
> other duplicated bvecs will be directly copied from the primary bvecs.
>
> Previously, only the relative offsets of the longest extent were checked
> to decompress the primary bvecs. On rare occasions, this can be incorrect
> if there are several extents with the same start relative offset.
> As a result, some short bvecs could be selected for decompression and
> then cause data corruption.
>
> For example, as Shijie Sun reported off-list, consider the following
> extents of a file:
> 117: 903345.. 915250 | 11905 : 385024.. 389120 | 4096
> ...
> 119: 919729.. 930323 | 10594 : 385024.. 389120 | 4096
> ...
> 124: 968881.. 980786 | 11905 : 385024.. 389120 | 4096
>
> The start relative offset is the same: 2225, but extent 119 (919729..
> 930323) is shorter than the others.
>
> Let's restrict the bvec length in addition to the start offset if bvecs
> are not full.
>
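
For anyone checking the numbers in the example, here is a minimal userspace
sketch. It is not the erofs code: the struct and helper names are
hypothetical, and it assumes 4096-byte pages. It recomputes the start
relative offset and length of extents 117, 119 and 124, and contrasts an
offset-only comparison with one that also compares the length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096UL	/* assumption: 4096-byte pages */

/* Hypothetical model of one decompressed extent; not an erofs structure. */
struct extent {
	unsigned int id;
	unsigned long start, end;	/* logical byte range [start, end) */
};

/* The three extents quoted in the commit message. */
static const struct extent example[] = {
	{ 117, 903345, 915250 },
	{ 119, 919729, 930323 },
	{ 124, 968881, 980786 },
};

/* Offset of the extent start within its page. */
static unsigned long rel_offset(const struct extent *e)
{
	return e->start % PAGE_SZ;
}

static unsigned long ext_length(const struct extent *e)
{
	return e->end - e->start;
}

/* Return the longest extent; the first one wins on ties. */
static const struct extent *longest(const struct extent *v, size_t n)
{
	const struct extent *best = v;
	size_t i;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (ext_length(&v[i]) > ext_length(best))
			best = &v[i];
	return best;
}

/* Offset-only comparison: cannot tell extent 119 from 117 or 124. */
static bool match_offset_only(const struct extent *e,
			      const struct extent *primary)
{
	return rel_offset(e) == rel_offset(primary);
}

/* Stricter comparison: the length has to match as well. */
static bool match_offset_and_length(const struct extent *e,
				    const struct extent *primary)
{
	return rel_offset(e) == rel_offset(primary) &&
	       ext_length(e) == ext_length(primary);
}
```

With these inputs all three extents start at relative offset 2225, so the
offset-only comparison accepts extent 119 as if it were the longest one,
while the stricter comparison rejects it because 10594 != 11905.
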
> Reported-by: Shijie Sun <sunshijie@xxxxxxxxxx>
> Fixes: 5c2a64252c5d ("erofs: introduce partial-referenced pclusters")
> Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx>

Reviewed-by: Yue Hu <huyue2@xxxxxxxxxxx>