Re: [RFC PATCH] mm: swapfile: fix SSD detection with swapfile on btrfs

From: Johannes Weiner
Date: Tue Mar 26 2024 - 08:51:53 EST


Hi Ying,

Thanks for taking a look!

On Tue, Mar 26, 2024 at 01:47:45PM +0800, Huang, Ying wrote:
> Johannes Weiner <hannes@xxxxxxxxxxx> writes:
> > +static struct swap_cluster_info *setup_clusters(struct swap_info_struct *p,
> > +						unsigned char *swap_map)
> > +{
> > +	unsigned long nr_clusters = DIV_ROUND_UP(p->max, SWAPFILE_CLUSTER);
> > +	unsigned long col = p->cluster_next / SWAPFILE_CLUSTER % SWAP_CLUSTER_COLS;
> > +	struct swap_cluster_info *cluster_info;
> > +	unsigned long i, j, k, idx;
> > +	int cpu, err = -ENOMEM;
> > +
> > +	cluster_info = kvcalloc(nr_clusters, sizeof(*cluster_info), GFP_KERNEL);
> >  	if (!cluster_info)
> > -		return nr_extents;
> > +		goto err;
> > +
> > +	for (i = 0; i < nr_clusters; i++)
> > +		spin_lock_init(&cluster_info[i].lock);
> >
> > +	p->cluster_next_cpu = alloc_percpu(unsigned int);
> > +	if (!p->cluster_next_cpu)
> > +		goto err_free;
> > +
> > +	/* Random start position to help with wear leveling */
> > +	for_each_possible_cpu(cpu)
> > +		per_cpu(*p->cluster_next_cpu, cpu) =
> > +			get_random_u32_inclusive(1, p->highest_bit);
> > +
> > +	p->percpu_cluster = alloc_percpu(struct percpu_cluster);
> > +	if (!p->percpu_cluster)
> > +		goto err_free;
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		struct percpu_cluster *cluster;
> > +
> > +		cluster = per_cpu_ptr(p->percpu_cluster, cpu);
> > +		cluster_set_null(&cluster->index);
> > +	}
> > +
> > +	/*
> > +	 * Mark unusable pages as unavailable. The clusters aren't
> > +	 * marked free yet, so no list operations are involved yet.
> > +	 */
> > +	for (i = 0; i < round_up(p->max, SWAPFILE_CLUSTER); i++)
> > +		if (i >= p->max || swap_map[i] == SWAP_MAP_BAD)
> > +			inc_cluster_info_page(p, cluster_info, i);
>
> If p->max is large, it seems better to use a loop like below?
>
> 	for (i = 0; i < swap_header->info.nr_badpages; i++) {
> 		/* check i and inc_cluster_info_page() */
> 	}
>
> In most cases, swap_header->info.nr_badpages should be much smaller than
> p->max.

Yes, it's a little crappy. I tried not to duplicate the smarts from
setup_swap_map_and_extents(), to avoid bugs if the two go out of
sync. Consulting the map directly is a bit more robust: right now it's
not just the badpages but also the header at map[0] that needs to be
marked.

But you're right, this could be slow with big files. I can send an
update and add a comment to keep the functions in sync.
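
To make the tradeoff concrete, here is a rough userspace sketch of the
two options. It is not the patch and not kernel code: CLUSTER_SIZE,
MAP_BAD, cluster_end(), count_by_map_scan() and count_by_badpage_list()
are hypothetical stand-ins, and it assumes map[0] has already been
marked bad for the header and that the bad page list has no duplicates.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_SIZE 4		/* hypothetical stand-in for SWAPFILE_CLUSTER */
#define MAP_BAD 0x3f		/* hypothetical stand-in for SWAP_MAP_BAD */

/* Round max up to the next multiple of CLUSTER_SIZE. */
static unsigned long cluster_end(unsigned long max)
{
	return (max + CLUSTER_SIZE - 1) / CLUSTER_SIZE * CLUSTER_SIZE;
}

/*
 * Map scan: count every unusable slot per cluster by looking at the
 * map itself. Cost is O(max), but it cannot disagree with the map.
 */
static void count_by_map_scan(const unsigned char *map, unsigned long max,
			      unsigned int *count)
{
	unsigned long i;

	assert(max > 0);
	for (i = 0; i < cluster_end(max); i++)
		if (i >= max || map[i] == MAP_BAD)
			count[i / CLUSTER_SIZE]++;
}

/*
 * List walk: cost is O(nr_bad + CLUSTER_SIZE), but it has to know
 * about every source of unusable slots: the header at offset 0,
 * the bad page list, and the padding past max in the last cluster.
 * Assumes the list has no duplicates.
 */
static void count_by_badpage_list(const unsigned long *bad, size_t nr_bad,
				  unsigned long max, unsigned int *count)
{
	unsigned long i;
	size_t k;

	assert(max > 0);
	count[0]++;					/* header slot */
	for (k = 0; k < nr_bad; k++)
		if (bad[k] != 0 && bad[k] < max)	/* skip invalid entries */
			count[bad[k] / CLUSTER_SIZE]++;
	for (i = max; i < cluster_end(max); i++)	/* tail padding */
		count[i / CLUSTER_SIZE]++;
}
```

With 10 slots, bad pages at 5 and 9, and map[0] marked bad, both
variants arrive at 1, 1 and 3 unusable slots for the three clusters. A
duplicate entry in the list would be counted twice by the list walk
but only once by the scan, which is the kind of drift the comment
would have to guard against.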