Re: [PATCH 6/9] Remove xfs_icluster

From: Christoph Hellwig
Date: Fri Nov 23 2007 - 12:59:21 EST


On Thu, Nov 22, 2007 at 11:39:52AM +1100, David Chinner wrote:
> Remove the xfs_icluster structure and replace with a radix tree lookup.
>
> We don't need to keep a list of inodes in each cluster around anymore
> as we can look them up quickly when we need to. The only time we need
> to do this now is during inode writeback.
>
> Factor the inode cluster writeback code out of xfs_iflush and convert
> it to use radix_tree_gang_lookup() instead of walking a list of
> inodes built when we first read in the inodes.
>
> This remove 3 pointers from each xfs_inode structure and the xfs_icluster
> structure per inode cluster. Hence we reduce the cache footprint of the
> xfs_inodes by between 5-10% depending on cluster sparseness.
>
> To be truly efficient we need a radix_tree_gang_lookup_range() call
> to stop searching once we are past the end of the cluster instead
> of trying to find a full cluster's worth of inodes.

Nice, I like this a lot. I was wondering about something like this
already when you put in the radix-tree based inode cache.

> +STATIC int
> +xfs_iflush_cluster(
> + xfs_inode_t *ip,
> + xfs_buf_t *bp)
> +{
> + xfs_mount_t *mp = ip->i_mount;
> + xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino);
> + unsigned long first_index, mask;
> + int ilist_size;
> + xfs_inode_t *ilist;
> + xfs_inode_t *iq;
> + xfs_inode_log_item_t *iip;
> + int nr_found;
> + int clcount = 0;
> + int bufwasdelwri;
> +
> + ASSERT(pag->pagi_inodeok);
> + ASSERT(pag->pag_ici_init);
> +
> + ilist_size = XFS_INODE_CLUSTER_SIZE(mp) * sizeof(xfs_inode_t *);
> + ilist = kmem_alloc(ilist_size, KM_MAYFAIL);
> + if (!ilist)
> + return 0;

Now if you just used the linux native allocator this could be a kcalloc :)

> + if ((iq->i_update_core == 0) &&
> + ((iip == NULL) ||
> + !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
> + xfs_ipincount(iq) == 0) {
> + continue;
> + }

if (!iq->i_update_core &&
(!iip || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
!xfs_ipincount(iq))
continue;

> + /*
> + * arriving here means that this inode can be flushed. First
> + * re-check that it's dirty before flushing.
> + */
> + iip = iq->i_itemp;
> + if ((iq->i_update_core != 0) || ((iip != NULL) &&
> + (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) {

if (!iq->i_update_core ||
(!iip && (iip->ili_format.ilf_fields & XFS_ILOG_ALL)) {

> + /*
> + * Clean up the buffer. If it was B_DELWRI, just release it --
> + * brelse can handle it with no problems. If not, shut down the
> + * filesystem before releasing the buffer.
> + */
> + bufwasdelwri = XFS_BUF_ISDELAYWRITE(bp);
> + if (bufwasdelwri)
> + xfs_buf_relse(bp);
> +
> + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> +
> + if (!bufwasdelwri) {
> + /*
> + * Just like incore_relse: if we have b_iodone functions,
> + * mark the buffer as an error and call them. Otherwise
> + * mark it as stale and brelse.
> + */
> + if (XFS_BUF_IODONE_FUNC(bp)) {
> + XFS_BUF_CLR_BDSTRAT_FUNC(bp);
> + XFS_BUF_UNDONE(bp);
> + XFS_BUF_STALE(bp);
> + XFS_BUF_SHUT(bp);
> + XFS_BUF_ERROR(bp,EIO);
> + xfs_biodone(bp);
> + } else {
> + XFS_BUF_STALE(bp);
> + xfs_buf_relse(bp);
> + }
> + }

What's the point of all this if the filesystem is shut down anyway?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/