[PATH] A few things for immediate cleanup.

Martin Dalecki (dalecki@cs.net.pl)
Mon, 29 Nov 1999 02:44:49 +0100


This is a multi-part message in MIME format.
--------------E148DE6D198D695DDB2C2CA1
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit

The attached patch contains a few things which can be immediately
cleaned up.
It's in esp. removing two pseudo compiler problem workarounds by using
bogous macro constructs (Which are possible reminiscents for older gcc
usage).

Second I have noticed that the usuefullness of the
get_hardblocksize function is, lets say, at least
questionable. The simple logics is that:

1. Why do get all the other file systems around
without using it?

2. At least raid and frieds basically just don't care about it if I
understand
the code correctly.

Let's have a precise look at it again:

./fs/buffer.c:588:unsigned int get_hardblocksize(kdev_t dev)
./fs/isofs/inode.c:547: blocksize = get_hardblocksize(dev);
./fs/ext2/super.c:393: blocksize = get_hardblocksize(dev);
./fs/ext2/super.c:471: hblock = get_hardblocksize(dev);
./fs/udf/lowlevel.c:97: unsigned int hbsize = get_hardblocksize(dev);
./fs/udf/super.c:399: sb->s_blocksize = get_hardblocksize(sb->s_dev);
./kernel/ksyms.c:172:EXPORT_SYMBOL(get_hardblocksize);
./include/linux/fs.h:937:extern unsigned int get_hardblocksize(kdev_t);

There I see it and I'm seriously starting to scratch my head....

Removing this would make at least the quite offending two dimensional
hardsect_size array almost a "write only" variable, which could be used
for
a significant overall device handling cleanup.

I have anyway the impression that all device drivers are emulating
nearly
arbitrary physical block sizes, just due to the fact that:

1. Alomst anything is using powers of two as the base for block sizes.

2. 512 is the lowest common denominator for all filesystems those days.
(We don't support CP/M filesystems as far as I'm aware of ;-).

3. Anything above 512 can be easly handled on the device level due to 1.
by
coalescing accesses to bigger block sizes into multiple smaller ones.
(See the block strategy routine of every single block device driver for
refference...)

However there I have just added a comment without actually changing any
code,
becouse I would like to hear first if I'm missing some important
point...

Let's have a tighter look at it...

./fs/buffer.c:594: if (hardsect_size[MAJOR(dev)] != NULL) {
./fs/buffer.c:595: int blksize = hardsect_size[MAJOR(dev)][MINOR(dev)];
^^^ Seems to be the only legitimative use of it. However apparently I
think
one can get easly without it, becouse this is anyway our
get_hardblocksize...

./fs/fat/inode.c:455: if (hardsect_size[MAJOR(sb->s_dev)] != NULL){
./fs/fat/inode.c:456: blksize =
hardsect_size[MAJOR(sb->s_dev)][MINOR(sb->s_dev)];
^^^ This appears to be just a consistency check and for me the logics
behind it doesn't cut it.

./fs/partitions/amiga.c:47: blocksize = get_hardsect_size(dev);
./fs/partitions/check.c:164:int get_hardsect_size(kdev_t dev)
./fs/partitions/check.c:166: if (hardsect_size[MAJOR(dev)] != NULL)
./fs/partitions/check.c:167: return
hardsect_size[MAJOR(dev)][MINOR(dev)];
^^^ I don't care much about the amiga and I think they are missing some
point
anyway here.

./fs/partitions/msdos.c:80: int sector_size = get_hardsect_size(dev) /
512;
./fs/partitions/msdos.c:329: int sector_size = get_hardsect_size(dev) /
512;

./drivers/block/blkpg.c:261: intval = get_hardsect_size(dev);
^^^^^^ Hugh? Why not just simple use the logical block size (PAGE_SIZE)
here?

./drivers/char/raw.c:178: if (hardsect_size[MAJOR(bdev)])
./drivers/char/raw.c:179: sector_size =
hardsect_size[MAJOR(bdev)][MINOR(bdev)];
^^^ This is as dubious the same way as the usage inside the fat code and
apparently
once again just a statement about the implementation of this facility.

At all other places this array get's only used as a "write only"
variable.

So I have the impression that hardsect_size is just an artefact from the
past where
it wasn't entierly clear who should handle blocksize differences between
actual
physical blocks on the disk and blocksizes expected by a particular
filesystems.
Should it be the filesystem layer or the block device layer.
I'm pretty convinced this got already resolved the natural way by the
actual
drivers for the devices in favour of the device level ;-).

Thank's in advance for any comment's/corrections.

Anyway I think I have just identifyed a serious problem in the above.

--
	Marcin Dalecki
--------------E148DE6D198D695DDB2C2CA1
Content-Type: text/plain; charset=iso-8859-2;
 name="diff-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="diff-1"

diff -ur linux/drivers/block/ll_rw_blk.c linux-new/drivers/block/ll_rw_blk.c --- linux/drivers/block/ll_rw_blk.c Thu Nov 11 19:33:42 1999 +++ linux-new/drivers/block/ll_rw_blk.c Mon Nov 29 01:14:06 1999 @@ -35,8 +35,15 @@ #endif /* - * The request-struct contains all necessary data - * to load a nr of sectors into memory + * NR_REQUEST is the number of entries in the request-queue. + * NOTE that writes may use only the low 2/3 of these: reads + * take precedence. + */ +#define NR_REQUEST 128 + +/* + * The request-struct contains all necessary data to load a nr of sectors into + * memory. */ static struct request all_requests[NR_REQUEST]; @@ -326,6 +333,23 @@ } /* + * This is used in the elevator algorithm. We don't prioritise reads over + * writes any more --- although reads are more time-critical than writes, by + * treating them equally we increase filesystem throughput. This turns out to + * give better overall performance. -- sct + * + * Mon Nov 29 01:03:52 CET 1999 Marcin Dalecki <dalecki@cs.net.pl>: Just a + * slight change in the actual writedown of the code reveals (by objdump -S) + * that there isn't really any need to make this an atificial macro. Im fact + * we get btter code this way with modern compilers ;-). + */ +static __inline__ int in_order(struct request *s1, struct request *s2) +{ + return (s1->rq_dev < s2->rq_dev || ((s1->rq_dev == s2->rq_dev && + s1->sector < s2->sector))); +} + +/* * add-request adds a request to the linked list. * It disables interrupts (aquires the request spinlock) so that it can muck * with the request-lists in peace. Thus it should be called with no spinlocks @@ -358,14 +382,11 @@ goto out; } for ( ; tmp->next ; tmp = tmp->next) { - const int after_current = IN_ORDER(tmp,req); - const int before_next = IN_ORDER(req,tmp->next); - - if (!IN_ORDER(tmp,tmp->next)) { - if (after_current || before_next) + if (!in_order(tmp,tmp->next)) { + if (in_order(tmp,req) || in_order(req,tmp->next)) break; } else { - if (after_current && before_next) + if (in_order(tmp,req) && in_order(req,tmp->next)) break; } } diff -ur linux/fs/buffer.c linux-new/fs/buffer.c --- linux/fs/buffer.c Thu Nov 11 19:33:42 1999 +++ linux-new/fs/buffer.c Mon Nov 29 01:56:38 1999 @@ -446,13 +446,28 @@ spin_unlock(&lru_list_lock); } -/* After several hours of tedious analysis, the following hash - * function won. Do not mess with it... -DaveM +/* After several hours of tedious analysis, the following hash function won. + * Do not mess with it... -DaveM + * + * Mon Nov 29 01:33:28 CET 1999 Marcin Dalecki <dalecki@cs.net.pl>: + * + * This is indeed bad, since: + * + * 1. It's heavly dependant upon the number of bits in kdev_t. + * + * 2. I miss the logical justification why exactly this (obscure) hash function + * should be better then anything more tangible. + * + * 3. The previously present macro hackary was in effect... just code + * obfuscation. */ -#define _hashfn(dev,block) \ - ((((dev)<<(bh_hash_shift - 6)) ^ ((dev)<<(bh_hash_shift - 9))) ^ \ - (((block)<<(bh_hash_shift - 6)) ^ ((block) >> 13) ^ ((block) << (bh_hash_shift - 12)))) -#define hash(dev,block) hash_table[(_hashfn(dev,block) & bh_hash_mask)] +static __inline__ struct buffer_head **hash(kdev_t dev, unsigned long block) +{ + unsigned int hash_value = (((dev << (bh_hash_shift - 6)) ^ (dev << (bh_hash_shift - 9))) ^ + ((block << (bh_hash_shift - 6)) ^ (block >> 13) ^ (block << (bh_hash_shift - 12)))); + + return &hash_table[hash_value & bh_hash_mask]; +} static __inline__ void __hash_link(struct buffer_head *bh, struct buffer_head **head) { @@ -529,7 +544,7 @@ static void insert_into_queues(struct buffer_head *bh) { - struct buffer_head **head = &hash(bh->b_dev, bh->b_blocknr); + struct buffer_head **head = hash(bh->b_dev, bh->b_blocknr); spin_lock(&lru_list_lock); write_lock(&hash_table_lock); @@ -569,7 +584,7 @@ */ struct buffer_head * get_hash_table(kdev_t dev, int block, int size) { - struct buffer_head **head = &hash(dev, block); + struct buffer_head **head = hash(dev, block); struct buffer_head *bh; read_lock(&hash_table_lock); @@ -585,6 +600,24 @@ return bh; } +/* + * Mon Nov 29 01:44:56 CET 1999 Marcin Dalecki <cs.net.pl>: + * + * Here is a list of all the places where the following function get's actually + * used: + * + * ./fs/buffer.c:603:unsigned int get_hardblocksize(kdev_t dev) + * ./fs/isofs/inode.c:547: blocksize = get_hardblocksize(dev); + * ./fs/ext2/super.c:393: blocksize = get_hardblocksize(dev); + * ./fs/ext2/super.c:471: hblock = get_hardblocksize(dev); + * ./fs/udf/lowlevel.c:97: unsigned int hbsize = get_hardblocksize(dev); + * ./fs/udf/super.c:399: sb->s_blocksize = get_hardblocksize(sb->s_dev); + * + * After having looked at the code using it I'm pretty convinced that it's + * broken. In esp. at least in the case of loopback and raid devices I can see + * that the rest doesn't give a damn bit about it... However I'm not quite sure + * as of yet, so please correct me if I'm wrong. + */ unsigned int get_hardblocksize(kdev_t dev) { /* diff -ur linux/include/linux/blk.h linux-new/include/linux/blk.h --- linux/include/linux/blk.h Tue Nov 23 21:57:27 1999 +++ linux-new/include/linux/blk.h Mon Nov 29 01:08:25 1999 @@ -14,23 +14,6 @@ extern spinlock_t io_request_lock; /* - * NR_REQUEST is the number of entries in the request-queue. - * NOTE that writes may use only the low 2/3 of these: reads - * take precedence. - */ -#define NR_REQUEST 128 - -/* - * This is used in the elevator algorithm. We don't prioritise reads - * over writes any more --- although reads are more time-critical than - * writes, by treating them equally we increase filesystem throughput. - * This turns out to give better overall performance. -- sct - */ -#define IN_ORDER(s1,s2) \ -((s1)->rq_dev < (s2)->rq_dev || (((s1)->rq_dev == (s2)->rq_dev && \ -(s1)->sector < (s2)->sector))) - -/* * Initialization functions. */ extern int isp16_init(void);

--------------E148DE6D198D695DDB2C2CA1--

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/