[patch 2.5.8] bounce/swap stats

From: Randy.Dunlap (rddunlap@osdl.org)
Date: Thu Apr 11 2002 - 20:21:25 EST


Hi,

This patch adds stats for all bounce I/O and bounce swap I/O
to /proc/stat.

I've been testing bounce I/O and VM performance in 2.4.teens
with the highio patch and in 2.5.x.

Summary:
* 2.5.8-pre3 with highio runs to completion with an intense workload
* 2.5.8-pre3 with "nohighio" and same workload dies
* 2.5.8-pre3 with "nohighio" and less workload runs
[attachments contain /proc/stat for completed runs]

Here's the patch. Jens, please apply to 2.5.8-N.

--- ./fs/proc/proc_misc.c.org Thu Jan 3 09:16:31 2002
+++ ./fs/proc/proc_misc.c Tue Jan 8 16:12:56 2002
@@ -324,6 +324,12 @@
                 xtime.tv_sec - jif / HZ,
                 total_forks);

+        len += sprintf(page + len,
+                "bounce_io %u %u\n"
+                "bounce_swap_io %u %u\n",
+                kstat.bouncein, kstat.bounceout,
+                kstat.bounceswapin, kstat.bounceswapout);
+
         return proc_calc_metrics(page, start, off, count, eof, len);
 }

--- ./mm/highmem.c.org Thu Jan 3 09:16:31 2002
+++ ./mm/highmem.c Tue Jan 8 16:16:51 2002
@@ -21,6 +21,7 @@
 #include <linux/mempool.h>
 #include <linux/blkdev.h>
 #include <asm/pgalloc.h>
+#include <linux/kernel_stat.h>

 static mempool_t *page_pool, *isa_page_pool;

@@ -401,7 +401,10 @@
                         vfrom = kmap(from->bv_page) + from->bv_offset;
                         memcpy(vto, vfrom, to->bv_len);
                         kunmap(from->bv_page);
+                        kstat.bounceout++;
                 }
+                else
+                        kstat.bouncein++;
         }

         /*
--- ./include/linux/kernel_stat.h.org Thu Jan 3 09:28:04 2002
+++ ./include/linux/kernel_stat.h Tue Jan 8 16:10:20 2002
@@ -26,6 +26,8 @@
         unsigned int dk_drive_wblk[DK_MAX_MAJOR][DK_MAX_DISK];
         unsigned int pgpgin, pgpgout;
         unsigned int pswpin, pswpout;
+        unsigned int bouncein, bounceout;
+        unsigned int bounceswapin, bounceswapout;
 #if !defined(CONFIG_ARCH_S390)
         unsigned int irqs[NR_CPUS][NR_IRQS];
 #endif
--- ./mm/page_io.c.orig Tue Apr 9 14:54:02 2002
+++ ./mm/page_io.c Tue Apr 9 16:18:18 2002
@@ -10,11 +10,13 @@
  * Always use brw_page, life becomes simpler. 12 May 1998 Eric Biederman
  */

+#include <linux/config.h>
 #include <linux/mm.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
 #include <linux/locks.h>
 #include <linux/swapctl.h>
+#include <linux/blkdev.h>

 #include <asm/pgtable.h>

@@ -41,6 +43,7 @@
         int block_size;
         struct inode *swapf = 0;
         struct block_device *bdev;
+        kdev_t kdev;

         if (rw == READ) {
                 ClearPageUptodate(page);
@@ -54,6 +57,7 @@
                 zones[0] = offset;
                 zones_used = 1;
                 block_size = PAGE_SIZE;
+                kdev = swapf->i_rdev;
         } else {
                 int i, j;
                 unsigned int block = offset
@@ -67,6 +71,19 @@
                         }
                 zones_used = i;
                 bdev = swapf->i_sb->s_bdev;
+                kdev = swapf->i_sb->s_dev;
+        }
+
+        {
+                request_queue_t *q = blk_get_queue(kdev); /* TBD: is kdev always correct here? */
+                zone_t *zone = page_zone(page);
+                if (q && (page - zone->zone_mem_map) + (zone->zone_start_paddr
+                                >> PAGE_SHIFT) >= q->bounce_pfn) {
+                        if (rw == WRITE)
+                                kstat.bounceswapout++;
+                        else
+                                kstat.bounceswapin++;
+                }
         }

          /* block_size == PAGE_SIZE/zones_used */

I'll keep looking into the "kernel dies" problem(s) that I'm
seeing [using more tools], but I have some data and a patch for 2.5.8
concerning bounce I/O and bounce swap statistics that I would
like to have integrated so that both users and developers
can have more insight into how much bounce I/O is happening.

I'll generate the patch for 2.4.teens + highmem if anyone
is interested in it, or after highmem is merged into 2.4.
...it will be added to 2.4, right?

There is a second patch (attached) that prints the device
major:minor of devices that are being used for bounce I/O
[258-bounce-identify.patch]. This is a user helper, not
intended for kernel inclusion.

Some of the symptoms that I usually see with the most memory-intensive
workloads are:
. 'top' reports that all 8 processors are at the 98-99% level in
  system execution state, and the 'top' display is only updated every
  few minutes (it should update every 1 second)
. Magic SysRq does not work when all 8 CPUs are tied up in system
  mode
. there is a looping script running (with 'sleep 1') that
  prints the last 50 lines of 'dmesg', but it often doesn't print
  for 10-20 minutes and then finally comes back to life
. I usually don't see a kernel death, just a lack of response, or
  my sessions to the test system die.

Comments?

Thanks,
~Randy
