Re: [PATCH] pktcdvd: added sysfs interface + bio write queuehandling fix

From: Randy.Dunlap
Date: Sun Sep 03 2006 - 18:13:41 EST


On Sun, 03 Sep 2006 20:20:43 +0200 Thomas Maier wrote:

> Hello,
>
> this is a patch for the packet writing driver pktcdvd.
> It adds a sysfs interface to the driver and a bio write
> queue "congestion" handling.
>
> The patch modifies following files of the Linux 2.6.17.11
> source tree:
> Documentation/cdrom/packet-writing.txt
> include/linux/pktcdvd.h
> drivers/block/pktcdvd.c
> drivers/block/Kconfig
> block/genhd.c
>
> (genhd.c must be changed to export the block_subsys
> symbol)
>
> The bio write queue changes are in pktcdvd.c in functions:
> pkt_make_request()
> pkt_bio_finished()
>
> Any comments and improvements are welcomed ;)

a. send patches against the latest mainline kernel version
unless they only apply to -mm (then against -mm obviously)
or unless they are specifically being made to patch the
stable kernel releases. However, even patches to -stable
should also have already been made to mainline.

b. don't send *-zipped attachments. do send text,
preferably inline, but text attachments as a last resort.

c. anything that touches or modifies the kernel ABI should
also update Documentation/ABI/*. See Documentation/ABI/README
for more info/details.

d. kernel coding style does not put lots-of-spaces around if ():
not:
+ if ( pidx )
yes:
if (pidx)

e. do both the procfs and sysfs interfaces use the same code
for pkt_print_stat()? The main reason that I ask is that
sysfs files are restricted (by convention) to one value per file
but pkt_print_stat() is printing lots of values.


> Why this patch?
> ===============
> This driver uses an internal bio write queue to store
> write requests from the block layer, passed to the driver
> over its own make_request function.
> I am using Linux 2.6.17 on an Athlon 64 X2, 2G RAM and while
> writing huge files (>200M) to a DVDRAM using the pktcdvd driver,
> the bio write queue raised >200000 entries! This led to
> kernel out of memory Oops! e.g.:
>
> ----------------------------------------------------------
> Aug 14 17:42:26 master vmunix: pktcdvd: 4473408kB available on disc
> Aug 14 17:42:54 master vmunix: pktcdvd: write speed 4155kB/s
> Aug 14 17:54:24 master vmunix: oom-killer: gfp_mask=0xd0, order=1
> Aug 14 17:54:24 master vmunix: <c014346f> out_of_memory+0x12f/0x150
> <c01452d0> __alloc_pages+0x280/0x2e0
> Aug 14 17:54:24 master vmunix: <c015a52a> cache_alloc_refill+0x2ea/0x500
> <c015a7a1> __kmalloc+0x61/0x70
> Aug 14 17:54:24 master vmunix: <c039c0b3> __alloc_skb+0x53/0x110
> <c03985b6> sock_alloc_send_skb+0x176/0x1c0
> Aug 14 17:54:24 master vmunix: <c0399c5b> sock_def_readable+0x7b/0x80
> <c041262b> unix_stream_sendmsg+0x1cb/0x310
> Aug 14 17:54:24 master vmunix: <c039502b> do_sock_write+0xab/0xc0
> <c0395720> sock_aio_write+0x80/0x90
> Aug 14 17:54:24 master vmunix: <c011a609> __wake_up_common+0x39/0x60
> <c015d984> do_sync_write+0xc4/0x100
> Aug 14 17:54:47 master vmunix: printk: 10 messages suppressed.
> Aug 14 17:54:47 master vmunix: oom-killer: gfp_mask=0xd0, order=0
> Aug 14 17:54:47 master vmunix: <c014346f> out_of_memory+0x12f/0x150
> <c01452d0> __alloc_pages+0x280/0x2e0
> Aug 14 17:54:47 master vmunix: <c0258de2> __next_cpu+0x12/0x30
> <c015a52a> cache_alloc_refill+0x2ea/0x500
> Aug 14 17:54:47 master vmunix: <c015a23a> kmem_cache_alloc+0x4a/0x50
> <c03987ea> sk_alloc+0x2a/0x150
> Aug 14 17:54:47 master vmunix: <c03e3f8d> inet_create+0xed/0x320
> <c03950a2> sock_alloc_inode+0x12/0x70
> Aug 14 17:54:47 master vmunix: <c017790e> alloc_inode+0xce/0x180
> <c03966f3> __sock_create+0x123/0x2f0
> Aug 14 17:54:49 master vmunix: Total swap = 2152668kB
> Aug 14 17:54:49 master vmunix: Free swap: 2152436kB
> Aug 14 17:54:49 master vmunix: 524272 pages of RAM
> Aug 14 17:54:49 master vmunix: 294896 pages of HIGHMEM
> Aug 14 17:54:49 master vmunix: 5767 reserved pages
> Aug 14 17:54:49 master vmunix: 238277 pages shared
> Aug 14 17:54:49 master vmunix: 35 pages swap cached
> Aug 14 17:54:49 master vmunix: 47682 pages dirty
> Aug 14 17:54:49 master vmunix: 157861 pages writeback
> Aug 14 17:54:49 master vmunix: 17359 pages mapped
> Aug 14 17:54:49 master vmunix: 23835 pages slab
> Aug 14 17:54:49 master vmunix: 176 pages pagetables
> Aug 14 17:54:59 master vmunix: <c0145355> __get_free_pages+0x25/0x40
> Aug 14 17:55:19 master vmunix: 294896 pages of HIGHM<6>5767 reserved pages
> ------------------------------------------------------------
>
> It don't know exactly what is wrong in the kernel, but
> it seems it must be something with the kernels memory handling.
>
> To be able to use the pktcdvd driver now, i created this patch.
> It simply limits the size of the bio write queue of the driver
> to save kernel memory. Does not cure the "kernel bug", but the
> symptom ;)
> If the number of bio write requests would raise the bio
> queue size over a high limit (congestion on), the
> make_request function waits till the worker thread has
> lowered the queue size below the "congestion off" mark.
> The wait is similar to the wait in get_request_wait(),
> called by the "normal" request function __make_request().
>
> Peter Osterlund suggested to use the pair
> clear_queue_congested()
> blk_congestion_wait()
> here. But i am not sure if this is the right way to do
> it.
>
>
> Also there is now a sysfs interface for the driver and the
> procfs interface can be switched of by a kernel config
> parameter.
>
> Here are more informations about the new features of the driver,
> that are added to packet-writing.txt by this patch:
>
>
> Using the pktcdvd sysfs interface
> ---------------------------------
>
> The pktcdvd module has a sysfs interface and can be controlled
> by the tool "pktcdvd" that uses sysfs.
>
> "pktcdvd" works similar to "pktsetup", e.g.:
>
> # pktcdvd -a dev_name /dev/hdc
> # mkudffs /dev/pktcdvd/dev_name
> # mount -t udf -o rw,noatime /dev/pktcdvd/dev_name /dvdram
> # cp files /dvdram
> # umount /dvdram
> # pktcdvd -r dev_name
>
>
> The pktcdvd module exports these files in the sysfs:
> ( <pktdevname> is one of pktcdvd0..pktcdvd7 )
> ( <devid> is in format major:minor )
>
> /sys/block/pktcdvd/
> add (w) Write a block device id to create a
> new pktcdvd device and map it the
> block device.
>
> remove (w) Write the pktcdvd device id or the
> mapped block device id to it, to
> remove the pktcdvd device.
>
> device_map (r) Shows the device mapping in format:
> <pktdevname> <pktdevid> <blkdevid>
>
> packet_buffers (rw) Number of concurrent packets per
> pktcdvd device. Used for new created
> devices.
>
>
> /sys/block/pktcdvd/<pktdevname>/packet/
> stat (r) Show device status.
>
> reset_stat (w) Write any value to it to reset some
> pktcdvd device stat values, like
> bytes read/written.
>
> write_congestion_off (rw) If bio write queue size is below
> this mark, accept new bio requests
> from the block layer.
>
> write_congestion_on (rw) If bio write queue size is higher
> as this mark, do no longer accept
> bio write requests from the block
> layer and wait till the pktcdvd
> device has processed enough bio's
> so that bio write queue size is
> below congestion off mark.
>
> mapped_to Symbolic link to mapped block device
> in the sysfs tree.
>
>
>
>
> To use the pktcdvd sysfs interface directly, you can do:
>
> # create a new pktcdvd device mapped to /dev/hdc
> echo "22:0" >/sys/block/pktcdvd/add
> cat /sys/block/pktcdvd/device_map
> # assuming device pktcdvd0 was created, look at stat's
> cat /sys/block/pktcdvd/pktcdvd0/packet/stat
> # print the device id of the mapped block device
> cat /sys/block/pktcdvd/pktcdvd0/packet/mapped_to/dev
> # similar to
> fgrep pktcdvd0 /sys/block/pktcdvd/device_map
> # remove device, using pktcdvd0 device id 253:0
> echo "253:0" >/sys/block/pktcdvd/remove
> # same as using the mapped block device id 22:0
> echo "22:0" >/sys/block/pktcdvd/remove
>
>
> Bio write queue congestion marks
> --------------------------------
> The pktcdvd driver allows now to adjust the behaviour of the
> internal bio write queue.
> This can be done with the two write_congestion_[on|off] marks.
> The driver does only accept up to write_congestion_on bio
> write request from the i/o block layer, and waits till the
> requests are processed by the mapped block device and
> the queue size is below the write_congestion_off mark.
> In previous versions of pktcdvd, the driver accepted all
> incoming bio write request. This led sometimes to kernel
> out of memory oops (maybe some bugs in the linux kernel ;)
> CAUTION: use this options only if you know what you do!
> The default settings for the congestion marks should be ok
> for everyone.

---
~Randy

--
VGER BF report: H 0
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/