Re: [PATCH 3/3] ide: use per-device request queue locks

From: Bartlomiej Zolnierkiewicz
Date: Sat Dec 13 2008 - 18:18:53 EST



[ Once again, sorry for the long delay. ]

On Monday 24 November 2008, Elias Oltmanns wrote:
> Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx> wrote:
> > * Move hack for flush requests from choose_drive() to do_ide_request().
> >
> > * Add ide_plug_device() helper and convert core IDE code from using
> > per-hwgroup lock as a request lock to use the ->queue_lock instead.
> >
> > * Remove no longer needed:
> > - choose_drive() function
> > - WAKEUP() macro
> > - 'sleeping' flag from ide_hwif_t
> > - 'service_{start,time}' fields from ide_drive_t
> >
> > This patch results in much simpler and more maintainable code
> > (besides being a scalability improvement).
> >
> > Cc: Elias Oltmanns <eo@xxxxxxxxxxxxxx>
> > Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx>
> > ---
> > newer version
>
> Eventually, I got round to having a look at this one (I reviewed the two
> small ones at the time). Apologies for the delay.
>
> >
> > drivers/ide/ide-io.c | 213 +++++++++++++++---------------------------------
> > drivers/ide/ide-park.c | 13 +-
> > drivers/ide/ide-probe.c | 3
> > include/linux/ide.h | 4
> > 4 files changed, 79 insertions(+), 154 deletions(-)
> >
> > Index: b/drivers/ide/ide-io.c
> > ===================================================================
> > --- a/drivers/ide/ide-io.c
> > +++ b/drivers/ide/ide-io.c
> [...]
> > @@ -780,61 +704,40 @@ repeat:
> > */
> > void do_ide_request(struct request_queue *q)
> > {
> > - ide_drive_t *orig_drive = q->queuedata;
> > - ide_hwgroup_t *hwgroup = orig_drive->hwif->hwgroup;
> > - ide_drive_t *drive;
> > - ide_hwif_t *hwif;
> > + ide_drive_t *drive = q->queuedata;
> > + ide_hwif_t *hwif = drive->hwif;
> > + ide_hwgroup_t *hwgroup = hwif->hwgroup;
> > struct request *rq;
> > ide_startstop_t startstop;
> >
> > - /* caller must own hwgroup->lock */
> > - BUG_ON(!irqs_disabled());
> > -
> > - while (!ide_lock_hwgroup(hwgroup)) {
> > - drive = choose_drive(hwgroup);
> > - if (drive == NULL) {
> > - int sleeping = 0;
> > - unsigned long sleep = 0; /* shut up, gcc */
> > - hwgroup->rq = NULL;
> > - drive = hwgroup->drive;
> > - do {
> > - if ((drive->dev_flags & IDE_DFLAG_SLEEPING) &&
> > - (sleeping == 0 ||
> > - time_before(drive->sleep, sleep))) {
> > - sleeping = 1;
> > - sleep = drive->sleep;
> > - }
> > - } while ((drive = drive->next) != hwgroup->drive);
> > - if (sleeping) {
> > + /*
> > + * drive is doing pre-flush, ordered write, post-flush sequence. even
> > + * though that is 3 requests, it must be seen as a single transaction.
> > + * we must not preempt this drive until that is complete
> > + */
> > + if (blk_queue_flushing(q))
> > /*
> > - * Take a short snooze, and then wake up this hwgroup again.
> > - * This gives other hwgroups on the same a chance to
> > - * play fairly with us, just in case there are big differences
> > - * in relative throughputs.. don't want to hog the cpu too much.
> > + * small race where queue could get replugged during
> > + * the 3-request flush cycle, just yank the plug since
> > + * we want it to finish asap
> > */
> > - if (time_before(sleep, jiffies + WAIT_MIN_SLEEP))
> > - sleep = jiffies + WAIT_MIN_SLEEP;
> > -#if 1
> > - if (timer_pending(&hwgroup->timer))
> > - printk(KERN_CRIT "ide_set_handler: timer already active\n");
> > -#endif
> > - /* so that ide_timer_expiry knows what to do */
> > - hwgroup->sleeping = 1;
> > - hwgroup->req_gen_timer = hwgroup->req_gen;
> > - mod_timer(&hwgroup->timer, sleep);
> > - /* we purposely leave hwgroup locked
> > - * while sleeping */
> > - } else
> > - ide_unlock_hwgroup(hwgroup);
> > + blk_remove_plug(q);
>
> I'm not at all convinced that this works as expected. First of all, I
> think we can safely assume that the plug is removed when block layer
> calls into the ->request_fn(). Secondly, since the ide layer doesn't
> call the ->request_fn() on it's own accord, I rather suspect that this
> check can be dropped altogether. On the other hand, I'm not sure I agree

I suspect that this is leftover from the old code and I also think that
it can be removed completely. However mixing too many real code changes
in a single patch is not a good practice and the removal can be handled
independently of the discussed patch.

If you would send me a patch with the above change I will be happy to
integrate it into pata tree (patch can be against current Linus' tree or
linux-next, either one is completely fine with me).

> with the rest of the patch anyway, see below.
>
> >
> > - /* no more work for this hwgroup (for now) */
> > - goto plug_device;
> > - }
> > + spin_unlock_irq(q->queue_lock);
> > + spin_lock_irq(&hwgroup->lock);
> >
> > - if (drive != orig_drive)
> > - goto plug_device;
> > + /* caller must own hwgroup->lock */
> > + BUG_ON(!irqs_disabled());
>
> Does this check really make any sense? The lock is being taken just two
> lines before, after all.

Indeed, removed.

> > - hwif = drive->hwif;
> > + if (!ide_lock_hwgroup(hwgroup)) {
> > + hwgroup->rq = NULL;
> > +
> > + if (drive->dev_flags & IDE_DFLAG_SLEEPING) {
> > + if (time_before(drive->sleep, jiffies)) {
> > + ide_unlock_hwgroup(hwgroup);
> > + goto plug_device;
> > + }
> > + }
> >
> > if (hwif != hwgroup->hwif) {
> > /*
> > @@ -847,16 +750,20 @@ void do_ide_request(struct request_queue
> > hwgroup->hwif = hwif;
> > hwgroup->drive = drive;
> > drive->dev_flags &= ~(IDE_DFLAG_SLEEPING | IDE_DFLAG_PARKED);
> > - drive->service_start = jiffies;
> >
> > + spin_unlock_irq(&hwgroup->lock);
> > + spin_lock_irq(q->queue_lock);
> > /*
> > * we know that the queue isn't empty, but this can happen
> > * if the q->prep_rq_fn() decides to kill a request
> > */
> > rq = elv_next_request(drive->queue);
> > + spin_unlock_irq(q->queue_lock);
> > + spin_lock_irq(&hwgroup->lock);
> > +
> > if (!rq) {
> > ide_unlock_hwgroup(hwgroup);
> > - break;
> > + goto out;
> > }
> >
> > /*
> > @@ -888,15 +795,22 @@ void do_ide_request(struct request_queue
> >
> > if (startstop == ide_stopped) {
> > ide_unlock_hwgroup(hwgroup);
> > - if (!elv_queue_empty(orig_drive->queue))
> > - blk_plug_device(orig_drive->queue);
> > + /* give other devices a chance */
> > + goto plug_device;
>
> This, I think, is very wrong. The rule of thumb for a ->requst_fn()
> should be to take as many requests off the queue as possible. Besides,
> this may easily preempt an ordered sequence as mentioned earlier. In my
> opinion, we really need a loop of sorts (while or goto).

Thanks for noticing this. Fixed.

[ The original idea behind "goto plug_device" is in the comment but as
I read the code again it doesn't make any sense now... ]

> > }
> > - }
> > + } else
> > + goto plug_device;
> > +out:
> > + spin_unlock_irq(&hwgroup->lock);
> > + spin_lock_irq(q->queue_lock);
> > return;
> >
> > plug_device:
> > - if (!elv_queue_empty(orig_drive->queue))
> > - blk_plug_device(orig_drive->queue);
> > + spin_unlock_irq(&hwgroup->lock);
> > + spin_lock_irq(q->queue_lock);
> > +
> > + if (!elv_queue_empty(q))
> > + blk_plug_device(q);
> > }
> >
> > /*
> [...]
> > @@ -974,10 +899,12 @@ out:
> > void ide_timer_expiry (unsigned long data)
> > {
> > ide_hwgroup_t *hwgroup = (ide_hwgroup_t *) data;
> > + ide_drive_t *uninitialized_var(drive);
> > ide_handler_t *handler;
> > ide_expiry_t *expiry;
> > unsigned long flags;
> > unsigned long wait = -1;
> > + int plug_device = 0;
> >
> > spin_lock_irqsave(&hwgroup->lock, flags);
> >
> > @@ -989,12 +916,8 @@ void ide_timer_expiry (unsigned long dat
> > * or we were "sleeping" to give other devices a chance.
> > * Either way, we don't really want to complain about anything.
> > */
>
> Not that important, I suppose, but you might want to update that comment
> while you're at it.

I think that despite code changes it stays valid so there is no urgent need
to touch it..

> > - if (hwgroup->sleeping) {
> > - hwgroup->sleeping = 0;
> > - ide_unlock_hwgroup(hwgroup);
> > - }
> > } else {
> > - ide_drive_t *drive = hwgroup->drive;
> > + drive = hwgroup->drive;
> > if (!drive) {
> > printk(KERN_ERR "ide_timer_expiry: hwgroup->drive was NULL\n");
> > hwgroup->handler = NULL;
>
> Finally, some more general blathering on the topic at hand: A while ago,
> I spent some thought on the possibilities of giving the block layer a
> notion of linked device queues as an equivalent hwgroups in ide,
> scsi_hosts or ata_ports and let it take care of time / bandwidth
> distribution among the queues belonging to one group. This is, as I
> understand, pretty much what your code is relying on since you have
> chucked out choose_drive(). However, this turned out not to be too easy

This is the right way to go and I has always advocated for it. However
after seeing how libata got away with ignoring the issue altogether
I'm no longer sure of this. I haven't seen any bug reports which would
indicate that simplified approach has any really negative consequences.

[ Still would be great to have the control over bandwitch of "queue-group"
at the block layer level since we could also use it for distributing the
available PCI[-E] bus bandwitch. ]

> and I'm not quite sure whether we really want to go down that road. One
> major problem is that there is no straight forward way for the block
> layer to know, whether a ->request_fn() has actually taken a request off
> the queue and if not (or less than queue_depth anyway), whether it's
> just the device that couldn't take any more or the controller instead.
> On the whole, it seems not exactly trivial to get it right and it would
> probably be a good idea to consult Jens and perhaps James before

I think that having more information returned by ->request_fn() could be
beneficial to the block layer (independently whether we end up adding
support for "queue-groups" to the block layer or not) but this definitely
needs to be verified with Jens & James.

> embarking on such a venture. Short of that, I think that ide layer has
> to keep an appropriate equivalent of choose_drive() and also the while
> loop in the do_ide_request() function.

Thank you for your review. v1->v2 interdiff below.

v2:
* Fixes/improvements based on review from Elias:
- take as many requests off the queue as possible
- remove now redundant BUG_ON()

diff -u b/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
--- b/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -726,10 +726,8 @@
spin_unlock_irq(q->queue_lock);
spin_lock_irq(&hwgroup->lock);

- /* caller must own hwgroup->lock */
- BUG_ON(!irqs_disabled());
-
if (!ide_lock_hwgroup(hwgroup)) {
+repeat:
hwgroup->rq = NULL;

if (drive->dev_flags & IDE_DFLAG_SLEEPING) {
@@ -793,11 +791,8 @@
startstop = start_request(drive, rq);
spin_lock_irq(&hwgroup->lock);

- if (startstop == ide_stopped) {
- ide_unlock_hwgroup(hwgroup);
- /* give other devices a chance */
- goto plug_device;
- }
+ if (startstop == ide_stopped)
+ goto repeat;
} else
goto plug_device;
out:

¢éì¹»®&Þ~º&¶¬?+-±éݶ¥?w®?Ë?±Êâméb?ìdz¹Þ?)í?æèw*jg¬±¨¶????Ý¢j/?êäz¹Þ??à2?Þ?¨è­Ú&¢)ß¡«a¶Úþø®G«?éh®æj:+v?¨?wè?Ù¥>W?±êÞiÛaxPjØm¶?ÿà -»+?ùd?_