Re: [PATCH 3/3] NVMe: Convert to blk-mq

From: Keith Busch
Date: Tue Oct 22 2013 - 15:52:34 EST

Next message: Leonidas Da Silva Barbosa: "Re: [tpmdd-devel] [PATCH] tpm: MAINTAINERS: Add myself as tpm maintainer"
Previous message: Luigi Semenzato: "Re: [tpmdd-devel] [PATCH] tpm: MAINTAINERS: Add myself as tpm maintainer"
In reply to: Matias Bjorling: "Re: [PATCH 3/3] NVMe: Convert to blk-mq"
Next in thread: Matias Bjorling: "[PATCH 2/3] NVMe: Extract admin queue size"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, 22 Oct 2013, Matias Bjorling wrote:

On 22-10-2013 18:55, Keith Busch wrote:
On Fri, 18 Oct 2013, Matias Bjørling wrote:
On 10/18/2013 05:13 PM, Keith Busch wrote:
On Fri, 18 Oct 2013, Matias Bjorling wrote:
The nvme driver implements itself as a bio-based driver. This is primarily
because of high lock congestion for high-performance nvm devices. To
remove the congestion within the traditional block layer, a multi-queue
block layer is being implemented.

-	result = nvme_map_bio(nvmeq, iod, bio, dma_dir, psegs);
-	if (result <= 0)
+	if (nvme_map_rq(nvmeq, iod, rq, dma_dir))
 		goto free_cmdid;
-	length = result;

-	cmnd->rw.command_id = cmdid;
+	length = blk_rq_bytes(rq);
+
+	cmnd->rw.command_id = rq->tag;

The command ids have to be unique on a submission queue. Each namespace's
blk-mq has its own 'tags', used as command ids here, but the namespaces
share submission queues. What's stopping the tags for commands sent to
namespace 1 from clashing with tags for namespace 2?
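
To make the clash concrete, here is a minimal userspace sketch, not driver code. The names tag_set, sub_queue, tag_alloc, sq_submit and QUEUE_DEPTH are hypothetical stand-ins for illustration only; the assumption is simply that each namespace allocates tags independently starting from 0 and that both feed the same submission queue.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_DEPTH 4	/* hypothetical depth, kept small for illustration */

/* Hypothetical stand-in for a per-namespace tag allocator. */
struct tag_set {
	bool in_use[QUEUE_DEPTH];
};

/* Return the lowest free tag, or -1 if every tag is taken. */
static int tag_alloc(struct tag_set *ts)
{
	for (int i = 0; i < QUEUE_DEPTH; i++) {
		if (!ts->in_use[i]) {
			ts->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Hypothetical submission queue that tracks outstanding command ids. */
struct sub_queue {
	bool cid_busy[QUEUE_DEPTH];
};

/* Submit a command; returns false if its command id is already in flight. */
static bool sq_submit(struct sub_queue *sq, int cid)
{
	assert(cid >= 0 && cid < QUEUE_DEPTH);
	if (sq->cid_busy[cid])
		return false;	/* duplicate command id on this queue */
	sq->cid_busy[cid] = true;
	return true;
}

/* Two namespaces with private tag sets feeding one submission queue. */
static bool per_namespace_tags_clash(void)
{
	struct tag_set ns1 = {{0}}, ns2 = {{0}};
	struct sub_queue sq = {{0}};

	bool first = sq_submit(&sq, tag_alloc(&ns1));	/* ns1 is handed tag 0 */
	bool second = sq_submit(&sq, tag_alloc(&ns2));	/* ns2 is handed tag 0 too */

	return first && !second;
}
```

In this model per_namespace_tags_clash() returns true: the second submission reuses command id 0 while the first is still outstanding.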

I think this would work better if one blk-mq was created per device
rather than namespace. It would fix the tag problem above and save a
lot of memory potentially wasted on millions of requests allocated that
can't be used.
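
A rough userspace sketch of the per-device idea follows. It assumes a single tag allocator owned by the device and referenced by every namespace; dev_tags, ns, ns_alloc_tag and prealloc_requests are hypothetical names, and the request count is only a back-of-the-envelope model (tag sets x queues x depth), not a measurement of what blk-mq actually allocates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 4	/* hypothetical depth, kept small for illustration */

/* Hypothetical tag allocator owned by the device, not by a namespace. */
struct dev_tags {
	bool in_use[QUEUE_DEPTH];
};

/* Hypothetical namespace that only borrows the device-wide tags. */
struct ns {
	struct dev_tags *tags;
	unsigned int nsid;
};

/* Allocate the lowest free tag from the device-wide set, or -1 if full. */
static int ns_alloc_tag(struct ns *ns)
{
	for (int i = 0; i < QUEUE_DEPTH; i++) {
		if (!ns->tags->in_use[i]) {
			ns->tags->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Tags handed out through different namespaces never collide. */
static bool shared_tags_are_unique(void)
{
	struct dev_tags tags = {{0}};
	struct ns ns1 = { &tags, 1 }, ns2 = { &tags, 2 };
	int a = ns_alloc_tag(&ns1);
	int b = ns_alloc_tag(&ns2);

	return a >= 0 && b >= 0 && a != b;
}

/* Simple model of how many requests get preallocated up front. */
static size_t prealloc_requests(size_t nr_tag_sets, size_t nr_queues,
				size_t depth)
{
	return nr_tag_sets * nr_queues * depth;
}
```

Under this model, 8 namespaces with 4 queues of depth 1024 preallocate 32768 requests when each namespace has its own tag set, against 4096 for one device-wide set.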

You're right. I didn't see the connection. In v3 I'll push struct request_queue to nvme_dev and map the queues appropriately. It will also fix the command id issues.

Just anticipating a possible issue with the suggestion. Will this separate
the logical block size from the request_queue? Each namespace can have
a different format, so for this to work the block size and request_queue
can't be tied together as they currently are.

If only a few different logical block sizes are to be expected (1-4), we can keep a list of already initialized request queues and reuse the one that matches the namespace's block size?

The spec allows a namespace to have up to 16 different block formats and
they need not be the same 16 as another namespace on the same device.
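
For reference, a small sketch of how a block size falls out of the Identify Namespace data. The structs below are a simplified, hypothetical subset (ns_ident and lba_format are illustrative names, not the driver's definitions). It relies on three facts from the spec: NLBAF is a zero-based count, the low four bits of FLBAS select the active format, and the data size is 2^LBADS bytes.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LBA_FORMATS 16	/* a namespace may report up to 16 formats */

/* Simplified view of one LBA format entry. */
struct lba_format {
	uint16_t ms;	/* metadata size in bytes */
	uint8_t lbads;	/* LBA data size, as a power of two */
	uint8_t rp;	/* relative performance hint */
};

/* Hypothetical subset of the Identify Namespace data. */
struct ns_ident {
	uint8_t nlbaf;	/* number of LBA formats, zero-based */
	uint8_t flbas;	/* low 4 bits index the format in use */
	struct lba_format lbaf[MAX_LBA_FORMATS];
};

/* Logical block size in bytes for the namespace's currently selected format. */
static uint32_t ns_block_size(const struct ns_ident *id)
{
	uint8_t idx = id->flbas & 0xf;

	assert(idx <= id->nlbaf);		/* nlbaf is zero-based */
	assert(id->lbaf[idx].lbads < 32);	/* keep the shift well defined */
	return UINT32_C(1) << id->lbaf[idx].lbads;
}
```

Two namespaces on the same device can each carry their own lbaf table and FLBAS selection, so ns_block_size() may return 512 for one and 4096 for the other.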

From a practical standpoint, I don't think devices will support more
than a few formats, but even if you kept it to that many request queues,
you just get back to conflicting command id tags and some wasted memory.

Axboe, do you know of a better solution?
