Re: Re: [PATCH] block/mq: blk map queues by core id

From: luferry
Date: Mon Mar 25 2019 - 06:15:35 EST



After reading the code and comparing PCI device info, I can reproduce this
imbalance under KVM.
When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues() falls back
to blk_mq_map_queues().

Since virtual machine users cannot change the number of vectors, I think
blk_mq_map_queues() should be more friendly to different CPU topologies.
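
For reference, this situation can be reproduced with a QEMU command line roughly
like the following (an illustrative sketch rather than my exact setup; the drive
name is a placeholder and option spellings depend on the QEMU version):

  -smp 16,sockets=1,cores=8,threads=2 \
  -device virtio-blk-pci,drive=drive-vdb,num-queues=4,vectors=2

With only 2 MSI-X vectors shared between 4 virtqueues plus the config interrupt,
virtio-pci cannot give every virtqueue its own vector, so per_vq_vectors stays
false. The relevant code paths are: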

const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
{
	struct virtio_pci_device *vp_dev = to_vp_device(vdev);

	if (!vp_dev->per_vq_vectors ||
	    vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
		return NULL;

	return pci_irq_get_affinity(vp_dev->pci_dev,
				    vp_dev->vqs[index]->msix_vector);
}

int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
		struct virtio_device *vdev, int first_vec)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	if (!vdev->config->get_vq_affinity)
		goto fallback;

	for (queue = 0; queue < qmap->nr_queues; queue++) {
		mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); /* vp_get_vq_affinity */
		if (!mask)
			goto fallback;

		for_each_cpu(cpu, mask)
			qmap->mq_map[cpu] = qmap->queue_offset + queue;
	}

	return 0;
fallback:
	return blk_mq_map_queues(qmap);
}
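
To make the effect of that fallback concrete, below is a minimal userspace
sketch (plain C, not kernel code) that replays both the current
blk_mq_map_queues() logic and the behaviour proposed by this patch for the
topology discussed in this thread: 16 CPUs, 4 queues, and sibling threads
numbered adjacently (cpu 0/1 on core 0, cpu 2/3 on core 1, and so on).
NR_CPUS, NR_QUEUES and first_sibling() are illustrative stand-ins for the real
cpumask/topology helpers.

#include <stdio.h>

#define NR_CPUS   16
#define NR_QUEUES 4

/* In this guest topology the first sibling of cpu N is N rounded down to even. */
static unsigned int first_sibling(unsigned int cpu)
{
	return cpu & ~1u;
}

/* Current mapping: first siblings are indexed by cpu id (cpu % nr_queues). */
static void map_by_cpu_id(unsigned int *map)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu < NR_QUEUES)
			map[cpu] = cpu % NR_QUEUES;
		else if (first_sibling(cpu) == cpu)
			map[cpu] = cpu % NR_QUEUES;
		else
			map[cpu] = map[first_sibling(cpu)];
	}
}

/* Patched mapping: first siblings are indexed by a running core counter. */
static void map_by_core_id(unsigned int *map)
{
	unsigned int cpu, core = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu < NR_QUEUES) {
			map[cpu] = cpu % NR_QUEUES;
		} else if (first_sibling(cpu) == cpu) {
			map[cpu] = core % NR_QUEUES;
			core++;
		} else {
			map[cpu] = map[first_sibling(cpu)];
		}
	}
}

static void print_map(const char *tag, const unsigned int *map)
{
	unsigned int q, cpu;

	printf("%s\n", tag);
	for (q = 0; q < NR_QUEUES; q++) {
		printf("  queue %u:", q);
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			if (map[cpu] == q)
				printf(" %u", cpu);
		printf("\n");
	}
}

int main(void)
{
	unsigned int map[NR_CPUS];

	map_by_cpu_id(map);
	print_map("before patch (indexed by cpu id)", map);
	map_by_core_id(map);
	print_map("after patch (indexed by core id)", map);
	return 0;
}

Running it reproduces the cpu_list distribution quoted below: before the patch
queues 0 and 2 get seven CPUs each while queues 1 and 3 get one each; after the
patch the CPUs are spread 5/5/3/3 across the four queues.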








At 2019-03-23 19:14:34, "Dongli Zhang" <dongli.zhang@xxxxxxxxxx> wrote:
>
>
>On 03/23/2019 02:34 PM, luferry wrote:
>>
>> I just bought one VM, so I cannot access the hypervisor. I will try to build
>> the environment on my desktop.
>> But I am sure about the following:
>> the hypervisor is KVM and the disk driver is virtio-blk.
>> [root@blk-mq ~]# dmesg |grep KVM
>> [ 0.000000] Hypervisor detected: KVM
>> [ 0.186330] Booting paravirtualized kernel on KVM
>> [ 0.279106] KVM setup async PF for cpu 0
>> [ 0.420819] KVM setup async PF for cpu 1
>> [ 0.421682] KVM setup async PF for cpu 2
>> [ 0.422113] KVM setup async PF for cpu 3
>> [ 0.422434] KVM setup async PF for cpu 4
>> [ 0.422434] KVM setup async PF for cpu 5
>> [ 0.423312] KVM setup async PF for cpu 6
>> [ 0.423394] KVM setup async PF for cpu 7
>> [ 0.424125] KVM setup async PF for cpu 8
>> [ 0.424414] KVM setup async PF for cpu 9
>> [ 0.424415] KVM setup async PF for cpu 10
>> [ 0.425329] KVM setup async PF for cpu 11
>> [ 0.425420] KVM setup async PF for cpu 12
>> [ 0.426156] KVM setup async PF for cpu 13
>> [ 0.426431] KVM setup async PF for cpu 14
>> [ 0.426431] KVM setup async PF for cpu 15
>> [root@blk-mq ~]# lspci |grep block
>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>
>> [root@blk-mq ~]# lscpu
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> Byte Order: Little Endian
>> CPU(s): 16
>> On-line CPU(s) list: 0-15
>> Thread(s) per core: 2
>> Core(s) per socket: 8
>>
>> [root@blk-mq ~]# ls /sys/block/vdb/mq/
>> 0 1 2 3
>>
>> [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
>> processor : 0
>> core id : 0
>> processor : 1
>> core id : 0
>> processor : 2
>> core id : 1
>> processor : 3
>> core id : 1
>> processor : 4
>> core id : 2
>> processor : 5
>> core id : 2
>> processor : 6
>> core id : 3
>> processor : 7
>> core id : 3
>> processor : 8
>> core id : 4
>> processor : 9
>> core id : 4
>> processor : 10
>> core id : 5
>> processor : 11
>> core id : 5
>> processor : 12
>> core id : 6
>> processor : 13
>> core id : 6
>> processor : 14
>> core id : 7
>> processor : 15
>> core id : 7
>>
>> --before this patch--
>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>> 0, 4, 5, 8, 9, 12, 13
>> 1
>> 2, 6, 7, 10, 11, 14, 15
>> 3
>>
>> --after this patch--
>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>> 0, 4, 5, 12, 13
>> 1, 6, 7, 14, 15
>> 2, 8, 9
>> 3, 10, 11
>>
>>
>> I added a dump_stack() inside blk_mq_map_queues().
>>
>> [ 1.378680] Call Trace:
>> [ 1.378690] dump_stack+0x5a/0x73
>> [ 1.378695] blk_mq_map_queues+0x29/0xb0
>> [ 1.378700] blk_mq_alloc_tag_set+0x1bd/0x2d0
>> [ 1.378705] virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>> [ 1.378709] virtio_dev_probe+0x18a/0x230 [virtio]
>> [ 1.378713] really_probe+0x215/0x3f0
>> [ 1.378716] driver_probe_device+0x115/0x130
>> [ 1.378718] device_driver_attach+0x50/0x60
>> [ 1.378720] __driver_attach+0xbd/0x140
>> [ 1.378722] ? device_driver_attach+0x60/0x60
>> [ 1.378724] bus_for_each_dev+0x67/0xc0
>> [ 1.378727] ? klist_add_tail+0x57/0x70
>
>I am not able to reproduce the above call stack when virtio-blk is assigned 4
>queues while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>
># cat /sys/block/vda/mq/0/cpu_list
>0, 1, 2, 3
># cat /sys/block/vda/mq/1/cpu_list
>4, 5, 6, 7
># cat /sys/block/vda/mq/2/cpu_list
>8, 9, 10, 11
># cat /sys/block/vda/mq/3/cpu_list
>12, 13, 14, 15
>
>
>I do agree that in the above case we would have an issue if the mapping is
>established by blk_mq_map_queues().
>
>
>However, I am just curious how we finally reach blk_mq_map_queues() from
>blk_mq_alloc_tag_set().
>
>It should be something like:
>
>blk_mq_alloc_tag_set()
>  -> blk_mq_update_queue_map()
>       -> if (set->ops->map_queues && !is_kdump_kernel())
>                 return set->ops->map_queues(set);
>          else
>                 return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>
>Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>
>Or does the execution reach:
>
>virtblk_map_queues()
>  -> blk_mq_virtio_map_queues()
>       -> if (!vdev->config->get_vq_affinity)
>                 return blk_mq_map_queues(qmap);
>          else
>                 establish the mapping via get_vq_affinity
>
>but vdev->config->get_vq_affinity == NULL?
>
>For virtio-pci, get_vq_affinity is always set. It seems virtio-mmio does not
>set get_vq_affinity.
>
>
>I used to play with Firecracker (by Amazon), and interestingly Firecracker
>uses mmio to set up virtio-blk.
>
>
>Sorry for disturbing the review of this patch. I would just like to clarify in
>which scenario we would hit this issue, e.g., when virtio-blk is based on mmio?
>
>Dongli Zhang
>
>>
>>
>> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@xxxxxxxxxx> wrote:
>>>
>>>
>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>> Under a virtual machine environment, the cpu topology may differ from that
>>>> of a normal physical server.
>>>
>>> Would you mind sharing the name of the virtual machine monitor, the command
>>> line if available, and which device reproduces this?
>>>
>>> For instance, I am not able to reproduce with qemu nvme or virtio-blk as I
>>> assume they use pci or virtio specific mapper to establish the mapping.
>>>
>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>
>>> Indeed, I use three queues instead of two, as one is reserved for admin.
>>>
>>> # ls /sys/block/nvme0n1/mq/*
>>> /sys/block/nvme0n1/mq/0:
>>> cpu0 cpu1 cpu2 cpu3 cpu_list nr_reserved_tags nr_tags
>>>
>>> /sys/block/nvme0n1/mq/1:
>>> cpu4 cpu5 cpu6 cpu7 cpu_list nr_reserved_tags nr_tags
>>>
>>>
>>> Thank you very much!
>>>
>>> Dongli Zhang
>>>
>>>> for example (machine with 4 cores, 2 threads per core):
>>>>
>>>> normal physical server:
>>>> core-id   thread-0-id   thread-1-id
>>>>    0            0             4
>>>>    1            1             5
>>>>    2            2             6
>>>>    3            3             7
>>>>
>>>> virtual machine:
>>>> core-id   thread-0-id   thread-1-id
>>>>    0            0             1
>>>>    1            2             3
>>>>    2            4             5
>>>>    3            6             7
>>>>
>>>> When attaching a disk with two queues, all the even-numbered cpus will be
>>>> mapped to queue 0. Under a virtual machine, every cpu is immediately followed
>>>> by its sibling cpu, so before this patch all the odd-numbered cpus would also
>>>> be mapped to queue 0, which can cause serious imbalance and hurts system IO
>>>> performance.
>>>>
>>>> So I suggest allocating the cpu map by core id; this allows more concurrency.
>>>>
>>>> Signed-off-by: luferry <luferry@xxxxxxx>
>>>> ---
>>>> block/blk-mq-cpumap.c | 9 +++++----
>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>> index 03a534820271..4125e8e77679 100644
>>>> --- a/block/blk-mq-cpumap.c
>>>> +++ b/block/blk-mq-cpumap.c
>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>  {
>>>>  	unsigned int *map = qmap->mq_map;
>>>>  	unsigned int nr_queues = qmap->nr_queues;
>>>> -	unsigned int cpu, first_sibling;
>>>> +	unsigned int cpu, first_sibling, core = 0;
>>>>
>>>>  	for_each_possible_cpu(cpu) {
>>>>  		/*
>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>  		} else {
>>>>  			first_sibling = get_first_sibling(cpu);
>>>> -			if (first_sibling == cpu)
>>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>> -			else
>>>> +			if (first_sibling == cpu) {
>>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>>> +				core++;
>>>> +			} else
>>>>  				map[cpu] = map[first_sibling];
>>>>  		}
>>>>  	}
>>>>