Re: [PATCH net-next v3 10/18] nvme/host: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage

From: Aurelien Aptel
Date: Thu Jun 29 2023 - 10:45:27 EST


Hi David,

David Howells <dhowells@xxxxxxxxxx> writes:
> When transmitting data, call down into TCP using a single sendmsg with
> MSG_SPLICE_PAGES to indicate that content should be spliced rather than
> performing several sendmsg and sendpage calls to transmit header, data
> pages and trailer.

This series makes my kernel crash.

From the current net-next main branch:

commit 9ae440b8fdd6772b6c007fa3d3766530a09c9045 (HEAD)
Merge: b545a13ca9b2 b848b26c6672
Author: Jakub Kicinski <kuba@xxxxxxxxxx>
Date: Sat Jun 24 15:50:21 2023 -0700

Merge branch 'splice-net-switch-over-users-of-sendpage-and-remove-it'


Steps to reproduce:

* connect a remote nvme null block device (nvmet) with 1 IO queue to keep
  things simple
* open /dev/nvme0n1 with O_RDWR|O_DIRECT|O_SYNC
* write() an 8k or 4k buffer

Trace:

[ 311.766163] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 311.768136] #PF: supervisor read access in kernel mode
[ 311.769327] #PF: error_code(0x0000) - not-present page
[ 311.770393] PGD 148988067 P4D 0
[ 311.771074] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 311.771978] CPU: 0 PID: 180 Comm: kworker/0:1H Not tainted 6.4.0-rc7+ #27
[ 311.773380] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 311.774808] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
[ 311.775547] RIP: 0010:skb_splice_from_iter+0xf1/0x370
[ 311.776176] Code: 8b 45 88 4d 89 fa 4d 89 e7 45 89 ec 44 89 e3 41 83 c4 01 83 fb 07 0f 87 56 02 00 00 48 8b 5c dd 90 41 bd 00 10 00 00 49 29 c5 <48> 8b 53 08 4d 39 f5 4d 0f 47 ee f6 c2 01 0f 85 c7 01 00 00 66 90
[ 311.778472] RSP: 0018:ff633e24c0747b08 EFLAGS: 00010206
[ 311.779115] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 311.780007] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff633e24c0747d30
[ 311.780861] RBP: ff633e24c0747bb0 R08: ff633e24c0747d40 R09: 000000006db29140
[ 311.781748] R10: ff3001bd00a22800 R11: 0000000008000000 R12: 0000000000000001
[ 311.782631] R13: 0000000000001000 R14: 0000000000001000 R15: 0000000000000000
[ 311.783506] FS: 0000000000000000(0000) GS:ff3001be77800000(0000) knlGS:0000000000000000
[ 311.784494] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 311.785197] CR2: 0000000000000008 CR3: 0000000107f5c001 CR4: 0000000000771ef0
[ 311.786076] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 311.786948] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 311.787822] PKRU: 55555554
[ 311.788165] Call Trace:
[ 311.788480] <TASK>
[ 311.788756] ? show_regs+0x6e/0x80
[ 311.789189] ? __die+0x29/0x70
[ 311.789577] ? page_fault_oops+0x154/0x4a0
[ 311.790097] ? ip_output+0x7c/0x110
[ 311.790541] ? __sys_socketpair+0x1b4/0x280
[ 311.791065] ? __pfx_ip_finish_output+0x10/0x10
[ 311.791640] ? do_user_addr_fault+0x360/0x770
[ 311.792184] ? exc_page_fault+0x7d/0x190
[ 311.792677] ? asm_exc_page_fault+0x2b/0x30
[ 311.793198] ? skb_splice_from_iter+0xf1/0x370
[ 311.793748] ? skb_splice_from_iter+0xb7/0x370
[ 311.794312] ? __sk_mem_schedule+0x34/0x50
[ 311.794824] tcp_sendmsg_locked+0x3a6/0xdd0
[ 311.795344] ? tcp_push+0x10c/0x120
[ 311.795789] tcp_sendmsg+0x31/0x50
[ 311.796213] inet_sendmsg+0x47/0x80
[ 311.796655] sock_sendmsg+0x99/0xb0
[ 311.797095] ? inet_sendmsg+0x47/0x80
[ 311.797557] nvme_tcp_try_send_data+0x149/0x490 [nvme_tcp]
[ 311.798242] ? kvm_clock_get_cycles+0xd/0x20
[ 311.799181] nvme_tcp_try_send+0x1b7/0x300 [nvme_tcp]
[ 311.800133] nvme_tcp_io_work+0x40/0xc0 [nvme_tcp]
[ 311.801044] process_one_work+0x21c/0x430
[ 311.801847] worker_thread+0x54/0x3e0
[ 311.802611] ? __pfx_worker_thread+0x10/0x10
[ 311.803433] kthread+0xf8/0x130
[ 311.804116] ? __pfx_kthread+0x10/0x10
[ 311.804865] ret_from_fork+0x29/0x50
[ 311.805596] </TASK>
[ 311.806165] Modules linked in: mlx5_ib ib_uverbs ib_core nvme_tcp
mlx5_core mlxfw psample pci_hyperv_intf rpcsec_gss_krb5 nfsv3
auth_rpcgss nfs_acl nfsv4 nfs lockd grace fscache netfs nvme_fabrics
nvme_core nvme_common intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common nfit kvm_intel kvm rapl input_leds
serio_raw sunrpc binfmt_misc qemu_fw_cfg sch_fq_codel dm_multipath
scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr ramoops reed_solomon
efi_pstore virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid qxl drm_ttm_helper ttm crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel drm_kms_helper sha512_ssse3
syscopyarea sysfillrect sysimgblt aesni_intel crypto_simd i2c_i801 ahci
cryptd psmouse drm virtio_net i2c_smbus libahci lpc_ich net_failover
xhci_pci virtio_blk failover xhci_pci_renesas [last unloaded: ib_core]
[ 311.818698] CR2: 0000000000000008
[ 311.819437] ---[ end trace 0000000000000000 ]---

Cheers,