Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

From: Andrea Vai
Date: Tue Dec 24 2019 - 03:04:20 EST


Il giorno mar, 24/12/2019 alle 09.32 +0800, Ming Lei ha scritto:
> On Mon, Dec 23, 2019 at 03:02:35PM +0100, Andrea Vai wrote:
> > Il giorno lun, 23/12/2019 alle 21.08 +0800, Ming Lei ha scritto:
> > > On Mon, Dec 23, 2019 at 12:22:45PM +0100, Andrea Vai wrote:
> > > > Il giorno mer, 18/12/2019 alle 17.48 +0800, Ming Lei ha
> scritto:
> > > > > On Wed, Dec 18, 2019 at 09:25:02AM +0100, Andrea Vai wrote:
> > > > > > Il giorno gio, 12/12/2019 alle 05.33 +0800, Ming Lei ha
> > > scritto:
> > > > > > > On Wed, Dec 11, 2019 at 11:07:45AM -0500, Theodore Y.
> Ts'o
> > > > > wrote:
> > > > > > > > On Wed, Dec 11, 2019 at 12:00:58PM +0800, Ming Lei
> wrote:
> > > > > > > > > I didn't reproduce the issue in my test environment,
> and
> > > > > follows
> > > > > > > > > Andrea's test commands[1]:
> > > > > > > > >
> > > > > > > > > mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a
> $logfile
> > > > > > > > > SECONDS=0
> > > > > > > > > cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile
> > > > > > > > > umount /mnt/pendrive 2>&1 |tee -a $logfile
> > > > > > > > >
> > > > > > > > > The 'cp' command supposes to open/close the file
> just
> > > once,
> > > > > > > however
> > > > > > > > > ext4_release_file() & write pages is observed to run
> for
> > > > > 4358
> > > > > > > times
> > > > > > > > > when executing the above 'cp' test.
> > > > > > > >
> > > > > > > > Why are we sure the ext4_release_file() / _fput() is
> > > coming
> > > > > from
> > > > > > > the
> > > > > > > > cp command, as opposed to something else that might be
> > > running
> > > > > on
> > > > > > > the
> > > > > > > > system under test? _fput() is called by the kernel
> when
> > > the
> > > > > last
> > > > > > >
> > > > > > > Please see the log:
> > > > > > >
> > > > > > >
> > > > >
> > >
> https://lore.kernel.org/linux-scsi/3af3666920e7d46f8f0c6d88612f143ffabc743c.camel@xxxxxxxx/2-log_ming.zip
> > > > > > >
> > > > > > > Which is collected by:
> > > > > > >
> > > > > > > #!/bin/sh
> > > > > > > MAJ=$1
> > > > > > > MIN=$2
> > > > > > > MAJ=$(( $MAJ << 20 ))
> > > > > > > DEV=$(( $MAJ | $MIN ))
> > > > > > >
> > > > > > > /usr/share/bcc/tools/trace -t -C \
> > > > > > > 't:block:block_rq_issue (args->dev == '$DEV') "%s %d
> > > %d",
> > > > > args-
> > > > > > > >rwbs, args->sector, args->nr_sector' \
> > > > > > > 't:block:block_rq_insert (args->dev == '$DEV') "%s
> %d
> > > %d",
> > > > > args-
> > > > > > > >rwbs, args->sector, args->nr_sector'
> > > > > > >
> > > > > > > $MAJ:$MIN points to the USB storage disk.
> > > > > > >
> > > > > > > From the above IO trace, there are two write paths, one
> is
> > > from
> > > > > cp,
> > > > > > > another is from writeback wq.
> > > > > > >
> > > > > > > The stackcount trace[1] is consistent with the IO trace
> log
> > > > > since it
> > > > > > > only shows two IO paths, that is why I concluded that
> the
> > > write
> > > > > done
> > > > > > > via
> > > > > > > ext4_release_file() is from 'cp'.
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > >
> > >
> https://lore.kernel.org/linux-scsi/320b315b9c87543d4fb919ecbdf841596c8fbcea.camel@xxxxxxxx/2-log_ming_20191129_150609.zip
> > > > > > >
> > > > > > > > reference to a struct file is
> released. (Specifically, if
> > > you
> > > > > > > have a
> > > > > > > > fd which is dup'ed, it's only when the last fd
> > > corresponding
> > > > > to
> > > > > > > the
> > > > > > > > struct file is closed, and the struct file is about to
> be
> > > > > > > released,
> > > > > > > > does the file system's f_ops->release function get
> > > called.)
> > > > > > > >
> > > > > > > > So the first question I'd ask is whether there is
> anything
> > > > > else
> > > > > > > going
> > > > > > > > on the system, and whether the writes are happening to
> the
> > > USB
> > > > > > > thumb
> > > > > > > > drive, or to some other storage device. And if there
> is
> > > > > something
> > > > > > > > else which is writing to the pendrive, maybe that's
> why no
> > > one
> > > > > > > else
> > > > > > > > has been able to reproduce the OP's complaint....
> > > > > > >
> > > > > > > OK, we can ask Andrea to confirm that via the following
> > > trace,
> > > > > which
> > > > > > > will add pid/comm info in the stack trace:
> > > > > > >
> > > > > > > /usr/share/bcc/tools/stackcount
> > > blk_mq_sched_request_inserted
> > > > > > >
> > > > > > > Andrew, could you collect the above log again when
> running
> > > > > new/bad
> > > > > > > kernel for confirming if the write done by
> > > ext4_release_file()
> > > > > is
> > > > > > > from
> > > > > > > the 'cp' process?
> > > > > >
> > > > > > You can find the stackcount log attached. It has been
> produced
> > > by:
> > > > > >
> > > > > > - /usr/share/bcc/tools/stackcount
> > > blk_mq_sched_request_inserted >
> > > > > trace.log
> > > > > > - wait some seconds
> > > > > > - run the test (1 copy trial), wait for the test to
> finish,
> > > wait
> > > > > some seconds
> > > > > > - stop the trace (ctrl+C)
> > > > >
> > > > > Thanks for collecting the log, looks your 'stackcount'
> doesn't
> > > > > include
> > > > > comm/pid info, seems there is difference between your bcc
> and
> > > > > my bcc in fedora 30.
> > > > >
> > > > > Could you collect above log again via the following command?
> > > > >
> > > > > /usr/share/bcc/tools/stackcount -P -K
> t:block:block_rq_insert
> > > > >
> > > > > which will show the comm/pid info.
> > > >
> > > > ok, attached (trace_20191219.txt), the test (1 trial) took
> 3684
> > > > seconds.
> > >
> > > From the above trace:
> > >
> > > b'blk_mq_sched_request_inserted'
> > > b'blk_mq_sched_request_inserted'
> > > b'dd_insert_requests'
> > > b'blk_mq_sched_insert_requests'
> > > b'blk_mq_flush_plug_list'
> > > b'blk_flush_plug_list'
> > > b'io_schedule_prepare'
> > > b'io_schedule'
> > > b'rq_qos_wait'
> > > b'wbt_wait'
> > > b'__rq_qos_throttle'
> > > b'blk_mq_make_request'
> > > b'generic_make_request'
> > > b'submit_bio'
> > > b'ext4_io_submit'
> > > b'ext4_writepages'
> > > b'do_writepages'
> > > b'__filemap_fdatawrite_range'
> > > b'ext4_release_file'
> > > b'__fput'
> > > b'task_work_run'
> > > b'exit_to_usermode_loop'
> > > b'do_syscall_64'
> > > b'entry_SYSCALL_64_after_hwframe'
> > > b'cp' [19863]
> > > 4400
> > >
> > > So this write is clearly from 'cp' process, and it should be one
> > > ext4 fs issue.
> > >
> > > Ted, can you take a look at this issue?
> > >
> > > >
> > > > > > I also tried the usual test with btrfs and xfs. Btrfs
> behavior
> > > > > looks
> > > > > > "good". xfs seems sometimes better, sometimes worse, I
> would
> > > say.
> > > > > I
> > > > > > don't know if it matters, anyway you can also find the
> results
> > > of
> > > > > the
> > > > > > two tests (100 trials each). Basically, btrfs is always
> > > between 68
> > > > > and
> > > > > > 89 seconds, with a cyclicity (?) with "period=2 trials".
> xfs
> > > looks
> > > > > > almost always very good (63-65s), but sometimes "bad"
> (>300s).
> > > > >
> > > > > If you are interested in digging into this one, the
> following
> > > trace
> > > > > should be helpful:
> > > > >
> > > > >
> > >
> https://lore.kernel.org/linux-scsi/f38db337cf26390f7c7488a0bc2076633737775b.camel@xxxxxxxx/T/#m5aa008626e07913172ad40e1eb8e5f2ffd560fc6
> > > > >
> > > >
> > > > Attached:
> > > > - trace_xfs_20191223.txt (7 trials, then aborted while doing
> the
> > > 8th),
> > > > times to complete:
> > > > 64s
> > > > 63s
> > > > 64s
> > > > 833s
> > > > 1105s
> > > > 63s
> > > > 64s
> > >
> > > oops, looks we have to collect io insert trace with the
> following
> > > bcc script
> > > on xfs for confirming if there is similar issue with ext4, could
> you
> > > run
> > > it again on xfs? And only post the trace done in case of slow
> 'cp'.
> > >
> > >
> > > #!/bin/sh
> > >
> > > MAJ=$1
> > > MIN=$2
> > > MAJ=$(( $MAJ << 20 ))
> > > DEV=$(( $MAJ | $MIN ))
> > >
> > > /usr/share/bcc/tools/trace -t -C \
> > > 't:block:block_rq_issue (args->dev == '$DEV') "%s %d %d",
> args-
> > > >rwbs, args->sector, args->nr_sector' \
> > > 't:block:block_rq_insert (args->dev == '$DEV') "%s %d %d",
> args-
> > > >rwbs, args->sector, args->nr_sector'
> > >
> > >
> > here it is (1 trial, 313 seconds to finish)
>
> The above log shows similar issue with ext4 since there is another
> writeback IO path from 'cp' process. And the following trace can
> show if
> it is same with ext4's issue:
>
> /usr/share/bcc/tools/stackcount -P -K t:block:block_rq_insert

sorry, also here please tell me which conditions should I use to run
the test (ext4 or xfs? slow run or not important?)

Thanks,
Andrea