Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

From: Ming Lei
Date: Thu Nov 28 2019 - 21:36:29 EST


On Fri, Nov 29, 2019 at 08:57:34AM +0800, Ming Lei wrote:
> On Thu, Nov 28, 2019 at 06:34:32PM +0100, Andrea Vai wrote:
> > Il giorno gio, 28/11/2019 alle 17.17 +0800, Ming Lei ha scritto:
> > > On Thu, Nov 28, 2019 at 08:46:57AM +0100, Andrea Vai wrote:
> > > > Il giorno mer, 27/11/2019 alle 08.14 +0000, Schmid, Carsten ha
> > > > scritto:
> > > > > >
> > > > > > > Then I started another set of 100 trials and let them run
> > > > > tonight, and
> > > > > > > the first 10 trials were around 1000s, then gradually
> > > decreased
> > > > > to
> > > > > > > ~300s, and finally settled around 200s with some trials
> > > below
> > > > > 70-80s.
> > > > > > > This to say, times are extremely variable and for the first
> > > time
> > > > > I
> > > > > > > noticed a sort of "performance increase" with time.
> > > > > > >
> > > > > >
> > > > > > The sheer volume of testing (probably some terabytes by now)
> > > would
> > > > > > exercise the wear leveling algorithm in the FTL.
> > > > > >
> > > > > But with "old kernel" the copy operation still is "fast", as far
> > > as
> > > > > i understood.
> > > > > If FTL (e.g. wear leveling) would slow down, we would see that
> > > also
> > > > > in
> > > > > the old kernel, right?
> > > > >
> > > > > Andrea, can you confirm that the same device used with the old
> > > fast
> > > > > kernel is still fast today?
> > > >
> > > > Yes, it is still fast. Just ran a 100 trials test and got an
> > > average
> > > > of 70 seconds with standard deviation = 6 seconds, aligned with
> > > the
> > > > past values of the same kernel.
> > >
> > > Then can you collect trace on the old kernel via the previous
> > > script?
> > >
> > > #!/bin/sh
> > >
> > > MAJ=$1
> > > MIN=$2
> > > MAJ=$(( $MAJ << 20 ))
> > > DEV=$(( $MAJ | $MIN ))
> > >
> > > /usr/share/bcc/tools/trace -t -C \
> > > 't:block:block_rq_issue (args->dev == '$DEV') "%s %d %d", args-
> > > >rwbs, args->sector, args->nr_sector' \
> > > 't:block:block_rq_insert (args->dev == '$DEV') "%s %d %d", args-
> > > >rwbs, args->sector, args->nr_sector'
> > >
> > > Both the two trace points and bcc should be available on the old
> > > kernel.
> > >
> >
> > Trace attached. Produced by: start the trace script
> > (with the pendrive already plugged), wait some seconds, run the test
> > (1 trial, 1 GB), wait for the test to finish, stop the trace.
> >
> > The copy took 73 seconds, roughly as already seen before with the fast
> > old kernel.
>
> This trace shows a good write IO order because the writeback IOs are
> queued to block layer serially from the 'cp' task and writeback wq.
>
> However, writeback IO order is changed in current linus tree because
> the IOs are queued to block layer concurrently from the 'cp' task
> and writeback wq. It might be related with killing queue_congestion
> by blk-mq.
>
> The performance effect could be not only on this specific USB drive,
> but also on all HDD., I guess.
>
> However, I still can't reproduce it in my VM even though I built it
> with similar setting of Andrea's test machine. Maybe the emulated disk
> is too fast than Andrea's.
>
> Andrea, can you collect the following log when running the test
> on current new(bad) kernel?
>
> /usr/share/bcc/tools/stackcount -K blk_mq_make_request

Instead, please run the following trace, given insert may be
called from other paths, such as flush plug:

/usr/share/bcc/tools/stackcount -K t:block:block_rq_insert

If you are using python3, the following failure may be triggered:

"cannot use a bytes pattern on a string-like object"

Then apply the following fix on /usr/lib/python3.7/site-packages/bcc/__init__.py

diff --git a/src/python/bcc/__init__.py b/src/python/bcc/__init__.py
index 6f114de8..bff5f282 100644
--- a/src/python/bcc/__init__.py
+++ b/src/python/bcc/__init__.py
@@ -769,7 +769,7 @@ class BPF(object):
evt_dir = os.path.join(cat_dir, event)
if os.path.isdir(evt_dir):
tp = ("%s:%s" % (category, event))
- if re.match(tp_re, tp):
+ if re.match(tp_re.decode(), tp):
results.append(tp)
return results

Thanks,
Ming