Re: [GIT PULL] bcachefs updates for 6.8

From: Kent Overstreet
Date: Thu Jan 11 2024 - 20:11:07 EST


On Thu, Jan 11, 2024 at 09:47:26PM +0000, Mark Brown wrote:
> On Thu, Jan 11, 2024 at 12:38:57PM -0500, Kent Overstreet wrote:
> > On Thu, Jan 11, 2024 at 03:35:40PM +0000, Mark Brown wrote:
>
> > > IME the actually running the tests bit isn't usually *so* much the
> > > issue, someone making a new test runner and/or output format does mean a
> > > bit of work integrating it into infrastructure but that's more usually
> > > annoying than a blocker.
>
> > No, the proliferation of test runners, test output formats, CI systems,
> > etc. really is an issue; it means we can't have one common driver that
> > anyone can run from the command line, and instead there's a bunch of
> > disparate systems with patchwork integration and all the feedback is nag
> > emails - after you've finished what you were working on instead of
> > moving on to the next thing - with no way to get immediate feedback.
>
> It's certainly an issue and it's much better if people do manage to fit
> their tests into some existing thing but I'm not convinced that's the
> big reason why you have a bunch of different systems running separately
> and doing different things. For example the enterprise vendors will
> naturally tend to have a bunch of server systems in their labs and focus
> on their testing needs while I know the Intel audio CI setup has a bunch
> of laptops, laptop-like dev boards and things in there with loopback
> audio cables and I think test equipment plugged in and focuses rather
> more on audio. My own lab is built around systems I can be in the
> same room as without getting too annoyed and does things I find useful,
> plus using spare bandwidth for KernelCI because they can take donated
> lab time.

No, you're overthinking.

The vast majority of kernel testing requires no special hardware, just a
virtual machine.

There is _no fucking reason_ we shouldn't be able to run tests on our
own local machines - _local_ machines, not waiting for the Intel CI
setup and asking for a git branch to be tested, not waiting for who
knows how long for the CI farm to get to it - just run the damn tests
immediately and get immediate feedback.

You guys are overthinking and overengineering and ignoring the basics,
the way enterprise people always do.

> > And it's because building something shiny and new is the fun part, no
> > one wants to do the grungy integration work.
>
> I think you may be overestimating people's enthusiasm for writing test
> stuff there! There is NIH stuff going on for sure but a lot of the time
> when you look at something where people have gone off and done their own
> thing it's either much older than you initially thought and predates
> anything they might've integrated with or there's some reason why none
> of the existing systems fit well. Anecdotally it seems much more common
> to see people looking for things to reuse in order to save time than it
> is to see people going off and reinventing the world.

It's a basic lack of leadership. Yes, the younger engineers are always
going to be doing the new and shiny, and always going to want to build
something new instead of finishing off the tests or integrating with
something existing. Which is why we're supposed to have managers saying
"ok, what do I need to prioritize for my team to be able to develop
effectively".

>
> > > > example tests, example output:
> > > > https://evilpiepirate.org/git/ktest.git/tree/tests/bcachefs/single_device.ktest
> > > > https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-testing
>
> > > For example looking at the sample test there it looks like it needs
> > > among other things mkfs.btrfs, bcachefs, stress-ng, xfs_io, fio, mdadm,
> > > rsync
>
> > Getting all that set up by the end user is one command:
> > ktest/root_image create
> > and running a test is one more command:
> > build-test-kernel run ~/ktest/tests/bcachefs/single_device.ktest
>
> That does assume that you're building and running everything directly on
> the system under test and are happy to have the test in a VM which isn't
> an assumption that holds universally, and also that whoever's doing the
> testing doesn't want to do something like use their own distro or
> something - like I say none of it looks too unreasonable for
> filesystems.

No, I'm doing it that way because technically that's the simplest way to
do it.

All you guys building crazy contraptions for running tests on Google
Cloud or Amazon or whatever - you're building technical workarounds for
broken procurement.

Just requisition the damn machines.

> Some will be, some will have more demanding requirements especially when
> you want to test on actual hardware rather than in a VM. For example
> with my own test setup which is more focused on hardware the operating
> costs aren't such a big deal but I've got boards that are for various
> reasons irreplaceable, often single instances of boards (which makes
> scheduling a thing) and for some of the tests I'd like to get around to
> setting up I need special physical setup. Some of the hardware I'd like
> to cover is only available in machines which are in various respects
> annoying to automate, I've got a couple of unused systems waiting for me
> to have sufficient bandwidth to work out how to automate them. Either
> way I don't think the costs are trivial enough to be completely handwaved
> away.

That does complicate things.

I'd also really like to get automated performance testing going too,
which would have similar requirements in that jobs would need to be
scheduled on specific dedicated machines. I think what you're doing
could still build off of some common infrastructure.

> I'd also note that the 9 hour turnaround time for that test set you're
> pointing at isn't exactly what I'd associate with immediate feedback.

My CI shards at the subtest level, and like I mentioned I run 10 VMs per
physical machine, so with just 2 of the 80 core Ampere boxes I get full
test runs done in ~20 minutes.
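
To make the sharding point concrete, here's a rough sketch of why
subtest-level sharding cuts wall-clock time. This is not how ktest's CI
actually schedules jobs; the function name, the greedy longest-first
policy and the subtest durations are all made up for illustration. The
only numbers taken from above are 2 machines and 10 VMs per machine,
i.e. 20 workers.

```python
import heapq

def shard_subtests(durations, n_workers):
    """Assign subtests to workers, longest first, always picking the
    least-loaded worker. Hypothetical helper, not part of ktest.

    durations: dict mapping subtest name -> runtime in seconds
    Returns (assignment, makespan), where makespan is the runtime of
    the busiest worker, i.e. the wall-clock time of the whole run.
    """
    # Min-heap of (current load, worker id).
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}

    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append(name)
        heapq.heappush(heap, (load + secs, w))

    makespan = max(load for load, _ in heap)
    return assignment, makespan

if __name__ == "__main__":
    # 2 machines * 10 VMs each, per the numbers above.
    workers = 2 * 10
    # Made-up workload: 40 subtests of one minute each.
    fake = {f"subtest_{i}": 60 for i in range(40)}
    _, wall = shard_subtests(fake, workers)
    print(f"{sum(fake.values())}s serial -> {wall:.0f}s on {workers} VMs")
```

With real tests the durations are uneven, so the busiest worker sets the
wall-clock time; the smaller the unit you shard on, the closer you get
to total runtime divided by number of VMs.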