Re: [GIT PULL] bcachefs updates for 6.8

From: Mark Brown
Date: Fri Jan 12 2024 - 13:23:14 EST


On Fri, Jan 12, 2024 at 06:11:04AM -0500, Neal Gompa wrote:
> On Thu, Jan 11, 2024 at 8:11 PM Kent Overstreet wrote:
> > On Thu, Jan 11, 2024 at 09:47:26PM +0000, Mark Brown wrote:

> > > It's certainly an issue and it's much better if people do manage to fit
> > > their tests into some existing thing but I'm not convinced that's the
> > > big reason why you have a bunch of different systems running separately
> > > and doing different things. For example the enterprise vendors will
> > > naturally tend to have a bunch of server systems in their labs and focus
> > > on their testing needs while I know the Intel audio CI setup has a bunch

> > No, you're overthinking.

> > The vast majority of kernel testing requires no special hardware, just a
> > virtual machine.

This depends a lot on the area of the kernel you're looking at - some
things are very amenable to testing in a VM but there's plenty of code
where you really do want to ensure that at some point you're running
with some actual hardware, ideally as wide a range of it with diverse
implementation decisions as you can manage. OTOH some things can only
be tested virtually because the hardware doesn't exist yet!

> > There is _no fucking reason_ we shouldn't be able to run tests on our
> > own local machines - _local_ machines, not waiting for the Intel CI
> > setup and asking for a git branch to be tested, not waiting for who
> > knows how long for the CI farm to get to it - just run the damn tests
> > immediately and get immediate feedback.

> > You guys are overthinking and overengineering and ignoring the basics,
> > the way enterprise people always do.

> As one of those former enterprise people that actually did do this
> stuff, I can say that even when I was "in the enterprise", I tried to
> avoid overthinking and overengineering stuff like this. :)

> Nobody can maintain anything that's so complicated nobody can run the
> tests on their machine. That is the root of all sadness.

Yeah, it's similar with a lot of the more hardware-focused or embedded
stuff - running something on the machine that's in front of you is
seldom the bit that causes substantial issues. Most of the exceptions
I've personally dealt with involved test hardware - from simple stuff
like wiring the audio inputs and outputs together to verify that
they're working, up to attaching fancy test equipment to simulate
things or to validate that the desired physical parameters are being
achieved.

> > > of the existing systems fit well. Anecdotally it seems much more common
> > > to see people looking for things to reuse in order to save time than it
> > > is to see people going off and reinventing the world.

> > It's a basic lack of leadership. Yes, the younger engineers are always
> > going to be doing the new and shiny, and always going to want to build
> > something new instead of finishing off the tests or integrating with
> > something existing. Which is why we're supposed to have managers saying
> > "ok, what do I need to prioritize for my team to be able to develop
> > effectively".

That sounds more like a "(reproducible) tests don't exist" complaint
which is a different thing again to people going off and NIHing fancy
frameworks.

> > > That does assume that you're building and running everything directly on
> > > the system under test and are happy to have the test in a VM which isn't
> > > an assumption that holds universally, and also that whoever's doing the
> > > testing doesn't want to do something like use their own distro or
> > > something - like I say none of it looks too unreasonable for
> > > filesystems.

> > No, I'm doing it that way because technically that's the simplest way to
> > do it.

> > All you guys building crazy contraptions for running tests on Google
> > Cloud or Amazon or whatever - you're building technical workarounds for
> > broken procurement.

I think you're addressing some specific stuff that I'm not super
familiar with here? My own stuff (and most of the stuff I end up
looking at) involves driving actual hardware.

> > Just requisition the damn machines.

There's some assumptions there which are true for a lot of people
working on the kernel but not all of them...

> Running in the cloud does not mean it has to be complicated. It can be
> a simple Buildbot or whatever that knows how to spawn spot instances
> for tests and destroy them when they're done *if the test passed*. If
> a test failed on an instance, it could hold onto them for a day or two
> for someone to debug if needed.

> (I mention Buildbot because in a previous life, I used that to run
> tests for the dattobd out-of-tree kernel module before. That was the
> strategy I used for it.)
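
For concreteness, that pattern maps fairly directly onto Buildbot's
latent workers. A rough, untested sketch of a master.cfg fragment -
the AMI, credentials, prices and names below are all placeholders,
and holding on to failed instances would need extra logic on top:

from buildbot.plugins import worker

workers = [
    worker.EC2LatentWorker(
        "spot-worker-%d" % i,           # worker name
        "worker-password",
        "m5.xlarge",                    # instance type
        ami="ami-0123456789abcdef0",    # image with test deps baked in
        region="us-east-1",
        keypair_name="ci-keypair",
        security_name="ci-sg",
        spot_instance=True,             # bid for cheap spot capacity
        max_spot_price=0.20,            # USD/hour ceiling for the bid
        build_wait_timeout=0,           # tear down when the build ends
    )
    for i in range(4)
]

Keeping an instance alive for a day or two after a *failed* run isn't
a built-in flag as far as I know; you'd hang that off a post-build
step that tags the instance and skips termination.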

Yeah, or if your thing runs in a Docker container rather than a VM then
throwing it at a Kubernetes cluster using a batch job isn't a big jump.
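
A minimal sketch of that shape with the kubernetes Python client -
the image, namespace and test script are made up, and error handling
is omitted:

from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() in-cluster

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="kernel-tests"),
    spec=client.V1JobSpec(
        backoff_limit=0,  # a test failure is a result, not a retry
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="tests",
                    image="registry.example.com/kernel-tests:latest",
                    command=["./run-tests.sh"],
                )],
            ),
        ),
    ),
)
client.BatchV1Api().create_namespaced_job(namespace="ci", body=job)

The cluster finds capacity for it and the Job object records
pass/fail, so most of the farm management comes along for free.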

> > I'd also really like to get automated performance testing going too,
> > which would have similar requirements in that jobs would need to be
> > scheduled on specific dedicated machines. I think what you're doing
> > could still build off of some common infrastructure.

It does actually - like quite a few test labs, mine is based around
LAVA; labgrid is the other popular option (people were actually
thinking about integrating the two recently, since labgrid is a bit
lower level than LAVA and the two could conceptually play nicely with
each other). Since the control API is internet accessible it's really
simple for me to donate spare time on the boards to KernelCI, which
understands how to drive LAVA - testing that I in turn use myself.
Both my stuff and KernelCI use a repository of glue which knows how
to drive various testsuites inside a LAVA job; that's also used by
other LAVA-based systems like LKFT.
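
Driving LAVA remotely really is just a web API call - roughly this,
with placeholder server, user and token, and where boot-and-test.yaml
is an ordinary LAVA job definition:

import xmlrpc.client

server = xmlrpc.client.ServerProxy(
    "https://myuser:mytoken@lava.example.com/RPC2")

with open("boot-and-test.yaml") as f:
    job_id = server.scheduler.jobs.submit(f.read())

print("submitted as job", job_id)

which is more or less all another system needs in order to borrow
the boards.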

The custom stuff I have is all fairly thin (and quite janky), mostly
just either things specific to my physical lab or managing which tests I
want to run and what results I expect. What I've got is *much* more
limited than I'd like, and frankly if I wasn't able to pick up huge
amounts of preexisting work, most of this stuff would not be happening.

> > > I'd also note that the 9 hour turnaround time for that test set you're
> > > pointing at isn't exactly what I'd associate with immediate feedback.

> > My CI shards at the subtest level, and like I mentioned I run 10 VMs per
> > physical machine, so with just 2 of the 80 core Ampere boxes I get full
> > test runs done in ~20 minutes.
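
For anyone unfamiliar with the trick: sharding at the subtest level
means wall-clock time tends towards the longest shard rather than the
sum of all the tests. A toy sketch, with made-up subtest names:

def shard(subtests, n_workers):
    # round-robin the flat subtest list across the available VMs
    return [subtests[i::n_workers] for i in range(n_workers)]

all_subtests = ["generic/%03d" % i for i in range(600)]  # placeholders
for vm_id, tests in enumerate(shard(all_subtests, 20)):
    # 2 x 80-core boxes at 10 VMs each = 20 shards
    print("vm%02d runs %d subtests" % (vm_id, len(tests)))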

> This design, ironically, is way more cloud-friendly than a lot of
> testing system designs I've seen in the past. :)

Sounds like a small private cloud to me! :P
