Re: Scaling noise

From: Martin J. Bligh
Date: Wed Sep 03 2003 - 16:42:35 EST


> On Wed, Sep 03, 2003 at 01:48:59PM -0700, Martin J. Bligh wrote:
>> Not if you have an SSI cluster, that's the point.
>
> The scenario described above wasn't SSI but independent instances with
> a shared distributed fs.

OK, sorry - crosstalk confusion between subthreads.

> SSI clusters have most of the same problems,
> really. Managing the systems just becomes "managing the nodes" because
> they're not called systems, and you have to go through some (possibly
> automated, though not likely) hassle to figure out the right way to
> spread things across nodes, which virtualizes pieces to hand to which
> nodes running which loads, etc.

That's where I disagree - it's much easier for the USER because an SSI
cluster works out all the load balancing shit for itself, instead of
pushing the problem out to userspace. It's much harder for the KERNEL
programmer, sure ... but we're smart ;-) And I'd rather solve it once,
properly, in the right place where all the right data is about all
the apps running on the system, and the data about the machine hardware.

> On Wed, Sep 03, 2003 at 01:48:59PM -0700, Martin J. Bligh wrote:
>> No, each node in an SSI cluster has its own pagecache, that's mostly
>> independant.
>
> But not totally. truncate() etc. need handling, i.e. cross-instance
> pagecache invalidations. And write() too. =)

Same problem as any clustered fs, but yes, truncate will suck harder
than it does now. Not sure I care though ;-) Invalidations on write,
etc will be more expensive when we're sharing files across nodes, but
independant operations will be cheaper due to the locality. It's a
tradeoff - whether it pays off or not depends on the workload.

>> True. depends on how the processes / threads in that app communicate
>> as to how big the impact would be. There's nothing saying that two
>> processes of the same app in an SSI cluster can't run on different
>> nodes ... we present a single system image to userspace, across nodes.
>> Some of the glue layer (eg for ps, to give a simple example), like
>> for_each_task, is where the hard work in doing this is.
>
> Well, let's try the word "process" then. e.g. 4GB nodes and a process
> that suddenly wants to inflate to 8GB due to some ephemeral load
> imbalance.

Well, if you mean "task" in the linux sense (ie not a multi-threaded
process), that reduces us from worrying about tasks to memory. On an
SSI cluster that's on a NUMA machine, we could loan memory across nodes
or something, but yes, that's definitely a problem area. It ain't no
panacea ;-)

M.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/