Re: MMTests 0.01

From: Minchan Kim
Date: Wed Aug 10 2011 - 19:50:51 EST


Hi Mel,

At last, you've released these great test scripts.
Awesome! I really welcome this!

We have needed a standard test suite to make discussion and testing easier.
It can also help find regressions through periodic testing.

Of course, it would be good if LTP or autotest merged these tests.
But I think it's also fine for us to maintain them separately, with the
goal of an mm-specific standard test suite. :) For that, at least,
we need a public git tree.

Anyway, thanks for sharing your valuable know-how, Mel.

On Thu, Aug 4, 2011 at 11:38 PM, Mel Gorman <mgorman@xxxxxxx> wrote:
> At LSF/MM at some point a request was made that a series of tests be
> identified that were of interest to MM developers and that could be
> used for testing the Linux memory management subsystem. At the time,
> I was occasionally posting tarballs of whatever scripts I happened to
> be using, but they were not generally usable and tended to be
> specific to a set of patches. I promised I would produce something
> usable by others but never got around to it. Over the last four months,
> I needed a better framework when testing against both distribution
> kernels and mainline so without further ado
>
> http://www.csn.ul.ie/~mel/projects/mmtests/
> http://www.csn.ul.ie/~mel/projects/mmtests/mmtests-0.01-mmtests-0.01.tar.gz
>
> I am not claiming that this is comprehensive in any way but it is
> almost always what I start with when testing patch sets. In preparation
> for identifying problems with backports, I also ran a series of tests
> against mainline kernels over the course of two months when machines
> were otherwise idle. I have not actually had a chance to go through
> all the results and identify each problem but I needed to have the
> raw data available for my own reference so I might as well share it.
>
> http://www.csn.ul.ie/~mel/projects/mmtests/results/SLES11sp1/
> http://www.csn.ul.ie/~mel/projects/mmtests/results/openSUSE11.4/
>
> The directories refer to the distribution used, not the kernel,
> which is downloaded from kernel.org. The directory structure is
> distro/config/machine/comparison.html. For example, a set of benchmarks
> used for evaluating the page and slab allocators on a test machine
> called "hydra" is located at
>
> http://www.csn.ul.ie/~mel/projects/mmtests/results/SLES11sp1/global-dhp__pagealloc-performance/hydra/comparison.html
>
> I know the report structure looks crude but I was not interested
> in making the reports pretty. Because some of the scripts are
> extremely old, the quality and coding styles vary considerably.
> This may get cleaned up over time but in the meantime, try and keep
> the contents of your stomach down if you are reading the scripts.
>
> The documentation is not great and so some of the capabilities, such
> as being able to reconfigure swap for a benchmark, are not mentioned.
> For my own series, I'll release the mmtests tarball I used if asked.
> If someone wants to use the tarball for their own testing but cannot
> configure it, complain on the linux-mm list and if I can, I'll offer
> suggestions.
>
> ==== MMTests README ====
>
> MMTests is a configurable test suite that runs a number of common
> workloads of interest to MM developers. Ideally this would have been
> integrated with LTP, xfstests or the Phoronix Test Suite, or implemented
> with autotest. Unfortunately, large portions of these tests were
> cobbled together over a number of years, with varying degrees of
> quality, before decent test frameworks were common. The refactoring
> effort to integrate with another framework is significant.
>
> Organisation
> ============
>
> The top-level directory has a single driver script called
> run-mmtests.sh which reads a config file that describes how the
> benchmarks should be run, configures the system and runs the requested
> tests. The config file also has some per-test configuration items that
> can be set depending on the test. The driver script takes a name for the
> test run as a parameter. Generally, this is a symbolic name identifying
> the kernel being tested.
>
> Each test is driven by a run-single-test.sh script which reads
> the relevant driver-TESTNAME.sh script. High level items such as
> profiling are configured from the top-level script while the driver
> scripts typically convert the config parameters into switches for a
> "shellpack". A shellpack is a pair of benchmark and install scripts
> that are all stored in shellpacks/ .
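>
> As a rough sketch of how the pieces fit together (the script names,
> switch and variable below are hypothetical and only illustrate the
> pattern; check the driver scripts and shellpacks/ for the real contents):
>
> <pre>
> # Illustrative fragment of a driver script, not the real code: translate
> # a config value into a switch and hand it to the matching shellpack pair.
> ./shellpacks/install-exampletest || exit 1    # fetch and build the benchmark
> ./shellpacks/bench-exampletest --iterations "$EXAMPLETEST_ITERATIONS"
> </pre>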
>
> Monitors can optionally be configured. A full list is in monitors/.
> Care should be taken with monitors as they can introduce overhead of
> their own. Hence, for some performance-sensitive tests it is preferable
> to have no monitoring.
>
> Many of the tests download external benchmarks. An attempt will be
> made to download from a mirror. To get an idea where the mirror
> should be located, grep for MIRROR_LOCATION= in shellpacks/.
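>
> For example, a simple recursive grep shows what each shellpack expects
> the mirror to provide:
>
> <pre>
> $ grep -r "MIRROR_LOCATION=" shellpacks/
> </pre>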
>
> A basic invocation of the suite is
>
> <pre>
> $ cp config-global-dhp__pagealloc-performance config
> $ ./run-mmtests.sh --no-monitor 3.0-nomonitor
> $ ./run-mmtests.sh --run-monitor 3.0-runmonitor
> </pre>
>
> Configuration
> =============
>
> The config file used is always called "config". A number of other
> sample configuration files are provided that have a given theme. Some
> important points of variability are listed below; an illustrative
> config fragment follows the list.
>
> MMTESTS is a list of what tests will be run
>
> WEBROOT is the location where a number of tarballs are mirrored. For example,
>         kernbench tries to download
>         $WEBROOT/kernbench/linux-2.6.30.tar.gz. If this is not available,
>         it is downloaded from the internet. This can add delays to testing
>         and consumes bandwidth, so it is worth configuring.
>
> LINUX_GIT is the location of a git repo of the kernel. At the moment it's only
>         used during report generation.
>
> SKIP_*PROFILE
>         These parameters determine what profiling runs are done. Even with
>         profiling enabled, a non-profile run can be used to ensure that
>         the profile and non-profile runs are comparable.
>
> SWAP_CONFIGURATION
> SWAP_PARTITIONS
> SWAP_SWAPFILE_SIZEMB
>         It's possible to use a different swap configuration than what is
>         provided by default.
>
> TESTDISK_RAID_PARTITIONS
> TESTDISK_RAID_DEVICE
> TESTDISK_RAID_OFFSET
> TESTDISK_RAID_SIZE
> TESTDISK_RAID_TYPE
>         If the target machine has partitions suitable for configuring RAID,
>         they can be specified here. This RAID partition is then used for
>         all the tests.
>
> TESTDISK_PARTITION
>         Use this partition for all tests.
>
> TESTDISK_FILESYSTEM
> TESTDISK_MKFS_PARAM
> TESTDISK_MOUNT_ARGS
>         The filesystem, mkfs parameters and mount arguments for the test
>         partitions.
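>
> To make the above concrete, a fragment of a config might look like the
> following. The values are purely illustrative and are not taken from any
> of the shipped config files:
>
> <pre>
> # Which tests to run, and where mirrored tarballs can be fetched from
> export MMTESTS="kernbench aim9 iozone"
> export WEBROOT=http://mirror.example.com/mmtests
> export LINUX_GIT=/home/user/git/linux
>
> # Use a dedicated partition, formatted as ext3, for the test data
> export TESTDISK_PARTITION=/dev/sdb1
> export TESTDISK_FILESYSTEM=ext3
> export TESTDISK_MKFS_PARAM=""
> export TESTDISK_MOUNT_ARGS=""
> </pre>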
>
> Available tests
> ===============
>
> Note the ones that are marked untested. These have been ported from other
> test suites but there is no guarantee that they work correctly here. If you
> want to run these tests and run into a problem, report a bug.
>
> kernbench
>         Builds a kernel 5 times recording the time taken to completion.
>         An average time is stored. This is sensitive to the overall
>         performance of the system as it hits a number of subsystems.
>
> multibuild
>         Similar to kernbench except it runs a number of kernel compiles
>         in parallel. Can be useful for stressing the system and seeing
>         how well it deals with simple fork-based parallelism.
>
> aim9
>         Runs a short version of aim9 by default. Each test runs for 60
>         seconds. This is a micro-benchmark of a number of VM operations. It's
>         sensitive to changes in the allocator paths for example.
>
> vmr-stream
>         Runs the STREAM benchmark a number of times for varying sizes. An
>         average is recorded. This can be used to measure approximate memory
>         throughput or the average cost of a number of basic operations. It is
>         sensitive to cache layout used for page faults.
>
> vmr-cacheeffects (untested)
>         Performs linear and random walks on nodes of different sizes stored in
>         a large amount of memory. Sensitive to cache footprint and layout.
>
> vmr-createdelete (untested)
>         A micro-benchmark that measures the time taken to create and delete
>         file or anonymous mappings of increasing sizes. Sensitive to changes
>         in the page fault path performance.
>
> iozone
>         A basic filesystem benchmark.
>
> fsmark
>         This tests write workloads varying the number of files and directory
>         depth.
>
> hackbench-*
>         Hackbench is generally a scheduler benchmark but is also sensitive to
>         overhead in the allocators and to a lesser extent the fault paths.
>         Can be run for either sockets or pipes.
>
> largecopy
>         This is a simple single-threaded benchmark that downloads a large
>         tar file, expands it a number of times, creates a new tar and
>         expands it again. Each operation is timed and is aimed at shaking
>         out stall-related bugs when copying large amounts of data.
>
> largedd
>         Similar to largecopy except it uses dd instead of cp.
>
> libreofficebuild
>         This downloads and builds libreoffice. It is a more aggressive
>         compile-orientated test. This is a very download-intensive
>         benchmark and was only created as a reproduction case for
>         a bug.
>
> nas-*
>         The NAS Parallel Benchmarks for the serial and openmp versions of
>         the test.
>
> netperf-*
>         Runs the netperf benchmark for *_STREAM on the local machine.
>         Sensitive to cache usage and allocator costs. To test for cache line
>         bouncing, the test can be configured to bind to certain processors.
>
> postmark
>         Run the postmark benchmark. Optionally a program can be run in
>         the background that consumes anonymous memory. The background
>         program is very rarely needed except when trying to identify
>         desktop stalls during heavy IO.
>
> speccpu (untested)
>         SPECcpu, what else can be said. A restriction is that you must have
>         a mirrored copy of the tarball as it is not publicly available.
>
> specjvm (untested)
>         SPECjvm. Same story as speccpu.
>
> specomp (untested)
>         SPEComp. Same story as speccpu.
>
> sysbench
>         Runs the complex workload for sysbench backed by postgres. Running
>         this test requires a significant build environment on the test
>         machine. It can run either read-only or read/write tests.
>
> simple-writeback
>         This is a simple writeback test based on dd. It's meant to be
>         easy to understand and quick to run. Useful for measuring page
>         writeback changes.
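>
>         The general idea is along these lines; this is only a minimal
>         sketch, and the path, file size and block size are made up rather
>         than taken from the shellpack:
>
> <pre>
> # Write a large file through the page cache and force it to disk,
> # timing how long the write and writeback take.
> time dd if=/dev/zero of=/mnt/testdisk/ddfile bs=1M count=4096 conv=fdatasync
> </pre>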
>
> ltp (untested)
>         The LTP benchmark. What it tests depends on exactly which parts of
>         the suite are configured to run.
>
> ltp-pounder (untested)
>         ltp-pounder is a non-default test that exists in LTP. It's used by
>         IBM for hardware certification to hammer a machine for a configured
>         number of hours. Typically, they expect it to run for 72 hours
>         without major errors. Useful for testing general VM stability in
>         high-pressure, low-memory situations.
>
> stress-highalloc
>         This test requires that the system not have too much memory and
>         that systemtap is available. Typically, it's tested with 3GB of
>         RAM. It builds a number of kernels in parallel such that total
>         memory usage is 1.5 times physical memory. Once this has been
>         running for 5 minutes, it tries to allocate a large percentage of
>         memory (e.g. 95%) as huge pages, recording the latency of each
>         operation as it goes. It does this twice. It then cancels the
>         kernel compiles, cleans up the system and tries to allocate huge
>         pages again while the system is at rest. It's a basic test for
>         fragmentation avoidance and the performance of huge page
>         allocation.
>
> xfstests (untested)
>         This is still at prototype level and aimed initially at running
>         testcase 180 to reproduce some figures provided by the filesystems
>         people.
>
> Reporting
> =========
>
> For reporting, there is a basic compare-kernels.sh script. It must be updated
> with a list of kernels you want to compare and in what order. It generates a
> table for each test, operation and kernel showing the relative performance
> of each. The test reporting scripts are in subreports/. compare-kernels.sh
> should be run from the path storing the test logs. By default this is
> work/log. If you are automating tests from an external source, work/log is
> what you should be capturing after a set of tests completes.
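>
> For example, after running the suite against two kernels and editing
> compare-kernels.sh to list them, the comparison could be generated along
> these lines (the relative path is illustrative; adjust it to wherever the
> scripts actually live):
>
> <pre>
> $ cd work/log
> $ ../../compare-kernels.sh
> </pre>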
>
> If monitors such as ftrace are configured, there are additional
> processing scripts. They can be activated by setting FTRACE_ANALYSERS in
> compare-kernels.sh. A basic post-processing script is mmtests-duration,
> which simply reports how long an individual test took and what its CPU
> usage was.
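>
> Enabling it amounts to setting something like the following in
> compare-kernels.sh (a sketch only; check the script itself for the exact
> form):
>
> <pre>
> FTRACE_ANALYSERS="mmtests-duration"
> </pre>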
>
> There are a limited number of graphing scripts included in report/.
>
> TODO
> ====
>
> o Add option to test on filesystem loopback device stored on tmpfs
> o Add volanomark
> o Create config-* set suitable for testing the scheduler to isolate situations
>   where the scheduler was the main cause of a regression
>
> --
> Mel Gorman
> SUSE Labs
>



--
Kind regards,
Minchan Kim