Re: [PATCHv2 3/4] mm/zpool: implement common zpool api to zbud/zsmalloc

From: Seth Jennings
Date: Fri May 09 2014 - 00:14:02 EST


On Wed, May 07, 2014 at 05:51:35PM -0400, Dan Streetman wrote:
> Add zpool api.
>
> zpool provides an interface for memory storage, typically of compressed
> memory. Users can select what backend to use; currently the only
> implementations are zbud, a low density implementation with up to
> two compressed pages per storage page, and zsmalloc, a higher density
> implementation with multiple compressed pages per storage page.

This is the wrong design methinks. There are a bunch of #ifdefs for each
driver in the zpool.c code. Available drivers (zbud, zsmalloc) should
register _up_ to the zpool layer. That way the zpool layer doesn't have
to add a bunch of new code for each driver.

This zpool layer should be _really_ thin, stateless from the user point
of view, basically just wrapping the driver ops call.

New functions for driver registration:
zpool_register_driver()
zpool_unregister_driver()

Something like this (note the "void *" type of the pool):

struct zpool_driver_ops {
void (*destroy)(void *pool);
int (*malloc)(void *pool, size_t size, unsigned long *handle);
....
}

Each driver can cast the void *pool to the driver pool type on the
driver side.

struct zpool_driver {
char *driver_name;
struct zpool_driver *ops;
}

Then drivers create a struct zpool_driver suitable for them and register
with zpool_register_driver().

struct zpool {
void *driver_pool;
struct zpool_driver *driver;
}

zpool_create() is:

struct zpool *zpool_create(char *driver_name, gfp_t flags, void *ops)
{
[search for backend driver with name driver_name]
[alloc new zpool]
zpool->driver = driver;
zpool->driver_pool = driver->ops->create(flags, ops);
return zpool;
}

A user function like zpool_free() is just:

void zpool_free(struct zpool *pool, unsigned long handle)
{
pool->driver->free(pool->driver_pool, handle);
}

Hopefully this makes sense. Obviously, I didn't rewrite this whole
thing to see how it works end to end so there may be some pitfalls I'm
not considering.

Seth

>
> Signed-off-by: Dan Streetman <ddstreet@xxxxxxxx>
> Cc: Seth Jennings <sjennings@xxxxxxxxxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: Nitin Gupta <ngupta@xxxxxxxxxx>
> Cc: Weijie Yang <weijie.yang@xxxxxxxxxxx>
> ---
>
> Changes since v1 https://lkml.org/lkml/2014/4/19/101
> -add some pr_info() during creation and pr_err() on errors
> -remove zpool code to call zs_shrink(), since zsmalloc shrinking
> was removed from this patchset
> -remove fallback; only specified pool type will be tried
> -pr_fmt() is defined in zpool to prefix zpool: in any pr_XXX() calls
>
> include/linux/zpool.h | 160 +++++++++++++++++++++++
> mm/Kconfig | 43 ++++---
> mm/Makefile | 1 +
> mm/zpool.c | 349 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 535 insertions(+), 18 deletions(-)
> create mode 100644 include/linux/zpool.h
> create mode 100644 mm/zpool.c
>
> diff --git a/include/linux/zpool.h b/include/linux/zpool.h
> new file mode 100644
> index 0000000..08f5468
> --- /dev/null
> +++ b/include/linux/zpool.h
> @@ -0,0 +1,160 @@
> +/*
> + * zpool memory storage api
> + *
> + * Copyright (C) 2014 Dan Streetman
> + *
> + * This is a common frontend for the zbud and zsmalloc memory
> + * storage pool implementations. Typically, this is used to
> + * store compressed memory.
> + */
> +
> +#ifndef _ZPOOL_H_
> +#define _ZPOOL_H_
> +
> +struct zpool;
> +
> +struct zpool_ops {
> + int (*evict)(struct zpool *pool, unsigned long handle);
> +};
> +
> +#define ZPOOL_TYPE_ZSMALLOC "zsmalloc"
> +#define ZPOOL_TYPE_ZBUD "zbud"
> +
> +/*
> + * Control how a handle is mapped. It will be ignored if the
> + * implementation does not support it. Its use is optional.
> + * Note that this does not refer to memory protection, it
> + * refers to how the memory will be copied in/out if copying
> + * is necessary during mapping; read-write is the safest as
> + * it copies the existing memory in on map, and copies the
> + * changed memory back out on unmap. Write-only does not copy
> + * in the memory and should only be used for initialization.
> + * If in doubt, use ZPOOL_MM_DEFAULT which is read-write.
> + */
> +enum zpool_mapmode {
> + ZPOOL_MM_RW, /* normal read-write mapping */
> + ZPOOL_MM_RO, /* read-only (no copy-out at unmap time) */
> + ZPOOL_MM_WO, /* write-only (no copy-in at map time) */
> +
> + ZPOOL_MM_DEFAULT = ZPOOL_MM_RW
> +};
> +
> +/**
> + * zpool_create_pool() - Create a new zpool
> + * @type The type of the zpool to create (e.g. zbud, zsmalloc)
> + * @flags What GFP flags should be used when the zpool allocates memory.
> + * @ops The optional ops callback.
> + *
> + * This creates a new zpool of the specified type. The zpool will use the
> + * given flags when allocating any memory. If the ops param is NULL, then
> + * the created zpool will not be shrinkable.
> + *
> + * Returns: New zpool on success, NULL on failure.
> + */
> +struct zpool *zpool_create_pool(char *type, gfp_t flags,
> + struct zpool_ops *ops);
> +
> +/**
> + * zpool_get_type() - Get the type of the zpool
> + * @pool The zpool to check
> + *
> + * This returns the type of the pool, which will match one of the
> + * ZPOOL_TYPE_* defined values.
> + *
> + * Returns: The type of zpool.
> + */
> +char *zpool_get_type(struct zpool *pool);
> +
> +/**
> + * zpool_destroy_pool() - Destroy a zpool
> + * @pool The zpool to destroy.
> + *
> + * This destroys an existing zpool. The zpool should not be in use.
> + */
> +void zpool_destroy_pool(struct zpool *pool);
> +
> +/**
> + * zpool_malloc() - Allocate memory
> + * @pool The zpool to allocate from.
> + * @size The amount of memory to allocate.
> + * @handle Pointer to the handle to set
> + *
> + * This allocates the requested amount of memory from the pool.
> + * The provided @handle will be set to the allocated object handle.
> + *
> + * Returns: 0 on success, negative value on error.
> + */
> +int zpool_malloc(struct zpool *pool, size_t size, unsigned long *handle);
> +
> +/**
> + * zpool_free() - Free previously allocated memory
> + * @pool The zpool that allocated the memory.
> + * @handle The handle to the memory to free.
> + *
> + * This frees previously allocated memory. This does not guarantee
> + * that the pool will actually free memory, only that the memory
> + * in the pool will become available for use by the pool.
> + */
> +void zpool_free(struct zpool *pool, unsigned long handle);
> +
> +/**
> + * zpool_shrink() - Shrink the pool size
> + * @pool The zpool to shrink.
> + * @size The minimum amount to shrink the pool.
> + *
> + * This attempts to shrink the actual memory size of the pool
> + * by evicting currently used handle(s). If the pool was
> + * created with no zpool_ops, or the evict call fails for any
> + * of the handles, this will fail.
> + *
> + * Returns: 0 on success, negative value on error/failure.
> + */
> +int zpool_shrink(struct zpool *pool, size_t size);
> +
> +/**
> + * zpool_map_handle() - Map a previously allocated handle into memory
> + * @pool The zpool that the handle was allocated from
> + * @handle The handle to map
> + * @mm How the memory should be mapped
> + *
> + * This maps a previously allocated handle into memory. The @mm
> + * param indicates to the implemenation how the memory will be
> + * used, i.e. read-only, write-only, read-write. If the
> + * implementation does not support it, the memory will be treated
> + * as read-write.
> + *
> + * This may hold locks, disable interrupts, and/or preemption,
> + * and the zpool_unmap_handle() must be called to undo those
> + * actions. The code that uses the mapped handle should complete
> + * its operatons on the mapped handle memory quickly and unmap
> + * as soon as possible. Multiple handles should not be mapped
> + * concurrently on a cpu.
> + *
> + * Returns: A pointer to the handle's mapped memory area.
> + */
> +void *zpool_map_handle(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode mm);
> +
> +/**
> + * zpool_unmap_handle() - Unmap a previously mapped handle
> + * @pool The zpool that the handle was allocated from
> + * @handle The handle to unmap
> + *
> + * This unmaps a previously mapped handle. Any locks or other
> + * actions that the implemenation took in zpool_map_handle()
> + * will be undone here. The memory area returned from
> + * zpool_map_handle() should no longer be used after this.
> + */
> +void zpool_unmap_handle(struct zpool *pool, unsigned long handle);
> +
> +/**
> + * zpool_get_total_size() - The total size of the pool
> + * @pool The zpool to check
> + *
> + * This returns the total size in bytes of the pool.
> + *
> + * Returns: Total size of the zpool in bytes.
> + */
> +u64 zpool_get_total_size(struct zpool *pool);
> +
> +#endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 30cb6cb..bdb4cb2 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -515,21 +515,23 @@ config CMA_DEBUG
> processing calls such as dma_alloc_from_contiguous().
> This option does not affect warning and error messages.
>
> -config ZBUD
> - tristate
> - default n
> +config MEM_SOFT_DIRTY
> + bool "Track memory changes"
> + depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
> + select PROC_PAGE_MONITOR
> help
> - A special purpose allocator for storing compressed pages.
> - It is designed to store up to two compressed pages per physical
> - page. While this design limits storage density, it has simple and
> - deterministic reclaim properties that make it preferable to a higher
> - density approach when reclaim will be used.
> + This option enables memory changes tracking by introducing a
> + soft-dirty bit on pte-s. This bit it set when someone writes
> + into a page just as regular dirty bit, but unlike the latter
> + it can be cleared by hands.
> +
> + See Documentation/vm/soft-dirty.txt for more details.
>
> config ZSWAP
> bool "Compressed cache for swap pages (EXPERIMENTAL)"
> depends on FRONTSWAP && CRYPTO=y
> select CRYPTO_LZO
> - select ZBUD
> + select ZPOOL
> default n
> help
> A lightweight compressed cache for swap pages. It takes
> @@ -545,17 +547,22 @@ config ZSWAP
> they have not be fully explored on the large set of potential
> configurations and workloads that exist.
>
> -config MEM_SOFT_DIRTY
> - bool "Track memory changes"
> - depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
> - select PROC_PAGE_MONITOR
> +config ZPOOL
> + tristate "Common API for compressed memory storage"
> + default n
> help
> - This option enables memory changes tracking by introducing a
> - soft-dirty bit on pte-s. This bit it set when someone writes
> - into a page just as regular dirty bit, but unlike the latter
> - it can be cleared by hands.
> + Compressed memory storage API. This allows using either zbud or
> + zsmalloc.
>
> - See Documentation/vm/soft-dirty.txt for more details.
> +config ZBUD
> + tristate "Low density storage for compressed pages"
> + default n
> + help
> + A special purpose allocator for storing compressed pages.
> + It is designed to store up to two compressed pages per physical
> + page. While this design limits storage density, it has simple and
> + deterministic reclaim properties that make it preferable to a higher
> + density approach when reclaim will be used.
>
> config ZSMALLOC
> bool "Memory allocator for compressed pages"
> diff --git a/mm/Makefile b/mm/Makefile
> index 9b75a4d..f64a5d4 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -61,6 +61,7 @@ obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> obj-$(CONFIG_CLEANCACHE) += cleancache.o
> obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
> obj-$(CONFIG_PAGE_OWNER) += pageowner.o
> +obj-$(CONFIG_ZPOOL) += zpool.o
> obj-$(CONFIG_ZBUD) += zbud.o
> obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
> obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
> diff --git a/mm/zpool.c b/mm/zpool.c
> new file mode 100644
> index 0000000..2bda300
> --- /dev/null
> +++ b/mm/zpool.c
> @@ -0,0 +1,349 @@
> +/*
> + * zpool memory storage api
> + *
> + * Copyright (C) 2014 Dan Streetman
> + *
> + * This is a common frontend for the zbud and zsmalloc memory
> + * storage pool implementations. Typically, this is used to
> + * store compressed memory.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/list.h>
> +#include <linux/types.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/zpool.h>
> +#include <linux/zbud.h>
> +#include <linux/zsmalloc.h>
> +
> +struct zpool_imp {
> + void (*destroy)(struct zpool *pool);
> +
> + int (*malloc)(struct zpool *pool, size_t size, unsigned long *handle);
> + void (*free)(struct zpool *pool, unsigned long handle);
> +
> + int (*shrink)(struct zpool *pool, size_t size);
> +
> + void *(*map)(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode mm);
> + void (*unmap)(struct zpool *pool, unsigned long handle);
> +
> + u64 (*total_size)(struct zpool *pool);
> +};
> +
> +struct zpool {
> + char *type;
> +
> + union {
> +#ifdef CONFIG_ZSMALLOC
> + struct zs_pool *zsmalloc_pool;
> +#endif
> +#ifdef CONFIG_ZBUD
> + struct zbud_pool *zbud_pool;
> +#endif
> + };
> +
> + struct zpool_imp *imp;
> + struct zpool_ops *ops;
> +
> + struct list_head list;
> +};
> +
> +static LIST_HEAD(zpools);
> +static DEFINE_SPINLOCK(zpools_lock);
> +
> +static int zpool_noop_evict(struct zpool *pool, unsigned long handle)
> +{
> + return -EINVAL;
> +}
> +static struct zpool_ops zpool_noop_ops = {
> + .evict = zpool_noop_evict
> +};
> +
> +
> +/* zsmalloc */
> +
> +#ifdef CONFIG_ZSMALLOC
> +
> +static void zpool_zsmalloc_destroy(struct zpool *zpool)
> +{
> + spin_lock(&zpools_lock);
> + list_del(&zpool->list);
> + spin_unlock(&zpools_lock);
> +
> + zs_destroy_pool(zpool->zsmalloc_pool);
> + kfree(zpool);
> +}
> +
> +static int zpool_zsmalloc_malloc(struct zpool *pool, size_t size,
> + unsigned long *handle)
> +{
> + *handle = zs_malloc(pool->zsmalloc_pool, size);
> + return *handle ? 0 : -1;
> +}
> +
> +static void zpool_zsmalloc_free(struct zpool *pool, unsigned long handle)
> +{
> + zs_free(pool->zsmalloc_pool, handle);
> +}
> +
> +static int zpool_zsmalloc_shrink(struct zpool *pool, size_t size)
> +{
> + /* Not yet supported */
> + return -EINVAL;
> +}
> +
> +static void *zpool_zsmalloc_map(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode zpool_mapmode)
> +{
> + enum zs_mapmode zs_mapmode;
> +
> + switch (zpool_mapmode) {
> + case ZPOOL_MM_RO:
> + zs_mapmode = ZS_MM_RO; break;
> + case ZPOOL_MM_WO:
> + zs_mapmode = ZS_MM_WO; break;
> + case ZPOOL_MM_RW: /* fallthrough */
> + default:
> + zs_mapmode = ZS_MM_RW; break;
> + }
> + return zs_map_object(pool->zsmalloc_pool, handle, zs_mapmode);
> +}
> +
> +static void zpool_zsmalloc_unmap(struct zpool *pool, unsigned long handle)
> +{
> + zs_unmap_object(pool->zsmalloc_pool, handle);
> +}
> +
> +static u64 zpool_zsmalloc_total_size(struct zpool *pool)
> +{
> + return zs_get_total_size_bytes(pool->zsmalloc_pool);
> +}
> +
> +static struct zpool_imp zpool_zsmalloc_imp = {
> + .destroy = zpool_zsmalloc_destroy,
> + .malloc = zpool_zsmalloc_malloc,
> + .free = zpool_zsmalloc_free,
> + .shrink = zpool_zsmalloc_shrink,
> + .map = zpool_zsmalloc_map,
> + .unmap = zpool_zsmalloc_unmap,
> + .total_size = zpool_zsmalloc_total_size
> +};
> +
> +static struct zpool *zpool_zsmalloc_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + struct zpool *zpool;
> +
> + zpool = kmalloc(sizeof(*zpool), GFP_KERNEL);
> + if (!zpool) {
> + pr_err("couldn't create zpool - out of memory\n");
> + return NULL;
> + }
> +
> + zpool->zsmalloc_pool = zs_create_pool(flags);
> + if (!zpool->zsmalloc_pool) {
> + kfree(zpool);
> + pr_err("zs_create_pool() failed\n");
> + return NULL;
> + }
> +
> + zpool->type = ZPOOL_TYPE_ZSMALLOC;
> + zpool->imp = &zpool_zsmalloc_imp;
> + zpool->ops = &zpool_noop_ops;
> + spin_lock(&zpools_lock);
> + list_add(&zpool->list, &zpools);
> + spin_unlock(&zpools_lock);
> +
> + return zpool;
> +}
> +
> +#else
> +
> +static struct zpool *zpool_zsmalloc_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + pr_info("no zsmalloc in this kernel\n");
> + return NULL;
> +}
> +
> +#endif /* CONFIG_ZSMALLOC */
> +
> +
> +/* zbud */
> +
> +#ifdef CONFIG_ZBUD
> +
> +static void zpool_zbud_destroy(struct zpool *zpool)
> +{
> + spin_lock(&zpools_lock);
> + list_del(&zpool->list);
> + spin_unlock(&zpools_lock);
> +
> + zbud_destroy_pool(zpool->zbud_pool);
> + kfree(zpool);
> +}
> +
> +static int zpool_zbud_malloc(struct zpool *pool, size_t size,
> + unsigned long *handle)
> +{
> + return zbud_alloc(pool->zbud_pool, size, handle);
> +}
> +
> +static void zpool_zbud_free(struct zpool *pool, unsigned long handle)
> +{
> + zbud_free(pool->zbud_pool, handle);
> +}
> +
> +static int zpool_zbud_shrink(struct zpool *pool, size_t size)
> +{
> + return zbud_reclaim_page(pool->zbud_pool, 3);
> +}
> +
> +static void *zpool_zbud_map(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode zpool_mapmode)
> +{
> + return zbud_map(pool->zbud_pool, handle);
> +}
> +
> +static void zpool_zbud_unmap(struct zpool *pool, unsigned long handle)
> +{
> + zbud_unmap(pool->zbud_pool, handle);
> +}
> +
> +static u64 zpool_zbud_total_size(struct zpool *pool)
> +{
> + return zbud_get_pool_size(pool->zbud_pool) * PAGE_SIZE;
> +}
> +
> +static int zpool_zbud_evict(struct zbud_pool *zbud_pool, unsigned long handle)
> +{
> + struct zpool *zpool;
> +
> + spin_lock(&zpools_lock);
> + list_for_each_entry(zpool, &zpools, list) {
> + if (zpool->zbud_pool == zbud_pool) {
> + spin_unlock(&zpools_lock);
> + return zpool->ops->evict(zpool, handle);
> + }
> + }
> + spin_unlock(&zpools_lock);
> + return -EINVAL;
> +}
> +
> +static struct zpool_imp zpool_zbud_imp = {
> + .destroy = zpool_zbud_destroy,
> + .malloc = zpool_zbud_malloc,
> + .free = zpool_zbud_free,
> + .shrink = zpool_zbud_shrink,
> + .map = zpool_zbud_map,
> + .unmap = zpool_zbud_unmap,
> + .total_size = zpool_zbud_total_size
> +};
> +
> +static struct zbud_ops zpool_zbud_ops = {
> + .evict = zpool_zbud_evict
> +};
> +
> +static struct zpool *zpool_zbud_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + struct zpool *zpool;
> + struct zbud_ops *zbud_ops = (ops ? &zpool_zbud_ops : NULL);
> +
> + zpool = kmalloc(sizeof(*zpool), GFP_KERNEL);
> + if (!zpool) {
> + pr_err("couldn't create zpool - out of memory\n");
> + return NULL;
> + }
> +
> + zpool->zbud_pool = zbud_create_pool(flags, zbud_ops);
> + if (!zpool->zbud_pool) {
> + kfree(zpool);
> + pr_err("zbud_create_pool() failed\n");
> + return NULL;
> + }
> +
> + zpool->type = ZPOOL_TYPE_ZBUD;
> + zpool->imp = &zpool_zbud_imp;
> + zpool->ops = (ops ? ops : &zpool_noop_ops);
> + spin_lock(&zpools_lock);
> + list_add(&zpool->list, &zpools);
> + spin_unlock(&zpools_lock);
> +
> + return zpool;
> +}
> +
> +#else
> +
> +static struct zpool *zpool_zbud_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + pr_info("no zbud in this kernel\n");
> + return NULL;
> +}
> +
> +#endif /* CONFIG_ZBUD */
> +
> +
> +struct zpool *zpool_create_pool(char *type, gfp_t flags,
> + struct zpool_ops *ops)
> +{
> + struct zpool *pool = NULL;
> +
> + pr_info("creating pool type %s\n", type);
> +
> + if (!strcmp(type, ZPOOL_TYPE_ZSMALLOC))
> + pool = zpool_zsmalloc_create(flags, ops);
> + else if (!strcmp(type, ZPOOL_TYPE_ZBUD))
> + pool = zpool_zbud_create(flags, ops);
> + else
> + pr_err("unknown type %s\n", type);
> +
> + if (pool)
> + pr_info("created %s pool\n", type);
> + else
> + pr_err("couldn't create %s pool\n", type);
> +
> + return pool;
> +}
> +
> +char *zpool_get_type(struct zpool *pool)
> +{
> + return pool->type;
> +}
> +
> +void zpool_destroy_pool(struct zpool *pool)
> +{
> + pool->imp->destroy(pool);
> +}
> +
> +int zpool_malloc(struct zpool *pool, size_t size, unsigned long *handle)
> +{
> + return pool->imp->malloc(pool, size, handle);
> +}
> +
> +void zpool_free(struct zpool *pool, unsigned long handle)
> +{
> + pool->imp->free(pool, handle);
> +}
> +
> +int zpool_shrink(struct zpool *pool, size_t size)
> +{
> + return pool->imp->shrink(pool, size);
> +}
> +
> +void *zpool_map_handle(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode mapmode)
> +{
> + return pool->imp->map(pool, handle, mapmode);
> +}
> +
> +void zpool_unmap_handle(struct zpool *pool, unsigned long handle)
> +{
> + pool->imp->unmap(pool, handle);
> +}
> +
> +u64 zpool_get_total_size(struct zpool *pool)
> +{
> + return pool->imp->total_size(pool);
> +}
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/