Re: [PATCH] kcov: add AFL-style tracing

From: Dmitry Vyukov
Date: Sun May 29 2016 - 12:52:21 EST


On Sat, May 21, 2016 at 6:07 PM, Vegard Nossum <vegard.nossum@xxxxxxxxxx> wrote:
> From: Quentin Casasnovas <quentin.casasnovas@xxxxxxxxxx>
>
> AFL uses a fixed-size buffer (typically 64 KiB) where each byte is
> a counter representing how many times an A -> B branch was taken.
> Of course, since the buffer is fixed size, it's a little imprecise
> in that e.g. two different branches could map to the same counter,
> but in practice it works well.


Nice!
Glad it fit so nicely within the existing code.


> See afl:docs/technical_details.txt for more information.
>
> Here is a small test program that demonstrates the new capability:
>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <sys/types.h>
>
> #include <errno.h>
> #include <error.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> #include <linux/types.h>
>
> #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
> #define KCOV_INIT_AFL _IOR('c', 2, unsigned long)
> #define KCOV_ENABLE _IO('c', 100)
> #define KCOV_DISABLE _IO('c', 101)
>
> int main(int argc, char *argv[])
> {
> int fd = open("/sys/kernel/debug/kcov", O_RDWR);
> if (fd == -1)
> error(1, errno, "open()");
>
> unsigned long size = 1 << 10;
> if (ioctl(fd, KCOV_INIT_AFL, size) != 0)
> error(1, errno, "ioctl(KCOV_INIT_AFL)");
>
> void *mem = mmap(NULL, size * sizeof(long), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> if (mem == MAP_FAILED)
> error(1, errno, "mmap()");
>
> /* Start kernel instrumentation */
> if (ioctl(fd, KCOV_ENABLE, 0) != 0)
> error(1, errno, "ioctl(KCOV_ENABLE)");
>
> printf("Hello world!\n");
>
> /* End kernel instrumentation*/
> if (ioctl(fd, KCOV_DISABLE, 0) != 0)
> error(1, errno, "ioctl(KCOV_DISABLE)");
>
> /* Hex dump of memory area */
> unsigned char *mem2 = mem;
> for (unsigned int i = 0; i < size; ++i) {
> printf("%02x ", mem2[i]);
> if (i % 32 == 31)
> printf("\n");
> }
>
> close(fd);
> return 0;
> }
>
> This patch is a collaboration between Quentin Casasnovas and Vegard Nossum.
>
> Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Cc: Michal Zalewski <lcamtuf@xxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Signed-off-by: Quentin Casasnovas <quentin.casasnovas@xxxxxxxxxx>
> Signed-off-by: Vegard Nossum <vegard.nossum@xxxxxxxxxx>
> ---
> include/linux/kcov.h | 6 ++++++
> include/uapi/linux/kcov.h | 1 +
> kernel/kcov.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 51 insertions(+)
>
> diff --git a/include/linux/kcov.h b/include/linux/kcov.h
> index 2883ac9..cb8eb3e 100644
> --- a/include/linux/kcov.h
> +++ b/include/linux/kcov.h
> @@ -18,6 +18,12 @@ enum kcov_mode {
> * Covered PCs are collected in a per-task buffer.
> */
> KCOV_MODE_TRACE = 1,
> + /*
> + * AFL-style collection.
> + * Covered branches are hashed and collected in a fixed size buffer
> + * (see AFL documentation for more information).
> + */
> + KCOV_MODE_AFL = 2,
> };
>
> #else
> diff --git a/include/uapi/linux/kcov.h b/include/uapi/linux/kcov.h
> index 574e22e..0dc1cf9 100644
> --- a/include/uapi/linux/kcov.h
> +++ b/include/uapi/linux/kcov.h
> @@ -4,6 +4,7 @@
> #include <linux/types.h>
>
> #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
> +#define KCOV_INIT_AFL _IOR('c', 2, unsigned long)
> #define KCOV_ENABLE _IO('c', 100)
> #define KCOV_DISABLE _IO('c', 101)
>
> diff --git a/kernel/kcov.c b/kernel/kcov.c
> index a02f2dd..e6eb785 100644
> --- a/kernel/kcov.c
> +++ b/kernel/kcov.c
> @@ -5,6 +5,7 @@
> #include <linux/types.h>
> #include <linux/file.h>
> #include <linux/fs.h>
> +#include <linux/hash.h>
> #include <linux/mm.h>
> #include <linux/printk.h>
> #include <linux/slab.h>
> @@ -34,6 +35,19 @@ struct kcov {
> enum kcov_mode mode;
> /* Size of arena (in long's for KCOV_MODE_TRACE). */
> unsigned size;
> +
> + union {
> + /* For KCOV_MODE_AFL */
> + struct {
> + /* (number of bytes) - 1, where number of
> + * bytes must be a power of 2. */
> + unsigned int mask;
> +
> + /* Previous PC (for KCOV_MODE_AFL). */

"(for KCOV_MODE_AFL)" is excessive, remove. This whole struct is "For
KCOV_MODE_AFL".


> + unsigned int prev_location;
> + };
> + };
> +
> /* Coverage buffer shared with user space. */
> void *area;
> /* Task for which we collect coverage, or NULL. */
> @@ -76,6 +90,19 @@ void notrace __sanitizer_cov_trace_pc(void)
> area[pos] = _RET_IP_;
> WRITE_ONCE(area[0], pos);
> }
> + } else if (mode == KCOV_MODE_AFL) {
> + struct kcov *kcov;
> + unsigned char *area;
> + unsigned long location = _RET_IP_;
> +
> + /* See above */
> + barrier();

Wonder if it forces compiler to spill location onto stack.
Maybe it's better to initialize location after barrier.


> + kcov = t->kcov;
> + area = kcov->area;
> +
> + ++area[(kcov->prev_location ^ location) & kcov->mask];
> + kcov->prev_location = hash_long(location, BITS_PER_LONG);

The idea was that all data required for this function is cached in
task struct (since the function is very frequently executed). Avoids
additional indirection.
If you switch trace mode to using mask (see below), then both modes
can share that field in task struct.
And then you can add prev_location to task struct.



> }
> }
> EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
> @@ -200,6 +227,23 @@ static int kcov_ioctl_locked(struct kcov *kcov, unsigned int cmd,
> kcov->size = size;
> kcov->mode = KCOV_MODE_TRACE;
> return 0;
> + case KCOV_INIT_AFL:
> + if (kcov->mode != KCOV_MODE_DISABLED)
> + return -EBUSY;
> +
> + /*
> + * AFL mode needs a fixed-size area that is a power of 2.
> + */
> + size = arg;
> + if (size < 2 || size > INT_MAX / sizeof(unsigned long))
> + return -EINVAL;
> + if (!is_power_of_2(size))
> + return -EINVAL;
> + kcov->size = size;
> + kcov->mask = (size * sizeof(unsigned long)) - 1;


I assumed that KCOV_INIT_AFL accepts size in bytes, because you use
"char *area". Then spent a while to figure out how it all works and
why you multiply mask by sizeof(unsigned long).

Do you mind to do a small refactoring?
Initially I implemented both trace mode and afl mode, but then
simplified the patch for initial submission as much as possible
(assuming only trace mode).
The initial idea was that kcov->size is expressed in bytes, so that
kcov_map does not make assumption that the trace mode operates on
longs. I think it's better to restore that now that we have 2 modes.
Namely, (1) make kcov->size expressed in bytes, (2) also use mask in
__sanitizer_cov_trace_pc for trace mode:

- if (likely(pos < t->kcov_size)) {
+ if (likely(pos < t->kcov_mask)) {

Then KCOV_INIT_AFL can accept bytes and avoid these confusing
multiplications by sizeof(unsigned long).



> + kcov->prev_location = hash_long(0, BITS_PER_LONG);
> + kcov->mode = KCOV_MODE_AFL;
> + return 0;
> case KCOV_ENABLE:
> /*
> * Enable coverage for the current task.


I am not very strong about it, but I would rename
KCOV_INIT_AFL/KCOV_MODE_AFL to KCOV_INIT_TABLE/KCOV_MODE_TABLE. I
understand that it originated in AFL, but in the end it's no more than
a hash table. KCOV users do not need to know what is AFL (though, it
would be bad for them :) ).