Re: [PATCH 2/2] getrusage: use sig->stats_lock

From: Dylan Hatch
Date: Mon Jan 22 2024 - 21:53:41 EST


On Sun, Jan 21, 2024 at 4:09 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> Dylan, do you have a better description? Can you share your repro?

That description seems accurate to me.

> although I think that something simple like
>
> #define NT BIG_NUMBER
>
> pthread_barrier_t barr;
>
> void *thread(void *arg)
> {
>         struct rusage ru;
>
>         pthread_barrier_wait(&barr);
>         for (;;)
>                 getrusage(RUSAGE_SELF, &ru);
>         return NULL;
> }
>
> int main(void)
> {
>         pthread_barrier_init(&barr, NULL, NT);
>
>         for (int n = 0; n < NT-1; ++n) {
>                 pthread_t pt;
>                 pthread_create(&pt, NULL, thread, NULL);
>         }
>         thread(NULL);
>
>         return 0;
> }
>
> should work if you have a machine with a lot of memory/cpus.
>
> Oleg.
>

Here's my repro, very similar to what you've sent:

#define _GNU_SOURCE
#include <sys/resource.h>
#include <sched.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>

int thrd_func(void *data) {
        struct rusage usage;
        int *complete = (void *)data;

        while (!*complete);
        while (1) {
                getrusage(RUSAGE_SELF, &usage);
        }
}

#define STACK_SIZE (1024)

int main(int argc, char **argv) {
        if (argc != 2) {
                printf("Usage: %s <thread count>\n", argv[0]);
                exit(EXIT_SUCCESS);
        }
        const int cnt = atoi(argv[1]);
        int pids[cnt];
        int complete = 0;
        printf("Starting test with %d threads...\n", cnt);
        for (int i = 0; i < cnt; i++) {
                char *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK,
                                   -1, 0);
                if (stack == MAP_FAILED) {
                        perror("mmap() failed\n");
                        return -1;
                }

                pids[i] = clone(thrd_func, stack + STACK_SIZE,
                                CLONE_THREAD | CLONE_SIGHAND | CLONE_FS |
                                CLONE_VM | CLONE_FILES, (void *) &complete);

                if (pids[i] == -1) {
                        perror("clone() failed\n");
                        return pids[i];
                }
        }
        complete = 1;
        printf("waiting on threads...\n");
        sleep(100);
        complete = 0;
        printf("test finished.\n");
        exit(EXIT_SUCCESS);
}

I can't remember exactly why I chose to call mmap and clone directly
instead of using pthreads, but I do know that mmap'ing a smaller stack
makes the repro more reliable, since you can create more threads.
Around 250K threads seemed to be enough to reliably produce the lockup,
but your mileage may vary.
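
To put rough numbers on the stack-size point, here is a minimal sketch
of the arithmetic. The helper name stack_footprint_bytes is
hypothetical and not part of the repro, and the 4 KiB page size and
8 MiB default thread stack used in the note below are assumptions about
a typical x86-64 Linux setup, not something measured on the test
machine. It relies only on mmap() rounding the mapping length up to a
whole number of pages.

```c
/*
 * Hypothetical helper, not part of the repro: total bytes of stack
 * mappings for nthreads threads, where each requested stack_size is
 * rounded up to a whole number of pages (as mmap() does with its
 * length argument).
 */
unsigned long long stack_footprint_bytes(unsigned long long nthreads,
                                         unsigned long long stack_size,
                                         unsigned long long page_size)
{
        /* Round the per-thread stack up to full pages. */
        unsigned long long pages = (stack_size + page_size - 1) / page_size;

        return nthreads * pages * page_size;
}
```

Under those assumptions, stack_footprint_bytes(250000, 1024, 4096)
comes to about 1 GB of stack mappings, while the same thread count with
8 MiB stacks would need roughly 2 TB of address space, which may be why
the small mmap'ed stacks make it easier to reach that many threads.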