Re: [PATCH 2/2] /proc/stat: Add sysctl parameter to control irq counts latency

From: Matthew Wilcox
Date: Mon Jan 07 2019 - 10:58:46 EST


On Mon, Jan 07, 2019 at 10:12:58AM -0500, Waiman Long wrote:
> Reading /proc/stat can be slow especially if there are many irqs and on
> systems with many CPUs as summation of all the percpu counts for each
> of the irqs is required. On some newer systems, there can be more than
> 1000 irqs per socket.
>
> Applications that need to read /proc/stat many times per seconds will
> easily hit a bottleneck. In reality, the irq counts are seldom looked
> at. Even those applications that read them don't really need up-to-date
> information. One way to reduce the performance impact of irq counts
> computation is to do it less frequently.
>
> A new "fs/proc-stat-irqs-latency-ms" sysctl parameter is now added to

No. No, no, no, no, no. No.

Stop adding new sysctls for this kind of thing. It's just a way to shift
blame from us (who allegedly know what we're doing) to the sysadmin
(who probably has better things to be doing than keeping up with the
intricacies of development of every single piece of software running on
their system).

Let's figure out what this _should_ be. As a strawman, I propose we
update these stats once a second. That's easy to document and understand.

> @@ -98,7 +105,48 @@ static u64 compute_stat_irqs_sum(void)
> static void show_stat_irqs(struct seq_file *p)
> {
> int i;
> +#ifdef CONFIG_PROC_SYSCTL
> + static char *irqs_buf; /* Buffer for irqs values */
> + static int buflen;
> + static unsigned long last_jiffies; /* Last buffer update jiffies */
> + static DEFINE_MUTEX(irqs_mutex);
> + unsigned int latency = proc_stat_irqs_latency_ms;
> +
> + if (latency) {
> + char *ptr;
> +
> + latency = _msecs_to_jiffies(latency);
> +
> + mutex_lock(&irqs_mutex);
> + if (irqs_buf && time_before(jiffies, last_jiffies + latency))
> + goto print_out;
> +
> + /*
> + * Each irq value may require up to 11 bytes.
> + */
> + if (!irqs_buf) {
> + irqs_buf = kmalloc(nr_irqs * 11 + 32,
> + GFP_KERNEL | __GFP_ZERO);

Why are you caching the _output_ of calling sprintf(), rather than caching the
values of each interrupt?