Re: [kernel-hardening] [PATCH v2] time: Remove CONFIG_TIMER_STATS

From: Yann Droneaud
Date: Thu Feb 09 2017 - 08:57:29 EST


Hi,

Don't forget to send to linux-api@xxxxxxxxxxxxxxx
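
As a side note, the tracer-based replacement mentioned in the changelog
below can be exercised through the timer trace events. A rough sketch
(assuming tracefs is mounted at /sys/kernel/tracing and root privileges;
exact event fields may differ by kernel version):

 # echo 1 >/sys/kernel/tracing/events/timer/timer_start/enable
 # echo 1 >/sys/kernel/tracing/events/timer/hrtimer_start/enable
 # cat /sys/kernel/tracing/trace_pipe
 # echo 0 >/sys/kernel/tracing/events/timer/enable

The timer_start/hrtimer_start events record the task arming the timer and
its callback function, which covers what the removed /proc/timer_stats
interface and timer_list fields exposed.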

On Wednesday, February 8, 2017 at 11:26 -0800, Kees Cook wrote:
> Currently CONFIG_TIMER_STATS exposes process information across
> namespaces:
>
> kernel/time/timer_list.c print_timer():
>
>         SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
>
> /proc/timer_list:
>
>  #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570
>
> Given that the tracer can give the same information, this patch entirely
> removes CONFIG_TIMER_STATS.
>
> Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>
> Acked-by: John Stultz <john.stultz@xxxxxxxxxx>
> ---
> v2:
> - dropped doc comments for removed structure elements; thx 0-day builder.
> ---
>  Documentation/timers/timer_stats.txt |  73 ------
>  include/linux/hrtimer.h              |  11 -
>  include/linux/timer.h                |  45 ----
>  kernel/kthread.c                     |   1 -
>  kernel/time/Makefile                 |   1 -
>  kernel/time/hrtimer.c                |  38 ----
>  kernel/time/timer.c                  |  48 +---
>  kernel/time/timer_list.c             |  10 -
>  kernel/time/timer_stats.c            | 425 -----------------------------
>  kernel/workqueue.c                   |   2 -
>  lib/Kconfig.debug                    |  14 --
>  11 files changed, 2 insertions(+), 666 deletions(-)
>  delete mode 100644 Documentation/timers/timer_stats.txt
>  delete mode 100644 kernel/time/timer_stats.c
>
> diff --git a/Documentation/timers/timer_stats.txt b/Documentation/timers/timer_stats.txt
> deleted file mode 100644
> index de835ee97455..000000000000
> --- a/Documentation/timers/timer_stats.txt
> +++ /dev/null
> @@ -1,73 +0,0 @@
> -timer_stats - timer usage statistics
> -------------------------------------
> -
> -timer_stats is a debugging facility to make the timer (ab)usage in a Linux
> -system visible to kernel and userspace developers. If enabled in the config
> -but not used it has almost zero runtime overhead, and a relatively small
> -data structure overhead. Even if collection is enabled runtime all the
> -locking is per-CPU and lookup is hashed.
> -
> -timer_stats should be used by kernel and userspace developers to verify that
> -their code does not make unduly use of timers. This helps to avoid unnecessary
> -wakeups, which should be avoided to optimize power consumption.
> -
> -It can be enabled by CONFIG_TIMER_STATS in the "Kernel hacking" configuration
> -section.
> -
> -timer_stats collects information about the timer events which are fired in a
> -Linux system over a sample period:
> -
> -- the pid of the task(process) which initialized the timer
> -- the name of the process which initialized the timer
> -- the function where the timer was initialized
> -- the callback function which is associated to the timer
> -- the number of events (callbacks)
> -
> -timer_stats adds an entry to /proc: /proc/timer_stats
> -
> -This entry is used to control the statistics functionality and to read out the
> -sampled information.
> -
> -The timer_stats functionality is inactive on bootup.
> -
> -To activate a sample period issue:
> -# echo 1 >/proc/timer_stats
> -
> -To stop a sample period issue:
> -# echo 0 >/proc/timer_stats
> -
> -The statistics can be retrieved by:
> -# cat /proc/timer_stats
> -
> -While sampling is enabled, each readout from /proc/timer_stats will see
> -newly updated statistics. Once sampling is disabled, the sampled information
> -is kept until a new sample period is started. This allows multiple readouts.
> -
> -Sample output of /proc/timer_stats:
> -
> -Timerstats sample period: 3.888770 s
> -  12,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
> -  15,     1 swapper          hcd_submit_urb (rh_timer_func)
> -   4,   959 kedac            schedule_timeout (process_timeout)
> -   1,     0 swapper          page_writeback_init (wb_timer_fn)
> -  28,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
> -  22,  2948 IRQ 4            tty_flip_buffer_push (delayed_work_timer_fn)
> -   3,  3100 bash             schedule_timeout (process_timeout)
> -   1,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
> -   1,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
> -   1,     1 swapper          neigh_table_init_no_netlink (neigh_periodic_timer)
> -   1,  2292 ip               __netdev_watchdog_up (dev_watchdog)
> -   1,    23 events/1         do_cache_clean (delayed_work_timer_fn)
> -90 total events, 30.0 events/sec
> -
> -The first column is the number of events, the second column the pid, the third
> -column is the name of the process. The forth column shows the function which
> -initialized the timer and in parenthesis the callback function which was
> -executed on expiry.
> -
> -    Thomas, Ingo
> -
> -Added flag to indicate 'deferrable timer' in /proc/timer_stats. A deferrable
> -timer will appear as follows
> -  10D,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
> -
> diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
> index cdab81ba29f8..e52b427223ba 100644
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -88,12 +88,6 @@ enum hrtimer_restart {
>   * @base:	pointer to the timer base (per cpu and per clock)
>   * @state:	state information (See bit values above)
>   * @is_rel:	Set if the timer was armed relative
> - * @start_pid:  timer statistics field to store the pid of the task which
> - *		started the timer
> - * @start_site:	timer statistics field to store the site where the timer
> - *		was started
> - * @start_comm: timer statistics field to store the name of the process which
> - *		started the timer
>   *
>   * The hrtimer structure must be initialized by hrtimer_init()
>   */
> @@ -104,11 +98,6 @@ struct hrtimer {
>  	struct hrtimer_clock_base	*base;
>  	u8				state;
>  	u8				is_rel;
> -#ifdef CONFIG_TIMER_STATS
> -	int				start_pid;
> -	void				*start_site;
> -	char				start_comm[16];
> -#endif
>  };
>  
>  /**
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 51d601f192d4..5a209b84fd9e 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -20,11 +20,6 @@ struct timer_list {
>  	unsigned long		data;
>  	u32			flags;
>  
> -#ifdef CONFIG_TIMER_STATS
> -	int			start_pid;
> -	void			*start_site;
> -	char			start_comm[16];
> -#endif
>  #ifdef CONFIG_LOCKDEP
>  	struct lockdep_map	lockdep_map;
>  #endif
> @@ -197,46 +192,6 @@ extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
>   */
>  #define NEXT_TIMER_MAX_DELTA	((1UL << 30) - 1)
>  
> -/*
> - * Timer-statistics info:
> - */
> -#ifdef CONFIG_TIMER_STATS
> -
> -extern int timer_stats_active;
> -
> -extern void init_timer_stats(void);
> -
> -extern void timer_stats_update_stats(void *timer, pid_t pid, void *startf,
> -				     void *timerf, char *comm, u32 flags);
> -
> -extern void __timer_stats_timer_set_start_info(struct timer_list *timer,
> -					       void *addr);
> -
> -static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
> -{
> -	if (likely(!timer_stats_active))
> -		return;
> -	__timer_stats_timer_set_start_info(timer, __builtin_return_address(0));
> -}
> -
> -static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
> -{
> -	timer->start_site = NULL;
> -}
> -#else
> -static inline void init_timer_stats(void)
> -{
> -}
> -
> -static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
> -{
> -}
> -
> -static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
> -{
> -}
> -#endif
> -
>  extern void add_timer(struct timer_list *timer);
>  
>  extern int try_to_del_timer_sync(struct timer_list *timer);
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 2318fba86277..8461a4372e8a 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -850,7 +850,6 @@ void __kthread_queue_delayed_work(struct kthread_worker *worker,
>  
>  	list_add(&work->node, &worker->delayed_work_list);
>  	work->worker = worker;
> -	timer_stats_timer_set_start_info(&dwork->timer);
>  	timer->expires = jiffies + delay;
>  	add_timer(timer);
>  }
> diff --git a/kernel/time/Makefile b/kernel/time/Makefile
> index 976840d29a71..938dbf33ef49 100644
> --- a/kernel/time/Makefile
> +++ b/kernel/time/Makefile
> @@ -15,6 +15,5 @@ ifeq ($(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST),y)
>  endif
>  obj-$(CONFIG_GENERIC_SCHED_CLOCK)		+= sched_clock.o
>  obj-$(CONFIG_TICK_ONESHOT)			+= tick-oneshot.o tick-sched.o
> -obj-$(CONFIG_TIMER_STATS)			+= timer_stats.o
>  obj-$(CONFIG_DEBUG_FS)				+= timekeeping_debug.o
>  obj-$(CONFIG_TEST_UDELAY)			+= test_udelay.o
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index c6ecedd3b839..edabde646e58 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -766,34 +766,6 @@ void hrtimers_resume(void)
>  	clock_was_set_delayed();
>  }
>  
> -static inline void timer_stats_hrtimer_set_start_info(struct hrtimer *timer)
> -{
> -#ifdef CONFIG_TIMER_STATS
> -	if (timer->start_site)
> -		return;
> -	timer->start_site = __builtin_return_address(0);
> -	memcpy(timer->start_comm, current->comm, TASK_COMM_LEN);
> -	timer->start_pid = current->pid;
> -#endif
> -}
> -
> -static inline void timer_stats_hrtimer_clear_start_info(struct hrtimer *timer)
> -{
> -#ifdef CONFIG_TIMER_STATS
> -	timer->start_site = NULL;
> -#endif
> -}
> -
> -static inline void timer_stats_account_hrtimer(struct hrtimer *timer)
> -{
> -#ifdef CONFIG_TIMER_STATS
> -	if (likely(!timer_stats_active))
> -		return;
> -	timer_stats_update_stats(timer, timer->start_pid, timer->start_site,
> -				 timer->function, timer->start_comm, 0);
> -#endif
> -}
> -
>  /*
>   * Counterpart to lock_hrtimer_base above:
>   */
> @@ -932,7 +904,6 @@ remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool rest
>  	 * rare case and less expensive than a smp call.
>  	 */
>  	debug_deactivate(timer);
> -	timer_stats_hrtimer_clear_start_info(timer);
>  	reprogram = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
>  
>  	if (!restart)
> @@ -990,8 +961,6 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
>  	/* Switch the timer base, if necessary: */
>  	new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
>  
> -	timer_stats_hrtimer_set_start_info(timer);
> -
>  	leftmost = enqueue_hrtimer(timer, new_base);
>  	if (!leftmost)
>  		goto unlock;
> @@ -1128,12 +1097,6 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
>  	base = hrtimer_clockid_to_base(clock_id);
>  	timer->base = &cpu_base->clock_base[base];
>  	timerqueue_init(&timer->node);
> -
> -#ifdef CONFIG_TIMER_STATS
> -	timer->start_site = NULL;
> -	timer->start_pid = -1;
> -	memset(timer->start_comm, 0, TASK_COMM_LEN);
> -#endif
>  }
>  
>  /**
> @@ -1217,7 +1180,6 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base,
>  	raw_write_seqcount_barrier(&cpu_base->seq);
>  
>  	__remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, 0);
> -	timer_stats_account_hrtimer(timer);
>  	fn = timer->function;
>  
>  	/*
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index ec33a6933eae..82a6bfa0c307 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -571,38 +571,6 @@ internal_add_timer(struct timer_base *base, struct timer_list *timer)
>  	trigger_dyntick_cpu(base, timer);
>  }
>  
> -#ifdef CONFIG_TIMER_STATS
> -void __timer_stats_timer_set_start_info(struct timer_list *timer, void *addr)
> -{
> -	if (timer->start_site)
> -		return;
> -
> -	timer->start_site = addr;
> -	memcpy(timer->start_comm, current->comm, TASK_COMM_LEN);
> -	timer->start_pid = current->pid;
> -}
> -
> -static void timer_stats_account_timer(struct timer_list *timer)
> -{
> -	void *site;
> -
> -	/*
> -	 * start_site can be concurrently reset by
> -	 * timer_stats_timer_clear_start_info()
> -	 */
> -	site = READ_ONCE(timer->start_site);
> -	if (likely(!site))
> -		return;
> -
> -	timer_stats_update_stats(timer, timer->start_pid, site,
> -				 timer->function, timer->start_comm,
> -				 timer->flags);
> -}
> -
> -#else
> -static void timer_stats_account_timer(struct timer_list *timer) {}
> -#endif
> -
>  #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
>  
>  static struct debug_obj_descr timer_debug_descr;
> @@ -789,11 +757,6 @@ static void do_init_timer(struct timer_list *timer, unsigned int flags,
>  {
>  	timer->entry.pprev = NULL;
>  	timer->flags = flags | raw_smp_processor_id();
> -#ifdef CONFIG_TIMER_STATS
> -	timer->start_site = NULL;
> -	timer->start_pid = -1;
> -	memset(timer->start_comm, 0, TASK_COMM_LEN);
> -#endif
>  	lockdep_init_map(&timer->lockdep_map, name, key, 0);
>  }
>  
> @@ -1001,8 +964,6 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
>  		base = lock_timer_base(timer, &flags);
>  	}
>  
> -	timer_stats_timer_set_start_info(timer);
> -
>  	ret = detach_if_pending(timer, base, false);
>  	if (!ret && pending_only)
>  		goto out_unlock;
> @@ -1130,7 +1091,6 @@ void add_timer_on(struct timer_list *timer, int cpu)
>  	struct timer_base *new_base, *base;
>  	unsigned long flags;
>  
> -	timer_stats_timer_set_start_info(timer);
>  	BUG_ON(timer_pending(timer) || !timer->function);
>  
>  	new_base = get_timer_cpu_base(timer->flags, cpu);
> @@ -1176,7 +1136,6 @@ int del_timer(struct timer_list *timer)
>  
>  	debug_assert_init(timer);
>  
> -	timer_stats_timer_clear_start_info(timer);
>  	if (timer_pending(timer)) {
>  		base = lock_timer_base(timer, &flags);
>  		ret = detach_if_pending(timer, base, true);
> @@ -1204,10 +1163,9 @@ int try_to_del_timer_sync(struct timer_list *timer)
>  
>  	base = lock_timer_base(timer, &flags);
>  
> -	if (base->running_timer != timer) {
> -		timer_stats_timer_clear_start_info(timer);
> +	if (base->running_timer != timer)
>  		ret = detach_if_pending(timer, base, true);
> -	}
> +
>  	spin_unlock_irqrestore(&base->lock, flags);
>  
>  	return ret;
> @@ -1331,7 +1289,6 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
>  		unsigned long data;
>  
>  		timer = hlist_entry(head->first, struct timer_list, entry);
> -		timer_stats_account_timer(timer);
>  
>  		base->running_timer = timer;
>  		detach_timer(timer, true);
> @@ -1868,7 +1825,6 @@ static void __init init_timer_cpus(void)
>  void __init init_timers(void)
>  {
>  	init_timer_cpus();
> -	init_timer_stats();
>  	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
>  }
>  
> diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
> index afe6cd1944fc..387a3a5aa388 100644
> --- a/kernel/time/timer_list.c
> +++ b/kernel/time/timer_list.c
> @@ -62,21 +62,11 @@ static void
>  print_timer(struct seq_file *m, struct hrtimer *taddr, struct hrtimer *timer,
>  	    int idx, u64 now)
>  {
> -#ifdef CONFIG_TIMER_STATS
> -	char tmp[TASK_COMM_LEN + 1];
> -#endif
>  	SEQ_printf(m, " #%d: ", idx);
>  	print_name_offset(m, taddr);
>  	SEQ_printf(m, ", ");
>  	print_name_offset(m, timer->function);
>  	SEQ_printf(m, ", S:%02x", timer->state);
> -#ifdef CONFIG_TIMER_STATS
> -	SEQ_printf(m, ", ");
> -	print_name_offset(m, timer->start_site);
> -	memcpy(tmp, timer->start_comm, TASK_COMM_LEN);
> -	tmp[TASK_COMM_LEN] = 0;
> -	SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
> -#endif
>  	SEQ_printf(m, "\n");
>  	SEQ_printf(m, " # expires at %Lu-%Lu nsecs [in %Ld to %Ld nsecs]\n",
>  		(unsigned long long)ktime_to_ns(hrtimer_get_softexpires(timer)),
> diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c
> deleted file mode 100644
> index afddded947df..000000000000
> --- a/kernel/time/timer_stats.c
> +++ /dev/null
> @@ -1,425 +0,0 @@
> -/*
> - * kernel/time/timer_stats.c
> - *
> - * Collect timer usage statistics.
> - *
> - * Copyright(C) 2006, Red Hat, Inc., Ingo Molnar
> - * Copyright(C) 2006 Timesys Corp., Thomas Gleixner <tglx@xxxxxxxxxxm>
> - *
> - * timer_stats is based on timer_top, a similar functionality which was part of
> - * Con Kolivas dyntick patch set. It was developed by Daniel Petrini at the
> - * Instituto Nokia de Tecnologia - INdT - Manaus. timer_top's design was based
> - * on dynamic allocation of the statistics entries and linear search based
> - * lookup combined with a global lock, rather than the static array, hash
> - * and per-CPU locking which is used by timer_stats. It was written for the
> - * pre hrtimer kernel code and therefore did not take hrtimers into account.
> - * Nevertheless it provided the base for the timer_stats implementation and
> - * was a helpful source of inspiration. Kudos to Daniel and the Nokia folks
> - * for this effort.
> - *
> - * timer_top.c is
> - *	Copyright (C) 2005 Instituto Nokia de Tecnologia - INdT - Manaus
> - *	Written by Daniel Petrini <d.pensator@xxxxxxxxx>
> - *	timer_top.c was released under the GNU General Public License version 2
> - *
> - * We export the addresses and counting of timer functions being called,
> - * the pid and cmdline from the owner process if applicable.
> - *
> - * Start/stop data collection:
> - * # echo [1|0] >/proc/timer_stats
> - *
> - * Display the information collected so far:
> - * # cat /proc/timer_stats
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - */
> -
> -#include <linux/proc_fs.h>
> -#include <linux/module.h>
> -#include <linux/spinlock.h>
> -#include <linux/sched.h>
> -#include <linux/seq_file.h>
> -#include <linux/kallsyms.h>
> -
> -#include <linux/uaccess.h>
> -
> -/*
> - * This is our basic unit of interest: a timer expiry event identified
> - * by the timer, its start/expire functions and the PID of the task that
> - * started the timer. We count the number of times an event happens:
> - */
> -struct entry {
> -	/*
> -	 * Hash list:
> -	 */
> -	struct entry		*next;
> -
> -	/*
> -	 * Hash keys:
> -	 */
> -	void			*timer;
> -	void			*start_func;
> -	void			*expire_func;
> -	pid_t			pid;
> -
> -	/*
> -	 * Number of timeout events:
> -	 */
> -	unsigned long		count;
> -	u32			flags;
> -
> -	/*
> -	 * We save the command-line string to preserve
> -	 * this information past task exit:
> -	 */
> -	char			comm[TASK_COMM_LEN + 1];
> -
> -} ____cacheline_aligned_in_smp;
> -
> -/*
> - * Spinlock protecting the tables - not taken during lookup:
> - */
> -static DEFINE_RAW_SPINLOCK(table_lock);
> -
> -/*
> - * Per-CPU lookup locks for fast hash lookup:
> - */
> -static DEFINE_PER_CPU(raw_spinlock_t, tstats_lookup_lock);
> -
> -/*
> - * Mutex to serialize state changes with show-stats activities:
> - */
> -static DEFINE_MUTEX(show_mutex);
> -
> -/*
> - * Collection status, active/inactive:
> - */
> -int __read_mostly timer_stats_active;
> -
> -/*
> - * Beginning/end timestamps of measurement:
> - */
> -static ktime_t time_start, time_stop;
> -
> -/*
> - * tstat entry structs only get allocated while collection is
> - * active and never freed during that time - this simplifies
> - * things quite a bit.
> - *
> - * They get freed when a new collection period is started.
> - */
> -#define MAX_ENTRIES_BITS	10
> -#define MAX_ENTRIES		(1UL << MAX_ENTRIES_BITS)
> -
> -static unsigned long nr_entries;
> -static struct entry entries[MAX_ENTRIES];
> -
> -static atomic_t overflow_count;
> -
> -/*
> - * The entries are in a hash-table, for fast lookup:
> - */
> -#define TSTAT_HASH_BITS		(MAX_ENTRIES_BITS - 1)
> -#define TSTAT_HASH_SIZE		(1UL << TSTAT_HASH_BITS)
> -#define TSTAT_HASH_MASK		(TSTAT_HASH_SIZE - 1)
> -
> -#define __tstat_hashfn(entry)						\
> -	(((unsigned long)(entry)->timer       ^				\
> -	  (unsigned long)(entry)->start_func  ^				\
> -	  (unsigned long)(entry)->expire_func ^				\
> -	  (unsigned long)(entry)->pid	      ) & TSTAT_HASH_MASK)
> -
> -#define tstat_hashentry(entry)	(tstat_hash_table + __tstat_hashfn(entry))
> -
> -static struct entry *tstat_hash_table[TSTAT_HASH_SIZE] __read_mostly;
> -
> -static void reset_entries(void)
> -{
> -	nr_entries = 0;
> -	memset(entries, 0, sizeof(entries));
> -	memset(tstat_hash_table, 0, sizeof(tstat_hash_table));
> -	atomic_set(&overflow_count, 0);
> -}
> -
> -static struct entry *alloc_entry(void)
> -{
> -	if (nr_entries >= MAX_ENTRIES)
> -		return NULL;
> -
> -	return entries + nr_entries++;
> -}
> -
> -static int match_entries(struct entry *entry1, struct entry *entry2)
> -{
> -	return entry1->timer       == entry2->timer	  &&
> -	       entry1->start_func  == entry2->start_func  &&
> -	       entry1->expire_func == entry2->expire_func &&
> -	       entry1->pid	   == entry2->pid;
> -}
> -
> -/*
> - * Look up whether an entry matching this item is present
> - * in the hash already. Must be called with irqs off and the
> - * lookup lock held:
> - */
> -static struct entry *tstat_lookup(struct entry *entry, char *comm)
> -{
> -	struct entry **head, *curr, *prev;
> -
> -	head = tstat_hashentry(entry);
> -	curr = *head;
> -
> -	/*
> -	 * The fastpath is when the entry is already hashed,
> -	 * we do this with the lookup lock held, but with the
> -	 * table lock not held:
> -	 */
> -	while (curr) {
> -		if (match_entries(curr, entry))
> -			return curr;
> -
> -		curr = curr->next;
> -	}
> -	/*
> -	 * Slowpath: allocate, set up and link a new hash entry:
> -	 */
> -	prev = NULL;
> -	curr = *head;
> -
> -	raw_spin_lock(&table_lock);
> -	/*
> -	 * Make sure we have not raced with another CPU:
> -	 */
> -	while (curr) {
> -		if (match_entries(curr, entry))
> -			goto out_unlock;
> -
> -		prev = curr;
> -		curr = curr->next;
> -	}
> -
> -	curr = alloc_entry();
> -	if (curr) {
> -		*curr = *entry;
> -		curr->count = 0;
> -		curr->next = NULL;
> -		memcpy(curr->comm, comm, TASK_COMM_LEN);
> -
> -		smp_mb(); /* Ensure that curr is initialized before insert */
> -
> -		if (prev)
> -			prev->next = curr;
> -		else
> -			*head = curr;
> -	}
> - out_unlock:
> -	raw_spin_unlock(&table_lock);
> -
> -	return curr;
> -}
> -
> -/**
> - * timer_stats_update_stats - Update the statistics for a timer.
> - * @timer:	pointer to either a timer_list or a hrtimer
> - * @pid:	the pid of the task which set up the timer
> - * @startf:	pointer to the function which did the timer setup
> - * @timerf:	pointer to the timer callback function of the timer
> - * @comm:	name of the process which set up the timer
> - * @tflags:	The flags field of the timer
> - *
> - * When the timer is already registered, then the event counter is
> - * incremented. Otherwise the timer is registered in a free slot.
> - */
> -void timer_stats_update_stats(void *timer, pid_t pid, void *startf,
> -			      void *timerf, char *comm, u32 tflags)
> -{
> -	/*
> -	 * It doesn't matter which lock we take:
> -	 */
> -	raw_spinlock_t *lock;
> -	struct entry *entry, input;
> -	unsigned long flags;
> -
> -	if (likely(!timer_stats_active))
> -		return;
> -
> -	lock = &per_cpu(tstats_lookup_lock, raw_smp_processor_id());
> -
> -	input.timer = timer;
> -	input.start_func = startf;
> -	input.expire_func = timerf;
> -	input.pid = pid;
> -	input.flags = tflags;
> -
> -	raw_spin_lock_irqsave(lock, flags);
> -	if (!timer_stats_active)
> -		goto out_unlock;
> -
> -	entry = tstat_lookup(&input, comm);
> -	if (likely(entry))
> -		entry->count++;
> -	else
> -		atomic_inc(&overflow_count);
> -
> - out_unlock:
> -	raw_spin_unlock_irqrestore(lock, flags);
> -}
> -
> -static void print_name_offset(struct seq_file *m, unsigned long addr)
> -{
> -	char symname[KSYM_NAME_LEN];
> -
> -	if (lookup_symbol_name(addr, symname) < 0)
> -		seq_printf(m, "<%p>", (void *)addr);
> -	else
> -		seq_printf(m, "%s", symname);
> -}
> -
> -static int tstats_show(struct seq_file *m, void *v)
> -{
> -	struct timespec64 period;
> -	struct entry *entry;
> -	unsigned long ms;
> -	long events = 0;
> -	ktime_t time;
> -	int i;
> -
> -	mutex_lock(&show_mutex);
> -	/*
> -	 * If still active then calculate up to now:
> -	 */
> -	if (timer_stats_active)
> -		time_stop = ktime_get();
> -
> -	time = ktime_sub(time_stop, time_start);
> -
> -	period = ktime_to_timespec64(time);
> -	ms = period.tv_nsec / 1000000;
> -
> -	seq_puts(m, "Timer Stats Version: v0.3\n");
> -	seq_printf(m, "Sample period: %ld.%03ld s\n", (long)period.tv_sec, ms);
> -	if (atomic_read(&overflow_count))
> -		seq_printf(m, "Overflow: %d entries\n", atomic_read(&overflow_count));
> -	seq_printf(m, "Collection: %s\n", timer_stats_active ? "active" : "inactive");
> -
> -	for (i = 0; i < nr_entries; i++) {
> -		entry = entries + i;
> -		if (entry->flags & TIMER_DEFERRABLE) {
> -			seq_printf(m, "%4luD, %5d %-16s ",
> -				entry->count, entry->pid, entry->comm);
> -		} else {
> -			seq_printf(m, " %4lu, %5d %-16s ",
> -				entry->count, entry->pid, entry->comm);
> -		}
> -
> -		print_name_offset(m, (unsigned long)entry->start_func);
> -		seq_puts(m, " (");
> -		print_name_offset(m, (unsigned long)entry->expire_func);
> -		seq_puts(m, ")\n");
> -
> -		events += entry->count;
> -	}
> -
> -	ms += period.tv_sec * 1000;
> -	if (!ms)
> -		ms = 1;
> -
> -	if (events && period.tv_sec)
> -		seq_printf(m, "%ld total events, %ld.%03ld events/sec\n",
> -			   events, events * 1000 / ms,
> -			   (events * 1000000 / ms) % 1000);
> -	else
> -		seq_printf(m, "%ld total events\n", events);
> -
> -	mutex_unlock(&show_mutex);
> -
> -	return 0;
> -}
> -
> -/*
> - * After a state change, make sure all concurrent lookup/update
> - * activities have stopped:
> - */
> -static void sync_access(void)
> -{
> -	unsigned long flags;
> -	int cpu;
> -
> -	for_each_online_cpu(cpu) {
> -		raw_spinlock_t *lock = &per_cpu(tstats_lookup_lock, cpu);
> -
> -		raw_spin_lock_irqsave(lock, flags);
> -		/* nothing */
> -		raw_spin_unlock_irqrestore(lock, flags);
> -	}
> -}
> -
> -static ssize_t tstats_write(struct file *file, const char __user *buf,
> -			    size_t count, loff_t *offs)
> -{
> -	char ctl[2];
> -
> -	if (count != 2 || *offs)
> -		return -EINVAL;
> -
> -	if (copy_from_user(ctl, buf, count))
> -		return -EFAULT;
> -
> -	mutex_lock(&show_mutex);
> -	switch (ctl[0]) {
> -	case '0':
> -		if (timer_stats_active) {
> -			timer_stats_active = 0;
> -			time_stop = ktime_get();
> -			sync_access();
> -		}
> -		break;
> -	case '1':
> -		if (!timer_stats_active) {
> -			reset_entries();
> -			time_start = ktime_get();
> -			smp_mb();
> -			timer_stats_active = 1;
> -		}
> -		break;
> -	default:
> -		count = -EINVAL;
> -	}
> -	mutex_unlock(&show_mutex);
> -
> -	return count;
> -}
> -
> -static int tstats_open(struct inode *inode, struct file *filp)
> -{
> -	return single_open(filp, tstats_show, NULL);
> -}
> -
> -static const struct file_operations tstats_fops = {
> -	.open		= tstats_open,
> -	.read		= seq_read,
> -	.write		= tstats_write,
> -	.llseek		= seq_lseek,
> -	.release	= single_release,
> -};
> -
> -void __init init_timer_stats(void)
> -{
> -	int cpu;
> -
> -	for_each_possible_cpu(cpu)
> -		raw_spin_lock_init(&per_cpu(tstats_lookup_lock, cpu));
> -}
> -
> -static int __init init_tstats_procfs(void)
> -{
> -	struct proc_dir_entry *pe;
> -
> -	pe = proc_create("timer_stats", 0644, NULL, &tstats_fops);
> -	if (!pe)
> -		return -ENOMEM;
> -	return 0;
> -}
> -__initcall(init_tstats_procfs);
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1d9fb6543a66..072cbc9b175d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1523,8 +1523,6 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
>  		return;
>  	}
>  
> -	timer_stats_timer_set_start_info(&dwork->timer);
> -
>  	dwork->wq = wq;
>  	dwork->cpu = cpu;
>  	timer->expires = jiffies + delay;
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index eb9e9a7870fa..132af338d6dd 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -980,20 +980,6 @@ config DEBUG_TIMEKEEPING
>  
>  	  If unsure, say N.
>  
> -config TIMER_STATS
> -	bool "Collect kernel timers statistics"
> -	depends on DEBUG_KERNEL && PROC_FS
> -	help
> -	  If you say Y here, additional code will be inserted into the
> -	  timer routines to collect statistics about kernel timers being
> -	  reprogrammed. The statistics can be read from /proc/timer_stats.
> -	  The statistics collection is started by writing 1 to /proc/timer_stats,
> -	  writing 0 stops it. This feature is useful to collect information
> -	  about timer usage patterns in kernel and userspace. This feature
> -	  is lightweight if enabled in the kernel config but not activated
> -	  (it defaults to deactivated on bootup and will only be activated
> -	  if some application like powertop activates it explicitly).
> -
>  config DEBUG_PREEMPT
>  	bool "Debug preemptible kernel"
>  	depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
> -- 
> 2.7.4
>
>