Re: [tip:sched/core] sched/cpuacct: Split usage accounting into user_usage and sys_usage

From: Srikar Dronamraju
Date: Mon Apr 04 2016 - 10:04:31 EST


* tip-bot for Dongsheng Yang <tipbot@xxxxxxxxx> [2016-03-31 02:27:39]:

> Commit-ID: d740037fac7052e49450f6fa1454f1144a103b55
> Gitweb: http://git.kernel.org/tip/d740037fac7052e49450f6fa1454f1144a103b55
> Author: Dongsheng Yang <yangds.fnst@xxxxxxxxxxxxxx>
> AuthorDate: Tue, 22 Mar 2016 16:37:08 +0800
> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
> CommitDate: Thu, 31 Mar 2016 10:48:54 +0200
>
> sched/cpuacct: Split usage accounting into user_usage and sys_usage
>
> Sometimes, cpuacct.usage is not detailed enough to see how much CPU
> usage a group had. We want to know how much time it used in user mode
> and how much in kernel mode.
>

Unfortunately this commit causes boot to fail on my power 7 box.

> @@ -238,10 +316,18 @@ static struct cftype files[] = {
> void cpuacct_charge(struct task_struct *tsk, u64 cputime)
> {
> struct cpuacct *ca;
> + int index;
> +
> + if (user_mode(task_pt_regs(tsk)))
> + index = CPUACCT_USAGE_USER;
> + else
> + index = CPUACCT_USAGE_SYSTEM;
>
> rcu_read_lock();
> +
> for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
> - *this_cpu_ptr(ca->cpuusage) += cputime;
> + this_cpu_ptr(ca->cpuusage)->usages[index] += cputime;

The above line seems to be the cause of oops. Attached the complete console log below.

Unable to handle kernel paging request for data at address 0x00000108
Faulting instruction address: 0xc0000000000f8fbc
cpu 0x0: Vector: 300 (Data Access) at [c000000f1f607720]
pc: c0000000000f8fbc: .cpuacct_charge+0x2c/0x110
lr: c0000000000dd184: .update_curr+0xa4/0x1d0
sp: c000000f1f6079a0
msr: 9000000002009032
dar: 108
dsisr: 40000000
current = 0xc000000f1f582480
paca = 0xc00000000fe00000 softe: 0 irq_happened: 0x01
pid = 2, comm = kthreadd
Linux version 4.6.0-rc1-master+ (srikar@llmjuno) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) #38 SMP Tue Apr 5 00:20:26 IST 2016
enter ? for help
[c000000f1f607a30] c0000000000dd184 .update_curr+0xa4/0x1d0
[c000000f1f607ad0] c0000000000dfc2c .dequeue_task_fair+0xbc/0x1110
[c000000f1f607be0] c0000000000d3c70 .do_set_cpus_allowed+0x1b0/0x1c0
[c000000f1f607c70] c0000000000d4424 .__set_cpus_allowed_ptr+0x194/0x2e0
[c000000f1f607d50] c0000000000c70b8 .kthreadd+0x68/0x2c0
[c000000f1f607e30] c0000000000095e4 .ret_from_kernel_thread+0x58/0x74

git bisect shows this as the commit that causes the problem.
I verifed by booting with the tip + revert of this commit.

> +
> rcu_read_unlock();
> }
>

--
Thanks and Regards
Srikar Dronamraju


phys_mem_size = 0x1000000000
cpu_features = 0x0b7e7ae518500049
possible = 0x3fffffff18500649
always = 0x0000000018100040
cpu_user_features = 0xdc0065c7 0x20000000
mmu_features = 0x7c000003
firmware_features = 0x0000000010000000
htab_address = 0xc000000ffc000000
htab_hash_mask = 0x7ffff
-----------------------------------------------------
<- setup_system()
Linux version 4.6.0-rc1-master+ (srikar@llmjuno) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) #38 SMP Tue Apr 5 00:20:26 IST 2016
Node 0 Memory:
Node 1 Memory: 0x0-0x1000000000
numa: Initmem setup node 0
numa: NODE_DATA [mem 0xf2ed9e100-0xf2eda7fff]
numa: NODE_DATA(0) on node 1
numa: Initmem setup node 1 [mem 0x00000000-0xfffffffff]
numa: NODE_DATA [mem 0xf2ed94200-0xf2ed9e0ff]
Section 3884 and 3886 (node 1) have a circular dependency on usemap and pgdat allocations
Probing IODA IO-Hub /io-hub@3efe00000000
Initializing IODA0 OPAL PHB /io-hub@3efe00000000/pciex@3efe00080000
PCI host bridge /io-hub@3efe00000000/pciex@3efe00080000 (primary) ranges:
IO 0x00003efe01000000..0x00003efe017fffff -> 0x0000000000000000
MEM 0x00003da080000000..0x00003da0fffeffff -> 0x0000000080000000
Not support M64 window
128 (127) PE's M32: 0x80000000 [segment=0x1000000]
IO: 0x800000 [segment=0x10000]
Allocated bitmap for 256 MSIs (base IRQ 0x600)
Initializing IODA0 OPAL PHB /io-hub@3efe00000000/pciex@3efe00090000
PCI host bridge /io-hub@3efe00000000/pciex@3efe00090000 ranges:
IO 0x00003efe01800000..0x00003efe01ffffff -> 0x0000000000000000
MEM 0x00003da180000000..0x00003da1fffeffff -> 0x0000000080000000
Not support M64 window
128 (127) PE's M32: 0x80000000 [segment=0x1000000]
IO: 0x800000 [segment=0x10000]
Allocated bitmap for 256 MSIs (base IRQ 0xa00)
Initializing IODA0 OPAL PHB /io-hub@3efe00000000/pciex@3efe000a0000
PCI host bridge /io-hub@3efe00000000/pciex@3efe000a0000 ranges:
IO 0x00003efe02000000..0x00003efe027fffff -> 0x0000000000000000
MEM 0x00003da280000000..0x00003da2fffeffff -> 0x0000000080000000
Not support M64 window
128 (127) PE's M32: 0x80000000 [segment=0x1000000]
IO: 0x800000 [segment=0x10000]
Allocated bitmap for 256 MSIs (base IRQ 0xe00)
Initializing IODA0 OPAL PHB /io-hub@3efe00000000/pciex@3efe000b0000
PCI host bridge /io-hub@3efe00000000/pciex@3efe000b0000 ranges:
IO 0x00003efe02800000..0x00003efe02ffffff -> 0x0000000000000000
MEM 0x00003da380000000..0x00003da3fffeffff -> 0x0000000080000000
Not support M64 window
128 (127) PE's M32: 0x80000000 [segment=0x1000000]
IO: 0x800000 [segment=0x10000]
Allocated bitmap for 256 MSIs (base IRQ 0x1200)
Initializing IODA0 OPAL PHB /io-hub@3efe00000000/pciex@3efe000c0000
PCI host bridge /io-hub@3efe00000000/pciex@3efe000c0000 ranges:
IO 0x00003efe03000000..0x00003efe037fffff -> 0x0000000000000000
MEM 0x00003da480000000..0x00003da4fffeffff -> 0x0000000080000000
Not support M64 window
128 (127) PE's M32: 0x80000000 [segment=0x1000000]
IO: 0x800000 [segment=0x10000]
Allocated bitmap for 256 MSIs (base IRQ 0x1600)
Initializing IODA0 OPAL PHB /io-hub@3efe00000000/pciex@3efe000d0000
PCI host bridge /io-hub@3efe00000000/pciex@3efe000d0000 ranges:
IO 0x00003efe03800000..0x00003efe03ffffff -> 0x0000000000000000
MEM 0x00003da580000000..0x00003da5fffeffff -> 0x0000000080000000
Not support M64 window
128 (127) PE's M32: 0x80000000 [segment=0x1000000]
IO: 0x800000 [segment=0x10000]
Allocated bitmap for 256 MSIs (base IRQ 0x1a00)
OPAL nvram setup, 1048576 bytes
Top of RAM: 0x1000000000, Total RAM: 0x1000000000
Memory hole size: 0MB
Zone ranges:
DMA [mem 0x0000000000000000-0x0000000fffffffff]
DMA32 empty
Normal empty
Movable zone start for each node
Early memory node ranges
node 1: [mem 0x0000000000000000-0x0000000fffffffff]
Could not find start_pfn for node 0
Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
On node 0 totalpages: 0
Initmem setup node 1 [mem 0x0000000000000000-0x0000000fffffffff]
On node 1 totalpages: 1048576
DMA zone: 1024 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 1048576 pages, LIFO batch:1
percpu: Embedded 3 pages/cpu @c000000f2dc00000 s133656 r0 d62952 u262144
pcpu-alloc: s133656 r0 d62952 u262144 alloc=1*1048576
pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
pcpu-alloc: [0] 48 49 50 51 [0] 52 53 54 55
pcpu-alloc: [0] 56 57 58 59 [0] 60 61 62 63
Built 2 zonelists in Node order, mobility grouping on. Total pages: 1047552
Policy zone: DMA
Kernel command line: BOOT_IMAGE=/boot/vmlinux-4.6.0-rc1-master+ root=/dev/mapper/fedora_llmjuno03b-root ro rd.md=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 rd.lvm.lv=fedora_llmjuno03b/swap rd.lvm.lv=fedora_llmjuno03b/root nmi_watchdog=0 nohpet ignore_loglevel log_buf_len=10M print_fatal_signals=1 loglevel=8 panic=30
log_buf_len: 16777216 bytes
early log buf free: 122804(93%)
PID hash table entries: 4096 (order: -1, 32768 bytes)
Sorting __ex_table...
Memory: 63497024K/67108864K available (9152K kernel code, 1728K rwdata, 3160K rodata, 832K init, 1040K bss, 253120K reserved, 3358720K cma-reserved)
SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=64, Nodes=2
Hierarchical RCU implementation.
Build-time adjustment of leaf fanout to 64.
RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=64.
RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=64
NR_IRQS:512 nr_irqs:512 16
ICS OPAL backend registered
time_init: decrementer frequency = 512.000000 MHz
time_init: processor frequency = 3300.000000 MHz
clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x761537d007, max_idle_ns: 440795202126 ns
clocksource: timebase mult[1f40000] shift[24] registered
clockevent: decrementer mult[83126e98] shift[32] cpu[0]
Console: colour dummy device 80x25
console [hvc0] enabled
console [hvc0] enabled
bootconsole [udbg0] disabled
bootconsole [udbg0] disabled
mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
pid_max: default: 65536 minimum: 512
Dentry cache hash table entries: 8388608 (order: 10, 67108864 bytes)
Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes)
Mount-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mountpoint-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Unable to handle kernel paging request for data at address 0x00000108
Faulting instruction address: 0xc0000000000f8fbc
cpu 0x0: Vector: 300 (Data Access) at [c000000f1f607720]
pc: c0000000000f8fbc: .cpuacct_charge+0x2c/0x110
lr: c0000000000dd184: .update_curr+0xa4/0x1d0
sp: c000000f1f6079a0
msr: 9000000002009032
dar: 108
dsisr: 40000000
current = 0xc000000f1f582480
paca = 0xc00000000fe00000 softe: 0 irq_happened: 0x01
pid = 2, comm = kthreadd
Linux version 4.6.0-rc1-master+ (srikar@llmjuno) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) #38 SMP Tue Apr 5 00:20:26 IST 2016
enter ? for help
[c000000f1f607a30] c0000000000dd184 .update_curr+0xa4/0x1d0
[c000000f1f607ad0] c0000000000dfc2c .dequeue_task_fair+0xbc/0x1110
[c000000f1f607be0] c0000000000d3c70 .do_set_cpus_allowed+0x1b0/0x1c0
[c000000f1f607c70] c0000000000d4424 .__set_cpus_allowed_ptr+0x194/0x2e0
[c000000f1f607d50] c0000000000c70b8 .kthreadd+0x68/0x2c0
[c000000f1f607e30] c0000000000095e4 .ret_from_kernel_thread+0x58/0x74