Montreal Linux Power Management Mini-Summit, July 13, 2009 - Meeting Notes

From: Len Brown
Date: Thu Jul 30 2009 - 18:05:25 EST


A Linux Power Management "mini-summit" was held on July 13th, 2009 -
on the first day of the Montreal Linux Symposium.

The Linux Symposium generously provided the facilities.

We repeated the process used in 2008: http://lwn.net/Articles/292447/

This year the meeting room was more accessible to the general attendees
of the Linux Symposium, so we had a fair number of "drop-ins".
25 signed in (listed below) plus a few more that came and went.
While this exceeded our cap of 20, the extra people did not hinder
our goal of focusing on a single discussion.

Attendees
---------
Len Brown - Intel - ACPI, SFI, Suspend co-Maintainer
Howard Alyne - Wind River
Pierre Phaneuf
Rafael J. Wysocki - SUSE Labs/Novell, U. Warsaw; Hibernate and Suspend Maintainer
Per-Inge Tallberg - Ericsson
Rickard Andersson - Ericsson
Paul Mundt - Renesas - SH Maintainer
Magnus Damm - Renesas
Richard Woodruff - Texas Instruments, OMAP
Stephen Hui - Zarlink
John Linville - Red Hat - Wireless LAN maintainer
Mark Brown - Marvell
Samuel Thibault - labri.fr
Lucas Nussbaum - inria.fr
Srinivas Sripathi - Motorola
Jason Baron - Red Hat
Aristeu Rozanski - Red Hat - RHEL6 kernel maintainer
Christopher Curtis - RipTide Software
Klaus Pedersen - Nokia
H. Peter Anvin - Intel - x86 maintainer
Ernest Szedeman - Nortel
Rick Leir - Leirtech
David Ahern - Cisco
Wending Wen - Rheinmetall
Jason Chagas - Marvell

Some of the attendees are in photos here:
http://picasaweb.google.com/lenb417/2009LinuxSymposium#

Agenda
------
1. Review changes over the last year
2. Survey tools, techniques, workloads
3. Discuss upcoming work

Summary of Power Management kernel changes since last year
----------------------------------------------------------
ACPI Platform BIOS compatibility fixes
ACPI ACPI_SCI_EN work-around
resume memory corruption workarounds

hibernation:
NVS memory handling
handle overlapping memory zones

suspend/resume framework re-work (Rafael Wysocki)
shipped suspend/resume RTC test feature
ordering update/workaround
simplified driver interface now available
r8169 etc. drivers now using it
PCI PM framework re-worked to simplify drivers
graphics drivers better support suspend/resume
i915 video restore, though has bugs
ATI making progress, especially older cards
NVIDIA - continues to trail
no open source support for devices after 7200
power aware scheduling
sched_mc_power_savings
per-CPU timers fixed
clock_events_broadcast()
bugs fixed
(no longer needed on Westmere, which has always running LAPIC timer)
range timers shipped upstream
e.g. Android uses range timers to group timer expirations around wireless activity
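
A rough illustration of why range timers help: if each timer carries a
window (earliest acceptable time, latest acceptable time) instead of a
single expiry, timers whose windows overlap can share one wakeup. The
sketch below is a user-space model only, not the kernel implementation;
the function name and the sample windows are made up for illustration.

```python
def coalesce_wakeups(timers):
    """Group range timers into as few wakeups as this greedy pass finds.

    Each timer is a (soft, hard) pair: it may fire no earlier than
    `soft` and must fire no later than `hard`. Returns a list of
    (wakeup_time, timers_fired_at_that_wakeup).
    """
    # Earliest hard deadline first: that deadline forces the next wakeup.
    pending = sorted(timers, key=lambda t: t[1])
    wakeups = []
    while pending:
        fire_at = pending[0][1]
        # Every timer whose window has already opened rides along.
        fired = [t for t in pending if t[0] <= fire_at]
        pending = [t for t in pending if t[0] > fire_at]
        wakeups.append((fire_at, fired))
    return wakeups


# Hypothetical windows, in milliseconds.
sample = [(10, 20), (15, 30), (18, 19), (40, 50), (45, 60)]
for when, fired in coalesce_wakeups(sample):
    print(when, fired)
```

With exact expiries the five sample timers would need five wakeups; with
windows they collapse into two.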

Intel shipped Nehalem (Core i7), which has always-running-TSC

Run Time power management is receiving some attention now.

OMAP (Richard Woodruff)
2008 had TI releasing aggressive full-off reference code on public portals
Customers snapshotted this code at different points
Heavy support burden ramping variants into production
Linux-OMAP community have been creating a cleaner version of aggressive PM
code suitable for mainline kernel in Linux-OMAP PM branch.
Hope of reduced burden for future kernels with mainlined code

ACPI sub-system (Len Brown)
quality has been the focus for the last year.
We continue to process about 300 bugs/year
with 50-60 unresolved at any given time.

Wireless: (John Linville)
mac80211 is now suspend/resume aware
IEEE 802.11 has run-time power saving features
e.g. negotiate w/ access point
starting to deploy in drivers
beacon filtering (reduces CPU wake-ups)
TX power upcoming in cfg80211 API
Nokia tablets pushing power savings

SH: (Paul Mundt)
cpuidle integration
using clocksources & clockevents from upstream
can switch between timers depending on sleep states
Hibernate & STR enabled, can test w/ RTC & kexec-jump-and-return

s390:
added suspend/resume support

5-second boot on Atom netbook for Moblin
async API is upstream
Fedora Core-11 boots in 20 seconds on a notebook
Down from 60 seconds in Fedora Core-10

PM-QOS shipped
Documentation/power/pm_qos_interface.txt
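
For reference, the user-space side of PM-QOS as described in that file
works by opening a device node, writing a 32-bit value, and keeping the
file open for as long as the request should hold. The sketch below
assumes the /dev/cpu_dma_latency node and sufficient privileges to open
it; the helper names are hypothetical.

```python
import os
import struct
from contextlib import contextmanager


def encode_latency(usec):
    """Pack a latency bound in microseconds as a signed 32-bit value."""
    return struct.pack("=i", usec)


@contextmanager
def cpu_dma_latency(usec, path="/dev/cpu_dma_latency"):
    """Hold a CPU DMA latency request for the duration of a with-block.

    Assumes `path` exists and is writable by the caller (usually root).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        os.write(fd, encode_latency(usec))
        yield
    finally:
        # Closing the node drops the request again.
        os.close(fd)
```

Usage would look like `with cpu_dma_latency(20): run_latency_sensitive_work()`,
where `run_latency_sensitive_work` stands in for the caller's own code.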

Survey of Tools, Techniques, workloads for optimizing power management
----------------------------------------------------------------------
powertop
bootchart
bootgraph
CONFIG_POWER_TRACER=y
LTT-lite
performance counters for energy coming
OMAP uses on-board instrumentation
suspend/resume debug I/F
Power meters:
O(100) Watts Up Pro; O(600) Extech; O(1000) Yokogawa
O(600) HP/Agilent 34401A
OMAP: measure per-power-plane w/ lab instruments
500mA vs uA range difficult to measure w/ precision
multi-channel DAC - each channel calibrated to range
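
A quick worked number shows why one measurement range cannot cover both
cases. The sketch assumes an ideal converter with 12 bits of resolution,
which is an illustrative figure and not something stated at the meeting.

```python
def converter_step_amps(full_scale_amps, bits):
    """Smallest current step an ideal converter of `bits` bits can resolve."""
    return full_scale_amps / (2 ** bits)


# Assumed 12-bit converter spanning a 500 mA range.
step = converter_step_amps(0.5, 12)
print(f"{step * 1e6:.1f} uA per step")
```

Under that assumption one step is roughly 122 uA, so a sleep current of
a few uA disappears below the resolution of a channel sized for 500 mA.
Hence a separate channel calibrated for each range.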

Workloads for measuring power:

handheld: no standard workloads
however device vendors have internal benchmarks
#1 idle
#2 specific workloads
#3 combination use-case

SpecPower benchmark for servers (only)

Energy Star for client computers
idle only
requires STR to be enabled by default
Energy Star Server spec coming
Future Energy Star wants to use energy benchmark
BAPCO
MobileMark 2007 for Windows
Apple joined, so expect something new to work also on Apple
No Linux Distro representation
EEMBC
released something or other...
BLTK (Battery Life Toolkit) for Linux
http://www.lesswatts.org/projects/bltk/
could use refresh
could use new handheld workloads

Future plans for the PM development, kernel side
-------------------------------------------------
cpuidle C-states generalized to be platform idle states...
platform driver can hide platform hooks into CPU power states
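
As a sketch of what "platform idle states" could look like to a
governor: each state advertises an exit latency and a minimum worthwhile
residency, and the deepest state that fits both the predicted idle time
and the current latency limit gets picked. This is a simplified model,
not the cpuidle code; the state names and numbers are invented.

```python
from collections import namedtuple

IdleState = namedtuple("IdleState", "name exit_latency_us target_residency_us")


def pick_idle_state(states, predicted_idle_us, latency_limit_us):
    """Return the deepest state that fits; `states` is ordered shallow to deep."""
    chosen = states[0]  # the shallowest state is always allowed
    for state in states[1:]:
        if state.exit_latency_us > latency_limit_us:
            continue  # waking up would take too long
        if state.target_residency_us > predicted_idle_us:
            continue  # not idle long enough to pay back the entry cost
        chosen = state
    return chosen


# Hypothetical table: two CPU states plus one platform-wide state.
STATES = [
    IdleState("C1", 1, 1),
    IdleState("C3", 50, 200),
    IdleState("platform-off", 1000, 10000),
]
print(pick_idle_state(STATES, predicted_idle_us=500, latency_limit_us=100).name)
```

The platform-wide state is just one more row in the table, which is the
appeal of hiding platform hooks behind the same interface.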

Runtime PM for Platform Devices.
2.6.32 framework plan simmering
SH running on top of prototype now
context save/restore for power off power domain

platform devices
SH specific - Magnus
IO devices
eg PCI, USB - Alan Stern

clock framework (started in ARM, now common on embedded)
includes ref-counts/clock
architecture specific implementation
x86/ACPI system doesn't expose clock dependencies
so unclear benefit to that arch
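
The ref-count idea can be modeled in a few lines: a clock is gated on
when its first user enables it, gated off when its last user disables
it, and the count propagates to the parent clock. This is a toy model
with hypothetical clock names, not any architecture's implementation.

```python
class Clock:
    """Toy reference-counted clock with an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usecount = 0

    def enable(self):
        if self.usecount == 0:
            if self.parent is not None:
                self.parent.enable()  # a child needs its parent running
            # real code would ungate the hardware clock here
        self.usecount += 1

    def disable(self):
        if self.usecount == 0:
            raise RuntimeError(f"unbalanced disable of {self.name}")
        self.usecount -= 1
        if self.usecount == 0:
            # real code would gate the hardware clock here
            if self.parent is not None:
                self.parent.disable()

    @property
    def running(self):
        return self.usecount > 0


pll = Clock("pll")
uart = Clock("uart", parent=pll)
lcd = Clock("lcd", parent=pll)
uart.enable()
lcd.enable()
uart.disable()
print(pll.running)  # the lcd clock still holds a reference on the pll
```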

Run-time PM of I/O devices, from the PCI POV mostly
ability to put device into D1/D2 (~200us) /D3 (10ms)

wakeup: PCIe #PME plug-event via root port
(PCI #PME is less well specified)

ACPI 4.0 adds D3hot
Q: has an effect on _SD3?

Hibernate/suspend:

Axiom: we need more people fixing suspend/resume bugs

Suspend2 aka "Tux on Ice"
Spring 2009 patch set to replace hibernate w/ TOI was
deemed impractical by upstream community, which prefers
an incremental approach.

Since then, Nigel has sent specific patches to Rafael along
the lines of gradual cherry-picking that upstream needs.

First example is patch to compress hibernation image
which Rafael thinks can be integrated.

TOI is able to save larger hibernate images due to
how it manages memory. This is a nice benefit and
we'd like to see if we can do it upstream.

patch review bandwidth limited

1. image compression
2. image saving performance
currently very slow
3. ability to use multiple devices to save images
including multiple swaps, and regular files
4. break the half-of-memory image limitation
5. Image encryption (solution for keys is an issue)
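
To illustrate item 1 (and why it also helps item 2: fewer bytes to
write), here is a user-space sketch that compresses an image one page at
a time. zlib is only a stand-in available in the Python standard
library; it is not a statement about which algorithm the hibernation
patches use, and the sample image is synthetic.

```python
import zlib

PAGE_SIZE = 4096


def compress_image(image, level=1):
    """Compress a memory image page by page; returns the compressed pages."""
    pages = [image[i:i + PAGE_SIZE] for i in range(0, len(image), PAGE_SIZE)]
    return [zlib.compress(page, level) for page in pages]


def decompress_image(chunks):
    """Rebuild the original image from its compressed pages."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)


# Synthetic image: some zero pages plus repetitive data.
image = bytes(PAGE_SIZE * 4) + b"kernel data " * 1000
chunks = compress_image(image)
print(len(image), sum(len(c) for c in chunks))
```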

It would be great to have Nigel supporting upstream hibernate.

TOI supports snapshot boot via "kiosk mode"

Hibernate & kexec
kexec-jump is upstream (i386, SH, no x86_64)
simplifies memory management of the "jumped to" code
unclear if any other advantages.

kexec-crash-dump is useful
can make an oops "look less scary" and be automatic

STR performance
eliminate console switch
async device resume

android submitted "auto-suspend" patches
compromise between low-level and high-level suspend invocation policy.

cpuidle vs auto-suspend
suspend is more "draconian", it stops timers etc for you.
platform drivers in cpuidle can get to same place.

Android
OHA - Open Handset Alliance
controls android license(s)
Android = access to app-store
Moblin
shall support Android applications

OMAP & SH specifics

UIO - user space codec etc. have no concept of PM
could use clock framework extension
(clock framework is accessible via debugfs if necessary)
interrupt coalescing
deferred I/O to LCD
delay until regular (infrequent) update interval
use x-damage API to track change to visible screen
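
The deferred LCD update idea can be sketched as: accumulate damaged
regions as they are reported, and only push the merged region to the
panel once per update interval. This is a toy model with made-up names;
it does not use the real X Damage API.

```python
class DeferredDisplay:
    """Accumulate screen damage and flush it at most once per interval."""

    def __init__(self, flush_interval):
        self.flush_interval = flush_interval
        self.last_flush = 0.0
        self.damage = None  # bounding box (x0, y0, x1, y1), or None

    def mark_damage(self, x0, y0, x1, y1):
        """Record a changed rectangle by growing the pending bounding box."""
        if self.damage is None:
            self.damage = (x0, y0, x1, y1)
        else:
            dx0, dy0, dx1, dy1 = self.damage
            self.damage = (min(dx0, x0), min(dy0, y0), max(dx1, x1), max(dy1, y1))

    def tick(self, now):
        """Return the region to push to the panel, or None to stay idle."""
        if self.damage is None:
            return None  # nothing changed, so no I/O at all
        if now - self.last_flush < self.flush_interval:
            return None  # too soon, keep accumulating
        region, self.damage = self.damage, None
        self.last_flush = now
        return region


lcd = DeferredDisplay(flush_interval=1.0)
lcd.mark_damage(0, 0, 10, 10)
lcd.mark_damage(5, 5, 20, 15)
print(lcd.tick(0.5), lcd.tick(1.0))
```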

SH running cpufreq on top of clock framework
cpufreq has notifiers, clock framework does not

lightweight CPU hotplug
IBM proposed "idle throttling" approach using scheduler
Intel is proposing simple "forced idle" RT thread
PeterZ likes neither implementation, but
favors the IBM approach in the long term.

SH SMP wants to run Itron on some cores...
low latency transition is important

Memory Power Management
Nokia project w/ U. in Brazil
more pain than gain in memory offline prototype
"partial RAM self refresh"

page tables for kernel memory would allow
moving kernel physical memory

memory off-line incompatible with high-performance interleaving

using NUMA node to segment memory allows tracking
unused memory
anti-fragmentation went upstream last year

consensus: online/offline
node granularity only

ACPI 4.0 was published
Error Reporting extensions
processor aggregator device (forced idle to save power)
D3hot
generalized fan support
thermal extensions
IPMI op-region

Len will do a Linux ACPI 4.0 presentation this Fall

virtualization power management
PM is still an afterthought in the VMM space
they have bigger problems

KVM gets everything in Linux for free
but could benefit from more info from the guests

Xen gets to re-invent/port/re-implement everything in Linux

VMMs have an easier time moving physical pages
and thus doing memory power management
