memory

Albert Cahalan (albert@ccs.neu.edu)
Thu, 14 Mar 1996 00:01:26 -0500


Please check this:

This document explains how to use memory from kernel space.
The information was primarily collected from the Linux
kernel mailing list in December 1995. I wish to thank the
people who helped answer questions:

Alan Cox <alan@cymru.net>
Jochen Karrer <karrer@wpfd25.physik.uni-wuerzburg.de>
Leonard N. Zubkoff <lnz@dandelion.com>
Mauro Condarelli <mc5686@mclink.it>
Michael Weller <eowmob@exp-math.uni-essen.de>
Rogier Wolff <wolff@socrates.et.tudelft.nl>

Albert Cahalan (all email addresses are unreliable)
<albert@ccs.neu.edu> <acahalan@cs.uml.edu> <acahalan@lynx.neu.edu>

****** Getting memory

get_free_pages()
    max 128 kB
    physically one piece
    OK for DMA
    free with free_pages()
    usable anytime (pass a priority argument, as with kmalloc())
    can only allocate a multiple of the page size (4kB or 8kB)
vmalloc()
    almost no size limit
    one piece in virtual memory, but not in physical memory
    DMA would be really hard; maybe one page at a time is OK
    free with vfree()
    usable anytime [even in an IRQ handler?]
    can only allocate a multiple of the page size (4kB or 8kB)
kmalloc()
    max 128kB minus the header size (waste: adds the header size,
        then rounds up to a power of 2)
    physically one piece
    OK for DMA
    free with kfree()
    usable anytime (pass a priority argument; use GFP_ATOMIC in an
        IRQ handler)
    can allocate tiny fragments (small ones are wasteful)
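The size rules in the list above can be modelled in a few lines of
user-space C. This is only a sketch of the arithmetic, not kernel code:
the 4kB page size, the 16-byte kmalloc header and the 32-byte smallest
block are assumptions, and pages_needed(), page_order() and
kmalloc_block() are hypothetical helpers.

```c
#include <assert.h>

/* Model parameters: assumptions for illustration, not kernel constants. */
#define MODEL_PAGE_SIZE  4096UL           /* i386 page size */
#define MODEL_MAX_CHUNK  (128UL * 1024)   /* the 128 kB limit quoted above */
#define MODEL_KM_HEADER  16UL             /* hypothetical kmalloc header size */

/* Whole pages needed for a request (vmalloc/get_free_pages granularity). */
unsigned long pages_needed(unsigned long bytes)
{
    return (bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* get_free_pages() hands out 2^order contiguous pages: find the smallest
 * order that covers the request. */
unsigned int page_order(unsigned long bytes)
{
    unsigned long pages;
    unsigned int order = 0;

    assert(bytes > 0);                 /* a zero-byte request makes no sense */
    pages = pages_needed(bytes);
    while ((1UL << order) < pages)
        order++;
    return order;
}

/* kmalloc() model: add the header, round up to the next power of two,
 * and fail (return 0) if the block would exceed the 128 kB limit. */
unsigned long kmalloc_block(unsigned long bytes)
{
    unsigned long need = bytes + MODEL_KM_HEADER;
    unsigned long block = 32;          /* hypothetical smallest block */

    while (block < need)
        block <<= 1;
    return block <= MODEL_MAX_CHUNK ? block : 0;
}
```

Under these assumptions page_order(128 * 1024) is 5 (32 pages, the
128 kB maximum), kmalloc_block(1000) is 1024, and kmalloc_block(1020)
is already 2048 because the header pushes it past 1024 bytes.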

For DMA, or for memory smaller than one page, use kmalloc(). kmalloc()
always returns physically contiguous memory. Allocating large contiguous
blocks is difficult and often fails, so the 128K limit is quite
sensible. Pass the proper allocation priority (how hard to try, and
which emergency memory pools to access) and always check the return
value for failure!

The size of physically contiguous chunks is limited by memory
fragmentation [someone want to write a memory defrag?], but the
kernel's virtual address space is 1 GB, so vmalloc() can allocate very
big pieces. For large pieces, use vmalloc(); use get_free_pages() or
kmalloc() only if you need the memory for DMA. (See man 9 kmalloc.)
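The advice in the last two paragraphs can be condensed into a small
decision function. It is a hedged sketch: enum allocator and
choose_allocator() are hypothetical names, and preferring
get_free_pages() over kmalloc() for page-sized and larger DMA buffers
is an assumption (it avoids kmalloc's header overhead).

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL            /* assumed i386 page size */
#define MODEL_MAX_CHUNK (128UL * 1024)    /* largest physically contiguous piece */

/* Hypothetical names for the choices discussed in the text. */
enum allocator {
    USE_KMALLOC,         /* small pieces, DMA or not */
    USE_GET_FREE_PAGES,  /* physically contiguous page multiples, for DMA */
    USE_VMALLOC,         /* large pieces that need not be contiguous */
    USE_NONE             /* no run-time allocator can satisfy the request */
};

enum allocator choose_allocator(unsigned long bytes, int need_dma)
{
    assert(bytes > 0);

    if (bytes < MODEL_PAGE_SIZE)
        return USE_KMALLOC;          /* less than one page: kmalloc() */
    if (!need_dma)
        return USE_VMALLOC;          /* virtually contiguous is enough */
    if (bytes <= MODEL_MAX_CHUNK)
        return USE_GET_FREE_PAGES;   /* DMA needs physically one piece */
    return USE_NONE;                 /* too big for DMA at run time */
}
```

A USE_NONE result means the request cannot be met once the system is
running; a physically contiguous buffer of that size has to be set
aside while booting.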

For physically contiguous buffers bigger than 128kB, the only solution
is to allocate them at system initialization, during the boot sequence.
In ../linux/init/main.c, in start_kernel(), increase memory_start by
the amount of memory required, somewhere after the call to setup_arch()
and before the call to mem_init(). The drawback of this solution is
that such a big buffer (obviously bigger than 128k!) is reserved once
and for all; there is no way to reclaim it when it is not in use. On
the other hand, the buffer is at a known physical address and is
physically contiguous (i.e. it can be used for DMA transfers).
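The boot-time reservation amounts to a simple bump allocator on
memory_start. The sketch below models it in user space;
reserve_boot_buffer() and my_dma_buffer are hypothetical names, and
rounding to page boundaries is an assumption made so that the buffer
starts and ends on a page.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL   /* assumed i386 page size */
#define MODEL_PAGE_ALIGN(x) (((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

/*
 * Hypothetical helper: carve a buffer out of free memory at boot by
 * advancing memory_start. Returns the page-aligned start of the buffer.
 * The buffer lies below the new *memory_start, so mem_init() leaves it
 * reserved.
 *
 * In start_kernel(), after setup_arch() and before mem_init(), it might
 * be used as:  my_dma_buffer = reserve_boot_buffer(&memory_start, 512 * 1024);
 */
unsigned long reserve_boot_buffer(unsigned long *memory_start, unsigned long size)
{
    unsigned long buf;

    assert(size > 0);
    buf = MODEL_PAGE_ALIGN(*memory_start);        /* start on a page boundary */
    *memory_start = MODEL_PAGE_ALIGN(buf + size); /* skip past the buffer */
    return buf;
}
```

Repeated calls hand out consecutive, non-overlapping buffers; none of
them can ever be given back, which is exactly the drawback described
above.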

The code and data of kernel modules are allocated with vmalloc(), so
DMA cannot be used on this memory.

****** user space as seen from kernel

There are two distinct address spaces: kernel and user.
Special care must be taken when transferring data between them.
The *_user_*() routines are inline functions that access memory
through the fs segment register, which points to the user segment.
They translate between kernel addresses and addresses in the
process "current".
[can current just be changed?]
They work even if the process is paged out, in which case they
(indirectly) generate normal page faults to bring in the needed page.

****** specific memory, memory mapped hardware

In the i386 kernel the device area (640K-1MB) is identity
mapped, so you can do things like hardware_ptr=0xD0000.

In 1.3.xx and later you should access this space using memcpy_fromio()
and memcpy_toio(). This is because the DEC Alpha uses similar devices
and memory mapping but does not map the 640K-1MB ISA hole to 640K-1MB
as seen by the CPU (the address lines are shifted because the Alpha
cannot access less than 32 bits at a time). Code that uses
memcpy_fromio() should therefore also work on a DEC Alpha.

For memory above that, such as memory-mapped hardware at 0x80000000,
use vremap():

    void * vremap(unsigned long offset, unsigned long size);
    mypointer = vremap(PhysicalAddress, HowManyBytes);

The address returned is _different_ from the hardware address.

It is not currently possible to evict the processes using memory
that a driver needs. Thus, if there is video memory between 14MB and
16MB, it must be reserved for video only: if gcc ends up running in
that memory, there is no way to move it somewhere else or swap it out.

If your hardware supports the "hole" between 14 and 16MB, you
should be able to keep the kernel from using that memory for
anything else by modifying something like "init/main.c".

If an ISA card is mapped between 14 and 16MB and you have more than
14MB of memory, one of these things may happen:
a: the card does not work, and you get plain RAM instead
b: the card works, and you waste 2MB of RAM
c: you use motherboard-specific code to remap memory
d: Bad Things

Assuming the card works, you can modify mem_init()
in arch/i386/mm/init.c to leave the memory reserved.

You mark a page as reserved with:

    mem_map[MAP_NR(address_of_myreservedpage)].reserved = 1;

By default, all pages are marked as reserved when mem_init() is
entered (see also free_area_init() in mm/swap.c), so you only have to
make sure that mem_init() does not mark the area from 14 to 16MB as
unreserved.

By default, mem_init() marks as unreserved the low memory below 640K
(up to 0x9f000) and the space from 1MB to the end of memory:

    /* low memory: stop at 0x9f000, just below 640K */
    while (start_low_mem < 0x9f000) {
        mem_map[MAP_NR(start_low_mem)].reserved = 0;
        start_low_mem += PAGE_SIZE;
    }
    /* from start_mem (above 1MB) up to high_memory */
    while (start_mem < high_memory) {
        mem_map[MAP_NR(start_mem)].reserved = 0;
        start_mem += PAGE_SIZE;
    }
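The change suggested above can be modelled in user space: start with
every page reserved (as free_area_init() leaves them), then run the
mem_init() loops with an extra test that skips 14-16MB. In this sketch
struct model_page stands in for mem_map entries, MODEL_MAP_NR() assumes
that MAP_NR(addr) is simply addr >> PAGE_SHIFT on i386, and all helper
names are hypothetical.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12                          /* assumed i386 4kB pages */
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_MAP_NR(addr) ((unsigned long)(addr) >> MODEL_PAGE_SHIFT)

#define HOLE_START (14UL << 20)   /* 14MB: start of the ISA card window */
#define HOLE_END   (16UL << 20)   /* 16MB: end of the window (exclusive) */

/* Stand-in for a mem_map entry; only the reserved flag is modelled. */
struct model_page {
    unsigned char reserved;
};

/* Like free_area_init(): every page starts out reserved. */
void mark_all_reserved(struct model_page *mem_map, unsigned long npages)
{
    unsigned long i;

    for (i = 0; i < npages; i++)
        mem_map[i].reserved = 1;
}

/* The two mem_init() loops, with the 14-16MB window left reserved. */
void model_mem_init(struct model_page *mem_map, unsigned long start_low_mem,
                    unsigned long start_mem, unsigned long high_memory)
{
    assert(start_low_mem % MODEL_PAGE_SIZE == 0);
    assert(start_mem % MODEL_PAGE_SIZE == 0);

    while (start_low_mem < 0x9f000) {
        mem_map[MODEL_MAP_NR(start_low_mem)].reserved = 0;
        start_low_mem += MODEL_PAGE_SIZE;
    }
    while (start_mem < high_memory) {
        if (start_mem < HOLE_START || start_mem >= HOLE_END)  /* the only change */
            mem_map[MODEL_MAP_NR(start_mem)].reserved = 0;
        start_mem += MODEL_PAGE_SIZE;
    }
}

/* Count the pages still reserved, e.g. to see how much RAM the card costs. */
unsigned long count_reserved(const struct model_page *mem_map, unsigned long npages)
{
    unsigned long i, n = 0;

    for (i = 0; i < npages; i++)
        n += mem_map[i].reserved;
    return n;
}
```

Called as model_mem_init(map, PAGE_SIZE, 1MB, 32MB) on a 32MB map, the
640K-1MB device area is never touched and the 512 pages between 14MB
and 16MB stay reserved for the card.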

mem_map is exported for use in modules, but pages should be reserved
in mem_init(), before the memory can possibly be used.
(There should be a memexclude option at the boot prompt.)

****** swappable memory

The kernel code is not swappable. It would be very bad to swap
out an IRQ handler, disk driver, swapper, filesystem...
It seems impractical to change this (by marking which code _could_
be swapped), but perhaps init routines could be collected
in one place and thrown out after the system is running.

Swappable data is only available if you have a user mode process
working with you (see for example the multicast router cache).
If you need to allocate swappable memory, consider doing the job
in user space. If this is not possible you must write a daemon
whose sole purpose is storing data for the kernel - and it had
better not crash!

**********************************************************