Re: Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

From: Matthias Dahl
Date: Wed Jul 13 2016 - 11:33:09 EST


Hello...

On 2016-07-13 15:47, Michal Hocko wrote:

This is getting out of my area of expertise so I am not sure I can help
you much more, I am afraid.

That's okay. Thank you so much for investing the time.

For what it is worth, I did some further tests and here is what I came
up with:

If I create the plain dm-crypt device with --perf-submit_from_crypt_cpus,
I can run the tests for as long as I want but the memory problem never
occurs, meaning buffer/cache increase accordingly and thus free memory
decreases but used mem stays pretty constant low. Yet the problem here
is, the system becomes sluggish and throughput is severely impacted.
ksoftirqd is hovering at 100% the whole time.

Somehow my guess is that normally dm-crypt simply takes every request,
encrypts it and queues it internally by itself. And that queue is then
slowly emptied to the underlying device kernel queue. That is why I am
seeing the exploding increase in used memory (rather than in buffer/cache)
which in the end causes a OOM situation. But that is just my guess. And
IMHO that is not the right thing to do (tm), as can be seen in this case.

No matter what, I have no clue how to further diagnose this issue. And
given that I already had unsolvable issues with dm-crypt a couple of
months ago with my old machine where the system simply hang itself or
went OOM when the swap was encrypted and just a few kilobytes needed to
be swapped out, I am not so sure anymore I can trust dm-crypt with a
full disk encryption to the point where I feel "safe"... as-in, nothing
bad will happen or the system won't suddenly hang itself due to it. Or
if a bug is introduced, that it will actually be possible to diagnose it
and help fix it or that it will even be eventually fixed. Which is really
a pity, since I would really have liked to help solve this. With the
swap issue, I did git bisects, tests, narrowed it down to kernel versions
when said bug was introduced... but in the end, the bug is still present
as far as I know. :(

I will probably look again into ext4 fs encryption. My whole point is
just that in case any of disks go faulty and needs to be replaced or
sent in for warranty, I don't have to worry about mails, personal or
business data still being left on the device (e.g. if it is no longer
accessible or has reallocated sectors or whatever) in a readable form.

Oh well. Pity, really.

Thanks again,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
services: custom software [desktop, mobile, web], server administration