Re: Change in functionality of futex() system call.

From: Kyle Moffett
Date: Thu Jun 09 2011 - 00:45:16 EST

Next message: Daisuke Nishimura: "Re: [BUGFIX][PATCH] memcg: fix wrong decision of noswap withsoftlimit."
Previous message: George Spelvin: "Re: Change in functionality of futex() system call."
In reply to: Eric Dumazet: "Re: Change in functionality of futex() system call."
Next in thread: Peter Zijlstra: "Re: Change in functionality of futex() system call."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Jun 8, 2011 at 23:54, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> On Wednesday, 08 June 2011 at 23:38 -0400, Andrew Lutomirski wrote:
>> Huh?
>>
>> I still don't understand why userspace ought to need to deny read
>> access to a file to prevent DoS. I think it's entirely reasonable for
>> userspace to make the assumption that users with read access cannot
>> make changes visible to writers unless explicitly documented (i.e.
>> file locking, which is so thoroughly broken that it shouldn't be taken
>> as an example of how to design anything).
>>
>> Given that current kernels make this use safe and the proposal is to
>> make it unsafe, I think it's worth designing the interface to avoid
>> introducing new security problems.
>
> I am very tired of this discussion, you repeat the same arguments over
> and over.
>
> You can not prevent DOS on a machine if you allow a process to RO map
> your critical files (where you put futexes), because you allow this
> process to interfere with critical cache lines bouncing between cpus.
>
> Really, please forget about this crazy idea of allowing foreigners to
> _read_ or memory _map_ your files. Dont do it.

The issue is NOT that things get "slow". There are lots of ways to do that
in an untrusted process on a normal Linux system. Chewing CPU time
and reading random small files from all over the disk are the easiest ones,
and most Linux distributions ship lots of such files in directories
such as /usr/share/doc, /usr/share/zoneinfo, various locales directories, etc.

The issue is that this allows you to eat wakeups and make processes hang.

One relatively trivial example would be a database library like libdb or
similar. The library could very reasonably use futexes to communicate
between multiple simultaneous threads writing to the same database
file. Since the library wants to be well-behaved and avoid thundering
herd problems, it only issues a single wakeup for each lock release.
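
As an illustration of the "single wakeup per release" pattern, here is a
minimal sketch of a futex-backed lock. It is not taken from libdb or any
real library: the names futex_op(), demo_lock() and demo_unlock() and the
three-state lock word are assumptions made for the example. The only point
is that the release path asks the kernel to wake at most one sleeper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw syscall.
 * Argument order follows futex(2): uaddr, op, val, timeout, uaddr2, val3. */
static long futex_op(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Lock word states: 0 = free, 1 = held, 2 = held, sleepers possible. */
static void demo_lock(atomic_int *word)
{
    int c = 0;

    if (atomic_compare_exchange_strong(word, &c, 1))
        return;                           /* uncontended fast path */
    if (c != 2)
        c = atomic_exchange(word, 2);     /* announce that we may sleep */
    while (c != 0) {
        futex_op(word, FUTEX_WAIT, 2);    /* sleeps only while *word == 2 */
        c = atomic_exchange(word, 2);
    }
}

/* Returns the number of sleepers woken: 0 or 1, never more. */
static long demo_unlock(atomic_int *word)
{
    int prev = atomic_exchange(word, 0);

    assert(prev != 0 && "unlock of a lock that was not held");
    if (prev == 2)
        return futex_op(word, FUTEX_WAKE, 1);   /* the single wakeup */
    return 0;
}
```

Because demo_unlock() never wakes more than one sleeper, the scheme only
works if every process sleeping on the word is a cooperating lock user.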

Now you have another program which uses the same database library
to do lockless queries of the DB file. This is all well and good
except that it can now permanently hang an unlimited number of writer
threads in FUTEX_WAIT with trivial effort and virtually zero CPU.

All the attacker process needs to do is mmap() one page containing
one lock that the victim threads take occasionally and do this in a loop:
    int *victimfutex = [...];
    while (1)
        /* futex(2) order: uaddr, op, val, timeout, uaddr2, val3 */
        futex(victimfutex, FUTEX_WAIT, *victimfutex, NULL, NULL, 0);

Suddenly read-only access to *ANY* database file that happens to
use an in-file futex means that you can hang the database... period.
If you write it in ASM, you could probably even start a whole bunch
of threads in parallel by sharing the same stack.
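
The wakeup-eating effect itself can be reproduced harmlessly. The sketch
below rests on several assumptions that are not part of the scenario
above: it uses a shared anonymous mapping instead of a database file, two
forked children instead of unrelated processes, and a one-second timeout
so that the demo terminates. The helper names futex_call(), waiter() and
count_woken_after_single_wake() are made up for the example. It shows only
that one FUTEX_WAKE for one waiter is consumed by exactly one sleeper,
whoever that happens to be.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static_assert(sizeof(int) == 4, "futex words are 32 bits wide");

/* Hypothetical raw-syscall wrapper; argument order follows futex(2). */
static long futex_call(int *uaddr, int op, int val,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0);
}

/* Child body: sleep on the word once; exit 0 if woken, 1 otherwise. */
static void waiter(int *word)
{
    struct timespec limit = { .tv_sec = 1, .tv_nsec = 0 };
    long rc = futex_call(word, FUTEX_WAIT, 0, &limit);

    _exit(rc == 0 ? 0 : 1);
}

/* Fork two waiters on one shared word, issue a single wakeup for one
 * waiter, and return how many children were woken (-1 on setup failure). */
static int count_woken_after_single_wake(void)
{
    int *word = mmap(NULL, sizeof *word, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    pid_t pids[2];
    int woken = 0;

    if (word == MAP_FAILED)
        return -1;
    *word = 0;

    for (int i = 0; i < 2; i++) {
        pids[i] = fork();
        if (pids[i] < 0)
            return -1;        /* an earlier child simply times out */
        if (pids[i] == 0)
            waiter(word);     /* never returns */
    }

    /* Retry until some sleeper is queued and consumes the one wakeup. */
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    for (int tries = 0; tries < 50; tries++) {
        if (futex_call(word, FUTEX_WAKE, 1, NULL) == 1)
            break;
        nanosleep(&pause, NULL);
    }

    for (int i = 0; i < 2; i++) {
        int status;

        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            woken++;
    }
    munmap(word, sizeof *word);
    return woken;
}
```

In the demo the second child merely times out. A writer thread blocked in
FUTEX_WAIT with no timeout has no such escape if its wakeup went to some
other sleeper.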

Even better from a DoS standpoint, this does not trigger any resource
limits the way other attacks would, because you are sleeping 99.999%
of the time and are using no memory. On top of that, once all of the
program's threads are stuck you can exit and it will just stay stuck.

This kind of thing is incredibly common in web-applications and other
similar environments, where "www-data" should be allowed to query
various file databases which are maintained by another daemon.

If the C library happens to use an in-file futex for arbitrating processes
writing to /var/run/utmp or /var/log/lastlog, it suddenly becomes trivial
to lock up every login process.

That is why FUTEX_WAIT needs separate handling for read-only files.

Cheers,
Kyle Moffett
