RE: [PATCH] eCryptfs: Quota check incorrectly ignored

From: Li Wang
Date: Fri Feb 10 2012 - 09:44:14 EST


> -----Original Message-----
> From: Jan Kara [mailto:jack@xxxxxxx]
> Sent: Friday, February 10, 2012 6:32 PM
> To: Li Wang
> Cc: Tyler Hicks; ecryptfs@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
Jan
> Kara; Yong Peng
> Subject: Re: [PATCH] eCryptfs: Quota check incorrectly ignored
>
> On Thu 09-02-12 19:39:32, Li Wang wrote:
> > eCryptfs recently modified the write path to perform encryption
> > and write down in ecryptfs_writepage(). This function invokes
vfs_write()
> > to write down the encrypted data to lower page cache. vfs_write() will
> > first make sure this write will not exceed the quota limit for the owner
> > of the file being written into, if quota is supported by
> > the underlying file system, and it is turned on. Normally, it
accomplishs
> > this job by calling check_idq()/check_bdq() (fs/quota/dquot.c). When
system
> > dirty ratio is not high, ecryptfs_writepage() is normally invoked by the
> > write back kernel thread, who has the capability CAP_SYS_RESOURCE,
> > this priority will let check_idq()/check_bdq() directly bypass the quota
> > check. This sometimes makes data safely written into the disk in spite
of
> > exceeding the quota limit.
> >
> > This patch temporarily removes the CAP_SYS_RESOURCE capability from the
> kernel
> > thread before invoking vfs_write_lower(), to let it undergo quota check
by
> > the lower file system, if necessary. After that, reassign the
capability.
> Hmm, but then the error will just be thrown away by the flusher thread
> and the application never learns about it? That doesn't sound like a good
> solution.
>
> Honza

Yes. we are aware of that, but we have not found better solution, since it
seems
VFS does not supply a file system independent interface to check quota early
and only
(fix us if we are wrong), The file systems just perform the quota check in
their own way,
the routine is wrapped deeply inside the file system specific code path.
Maybe we could
copy some codes from _dquot_alloc_space() & dquot_alloc_inode
(fs/quota/dquot.c) to
perform quota check on lower inode ourselves, but that is ugly, and we are
not sure if it works
for any file system or not..., On the other side, due to the existence of
write buffer, and io schedule,
the successful write call does not gurantee the data will be written into
disk, in terms of that,
we do not change much of the semantic of write call.

>
>
> >
> > Signed-off-by: Li Wang <liwang@xxxxxxxxxxx>
> > Singed-off-by: Yong Peng <pengyong@xxxxxxxxxxxxxx>
> > ---
> >
> > To repeat this bug,
> > mount -o usrquota /dev/sda3 /tmp
> > cd /tmp
> > edquota -u foo // set the disk quota limit for user foo be m1 bytes
> > quotaon -a
> > mount -t ecryptfs cipher plain
> >
> > login the system as user foo
> > cd /tmp/plain
> > execute the following simple program
> >
> > int main()
> > {
> > char buf[m2]; // m2>m1
> > FILE *f = fopen("dummy", "w");
> > fwrite(buf, 1, m2, f);
> > sleep(60); // let the kernel thread do the write back job
> > fclose(f);
> > return 0;
> > }
> >
> > This program can write as much of data as it wants, provided sleep
> enough long time before closing the file.
> >
> > fs/ecryptfs/crypto.c | 27 +++++++++++++++++++++++++++
> > 1 files changed, 27 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
> > index 63ab245..2c1da29 100644
> > --- a/fs/ecryptfs/crypto.c
> > +++ b/fs/ecryptfs/crypto.c
> > @@ -457,6 +457,7 @@ int ecryptfs_encrypt_page(struct page *page)
> > struct page *enc_extent_page = NULL;
> > loff_t extent_offset;
> > int rc = 0;
> > + struct cred *cred;
> >
> > ecryptfs_inode = page->mapping->host;
> > crypt_stat =
> > @@ -487,8 +488,34 @@ int ecryptfs_encrypt_page(struct page *page)
> > * (PAGE_CACHE_SIZE
> > / crypt_stat->extent_size))
> > + extent_offset), crypt_stat);
> > + if (current->flags & PF_KTHREAD) {
> > + /*
> > + * Temporarily remove the
> CAP_SYS_RESOURCE capability
> > + * from the write back kernel thread to let it
> undergo
> > + * quota check by the lower file system
> > + */
> > + cred = prepare_creds();
> > + if (unlikely(!cred)) {
> > + rc = -ENOMEM;
> > + goto out;
> > + }
> > + cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
> > + commit_creds(cred);
> > + }
> > rc = ecryptfs_write_lower(ecryptfs_inode, enc_extent_virt,
> > offset, crypt_stat->extent_size);
> > + if (current->flags & PF_KTHREAD) {
> > + /*
> > + * Reassign the CAP_SYS_RESOURCE
> capability
> > + */
> > + cred = prepare_creds();
> > + if (unlikely(!cred)) {
> > + rc = -ENOMEM;
> > + goto out;
> > + }
> > + cap_raise(cred->cap_effective, CAP_SYS_RESOURCE);
> > + commit_creds(cred);
> > + }
> > if (rc < 0) {
> > ecryptfs_printk(KERN_ERR, "Error attempting "
> > "to write lower page; rc = [%d]"
> > --
> > 1.7.6.5
> --
> Jan Kara <jack@xxxxxxx>
> SUSE Labs, CR

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/