Re: RFD: x32 ABI system call numbers

From: Arnd Bergmann
Date: Mon Sep 05 2011 - 16:29:01 EST


On Monday 05 September 2011 12:59:50 H. Peter Anvin wrote:
> On 09/05/2011 12:54 PM, H.J. Lu wrote:
> >
> > Since readv/writev/preadv/pwritev have const struct iovec *iov, I
> > have to copy the whole array. compat_sys seems more efficient.
> >
>
> compat_sys for these do exactly what we want, right?

Quoting from compat_rw_copy_check_uvector():

if (nr_segs > fast_segs) {
ret = -ENOMEM;
iov = kmalloc(nr_segs*sizeof(struct iovec), GFP_KERNEL);
if (iov == NULL)
goto out;
}
*ret_pointer = iov;

/*
* Single unix specification:
* We should -EINVAL if an element length is not >= 0 and fitting an
* ssize_t.
*
* In Linux, the total length is limited to MAX_RW_COUNT, there is
* no overflow possibility.
*/
tot_len = 0;
ret = -EINVAL;
for (seg = 0; seg < nr_segs; seg++) {
compat_uptr_t buf;
compat_ssize_t len;

if (__get_user(len, &uvector->iov_len) ||
__get_user(buf, &uvector->iov_base)) {
ret = -EFAULT;
goto out;
}
if (len < 0) /* size_t not fitting in compat_ssize_t .. */
goto out;
if (!access_ok(vrfy_dir(type), compat_ptr(buf), len)) {
ret = -EFAULT;
goto out;
}
if (len > MAX_RW_COUNT - tot_len)
len = MAX_RW_COUNT - tot_len;
tot_len += len;
iov->iov_base = compat_ptr(buf);
iov->iov_len = (compat_size_t) len;
uvector++;
iov++;
}


compared to native rw_copy_check_uvector():

if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
ret = -EFAULT;
goto out;
}

/*
* According to the Single Unix Specification we should return EINVAL
* if an element length is < 0 when cast to ssize_t or if the
* total length would overflow the ssize_t return value of the
* system call.
*
* Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
* overflow case.
*/
ret = 0;
for (seg = 0; seg < nr_segs; seg++) {
void __user *buf = iov[seg].iov_base;
ssize_t len = (ssize_t)iov[seg].iov_len;

/* see if we we're about to use an invalid len or if
* it's about to overflow ssize_t */
if (len < 0) {
ret = -EINVAL;
goto out;
}
if (unlikely(!access_ok(vrfy_dir(type), buf, len))) {
ret = -EFAULT;
goto out;
}
if (len > MAX_RW_COUNT - ret) {
len = MAX_RW_COUNT - ret;
iov[seg].iov_len = len;
}
ret += len;
}

This is better than I thought for the compat version. The only overhead
is in reading the array in word chunks as opposed to a single memcpu for
the native case. This should barely be noticeably within the other stuff
done in the same function. So you are both right, the compat case is good.

I was assuming that this would do something worse, like an extra copy
of the data back to userspace.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/