Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was Re: Question: how to handle too big f_pos

From: Américo Wang
Date: Wed Sep 16 2009 - 04:20:44 EST


On Wed, Sep 16, 2009 at 1:29 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>
> The problem:
>> I'm writing a patch against /dev/kmem...I found a problem.
>>
>> /dev/kmem (and /proc/<pid>/mem) puts a virtual address in f->f_pos.
>>
>> but f->f_pos is always negative and rw_verify_area() always returns -EINVAL.
>
> Changed CC: List.
>
> This is a trial patch to consider how to fix the negative f_pos problem shown above.
>
> Hmm, even after this patch, x86's vsyscall area is not readable.
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
> But that is probably not a problem. (It cannot be read now, anyway.)
>
> I tested /dev/kmem on x86-64 and this works fine. I added a fix for
> /proc/<pid>/mem because I know ia64's hugetlb area is not readable
> via /proc/<pid>/mem. (But I'm not sure whether other 64-bit archs have
> this kind of problem in /proc/<pid>/mem.)
>
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
>
> Modifying rw_verify_area()'s negative f_pos check.
>
> Now, rw_verify_area() has this check:
>   if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
>                return -EINVAL;
>
> As a result, access to special files such as /dev/mem, /dev/kmem and
> /proc/<pid>/mem returns an unexpected -EINVAL.
> (For example, ia64 maps hugetlb at the 0x8000000000000000- region.)
>
> This patch tries to make the range check more precise by using the
> llseek ops defined per special file.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> ---
>  fs/proc/base.c  |   22 +++++++++++++++++-----
>  fs/read_write.c |   39 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 54 insertions(+), 7 deletions(-)
>
> Index: mmotm-2.6.31-Sep14/fs/read_write.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> +++ mmotm-2.6.31-Sep14/fs/read_write.c
> @@ -205,6 +205,37 @@ bad:
>  }
>  #endif
>
> +static int
> +__verify_negative_pos_range(struct file *file, loff_t pos, size_t count)
> +{
> +       unsigned long long upos, end;
> +       loff_t ret;
> +
> +       /* disallow overflow */
> +       upos = (unsigned long long)pos;
> +       end = upos + count;
> +       if (end < pos)
> +               return -EOVERFLOW;
> +       /*
> +        * Sanity check: the subsystem has to provide llseek to handle big pos.
> +        * The subsystem's llseek should verify f_pos's value by comparing it
> +        * with its max file size.
> +        * Note1: generic file ops' llseek cannot handle negative pos.
> +        * Note2: should we take care of pos == -EINVAL ?
> +        * Note3: we check flags and ops here to avoid taking locks in
> +        * default_llseek.
> +        */
> +       ret = -EINVAL;
> +       if ((file->f_mode & FMODE_LSEEK) &&
> +           (file->f_op && file->f_op->llseek)) {
> +               ret = vfs_llseek(file, 0, SEEK_CUR);
> +               if (ret == pos)
> +                       return 0;
> +       }
> +
> +       return (int)ret;
> +}
> +
>  /*
>   * rw_verify_area doesn't like huge counts. We limit
>   * them to something that fits in "int" so that others
> @@ -222,8 +253,12 @@ int rw_verify_area(int read_write, struc
>         if (unlikely((ssize_t) count < 0))
>                 return retval;
>         pos = *ppos;
> -       if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> -               return retval;
> +       if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> +               /* some files require special care */
> +               retval = __verify_negative_pos_range(file, pos, count);
> +               if (retval)
> +                       return retval;
> +       }
>
>         if (unlikely(inode->i_flock && mandatory_lock(inode))) {
>                 retval = locks_mandatory_area(
> Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> ===================================================================
> --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> @@ -903,18 +903,30 @@ out_no_task:
>
>  loff_t mem_lseek(struct file *file, loff_t offset, int orig)
>  {
> +       struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
> +       unsigned long long new_offset = -EINVAL;


Why not make 'new_offset' a loff_t? That would make your code simpler.
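
To make the trade-off concrete, here is a minimal userspace sketch (not kernel code) of the same bounds check with each type. DEMO_TASK_SIZE and both helper names are made up for illustration, and the limit is only an assumed stand-in for TASK_SIZE_OF(task), whose real value is per-arch:

```c
#include <stdint.h>

/* Hypothetical limit standing in for TASK_SIZE_OF(task); real values are per-arch. */
#define DEMO_TASK_SIZE 0x0000800000000000ULL

/* Unsigned offset, as in the quoted patch: one comparison also rejects
 * values that would be negative as a signed type (such as -EINVAL). */
static int in_range_unsigned(uint64_t off)
{
        return off < DEMO_TASK_SIZE;
}

/* Signed (loff_t-like) offset: negative values must be rejected explicitly
 * before comparing against the limit. */
static int in_range_signed(int64_t off)
{
        return off >= 0 && (uint64_t)off < DEMO_TASK_SIZE;
}
```

With a signed type the check needs the extra `off >= 0` test; with an unsigned type an error value such as -EINVAL compares as a very large number and fails the single comparison.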

> +
> +       if (!task) /* lseek's spec doesn't allow -ESRCH but... */


No worries, many proc files already return ESRCH.

> +               return -ESRCH;
> +
>         switch (orig) {
>         case 0:
> -               file->f_pos = offset;
> +               new_offset = offset;
>                 break;
>         case 1:
> -               file->f_pos += offset;
> +               new_offset = (unsigned long long)file->f_pos + offset;
>                 break;
>         default:
> -               return -EINVAL;
> +               new_offset = -EINVAL;
> +               break;
>         }
> -       force_successful_syscall_return();
> -       return file->f_pos;
> +       if (new_offset < (unsigned long long)TASK_SIZE_OF(task)) {


Hmm, why this check?

> +               file->f_pos = (loff_t)new_offset;
> +               force_successful_syscall_return();
> +       } else
> +               new_offset = -EINVAL;
> +       put_task_struct(task);
> +       return (loff_t)new_offset;
>  }
>
>  static const struct file_operations proc_mem_operations = {
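
For reference, the sign problem from the patch description can be reproduced in userspace. This is only a sketch, assuming a 64-bit loff_t modelled as int64_t; old_check_rejects() and range_wraps() are illustrative names, not kernel functions:

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the old rw_verify_area() test: returns 1 if (pos, count)
 * would be rejected with -EINVAL. */
static int old_check_rejects(int64_t pos, size_t count)
{
        /* Add as unsigned to avoid signed-overflow UB in userspace, then
         * reinterpret as signed (wraps on common compilers such as gcc/clang). */
        int64_t end = (int64_t)((uint64_t)pos + count);

        return pos < 0 || end < 0;
}

/* Mirrors the wraparound test in __verify_negative_pos_range():
 * returns 1 if pos + count wraps past the top of the 64-bit space. */
static int range_wraps(int64_t pos, size_t count)
{
        uint64_t upos = (uint64_t)pos;
        uint64_t end = upos + count;

        return end < upos;
}
```

Called with the vsyscall address above, old_check_rejects((int64_t)0xffffffffff600000ULL, 4096) returns 1 because the value is negative as a signed 64-bit integer, while range_wraps() returns 0 for the same arguments since the range does not wrap.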

Thanks.