Re: [RFC][PATCH][bugfix] more checks for negative f_pos handling(Was Re: Question: how to handle too big f_pos

From: KAMEZAWA Hiroyuki
Date: Wed Sep 16 2009 - 04:46:54 EST


Ah, sorry. I should have CC'ed you.

On Wed, 16 Sep 2009 16:20:32 +0800
Américo Wang <xiyou.wangcong@xxxxxxxxx> wrote:

> On Wed, Sep 16, 2009 at 1:29 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> >
> > The problem:
> >> I'm writing a patch against /dev/kmem and found a problem.
> >>
> >> /dev/kmem (and /proc/<pid>/mem) puts a virtual address into f->f_pos,
> >>
> >> but that f->f_pos is always negative, so rw_verify_area() always returns -EINVAL.
> >
> > Changed CC: List.
> >
> > This is an attempt to work out how to fix the negative f_pos problem described above.
> >
> > Hmm, even after this patch, x86's vsyscall area is not readable:
> > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
> > But that is probably not a problem. (It cannot be read now, anyway.)
> >
> > I tested /dev/kmem on x86-64 and it works fine. I also added a fix for
> > /proc/<pid>/mem because I know ia64's hugetlb area is not readable
> > via /proc/<pid>/mem. (But I'm not sure whether other 64-bit archs have
> > this kind of problem in /proc/<pid>/mem.)
> >
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> >
> > Modify rw_verify_area()'s negative f_pos check.
> >
> > Currently, rw_verify_area() has this check:
> >   if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> >           return -EINVAL;
> >
> > so access to special files such as /dev/mem, /dev/kmem and /proc/<pid>/mem
> > returns an unexpected -EINVAL.
> > (For example, ia64 maps hugetlb at the 0x8000000000000000- region.)
> >
> > This patch tries to make the range check more precise by using
> > the llseek ops defined per special file.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> > ---
> >  fs/proc/base.c  |   22 +++++++++++++++++-----
> >  fs/read_write.c |   39 +++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 54 insertions(+), 7 deletions(-)
> >
> > Index: mmotm-2.6.31-Sep14/fs/read_write.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/fs/read_write.c
> > +++ mmotm-2.6.31-Sep14/fs/read_write.c
> > @@ -205,6 +205,37 @@ bad:
> >  }
> >  #endif
> >
> > +static int
> > +__verify_negative_pos_range(struct file *file, loff_t pos, size_t count)
> > +{
> > +       unsigned long long upos, end;
> > +       loff_t ret;
> > +
> > +       /* disallow overflow */
> > +       upos = (unsigned long long)pos;
> > +       end = upos + count;
> > +       if (end < pos)
> > +               return -EOVERFLOW;
> > +       /*
> > +        * Sanity check...subsystem has to provide llseek to handle big pos.
> > +        * Subsystem's llseek should verify f_pos's value, comparing it with its
> > +        * max file size.
> > +        * Note1: generic file ops' llseek cannot handle negative pos.
> > +        * Note2: should we take care of pos == -EINVAL ?
> > +        * Note3: we check flags and ops here to avoid taking locks in
> > +        * default_llseek.
> > +        */
> > +       ret = -EINVAL;
> > +       if ((file->f_mode & FMODE_LSEEK) &&
> > +           (file->f_op && file->f_op->llseek)) {
> > +               ret = vfs_llseek(file, 0, SEEK_CUR);
> > +               if (ret == pos)
> > +                       return 0;
> > +       }
> > +
> > +       return (int)ret;
> > +}
> > +
> >  /*
> >  * rw_verify_area doesn't like huge counts. We limit
> >  * them to something that fits in "int" so that others
> > @@ -222,8 +253,12 @@ int rw_verify_area(int read_write, struc
> >        if (unlikely((ssize_t) count < 0))
> >                return retval;
> >        pos = *ppos;
> > -       if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> > -               return retval;
> > +       if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> > +               /* some files require special care */
> > +               retval = __verify_negative_pos_range(file, pos, count);
> > +               if (retval)
> > +                       return retval;
> > +       }
> >
> >        if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> >                retval = locks_mandatory_area(
> > Index: mmotm-2.6.31-Sep14/fs/proc/base.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep14.orig/fs/proc/base.c
> > +++ mmotm-2.6.31-Sep14/fs/proc/base.c
> > @@ -903,18 +903,30 @@ out_no_task:
> >
> >  loff_t mem_lseek(struct file *file, loff_t offset, int orig)
> >  {
> > +       struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
> > +       unsigned long long new_offset = -EINVAL;
>
>
> Why not make 'new_offset' a loff_t? That would make your code simpler.
>
loff_t is "long long". I wanted "unsigned long long" to show that
f_pos is treated as "unsigned" here.
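
To illustrate the signed/unsigned point, here is a small user-space sketch (not kernel code). The names demo_loff_t, old_check_rejects() and pos_as_address() are made up for this example, and the code assumes the usual two's-complement conversion between uint64_t and int64_t. It shows that a high virtual address stored in a signed 64-bit offset looks negative and trips the old rw_verify_area() style test, while the unsigned view still gives the original address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's loff_t (a signed 64-bit type). */
typedef int64_t demo_loff_t;

/*
 * Models the quoted check: reject when pos or pos + count is negative.
 * The addition is done in unsigned arithmetic so that this sketch does
 * not rely on signed overflow.
 */
static bool old_check_rejects(demo_loff_t pos, uint64_t count)
{
	demo_loff_t end = (demo_loff_t)((uint64_t)pos + count);

	return pos < 0 || end < 0;
}

/* Reinterpret the file position as the (unsigned) address it carries. */
static uint64_t pos_as_address(demo_loff_t pos)
{
	return (uint64_t)pos;
}
```

With the vsyscall address from above, old_check_rejects((demo_loff_t)0xffffffffff600000ULL, 4096) is true, although pos_as_address() still gives back 0xffffffffff600000.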



> > +
> > +       if (!task) /* lseek's spec doesn't allow -ESRCH but... */
>
>
> No worry, we have many ESRCH for proc files.
>
I know ;)

> > +               return -ESRCH;
> > +
> >        switch (orig) {
> >        case 0:
> > -               file->f_pos = offset;
> > +               new_offset = offset;
> >                break;
> >        case 1:
> > -               file->f_pos += offset;
> > +               new_offset = (unsigned long long)file->f_pos + offset;
> >                break;
> >        default:
> > -               return -EINVAL;
> > +               new_offset = -EINVAL;
> > +               break;
> >        }
> > -       force_successful_syscall_return();
> > -       return file->f_pos;
> > +       if (new_offset < (unsigned long long)TASK_SIZE_OF(task)) {
>
>
> Hmm, why this check?
>
Two reasons.

1. If this lseek has to check anything, this is the thing to check.
2. On architectures where a 32-bit program can run on a 64-bit kernel,
   moving f_pos above 4G is out of range, for example.

But mem_read() will catch any bad f_pos anyway, so simply allowing
every f_pos here may be an option. Still, since this is lseek,
providing the range check here is not so bad.
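
As a rough user-space model of that range check (only a sketch, not the patch itself): demo_mem_lseek(), DEMO_SEEK_SET, DEMO_SEEK_CUR and the task_size parameter are hypothetical names, with task_size standing in for TASK_SIZE_OF(task). It assumes task_size is at most 2^63, so an accepted position always fits in the signed return value.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_SEEK_SET 0
#define DEMO_SEEK_CUR 1

/*
 * Compute the new position as an unsigned value and accept it only if
 * it is below task_size. On failure *f_pos is left untouched and
 * -EINVAL is returned; on success the new position is stored and returned.
 */
static int64_t demo_mem_lseek(uint64_t *f_pos, int64_t offset, int whence,
			      uint64_t task_size)
{
	uint64_t new_pos;

	switch (whence) {
	case DEMO_SEEK_SET:
		new_pos = (uint64_t)offset;
		break;
	case DEMO_SEEK_CUR:
		new_pos = *f_pos + (uint64_t)offset;
		break;
	default:
		return -EINVAL;
	}

	if (new_pos >= task_size)
		return -EINVAL;

	*f_pos = new_pos;
	return (int64_t)new_pos;
}
```

With task_size set to 4G (1ULL << 32), as for a 32-bit task, a seek to 0x1000 succeeds and a seek to 4G itself fails with -EINVAL.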

Thanks.
-Kame

> > +               file->f_pos = (loff_t)new_offset;
> > +               force_successful_syscall_return();
> > +       } else
> > +               new_offset = -EINVAL;
> > +       put_task_struct(task);
> > +       return (loff_t)new_offset;
> >  }
> >
> >  static const struct file_operations proc_mem_operations = {
>
> Thanks.
>
