On Fri, Oct 17, 2014 at 09:53:46AM +0200, Miklos Szeredi wrote:

> Ugh, no, we don't need that d_drop(), it's just an optimization.  I was simply
> not thinking about bind mounts.
> 
> The reason I thought it useful to drop the dentry is that after a non-dir is
> copied up we don't need the lower dentry reference any more, that dentry can be
> freed.  But setting ->lowerdentry to NULL is going to cause all sorts of
> headaches: we assume it remains non-NULL after it's been set to non-NULL.
> 
> So removing that d_drop should be fine, it just holds that lowerdentry ref until
> the overlayfs dentry is gotten rid of (which it won't if there's a bind mount on
> it, but I don't think that's a big deal).
> 
> The d_drop() calls in ovl_rename2() are also in this category and need to be
> removed.
> 
> With the attached patch the above test passes fine.

OK, folded...  Next question: ovl_clear_empty() doesn't verify that
udir == upper->d_parent.d_inode.  Result: if some joker renames that
thing in upper layer, we get vfs_rename() (and, possibly, vfs_rmdir()
or vfs_unlink()) called with dentry _not_ being a child of directory
we have locked.  Subsequent breakage depends on the fs type, but in
principle sky's the limit - ->rmdir() et.al. have every right to
assume that victim->d_name is stable, etc.

In other places like that you check such asserts explicitly and fail
wtih ESTALE; is there any reason for not doing the same here?  Related
one: you unlock wdir (as part of unlock_rename()), only to relock it
again after ovl_cleanup_whiteouts().  Which has lovely potential for
upper getting moved someplace else, so we need to recheck the
same assert again (that's where rmdir/unlink part of the problem
comes from).  Why?  To relieve the contention on s_vfs_rename_mutex
and wdir->i_mutex?

Next one: ovl_do_remove() looks at the victim's type (lower vs. upper vs.
mixed) before taking any locks in upper layer.  What happens if it
races with copyup of victim?  And no, ->i_mutex on victim in overlayfs
itself is not enough to exclude all causes of copyup - I can see at least
one (open() for write on a regular file).  That particular case is
harmless (overwriting rename into place will actually DTRT here), but
at the very least it needs comments re "why there's no case where such
race would be possible and not harmless".

I'm skipping a lot of really obvious ones; I'll push a branch with fixes
for those folded as vfs.git#experimental-ovl later today (probably after
I get some sleep, though),

Other non-obvious stuff:
	* setattr() does truncated-copy first, moves it in position, then
does notify_change() stripping SUID/SGID.  In theory that's exploitable.
I hadn't looked into reorganizing it yet, but it seems that having
truncating copyup handle attribute changes *before* move would be a good
idea.
	* rmdir() of something you should be allowed to remove (empty subdir
of your directory) fails with EINVAL if you don't have permissions to open
it (including the really lovely case when it's *yours* and had been just
chmoded to 111).  Reason: you try to open the fscker to verify it's empty,
doing that with current creds.  That, actually, is one of the reasons why
whiteouts ought to be fs primitives - you only need that kind of games
because rmdir() on underlying fs treats your poor-man-whiteout as "something
making our directory non-empty".  And for any BSD-style whiteouts that's
really trivial to check - in fact, for FFS derivatives the normal check for
directory being empty will work just fine (they are represented as negative
entries with special dt_type value).  Anyway, rmdir() failure is a bug - if
you insist on using that crap as whiteout substitutes, at least make sure
to get appropriate creds.
	* we really, _really_ need some sanity checks on the nature of
layers.  As it is, you can ask for an overlay with procfs as one of the
layers.  Which will blow up big way.  Interactions with NFS will also
be interesting.  Or with FUSE, for that matter...

	BTW, am I misreading your readdir.c, or do you really reread the
sucker every time you open a directory?  With each opened struct file
getting big fat cache built in memory, nevermind that they will be identical?
Looks like an OOM bait, that...  I'm still digging through that code, but
it really looks that way...