This message was created automatically by mail delivery software. A message that you sent could not be delivered to one or more of its recipients. This is a permanent error. The following address(es) failed: torvalds@linux-foundation.org SMTP error from remote mail server after end of data: host ASPMX.L.GOOGLE.COM [2a00:1450:400c:c09::1a]: 550-5.7.1 [2002:c35c:fd02::1 12] Our system has detected that this message 550-5.7.1 is likely unsolicited mail. To reduce the amount of spam sent to 550-5.7.1 Gmail, this message has been blocked. Please visit 550-5.7.1 http://support.google.com/mail/bin/answer.py?hl=en&answer=188131 for 550 5.7.1 more information. v12si1930888wie.42 - gsmtp ------ This is a copy of the message, including all the headers. ------ Return-path: Received: from viro by ZenIV.linux.org.uk with local (Exim 4.76 #1 (Red Hat Linux)) id 1XfPDc-0002F8-QA; Sat, 18 Oct 2014 08:18:32 +0000 Date: Sat, 18 Oct 2014 09:18:32 +0100 From: Al Viro To: Miklos Szeredi Cc: Linus Torvalds , mszeredi@suse.cz, David Howells , Sage Weil Subject: Re: overlay-fs? Please? Finally? Message-ID: <20141018081832.GA28033@ZenIV.linux.org.uk> References: <20141017004622.GB7996@ZenIV.linux.org.uk> <20141017011426.GA31027@ZenIV.linux.org.uk> <20141017022032.GA6809@ZenIV.linux.org.uk> <20141017102748.GA15157@ZenIV.linux.org.uk> <20141017153052.GF5011@tucsk.piliscsaba.szeredi.hu> <20141017204335.GD7996@ZenIV.linux.org.uk> <20141018050153.GA24652@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141018050153.GA24652@ZenIV.linux.org.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: Al Viro [Cc to Sage due to interesting ceph bit that has shown up from grepping - see the very end] On Sat, Oct 18, 2014 at 06:01:53AM +0100, Al Viro wrote: First of all, I've just fixed a dumb braino in ovl_clear_empty(); assignment to upper needed to be moved up to the added test. Force-pushed to the same branch - vfs.git#ovl-experimental. > As for the "what filesystems are we OK with", I wonder if looking into the > sucker's ->s_d_op (or ->d_op of root of lower tree, for that matter) would > be a good approximation. I really think that ->d_{weak_,}revalidate() in > there is complete no-go, ditto for ->d_manage() and ->d_automount() and > I would consider ->d_compare() or ->d_hash() as a cause to be _very_ cautious. > > Alternatively, we could just go ahead and add FS_OK_FOR_OVERLAY_LOWER into > fs type flags and mark the obvious good ones. It's not _that_ much work. > > I'd still like to hear details on the plans re d_path(); I don't consider > that a deal-breaker, but we'd better have some clear idea of what we are > getting into. BTW, why on the Earth are you pinning that ->__upperdentry twice? The comment about d_delete() makes no sense whatsoever - anything other than overlayfs itself would have to grab a reference to call that d_delete(), which would give you refcount greater than 1 automatically. So it would have to be overlayfs passing that thing to d_delete() or something that would call it, right? Now, d_delete() itself isn't called there at all. Which leaves passing the sucker to something outside that would call d_delete(). Now, what would it be? Here's the full list of d_delete() callers: arch/s390/hypfs/inode.c:82: d_delete(dentry); drivers/infiniband/hw/ipath/ipath_fs.c:318: d_delete(dir); drivers/infiniband/hw/qib/qib_fs.c:512: d_delete(dir); drivers/usb/gadget/function/f_fs.c:1560: d_delete(epfile->dentry); drivers/usb/gadget/legacy/inode.c:1611: d_delete (dentry); fs/btrfs/ioctl.c:2507: d_delete(dentry); fs/ceph/dir.c:893: d_delete(dentry); fs/ceph/inode.c:1114: d_delete(dn); fs/ceph/inode.c:1223: d_delete(dn); fs/ceph/inode.c:1395: d_delete(dn); fs/configfs/dir.c:643: d_delete(child); fs/configfs/dir.c:834: d_delete(dentry); fs/configfs/dir.c:880: d_delete(dentry); fs/configfs/dir.c:1475: d_delete(new_dentry); fs/configfs/dir.c:1721: d_delete(dentry); fs/debugfs/inode.c:483: d_delete(dentry); fs/devpts/inode.c:666: d_delete(dentry); fs/efivarfs/file.c:50: d_delete(file->f_dentry); fs/fuse/dir.c:1061: d_delete(entry); fs/namei.c:3586: d_delete(dentry); fs/namei.c:3702: d_delete(dentry); fs/nfs/dir.c:1760: d_delete(dentry); fs/nfs/nfs4proc.c:2231: d_delete(dentry); fs/ocfs2/dlmglue.c:3752: d_delete(dentry); fs/reiserfs/xattr.c:95: d_delete(dentry); fs/reiserfs/xattr.c:111: d_delete(dentry); net/sunrpc/rpc_pipe.c:607: d_delete(dentry); net/sunrpc/rpc_pipe.c:634: d_delete(dentry); security/selinux/selinuxfs.c:1212: d_delete(d); We are talking about the *upper* layer; that excludes most of those guys. At the very least, you want that fs to support rename and xattrs. So hypfs, infinibarf ones, gadgetfs, configfs, debugfs, devpts, efivarfs, sunrpc and selinuxfs are right out. Moreover, all of those are not in the codepaths reachable from overlayfs - all of that is removal of object driven by external event. And we end up using a reference other than what overlayfs would be holding. The same goes for reiserfs xattr code (it calls d_delete() for references it has acquired itself) and for ocfs2. NFS is also not an option for upper layer, according to overlayfs docs. FUSE is in the same boat as ocfs2 and reiserfs - we acquire the reference by d_lookup() in the same function. The same goes for btrfs caller (s/d_lookup/lookup_one_len/), not to mention that this code won't be called by overlayfs. What's left? fs/ceph/dir.c:893: d_delete(dentry); ceph_unlink(). fs/ceph/inode.c:1114: d_delete(dn); ceph_fill_trace(), dn comes from d_lookup(). Not an issue. fs/ceph/inode.c:1395: d_delete(dn); ceph_readdir_prepopulate(), dn comes from d_lookup(). Not an issue. fs/ceph/inode.c:1223: d_delete(dn); ceph_fill_trace(), again. Hell knows - it's really hard to read ;-/ fs/namei.c:3586: d_delete(dentry); vfs_rmdir() fs/namei.c:3702: d_delete(dentry); vfs_unlink() Now, ceph_unlink() can come only from vfs_unlink(). So we are down to the following: victim of vfs_unlink(), victim of vfs_rmdir(), _maybe_ something strange coming from that ceph_fill_trace() callsite. We definitely do have vfs_unlink() and vfs_rmdir() calls in overlayfs. Not many, though - there's ovl_remove_upper(), calling them directly, and there's ovl_cleanup(), calling them via ovl_do_{unlink,rmdir}() wrappers. For one thing, we could do dget()/dput() in both of those guys. However, looking at the callers of ovl_cleanup() shows that with two exceptions we have already grabbed/dropped dentry around the callsite. Exceptions are ovl_clear_empty() and ovl_remove_and_whiteout(). Could as well put that dget()/dput() in those two, rather than in ovl_clear_empty()... IOW, modulo that ceph thing we could trivially avoid that double reference to ->__upperdentry, just by doing a temporary dget()/dput() in a few places in fs/overlayfs/dir.c. Objections? A bunch of code becomes simpler that way, IMO... Question to Sage: what's that d_delete() in ceph_fill_trace() about? It's this bit: /* null dentry? */ if (!rinfo->head->is_target) { dout("fill_trace null dentry\n"); if (dn->d_inode) { dout("d_delete %p\n", dn); d_delete(dn); } else { dout("d_instantiate %p NULL\n", dn); d_instantiate(dn, NULL); if (have_lease && d_unhashed(dn)) d_rehash(dn); update_dentry_lease(dn, rinfo->dlease, session, req->r_request_started); } goto done; } What codepaths could lead us there and where could that dentry have come from? Overlayfs aside, the things can get rather interesting if it could, e.g. turn out to be an existing mountpoint...