Thu, 20 Dec 2007
open-by-inode() vs. name lookup in network filesystems.
A network filesystem is a tricky beast: depending on where it is implemented
(kernel or userspace), its design is very different. By 'very' I mean the differences are really complex.
In the kernel, an inode (the basic identity of an object) exists for every object
that has been looked up, until special steps are taken to drop it. Usually
it stays alive: for example, when you traverse some directory, the inodes for every object
you checked continue to exist, even if you no longer use that directory.
When a file is opened, the inode is attached to the file; when the file is closed, the inode
lives on. This is a fundamental feature of the split between directory entries and inodes:
directory entries are linked into the tree, which we can see, while inodes
are shadow objects behind those entries.
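The entry/inode split is visible even from userspace. Here is a minimal sketch, assuming a POSIX filesystem that supports hard links; the helper names `same_inode` and `demo` are made up for illustration. Two directory entries point at one inode, and the inode survives when one of the names is removed.

```python
import os
import tempfile

def same_inode(path_a, path_b):
    """Return True if both names resolve to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def demo():
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")
        second = os.path.join(d, "second")
        with open(first, "w") as f:
            f.write("data")
        os.link(first, second)       # second directory entry, same inode
        shared = same_inode(first, second)
        os.unlink(first)             # drop one name; the inode stays alive
        with open(second) as f:
            content = f.read()
        return shared, content
```

The names live in the directory tree, while the object behind them is identified by the device and inode number pair.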
In userspace things are completely different: there are no inodes, only files
identified by file descriptors. That's all. So, when the kernel performs a lookup,
it checks some name in the inode with a given number, i.e. it performs an in-kernel
reference-by-inode operation, but in userspace there is no API to get a file handle
by inode number (except rare special cases, which I think Zach uses in
CRFS,
and that is likely a good speedup for Btrfs).
Basically, userspace must either hold an
open file descriptor for the parent directory, or perform a reverse lookup,
build a path and open the directory to check whether some object exists there, since
userspace can only work with file descriptors.
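The two userspace options can be sketched roughly like this. It is an illustration only, assuming Linux and Python's `os` module with `dir_fd` support; the function names `exists_in_dir`, `exists_by_path` and `demo` are hypothetical and not taken from any real server.

```python
import os
import tempfile

def exists_in_dir(dir_fd, name):
    """Check for `name` inside an already opened directory, with no path walk."""
    try:
        os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
        return True
    except FileNotFoundError:
        return False

def exists_by_path(root, components):
    """Fallback: rebuild the full path and let the kernel resolve it from the root."""
    return os.path.lexists(os.path.join(root, *components))

def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        open(os.path.join(root, "a", "b", "file"), "w").close()
        # Keep the parent directory open and look names up relative to it.
        dir_fd = os.open(os.path.join(root, "a", "b"),
                         os.O_RDONLY | os.O_DIRECTORY)
        try:
            return (exists_in_dir(dir_fd, "file"),
                    exists_in_dir(dir_fd, "missing"),
                    exists_by_path(root, ["a", "b", "file"]))
        finally:
            os.close(dir_fd)
```

The first variant costs one open descriptor per directory the server wants to keep handy; the second costs a full path resolution on every request.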
open-by-inode was marked by Linus Torvalds as fundamentally broken
for a number of reasons (namely races with directory layout changes
like move and rename), and he is likely correct, but the absence of such an API
greatly reduces the performance of userspace metadata operations.
Having the network fileserver in the kernel is of course much (MUCH) simpler and faster,
but for now that implementation is postponed a bit.
The initial server will be quite dumb: it will always perform a lookup from the
root and always close the directory. Later it will be possible to add a cache of opened directories...
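As a rough sketch of both variants (not the actual server code, which is not shown here): `lookup_from_root` is the dumb approach, and `DirCache` is one possible shape for the later cache of opened directories. All names, the LRU eviction and the default capacity are assumptions made for illustration; Linux `dir_fd` support is assumed.

```python
import os
import tempfile
from collections import OrderedDict

def lookup_from_root(root, dir_components, name):
    """Dumb variant: resolve the directory from the root, check the name, close it."""
    fd = os.open(os.path.join(root, *dir_components),
                 os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.stat(name, dir_fd=fd, follow_symlinks=False)
        return True
    except FileNotFoundError:
        return False
    finally:
        os.close(fd)

class DirCache:
    """Possible later variant: keep a bounded set of directory descriptors open."""

    def __init__(self, root, capacity=64):
        self.root = root
        self.capacity = capacity
        self.fds = OrderedDict()  # tuple of path components -> open directory fd

    def _dir_fd(self, dir_components):
        key = tuple(dir_components)
        if key in self.fds:
            self.fds.move_to_end(key)      # mark as most recently used
            return self.fds[key]
        fd = os.open(os.path.join(self.root, *key),
                     os.O_RDONLY | os.O_DIRECTORY)
        self.fds[key] = fd
        if len(self.fds) > self.capacity:  # evict the least recently used entry
            _, old_fd = self.fds.popitem(last=False)
            os.close(old_fd)
        return fd

    def lookup(self, dir_components, name):
        try:
            os.stat(name, dir_fd=self._dir_fd(dir_components),
                    follow_symlinks=False)
            return True
        except FileNotFoundError:
            return False

    def close(self):
        for fd in self.fds.values():
            os.close(fd)
        self.fds.clear()
```

One caveat with this kind of cache: a cached descriptor follows its directory through a rename, so the path used as the key can go stale. That is the same class of race with directory layout changes mentioned above, and a real cache would need some invalidation scheme.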