Zbr's days.

Thu, 20 Dec 2007

open-by-inode() vs. name lookup in network filesystems.

A network filesystem is a tricky bastard - depending on where it is implemented (kernel or userspace), it is very different. By 'very' I mean really complex differences.

In the kernel an inode, the basic identity of an object, exists for every object that has been looked up. It is dropped only after special steps are completed, and usually it stays alive. For example, when you traverse some directory, the inodes of every object you checked continue to exist even if you no longer use that directory. When a file is opened, the inode is attached to the file; when the file is closed, the inode lives on. This is a fundamental feature of the split between directory entries and inodes - directory entries are linked into the tree, which we can see, while inodes are shadow objects behind those entries.

In userspace things are completely different: there are no inodes, only files, identified by file descriptors. That's all. So, when the kernel performs a lookup, it checks some name in the inode with a given number - i.e. it performs an in-kernel reference-by-inode operation. In userspace there is no API to get a file handle by inode number (except rare special cases, which I think Zach uses in CRFS, and that is likely a good speedup for Btrfs). Basically userspace must either hold an open file descriptor for the parent directory, or perform a reverse lookup, build a path and open the directory to check whether some object exists there, since userspace can only work with file descriptors.
open-by-inode was marked by Linus Torvalds as fundamentally broken for a number of reasons (namely races with directory layout changes like move and rename), and that is likely correct, but the absence of such an API greatly reduces the performance of userspace metadata operations.
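
To make the difference concrete, here is a minimal sketch in Python (not the server's actual code) of what userspace has to work with. The helper names `lookup_in_dir` and `find_name_by_inode` are hypothetical; the only real calls used are `os.stat()` with `dir_fd` and `os.listdir()` on a directory descriptor. The forward lookup needs an open descriptor for the parent directory, and going from an inode number back to a name means scanning the directory.

```python
import os
import tempfile


def lookup_in_dir(parent_fd, name):
    """Name lookup relative to an already opened parent directory.

    Returns the inode number of `name`, or None if it does not exist.
    """
    try:
        st = os.stat(name, dir_fd=parent_fd, follow_symlinks=False)
    except FileNotFoundError:
        return None
    return st.st_ino


def find_name_by_inode(parent_fd, ino):
    """Reverse lookup: find the entry in a directory that has inode `ino`.

    This sketch has no open-by-inode call to rely on, so it stats every
    entry in the directory until the inode number matches.
    """
    for name in os.listdir(parent_fd):
        try:
            st = os.stat(name, dir_fd=parent_fd, follow_symlinks=False)
        except FileNotFoundError:
            # The entry was removed between listdir() and stat().
            continue
        if st.st_ino == ino:
            return name
    return None
```

With `fd = os.open(some_dir, os.O_RDONLY)`, calling `lookup_in_dir(fd, "file")` costs one system call, while `find_name_by_inode(fd, ino)` costs one per directory entry, which is the kind of metadata overhead described above.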

Having the network file server in the kernel is of course much (MUCH) simpler and faster, but for now its implementation will be postponed a bit.
The initial server will be quite dumb - it will always perform a lookup from the root and always close the directory; later it will be possible to add a cache of opened directories...
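
A rough sketch of those two strategies, assuming a POSIX system and Python's `os` module; `lookup_from_root` and `DirCache` are hypothetical names for illustration, not the planned implementation. The first walks the path from the root and closes every directory as it goes; the second keeps directory descriptors open between requests. Symlinks inside the tree are not handled specially here.

```python
import os
import tempfile


def lookup_from_root(root, path):
    """Walk `path` component by component, starting at `root`.

    Each directory is opened for a single step and closed again, so
    nothing is kept between requests. Returns the os.stat_result of the
    final object, or None if any component is missing.
    """
    parts = [p for p in path.split("/") if p]
    if ".." in parts:
        raise ValueError("path must not leave the exported root")
    dir_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    try:
        for name in parts[:-1]:
            try:
                next_fd = os.open(name, os.O_RDONLY | os.O_DIRECTORY,
                                  dir_fd=dir_fd)
            except (FileNotFoundError, NotADirectoryError):
                return None
            os.close(dir_fd)
            dir_fd = next_fd
        if not parts:
            return os.fstat(dir_fd)
        try:
            return os.stat(parts[-1], dir_fd=dir_fd, follow_symlinks=False)
        except FileNotFoundError:
            return None
    finally:
        # Closes whichever directory descriptor is current at exit.
        os.close(dir_fd)


class DirCache:
    """Keeps directory descriptors open, keyed by path relative to root."""

    def __init__(self, root):
        self.root = root
        self.fds = {}

    def dir_fd(self, rel_dir):
        fd = self.fds.get(rel_dir)
        if fd is None:
            fd = os.open(os.path.join(self.root, rel_dir),
                         os.O_RDONLY | os.O_DIRECTORY)
            self.fds[rel_dir] = fd
        return fd

    def lookup(self, rel_dir, name):
        """Look up `name` in a cached directory; None if it is missing."""
        try:
            return os.stat(name, dir_fd=self.dir_fd(rel_dir),
                           follow_symlinks=False)
        except FileNotFoundError:
            return None

    def close(self):
        for fd in self.fds.values():
            os.close(fd)
        self.fds.clear()
```

The cached variant saves the repeated walk, but a cached descriptor keeps pointing at the same directory even after it is moved or renamed, so the key it is stored under can go stale. A real cache would need some invalidation scheme, which this sketch leaves out.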

/devel/fs :: Link / Comments (8)

Chuck wrote at 2007-12-21 02:21:

What are your thoughts about FUSE? It might be too much of an extra layer for you, but I find the "cross platform" (*bsd, linux, OSX, etc) capability interesting.

BTW, your captcha makes me retry a number of times before it appears to work.

Chuck wrote at 2007-12-21 02:25:

I think glusterfs has the right idea, but their implementation is seriously lacking. The idea of each file server being independent and handling its own locking/concurrency is interesting. Eliminates a need for a distributed locking mechanism.

Chuck wrote at 2007-12-21 02:26:

Addon to glusterfs...and its idea that the client determines what file server it wants to create files on.

BTW, your captcha is killin me! :)

Zbr wrote at 2007-12-21 13:03:

Fuse is not just an additional layer - it is a HUGE unneeded layer in this setup. Glusterfs, as well as any other filesystem, can be connected as a server. I already wrote about glusterfs and its problems (people who do real work told me its state is far from usable). Not thinking about redundancy from the beginning is a major flaw.

Btw, this captcha is for real men - who wants to post will do that :)

Chuck wrote at 2007-12-21 22:30:

Glusterfs has the right idea tho. I said their implementation was lacking (read: sucks).

Chuck wrote at 2007-12-21 22:32:

As for the captcha... real men don't like having to post 3 times before their comment is taken because the captcha doesn't work properly. Maybe that's why you're low on comments :)

Again try 3 to get this useless comment posted.

Chuck wrote at 2007-12-21 22:34:

One more thing and I'll not return.

"Btw, this captcha is for real men" - way to treat people interested in your work. Dick.

Zbr wrote at 2007-12-21 22:52:

If my dictionary is correct, I have just been called something unfriendly :) I think it is a sign of disrespect.

I'm not interested in people who do not understand jokes and do not respect others.

P.S. The captcha has to be reloaded (i.e. the page has to be fetched again) every 2 minutes; I usually reload it only once, when I post.

