Zbr's days.

Sat, 24 Nov 2007

Coherent Remote File System.

Zach Brown has an extremely interesting idea for a network filesystem implementation.
One can think of it as an NFS-like client or, more precisely, as a client-server protocol that allows clients to keep a cache of data instead of relying on the server for every operation. This of course requires a cache coherency protocol in the client-server communication, which makes things more complex.
Put simply, it works as a trivial filesystem mounted on clients, where each read/write/metadata operation is performed on top of locally cached data; if the data is not present in the local cache, it is fetched from the server. A client flushes its updated cache to the server under a number of conditions: either through the usual writeback process or through the cache coherency process (i.e. when another node reads from a file updated by the given client).
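
To make the description above a bit more concrete, here is a minimal sketch in Python of how such a write-back cache with a coherency recall might behave. This is only my reading of the idea, not the actual crfs protocol: the names `Server`, `Client`, `fetch`, `acquire_write` and `store_back` are all hypothetical, whole files are cached instead of pages or byte ranges, and plain in-process method calls stand in for the network.

```python
class Server:
    """Holds the authoritative data and tracks who caches what (hypothetical)."""

    def __init__(self):
        self.store = {}    # path -> bytes, the authoritative copy
        self.writer = {}   # path -> client holding unflushed updates
        self.readers = {}  # path -> set of clients caching the path

    def _recall(self, client, path):
        # Coherency step: make the current writer (if it is someone else)
        # flush its dirty copy before anybody else touches the file.
        owner = self.writer.get(path)
        if owner is not None and owner is not client:
            owner.flush(path)

    def fetch(self, client, path):
        self._recall(client, path)
        self.readers.setdefault(path, set()).add(client)
        return self.store.get(path, b"")

    def acquire_write(self, client, path):
        self._recall(client, path)
        # Other cached copies are about to become stale, so drop them.
        for other in self.readers.get(path, set()) - {client}:
            other.invalidate(path)
        self.readers[path] = {client}
        self.writer[path] = client

    def store_back(self, client, path, data):
        self.store[path] = data
        if self.writer.get(path) is client:
            del self.writer[path]


class Client:
    """Serves reads and writes from a local cache (hypothetical)."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # path -> bytes
        self.dirty = set()  # paths updated locally but not yet flushed

    def read(self, path):
        if path not in self.cache:  # cache miss: go to the server
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]

    def write(self, path, data):
        if path not in self.dirty:  # first local update since the last flush
            self.server.acquire_write(self, path)
        self.cache[path] = data
        self.dirty.add(path)

    def flush(self, path=None):
        # With no argument this plays the role of the usual writeback;
        # with a path it is the flush requested by the server.
        for p in [path] if path is not None else list(self.dirty):
            if p in self.dirty:
                self.server.store_back(self, p, self.cache[p])
                self.dirty.discard(p)

    def invalidate(self, path):
        self.cache.pop(path, None)
        self.dirty.discard(path)
```

With two clients `a` and `b` attached to one `Server`, data written by `a` stays only in its cache until either `a.flush()` runs or `b` reads the same path, at which point the server recalls the dirty copy first. A real implementation would of course have to deal with partial updates, failures and ordering, which this sketch ignores completely.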

Zach will present it at LCA this February.
So far it is a closed Oracle project (as far as I know, the open-sourcing process is on the way, just like it was with Chris Mason's btrfs), and I strongly want to implement exactly the same idea myself :)
Doing this will have a number of benefits:

  • a simple open source filesystem, which can be used as a base for real filesystem development (do not confuse it with virtual filesystems like sysfs or debugfs)
  • the ability to extend it with custom protocols
  • the cache coherency mechanism will be used in a distributed filesystem
  • the possibility to test byte range locking in real life
  • implementing filesystem bits first in userspace (I do not want to introduce additional unexpected behaviour because of FUSE)
Zach, what about a small competition? :)
Frankly speaking, I'm not an expert in cache coherency protocols, nor in filesystem development (you will not believe me, but for the last several days I have been trying to implement an interesting B-tree, and with each day spent on that problem I comment out more and more bits of the code and it still does not work the way I want :). With recent trends I believe I will have pretty high-end hardware soon to perform various tests and find common and tricky bottlenecks.

This implementation can be used by various users who are aiming for distributed systems but do not want to take on (or bother with) real filesystem development, and who are ready to run a server in userspace on top of existing filesystems (in the receiving zero-copy project I showed a huge problem with in-kernel usage of some Linux filesystems, especially those that use in-kernel JBD journaling: it is impossible to preallocate (->prepare_write()) a number of pages for a given file, then write into them and commit (->commit_write()) all at once for maximum performance).


Zach Brown wrote at 2007-11-30 22:20:

> Zach, what about a small competition? :)

I'd prefer collaboration, but sure, you can experiment with your stuff before seeing the light :).

I don't want to get too lost talking about the crfs design here. I will just mention that my experience with Lustre and OCFS2 has given me a strong feel for the sweet spot where the majority of users get the benefits from cache coherence without bringing in the complexity and risk of distributed and parallel systems. If you're intending to do truly distributed locking and cache coherence, be prepared for vastly more work than you think it will take. From a high level the moving parts seem simple. When you really get down into the details of the posix file system API and the behavior that applications expect to be performant, it becomes very painful very quickly.

And yes, I'm working through the official channels at Oracle which will approve the publication of the crfs implementation. I won't mention a time line, but the intent is that it will be available on oss.oracle.com before too long.

Zbr wrote at 2007-12-02 14:21:

Hi Zach.

I'm pretty sure distributed locking and filesystem development will not be a one week/month project, so I'm building some simple blocks first. It is complex for sure, which is why it is so interesting. I do not think _full_ POSIX compliance is a must (iirc even some local filesystems do not follow it for multi-page writes), but logically expected behavior should be preserved.

I will first implement a simple kernel client (without any cache coherency (CC) algorithms at all) to be able to connect to the remote server and perform all data/metadata operations with it. After it is ready I will experiment with both the server and the client to allow parallel access and distributed facilities.

I am really not trying to push you or to reinvent the wheel (well, yes, I am reinventing it, but not because of NIH syndrome or something like that, at least not now :), but rather want a local project in which I know every single bit, as a very useful tool for future development.

And we can test both and laugh at each other's during some beer sessions (if ever) :)
