Zbr's days.

Tue, 08 Jan 2008

Write support in POHMELFS.

My network filesystem got file writing support, which is rather trivial right now: the ->prepare_write()/->commit_write() callbacks do nothing, while the ->writepage() method sends data to the server. It uses a very simple request/reply protocol to report errors from the server side, and does not include any cache coherency mechanisms yet. Since only ->writepage() is used, data always stays in the client's cache and is sent to the server only when the local system decides to (for example when it needs to flush some data to storage or wants to free memory).
The next step is to implement metadata operations: directory entry creation and modification (file/directory create/remove/move, link/unlink and so on) and file metadata operations (attribute management and truncation).
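
To make the request/reply idea a bit more concrete, here is a minimal userspace sketch in Python. It is not the actual POHMELFS wire format: the header layout, the command number and all function names are assumptions made up for illustration. The client packs one dirty page into a request, a toy server applies it and answers with an error code, where 0 means success.

```python
import struct

# Hypothetical wire layout, NOT the real POHMELFS format.
# Request header: command (u32), inode number (u64), file offset (u64), payload size (u32).
REQ_HEADER = struct.Struct("!IQQI")
# Reply: command echoed back (u32), inode number (u64), error code (i32, 0 means success).
REPLY = struct.Struct("!IQi")

CMD_WRITEPAGE = 1  # assumed command number
EINVAL = 22


def pack_write_request(ino, offset, data):
    """Build the bytes a client would send for one dirty page."""
    return REQ_HEADER.pack(CMD_WRITEPAGE, ino, offset, len(data)) + data


def handle_request(packet, storage):
    """Toy server: apply the write to `storage` (ino -> bytearray) and build a reply."""
    cmd, ino, offset, size = REQ_HEADER.unpack_from(packet)
    payload = packet[REQ_HEADER.size:REQ_HEADER.size + size]
    if cmd != CMD_WRITEPAGE or len(payload) != size:
        # Unknown command or truncated payload: report the error to the client.
        return REPLY.pack(cmd, ino, -EINVAL)
    buf = storage.setdefault(ino, bytearray())
    if len(buf) < offset + size:
        # Grow the file with zeroes so the write fits.
        buf.extend(b"\0" * (offset + size - len(buf)))
    buf[offset:offset + size] = payload
    return REPLY.pack(cmd, ino, 0)


def parse_reply(reply):
    """Return the error code carried by a reply (0 on success, negative errno otherwise)."""
    _cmd, _ino, err = REPLY.unpack(reply)
    return err
```

In this sketch the reply carries nothing but the status, which matches the "report errors only" role described above. A real client would also have to match replies to outstanding requests.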

After these tasks are completed (I expect that to happen quite soon, since operating on locally cached entries is not that complex), the cache coherency protocol will enter the game. At first it will be quite simple: each client will have a number of states associated with each inode, so when one of them changes, the server will be notified, and when another client is about to access the modified data, it will be synced to the server.
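
A rough model of that coherency scheme, written as a Python sketch since the protocol itself does not exist yet. Everything here is an assumption: the state names, the Server/Client classes and the flush-on-access policy are only one possible reading of the plan. Each client keeps per-inode state flags, tells the server when they change, and the server forces the owner to sync before another client reads the inode.

```python
from enum import Flag


class InodeState(Flag):
    # Hypothetical per-inode states; the real set is not defined yet.
    CLEAN = 0
    DATA_DIRTY = 1
    META_DIRTY = 2


class Server:
    def __init__(self):
        self.clients = {}  # client name -> Client
        self.dirty = {}    # ino -> {client name: InodeState}
        self.store = {}    # ino -> data as known to the server

    def register(self, client):
        self.clients[client.name] = client

    def notify(self, client_name, ino, state):
        """A client reports that its state for `ino` has changed."""
        self.dirty.setdefault(ino, {})[client_name] = state

    def access(self, client_name, ino):
        """Before serving `ino`, make every other dirty holder sync it back."""
        for owner, state in list(self.dirty.get(ino, {}).items()):
            if owner != client_name and state:
                self.clients[owner].flush(ino)
        return self.store.get(ino)


class Client:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.cache = {}  # ino -> locally cached data
        self.state = {}  # ino -> InodeState
        server.register(self)

    def write(self, ino, data):
        """Write into the local cache only and report the new state."""
        self.cache[ino] = data
        self.state[ino] = self.state.get(ino, InodeState.CLEAN) | InodeState.DATA_DIRTY
        self.server.notify(self.name, ino, self.state[ino])

    def flush(self, ino):
        """Sync cached data to the server and mark the inode clean."""
        self.server.store[ino] = self.cache[ino]
        self.state[ino] = InodeState.CLEAN
        self.server.notify(self.name, ino, InodeState.CLEAN)

    def read(self, ino):
        # Locally modified data is newest; otherwise always ask the server
        # (this sketch has no invalidation, so clean data is refetched).
        if self.state.get(ino, InodeState.CLEAN):
            return self.cache[ino]
        data = self.server.access(self.name, ino)
        self.cache[ino] = data
        return data
```

Note that a write by client A stays in A's cache until client B actually touches the same inode, which is the lazy behaviour described above.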

Another task is to test client scalability: when multiple users work on the same client of the pohmel filesystem, how well does the network filesystem perform? Is the locking too coarse? It is right now - there is a single lock which guards each network operation, and it should not be changed except by introducing multiple sockets, which is quite a bad decision imho: the network is supposed to be the bottleneck in this scenario (or the remote storage speed, but that can be changed by switching to faster storage), so having fine-grained locks for different network operations does not change anything at all. The local cache, which contains inodes, can be searched using three different tuples (I described them previously), but there are only two locks: one lock for offset-based searches (the offset inside the address space of the inode, for example when reading directory content, where each directory entry is located by its offset in the given stream), and another lock for more generic operations like searching for an inode by its number or by the hash of its name in the parent direntry (including length and parent inode number). Although both of these lookups are supposed to be very fast (about O(log2(N)), where N is the total number of inodes in the filesystem), practice can break that dream, since that speed can be too low for very dense filesystems.
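
The two-lock layout of the local cache can be sketched like this. It is a userspace Python model, not the kernel code: the class names, the use of crc32 as the name hash and the sorted lists (standing in for whatever tree gives the O(log2(N)) search) are all assumptions. Note that inserting into a Python list is O(N); only the searches are logarithmic here.

```python
import bisect
import threading
import zlib


class SortedIndex:
    """Sorted key list searched with bisect, standing in for a balanced tree."""

    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # replace an existing entry
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def find(self, key):
        i = bisect.bisect_left(self.keys, key)  # O(log N) binary search
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


class InodeCache:
    """Three lookup tuples guarded by two locks (hypothetical layout)."""

    def __init__(self):
        self.offset_lock = threading.Lock()   # offset-based searches
        self.generic_lock = threading.Lock()  # by-number and by-name searches
        self.by_offset = SortedIndex()        # key: (parent_ino, offset)
        self.by_ino = SortedIndex()           # key: ino
        self.by_name = SortedIndex()          # key: (parent_ino, name_hash, name_len)

    @staticmethod
    def name_key(parent_ino, name):
        raw = name.encode()
        return (parent_ino, zlib.crc32(raw), len(raw))

    def add(self, ino, parent_ino, name, offset):
        entry = {"ino": ino, "parent": parent_ino, "name": name, "offset": offset}
        with self.generic_lock:
            self.by_ino.insert(ino, entry)
            # A hash collision would overwrite the older entry in this sketch.
            self.by_name.insert(self.name_key(parent_ino, name), entry)
        with self.offset_lock:
            self.by_offset.insert((parent_ino, offset), entry)

    def lookup_ino(self, ino):
        with self.generic_lock:
            return self.by_ino.find(ino)

    def lookup_name(self, parent_ino, name):
        with self.generic_lock:
            entry = self.by_name.find(self.name_key(parent_ino, name))
        # Compare the real name to rule out a hash collision.
        return entry if entry is not None and entry["name"] == name else None

    def lookup_offset(self, parent_ino, offset):
        with self.offset_lock:
            return self.by_offset.find((parent_ino, offset))
```

With this split, a directory read walking entries by offset never contends with a lookup by inode number or name, but the two kinds of generic lookup still serialize on one lock.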

The last one is the userspace server, which is quite simple so far and likely has its own bottlenecks. One of the crazy ideas is to move it into the kernel, so that lookup of the inode (a file or directory as seen from userspace) could be very fast. It would also reduce the number of unneeded copies (there are a number of them - I use simple send()/recv() instead of mapping, and generally there is at least one unneeded, but unavoidable in userspace, copy from kernelspace to userspace).

Some work should be done on server redundancy - right now there is no failover recovery either on clients (I do not know of any filesystem which supports that though, do not confuse that with NFS. All operations on the local cache will succeed of course, but reading from the remote side will stall) or on servers (if a server fails, clients cannot proceed with work, since there are no other servers which could pick up the data and metadata flows. It has to be fixed).

Anyway, there are a number of interesting tasks to complete, and I expect to have something to show quite soon...
