Tue, 08 Jan 2008
Write support in POHMELFS.
My network filesystem got file writing support, which is rather trivial
right now: the ->prepare_write()/->commit_write() callbacks
do nothing, and the ->writepage() method sends data to the server.
It uses a very simple request/reply protocol to report errors on the server
side, and does not include any cache coherency mechanisms yet. Since
only ->writepage() is used, data always stays in the client's cache
and is sent to the server only when the local system wants it (for example
when the system needs to flush some data to storage or when it wants
more memory).
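To make the write-back behaviour above concrete, here is a minimal userspace sketch. It is not the POHMELFS code: the structures, the function names (cache_write, cache_flush, send_page_to_server) and the tiny page size are all made up for illustration, and the "network send" is just a counter. It only shows the idea that a write touches the local cache and the server sees the data when a flush is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 16   /* tiny pages keep the example readable */
#define NR_PAGES 4

/* Hypothetical client-side cache page; not the real POHMELFS structure. */
struct cached_page {
    char data[PAGE_SIZE_SKETCH];
    int dirty;
};

struct client_cache {
    struct cached_page pages[NR_PAGES];
    size_t sent_pages;        /* how many pages went "over the wire" */
};

/* Stand-in for the network send; returns 0 as if the server replied OK. */
int send_page_to_server(struct client_cache *c, size_t index)
{
    (void)index;
    c->sent_pages++;
    return 0;
}

/* A write only touches the local cache and marks the page dirty. */
int cache_write(struct client_cache *c, size_t index,
                const char *buf, size_t len)
{
    if (index >= NR_PAGES || len > PAGE_SIZE_SKETCH)
        return -1;
    memcpy(c->pages[index].data, buf, len);
    c->pages[index].dirty = 1;
    return 0;
}

/*
 * Rough analogue of writeback: called when the system wants to flush
 * data or reclaim memory. Returns the number of pages sent, or -1 if
 * the (simulated) server reported an error.
 */
int cache_flush(struct client_cache *c)
{
    int sent = 0;

    for (size_t i = 0; i < NR_PAGES; i++) {
        if (!c->pages[i].dirty)
            continue;
        if (send_page_to_server(c, i) != 0)
            return -1;
        c->pages[i].dirty = 0;
        sent++;
    }
    return sent;
}
```

Two writes to the same page followed by one flush result in a single send, which is the whole point of keeping data in the client's cache.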
The next step is to implement metadata operations: directory entry creation/modification
(like file/directory create/remove/move, link/unlink and so on) and file metadata operations
(like attribute management and truncation).
After these tasks are completed (I expect them to be finished quite soon, since it is not that
complex to operate on locally cached entries), the cache coherency protocol will enter the game.
At first it will be quite simple: each client will have a number of states associated with each
inode, so when one of them changes, the server will be notified, and when another client
is about to access the modified data, it will be synced to the server.
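The coherency scheme is not written yet, so the following is only a guess at its simplest possible shape, a sketch under my own assumptions. The state bits, the fixed client table and the function names (notify_state, access_inode) are hypothetical; the real data transfer is replaced by clearing the state.

```c
#include <assert.h>

#define MAX_CLIENTS 4

/* Hypothetical per-inode state bits a client could report. */
enum inode_state {
    INODE_CLEAN      = 0,
    INODE_DATA_DIRTY = 1 << 0,
    INODE_META_DIRTY = 1 << 1
};

/* Server-side view of one inode: what each client last reported. */
struct server_inode {
    unsigned state[MAX_CLIENTS];
    int syncs;                 /* number of forced syncs so far */
};

/* A client tells the server that some state of the inode changed. */
void notify_state(struct server_inode *ino, int client, unsigned bits)
{
    if (client < 0 || client >= MAX_CLIENTS)
        return;
    ino->state[client] |= bits;
}

/*
 * Before `client` accesses the inode, every other client holding
 * modified state has to sync it to the server first.
 * Returns the number of clients that were synced.
 */
int access_inode(struct server_inode *ino, int client)
{
    int synced = 0;

    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (i == client || ino->state[i] == INODE_CLEAN)
            continue;
        ino->state[i] = INODE_CLEAN;  /* stands in for the real sync */
        ino->syncs++;
        synced++;
    }
    return synced;
}
```

The client that modified the inode can keep accessing it without any sync; only access from a different client triggers one.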
Another task is to test client scalability: when multiple users work on the same
client of the pohmel filesystem, how well does the network filesystem perform? Is locking too coarse?
It is right now - there is a single lock which guards each network operation, and that should not
be changed except by introducing multiple sockets, which is quite a bad decision imho: the
network is supposed to be the bottleneck in this scenario (or remote storage speed, but that can be changed
by switching to faster storage), so having too fine-grained locks for different
network operations does not change anything at all. The local cache, which contains inodes,
can be searched using three different tuples (I described them
previously), but there are
two locks: one lock for offset-based searches (offset inside the address space of the inode, for example
reading directory content, where each directory entry in the stream is located by its offset in the given
stream), and another lock for more generic operations like searching for an inode by its number or by the hash
of its name in the parent directory (including length and parent inode number). Although both of these
operations are supposed to be very fast (about O(log2(N)),
where N is the total number of inodes in the filesystem), practice can break that dream, since that speed
can be too low for very dense filesystems.
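A lookup by (parent inode number, name hash, name length) with logarithmic cost can be sketched as below. This is an illustration under assumptions, not the actual cache: a sorted array with the standard bsearch() stands in for whatever tree the real code uses, FNV-1a is picked only because it is short (the post does not say which hash is used), and all structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup key: parent inode number, name hash, name length. */
struct name_key {
    uint64_t parent_ino;
    uint32_t hash;
    uint32_t len;
};

struct cache_entry {
    struct name_key key;
    uint64_t ino;              /* inode number this name resolves to */
};

/* FNV-1a, chosen for brevity only. */
uint32_t name_hash(const char *name, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)name[i];
        h *= 16777619u;
    }
    return h;
}

struct name_key make_key(uint64_t parent, const char *name)
{
    struct name_key k;
    size_t len = strlen(name);

    k.parent_ino = parent;
    k.hash = name_hash(name, len);
    k.len = (uint32_t)len;
    return k;
}

static int cmp_u64(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/* Orders entries by parent inode, then hash, then name length. */
int entry_cmp(const void *a, const void *b)
{
    const struct cache_entry *x = a, *y = b;
    int r = cmp_u64(x->key.parent_ino, y->key.parent_ino);

    if (r)
        return r;
    r = cmp_u64(x->key.hash, y->key.hash);
    if (r)
        return r;
    return cmp_u64(x->key.len, y->key.len);
}

void cache_sort(struct cache_entry *e, size_t n)
{
    qsort(e, n, sizeof(*e), entry_cmp);
}

/*
 * Binary search over the sorted table, O(log2(N)) comparisons.
 * Returns the inode number, or 0 if the name is not cached.
 * Names with equal hash and length under one parent would collide;
 * this sketch does not handle that case.
 */
uint64_t cache_lookup(const struct cache_entry *e, size_t n,
                      uint64_t parent, const char *name)
{
    struct cache_entry probe;
    const struct cache_entry *hit;

    probe.key = make_key(parent, name);
    probe.ino = 0;
    hit = bsearch(&probe, e, n, sizeof(*e), entry_cmp);
    return hit ? hit->ino : 0;
}
```

In the real cache such a search would run under the second of the two locks described above, so every name lookup on the client serializes on it regardless of how fast the search itself is.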
The last one is the userspace server, which is quite simple so far and likely has its own bottlenecks. One of the crazy
ideas is to move it into the kernel, so that lookup of the inode (a file or directory in userspace)
could be very fast. It would also reduce the number of unneeded copies (there are a number of them - I use
simple send()/recv() instead of mapping, and generally there is at least one unneeded,
but unavoidable in userspace, copy from kernelspace to userspace).
Some work should be done on server redundancy - right now there is no failover recovery either on clients
(I do not know of any filesystem which supports that though; do not confuse that with NFS.
All operations on the local cache will succeed of course, but reading from the remote side
will stall) or on servers (if a server fails, clients cannot
proceed with work, since there are no other servers which could pick up the data and metadata flows.
That has to be fixed).
Anyway, there are a number of interesting tasks to complete, and I expect to have something to show quite soon...