Zbr's days.

Thu, 10 Apr 2008

Busy inodes after unmount.

VFS: Busy inodes after unmount of pohmel. Self-destruct in 5 seconds.  Have a nice day...
After removing the private cache of inodes I found that objects which were sent by the server but never attached to a directory entry (dentry) are never freed.
So, essentially, this sequence does not work with the Linux VFS:
iget()/iget_locked()
...
umount
Inodes created by iget()/iget_locked() are placed into at least three different lists:
  • inode_in_use - global list of all created inodes whose i_count and i_nlink are greater than 0
  • s_inodes - per-superblock list, which contains every inode created for this superblock
  • inode_hashtable - hash table indexed by inode number. If you want to work with writeback, your inodes have to be there. I have not yet investigated why.
So, essentially, all inodes you created are accessible by the VFS and will be checked during umount via generic_shutdown_super()->invalidate_inodes(), where the system will notice that an inode in the s_inodes list has a non-zero reference counter (of course, otherwise it would already have been freed by the filesystem) and therefore can not be freed. Thus we have a leak.

The above lists can only be accessed under the global inode lock, so it is not a good idea to destroy inodes by traversing them in, for example, the ->put_super() callback or any other filesystem callback. So I had to add a list of all inodes to the POHMELFS superblock. Ugly.
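The leak and the workaround above can be sketched as a small userspace model. This is not kernel code: every name with a model_ prefix is hypothetical, refcnt merely stands in for i_count, and the two lists stand in for s_inodes and the private POHMELFS list.

```c
/* Userspace model only: these are NOT the kernel's struct inode or
 * struct super_block. All model_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_inode {
    unsigned long ino;
    int refcnt;                    /* stands in for i_count */
    struct model_inode *sb_next;   /* link in the VFS-side list */
    struct model_inode *fs_next;   /* link in the filesystem's private list */
};

struct model_sb {
    struct model_inode *inodes;    /* stands in for s_inodes */
    struct model_inode *fs_inodes; /* private list kept by the filesystem */
};

/* Models iget(): allocate an inode holding one reference, link it to both lists. */
static struct model_inode *model_iget(struct model_sb *sb, unsigned long ino)
{
    struct model_inode *in = calloc(1, sizeof(*in));

    if (!in)
        return NULL;
    in->ino = ino;
    in->refcnt = 1;
    in->sb_next = sb->inodes;
    sb->inodes = in;
    in->fs_next = sb->fs_inodes;
    sb->fs_inodes = in;
    return in;
}

/* Models the umount-time scan: free unreferenced inodes, count the rest as busy. */
static int model_invalidate_inodes(struct model_sb *sb)
{
    struct model_inode **pp = &sb->inodes;
    int busy = 0;

    while (*pp) {
        struct model_inode *in = *pp;

        if (in->refcnt > 0) {
            busy++;                /* still referenced: can not be freed */
            pp = &in->sb_next;
        } else {
            *pp = in->sb_next;
            free(in);
        }
    }
    return busy;
}

/* The workaround: walk the private list and drop every reference still held,
 * so the later scan finds nothing busy. */
static void model_drop_all(struct model_sb *sb)
{
    struct model_inode *in;

    for (in = sb->fs_inodes; in; in = in->fs_next)
        in->refcnt = 0;
    sb->fs_inodes = NULL;
}
```

Calling model_invalidate_inodes() right after a couple of model_iget() calls reports those inodes as busy; calling model_drop_all() first lets the scan free them all.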

/devel/fs :: Link / Comments (0)


get_user_pages() scalability.

Just found an article at LWN about get_user_pages(). The main problem turned out to be locking between multiple threads...

Out of curiosity: was this scalability problem ever fixed? (For the busy reader: this refers to my more than two-year-old test of get_user_pages() performance with a single thread, run to find bottlenecks in kevent AIO.)

Here is a graph (performance vs. number of pages):
[Figure: get_user_pages() scalability]

/devel/other :: Link / Comments (0)


POHMELFS development status.

It has developed very rapidly over the last couple of days; essentially, I rewrote it. I think it is ready for the next release, which I will announce in a day or so.
Right now all the first-milestone features I planned, except cache coherency (see below), are complete (although perhaps not always in the most optimal way).
Because of the name cache, it is now possible to create long paths with multiple directories via a single command. The same applies to directory removal, although that is due to a different design issue.
It would be possible to rewrite the generic read/write helpers to pass a set of pages into the POHMELFS network stack (which is now page-based for data), but I decided that it is not needed for the first step.
POHMELFS now has fully async processing of all operations except link creation (I decided it was a bit simpler to make those write-through; this was done out of laziness, not because of any fundamental architectural problem). This was achieved by serious (read: from scratch) changes to the architecture, which had its own problematic places, namely error reporting. Because of this move it becomes really simple to implement any kind of protocol, as long as it obeys the async rules: sending a message never requires a synchronous reply, and where a reply is needed, it arrives as an independent incoming message, which is processed asynchronously from the waiter and via a common state machine.
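A minimal userspace sketch of that async rule follows. It is not the POHMELFS code or wire format: the command names, the message layout, the fixed-size transaction table and all model_* identifiers are assumptions made for illustration. The point is only that the send path records state and returns, while every incoming message, whether a reply or an unsolicited server command, goes through one dispatch table.

```c
/* Illustrative model only; all names and the message layout are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_cmd { MODEL_CMD_CREATE_REPLY, MODEL_CMD_REMOVE, MODEL_CMD_MAX };
enum model_state { TRANS_FREE, TRANS_SENT, TRANS_DONE, TRANS_FAILED };

#define MODEL_MAX_TRANS 16

struct model_msg {
    uint32_t cmd;       /* one of enum model_cmd */
    uint64_t trans_id;  /* matches a reply to an earlier send */
    int32_t status;     /* 0 on success, negative on error */
};

struct model_client {
    enum model_state trans[MODEL_MAX_TRANS];
    unsigned removed;   /* objects removed on the server's request */
};

/* Send path: record the transaction and return at once, never wait for a reply. */
static int model_send(struct model_client *c, uint64_t id)
{
    if (id >= MODEL_MAX_TRANS || c->trans[id] != TRANS_FREE)
        return -1;
    c->trans[id] = TRANS_SENT;
    return 0;
}

typedef void (*model_handler)(struct model_client *, const struct model_msg *);

/* A reply is just another incoming message that advances the state machine;
 * errors are reported here, asynchronously from the sender. */
static void model_on_create_reply(struct model_client *c, const struct model_msg *m)
{
    if (m->trans_id < MODEL_MAX_TRANS && c->trans[m->trans_id] == TRANS_SENT)
        c->trans[m->trans_id] = m->status == 0 ? TRANS_DONE : TRANS_FAILED;
}

/* An unsolicited server command: handled without any matching send. */
static void model_on_remove(struct model_client *c, const struct model_msg *m)
{
    (void)m;
    c->removed++;
}

static const model_handler model_handlers[MODEL_CMD_MAX] = {
    [MODEL_CMD_CREATE_REPLY] = model_on_create_reply,
    [MODEL_CMD_REMOVE] = model_on_remove,
};

/* Receive path: dispatch purely on the command, independent of the send path. */
static int model_receive(struct model_client *c, const struct model_msg *m)
{
    if (m->cmd >= MODEL_CMD_MAX)
        return -1;
    model_handlers[m->cmd](c, m);
    return 0;
}
```

Because model_receive() never consults the sender, a server-initiated MODEL_CMD_REMOVE is handled exactly like a reply, which is the property the design relies on.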
Such an architecture allows a simple cache coherency algorithm: the server just sends missed entries or commands to remove some objects, and the client's core handles that just fine, since its receiving code does not depend on the sending code. This is not a 100% correct way to handle collisions (collisions thus become new objects in the filesystem tree, such as the old name plus some suffix), and it is not real cache coherency, but it is what lots of users need.
Writeback cache does not play very well with cache coherency, since every metadata change (like object creation or removal) has to be checked against the server state, because different clients can do the same thing to the same object. The level of paranoia has to be thought out in advance.

The first cache coherency step is to implement the trivial scheme, where every object is synced at its writeback time and the changes are broadcast by the server to the other clients. If another client is processing the same object, it can either be renamed as a collision or just overwritten. Having locks, and thus real states, is the next step.
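The "rename to collision" branch of this trivial scheme could look roughly like the sketch below. The ".collisionN" suffix format, the flat array of names and the model_* function names are all assumptions made for illustration; the post only says that a collision becomes a new object named like the old name plus some suffix.

```c
/* Illustrative model only; the suffix format and all names are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if name is present among the n directory entries. */
static int model_name_exists(const char *const *names, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Picks a name for an incoming object. If name is free it is used as is,
 * otherwise the first free "name.collisionN" is chosen.
 * Returns 0 on success, -1 if out is too small or no suffix is free. */
static int model_collision_name(const char *const *names, size_t n,
                                const char *name, char *out, size_t outlen)
{
    unsigned i;
    int len;

    if (!model_name_exists(names, n, name)) {
        len = snprintf(out, outlen, "%s", name);
        return (len < 0 || (size_t)len >= outlen) ? -1 : 0;
    }
    for (i = 1; i < 1000; i++) {
        len = snprintf(out, outlen, "%s.collision%u", name, i);
        if (len < 0 || (size_t)len >= outlen)
            return -1;             /* would be truncated */
        if (!model_name_exists(names, n, out))
            return 0;
    }
    return -1;
}
```

The alternative branch, overwriting the local object, needs no new name at all, which is why it is the cheaper but lossier of the two options.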

Also, POHMELFS does not have authentication or strong checksums right now, and although these are simple to implement, their priority is questionable. There is also the possibility of implementing cryptographically strong encryption of the communication channels.

So, lots of ideas, but the main part is ready: the async data processing design was definitely the right choice, so all the other features become very simple to complete.
A new release will be announced very soon, stay tuned!

/devel/fs :: Link / Comments (0)