Sun, 06 Apr 2008
There is only one way: asynchronous.
This is the new motto for POHMELFS.
It is a completely new filesystem now.
POHMELFS got new page-processing code on the sending side (commands and data) and a new lookup,
which is based on the Linux VFS inode cache instead of reinventing the wheel (the comment
there says it is very SMP-friendly, although I do not quite understand how that is
possible with the global inode_lock). It also got a
completely new object creation and referencing path. It is now possible
to create a huge path (up to 4k, which can easily be extended if there is demand)
containing multiple objects with only a single network command.
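A minimal sketch, in C, of what a single "create the whole path" command could look like. The struct path_cmd layout, CMD_CREATE_PATH and the helpers path_cmd_init() and path_cmd_next() are hypothetical illustrations, not the real POHMELFS wire format: the client packs the full path into one buffer bounded by the 4k limit, and the server walks it prefix by prefix so that every parent is created before its children.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit matching the 4k path size mentioned in the post. */
#define PATH_CMD_MAX 4096

enum { CMD_CREATE_PATH = 1 };   /* hypothetical command id */

/* Hypothetical command layout, not the real POHMELFS wire format. */
struct path_cmd {
	uint32_t cmd;                /* command id */
	uint32_t size;               /* number of valid bytes in path[] */
	char path[PATH_CMD_MAX];     /* "a/b/c": every object to create */
};

/* Client side: pack the whole path into one command.
 * Returns 0 on success, -1 if the path does not fit. */
static int path_cmd_init(struct path_cmd *c, const char *path)
{
	size_t len = strlen(path);

	if (len >= PATH_CMD_MAX)
		return -1;
	c->cmd = CMD_CREATE_PATH;
	c->size = (uint32_t)len;
	memcpy(c->path, path, len + 1);   /* keep the terminating NUL */
	return 0;
}

/* Server side: copy the next prefix to create ("a", then "a/b", ...)
 * into out (at least PATH_CMD_MAX bytes) and advance *pos.
 * Returns 1 if a prefix was produced, 0 when the path is exhausted. */
static int path_cmd_next(const struct path_cmd *c, uint32_t *pos, char *out)
{
	uint32_t i = *pos;

	assert(i <= c->size);
	while (i < c->size && c->path[i] == '/')   /* skip separators */
		i++;
	if (i >= c->size)
		return 0;
	while (i < c->size && c->path[i] != '/')   /* find end of component */
		i++;
	memcpy(out, c->path, i);
	out[i] = '\0';
	*pos = i;
	return 1;
}
```

The server would call path_cmd_next() in a loop and create "a", "a/b" and "a/b/c" in turn, so one round trip replaces one round trip per component.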
But the main feature of the new POHMELFS is its name cache. I did not find
a way to hook into the VFS dentry cache, so I invented my own. It is fast
to traverse from a child up to its highest-level parent, which the POHMELFS
writeback path does heavily. The storage is not the best possible yet, just a
simple RB-tree (which requires an SMP-unfriendly mutex), but the whole
idea already shows its gains. Eventually it will be replaced with a
faster and more scalable approach protected by RCU (even a properly sized hash
table would scale better, although dynamic resizing of hash tables
prevents RCU usage), but I started from the simplest ground.
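The child-to-root walk can be sketched like this in C, assuming each cached name entry keeps a pointer to its parent. struct path_entry and path_entry_build() are hypothetical names; the RB-tree used for lookups is left out, and only the upward walk that writeback needs to turn an entry into a full path is shown. The path is written backwards from the end of the buffer, so a single pass from the child up to the root is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name-cache entry: only the fields the walk needs. */
struct path_entry {
	const struct path_entry *parent;  /* NULL for the filesystem root */
	const char *name;                 /* one component, no slashes */
	size_t len;                       /* strlen(name) */
};

/* Build "/a/b/c" for e by following parent pointers up to the root.
 * The string is filled from the end of buf towards the start, so one
 * upward pass suffices. Returns a pointer into buf where the path
 * begins, or NULL if it does not fit in size bytes. */
static const char *path_entry_build(const struct path_entry *e,
				    char *buf, size_t size)
{
	char *p;

	assert(buf != NULL);
	if (size == 0)
		return NULL;
	p = buf + size - 1;
	*p = '\0';
	for (; e != NULL && e->parent != NULL; e = e->parent) {
		if ((size_t)(p - buf) < e->len + 1)   /* name plus '/' */
			return NULL;
		p -= e->len;
		memcpy(p, e->name, e->len);
		*--p = '/';
	}
	if (*p == '\0') {            /* e was the root itself: path is "/" */
		if (p == buf)
			return NULL;
		*--p = '/';
	}
	return p;
}
```

Swapping the RB-tree for an RCU-protected structure would mostly change how entries are looked up; this walk only follows parent pointers.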
POHMELFS already outperforms async NFS when untarring and completely saturates
my test Xen domains (both network and disk speed), while NFS is almost two
times slower. The test machines have 256 MB of RAM and at most 3 MB/s of interconnect speed
(something is likely broken in the Xen setup, since it is supposed to be 100 Mbit/s
and there is no high load). Such a setup is very unfriendly to POHMELFS (read: this is the
scenario where POHMELFS shows its worst results), but it is fast nevertheless.
It became not only much faster but also simpler. Its userspace server has
about half as many lines of code (816 vs. 1613), and the kernel side is smaller and simpler too:
mainly, there are no longer zillions of different trees indexed by every possible key,
so far only a per-inode tree of child names for readdir and a per-superblock path
entry cache.
There are drawbacks, of course: there is no receiving code at all yet. It will be a dedicated
thread that asynchronously processes all incoming packets (mostly
asynchronous readdir replies, read page content and cache-coherency messages). The first
two are really simple. The last one will be implemented as a full MOSI/MSI
library for inode content, which I will likely be able to use in my
other projects too.
P.S. I frequently think that I'm a very good vapourware seller :)
Stay tuned!