Thu, 15 Nov 2007
Ground rules of filesystem development.
1. Data read/write rebalance in the filesystem.
When storages can be added to or removed from the system,
there is an obvious question about their utilisation. First, when
your data is spread over different nodes/storages, reading
is generally faster, since it can be performed in parallel.
On the other hand, this can lead to heavy data fragmentation
if done incorrectly (for example, when the data was tightly packed in the first place
and spreading it results in heavy write/update overhead).
So this is a good solution for read-mostly setups, but a bad choice
for write-mostly cases.
The cleanest solution for this issue I see is to use copy-on-write semantics,
which imply that each new write is placed in a new location. Thus when
a new storage is added to the filesystem, it is readily utilized for new
writes, which in turn can work with delayed allocation and extents, heavily reducing
fragmentation.
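A minimal sketch of this idea in Python, as a user-space toy rather than filesystem code. The `Storage` and `CowFile` classes, the bump allocator and the "most free space wins" placement policy are all assumptions made for illustration; the post does not specify any of them.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """A toy storage device with a bump allocator (hypothetical model)."""
    name: str
    capacity: int
    used: int = 0

    def free(self) -> int:
        return self.capacity - self.used

    def allocate(self, length: int) -> int:
        """Reserve one contiguous extent and return its start offset."""
        if length > self.free():
            raise RuntimeError(f"{self.name} is full")
        start = self.used
        self.used += length
        return start


@dataclass
class CowFile:
    """File whose writes never overwrite data in place."""
    extents: list = field(default_factory=list)  # (storage name, start, length)
    pending: int = 0                             # bytes buffered, not yet allocated

    def write(self, length: int) -> None:
        # Delayed allocation: only remember how much was written.
        self.pending += length

    def flush(self, storages: list) -> None:
        # Allocate a single extent for everything buffered so far, on the
        # storage with the most free space (an assumed placement policy).
        if self.pending == 0:
            return
        target = max(storages, key=Storage.free)
        start = target.allocate(self.pending)
        self.extents.append((target.name, start, self.pending))
        self.pending = 0


storages = [Storage("disk0", capacity=1000, used=900)]
f = CowFile()

for _ in range(4):
    f.write(10)
f.flush(storages)   # four small writes become one extent on disk0

storages.append(Storage("disk1", capacity=1000))  # a new storage is added
for _ in range(4):
    f.write(10)
f.flush(storages)   # new writes land on the emptier disk1

print(f.extents)    # [('disk0', 900, 40), ('disk1', 0, 40)]
```

Because allocation is postponed until `flush()`, the four small writes end up in one extent instead of four, and the freshly added storage starts receiving data without any explicit rebalancing step.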
Reading is a bit trickier: ideally data should be spread over the new storage,
but having large contiguous regions for the same file is a huge win because
of read-ahead logic and the way disks work, so only fragmented files have to be
moved around. Here we enter defragmentation land, which is very small and easy
in a copy-on-write design: the file should be read and written back to get a new
contiguous region, or a special operation should be introduced to do essentially the
same without writing to the data (for example, doing that on sync or flush).
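The read-and-rewrite defragmentation can be sketched as follows. This is only an illustration: the `defragment` function, the `(start, length)` extent tuples and the `next_free` bump pointer are hypothetical, and the sketch assumes adjacent extents have already been merged, so more than one extent means the file is fragmented.

```python
def defragment(extents, next_free):
    """Rewrite a file copy-on-write style into one contiguous extent.

    extents   -- list of (start, length) pairs in file order
    next_free -- first unused offset on the target storage (assumed bump allocator)
    Returns the new extent list and the updated next_free.
    """
    if len(extents) <= 1:
        return extents, next_free   # already contiguous, nothing to move
    total = sum(length for _, length in extents)
    # "Read" every old extent and "write" it to a fresh location; the old
    # extents become garbage that the filesystem can reclaim later.
    new_extents = [(next_free, total)]
    return new_extents, next_free + total


fragmented = [(0, 8), (64, 8), (32, 16)]
compact, next_free = defragment(fragmented, next_free=128)
print(compact, next_free)           # [(128, 32)] 160

# A file that is already one extent is left where it is.
print(defragment([(0, 32)], 128))   # ([(0, 32)], 128)
```

Since copy-on-write already places every write in a new location, defragmentation needs no separate relocation machinery: it is an ordinary rewrite of the whole file.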
So, to summarise my ideas, the only thing needed for high-performance reads and writes
with multiple (or extensible) storages is copy-on-write semantics
behind the IO logic with correctly implemented balancing algorithms (like proper delayed
allocation and extent usage).
This is the first base point of my filesystem design.
2. Locking.
Obviously, the fewer locks you have, the less time you will spend in busy
loops (zero in the perfect case).
Thus the main design principle is to allow multiple simultaneous IO (reads and writes)
and metadata (file creation/deletion and so on) operations.
While multiple readers are handled just fine in the Linux kernel
via generic_file_aio_read(), all writers are stuck
on inode->i_mutex in generic_file_aio_write(),
which effectively blocks multithreaded writing to the same file.
But inode->i_mutex
should really only guard metadata updates, not the writing itself,
so this issue has to be resolved in any filesystem aimed at high-performance
applications (no filesystem in the Linux kernel currently tries to avoid grabbing
inode->i_mutex for writes).
Taking into account the number of hacks I implemented
for networking without touching a lot of core code, I'm pretty sure I will
be able to do this within my own filesystem alone.
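One possible way to picture the split between data and metadata locking is a byte-range lock, sketched below in user-space Python. This is not kernel code and not necessarily the approach intended here: the `RangeLockedFile` class and its fields are hypothetical, `_meta` merely plays the role of inode->i_mutex, and Python threads show only the lock structure, not real parallel speedup.

```python
import threading


class RangeLockedFile:
    """Toy file where writers lock only the byte range they touch."""

    def __init__(self, size: int):
        self.data = bytearray(size)
        self.size = 0                        # metadata: logical file size
        self._meta = threading.Lock()        # plays the role of i_mutex
        self._cond = threading.Condition()   # protects the list of busy ranges
        self._busy = []                      # (start, end) ranges being written

    def _overlaps(self, start, end):
        return any(s < end and start < e for s, e in self._busy)

    def write(self, offset: int, payload: bytes) -> None:
        start, end = offset, offset + len(payload)
        with self._cond:
            # Wait only for writers that touch the same bytes.
            while self._overlaps(start, end):
                self._cond.wait()
            self._busy.append((start, end))
        try:
            self.data[start:end] = payload   # data copy, metadata lock not held
            with self._meta:                 # short critical section: metadata only
                self.size = max(self.size, end)
        finally:
            with self._cond:
                self._busy.remove((start, end))
                self._cond.notify_all()


f = RangeLockedFile(size=4096)
threads = [
    threading.Thread(target=f.write, args=(i * 1024, bytes([i + 1]) * 1024))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f.size)   # 4096
```

Writers to disjoint ranges never wait for each other during the data copy; they serialize only for the brief size update, which is the part a metadata lock is meant to guard.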
3. Motivation.
I strongly believe that it is impossible to make really good things
when you are forced to make them. So my idealism tells me that when
you are paid to do the work, it will not be completed in the best way.
Do not confuse this with getting money for things you do for yourself
or on your own initiative; those are completely different approaches.
4. Fun.
It has to be fun. If a project starts draining your energy without good
feedback, it has to be completed to the next milestone and frozen. If something
is not interesting, it should be avoided.
Those were my rules for a successful filesystem project;
the last two items obviously apply to any other project.
Stay tuned :)