Mon, 05 May 2008
POHMELFS transaction support. Failover (re)connection to different servers.
POHMELFS just got full transaction support. So far it is only used in the ->writepages()
callback, which is invoked by the writeback mechanism. POHMELFS uses lazy transaction support:
it waits after each transaction, which carries a header and the data to be written for at most
14 pages. 14 is a magic number of pages that corresponds to the struct pagevec size
used by generic writeback; transaction size is limited by a mount option and is 32 pages by default.
Performance dropped from 125 MB/s to 64 MB/s, which is not acceptable.
The main problem is, of course, waiting for each transaction to complete (i.e. for the completion message from the server).
There should be no per-transaction waiting. Instead, writeback has to allocate as many transactions as
needed, send them one after another, and only start waiting for them when there are no more
pages to be written. This is the next task.
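
The difference between waiting per transaction and waiting once at the end can be sketched in
userspace Python. This is only a simulation under stated assumptions, not POHMELFS code:
send_transaction() is a hypothetical stand-in for sending the header and data and receiving the
server's completion message, and the 10 ms latency is made up.

```python
from concurrent.futures import ThreadPoolExecutor, wait
import time

PAGES_PER_TRANSACTION = 14  # the pagevec-sized batch mentioned in the post


def split_into_transactions(pages, limit=PAGES_PER_TRANSACTION):
    """Group dirty page indexes into transactions of at most `limit` pages."""
    return [pages[i:i + limit] for i in range(0, len(pages), limit)]


def send_transaction(trans, latency=0.01):
    """Hypothetical stand-in: send header + data, block until the server acks."""
    time.sleep(latency)  # simulated network round trip (assumed value)
    return len(trans)    # number of pages acknowledged


def writeback_per_transaction_wait(pages):
    """Current behaviour: wait for each completion before sending the next."""
    return sum(send_transaction(t) for t in split_into_transactions(pages))


def writeback_deferred_wait(pages):
    """Planned behaviour: queue every transaction, wait only at the end."""
    transactions = split_into_transactions(pages)
    with ThreadPoolExecutor(max_workers=max(1, len(transactions))) as pool:
        pending = [pool.submit(send_transaction, t) for t in transactions]
        done, not_done = wait(pending)  # the single wait point
    assert not not_done
    return sum(f.result() for f in done)
```

With 100 dirty pages this produces eight transactions (seven of 14 pages and one of 2). The first
variant pays the round trip eight times in a row, while the second overlaps them.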
The transaction mechanism allows quite simple reconnection to different master servers in case of failure,
and rollback of the failed transaction. For example, one can provide a number of main
servers (which have to be in sync with each other and be able to synchronize themselves,
or can simply use shared storage), and the POHMELFS client will switch between them if the current
one fails. The system detects the failure and reconnects; if the reconnect fails, the next server is used
and the whole transaction is resent there.
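
A minimal sketch of that failover loop, assuming a hypothetical send(server, transaction) callable
that raises ServerDown when the server cannot be reached. The names and the single reconnect
attempt per server are illustrative choices, not the actual POHMELFS implementation.

```python
class ServerDown(Exception):
    """Raised by the hypothetical transport when a server cannot be reached."""


def send_with_failover(transaction, servers, send, current=0):
    """Send `transaction`, moving to the next server when the current one fails.

    Each server gets the initial attempt plus one reconnect attempt. The whole
    transaction is resent on every attempt. Returns the index of the server
    that accepted it, so the caller can keep using that server.
    """
    for offset in range(len(servers)):
        index = (current + offset) % len(servers)
        for _attempt in range(2):  # first try, then one reconnect
            try:
                send(servers[index], transaction)
                return index
            except ServerDown:
                continue
    raise ServerDown("all servers failed")
```

Because the transaction is kept by the client until it is acknowledged, resending it to another
server needs no extra bookkeeping beyond remembering which server is current.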
It is also possible to write a transaction to a different server on demand (it may or may not be connected
already, but it has to have an address structure, which so far is only obtained during pre-mount configuration),
which is a prerequisite for parallel data processing. One can create a simple patch to write transactions
one after another to servers in round-robin fashion.
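
The round-robin idea can be illustrated in a few lines. This is a sketch only: transactions and
servers are plain placeholders here, and assign_round_robin() is a hypothetical helper name.

```python
from itertools import cycle


def assign_round_robin(transactions, servers):
    """Pair each transaction with the next server in a repeating sequence."""
    if not servers:
        raise ValueError("at least one server address is required")
    # zip() stops when the finite transaction list is exhausted
    return list(zip(transactions, cycle(servers)))
```

For five transactions and two servers, the transactions alternate between the first and the
second server, starting with the first.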
Right now only write transactions are used (they can be combined with object creation if needed); read transactions are pending,
as are multiple parallel transactions (which are not complex, but the main task is how to wait for all of them to
complete; very similar code is used in pohmelfs_aio_read()).
There is also a pending task of cache coherency support (server-originated messages
to clients that used the same pages another client is writing into,
including metadata coherency messages for uid/gid/inode size and other changes).
It is not that complex a task, and it mostly requires server modifications.
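
One way the server side could decide whom to notify is sketched below. It is purely an assumed
design for illustration: the CoherencyTracker class, its methods, and the (inode, page) keying are
hypothetical, and metadata messages are left out.

```python
from collections import defaultdict


class CoherencyTracker:
    """Server-side map of which clients hold which pages (assumed design)."""

    def __init__(self):
        self._holders = defaultdict(set)  # (inode, page index) -> client ids

    def note_cached(self, client, inode, page):
        """Record that `client` has read `page` of `inode` into its cache."""
        self._holders[(inode, page)].add(client)

    def on_write(self, writer, inode, pages):
        """Return {client: pages to invalidate} for every holder but the writer."""
        notify = defaultdict(list)
        for page in pages:
            key = (inode, page)
            for client in self._holders.get(key, set()) - {writer}:
                notify[client].append(page)
            # after the write only the writer holds a valid copy
            self._holders[key] = {writer}
        return {client: sorted(stale) for client, stale in notify.items()}
```

The returned mapping is what the server would turn into invalidation messages, one per affected
client.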
Stay tuned!