Mon, 12 May 2008
Fast POHMELFS transactions.
With the new transactions and the new waiting mechanism (see below),
the system now untars the whole kernel tree in less than 3 seconds
over a GigE link (including the subsequent sync, which always
takes less than a second), while async NFS (the remote side is tmpfs in both cases)
needs a bit more than 30 seconds.
In addition, POHMELFS write speed is 125 MB/s (the wire limit) vs. less
than 90 MB/s for NFS (dd from /dev/zero
with 1 MB block size and 1000 blocks).
That's what I call a good result.
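As a back-of-the-envelope check of the dd numbers, here is a small sketch of the arithmetic. It assumes a nominal 1 Gbit/s line rate, decimal megabytes, and no protocol overhead; the 90 MB/s NFS figure is the upper bound quoted above, so the NFS time is a lower bound.

```python
# Rough arithmetic behind the dd comparison; not a benchmark.
GIGE_BITS_PER_SECOND = 1_000_000_000                      # nominal GigE line rate
WIRE_LIMIT_MB_S = GIGE_BITS_PER_SECOND / 8 / 1_000_000    # bits -> bytes -> MB

DD_BLOCKS = 1000      # dd block count from the post
DD_BLOCK_MB = 1       # dd block size, treated as 1 MB here


def transfer_seconds(total_mb, rate_mb_s):
    """Time needed to push total_mb at a sustained rate_mb_s."""
    return total_mb / rate_mb_s


total_mb = DD_BLOCKS * DD_BLOCK_MB
pohmelfs_seconds = transfer_seconds(total_mb, WIRE_LIMIT_MB_S)
nfs_seconds = transfer_seconds(total_mb, 90)  # NFS was below 90 MB/s, so this is a minimum

print(f"wire limit: {WIRE_LIMIT_MB_S} MB/s")
print(f"POHMELFS:   {pohmelfs_seconds:.1f} s for {total_mb} MB")
print(f"NFS:        at least {nfs_seconds:.1f} s for {total_mb} MB")
```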
The transaction mechanism invoked in the writeback path is now completely
async too, i.e. it does not wait until the remote side confirms that
the transaction was received and processed. Writeback does not drop
a transaction after the sending function returns, though; instead it stores it
in the in-flight storage and proceeds with the next one.
A transaction can accumulate up to 90 pages in a single frame.
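The send-and-remember behaviour can be sketched roughly as follows. This is a userspace toy, not the POHMELFS code: `Transaction`, `Writeback` and the `send` callback are hypothetical names, and a plain dict stands in for the real in-flight storage. The post does not say whether a transaction is stored before or after sending; this sketch registers it first so that a fast reply can always find it.

```python
from dataclasses import dataclass
from itertools import count

MAX_PAGES_PER_TRANSACTION = 90  # limit quoted in the post


@dataclass
class Transaction:
    trans_id: int
    pages: list


class Writeback:
    """Toy writeback path: send each transaction and keep it in flight."""

    def __init__(self, send):
        self._send = send          # hypothetical non-blocking send function
        self._ids = count(1)       # every new transaction gets a larger id
        self.inflight = {}         # trans_id -> Transaction

    def write_pages(self, pages):
        """Pack pages into transactions of at most 90 pages and send them."""
        sent_ids = []
        for start in range(0, len(pages), MAX_PAGES_PER_TRANSACTION):
            chunk = pages[start:start + MAX_PAGES_PER_TRANSACTION]
            trans = Transaction(next(self._ids), chunk)
            self.inflight[trans.trans_id] = trans  # remember it until a reply arrives
            self._send(trans)                      # returns without waiting for the server
            sent_ids.append(trans.trans_id)
        return sent_ids


sent = []
wb = Writeback(sent.append)
ids = wb.write_pages(list(range(200)))
print(ids, [len(t.pages) for t in sent])
```

Writing 200 pages produces three transactions here, and all three stay in `wb.inflight` until something completes them.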
When a reply is received, an async thread searches for the given transaction and
completes it (unlocks the page, although that could be done in writeback
since the page is being copied, cleans up the writeback bits, drops the transaction from
the appropriate radix tree and drops the reference counter). If a transaction
was not sent due to some error, or the server returned an error,
it will be resent to different servers.
Since the original writeback path no longer knows about
in-flight transactions, any timeout has to be checked by a
dedicated thread (or workqueue), which detects transactions that are too old
(by simply scanning from the beginning, since each new transaction has
an increased id) and resends them to the remote servers.
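A minimal sketch of the completion and timeout logic, under loud assumptions: `InFlight` and its methods are hypothetical, an `OrderedDict` replaces the radix tree, and time is passed in explicitly to keep the example deterministic. Moving a resent transaction to the tail is a choice made for this sketch so that the oldest-first scan stays valid; the post does not describe how resends are ordered.

```python
from collections import OrderedDict


class InFlight:
    """Toy in-flight storage with oldest-first timeout scanning."""

    def __init__(self, timeout):
        self.timeout = timeout
        # trans_id -> (sent_at, payload); ids only grow, so the oldest entry is first.
        self._entries = OrderedDict()

    def add(self, trans_id, payload, now):
        self._entries[trans_id] = (now, payload)

    def complete(self, trans_id):
        """Reply handler: drop the transaction, return its payload or None if unknown."""
        entry = self._entries.pop(trans_id, None)
        return None if entry is None else entry[1]

    def expired(self, now):
        """Scan from the oldest entry and stop at the first one still fresh."""
        stale = []
        for trans_id, (sent_at, _payload) in self._entries.items():
            if now - sent_at < self.timeout:
                break
            stale.append(trans_id)
        return stale

    def mark_resent(self, trans_id, now):
        """Record a resend: refresh the timestamp and move the entry to the tail."""
        _sent_at, payload = self._entries[trans_id]
        self._entries[trans_id] = (now, payload)
        self._entries.move_to_end(trans_id)


flight = InFlight(timeout=5)
flight.add(1, "create+data", now=0)
flight.add(2, "data", now=3)
flight.add(3, "data", now=4)
print(flight.expired(now=6))   # only transaction 1 has been waiting 5 or more ticks
```

The scanning thread would call `expired()` periodically, resend each returned id to another server, and call `mark_resent()` for it.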
There is a small problem though: if an object is larger than a single
transaction can hold (90 pages), it is split into several
transactions, where the first one contains the object creation command
and some data to be written, while the others contain only data.
If the server runs multiple threads per client (the default is one though),
a transaction other than the first may be processed first,
so the server will try to write data into a non-existent file and the transaction
will fail. There are two ways to fix this issue: either wait in writeback
on the client until the creation transaction is completed and then send all the others
as described above, or add the creation command to every subsequent transaction
until the object is created on the server (a special bit is set on the local inode
in that case). The latter is likely the better option.
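The second option can be illustrated with a toy model. Everything here is an assumption made for the example: the `Inode` and `Server` classes, the tuple-based command format, and the idea that the server treats a repeated creation command as a no-op. It only shows why carrying the creation command in every transaction makes the processing order irrelevant.

```python
from dataclasses import dataclass

MAX_PAGES_PER_TRANSACTION = 90  # limit quoted in the post


@dataclass
class Inode:
    path: str
    created_on_server: bool = False  # the "special bit" on the local inode


def build_transactions(inode, pages):
    """Split pages into transactions; prepend 'create' until the server confirms it."""
    transactions = []
    for start in range(0, len(pages), MAX_PAGES_PER_TRANSACTION):
        commands = []
        if not inode.created_on_server:
            commands.append(("create", inode.path))
        chunk = pages[start:start + MAX_PAGES_PER_TRANSACTION]
        commands.append(("write", inode.path, start, chunk))
        transactions.append(commands)
    return transactions


class Server:
    """Toy server: files are dicts mapping page index to page contents."""

    def __init__(self):
        self.files = {}

    def process(self, commands):
        for command in commands:
            if command[0] == "create":
                self.files.setdefault(command[1], {})   # repeated create is harmless
            else:
                _name, path, offset, chunk = command
                if path not in self.files:
                    return False                        # write into a non-existent file
                for i, page in enumerate(chunk):
                    self.files[path][offset + i] = page
        return True


inode = Inode("/tmp/object")
pages = list(range(200))
server = Server()
# Process in reverse order to mimic a multi-threaded server picking up later frames first.
results = [server.process(t) for t in reversed(build_transactions(inode, pages))]
print(results)
```

Once the reply for any of these transactions arrives, the client would set `created_on_server` and later transactions would carry data only.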