Zbr's days.


Tue, 15 Jul 2008

Distributed storage development roadmap.

Yes, the DST project is alive and will be beating the crap out of things very soon, since I decided to change its underlying architecture and switch to a transaction model, just like POHMELFS. This basically means that as long as the system has enough RAM, write operations will be extremely fast, reads can be balanced between multiple nodes (in a mirror), transactions can be resent, the failover mechanism becomes much simpler, and the system overall will be much more robust to failures.
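
To make the read balancing and failover part a bit more concrete, here is a minimal userspace sketch in Python. It is not DST code: the MirrorSet class, its method names and the round-robin policy are all assumptions made up for illustration. Reads rotate over the live mirror nodes, a failed node is simply skipped, and writes go to every live node.

```python
from typing import List, Set


class MirrorSet:
    """Hypothetical model of a mirror: reads rotate over live nodes."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("a mirror needs at least one node")
        self._nodes: List[str] = list(nodes)
        self._failed: Set[str] = set()
        self._next = 0  # index of the node to try for the next read

    def mark_failed(self, node: str) -> None:
        # Failover: a failed node is only flagged, nothing is rebuilt here.
        self._failed.add(node)

    def mark_alive(self, node: str) -> None:
        self._failed.discard(node)

    def pick_reader(self) -> str:
        """Return the next live node in round-robin order."""
        count = len(self._nodes)
        for _ in range(count):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % count
            if node not in self._failed:
                return node
        raise RuntimeError("no live mirror nodes")

    def writers(self) -> List[str]:
        """Writes must reach every live node of the mirror."""
        return [n for n in self._nodes if n not in self._failed]
```

Round-robin is only the simplest possible policy; a real balancer would more likely weigh nodes by latency or queue depth.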

The transaction model also means that the system requires an acknowledgement from the remote side, and there are two possibilities here: to rely on the implicit ack that comes with TCP ack packets, like I experimented with before, or to send an explicit ack from the server for each client request.
The former approach, although it has a smaller performance overhead, still suffers from the fact that pages sent via DST are always stateless, i.e. at this layer there is no knowledge about who sends the page. We can determine the inode the page belongs to, and can even get the socket when the page is about to be released after the ack has been received, but we cannot know exactly which pipe it was submitted from into the given socket, so when multiple threads send the same page via multiple sendfile() calls we do not know when and how the page will be released. We can put the pipes this page belongs to into a singly linked list (since at this point the page has only two unused pointers, those of the LRU list head, and one of them is used to determine that the page belongs to the sendfile()/splice codepath), and traversing this list will likely not hurt usual users, but a malicious one can create a local DoS with this approach. After some experiments with the splice code today I decided to drop the implementation of this idea for now.
There is a strong argument in favour of explicit acks from the server: they allow asynchronous transaction processing (with implicit acks we cannot hook into the processing path, since we do not know where exactly the skb with our pages is chained), and they do not hurt performance (which was proven by the POHMELFS benchmarks).
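
The explicit ack scheme can be sketched in userspace Python as a table of in-flight transactions. This is only an illustration under my own assumptions, not the DST or POHMELFS implementation: the Transaction and TransactionTable names, the integer transaction id and the send callback are all hypothetical. A transaction stays in the table until the server's ack for its id arrives, completion runs as a callback (the asynchronous part), and anything still unacked can be resent, for example after a reconnect.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Transaction:
    tid: int      # hypothetical transaction id echoed back in the ack
    op: str       # "read" or "write", the only two events
    offset: int   # location of the pages on the remote node
    data: bytes = b""
    on_complete: Optional[Callable[["Transaction"], None]] = None
    sends: int = 0  # how many times the request was transmitted


class TransactionTable:
    """Tracks requests until the server explicitly acks them."""

    def __init__(self, send: Callable[[Transaction], None]) -> None:
        self._send = send  # assumed transport hook, e.g. a socket writer
        self._pending: Dict[int, Transaction] = {}
        self._ids = itertools.count(1)

    def submit(self, op: str, offset: int, data: bytes = b"",
               on_complete: Optional[Callable[[Transaction], None]] = None) -> int:
        t = Transaction(next(self._ids), op, offset, data, on_complete)
        self._pending[t.tid] = t  # remember it before it hits the wire
        self._transmit(t)
        return t.tid

    def _transmit(self, t: Transaction) -> None:
        t.sends += 1
        self._send(t)

    def ack(self, tid: int) -> bool:
        """Handle an explicit ack; False for unknown or duplicate ids."""
        t = self._pending.pop(tid, None)
        if t is None:
            return False
        if t.on_complete is not None:
            t.on_complete(t)  # pages can be released here
        return True

    def resend_pending(self) -> int:
        """Retransmit everything still unacked; returns how many."""
        for t in list(self._pending.values()):
            self._transmit(t)
        return len(self._pending)
```

The point of the table is that the sender always knows which request an ack belongs to, which is exactly what the implicit TCP ack approach cannot tell us.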

So, the overall plan for DST development is to switch to the transaction model and process all events asynchronously (there are actually only two: reading and writing of given pages at given locations).
This task is not that complex, so I expect some new results later this week. Stay tuned!
