Zbr's days.

Tue, 15 Jul 2008

Distributed storage development roadmap.

Yes, the DST project is alive and will start kicking very soon. I decided to change its underlying architecture and switch to a transaction model, just like POHMELFS. This means that as long as the system has enough RAM, write operations will be extremely fast, reads can be balanced between multiple nodes (in a mirror), transactions can be resent, the failover mechanism becomes much simpler, and the system overall will be much more robust to failures.
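
A minimal user-space sketch of the transaction model described above, assuming a fixed-size in-memory table and a stubbed-out network send. The names (`dst_trans`, `dst_write()`, `dst_ack()`, `dst_failover()`, `dst_pick_read_node()`) are hypothetical and are not DST's actual code. Writes return as soon as the data is copied into RAM, unacknowledged transactions stay around so they can be resent on failover, and reads are spread round-robin over the mirror nodes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRANS 64 /* hypothetical fixed-size transaction table */

/* One outstanding write, kept in RAM until the remote side acks it. */
struct dst_trans {
	uint64_t id;     /* transaction id, 0 is never used */
	uint64_t offset; /* destination offset on the remote storage */
	size_t size;
	void *data;      /* private copy, so the caller can reuse its buffer */
	int node;        /* node this transaction was last sent to */
	int in_use;
};

struct dst_state {
	struct dst_trans trans[MAX_TRANS];
	uint64_t next_id;
	int nr_nodes;       /* number of mirror nodes */
	int next_read_node; /* round-robin cursor for reads */
	size_t sent;        /* transmissions so far; stands in for the network */
};

/* Stub for the network send path: records the target node and counts. */
static void dst_send(struct dst_state *st, struct dst_trans *t, int node)
{
	t->node = node;
	st->sent++;
}

/* Copy the data into RAM, send it and return without waiting for the ack.
 * Returns the transaction id, or 0 if the table is full or malloc fails. */
static uint64_t dst_write(struct dst_state *st, uint64_t offset,
			  const void *buf, size_t size, int node)
{
	for (int i = 0; i < MAX_TRANS; i++) {
		struct dst_trans *t = &st->trans[i];

		if (t->in_use)
			continue;
		t->data = malloc(size ? size : 1);
		if (!t->data)
			return 0;
		memcpy(t->data, buf, size);
		t->id = ++st->next_id;
		t->offset = offset;
		t->size = size;
		t->in_use = 1;
		dst_send(st, t, node);
		return t->id;
	}
	return 0;
}

/* Acknowledgement from the remote side: the transaction can be freed. */
static int dst_ack(struct dst_state *st, uint64_t id)
{
	for (int i = 0; i < MAX_TRANS; i++) {
		struct dst_trans *t = &st->trans[i];

		if (t->in_use && t->id == id) {
			free(t->data);
			t->data = NULL;
			t->in_use = 0;
			return 0;
		}
	}
	return -1; /* unknown or already acknowledged */
}

/* Failover: resend every unacknowledged transaction to another node. */
static int dst_failover(struct dst_state *st, int failed, int replacement)
{
	int resent = 0;

	for (int i = 0; i < MAX_TRANS; i++) {
		struct dst_trans *t = &st->trans[i];

		if (t->in_use && t->node == failed) {
			dst_send(st, t, replacement);
			resent++;
		}
	}
	return resent;
}

/* Reads can go to any mirror, so spread them round-robin. */
static int dst_pick_read_node(struct dst_state *st)
{
	assert(st->nr_nodes > 0);
	int node = st->next_read_node;

	st->next_read_node = (node + 1) % st->nr_nodes;
	return node;
}
```

In this sketch `dst_ack()` would be called from the receive path when the server confirms a transaction; whatever is still `in_use` when a node dies is exactly what `dst_failover()` resends.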

The transaction model also means that the system requires the remote side to acknowledge each request, and there are two possibilities here: handle the implicit ack that comes with TCP ACK packets, as I experimented with before, or send an explicit ack from the server for each client request.
The former approach, although it has smaller performance overhead, still suffers from the fact that pages sent via DST are always stateless, i.e. at this layer there is no knowledge about who sent a given page. We can determine which inode the page belongs to, and can even get the socket when the page is about to be released after the ack has been received, but we cannot know exactly which pipe it was submitted through into that socket. So when multiple threads send the same page via multiple sendfile() calls, we do not know when and how the page will be released. We could put the pipes this page belongs to into a singly-linked list (at this point the page has only two unused pointers, the LRU list head, and one of them is already used to mark the page as belonging to the sendfile()/splice codepath), and traversing this list would likely not hurt ordinary users, but a malicious one could mount a local DoS this way. After some experiments with the splice code today, I decided to drop this idea for now.
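
The pipe-tracking idea, and why it opens a DoS hole, can be illustrated with a small sketch. The structures below are hypothetical stand-ins, not the kernel's `struct page` or pipe structures: the single spare pointer in the page becomes the head of a singly-linked list of pipe references. Adding a pipe is O(1), but finding the submitting pipe when the ack arrives is a linear walk whose length a local user controls by splicing the same page through many pipes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's page or pipe structures. */
struct pipe_ref {
	int pipe_id;           /* which sendfile()/splice pipe submitted the page */
	struct pipe_ref *next; /* single link: only one spare pointer available */
};

struct fake_page {
	struct pipe_ref *pipes; /* the one free pointer reused as the list head */
};

/* Record that a pipe submitted this page: O(1) push at the head. */
static void page_add_pipe(struct fake_page *page, struct pipe_ref *ref,
			  int pipe_id)
{
	assert(page != NULL && ref != NULL);
	ref->pipe_id = pipe_id;
	ref->next = page->pipes;
	page->pipes = ref;
}

/* On ack we must find and unlink the submitting pipe. With a singly-linked
 * list this is a linear walk; *steps reports how many entries were visited. */
static struct pipe_ref *page_remove_pipe(struct fake_page *page, int pipe_id,
					 size_t *steps)
{
	struct pipe_ref **pp = &page->pipes;

	*steps = 0;
	while (*pp) {
		(*steps)++;
		if ((*pp)->pipe_id == pipe_id) {
			struct pipe_ref *found = *pp;

			*pp = found->next;
			found->next = NULL;
			return found;
		}
		pp = &(*pp)->next;
	}
	return NULL;
}
```

With 1000 pipes attached to one page, releasing the oldest reference visits all 1000 entries, and every ack for that page pays a cost proportional to however many pipes a user chose to create.
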
There is a strong argument in favour of explicit acks from the server: they allow asynchronous transaction processing (with implicit acks we cannot hook into the processing path, since we do not know where exactly the skb with our pages is chained), and they do not hurt performance (as the POHMELFS benchmarks have shown).
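
A sketch of the explicit-ack path, assuming a hypothetical 12-byte ack message (64-bit transaction id plus 32-bit status, big-endian) and a per-request completion callback; this is not DST's or POHMELFS's real wire format. Because the ack names the transaction, the receive path can look up the request and complete it asynchronously, which is exactly the hook that implicit TCP acks do not provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ack message: 64-bit transaction id + 32-bit status. */
#define DST_ACK_SIZE 12

/* Big-endian encode/decode of `bytes` bytes. */
static void put_be(uint8_t *p, uint64_t v, int bytes)
{
	for (int i = bytes - 1; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

static uint64_t get_be(const uint8_t *p, int bytes)
{
	uint64_t v = 0;

	for (int i = 0; i < bytes; i++)
		v = (v << 8) | p[i];
	return v;
}

typedef void (*dst_complete_t)(void *priv, int status);

/* A request waiting for its ack; pending requests form a simple list. */
struct dst_req {
	uint64_t id;
	dst_complete_t complete; /* called from the receive path on ack */
	void *priv;
	struct dst_req *next;
};

/* Server side: encode the ack for request `id`. */
static void dst_build_ack(uint8_t buf[DST_ACK_SIZE], uint64_t id,
			  int32_t status)
{
	put_be(buf, id, 8);
	put_be(buf + 8, (uint32_t)status, 4);
}

/* Client receive path: decode the ack, unlink the matching request and
 * complete it. Returns -1 for short messages or unknown ids. */
static int dst_process_ack(struct dst_req **pending, const uint8_t *buf,
			   size_t len)
{
	if (len < DST_ACK_SIZE)
		return -1;
	uint64_t id = get_be(buf, 8);
	int32_t status = (int32_t)(uint32_t)get_be(buf + 8, 4);

	for (struct dst_req **pp = pending; *pp; pp = &(*pp)->next) {
		struct dst_req *req = *pp;

		if (req->id != id)
			continue;
		*pp = req->next;
		assert(req->complete != NULL);
		req->complete(req->priv, status);
		return 0;
	}
	return -1;
}

/* Example completion: hand the status back to whoever submitted the request. */
static void dst_store_status(void *priv, int status)
{
	*(int *)priv = status;
}
```

Acks may arrive in any order; each one completes only the request it names, and the submitter never has to block waiting for it.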

So, the overall plan is to switch DST to the transaction model and process all events asynchronously (there are actually only two: reading and writing the given pages to the given locations).
This task is not that complex, so I expect some new results later this week. Stay tuned!
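
With only two event types, asynchronous processing boils down to a queue: submitters enqueue a read or a write and return immediately, and a worker drains the queue later. The sketch below is single-threaded, uses a 4 KiB in-memory array as a hypothetical backing store, and all names (`dst_event`, `dst_submit()`, `dst_process()`) are illustrative; in a real implementation the worker would be a kernel thread and the store a remote node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STORE_SIZE 4096 /* hypothetical backing "device" */
#define QUEUE_LEN 16

enum dst_op { DST_READ, DST_WRITE }; /* the only two event types */

struct dst_event {
	enum dst_op op;
	uint64_t offset;
	size_t size;
	uint8_t *buf; /* source for writes, destination for reads */
	int *done;    /* set to 1 on success, -1 on error */
};

struct dst_queue {
	struct dst_event ev[QUEUE_LEN];
	size_t head, tail; /* ring buffer: head is the next event to process */
	uint8_t store[STORE_SIZE];
};

/* Submission never waits for I/O: it only queues the event. */
static int dst_submit(struct dst_queue *q, enum dst_op op, uint64_t offset,
		      uint8_t *buf, size_t size, int *done)
{
	assert(done != NULL);
	if (q->tail - q->head == QUEUE_LEN)
		return -1; /* queue full, caller retries later */

	struct dst_event *e = &q->ev[q->tail % QUEUE_LEN];

	e->op = op;
	e->offset = offset;
	e->size = size;
	e->buf = buf;
	e->done = done;
	*done = 0;
	q->tail++;
	return 0;
}

/* Worker side: process all queued events in FIFO order. */
static size_t dst_process(struct dst_queue *q)
{
	size_t n = 0;

	while (q->head != q->tail) {
		struct dst_event *e = &q->ev[q->head % QUEUE_LEN];

		q->head++;
		n++;
		if (e->offset > STORE_SIZE || e->size > STORE_SIZE - e->offset) {
			*e->done = -1; /* out of range */
			continue;
		}
		if (e->op == DST_WRITE)
			memcpy(q->store + e->offset, e->buf, e->size);
		else
			memcpy(e->buf, q->store + e->offset, e->size);
		*e->done = 1;
	}
	return n;
}
```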

roelof wrote at 2008-07-16 16:01:

Would DST benefit from TIPC (the kernel API) as opposed to TCP?

Zbr wrote at 2008-07-16 17:23:

You mean Transparent Inter Process Communication from Ericsson? Its design goals are somewhat orthogonal to DST's. DST can use any underlying network protocol that provides a socket interface, so one could use TIPC for DST too, although I'm not sure it would bring any benefit.

KernelPanic wrote at 2008-07-16 20:21:

Yeah!!!! I'm waiting for your work :D

roelof wrote at 2008-07-18 13:57:

Sorry, maybe I should have explained the idea better :-)

I am proposing TIPC instead of TCP, i.e. as the network protocol.

TIPC provides a topology service, so you get cluster membership events for free; there is no need to update DST node membership manually via e.g. sysfs.

DST may then benefit from the additional addressing modes, e.g. round-robin, or sending to all (or a subset of) instances of a specific service, etc.

This may allow one to implement data distribution based on Adi Shamir's "How to share a secret" [1].

I haven't really looked at the kernel interface, but the user-space API provides the normal BSD socket interface.

[1] http://www.caip.rutgers.edu/~virajb/readinglist/shamirturing.pdf
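
For reference, Shamir's scheme splits a secret into n shares so that any k of them reconstruct it: share i is (i, f(i)) for a random polynomial f of degree k-1 with f(0) equal to the secret, and f(0) is recovered by Lagrange interpolation. Below is a minimal sketch over the prime field GF(2^31 - 1), assuming the caller supplies the random coefficients; the names `sss_split()` and `sss_combine()` are hypothetical, and this only illustrates the technique, not anything DST implements.

```c
#include <assert.h>
#include <stdint.h>

#define SSS_P 2147483647ULL /* prime 2^31 - 1; secrets and shares live in GF(p) */

/* Operands are < 2^31, so the product fits in 64 bits. */
static uint64_t mod_mul(uint64_t a, uint64_t b)
{
	return (a * b) % SSS_P;
}

static uint64_t mod_pow(uint64_t b, uint64_t e)
{
	uint64_t r = 1;

	b %= SSS_P;
	while (e) {
		if (e & 1)
			r = mod_mul(r, b);
		b = mod_mul(b, b);
		e >>= 1;
	}
	return r;
}

/* Inverse via Fermat's little theorem (p is prime, a != 0 mod p). */
static uint64_t mod_inv(uint64_t a)
{
	assert(a % SSS_P != 0);
	return mod_pow(a, SSS_P - 2);
}

/* f(x) = secret + c[0] x + ... + c[k-2] x^(k-1); share x is f(x), x = 1..n.
 * coeffs must hold k-1 random values in [0, p); randomness is the caller's job. */
static void sss_split(uint64_t secret, const uint64_t *coeffs, int k, int n,
		      uint64_t *ys)
{
	for (int x = 1; x <= n; x++) {
		uint64_t y = 0;

		/* Horner's rule, highest coefficient first */
		for (int j = k - 2; j >= 0; j--)
			y = (mod_mul(y, x) + coeffs[j]) % SSS_P;
		ys[x - 1] = (mod_mul(y, x) + secret) % SSS_P;
	}
}

/* Recover f(0) from k shares (xs[i], ys[i]) by Lagrange interpolation:
 * f(0) = sum_i ys[i] * prod_{j != i} xs[j] / (xs[j] - xs[i]). */
static uint64_t sss_combine(const uint64_t *xs, const uint64_t *ys, int k)
{
	uint64_t secret = 0;

	for (int i = 0; i < k; i++) {
		uint64_t num = 1, den = 1;

		for (int j = 0; j < k; j++) {
			if (j == i)
				continue;
			num = mod_mul(num, xs[j]);
			den = mod_mul(den, (xs[j] + SSS_P - xs[i]) % SSS_P);
		}
		secret = (secret + mod_mul(ys[i], mod_mul(num, mod_inv(den)))) % SSS_P;
	}
	return secret;
}
```

Applied to storage, each node would hold one share of every block, so any k nodes could rebuild the data while fewer than k learn nothing about it.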

Zbr wrote at 2008-07-18 22:00:

TIPC provides a socket interface and can be used as a DST transport if userspace initializes it appropriately (that is quite a trivial task), but TIPC hides its additional features and makes them non-trivial to access. I'm not sure they will ever be used in DST, since it is quite a simple protocol.