Fri, 18 Jul 2008
Completed distributed storage redesign.
I also managed to play second-octave F# and, sometimes, the whole chromatic scale
down to small (minor?) octave F on my trumpet, and I believe I have started to understand
overall trumpet kung-fu, but I expect that is not what you wanted to read under
the DST tag.
So, DST becomes smaller, cleaner and simpler. Notably, I decided to drop the userspace
target completely for now.
The kernel part now operates on a transaction entity, which holds a reference to the node
the data should be sent to or received from. There can be at most two such nodes, when a block IO
request spans a node boundary. In case of mirroring (which will be dropped for the first release),
the list of nodes to mirror the data to will be maintained by the first node, so the transaction
will not need to know about them.
In theory a block request can be as large as BIO_MAX_PAGES pages,
which is 256 for now, so I decided to require every node to be at least as large as
that bio limit; this way a request always touches at most two nodes.
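
The size argument above can be checked with a small userspace sketch. It is only an illustration: the 4 KiB page size, the equally sized nodes laid out back to back, and every name in it (SKETCH_*, nodes_spanned()) are assumptions made for the example and are not taken from the DST code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages; BIO_MAX_PAGES is 256, as stated in the post. */
#define SKETCH_PAGE_SIZE     4096ULL
#define SKETCH_BIO_MAX_PAGES 256ULL
#define SKETCH_MAX_REQ_BYTES (SKETCH_BIO_MAX_PAGES * SKETCH_PAGE_SIZE)

/*
 * Hypothetical layout: equally sized nodes placed back to back.
 * Returns how many nodes the byte range [start, start + len) touches.
 */
static unsigned int nodes_spanned(uint64_t start, uint64_t len,
                                  uint64_t node_size)
{
    /* The rule from the post: a node is never smaller than the bio limit. */
    assert(node_size >= SKETCH_MAX_REQ_BYTES);
    assert(len > 0 && len <= SKETCH_MAX_REQ_BYTES);

    uint64_t first = start / node_size;             /* node holding the first byte */
    uint64_t last = (start + len - 1) / node_size;  /* node holding the last byte */

    return (unsigned int)(last - first + 1);
}
```

Because the request is never longer than one node, its first and last bytes can only fall into the same node or into two adjacent ones, so the function returns 1 or 2 for any valid input.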
Each node has either a block device behind it (so it will just call generic_make_request()
with a different block device for the given bio), or a network state machine.
The network state will have two threads: RX and TX. The receiving one is used to get replies to the
read/write messages, find the matching transaction and complete it.
On a DST server it will also handle read/write requests and generate replies, but the whole
processing will be exactly the same: a client node will also have a switch to process read/write requests from
the network, although only a server should ever receive them.
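
The reply path of the receiving thread (find the transaction, complete it) could look roughly like the sketch below. This is plain userspace C written for illustration; the structure, its fields, the linked list of pending transactions and the function names are all hypothetical and say nothing about how DST actually stores or identifies transactions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transaction record; the field names are not from DST. */
struct trans {
    uint64_t id;         /* identifier echoed back in the reply */
    int completed;       /* set once the reply has been processed */
    int error;           /* status carried by the reply */
    struct trans *next;  /* singly linked list of pending transactions */
};

/* Find a pending transaction by id, or return NULL if there is none. */
static struct trans *trans_search(struct trans *head, uint64_t id)
{
    for (struct trans *t = head; t != NULL; t = t->next)
        if (t->id == id)
            return t;
    return NULL;
}

/*
 * What the RX path would do for one reply: look the transaction up and
 * complete it. Returns 0 on success, -1 if no transaction matches.
 */
static int rx_handle_reply(struct trans *head, uint64_t id, int error)
{
    struct trans *t = trans_search(head, id);

    if (t == NULL)
        return -1;
    t->error = error;
    t->completed = 1;
    return 0;
}
```

A linear list keeps the example short; a real implementation would more likely want a tree or hash keyed by the transaction id, plus locking between the RX thread and the submitters.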
The sending thread is tricky.
It is a fallback for non-blocking sockets, which are tried first at generic_make_request()
time, i.e. when a higher-level user performs a read or write. If the block was not fully sent,
it is queued to this thread, which will try to send the rest of the data when
polling allows. The ->make_request_fn() function returns in this case and the higher
layer can proceed with its own operations.
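
The "try a non-blocking send first, queue the remainder" idea can be sketched in userspace with ordinary sockets. This is not the kernel code: it uses the send(2) call with MSG_DONTWAIT, and the pending_send structure, the send_state values and try_send() are names invented for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

enum send_state { SEND_DONE, SEND_QUEUED, SEND_FAILED };

/* Hypothetical bookkeeping for one block that is being sent. */
struct pending_send {
    const char *buf;  /* data to send */
    size_t len;       /* total number of bytes */
    size_t sent;      /* bytes already accepted by the socket */
};

/*
 * Push as much as the socket accepts without blocking.
 * SEND_QUEUED means the caller should hand the remainder to the sending
 * thread, which calls try_send() again once poll() reports the socket
 * as writable.
 */
static enum send_state try_send(int fd, struct pending_send *p)
{
    while (p->sent < p->len) {
        ssize_t n = send(fd, p->buf + p->sent, p->len - p->sent,
                         MSG_DONTWAIT);

        if (n > 0) {
            p->sent += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return SEND_QUEUED;      /* socket buffer is full */
        return SEND_FAILED;
    }
    return SEND_DONE;
}
```

The submitter returns right after try_send() in both the SEND_DONE and SEND_QUEUED cases, which corresponds to ->make_request_fn() returning while the rest of the data is still on its way.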
A transaction is not freed until a reply is received from the remote side or the resend retry
count is exhausted.
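
As a rough illustration of that lifetime rule, here is a small sketch. It assumes a resend timer that fires when no reply has arrived in time; the timer, the structure and the function names are all assumptions made for the example, not DST internals.

```c
#include <stdbool.h>

/* Hypothetical per-transaction state; not the DST structure. */
struct trans_life {
    unsigned int retries_left;  /* resend budget that is still available */
    bool replied;               /* a reply from the remote side was seen */
};

/* Called when the reply arrives. Returns true: the transaction may be freed. */
static bool trans_on_reply(struct trans_life *t)
{
    t->replied = true;
    return true;
}

/*
 * Called when the (assumed) resend timer expires.
 * Returns true if the transaction may be freed, either because a reply
 * was already seen or because the retry budget is exhausted; returns
 * false if the data should be sent once more.
 */
static bool trans_on_timeout(struct trans_life *t)
{
    if (t->replied || t->retries_left == 0)
        return true;
    t->retries_left--;
    return false;
}
```

With a budget of two retries, the first two timeouts ask for a resend and the third one lets the transaction go.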
A transaction is always allocated (from the appropriate memory pool), and that is actually
the only allocation in DST itself. When it works with block devices, it is possible that a bio gets cloned
when it crosses the boundaries (or even always, I have to check it, but that is essentially
what device mapper does, with lots of additional allocations of its own), but it should be a very rare condition.
The network stack will allocate data itself too.
That was the theory. Practice tells me that essentially 90% of the code should be rewritten
from scratch, so I recloned the tree and so far have implemented the generic bits: registering
the block device, creating various sysfs files and directories, and other similar trivial things.
I still plan to finish it this weekend (without mirroring), but things may turn out differently...
/devel/dst