Tue, 26 Jun 2007
First results in distributed storage development.
Pretty trivial: I've created a device mapper target which sends
data pages over the network. It can be configured
to work with any kind of network media (including various socket types).
It is similar to the network block device, but does not require
a special process. BIOs remapped in the distributed
storage target operate on pages, so with an appropriate jumbo frame
setup there should be no copies at all (the code uses ->sendpage()).
Right now the code only sends data and does not receive anything,
so an appropriate protocol still has to be designed.
There is also an open question about how BIOs/pages should be remapped.
Device mapper supports three map return types (which are not actually documented):
DM_MAPIO_SUBMITTED - the target code will process the BIO
by itself: either put it, and thus call the end_io()
callbacks, or submit it to another layer.
DM_MAPIO_REMAPPED - must be returned when the target has
remapped all fields in place and will not touch the given block IO request anymore,
so generic code can call generic_make_request(), process
that request further and eventually put the BIO.
DM_MAPIO_REQUEUE - supposed to be used when the BIO should be requeued
at end_io() time. It is used only in the multipath target.
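To make the difference between the return types concrete, here is a minimal user-space sketch. It is not device mapper code: the toy_* names, the struct layout and the enum values are all invented for illustration, and only the split of responsibilities follows the description above.

```c
#include <assert.h>

/* Stand-ins for the DM_MAPIO_* codes; the numeric values are placeholders. */
enum toy_mapio { TOY_MAPIO_SUBMITTED, TOY_MAPIO_REMAPPED, TOY_MAPIO_REQUEUE };

/* Hypothetical, heavily simplified BIO. */
struct toy_bio {
    unsigned long sector; /* position on the device */
    int dev;              /* device the request is aimed at */
    int completed;        /* set once the "end_io" step has run */
};

/* Remapping target: rewrites the fields in place and hands the BIO back. */
enum toy_mapio toy_remap_target(struct toy_bio *bio, int lower_dev,
                                unsigned long offset)
{
    bio->dev = lower_dev;
    bio->sector += offset;
    return TOY_MAPIO_REMAPPED;
}

/* Self-processing target: handles the BIO itself and completes it. */
enum toy_mapio toy_submitting_target(struct toy_bio *bio)
{
    bio->completed = 1; /* stands in for sending the pages and ending the IO */
    return TOY_MAPIO_SUBMITTED;
}

/* Returns 1 when the generic layer has to resubmit the BIO on its own. */
int toy_core_must_resubmit(enum toy_mapio result)
{
    return result == TOY_MAPIO_REMAPPED;
}
```

In this toy model a target that returns the "submitted" code keeps ownership of the request, which is the behaviour the distributed storage target relies on.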
There will be a map of the pages for local and remote nodes, built according to an appropriate
redundancy/self-healing algorithm,
so local BIOs will be passed directly to the next layer via generic_make_request(),
and BIOs/pages for remote nodes will be processed accordingly. In both cases
the DM_MAPIO_SUBMITTED path will be used (as now).
There might be a problem when the same BIO contains pages for both local and remote nodes;
in that case the BIO vector will be changed to contain only the pages for the local node.
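The mixed local/remote case can be sketched in user space as a simple partition of a page vector. Everything here is an assumption made for illustration: the toy_* types, the fixed vector size and especially the placement rule (even page indexes are local), which merely stands in for the real page map. A real BIO also carries sector and offset information that would have to be adjusted, and that part is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16

struct toy_page { unsigned long index; };

struct toy_vec {
    struct toy_page pages[TOY_MAX_PAGES];
    size_t count;
};

/* Placeholder placement policy: even page indexes live on the local node. */
int toy_page_is_local(unsigned long index)
{
    return index % 2 == 0;
}

/*
 * Compact the local pages to the front of 'vec' in place and copy the
 * remote ones into 'remote'. Returns the number of remote pages.
 * Assumes vec->count <= TOY_MAX_PAGES.
 */
size_t toy_split_vec(struct toy_vec *vec, struct toy_vec *remote)
{
    size_t kept = 0;

    remote->count = 0;
    for (size_t i = 0; i < vec->count; i++) {
        if (toy_page_is_local(vec->pages[i].index))
            vec->pages[kept++] = vec->pages[i];
        else
            remote->pages[remote->count++] = vec->pages[i];
    }
    vec->count = kept;
    return remote->count;
}
```

After the call the original vector holds only local pages and could go to the next layer, while the second vector holds what has to be sent over the network.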
Another problem I see is how to dispatch read and write requests without locking the channel,
so that while a single write request is in flight it is still possible to read other ones.
Thinking...
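One possible direction for the dispatch problem (nothing is decided yet) is to tag every request with an id, so that replies can be matched in any order and a long write does not block reads. The sketch below is user-space C with an invented header layout and invented toy_* names; it only shows the bookkeeping, not any real wire protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_cmd { TOY_CMD_READ = 1, TOY_CMD_WRITE = 2 };

/* Hypothetical per-request header; field choice is an assumption. */
struct toy_hdr {
    uint64_t id;     /* unique tag used to match the reply */
    uint64_t sector; /* start of the request on the remote device */
    uint32_t size;   /* payload size in bytes */
    uint32_t cmd;    /* TOY_CMD_READ or TOY_CMD_WRITE */
};

#define TOY_MAX_PENDING 8

struct toy_pending {
    struct toy_hdr slots[TOY_MAX_PENDING];
    int used[TOY_MAX_PENDING];
    uint64_t next_id;
};

/* Register a request as in flight; returns its id, or 0 if the table is full. */
uint64_t toy_issue(struct toy_pending *p, uint32_t cmd, uint64_t sector,
                   uint32_t size)
{
    for (size_t i = 0; i < TOY_MAX_PENDING; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            p->slots[i].id = ++p->next_id;
            p->slots[i].sector = sector;
            p->slots[i].size = size;
            p->slots[i].cmd = cmd;
            return p->slots[i].id;
        }
    }
    return 0;
}

/* Match a reply by id, in any order. Returns 1 and fills 'out' if found. */
int toy_complete(struct toy_pending *p, uint64_t id, struct toy_hdr *out)
{
    for (size_t i = 0; i < TOY_MAX_PENDING; i++) {
        if (p->used[i] && p->slots[i].id == id) {
            *out = p->slots[i];
            p->used[i] = 0;
            return 1;
        }
    }
    return 0;
}
```

With such a table a read issued after a write can complete first, because completion depends only on the id carried in the reply.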
/devel/dst
UK visa.
The UK embassy called my workplace about 30 minutes ago. They did not ask for me, though;
instead they talked to the accounts department, which did not know about my plans
and could not answer the embassy's questions beyond confirming that I do work there.
I want to believe that it will not be a show-stopper.
/devel/other
fsblocks and buffer heads.
Nick Piggin from SuSE announced several days ago
his rework of the buffer_head interface, which is used as a layer between the block layer
and filesystems. Its main goal is to obtain a memory region which directly
represents the content of the storage when reading, or a memory region
which will be written to the given position
on storage. Buffer heads have a number of disadvantages,
mainly high overhead and the possibility of deadlocking writeout
(i.e. writing a page to disk in order to free it might require
an additional allocation). The interface is not that good either.
Fsblocks are supposed to fix that. Although fsblock already has a set of advantages over
buffer_heads, not everyone is happy with the approach, namely the XFS guys, who want to be able
to map arbitrary block ranges to storage, i.e. extents, so a better name
was suggested, 'buffer_heads on steroids', purely because of the existing limits
of both buffer_heads and fsblocks. So far only the minix filesystem has been
converted to fsblocks, so there is quite a lot of work yet to be done.
This discussion made me recall my fs
design notes. One of the main things I wanted (and still want) is to avoid buffer_heads
usage altogether (well, it is only needed to map a page, without the ugly need
to have a buffer_head for each block which can reside in the given page),
i.e. each object on the storage is always aligned to the page size (PAGE_CACHE_SIZE
actually, which is 4k), so each inode will have about 100-200 bytes of control
information and then will have the rest of the page filled with the file's data
or directory entries. That would greatly speed up small file lookup and processing
as well as directory reading (including fairly trivial directory readahead,
the absence of which is a serious limitation of the ext2/3/4 filesystems and leads to major
directory reading performance degradation when a directory contains a decent number of files/subdirs).
Such an approach can be described as something which takes the good parts from both
extents and delayed allocation.
So far it is a silent sleeping idea; maybe I will discuss it at KS.
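The page-aligned inode idea can be pictured with a small C layout. This is only a sketch under stated assumptions: the field names, the 128-byte control area (picked from the 100-200 byte range) and the fixed 4096-byte page are all illustrative, not a real on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size (4k) */
#define TOY_CTL_SIZE  128u  /* assumed size of the control area */

/* Hypothetical control information kept at the start of the page. */
struct toy_inode_ctl {
    uint64_t ino;
    uint64_t size;
    uint32_t mode;
    uint32_t flags;
    uint8_t  reserved[TOY_CTL_SIZE - 24]; /* pad the header to TOY_CTL_SIZE */
};

/* One inode occupies exactly one page: header, then data or dir entries. */
struct toy_inode_page {
    struct toy_inode_ctl ctl;
    uint8_t inline_data[TOY_PAGE_SIZE - TOY_CTL_SIZE];
};

_Static_assert(sizeof(struct toy_inode_page) == TOY_PAGE_SIZE,
               "inode object must fill exactly one page");

/* Number of bytes of file data that fit next to the control area. */
size_t toy_inline_capacity(void)
{
    return sizeof(((struct toy_inode_page *)0)->inline_data);
}

/* A small file fits in the inode page when it needs no further pages. */
int toy_fits_inline(uint64_t file_size)
{
    return file_size <= toy_inline_capacity();
}
```

With these numbers a file of up to 3968 bytes would be read together with its inode in a single page, which is where the small file speedup would come from.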
/devel/fs
Distributed storage blog tag.
One can track this
tag for information about my distributed storage development.
So, to recall how to write in C (I have not done that for about a week or two),
I'm writing an initial implementation for the multiple device stack (the same one that
supports software RAID and dm-crypt). It will obviously not be the final revision:
it will not support the interesting configuration techniques I am thinking about and
will not have special failure-aware encoding (like Reed-Solomon or WEAVER),
but will only be used to create a storage on top of two devices, a local
block device and a remote one.
The initial goal is not even to achieve a sequential reading speed equal to
the sum of the speeds of both devices, only to make it really distributed and to recall
what the block layer is. The last time I worked with it was about 4-5 years ago,
when I created an asynchronous block device
to test the acrypto
async crypto layer.
Taking into account that I ported dm-crypt to acrypto fairly quickly, that should not be that
complex a task. Expect some news tomorrow.
Switched off...
/devel/dst