Mon, 17 Dec 2007
New release of the distributed storage: Dancing with the smoked neutrino.
Short changelog:
- new, improved mirroring algorithm.
This algorithm uses a sliding window approach for full resync
and a write log for partial resync.
- fixed a number of typos, plus debug cleanups
- update the inode size when the linear algorithm changes the size of
the storage at run time
- extended the number of sysfs files and the documentation for them
- fixed a leak in the local export node setup
- name is 'Dancing with the smoked neutrino' now
The overall list of DST features can be found on the project's
homepage.
DST is also exported as a git tree available for clone and pull from
here.
Interested readers can test DST with the 2.6.23 tree too
(it should compile fine, but has not been tested).
New distributed storage mirroring algorithm.
Resync logic - sliding window algorithm.
At startup the system checks the age (a unique cookie) of each node.
If it does not match the age of the first node, the system resyncs all
data from the first node in the mirror to the others (the non-synced
nodes). Each non-synced node has a window, which slides from the start
of the node to the end.
During resync all requests that enter the window are queued, so the
window has to be sufficiently small. Once the window has been synced
from the other nodes, the queued requests are written and the window
moves forward, so the next resync step starts only when the previous
window is fully completed.
When the window reaches the end of the node, the node is marked as
synchronized.
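The window logic above can be sketched in a few lines of Python. This
is only a toy model, not the DST code: nodes are plain bytearrays of
equal size, and the names full_resync, apply_write and writes_during
are made up for the illustration. Writes that straddle a window
boundary are not handled.

```python
from collections import deque


def apply_write(nodes, offset, data):
    """Apply one write to every node in the mirror."""
    for node in nodes:
        node[offset:offset + len(data)] = data


def full_resync(source, target, window_size, writes_during=None):
    """Copy `source` into `target` window by window.

    `writes_during` is a made-up test hook: it maps a window start offset
    to the writes (offset, data) that arrive while that window is syncing.
    """
    writes_during = writes_during or {}
    start = 0
    while start < len(source):
        end = min(start + window_size, len(source))
        queued = deque()
        for offset, data in writes_during.get(start, []):
            if start <= offset < end:
                queued.append((offset, data))  # hits the window: hold it back
            else:
                apply_write((source, target), offset, data)
        target[start:end] = source[start:end]  # sync the window itself
        while queued:  # window is clean, replay the held writes
            apply_write((source, target), *queued.popleft())
        start = end  # slide forward only after the window completed
    return True  # target can now be marked as synchronized
```

A smaller window_size means fewer requests get held back at any one
time, which is why the window has to stay small.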
If the age of the node matches that of the first node, but its log
contains a different number of write log entries than the first node's
log (the first node is always treated as clean), a partial resync is
scheduled.
A partial resync is also scheduled when the log entry pointed to by the
resync index of the node contains an error.
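Put together, the startup decision could look roughly like the sketch
below. The Node class, its fields and the resync_kind function are
assumptions made for the example (the log is reduced to a list of
booleans, True for a good write), not the real DST structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    age: int                                  # unique cookie of the node
    log: list = field(default_factory=list)   # True = write ok, False = error
    resync_index: int = 0                     # log entry to look at next


def resync_kind(first, node):
    """Return "full", "partial" or "none" for `node`, given the first node."""
    if node.age != first.age:
        return "full"      # different cookie: copy everything
    if len(node.log) != len(first.log):
        return "partial"   # same cookie, but the logs diverged
    if node.resync_index < len(node.log) and not node.log[node.resync_index]:
        return "partial"   # entry under the resync index holds an error
    return "none"
```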
This type of resync works as follows: the system selects a synced node
(by checking each node's flags), fetches the log entry pointed to by
the resync index of the given node, and resyncs the data from the other
nodes to the given one.
It then scans the rest of the write log for other failed writes, so
that the next resync block can be fetched for them.
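As a rough illustration of that loop, assuming log entries are dicts
with hypothetical "start", "size" and "error" keys and nodes are
bytearrays (none of this is the actual DST layout):

```python
def partial_resync(sync_node, stale_node, log, resync_index=0):
    """Repair every failed write recorded in `log`, starting at resync_index.

    Returns the number of blocks copied from the synced node.
    """
    repaired = 0
    for entry in log[resync_index:]:
        if not entry["error"]:
            continue  # this write reached the node, nothing to do
        start = entry["start"]
        end = start + entry["size"]
        stale_node[start:end] = sync_node[start:end]  # fetch the good data
        entry["error"] = False  # the entry is clean again
        repaired += 1
    return repaired
```

Only the blocks named in failed log entries are copied, which is what
makes this cheaper than sliding a window over the whole node.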
The mirroring log is used to store write request information.
It is allocated on disk and in memory (the two are synced each time
the resync work queue fires), and takes about 1% of free RAM or disk
(whichever is less). Each write updates the log, so when a node goes
offline, its log is updated with error values, so that these entries
can be resynced when the node is back online. When the number of
failed writes becomes equal to the number of entries in the write log,
recovery becomes impossible (since old log entries have been
overwritten) and a full resync is scheduled.
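A small model of such a log, under stated assumptions: the WriteLog
class, the (start, size, error) tuples and the log_bytes helper are
invented for this sketch, and the log is a simple fixed-size array
where the oldest entry is dropped once it is full.

```python
def log_bytes(free_ram, disk_size):
    """About 1% of free RAM or of the disk, whichever is less."""
    return min(free_ram, disk_size) // 100


class WriteLog:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []              # (start, size, error), oldest first
        self.failed = 0                # failed writes since the last resync
        self.need_full_resync = False

    def record(self, start, size, error=False):
        if len(self.entries) == self.capacity:
            self.entries.pop(0)        # the oldest entry is overwritten
        self.entries.append((start, size, error))
        if error:
            self.failed += 1
            if self.failed >= self.capacity:
                # As many failures as log slots: the log can no longer be
                # trusted to describe everything the node missed.
                self.need_full_resync = True
```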
This does not work well when there are multiple writes to the same
location: they are treated as different writes and thus will be
resynced multiple times.
The right solution is to check the log on each write, and it would be
better if the log were a tree rather than an array.
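One possible shape of that fix, sketched with a sorted list and the
standard bisect module standing in for the tree (so lookups are
logarithmic, inserts are not). The DirtyRanges class is hypothetical;
it merges repeated or overlapping failed writes into a single range,
so each location is resynced once. Sizes are assumed to be positive.

```python
import bisect


class DirtyRanges:
    """Sorted, non-overlapping [start, end) ranges that need a resync."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def add(self, start, size):
        end = start + size
        # First stored range that ends at or after the new start.
        i = bisect.bisect_left(self.ends, start)
        # First stored range that starts after the new end.
        j = bisect.bisect_right(self.starts, end)
        if i < j:  # ranges i..j-1 overlap or touch the new one: merge them
            start = min(start, self.starts[i])
            end = max(end, self.ends[j - 1])
        self.starts[i:j] = [start]
        self.ends[i:j] = [end]

    def ranges(self):
        return list(zip(self.starts, self.ends))
```

Writing the same block twice, or two overlapping blocks, then leaves a
single entry to resync instead of one entry per write.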