Mon, 17 Dec 2007
New distributed storage mirroring algorithm.
Resync logic - sliding window algorithm.
At startup the system checks the age (a unique cookie) of each node. If
it does not match the age of the first node, all data is resynced from
the first node in the mirror to the other (non-synced) nodes. Each
non-synced node has a window, which slides from the start of the node
to the end.
During resync, all requests that enter the window are queued, so the
window has to be sufficiently small. Once the window is synced from the
other nodes, the queued requests are written and the window moves
forward, so the next resync step starts only after the previous window
is fully completed. When the window reaches the end of the node, the
node is marked as synchronized.
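
The sliding window above can be sketched roughly as follows. This is a
toy model, not the actual implementation: nodes are plain bytearrays,
and the names WindowResync, write and step are made up for the
illustration. It also assumes that writes outside the window go to all
nodes at once, which the text does not spell out.

```python
from collections import deque


class WindowResync:
    """Toy model of one non-synced node; nodes are plain bytearrays."""

    def __init__(self, source, target, window_size):
        self.source = source            # first (clean) node
        self.target = target            # node being resynced
        self.window_size = window_size  # assumed small and fixed
        self.start = 0                  # window covers [start, start + window_size)
        self.queued = deque()           # writes that entered the window
        self.synced = False

    def _apply(self, offset, data):
        # Mirror the write to both nodes.
        end = offset + len(data)
        self.source[offset:end] = data
        self.target[offset:end] = data

    def write(self, offset, data):
        """Queue a write that overlaps the window, apply any other write at once."""
        win_end = self.start + self.window_size
        overlaps = offset < win_end and offset + len(data) > self.start
        if not self.synced and overlaps:
            self.queued.append((offset, data))
        else:
            self._apply(offset, data)

    def step(self):
        """Sync the current window, flush queued writes, then slide forward."""
        if self.synced:
            return
        end = min(self.start + self.window_size, len(self.source))
        self.target[self.start:end] = self.source[self.start:end]
        while self.queued:
            self._apply(*self.queued.popleft())
        self.start = end
        if self.start >= len(self.source):
            self.synced = True
```

Calling step() repeatedly until synced is set walks the window over the
whole node. The smaller window_size is, the fewer writes have to wait
in the queue.
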
If the age of the node matches that of the first node, but its log
contains a different number of write log entries than the first node's
log (the first node is always treated as clean), a partial resync is
scheduled. A partial resync is also scheduled when the log entry pointed
to by the node's resync index contains an error.
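
The choice between full and partial resync described so far could look
like the sketch below. NodeState and choose_resync are hypothetical
names, and the log is simplified to a list of booleans (True for a
failed write); the real log entries carry more information.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    age: int                                  # unique cookie of the node
    log: list = field(default_factory=list)   # True marks a failed write
    resync_index: int = 0


def choose_resync(first, node):
    """Return 'full', 'partial' or 'none' for node, using first as the clean copy."""
    if node.age != first.age:
        return "full"
    if len(node.log) != len(first.log):
        return "partial"
    if node.resync_index < len(node.log) and node.log[node.resync_index]:
        return "partial"
    return "none"
```
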
The mechanism of this resync type is the following: the system selects
a synced node (by checking each node's flags), fetches the log entry
pointed to by the resync index of the given node, and resyncs that data
from the other nodes to the given one. Then it scans the rest of the
write log for other failed writes, so that the next resync block can
be fetched for them.
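
A minimal sketch of the partial resync pass, under assumptions: nodes
are dicts with hypothetical "synced" and "data" keys, log entries are
dicts with hypothetical "offset", "size" and "error" keys, and the whole
log is scanned in one call instead of one block per work queue run.

```python
def pick_synced(nodes):
    """Return the first node whose 'synced' flag is set, or None."""
    for node in nodes:
        if node["synced"]:
            return node
    return None


def partial_resync(nodes, target, log, resync_index):
    """Re-copy every failed log entry from a synced node to target.

    Returns the indexes of the log entries that were repaired.
    """
    good = pick_synced(nodes)
    if good is None:
        raise RuntimeError("no synced node to resync from")
    repaired = []
    for i in range(resync_index, len(log)):
        entry = log[i]
        if not entry["error"]:
            continue                      # write succeeded, nothing to fetch
        start = entry["offset"]
        end = start + entry["size"]
        target["data"][start:end] = good["data"][start:end]
        entry["error"] = False
        repaired.append(i)
    target["synced"] = True
    return repaired
```
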
The mirroring log is used to store write request information.
It is allocated on disk and in memory (the sync happens each time the
resync work queue fires), and takes about 1% of free RAM or of the disk
(whichever is less). Each write updates the log, so when a node goes
offline, its log is updated with error values, and those entries can
be resynced when the node comes back online. When the number of
failed writes becomes equal to the number of entries in the write log,
recovery becomes impossible (since old log entries have been
overwritten) and a full resync is scheduled.
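
The log sizing and the overflow rule can be illustrated with a small
circular log. Everything here is an assumption made for the sketch: the
names log_capacity and WriteLog, the entry_size parameter, and the
entry layout (offset, size, error).

```python
def log_capacity(free_ram, disk_size, entry_size):
    """Entries that fit in about 1% of free RAM or disk, whichever is less."""
    return max(1, min(free_ram, disk_size) // 100 // entry_size)


class WriteLog:
    """Fixed-size circular write log for one node."""

    def __init__(self, capacity):
        self.entries = [None] * capacity
        self.next = 0        # slot the next write will overwrite
        self.failed = 0      # failed writes since the last resync

    def record(self, offset, size, error):
        """Store one write; error=True means the node missed it."""
        self.entries[self.next] = (offset, size, error)
        self.next = (self.next + 1) % len(self.entries)
        if error:
            self.failed += 1

    def needs_full_resync(self):
        # Once failures fill the whole log, any further failed write
        # overwrites an entry that has not been resynced yet.
        return self.failed >= len(self.entries)
```

While the node is offline every write to it is recorded with
error=True, so needs_full_resync() turns true after as many writes as
the log has entries.
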
This does not work well when there are multiple writes to the same
location: they are considered different writes and thus will be
resynced multiple times. The right solution is to check the log on
each write, preferably with the log organized as a tree rather than an
array.
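
One way to see the gain from deduplication is to merge the failed
ranges before resyncing them. The sketch below is not the tree
mentioned above: it sorts and merges the ranges in a single pass, which
is simpler to show, while a tree would allow the same check on every
write. The function name and the (offset, size) entry format are
assumptions.

```python
def merge_failed_ranges(entries):
    """Collapse (offset, size) failed writes into sorted, disjoint (start, end) ranges."""
    merged = []
    for start, end in sorted((off, off + size) for off, size in entries):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With this, three failed writes to the same block produce one range to
resync instead of three.
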