Zbr's days.

Mon, 17 Dec 2007

New distributed storage mirroring algorithm.

Resync logic - sliding window algorithm.

At startup the system checks the age (a unique cookie) of each node. If it does not match the age of the first node, all data is resynced from the first node in the mirror to the other (non-synced) nodes. Each non-synced node has a window, which slides from the start of the node to the end. During resync all requests that fall into the window are queued, so the window has to be sufficiently small. When the window has been synced from the other nodes, the queued requests are written and the window moves forward, so resync of the next window starts only after the previous window is fully completed. When the window reaches the end of the node, the node is marked as synchronized.
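The window logic above can be sketched as a small single-threaded simulation. This is only an illustration under assumptions: the class `WindowResync`, its methods, and the list-of-blocks node model are hypothetical names made up for the example, and the real copy is asynchronous, while here `step()` completes one window at a time.

```python
class WindowResync:
    """Toy model: rebuild `target` from the clean first node `source`."""

    def __init__(self, source, target, window_size):
        self.source = source            # clean first node (list of blocks)
        self.target = target            # non-synced node being rebuilt
        self.window_size = window_size
        self.start = 0                  # window covers [start, start + window_size)
        self.queued = []                # writes held back while the window is copied
        self.synced = False

    def write(self, offset, value):
        """Handle a write request that arrives while resync is running."""
        end = self.start + self.window_size
        if not self.synced and self.start <= offset < end:
            self.queued.append((offset, value))   # inside the window: wait
            return
        self.source[offset] = value
        if self.synced or offset < self.start:
            self.target[offset] = value           # already synced part: mirror it
        # otherwise the block is ahead of the window and is copied later

    def step(self):
        """Sync the current window, replay queued writes, slide forward."""
        end = min(self.start + self.window_size, len(self.source))
        self.target[self.start:end] = self.source[self.start:end]
        for offset, value in self.queued:         # window is clean now
            self.source[offset] = value
            self.target[offset] = value
        self.queued.clear()
        self.start = end
        if self.start >= len(self.source):
            self.synced = True                    # window reached the end of the node
```

A small window keeps the queue short, because only writes that land inside `[start, start + window_size)` have to wait.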

If the age of the node matches that of the first one, but its log contains a different number of write log entries than the log of the first node (the first node is always treated as clean), a partial resync is scheduled. A partial resync is also scheduled when the log entry pointed to by the node's resync index contains an error.
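Put together, the startup decision between no resync, partial resync and full resync could look like the following. It is a sketch, not the actual code: the function `choose_resync`, the `Resync` enum and the idea of passing plain integers and a boolean are assumptions made for the example.

```python
from enum import Enum


class Resync(Enum):
    NONE = "none"
    PARTIAL = "partial"
    FULL = "full"


def choose_resync(node_age, first_age, node_log_entries, first_log_entries,
                  resync_entry_has_error):
    """Pick the resync type for one node, comparing it with the first node."""
    if node_age != first_age:
        return Resync.FULL        # different cookie: copy everything
    if node_log_entries != first_log_entries:
        return Resync.PARTIAL     # same age, but some writes were missed
    if resync_entry_has_error:
        return Resync.PARTIAL     # entry at the resync index recorded an error
    return Resync.NONE
```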

This type of resync works as follows: the system selects a synced node (by checking each node's flags), fetches the log entry pointed to by the resync index of the given node, and resyncs the data from the other nodes to the given one. Then it scans the rest of the write log for other failed writes, so that the next resync block can be fetched for them.
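A rough model of that loop is shown below. Everything in it is assumed for illustration: `Node`, `LogEntry`, `pick_sync_node` and `partial_resync` are hypothetical names, node data is a `bytearray`, and a single boolean `in_sync` stands in for the node flags.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    offset: int
    size: int
    error: bool = False     # True if the write failed on this node


@dataclass
class Node:
    data: bytearray
    in_sync: bool = False   # stand-in for the node's flags


def pick_sync_node(nodes):
    """Return the first node whose flags say it is in sync."""
    for node in nodes:
        if node.in_sync:
            return node
    raise RuntimeError("no synced node to copy from")


def partial_resync(node, log, resync_index, nodes):
    """Repair failed writes of `node`, starting at its resync index."""
    good = pick_sync_node(nodes)
    repaired = 0
    for entry in log[resync_index:]:
        if not entry.error:
            continue                          # this write reached the node
        end = entry.offset + entry.size
        node.data[entry.offset:end] = good.data[entry.offset:end]
        entry.error = False
        repaired += 1
    return repaired
```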

The mirroring log is used to store write request information. It is allocated on disk and in memory (the two copies are synced each time the resync work queue fires) and takes about 1% of free RAM or disk, whichever is less. Each write updates the log, so when a node goes offline, its log is updated with error values, so that these entries can be resynced when the node comes back online. When the number of failed writes becomes equal to the number of entries in the write log, recovery becomes impossible (since old log entries have been overwritten) and a full resync is scheduled.
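The log described above behaves like a fixed-size ring. The sketch below shows the sizing rule and the point where a partial resync is no longer possible; `log_capacity`, `WriteLog` and the entry size are assumptions for the example, not the real structures.

```python
def log_capacity(free_ram, free_disk, entry_size, percent=1):
    """Number of log entries that fit into ~1% of the smaller resource."""
    budget = min(free_ram, free_disk) * percent // 100
    return max(1, budget // entry_size)


class WriteLog:
    """Fixed-size array of write records; old entries get overwritten."""

    def __init__(self, capacity):
        self.entries = [None] * capacity
        self.head = 0                     # slot for the next record
        self.failed = 0                   # failed writes not yet resynced
        self.needs_full_resync = False

    def record(self, offset, size, ok):
        """Log one write; `ok` is False when the node was unreachable."""
        self.entries[self.head] = (offset, size, ok)
        self.head = (self.head + 1) % len(self.entries)
        if not ok:
            self.failed += 1
            if self.failed >= len(self.entries):
                # every slot holds a failure: older history is gone
                self.needs_full_resync = True
```

For example, with 1,000,000 bytes of free RAM, more free disk than that, and 100-byte entries, the log holds 100 records.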

This does not work well when there are multiple writes to the same location: they are treated as different writes and thus will be resynced multiple times. The right solution is to check the log on each write, which would be easier if the log were a tree rather than an array.
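One possible shape of such a lookup structure is sketched below. Python has no tree in the standard library, so a pair of sorted lists searched with `bisect` stands in for the tree here; the class `DirtyRanges` and its methods are hypothetical. Repeated or overlapping failed writes collapse into a single range, so each location is resynced once.

```python
import bisect


class DirtyRanges:
    """Sorted, disjoint [start, end) ranges that still need resync."""

    def __init__(self):
        self.starts = []   # sorted range starts
        self.ends = []     # ends[i] belongs to starts[i]

    def add(self, offset, size):
        """Record a failed write, merging with ranges it overlaps or touches."""
        start, end = offset, offset + size
        i = bisect.bisect_left(self.ends, start)     # first range ending at/after start
        j = bisect.bisect_right(self.starts, end)    # first range starting after end
        if i < j:                                    # ranges i..j-1 get merged
            start = min(start, self.starts[i])
            end = max(end, self.ends[j - 1])
        self.starts[i:j] = [start]
        self.ends[i:j] = [end]

    def ranges(self):
        return list(zip(self.starts, self.ends))
```

With a real tree the insert would also be logarithmic; the sorted lists only make the search logarithmic, which is enough to show the idea.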

