Zbr's days.


Thu, 31 Jan 2008

BTRFS subvolumes.

Chris Mason has written a short specification for subvolumes in BTRFS. Subvolumes allow the filesystem to allocate blocks on several devices and use clever algorithms to distribute the load between storage devices.
Overall this is an excellent idea, but the specification raises some questions, and I believe it is too heavily tied to the ZFS design.

I will drop my thoughts here, though they may be completely wrong.

Here are some features btrfs will support with the subvolume implementation: mirrored metadata, configurable up to N mirrors (where N > 2); mirrored data extents; checksum failure resolution by using a mirrored copy; striped data extents; and others.

These are clearly tasks for the block layer, but the specification gives the following reasons for not handling them there:

If Btrfs were to rely on device mapper or MD for mirroring, it would not be able to resolve checksum failures by checking the mirrored copy. The lower layers don't know the checksum or granularity of the filesystem blocks, and so they are not able to verify the data they return.
Well, that's not entirely correct: a checksum has to be verified not against the other mirror but against the data itself (i.e. it has to be recalculated after the read), since data can be damaged during transfer, and that is not such a rare condition. Checksums from different mirrors can both be wrong yet equal, so without recalculation a comparison would signal that everything is fine when it is not.
For smaller blocks, recalculating the checksum can be faster than reading the block from another disk.
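
To make the point above concrete, here is a minimal sketch of verifying a block by recalculating its checksum after the read rather than comparing mirrors with each other. Everything in it is an assumption made for illustration: mirrors are plain dictionaries mapping a block number to bytes, the checksum is CRC32 from Python's zlib module, and the name read_verified is hypothetical. It is not how btrfs or DST actually does this.

```python
import zlib


def read_verified(mirrors, block_no, expected_crc):
    """Return (mirror index, data) for the first copy whose recomputed
    CRC32 matches expected_crc. Hypothetical helper for illustration."""
    for index, mirror in enumerate(mirrors):
        data = mirror[block_no]
        # Recompute the checksum from the bytes actually read. Comparing
        # the mirrors with each other would accept two copies that are
        # damaged in the same way.
        if zlib.crc32(data) == expected_crc:
            return index, data
    raise IOError("no mirror holds a valid copy of block %d" % block_no)


# Toy data: the checksum was stored when the good block was written.
good = b"hello btrfs"
bad = b"hellO btrfs"  # one flipped bit, e.g. damaged in transfer
stored_crc = zlib.crc32(good)
mirrors = [{0: bad}, {0: good}]

index, data = read_verified(mirrors, 0, stored_crc)
```

With this toy data the first mirror fails verification and the block is served from the second one. If both mirrors held the same damaged bytes, they would agree with each other, but the recomputed checksum would still reject them.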

If Btrfs were to rely on device mapper for aggregating all of the physical devices into a single big address space, it would not have sufficient information to allocate mirrored copies on different devices. Keeping this information in sync between Btrfs and the device mapper would be difficult and error prone.
Actually it is very simple. DST supports such interaction, for example.

Instead I propose, and will use, the following scheme for subvolumes (I like the name) in a local filesystem: there is a pool of devices, and there are allocation policies for each one in the following form (just an example): files matching the '*.jpg' pattern are allocated from device 1, '*.log' files from device 2, metadata is stored on device 3, small files are allocated on device 4, and so on. Each device then has its own policy for mirroring its data to the required number of storage devices.
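
The scheme above could be sketched roughly as follows. This only illustrates the idea and is not an implementation: the device numbers for '*.jpg', '*.log', metadata and small files come from the example in the text, while the 4096-byte small-file limit, the fallback device 0, the order in which the rules are checked, the mirror lists, and the function names are all assumptions.

```python
from fnmatch import fnmatchcase

# Name-based rules from the example: pattern -> device number.
NAME_RULES = [("*.jpg", 1), ("*.log", 2)]
METADATA_DEVICE = 3
SMALL_FILE_DEVICE = 4

SMALL_FILE_LIMIT = 4096  # assumed threshold in bytes, not from the post
DEFAULT_DEVICE = 0       # assumed fallback device, not from the post

# Assumed per-device mirroring policy: device -> devices holding its mirrors.
MIRRORS = {0: [], 1: [5], 2: [], 3: [6, 7], 4: []}


def pick_device(name, size, is_metadata=False):
    """Choose the device a new object is allocated from."""
    if is_metadata:
        return METADATA_DEVICE
    for pattern, device in NAME_RULES:
        if fnmatchcase(name, pattern):
            return device
    if size < SMALL_FILE_LIMIT:
        return SMALL_FILE_DEVICE
    return DEFAULT_DEVICE


def write_targets(name, size, is_metadata=False):
    """Return the primary device followed by the mirrors its policy requires."""
    primary = pick_device(name, size, is_metadata)
    return [primary] + MIRRORS[primary]
```

For example, write_targets("photo.jpg", 1000000) gives [1, 5] here: the file is allocated from device 1, and device 1's own policy mirrors it to device 5. In this sketch name rules are checked before the size rule, so a tiny '*.log' file still goes to device 2. That ordering is a design choice of the sketch, not something the scheme prescribes.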

And, as a side note, it looks like Chris Mason uses Mac OS X for development, or at least for writing documentation, since a screenshot of the high-level design clearly has Mac shadows and fonts :)

