Zbr's days.

Tue, 15 Jan 2008

POHMELFS development progress.

If you are curious about the unusual delay in POHMELFS development, do not think the project is closed or stuck: there are a number of things I'm working on in this network filesystem, and the delay is caused only by administrivia around my testing environment and things like that...
Now things seem to have settled down, and I have some news.
First, it now supports object creation in the filesystem. So far only regular files can be created, but directories and links are just a matter of additional flags, so they are simple to add. Second, it supports object removal (tested on files only, though). It does not support writing to files yet, and both metadata operations described above (creation and removal) currently perform a network round trip, sending a request and receiving a reply (removal could be done in the local cache only).
I will write a more detailed explanation of the operations involved as soon as directory/link creation is ready, likely tomorrow.

/devel/fs :: Link / Comments (0)


BTRFS 0.10 has been released.

Chris Mason announced a new release of the BTRFS filesystem. According to the changelog, this version contains some pretty serious changes:

  • on-disk format changes: it now supports back references from every data and metadata block. This allows future extensions, such as an online fsck (which raises a question: why would a COW filesystem need one at all?) and data migration between different devices.
  • online resizing (including shrinking)
  • in-place conversion from ext3 to btrfs :) Although it works offline only, it is a very good step toward easier migration for users. The conversion program uses the copy-on-write nature of Btrfs to preserve the original Ext3 FS, sharing the data blocks between the Btrfs and Ext3 metadata. Btrfs metadata is created inside the free space of the Ext3 filesystem, and it is possible either to make the conversion permanent (reclaiming the space used by Ext3) or to roll it back to the original Ext3 filesystem.
  • data=ordered support (probably an option of the transaction log/journal)
  • mount options to disable checksumming and COW (the latter explains a lot about fsck and journalling)
  • barrier support
Judging from the changelog alone, it looks really impressive; my congratulations to the project. The list of still unfixed bugs is a bit worrying, but I'm pretty sure those will be fixed.


Direct IO with a filesystem from the kernel and fast mapping for the loop device.

Although every bit of the system is easily accessible from the kernel, it is quite hard to do filesystem-related tasks, such as reading and writing files, which are generally performed only from userspace. One can actually call the whole sys_open()/sys_read()/sys_write() path from the kernel, but it is quite slow and inefficient.
Probably the most common example is the loop block device driver, which makes a regular file look like a block device, so one can mount it, create files there and so on.
Over time the loop driver became more and more complex. I recall that my first block layer driver was based on it (the async block device, which was similar to the loop device but performed many operations asynchronously; it was used to test the acrypto crypto system).
The loop device is quite slow, so Jens Axboe (the block layer maintainer) came into the game and extended it to map the blocks to be read or written from the kernel much faster than the existing code did.
His first version was extended by Chris Mason (the btrfs author, among other things), who basically moved the mapping code into the filesystem: address space operations were extended with new callbacks, ->map_extent() and ->extent_io_complete().
The former maps an offset inside the file to an extent. An extent is basically a contiguous area on disk bigger than a single block; extents are not supported by the mainline tree so far (at least as of 2.6.24), so one can think of this callback as a mapping from a file offset to a block number. It is usually implemented via the filesystem-specific ->get_block() callback. The extent part of the patchset adds a special tree of extents, which can be looked up by offset in the address space; if there is no extent for that offset in the tree, one is created via ->get_block() and inserted.
The second callback, ->extent_io_complete(), is only used to notify the calling layer when IO has completed; so far it is only used to signal that hole filling has completed. Actually I do not know how a classical filesystem could use this callback, but copy-on-write filesystems should benefit greatly, since they automatically get an asynchronous completion, at which point the higher-layer tree can be updated. Classical filesystems already handle this situation, though. Since it is only implemented for hole filling, it looks like a bit of a hack :)

Here is Jens' first presentation, and here is Chris' presentation of the extent mapping code used to implement fast mapping in the loop device.

/devel/fs :: Link / Comments (0)