Zbr's days.


Tue, 15 Jan 2008

Direct IO with filesystem from the kernel and fast mapping for loop device.

Although every bit of the system is easily accessible from the kernel, filesystem-related tasks that are normally performed from userspace, such as reading and writing files, are quite hard to do there. One can actually call the whole sys_open()/sys_read()/sys_write() path from the kernel, but it is quite slow and inefficient.
Probably the most common example is the loop block device driver, which makes an ordinary file look like a block device, so one can mount it, create files there and so on.
Over time the loop driver became more and more complex. I recall that my first block layer driver was based on it: an async block device, which was similar to the loop device but allowed a lot of operations to be performed asynchronously, and which was used to test the acrypto crypto system.
The loop device is quite slow, so Jens Axboe (the block layer maintainer) stepped in and extended it to support a much faster mapping of blocks for reads and writes from the kernel than the existing one.
His first version was extended by Chris Mason (btrfs author, among other things), who basically moved the mapping code into the filesystem, so address space operations were extended with two new callbacks: ->map_extent() and ->extent_io_complete().
The former maps an offset inside the file into an extent. Basically, an extent is an area on the disk that is bigger than a block. So far extents are not supported by the mainline tree (at least not by 2.6.24), so one can consider this callback a mapping from a file offset into a block number. Usually it is implemented via the filesystem-specific ->get_block() callback. The extent part of the patchset adds a special tree of extents, which is addressed by offset in the address space; if there is no extent in the tree for a given offset, one can be inserted. Extent creation is implemented via ->get_block().
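To make the lookup-or-insert idea a bit more concrete, here is a minimal userspace sketch. It is not the code from the patchset: every name in it (struct extent_cache, map_extent(), toy_get_block() and so on) is made up for illustration, a flat array stands in for the real extent tree, and each newly created extent covers exactly one block, which matches the "offset into block number" simplification above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define SKETCH_BLOCK_SIZE 4096u
#define SKETCH_MAX_EXTENTS 16

struct extent {
    uint64_t start;  /* first file offset (in bytes) covered by the extent */
    uint64_t len;    /* length of the extent in bytes */
    uint64_t block;  /* disk block backing 'start' */
};

/* A flat array stands in for the extent tree of the patchset. */
struct extent_cache {
    struct extent ext[SKETCH_MAX_EXTENTS];
    size_t count;
};

/* Stand-in for a filesystem's get_block(): file block -> disk block. */
typedef uint64_t (*get_block_fn)(uint64_t file_block);

/* Return the cached extent covering 'off', or NULL if there is none. */
static const struct extent *cache_lookup(const struct extent_cache *c,
                                         uint64_t off)
{
    for (size_t i = 0; i < c->count; i++) {
        const struct extent *e = &c->ext[i];
        if (off >= e->start && off - e->start < e->len)
            return e;
    }
    return NULL;
}

/*
 * Map a file offset to an extent: use the cache when possible, otherwise
 * ask get_block() for the backing block and insert a one-block extent.
 * Returns NULL when the (fixed-size) cache is full.
 */
static const struct extent *map_extent(struct extent_cache *c, uint64_t off,
                                       get_block_fn get_block)
{
    assert(c != NULL && get_block != NULL);

    const struct extent *hit = cache_lookup(c, off);
    if (hit)
        return hit;
    if (c->count == SKETCH_MAX_EXTENTS)
        return NULL;

    uint64_t file_block = off / SKETCH_BLOCK_SIZE;
    struct extent *e = &c->ext[c->count++];
    e->start = file_block * SKETCH_BLOCK_SIZE;
    e->len = SKETCH_BLOCK_SIZE;
    e->block = get_block(file_block);
    return e;
}

/* Toy layout: file block n lives at disk block 1000 + n. */
static uint64_t toy_get_block(uint64_t file_block)
{
    return 1000 + file_block;
}
```

With a zero-initialized struct extent_cache, the first map_extent() call for an offset goes through toy_get_block() and caches the result, and later calls for any offset inside the same block return the cached entry without calling it again. A real implementation would presumably merge neighbouring blocks into longer extents and use a proper tree, which this sketch leaves out.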
The second callback, ->extent_io_complete(), is only used to notify the calling layer when IO is completed; so far it is only used to signal that hole filling is completed. Actually, I do not know how this callback can be used by a classical filesystem, but copy-on-write ones should benefit greatly, since they automatically get an asynchronous completion, so a higher-layer tree can be updated. Classical filesystems already handle this situation though. Since it is only implemented for hole filling, it looks like a little hack :)

Here is Jens' first presentation, and here is Chris' presentation of the extent mapping code used to implement fast mapping in the loop device.
