Tue, 15 Jan 2008
Direct IO with filesystem from the kernel and fast mapping for loop device.
Although every bit of the system is easily accessible from the kernel,
it is quite hard to do filesystem-related tasks there, since those are generally only
performed from userspace: reading and writing files, for example. Actually
one can call the whole sys_open()/sys_read()/sys_write() path
from the kernel, but it is quite slow and inefficient.
Likely the most common example is the loop block device driver, which makes
a regular file look like a block device, so one can
mount it, create files there and so on.
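To make the idea concrete, here is a minimal userspace sketch of what a loop-style mapping boils down to: block number N of the "device" is simply the bytes at offset N * block size in the backing file. This is not the kernel loop driver's code. The names loop_read_block(), loop_write_block() and LOOP_BLOCK_SIZE are made up for illustration, and the 512-byte block size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed block size for this sketch; not taken from the loop driver. */
#define LOOP_BLOCK_SIZE 512L

/*
 * Read block number blkno of the backing file into buf, which must hold
 * LOOP_BLOCK_SIZE bytes. Returns 0 on success, -1 on a bad block number,
 * a failed seek or a short read (for example, past the end of the file).
 */
static int loop_read_block(FILE *backing, long blkno, unsigned char *buf)
{
    assert(backing != NULL && buf != NULL);
    if (blkno < 0)
        return -1;
    if (fseek(backing, blkno * LOOP_BLOCK_SIZE, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, (size_t)LOOP_BLOCK_SIZE, backing) != (size_t)LOOP_BLOCK_SIZE)
        return -1;
    return 0;
}

/*
 * Write LOOP_BLOCK_SIZE bytes from buf as block number blkno of the
 * backing file. Returns 0 on success, -1 on failure.
 */
static int loop_write_block(FILE *backing, long blkno, const unsigned char *buf)
{
    assert(backing != NULL && buf != NULL);
    if (blkno < 0)
        return -1;
    if (fseek(backing, blkno * LOOP_BLOCK_SIZE, SEEK_SET) != 0)
        return -1;
    if (fwrite(buf, 1, (size_t)LOOP_BLOCK_SIZE, backing) != (size_t)LOOP_BLOCK_SIZE)
        return -1;
    return fflush(backing) == 0 ? 0 : -1;
}
```

A caller would open the backing file in update mode (for example "rb+") and pass block numbers instead of byte offsets. The hard part in the kernel is not this arithmetic but finding where on the real disk those file bytes live, which is what the rest of this post is about.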
With time the loop driver became more and more complex. I recall that my first
block layer driver (async block device,
which was similar to the loop device but allowed many operations to be performed
asynchronously; it was used to test the acrypto
crypto system) was based on it.
The loop device is quite slow, so Jens Axboe (block layer maintainer) came into the game and
extended it to support a much faster way of mapping the blocks to read/write from the kernel
than the existing one.
His first version was extended by Chris Mason (btrfs
author, among other things), who basically moved the mapping code into the filesystem,
so address space operations were extended with two new callbacks:
->map_extent() and ->extent_io_complete().
The former is used to map an offset inside the file into an extent. Basically an extent is an area
on the disk bigger than a block. So far extents are not supported by the mainline tree (at least the 2.6.24 tree),
so one can consider this callback a mapping from file offset into block number. Usually it is
implemented by the filesystem-specific ->get_block() callback. The extent part of the patchset
adds a special tree of extents, which can be addressed by offset in the address space; if there
is no extent in the tree, it can be inserted. Extent creation is implemented via ->get_block().
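The lookup-then-insert logic described above can be sketched in plain userspace C. This is only an illustration under loud assumptions, not the patchset's code. Every sketch_* name is hypothetical. A small array with a linear scan stands in for the real extent tree. The get_block callback here has a simplified signature that just returns a disk block number. Growing an existing extent when the new block is contiguous is my own addition, to show why an extent can cover more than one block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE  4096u  /* assumed block size */
#define SKETCH_MAX_EXTENTS 64     /* arbitrary capacity of the toy map */

/* A contiguous run of blocks: file offset range -> disk block range. */
struct sketch_extent {
    uint64_t file_start;  /* byte offset in the file, block aligned */
    uint64_t disk_block;  /* first on-disk block of the run */
    uint64_t nr_blocks;   /* length of the run in blocks */
};

/* Hypothetical, simplified stand-in for a filesystem's ->get_block(). */
typedef uint64_t (*sketch_get_block_t)(uint64_t file_block);

struct sketch_extent_map {
    struct sketch_extent ext[SKETCH_MAX_EXTENTS];
    size_t count;
    sketch_get_block_t get_block;
    unsigned get_block_calls;  /* how often the slow path was taken */
};

/* Return the cached extent covering the byte offset, or NULL if none. */
static struct sketch_extent *sketch_find(struct sketch_extent_map *m,
                                         uint64_t offset)
{
    for (size_t i = 0; i < m->count; i++) {
        struct sketch_extent *e = &m->ext[i];
        uint64_t end = e->file_start + e->nr_blocks * SKETCH_BLOCK_SIZE;

        if (offset >= e->file_start && offset < end)
            return e;
    }
    return NULL;
}

/*
 * Map a byte offset to an extent. On a miss, ask get_block for the disk
 * block, then either grow an extent that ends right before it (in the
 * file and on disk) or insert a new one-block extent. Returns NULL only
 * when the toy map is full.
 */
static struct sketch_extent *sketch_map_extent(struct sketch_extent_map *m,
                                               uint64_t offset)
{
    struct sketch_extent *e = sketch_find(m, offset);
    uint64_t file_block, disk_block;

    if (e)
        return e;

    assert(m->get_block != NULL);
    file_block = offset / SKETCH_BLOCK_SIZE;
    disk_block = m->get_block(file_block);
    m->get_block_calls++;

    for (size_t i = 0; i < m->count; i++) {
        e = &m->ext[i];
        if (e->file_start / SKETCH_BLOCK_SIZE + e->nr_blocks == file_block &&
            e->disk_block + e->nr_blocks == disk_block) {
            e->nr_blocks++;
            return e;
        }
    }

    if (m->count == SKETCH_MAX_EXTENTS)
        return NULL;
    e = &m->ext[m->count++];
    e->file_start = file_block * SKETCH_BLOCK_SIZE;
    e->disk_block = disk_block;
    e->nr_blocks = 1;
    return e;
}

/* Translate a byte offset inside an extent to its disk block number. */
static uint64_t sketch_disk_block(const struct sketch_extent *e,
                                  uint64_t offset)
{
    return e->disk_block + (offset - e->file_start) / SKETCH_BLOCK_SIZE;
}

/* Toy layout for experiments: file block N lives at disk block N + 1000. */
static uint64_t sketch_toy_get_block(uint64_t file_block)
{
    return file_block + 1000;
}
```

The point of the cache is visible in get_block_calls: repeated lookups inside an already mapped range never reach the filesystem callback again, which is presumably where the speedup for the loop device comes from.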
The second callback, ->extent_io_complete(), is only used to notify the calling layer when
IO is completed; so far it is only used to signal that hole filling is completed. Actually I do not know
how this callback can be used by a classical filesystem, but copy-on-write ones should benefit greatly,
since they automatically get an asynchronous completion, so the higher-layer tree can be updated. Classical
filesystems already handle this situation though. Since it is only implemented for hole filling, it looks
like a little hack :)
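To show why an asynchronous completion is handy for a copy-on-write design, here is a small userspace sketch. It is not the patchset's ->extent_io_complete() and does not use its signature. All sketch_* names are hypothetical, and the "device" is just a queue that completes everything when asked. The idea is that the higher-layer mapping is switched to the new block only from the completion callback, never at submit time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_io;

/* Hypothetical completion hook: status 0 means the IO succeeded. */
typedef void (*sketch_io_complete_t)(struct sketch_io *io, int status);

struct sketch_io {
    uint64_t file_block;            /* which file block was written */
    uint64_t new_disk_block;        /* where the new copy was placed */
    sketch_io_complete_t complete;  /* called when the IO finishes */
    void *owner;                    /* opaque pointer for the calling layer */
    struct sketch_io *next;
};

/* Pending IO, kept in submission order. */
struct sketch_queue {
    struct sketch_io *head;
    struct sketch_io *tail;
};

/* Queue an IO; nothing is reported to the caller yet. */
static void sketch_submit(struct sketch_queue *q, struct sketch_io *io)
{
    io->next = NULL;
    if (q->tail)
        q->tail->next = io;
    else
        q->head = io;
    q->tail = io;
}

/*
 * Pretend the device finished every queued IO successfully and invoke
 * the callbacks in submission order. Returns the number completed.
 */
static unsigned sketch_complete_all(struct sketch_queue *q)
{
    unsigned n = 0;

    while (q->head) {
        struct sketch_io *io = q->head;

        q->head = io->next;
        if (!q->head)
            q->tail = NULL;
        if (io->complete)
            io->complete(io, 0);
        n++;
    }
    return n;
}

#define SKETCH_NR_BLOCKS 8

/* Toy copy-on-write mapping: file block -> current disk block. */
struct sketch_cow_map {
    uint64_t disk_block[SKETCH_NR_BLOCKS];
};

/* Switch the mapping to the new copy only after the write succeeded. */
static void sketch_cow_complete(struct sketch_io *io, int status)
{
    struct sketch_cow_map *map = io->owner;

    assert(io->file_block < SKETCH_NR_BLOCKS);
    if (status == 0)
        map->disk_block[io->file_block] = io->new_disk_block;
}
```

Until sketch_complete_all() runs, readers of the toy map still see the old block, so a failed or unfinished write never leaves the mapping pointing at garbage.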
Here is Jens' first presentation,
and here is Chris' presentation
of the extent mapping code used to implement fast mapping in loop device.