Zbr's days.


Mon, 03 Apr 2006

AIO sendfile.


Let's briefly describe how generic reading is done in the Linux kernel.
When a chunk of file data is not in the VFS cache, it is requested through the block layer. The bio->bi_end_io callback, generally mpage_end_io_read(), is called in hard IRQ context; it only marks the pages as uptodate or in error and unlocks them.
Generally, cold pages from the VFS cache, i.e. pages which were inserted recently but not used yet, are used for this technique.
Network processing code must wait until those pages are marked as uptodate before it can start using them.
AIO sendfile's state machine is invoked from the bio->bi_end_io callback, where it basically does the same as mpage_end_io_read() and additionally schedules reading of the next chunk of file data.
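
To make the difference between the two completion callbacks concrete, here is a minimal user-space sketch. It is not kernel code: struct toy_page, struct toy_request and both functions are hypothetical names invented for illustration, and the decision to schedule no further read after an I/O error is my assumption, not something taken from the real implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT kernel structures. */
struct toy_page {
    bool locked;    /* page is still owned by in-flight I/O */
    bool uptodate;  /* page holds valid file data */
    bool error;     /* the read of this page failed */
};

struct toy_request {
    struct toy_page *pages;  /* pages covered by one block request */
    size_t nr_pages;
    size_t next_chunk;       /* index of the next chunk to read */
    size_t scheduled;        /* number of follow-up reads scheduled */
};

/* Models the plain read completion: mark each page, then unlock it. */
static void toy_end_io_read(struct toy_request *rq, bool io_ok)
{
    for (size_t i = 0; i < rq->nr_pages; i++) {
        if (io_ok)
            rq->pages[i].uptodate = true;
        else
            rq->pages[i].error = true;
        rq->pages[i].locked = false;
    }
}

/* Models the AIO sendfile completion: the same marking and unlocking,
 * plus scheduling the read of the next chunk.
 * Assumption: nothing more is scheduled after a failed read. */
static void toy_aio_end_io(struct toy_request *rq, bool io_ok)
{
    toy_end_io_read(rq, io_ok);
    if (io_ok) {
        rq->next_chunk++;
        rq->scheduled++;
    }
}
```

In this model the only thing the AIO variant adds is the bookkeeping at the end of toy_aio_end_io(); in the real code that step is the invocation of the kevent state machine.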
Let's compare what happens when we call do_generic_file_read() with how kevent-based AIO sendfile works.

  • do_generic_file_read() runs through all pages which would contain data from the requested region of the file. If the file data is not in the VFS cache, it invokes the block layer through mapping->a_ops->readpage(), which is likely mpage_readpage() or block_read_full_page(), and then waits for the VFS pages to become ready. The selected VFS page is then passed to the actor function specified by the higher layer, which actually copies the data or sends it over the net.
  • The current AIO approach does almost the same. It has a kevent which holds an array of preallocated clean pages. They are used either for the actor function, which handles uptodate pages found in the VFS cache, or for the block layer mechanism, which is very similar to mpage_readpage(), except that its bio->bi_end_io not only marks pages as ready but also invokes the kevent state machine. New work is then scheduled in a callback invoked from the kevent state machine; it processes the kevent's pages and then starts reading the next chunk of data.

So, what is the difference between the two approaches?
Basically, synchronous buffered reading handles data in page-sized chunks, and if there is no data in the VFS cache, it allocates a new page, inserts it into the VFS cache, and waits until the data is read into it. The VFS page is then processed by the higher layer actor function. Synchronous buffered reading is never interrupted between pages (I mean do_generic_file_read() does not exit except on errors).
The AIO approach works similarly, but it does not insert the pages it reads into the VFS cache, uses a bigger set of pages, and is interrupted each time a given number of pages has been processed, i.e. only a predefined number of pages in the kevent is processed at a time, while buffered reading processes all requested data.

Since AIO processing happens in a work queue, we should add here the overhead of a process switch each time the predefined number of pages in the kevent has been processed.
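
A rough way to see how often that switch happens is the toy model below. It is a user-space sketch under loud assumptions: a "file" is just a page count, the actor only counts pages, and every run of the work item is charged as one entry into the processing context. All names (toy_stats, toy_sync_read, toy_aio_read, batch) are hypothetical and do not exist in the kernel.

```c
#include <stddef.h>

/* Hypothetical counters for the toy model. */
struct toy_stats {
    size_t pages_sent;  /* pages handed to the actor */
    size_t entries;     /* times the processing context was entered */
};

/* Stand-in for the actor: it only counts the page it was given. */
static void toy_actor(struct toy_stats *st)
{
    st->pages_sent++;
}

/* Synchronous model: entered once, loops until all pages are done. */
static struct toy_stats toy_sync_read(size_t total_pages)
{
    struct toy_stats st = {0, 0};

    if (total_pages > 0)
        st.entries = 1;
    for (size_t i = 0; i < total_pages; i++)
        toy_actor(&st);  /* read page i, wait for it, hand it over */
    return st;
}

/* AIO model: each work item handles at most `batch` pages and then
 * returns; the next chunk is handled by a new work item. */
static struct toy_stats toy_aio_read(size_t total_pages, size_t batch)
{
    struct toy_stats st = {0, 0};
    size_t done = 0;

    if (batch == 0)
        return st;  /* nothing can be processed without pages */
    while (done < total_pages) {
        size_t n = total_pages - done;

        if (n > batch)
            n = batch;
        st.entries++;  /* one work queue run per batch */
        for (size_t i = 0; i < n; i++)
            toy_actor(&st);
        done += n;
    }
    return st;
}
```

With 10 pages and a batch of 4, the synchronous model is entered once while the AIO model is entered three times (4 + 4 + 2 pages), so in this model the extra cost grows as the number of pages divided by the batch size, rounded up.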
