Zbr's days.

Sun, 11 Feb 2007

Linus on AIO. Limitations of the proposal. Practical example of weakness.


Linus Torvalds wrote in reply to the tcp_sendmsg() example:

> Will you create a thread every time tcp_sendmsg() hits the send queue limits?

No. You use epoll() for those.
I.e. we are designing asynchronous IO that is already limited to not being usable with the network?
I.e. AIO cannot be used in anything connected to the network: even if disk reads and writes are asynchronous, sending will block, and thus we lose all the possible advantages.
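
For reference, here is roughly what "use epoll() for those" means on the sending side. This is only a minimal sketch under my own assumptions: an AF_UNIX socketpair stands in for a TCP connection so that the example is self-contained, and the names fill_send_queue(), wait_writable() and send_queue_demo() are hypothetical helpers, not anything from the patchset or the kernel. The sender hits the send queue limit (EAGAIN), and the event that removes the block is EPOLLOUT.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Queue data on a socket until the kernel refuses more (EAGAIN).
 * Returns the number of bytes queued, or -1 on any other error. */
static long fill_send_queue(int fd)
{
	char buf[4096] = {0};
	long total = 0;

	for (;;) {
		ssize_t n = send(fd, buf, sizeof(buf), MSG_DONTWAIT);
		if (n > 0) {
			total += n;
			continue;
		}
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return total;	/* send queue limit reached */
		return -1;
	}
}

/* Returns 1 if the watched socket is writable within timeout_ms,
 * 0 on timeout, -1 on error. */
static int wait_writable(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	if (n < 0)
		return -1;
	return n == 1 && (ev.events & EPOLLOUT) ? 1 : 0;
}

/* Hypothetical demo: returns 0 if the sender saw "queue full" first
 * and a writability event only after the peer drained the data. */
int send_queue_demo(void)
{
	int sv[2];
	char buf[4096];
	long queued, drained = 0;
	int epfd, before, after;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	struct epoll_event ev = { .events = EPOLLOUT, .data.fd = sv[0] };
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
		return -1;

	queued = fill_send_queue(sv[0]);
	before = wait_writable(epfd, 0);	/* queue is full: no event */

	/* The peer reads everything, which frees the send queue. */
	while (drained < queued) {
		ssize_t n = recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT);
		if (n <= 0)
			break;
		drained += n;
	}
	after = wait_writable(epfd, 1000);	/* now the event fires */

	close(epfd);
	close(sv[0]);
	close(sv[1]);
	return (queued > 0 && before == 0 && after == 1) ? 0 : -1;
}
```

Note that nothing in this path involves a thread per blocked send: the blocked state is just a registration in the epoll set.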

Continue:
There's a reason why a lot of UNIX system calls are blocking: they just don't make sense as event models, because there is no sensible half-way point that you can keep track of (filename lookup is the most common example).
Linus, blocking IS waiting for an event, the one that removes the block.
Linux even uses wait_event_*() calls. Don't you think that name makes some sense?
Filename lookup is just reading an inode from disk: when it is done, the filename is ready, and that is an event.
And actually no one uses async filename lookup. People use the open() syscall, which is perfectly eventable: the block-removal event is the readiness of the opened file descriptor. It is even used in kevent AIO (surprise?) as part of the async sendfile transfer state machine (I must admit that opening always happens in async mode as part of the state machine, so it will lose some ticks if things are perfectly in the cache, but practice shows that async sendfile is faster).

In another mail Linus continues to burn things out:
You use the AIO stuff for things that you *expect* to be almost instantaneous. Even if you actually start ten thousand IO's in one go, and they all do IO, you would hopefully expect that the first ones start completing before you've even submitted them all. If that's not true, then you'd just be better off using epoll.
I.e. we should not use AIO for the case when a request really blocks, only when it is mostly synchronous and may sometimes block.
Linus, direct IO (used by databases) blocks all the time, sync IO blocks all the time, the network blocks, pipes block, readahead blocks. Only the simplest case of reading from the VFS cache does not block.

And eventually Linus proposes waiting for AIO events:
for (;;) {
	async(epoll);	/* wait for networking events */
	async_wait();	/* wait for epoll _or_ any of the outstanding file IO events */
	handle_completed_events();
}
Linus, you have just introduced waiting for AIO events, i.e. a new type of event that is supposed to wrap async completions. And since every async syscall is such a new event, we can wait for them in a userspace loop.
You may not know it, but kevent is supposed to wait on every possible type of event. You do not need to wrap sync event-waiting calls (like epoll()) into an async helper and then wait for that: just register with kevent at the points where your patchset currently forks.
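
To make the "one wait point for every type of event" idea concrete, here is a minimal sketch. It does NOT use the kevent API; as a stand-in I assume plain epoll plus an eventfd, where a pipe plays the role of a network socket and the eventfd plays the role of a file IO completion notification. The names watch(), wait_once(), unified_wait_demo() and the SRC_* tags are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Tags identifying which kind of source produced an event. */
enum { SRC_NET = 1, SRC_FILE_IO = 2 };

/* Register fd for readability and remember its source tag. */
static int watch(int epfd, int fd, uint32_t tag)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.u32 = tag };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One pass of the event loop: returns a bitmask of the source tags
 * that fired, 0 on timeout, -1 on error. */
static int wait_once(int epfd, int timeout_ms)
{
	struct epoll_event evs[8];
	int seen = 0;
	int n = epoll_wait(epfd, evs, 8, timeout_ms);

	if (n < 0)
		return -1;
	for (int i = 0; i < n; i++)
		seen |= (int)evs[i].data.u32;
	return seen;
}

/* Hypothetical demo: returns 0 if a single wait reports both the
 * "network" event and the "file IO completion" event. */
int unified_wait_demo(void)
{
	int net[2];
	uint64_t one = 1;
	int epfd = epoll_create1(0);
	int done_fd = eventfd(0, EFD_NONBLOCK);

	if (epfd < 0 || done_fd < 0 || pipe(net) < 0)
		return -1;
	if (watch(epfd, net[0], SRC_NET) < 0 ||
	    watch(epfd, done_fd, SRC_FILE_IO) < 0)
		return -1;

	int idle = wait_once(epfd, 0);	/* nothing is pending yet */

	/* "Network" data arrives and a "file IO" completion is posted. */
	if (write(net[1], "x", 1) != 1)
		return -1;
	if (write(done_fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		return -1;

	int seen = wait_once(epfd, 1000);	/* one wait, both sources */

	close(net[0]);
	close(net[1]);
	close(done_fd);
	close(epfd);
	return (idle == 0 && seen == (SRC_NET | SRC_FILE_IO)) ? 0 : -1;
}
```

Both kinds of event land in the same queue, so there is no need for an async wrapper around the wait itself.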

And to draw the line: micro-thread AIO is not even supposed to work in environments where it will block all the time (like the network or direct IO); in blocking environments events should be used instead, since they are much more scalable.

Micro-thread AIO sucks even for reading from a file. A practical example: if the file happens to sit on a bad block, the read will block for too long (seconds!), and the system can simply be killed by rescheduling when a lot of threads are waiting for read completion on those blocks.

And to finally kill such a design, here is another test I created.
Consider a directory with a large number of inner dirs and files (hundreds), whose total size is about 3 times smaller than the amount of RAM (300 MB vs. 1 GB), and several applications that run and randomly copy data from one file to another.
I've put several printks in __lock_page() (i.e. where the requesting application blocks and thus a new thread would be created) and watched a nice picture: up to a hundred blocks happened per second. And that is just for the case when the test dir is 3 times smaller than RAM; what will happen when the dir is larger than RAM I do not even want to imagine:
printk: 84 messages suppressed.
__lock_page: aio_new_thread: 6650.
printk: 118 messages suppressed.
__lock_page: aio_new_thread: 6769.

Conclusion: 'f toppku', i.e. into the furnace.
