Zbr's days.

Thu, 11 Jan 2007

Userspace threading and its benefits and drawbacks.


Benefits.

1. Fast scheduling.
There is no need to cross the userspace/kernelspace boundary to schedule a new thread for execution (just compare a userspace network stack with the kernel's one when many syscalls are needed to send or receive small packets).

2. Fast thread creation and destruction.
Creating a thread becomes just an allocation of a structure in userspace; there is no need for the full creation process performed by the clone() syscall.

3. Smaller number of cache misses.
Since the kernel sees only one process instead of several threads, cache locality improves greatly and fewer cache misses occur.
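Points 1 and 2 can be sketched with glibc's ucontext functions (the makecontext() family discussed in a later note). In this sketch, creating a thread amounts to a malloc() of a structure and a stack, and switching threads is a userspace context swap. It is a minimal illustration under those assumptions, not the project's code. The names uthread, worker and run_round_robin are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define UT_STACK_SIZE (64 * 1024)
#define UT_COUNT 2

/* Hypothetical userspace thread: creating one is an allocation plus
 * makecontext(), with no clone() system call involved. */
struct uthread {
    ucontext_t ctx;
    void *stack;
    int done;                           /* set when the body returns */
};

static ucontext_t sched_ctx;            /* the scheduler's own context */
static struct uthread *threads[UT_COUNT];
static char trace[16];                  /* records which thread ran when */
static size_t trace_len;

/* Body of every userspace thread: record a step, then yield. */
static void worker(int id)
{
    for (int i = 0; i < 2; i++) {
        trace[trace_len++] = (char)('A' + id);
        swapcontext(&threads[id]->ctx, &sched_ctx);   /* cooperative yield */
    }
    threads[id]->done = 1;
    /* Returning resumes uc_link, i.e. the scheduler. */
}

static struct uthread *uthread_create(int id)
{
    struct uthread *t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    t->stack = malloc(UT_STACK_SIZE);
    if (t->stack == NULL || getcontext(&t->ctx) == -1) {
        free(t->stack);
        free(t);
        return NULL;
    }
    t->done = 0;
    t->ctx.uc_stack.ss_sp = t->stack;
    t->ctx.uc_stack.ss_size = UT_STACK_SIZE;
    t->ctx.uc_link = &sched_ctx;
    makecontext(&t->ctx, (void (*)(void))worker, 1, id);
    return t;
}

static void uthread_destroy(struct uthread *t)
{
    free(t->stack);
    free(t);
}

/* Minimal round-robin scheduler: resume each unfinished thread in turn. */
static const char *run_round_robin(int rounds)
{
    trace_len = 0;
    for (int id = 0; id < UT_COUNT; id++) {
        threads[id] = uthread_create(id);
        assert(threads[id] != NULL);    /* sketch: no error recovery */
    }
    for (int r = 0; r < rounds; r++)
        for (int id = 0; id < UT_COUNT; id++)
            if (!threads[id]->done)
                swapcontext(&sched_ctx, &threads[id]->ctx);
    for (int id = 0; id < UT_COUNT; id++)
        uthread_destroy(threads[id]);
    trace[trace_len] = '\0';
    return trace;
}
```

Calling run_round_robin(3) interleaves the two threads and returns "ABAB". A real design would probably replace swapcontext() with a hand-written switch. glibc's swapcontext() also saves and restores the signal mask, which on Linux typically costs a system call and undercuts point 1.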

Drawbacks.

1. Scheduling fairness.
Since the kernel does not know about the multiple threads behind a given process, it cannot give the process an appropriate number of timeslices.
This can be solved either by tighter collaboration between the userspace and kernelspace schedulers, or simply by raising the process's priority (i.e. lowering its nice value).

2. All communication is performed through one kevent pipe.
This can be problematic, although the interface was specifically designed to be scalable.

3. Complex code for good SMP scalability and for the userspace scheduler.
I wanted to put this into the 'Benefits' section, since that is exactly why I started this project.
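The "simply raise the priority" workaround for drawback 1 can be sketched with the standard getpriority()/setpriority() calls. The helper name and its delta parameter are hypothetical. Lowering the nice value below its current level normally needs CAP_SYS_NICE or a suitable RLIMIT_NICE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the priority of the M-on-N process by
 * `delta` nice levels, so the kernel gives it a larger CPU share for the
 * userspace threads it cannot see. Lower nice value = higher priority. */
static int raise_process_priority(int delta)
{
    errno = 0;
    int cur = getpriority(PRIO_PROCESS, 0);
    if (cur == -1 && errno != 0)        /* -1 can also be a valid nice value */
        return -1;
    return setpriority(PRIO_PROCESS, 0, cur - delta);
}
```

On Linux/NPTL the nice value is a per-thread attribute. With `who == 0`, the call therefore affects only the calling kernel thread, which matters if the userspace scheduler runs one kernel thread per CPU.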

/devel/threading


Threading issues and ways to resolve them.


1. Signals.
POSIX requires that signals be delivered on a per-thread basis, but the signal handler (and thus whether a signal is ignored or not) is a per-process property. Since kevent can deliver signals through its queue, the problem can be solved in a very elegant way. The main process receives a signal event notification through its kevent queue and then checks all its threads. Every thread that has that signal unblocked receives it on its alternate signal stack.

2. Kevent/poll usage in the threads.
poll() and select() must be translated into kevent requests in the syscall wrapper (as in my implementation of epoll on top of kevent), and the resulting event is then put into the main kevent queue.

3. sleep() and similar system calls.
Kevent has timer notifications, which will be used to emulate such calls. POSIX timer calls can be emulated through kevent's POSIX timer support, but I will probably not consider this for the initial implementation.

4. Blocking inter-process communication such as semaphores.
These must be converted to userspace kevent notifications.
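Item 1 amounts to simple bookkeeping in the userspace scheduler once the kernel reports the signal as an event. This sketch assumes that event comes from kevent as in the text; on mainline Linux, signalfd() could play that role. The struct and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical per-thread signal state kept by the userspace scheduler. */
struct uthread_sig {
    sigset_t blocked;   /* this thread's signal mask */
    sigset_t pending;   /* signals queued for delivery to this thread */
};

/* Called from the main loop when signal `signo` arrives through the event
 * queue. As in item 1, every thread with the signal unblocked gets it
 * marked pending; the scheduler would later run the handler on that
 * thread's alternate signal stack. Returns the number of threads chosen. */
static int dispatch_signal(struct uthread_sig *threads, size_t n, int signo)
{
    int delivered = 0;
    for (size_t i = 0; i < n; i++) {
        if (sigismember(&threads[i].blocked, signo) == 0) {
            sigaddset(&threads[i].pending, signo);
            delivered++;
        }
    }
    return delivered;
}
```

Standard POSIX semantics deliver a process-directed signal to just one eligible thread. A stricter variant of this loop would therefore stop after the first match.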

All of the above may look like the old LinuxThreads days before NPTL, when a special manager thread performed much of this functionality, namely signal handling and resource cleanup. Resource cleanup is not a problem for this new implementation. All resources are cleaned up automatically when the process exits, and no process-visible resources such as file descriptors are closed on thread cancellation. Signals can be handled perfectly with kevent's capabilities. The remaining work has moved into a layer between the kernel (or glibc, for the initial implementation) and the application, i.e. the scheduler. I think that is the correct name, since the main task of that layer is exactly scheduling. In fact this does not differ at all from what we have right now with NPTL and the 1-on-1 thread model: exactly the same tasks are performed by the kernel, only with additional layer-crossing overhead.

/devel/threading


Initial thoughts about userspace threads (or the M-on-N threading model).


Let's see what we already have.
Glibc provides makecontext() and related functions, which are essentially part of a userspace execution mechanism: one can create a context, run it, swap it and so on. That is something I want to implement, minus its problems. A context switch can be performed from an outside thread (that is how IBM NGPT was implemented). That is not the main issue, although I really do not like such an approach. The main problem is that if such a context is going to block, this cannot be detected from other contexts, so it is impossible to swap it for another one. Even if some check is done in each syscall, or even if each syscall becomes a rescheduling point, there are only two outcomes. Either every syscall must be non-blocking, or the whole process will go to sleep in the syscall, since the kernel does not know that there are several contexts in the same process.

So the solution is to have some kind of thin layer between the kernel and userspace (in the real world it is called glibc). That layer converts all syscalls into non-blocking operations (including nanosleep() and the like) and keeps track of what each context has done. In practice, rewriting glibc is not what I would like to do. Instead, a layer on top of it will be implemented, which will convert syscalls into kevent operations and become a rescheduling point. I may even avoid reimplementing the standard syscalls and instead (at least for the initial implementation) introduce new calls that wrap them. For example, new_write() will be a wrapper based on kevent and the new threading model. It sets up all appropriate requests (such as POLLOUT for a write) and, if possible, calls write() itself. When all execution contexts are put to sleep, the whole process will park itself in a waiting syscall like kevent_get_events().
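A minimal sketch of the new_write() idea follows. It assumes poll() as a stand-in for kevent, which is not in mainline Linux. It also assumes a hypothetical wait_for_event() hook: there a real scheduler would queue the request and switch to another context, while this sketch just blocks. Only the name new_write() comes from the text; its exact behaviour here is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical rescheduling point. A real implementation would register
 * a kevent request and swap to another runnable context; this sketch
 * simply blocks in poll(), i.e. the "all contexts asleep" case. */
static int wait_for_event(int fd, short events)
{
    struct pollfd pfd = { .fd = fd, .events = events };
    int rc;
    do {
        rc = poll(&pfd, 1, -1);
    } while (rc == -1 && errno == EINTR);
    return rc == -1 ? -1 : 0;
}

/* Sketch of new_write(): never let write() put the whole process to sleep.
 * Assumption of this sketch: the layer owns the descriptor and leaves it
 * in non-blocking mode afterwards. */
static ssize_t new_write(int fd, const void *buf, size_t len)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (!(flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    for (;;) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0)
            return n;                   /* may be short, as with write() */
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        if (wait_for_event(fd, POLLOUT) == -1)   /* not ready: reschedule */
            return -1;
    }
}
```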

The main issues with this approach are the following:

  • scheduling algorithm
  • SMP scalability
  • syscall wrappers in glibc or completely new calls (as described above)
At least the first two issues are interesting technical challenges; the last one will initially be implemented with new calls.
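One possible shape for the scheduling step combines two ideas from these notes: the timer-based sleep() emulation, and parking the process when nothing is runnable. This is a hypothetical sketch with plain round-robin selection and millisecond deadlines; all names are invented for illustration. The returned timeout would be converted to whatever the waiting syscall (kevent_get_events() in the text) expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ut_state { UT_RUNNABLE, UT_SLEEPING, UT_BLOCKED };

/* Hypothetical per-thread state as seen by the userspace scheduler. */
struct ut {
    enum ut_state state;
    uint64_t wake_at_ms;    /* deadline for UT_SLEEPING threads */
};

/* One scheduling decision. Expired sleepers become runnable, then the next
 * runnable thread after `last` is picked round-robin. If nothing can run,
 * returns -1 and stores in *timeout_ms how long the process may park in
 * its waiting syscall (-1 means until some event arrives). UT_BLOCKED
 * threads are woken elsewhere, by their events. */
static int schedule_next(struct ut *t, size_t n, size_t last,
                         uint64_t now_ms, int64_t *timeout_ms)
{
    int64_t timeout = -1;

    for (size_t i = 0; i < n; i++) {
        if (t[i].state != UT_SLEEPING)
            continue;
        if (t[i].wake_at_ms <= now_ms) {
            t[i].state = UT_RUNNABLE;           /* emulated sleep expired */
        } else {
            int64_t left = (int64_t)(t[i].wake_at_ms - now_ms);
            if (timeout < 0 || left < timeout)
                timeout = left;
        }
    }

    for (size_t k = 1; k <= n; k++) {
        size_t i = (last + k) % n;
        if (t[i].state == UT_RUNNABLE) {
            *timeout_ms = 0;
            return (int)i;
        }
    }

    *timeout_ms = timeout;
    return -1;
}
```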

/devel/threading


Filesystem corruption bug recently found in the Linux kernel.


The LWN.net article about it clearly shows how complex the VFS is, but its conclusion and Linus's remarks about buffer heads are interesting. The conclusion is basically 'do not use buffer heads'. Indeed, in all my work with the VFS (kevent and AIO, receive zero-copy, a test block device for acrypto) I have never tried to use buffer heads. Why would they be needed, when these days we already operate on pages? Filesystems eventually operate on pages too, with a special set of callbacks to write pages, read them and so on.
So a filesystem must be simple in that regard: do not split pages into buffer heads, always work with pages, and provide appropriate callbacks where they are needed (inode operations at least). That is how my FS will work.
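Working with whole pages rather than buffer heads can be illustrated with a small userspace sketch. It splits a write request into per-page pieces, each of which would then go to a per-page callback. It assumes 4 KiB pages; the names are hypothetical, and this is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* One piece of a request, confined to a single page. */
struct page_chunk {
    uint64_t index;     /* page index within the file */
    size_t offset;      /* byte offset inside that page */
    size_t len;         /* bytes touched in that page */
};

/* Splits a write of `len` bytes at file position `pos` into per-page
 * pieces, walking the request one page at a time. Stores at most `max`
 * chunks in `out` and returns how many were stored. */
static size_t split_into_pages(uint64_t pos, size_t len,
                               struct page_chunk *out, size_t max)
{
    size_t n = 0;
    while (len > 0 && n < max) {
        size_t offset = (size_t)(pos & (SKETCH_PAGE_SIZE - 1));
        size_t bytes = SKETCH_PAGE_SIZE - offset;
        if (bytes > len)
            bytes = len;
        out[n].index = pos >> SKETCH_PAGE_SHIFT;
        out[n].offset = offset;
        out[n].len = bytes;
        n++;
        pos += bytes;
        len -= bytes;
    }
    return n;
}
```

For example, a 5000-byte write at offset 4000 touches three pages: 96 bytes at the end of page 0, all of page 1, and the first 808 bytes of page 2.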

/devel/fs