Thu, 11 Jan 2007
Userspace threading: its benefits and drawbacks.
Benefits.
1. Fast scheduling.
There is no need to cross the userspace/kernelspace boundary to schedule
a new thread for execution (just watch what happens with a userspace
network stack compared to the kernel's one when a lot of syscalls are
performed to receive and send small packets).
2. Fast thread creation and destruction.
Creation becomes just an allocation of a structure in userspace;
there is no need for the full creation process which is performed
by the clone() syscall.
3. Fewer cache misses.
Since there is only one process instead of several threads,
cache locality improves greatly and the number of misses
drops.
Drawbacks.
1. Scheduling fairness.
Since the kernel does not know about the multiple threads behind a given process,
it can not give that process an appropriate number of timeslices for execution.
This can be solved either by tighter collaboration between the userspace and
kernelspace schedulers or simply by raising the process' priority (i.e. lowering
its nice value).
2. All communication is performed through one kevent pipe.
This can be problematic (although the interface was specially designed to be scalable).
3. Complex code for good SMP scalability and the userspace scheduler.
I wanted to put it into the 'Benefits' section, since that is exactly why I started
this project.
/devel/threading :: Link / Comments (0)
Threading issues and ways to resolve them.
1. Signals.
POSIX requires that signals be delivered on a per-thread basis,
but the signal handler, and thus whether a signal is ignored or not,
is a per-process property. With kevent's ability
to deliver signals through its queue the problem can be solved
in a very elegant way: the main process receives a signal
event notification through its kevent queue and then checks
all its threads; every thread which has that signal unblocked
receives it through the alternative signal stack.
2. Kevent/poll usage in the threads.
Poll() and select() must be translated into kevent requests in a syscall wrapper
(for example, the way I implemented epoll on top of kevent),
and the resulting event will then be put into the main kevent queue.
3. Sleep and similar system calls.
Kevent has timer notifications, which will be used to emulate such calls.
Calls for POSIX timers can be emulated through kevent's POSIX timers support,
but I will probably not consider this for the initial implementation.
4. Blocking inter-process communication like semaphores.
It must be converted into userspace kevent notifications.
All of the above can look like the old LinuxThreads days before NPTL, when
there was a special management thread which performed a lot of
that functionality, namely signal handling and resource cleanup.
Resource cleanup is not a problem for this new implementation, since
all resources will be automatically cleaned up when the process exits,
and no process-visible resources like file descriptors are closed on
thread cancellation; signals can be handled perfectly with kevent's
capabilities. Now that functionality has moved into a layer between
the kernel (or glibc for the initial implementation) and the application,
i.e. the scheduler (I think that is the correct name, since the main
task of that layer is exactly scheduling). But actually it does not
differ at all from what we have right now with NPTL and the 1-on-1
thread model: exactly the same tasks are performed by the kernel,
but with additional layer crossing overhead.
/devel/threading :: Link / Comments (0)
Initial thoughts about userspace threads (or the M-on-N threading model).
Let's see what we already have.
Glibc provides makecontext()
and friends, which are essentially a
part of the userspace execution mechanism:
one can create a context, run it, swap it and so on.
That is something I want to implement, except for its
problems. A context switch can be performed from an
outside thread (that is how IBM NGPT was implemented);
that is not the main issue, although I really do not like
such an approach. The main problem is that
if such a context is going to block, that fact can not be
detected from other contexts, and thus it is impossible
to swap it with another one. Even if some check is
done in each syscall, or even if each syscall is
a rescheduling point, either each syscall
must be non-blocking, or the whole process will go to sleep
in the syscall, since the kernel does not know that there are
several contexts in the same process.
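For reference, this is roughly what the existing glibc mechanism looks like
in use. It is only a minimal sketch of cooperative switching between two
contexts with getcontext(), makecontext() and swapcontext(); the worker
functions, the trace buffer and run_contexts() are made up for the
illustration. Note that nothing here helps if a worker blocks inside a
syscall: the kernel would put the whole process to sleep.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, ctx_a, ctx_b;
static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];

/* Execution order is recorded here, one character per step. */
static char trace[16];
static size_t trace_len;

static void record(char c)
{
    assert(trace_len < sizeof(trace) - 1);
    trace[trace_len++] = c;
}

static void worker_a(void)
{
    record('a');
    swapcontext(&ctx_a, &ctx_b);    /* yield to B */
    record('A');
    /* Returning here resumes uc_link, i.e. main_ctx. */
}

static void worker_b(void)
{
    record('b');
    swapcontext(&ctx_b, &ctx_a);    /* yield back to A */
    /* B is never resumed again in this sketch. */
}

static void setup(ucontext_t *ctx, char *stack, size_t size, void (*fn)(void))
{
    getcontext(ctx);
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = size;
    ctx->uc_link = &main_ctx;       /* where to go when fn returns */
    makecontext(ctx, fn, 0);
}

/* Runs both workers and returns the recorded order of execution. */
const char *run_contexts(void)
{
    trace_len = 0;
    memset(trace, 0, sizeof(trace));

    setup(&ctx_a, stack_a, sizeof(stack_a), worker_a);
    setup(&ctx_b, stack_b, sizeof(stack_b), worker_b);

    record('m');
    swapcontext(&main_ctx, &ctx_a); /* start A; we return when A finishes */
    record('M');
    return trace;
}
```

Creating a "thread" here is just filling in a structure and a stack in
userspace, and switching never enters the scheduler in the kernel, which is
exactly the attractive part.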
So, the solution is to have some kind of thin layer
between the kernel and userspace (in the real world
it is called glibc), which will convert all syscalls
into non-blocking operations (including nanosleep()
and the like) and keep track of what each context has performed.
In practice a glibc rewrite is not what I would like to do;
instead some layer on top of it will be implemented,
which will convert syscalls into kevent operations
and become a rescheduling point. I will even consider
not reimplementing the known syscalls themselves, but instead (at least
for the initial implementation) introducing new calls
which wrap the known ones: new_write()
will be a wrapper based on kevent and the new threading model,
which will set up all appropriate requests (like POLLOUT for a write)
and, if possible, call write() itself. When all
execution contexts are put to sleep, the whole
process will park itself in a waiting syscall like
kevent_get_events().
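A rough sketch of what such a new_write() wrapper could look like is below.
It is an illustration under loud assumptions, not the planned implementation:
poll() stands in for the kevent request, and because there is no scheduler
in the sketch, the hypothetical helper wait_until_writable() simply parks
the caller where the real layer would switch to another context.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical rescheduling point. A real implementation would queue a
 * kevent request for this descriptor and switch to another context;
 * here poll() is used as a stand-in and the caller simply waits. */
static int wait_until_writable(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ret;

    do {
        ret = poll(&pfd, 1, -1);
    } while (ret < 0 && errno == EINTR);

    return ret < 0 ? -1 : 0;
}

/* Wrapper around write(): the underlying call never blocks, blocking is
 * replaced by a readiness request plus a rescheduling point. */
ssize_t new_write(int fd, const void *buf, size_t count)
{
    int flags;
    ssize_t n;

    assert(buf != NULL || count == 0);

    /* Note: this changes the open file description for all its users. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (!(flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    for (;;) {
        n = write(fd, buf, count);
        if (n >= 0)
            return n;               /* write() was possible right away */
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* real error, errno is preserved */
        if (wait_until_writable(fd) < 0)
            return -1;
    }
}
```

Called on a pipe with free space, new_write(fd, "ping", 4) returns 4 without
ever waiting; only when the pipe is full does the wrapper reach the
rescheduling point.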
The main issues with such an approach are the following:
- scheduling algorithm
- SMP scalability
- syscall wrappers in glibc or completely new calls (as described above)
At least the first two issues are interesting technical challenges;
the last one will first be implemented with new calls.
/devel/threading :: Link / Comments (0)
Filesystem corruption bug recently found in the Linux kernel.
The LWN.net article about it
clearly shows how complex the VFS is, but its conclusion and Linus'
words about buffer heads are interesting. The conclusion is basically
'do not use buffer heads'. Indeed, all the time I worked with the VFS
(kevent and AIO,
receiving zero-copy,
test block device for acrypto)
I never ever tried to use buffer heads. Why are they needed, when
these days we operate with pages already, and eventually
filesystems operate with pages too: they have a special set of
callbacks to write pages, read them and so on.
So, a filesystem must be simple
in that regard: do not split pages into buffer heads, always work with pages and provide
appropriate callbacks where they are needed (inode operations at least),
and that is how my FS will work.
/devel/fs :: Link / Comments (0)