Tue, 22 Apr 2008
Debunked copy_to_user() from kernel thread problem.
It happened to be really trivial. Not even any VM hacking :(
First, some background on how copy_to_user() works on x86.
Its asm looks pretty simple (and it is very small, check
arch/x86/lib/usercopy_32.c:__copy_user()),
so I always wondered how it can handle the page fault
when the userspace page has been swapped out.
The trick lives in a small part of the function: .section __ex_table.
Each entry in this table contains two values: the place where the exception can happen and the fixup address
(both are just instruction addresses). The linker puts this table into a special section,
accessible by the page fault handler do_page_fault(). In some
cases the page fault path is never executed: the code just searches for the page and locks it,
even if it is already in the page table (that is why get_user_pages()
is at best as fast as copy_to_user()). This happens when
the WP bit is not set or does not work
(only a speculation though, derived from __copy_to_user_ll()
and the Intel F00F bug errata).
When the WP bit works, we have the usual copy_to_user(), which will
fault if there is no destination page, and do_page_fault() will eventually
be called. After a number of checks the system determines that the exception happened
in kernel mode, and if there is an exception table entry as described above (which is true for
copy_to_user()), it tries to fix things up.
Here we come to essentially the same code that is called in get_user_pages():
we locate the VMA for the faulting address and insert a new page into the page table. This involves allocating
all those strange 3-letter abbreviations: pgd, pud, pmd and pte ('and' is not a VMM abbreviation yet).
I know what two or three of them mean, but completely forgot pud; with a 4-level page table
it is hard to recall which two are the same, since iirc x86 has only 3 levels.
If the page was swapped out, it will be brought back and the faulting instruction is simply restarted.
If the fault can not be resolved (there is no valid mapping for the address), the fault handler
tries to fix things up via fixup_exception(), which
replaces EIP with the fixup address from the table described above, so that
the CPU returns into the __copy_user() code at its fixup path, which stops the copy
and reports how many bytes were left instead of oopsing.
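To make the table lookup and EIP rewrite described above a bit more concrete, here is a small userspace model of it. It is only a sketch and not the kernel code: struct ex_entry, struct fake_regs, ex_search() and fixup_fault() are names made up for this illustration, and the table is assumed to be sorted by faulting address so that it can be binary searched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one exception table entry: two instruction addresses. */
struct ex_entry {
	uintptr_t insn;		/* instruction that is allowed to fault */
	uintptr_t fixup;	/* where to resume if it does fault */
};

/* Hypothetical stand-in for the saved register state of the faulting context. */
struct fake_regs {
	uintptr_t ip;
};

/* Binary search over a table sorted by insn; returns NULL if ip is not listed. */
const struct ex_entry *ex_search(const struct ex_entry *tbl, size_t n, uintptr_t ip)
{
	size_t lo = 0, hi = n;

	assert(tbl != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].insn == ip)
			return &tbl[mid];
		if (tbl[mid].insn < ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Model of the fixup step: if the faulting ip has a table entry, rewrite the
 * saved ip to the fixup address and return 1, otherwise leave it alone and
 * return 0 (the real kernel would oops at this point).
 */
int fixup_fault(struct fake_regs *regs, const struct ex_entry *tbl, size_t n)
{
	const struct ex_entry *e = ex_search(tbl, n, regs->ip);

	if (!e)
		return 0;
	regs->ip = e->fixup;
	return 1;
}
```

A fault at an address listed in the table resumes at its fixup address; a fault anywhere else is not touched, which in the kernel means a real bug.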
So, how do we hook into the above mechanism and allow a completely different process to write data
into userspace? Quite trivially: the above fault handling (VMA search and allocation of the 3-letter abbreviations)
happens for a particular mm_struct, which contains the VMA list, the page table lock
and other (likely very) essential information needed to handle memory management. This structure is obtained
from the current thread executing on the CPU, so by replacing the mm_struct in our kernel thread with
the userspace thread's one, we can safely copy data to and from userspace. There is a race of course,
when the userspace thread wants to access its own mm_struct (copied to the kernel thread), for example
by calling mmap() or copy_*_user() from the kernel, so we have to be careful and
properly guard against that.
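The idea of borrowing the mm can be shown with a toy userspace model. This is only a sketch under loud assumptions: struct toy_mm, struct toy_task, toy_copy_to_user() and toy_copy_with_borrowed_mm() are invented for this illustration and are not kernel APIs, the "address space" is a single buffer, and the locking against the race mentioned above is left out completely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for mm_struct: the whole "address space" is one buffer. */
struct toy_mm {
	char *base;
	size_t len;
};

/* Toy stand-in for a task; a kernel thread has no mm of its own. */
struct toy_task {
	struct toy_mm *mm;
};

/*
 * The copy resolves the destination through the mm of the task it runs in,
 * just like the fault path works on the current thread's mm_struct.
 */
int toy_copy_to_user(struct toy_task *cur, size_t uoff, const void *src, size_t n)
{
	struct toy_mm *mm = cur->mm;

	if (!mm || uoff > mm->len || n > mm->len - uoff)
		return -EFAULT;
	memcpy(mm->base + uoff, src, n);
	return 0;
}

/*
 * Borrow the user task's mm for the duration of the copy, then restore the
 * old one. A real implementation must also guard against the owner using
 * its mm at the same time; that is not modelled here.
 */
int toy_copy_with_borrowed_mm(struct toy_task *kthread, struct toy_mm *user_mm,
			      size_t uoff, const void *src, size_t n)
{
	struct toy_mm *saved;
	int ret;

	assert(kthread != NULL && user_mm != NULL);
	saved = kthread->mm;
	kthread->mm = user_mm;
	ret = toy_copy_to_user(kthread, uoff, src, n);
	kthread->mm = saved;
	return ret;
}
```

Without an mm the copy fails with -EFAULT, with the borrowed one it succeeds, and the kernel thread ends up with its original (empty) mm again.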
Example code which copies to userspace from a kernel thread can be found in the
archive. Just
replace the kernel path in the Makefile with your own, run make and insert the module.
Each read from the /dev/tcopy file will end up with a copy of data from the kernel
to userspace, done in a dedicated kernel thread.
Zbr wrote at 2008-04-22 21:14: