|
|
About ::
TODO ::
Blog ::
RSS ::
Old blog ::
Projects ::
GIT ::
Gallery ::
Notes
Tue, 05 Dec 2006
Locking tricks inside trie algorithm.
This tree is never rebalanced by design, its pointers
are always the same except the case when node is added or removed,
thus it is possible to use RCU protection when traversing
or modifying trie structure - just using rcu_dereference()
and rcu_assign_pointer() it is possible to create completely
lockless network stack.
Let's consider trie modification which leads to node removing (when appropriate
reference counter becomes zero).
Trie removal algo deletes pointer
from root node to its child using rcu_assign_pointer() and
all subsequent subnodes and schedules node freeing through call_rcu() mechanism.
Trie traversal, which uses rcu_dereference(), will eventually fail
to dereference next pointer or will complete netchannel processing
and thus will exit RCU protected section, which will lead to freeing appropriate nodes.
Node addition is essentially the same - set of node insertions (if new trie
set of nodes requires allocations and insertions and does not reuse existing
inserted nodes) will be 'published' one by one through rcu_assign_pointer(),
which can be interrupted between different isnertions and lead to obtaining
NULL pointer, which is perfectly fine, since netchannel is not yet added.
RCU callbacks are scheduled for invocation after so called grace period is completed,
in practice it means that callback will be invoked some time after context switch happens
(surely since context has switched, section is no longer executed, thus it is ended),
so we need to protect our execution path from being scheduled away while traversing
the trie with rcu_read_lock()/rcu_read_unlock(), which is normal,
since searching path is executed in softirq context. So when searching
is completed new data is added into netchannels queue and appropriate user
is awakened (either userspace process or kernel thread/work queue for things that
do not have userspace process, for example netfilter checks,
for the latter case netchannel will be put into appropriate per-cpu list, accessible
from kernel thread, bound to given CPU).
Since netchannel processing happens in different context than searching, it means
that grace period has elapsed, thus netchannel and related nodes can be freed, while
netchannel will be accessed from processing context. To solve this problem
and increase performance I design special multeplexing schema which will be used in
softirq/hardirq fast path and allow better scalability by minimizing
number of context switches and data moving between CPU caches.
Addition is simple - while holding mutex, which will prevent simultaneous
deletion, it will allocate nodes one by one and publish them using
rcu_assign_pointer(), since fast path running in different context uses
rcu_dereference() it will either get new value with all above logic,
or get NULL pointer, thus returning without processing.
Addition will not be protected by rcu_read_lock()/rcu_read_unlock(),
since simultaneous deletion is not allowed due to mutex
being held, so sleepeing allocation can be used.
/devel/networking :: Link / Comments ()
|