Zbr's days.

About :: TODO :: Blog :: RSS :: Old blog :: Projects :: GIT :: Gallery :: Notes

Tue, 05 Dec 2006

Locking tricks inside trie algorithm.


This tree is never rebalanced by design, its pointers are always the same except the case when node is added or removed, thus it is possible to use RCU protection when traversing or modifying trie structure - just using rcu_dereference() and rcu_assign_pointer() it is possible to create completely lockless network stack.

Let's consider trie modification which leads to node removing (when appropriate reference counter becomes zero).
Trie removal algo deletes pointer from root node to its child using rcu_assign_pointer() and all subsequent subnodes and schedules node freeing through call_rcu() mechanism. Trie traversal, which uses rcu_dereference(), will eventually fail to dereference next pointer or will complete netchannel processing and thus will exit RCU protected section, which will lead to freeing appropriate nodes.

Node addition is essentially the same - set of node insertions (if new trie set of nodes requires allocations and insertions and does not reuse existing inserted nodes) will be 'published' one by one through rcu_assign_pointer(), which can be interrupted between different isnertions and lead to obtaining NULL pointer, which is perfectly fine, since netchannel is not yet added.

RCU callbacks are scheduled for invocation after so called grace period is completed, in practice it means that callback will be invoked some time after context switch happens (surely since context has switched, section is no longer executed, thus it is ended), so we need to protect our execution path from being scheduled away while traversing the trie with rcu_read_lock()/rcu_read_unlock(), which is normal, since searching path is executed in softirq context. So when searching is completed new data is added into netchannels queue and appropriate user is awakened (either userspace process or kernel thread/work queue for things that do not have userspace process, for example netfilter checks, for the latter case netchannel will be put into appropriate per-cpu list, accessible from kernel thread, bound to given CPU).

Since netchannel processing happens in different context than searching, it means that grace period has elapsed, thus netchannel and related nodes can be freed, while netchannel will be accessed from processing context. To solve this problem and increase performance I design special multeplexing schema which will be used in softirq/hardirq fast path and allow better scalability by minimizing number of context switches and data moving between CPU caches.

Addition is simple - while holding mutex, which will prevent simultaneous deletion, it will allocate nodes one by one and publish them using rcu_assign_pointer(), since fast path running in different context uses rcu_dereference() it will either get new value with all above logic, or get NULL pointer, thus returning without processing.
Addition will not be protected by rcu_read_lock()/rcu_read_unlock(), since simultaneous deletion is not allowed due to mutex being held, so sleepeing allocation can be used.

/devel/networking :: Link / Comments ()