Zbr's days.
December
Sun Mon Tue Wed Thu Fri Sat
         
5
           
2006
Months
Dec

About :: TODO :: Blog :: RSS :: Old blog :: Projects :: GIT :: Gallery :: Notes

Tue, 05 Dec 2006

Due to personal issues I will be offline for several days.

/life :: Link / Comments (0)


Badness in postponing work.


RCU is not a perfect solution for existing schemas actually. Lets consider synchronous situation when object freeing does not only return memory to the cache, but also perform some additional state machine changes - like putting reference counters and so on - obviously postponing of such work likely will not be very good solution, since when we work with fast path, we want the whole sequence be completed quickly, but not fast first half and postpone other for a long time.
But even pure memory freeing - i.e. returning mempory to the cache, if being postponed to RCU callback, can lead to very noticeble performance degradation.
Thinking about better and faster skb processing for netchannels I created simple patch which just postpones skb freeing (kfree_skbmem(), i.e. pure releasing memory back to skb cache) to RCU callback invoked from __kfree_skb(). This leads to the following performance degradation (receiving of small packets):
kfree_skb() RCU speed degradation

Speed is about 2.5 times slower, although CPU usage is smaller too - likely due to the increased work of RCU tasklet and increased number of context switches.

As a conclusion: using RCU protected lists of skbs for sockets will lead to major performance degradation.
As a second conclusion: RCU is not a panacea, so its usage will be limited in netchannels.

/devel/networking :: Link / Comments (0)


Locking tricks inside trie algorithm.


This tree is never rebalanced by design, its pointers are always the same except the case when node is added or removed, thus it is possible to use RCU protection when traversing or modifying trie structure - just using rcu_dereference() and rcu_assign_pointer() it is possible to create completely lockless network stack.

Let's consider trie modification which leads to node removing (when appropriate reference counter becomes zero).
Trie removal algo deletes pointer from root node to its child using rcu_assign_pointer() and all subsequent subnodes and schedules node freeing through call_rcu() mechanism. Trie traversal, which uses rcu_dereference(), will eventually fail to dereference next pointer or will complete netchannel processing and thus will exit RCU protected section, which will lead to freeing appropriate nodes.

Node addition is essentially the same - set of node insertions (if new trie set of nodes requires allocations and insertions and does not reuse existing inserted nodes) will be 'published' one by one through rcu_assign_pointer(), which can be interrupted between different isnertions and lead to obtaining NULL pointer, which is perfectly fine, since netchannel is not yet added.

RCU callbacks are scheduled for invocation after so called grace period is completed, in practice it means that callback will be invoked some time after context switch happens (surely since context has switched, section is no longer executed, thus it is ended), so we need to protect our execution path from being scheduled away while traversing the trie with rcu_read_lock()/rcu_read_unlock(), which is normal, since searching path is executed in softirq context. So when searching is completed new data is added into netchannels queue and appropriate user is awakened (either userspace process or kernel thread/work queue for things that do not have userspace process, for example netfilter checks, for the latter case netchannel will be put into appropriate per-cpu list, accessible from kernel thread, bound to given CPU).

Since netchannel processing happens in different context than searching, it means that grace period has elapsed, thus netchannel and related nodes can be freed, while netchannel will be accessed from processing context. To solve this problem and increase performance I design special multeplexing schema which will be used in softirq/hardirq fast path and allow better scalability by minimizing number of context switches and data moving between CPU caches.

Addition is simple - while holding mutex, which will prevent simultaneous deletion, it will allocate nodes one by one and publish them using rcu_assign_pointer(), since fast path running in different context uses rcu_dereference() it will either get new value with all above logic, or get NULL pointer, thus returning without processing.
Addition will not be protected by rcu_read_lock()/rcu_read_unlock(), since simultaneous deletion is not allowed due to mutex being held, so sleepeing allocation can be used.

/devel/networking :: Link / Comments (0)