Zbr's days.

Fri, 30 Jun 2006

Vacations!


I'm going on vacation for one week: no phone, no internet access, no civilization, only river, canoe, vodka and friends.

/life :: Link / Comments (0)


Netchannels.


I've finished porting sending support from the userspace TCP/IP stack implementation into kernelspace netchannels, so I've started cleanups and performance tuning.
Ok, the first problem has been found: the sending side does not honor a zero-sized receive window, so congestion control does not work correctly and the retransmit engine seems to be broken.
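
For illustration, the kind of check the sending side is missing could look like the sketch below. It is not the netchannel code: the structure and function names (`tx_state`, `tx_allowed_bytes`) are hypothetical, and it assumes that both windows are tracked in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sender state; field names are illustrative only. */
struct tx_state {
	uint32_t snd_una; /* oldest unacknowledged sequence number */
	uint32_t snd_nxt; /* next sequence number to send */
	uint32_t snd_wnd; /* window advertised by the receiver, in bytes */
	uint32_t cwnd;    /* congestion window, in bytes (assumption) */
};

/* How many new bytes may be sent right now; 0 means "do not send". */
uint32_t tx_allowed_bytes(const struct tx_state *tx)
{
	uint32_t in_flight, wnd;

	assert(tx != NULL);

	in_flight = tx->snd_nxt - tx->snd_una; /* unsigned math wraps mod 2^32 */
	wnd = tx->snd_wnd < tx->cwnd ? tx->snd_wnd : tx->cwnd;

	/* A zero-sized receive window makes wnd 0, so nothing may be sent. */
	return in_flight >= wnd ? 0 : wnd - in_flight;
}
```

With a check of this shape a zero-sized window simply yields 0 allowed bytes, whatever the congestion window says.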

/devel/networking :: Link / Comments (0)


Alternative TCP/IP stack.


The more I think about userspace TCP/IP stack, the more I like that idea.
Consider the situation when a lot of very small packets are being sent from the host. Each send requires at least one system call, so performance will suffer (on my 1Gbit LAN I'm able to get only about 22 MB/sec with 92-byte packets at more than 90% CPU usage), but if TCP/IP processing partially happens in userspace, then it is possible to combine a lot of packets before doing a system call and thus dramatically reduce CPU usage and improve performance.
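
The batching idea can be sketched as a small coalescing buffer. This is only an illustration under assumptions: the names (`batch`, `batch_add`, `batch_flush`, `batch_count_flush`) and the 1448-byte budget are hypothetical, and the flush callback stands in for the single system call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_CAPACITY 1448 /* assumed byte budget per flush, not from the post */

/* Hypothetical flush callback: stands in for one system call. */
typedef void (*flush_fn)(const unsigned char *data, size_t len, void *priv);

struct batch {
	unsigned char buf[BATCH_CAPACITY];
	size_t used;
	flush_fn flush;
	void *priv;
};

/* Push out whatever has been collected so far. */
void batch_flush(struct batch *b)
{
	if (b->used) {
		b->flush(b->buf, b->used, b->priv);
		b->used = 0;
	}
}

/* Queue one small message; flush first if it would not fit. */
void batch_add(struct batch *b, const void *msg, size_t len)
{
	assert(len <= BATCH_CAPACITY);

	if (b->used + len > BATCH_CAPACITY)
		batch_flush(b);
	memcpy(b->buf + b->used, msg, len);
	b->used += len;
}

/* Example callback: only counts how many "system calls" were made. */
void batch_count_flush(const unsigned char *data, size_t len, void *priv)
{
	(void)data;
	(void)len;
	++*(unsigned int *)priv;
}
```

With 92-byte messages and the assumed budget, 15 messages fit into one buffer, so 100 messages followed by a final `batch_flush()` cost 7 flushes instead of 100 calls.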

/devel/networking :: Link / Comments (0)


Quotation of the week:

"With the 2.6.18 merge window open it's a great opportunity to break things as best I can.". (c) David Miller.

/devel/other :: Link / Comments (0)


Thu, 29 Jun 2006

Moscow is the most expensive city in the world.


Worldwide Cost of Living Survey 2006 rankings.

Believe me, Moscow does not deserve to be the most expensive city, although I do love it (heh, I wish there were no 80-degree temperature swings between summer and winter weather, and our officials were not so greedy, and a couple of other things...).

I think that survey was done from a luxury point of view - that segment is really expensive in Moscow.

/life :: Link / Comments (0)


I've gotten a copy of Evgeniy Grishkovets' "The Planet".


Excellent mood!

/life :: Link / Comments (0)


Wed, 28 Jun 2006

Netchannel.


After introducing a fancy ACK generation algorithm I'm able to outperform the socket receiving code with my alternative TCP/IP stack and netchannels (by up to 3-4 MB (2^20 bytes) per sec).
The sending side is netcat, which reads data from a big file; the receiving side is an epoll()-based socket client, tuned for maximum performance, and a netchannel client.
CPU usage is higher for the netchannel client, so that is the next item for investigation.

/devel/networking :: Link / Comments (0)


Netchannel.


[17179816.844000] netchannel_recv: skb: f764a180, size: 83.
[17179816.852000] 
[17179816.852000] atcp_process_in: skb: f764a180, data_size: 31.
[17179816.860000] R 192.168.4.78:1234 <-> 192.168.0.48:1234 : 
	seq: 362885993, ack: 2138147038, win: 1448 [5792], doff: 8, 
	s: 0, a: 1, p: 1, r: 0, f: 0, len: 31, state: 1, skb: f764a180.
[17179816.876000] atcp_established: seq: 362885993, end_seq: 362886024, ack: 2138147038, 
	snd_una: 2138147038, snd_nxt: 2138147038, snd_wnd: 5792, rcv_nxt: 362885993, rcv_wnd: 5792, cwnd: 17.
[17179816.892000] atcp_check_retransmit_queue: removed: 0, in_flight: 0, cwnd: 18.
[17179816.900000] ofo queue: seq: 362885993, end_seq: 362886024.
[17179816.908000] ofo dump: 362885993 - 362886024, 
[17179816.912000] S 192.168.4.78:1234 <-> 192.168.0.48:1234 : 
	seq: 2138147038, ack: 362886024, win: 1448 [5792], doff: 8, 
	s: 0, a: 1, p: 0, r: 0, f: 0, len: 52, state: 1, skb: f7740c80, csum: 636a.
[17179816.928000] atcp_established: return: 31.
[17179816.932000] atcp_read_data: size: 40924, seq_read: 362885993.
[17179816.940000] Copy: seq_read: 362885993, seq: 362885993, end_seq: 362886024, 
	size: 40924, off: 0, data_size: 31, sz: 31, read: 0.
[17179816.952000] Unlinking: skb: seq: 362885993, end_seq: 362886024, seq_read: 362886024.
Kernel netchannels can connect to a remote host, establish a connection and receive data.
I've not tested sending yet.
After a large file transfer its md5 checksum is correct.

/devel/networking :: Link / Comments (0)


Tue, 27 Jun 2006

Netchannel.


I've started to combine my extremely lightweight TCP/IP stack with the in-kernel netchannel subsystem created earlier.
There is one problem with it: the TCP/IP part of the Linux network stack uses the socket abstraction (and struct sock in particular) extensively to deliver a packet to the NIC, so I need to take the TCP/IP part from my userspace code completely and only replace route lookup and L1 data sending/receiving.

/devel/networking :: Link / Comments (0)


Acrypto.


I've reorganized the acrypto archive.
It now contains a patchsets subdir where all combined IPsec, dm-crypt and acrypto patchsets live.
The drivers subdir now only contains real acrypto drivers (HIFN, FCRYPT, VIA, likely IXP4xx soon) and the old_patches subdir, where old development patches live.

/devel/acrypto :: Link / Comments (0)


Acrypto.


I've completed the 2.6.17 port.

Well, the previously described changes were imported into the 2.6.18 tree, and I had rebased my acrypto tree against 2.6.18-git, not pure 2.6.17, so all that porting work was done for the future release.
2.6.17 itself does not contain major changes in IPsec, so porting is simple.

/devel/acrypto :: Link / Comments (0)


Mon, 26 Jun 2006

Acrypto.


I've started the acrypto port to 2.6.17. IPsec has changed noticeably since the previous kernel release: encapsulation mode processing was split into separate objects (one can now find transport and tunnel modes when configuring the kernel; they were embedded before), which means that each packet is now processed in two different callbacks, for example in the ESP4 code to decrypt/encrypt data and in the tunnel or transport callback which creates proper headers.

/devel/acrypto :: Link / Comments (0)


Sun, 25 Jun 2006

Climbing.


I took my friends to the climbing zone today - we had a fun time there.
I think I have some pedagogic talent (all of them have almost zero experience in climbing), since they each finished quite a few simple traces. Although not everyone likes my educational approach.
I've also finished several traces - vertical ones and ones on a negative slope.
It was a very good time.

/life :: Link / Comments (0)


Sat, 24 Jun 2006

Mephody returned from Ireland.


He will be in Moscow for several days, so we decided to celebrate his visit. As usual I reserved a place in "5 oborotov", where I met Wijo and Alexandra, Fedor and Ira, Alexander and Yuliana and Meph with Ira.
While sitting there we watched a couple of football matches (I supported Mexico and Sweden - both lost), talked about life and had a very good time.

/life :: Link / Comments (0)


Passive OS fingerprinting module OSF.


I've fixed the 64-bit issues and also added some heuristics for the TTL and MSS checks.
OSF can now detect a modern NMAP scan (although it can be wrong and confuse it with Win2003 Enterprise Server in about 10-20% of cases).
It is available in archive.

/devel/networking :: Link / Comments (0)


Acrypto.


I've released new combined patchsets which contain a major refactoring: all acrypto-related prefixes were renamed from "crypto" to "acrypto".
As usual, they are available in archive.

/devel/acrypto :: Link / Comments (0)


Fri, 23 Jun 2006

Passive OS fingerprinting.


I've started creating a remote patch-o-matic repository and found that OSF does not work well on 64-bit systems, so I hacked it a little, but it is not done yet.

/devel/networking :: Link / Comments (0)


Software update.


I've removed Gnome and installed XFCE.
I've removed XFCE and installed Fluxbox.
Everything works quite well, except that Firefox cannot be installed using yum due to some broken dependencies in the gnome libs, so I use the older Mozilla, and aterm does not support UTF8, so I use xterm to read my mail.

/other :: Link / Comments (0)


Who is the looter?


After a fresh reboot of my desktop machine (AMD64 3500+ with 1GB of RAM), I noticed that more than 200MB of RAM was in use just after the login screen appeared. After a couple of hours of work in the default Gnome session in FC5 about 600MB is in use (not including buffer cache). I use Firefox, Evolution (very slow on my machine, although I think the hardware is not that bad), gvim and a lot of Gnome terminals.
At home I use a 1.4 GHz Centrino laptop with 256 MB of RAM as a desktop, with a Debian Sarge installation, Fluxbox and a lot of aterms - it works perfectly.
It looks like I'm going to remove Gnome (and set up XFCE, which comes with the FC5 install) and maybe FC5 later...

/other :: Link / Comments (0)


Kevent.


I've released a patch for IPv6 support for kevent.
It is not tested though, but since the socket code and the TCP state machine are the same for IPv6 and IPv4, it should be enough.

/devel/kevent :: Link / Comments (0)


Thu, 22 Jun 2006

Kevent.


Kevent subsystem incorporates several AIO/kqueue design notes and ideas. Kevent can be used both for edge and level notifications. It supports socket notifications, network AIO (aio_send(), aio_recv() and aio_sendfile()), inode notifications (create/remove), generic poll()/select() notifications and timer notifications.

It was tested against FreeBSD kqueue and epoll and showed a noticeable performance win.

Network asynchronous IO was tested against synchronous sockets and showed a noticeable win.

I've created patchset against 2.6.17 git tree and sent it to netdev@ for review.
Patch is available in archive.

/devel/kevent :: Link / Comments (0)


Alternative TCP/IP stack.


After some stack optimisation, I'm able to run a TCP/IP session with 128-byte packets at more than 52 MB/sec over a packet socket (but the sending speed slows down over time).
I've checked that the received data is correct, and it is.

The optimisation was fairly simple - once slow start is over, do not send data into the net until MSS bytes have been collected.
Generic socket code shows about 30 MB/sec with 128-byte packets and about 75 MB/sec with 1280-byte packets (that is the size of the data which my stack sends when collecting data). When I turn tcpdump on for the socket test to emulate packet socket behaviour (which reads each packet from the wire), the speed drops to 22 and 55 MB/sec for 128 and 1280 byte packets respectively.
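
The sending rule described above can be written down as a tiny decision helper. This is a sketch, not the stack's code: the name `should_transmit` is hypothetical, and sending immediately during slow start is an assumption read into the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: should the bytes queued so far go out now? */
bool should_transmit(size_t queued, size_t mss, bool in_slow_start)
{
	assert(mss > 0);

	if (queued == 0)
		return false;         /* nothing to send */
	if (in_slow_start)
		return true;          /* assumption: no collecting during slow start */
	return queued >= mss;     /* afterwards wait for a full segment */
}
```

For example, with 1280 bytes as the collection size, a single 128-byte write is held back after slow start, while ten of them together are sent as one segment.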

/devel/networking :: Link / Comments (0)


Wed, 21 Jun 2006

Climbing.


It was not a very good training today, since there were a lot of people and I tried to complete some traverses, which was almost impossible since all the walls were occupied, mostly near the floor.
I've tried, but only found one interesting traverse and a couple of boulder problems; almost all the time I was trying to complete old stuff.

/life :: Link / Comments (0)


Alternative TCP/IP stack.


I've fixed RFC 793 issues and completed slow start and congestion control according to RFC 2001 and my tests. Now TCP performance is about 10.4-10.9 MB/sec through packet socket, while UDP is about 11-12 MB/sec.
And I see the way for major optimisations in my current code, which will match the way TCP works when slow start is over.

/devel/networking :: Link / Comments (0)


Ugh, bastards, they killed Kenny.


OSF (OS passive fingerprinting target for iptables) was removed from netfilter's patch-o-matic because of some cleanup process.
So I plan to create remote repository for it.

/devel/networking :: Link / Comments (0)


Tue, 20 Jun 2006

Alternative TCP/IP stack.


I'm a loser: I've misread the RFCs.
Now that I've found the root of the sending problem (while cooking my dinner), I expect to complete the stack implementation tomorrow.

/devel/networking :: Link / Comments (0)


Alternative TCP/IP stack.


I've implemented sending support, which includes the following features:

  • slow start.
  • congestion control.
  • retransmit queue (not tested, all its processing happened during broken setup tests).

Slow start and congestion control were implemented in a different manner than RFC 2001 describes, since I only count packets in flight and compare them with the congestion window measured in segments.
There are still a lot of issues to resolve with packet sizes.
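
A segment-counting scheme of this kind can be sketched as follows. It is only an illustration of the idea: the names (`seg_cc` and friends) are hypothetical, and the growth rules (one segment per ACK in slow start, one segment per window of ACKs afterwards) are the usual textbook ones, which may differ from what the stack really does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection counters, all measured in segments. */
struct seg_cc {
	uint32_t cwnd;      /* congestion window */
	uint32_t ssthresh;  /* slow start threshold */
	uint32_t in_flight; /* segments sent but not yet acknowledged */
	uint32_t acked;     /* ACKs seen since the last cwnd increase */
};

/* A new segment may go out only while the window is not full. */
bool seg_cc_can_send(const struct seg_cc *cc)
{
	return cc->in_flight < cc->cwnd;
}

void seg_cc_on_send(struct seg_cc *cc)
{
	cc->in_flight++;
}

/* One previously sent segment has been acknowledged. */
void seg_cc_on_ack(struct seg_cc *cc)
{
	assert(cc->in_flight > 0);
	cc->in_flight--;

	if (cc->cwnd < cc->ssthresh) {
		cc->cwnd++;                    /* slow start: +1 segment per ACK */
	} else if (++cc->acked >= cc->cwnd) {
		cc->cwnd++;                    /* avoidance: +1 segment per window */
		cc->acked = 0;
	}
}
```

The sender would call `seg_cc_can_send()` before each packet and return an error (or wait) when it says no.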

Here is first sending benchmark for TCP/IP sending (128 byte packets) through packet socket.
[Graph: alternative TCP/IP stack, sending (128 byte packets) through packet socket.]

The main problem with this test is that the state machine is handled in userspace when some data is received from the packet socket. That means that after each packet sent to the network I need to check whether there is new data in the packet socket, so it is strictly synchronous, and if the remote side wants to delay its reply, the sending side will just sleep.
After removing the packet socket reading timeout I was able to increase performance to 6.2MB/sec, but it is still two times less than with UDP (13 MB/sec).
In both cases in_csum(), which is implemented in C, is the slowest function.
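
For reference, a plain C Internet checksum in the style of RFC 1071 looks roughly like the sketch below. The stack's own in_csum() is not shown in this post, so this is not that function; the name `inet_csum` is hypothetical and the code favours clarity over speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement checksum over a byte buffer (RFC 1071 style).
 * Bytes are summed as big-endian 16-bit words; the result is the
 * complemented sum as a host-order number.
 */
uint16_t inet_csum(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint64_t sum = 0;

	assert(data != NULL || len == 0);

	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8; /* odd trailing byte, zero padded */

	while (sum >> 16)               /* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A byte-pair loop like this touches every byte of every packet, which is why such a function easily ends up on top of a profile.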

My implementation returns error when packet was not sent due to some congestion.
After congestion check tuning I was able to achieve 10.5 MB/sec.

/devel/networking :: Link / Comments (0)


Alternative TCP/IP stack.


Here is a typical fair comparison of the UDP and TCP protocols in my TCP/IP stack. The test runs over a packet socket.
The maximum observed performance for UDP was 13 MB/sec when the test did not check at all whether the packet socket contained some data, which is not correct.
TCP continuously slows down, which means that there is some problem in congestion control; that is the main problem to investigate now.

[Graph: alternative TCP and UDP performance. Vertical axis is speed of transfer over packet socket in MB/sec.]

/devel/networking :: Link / Comments (0)


Mon, 19 Jun 2006

Climbing.


Today I ran a very intensive training - several traverses at the start, then traces on a negative slope and a vertical wall with little rest, and to finish some exercises on the campus board. My partner - local guru Genady - forced me to shorten the rest pauses after each climb, so at the end I was completely tired.
It's been quite a long time since I last climbed until my hands were completely hammered, and it looks like new progress began when I started to climb a lot on negative slopes.

/life :: Link / Comments (0)


Acrypto.


I've released new combined patchsets which change the return value of the crypto context callback, so crypto context initialization might now fail.

/devel/acrypto :: Link / Comments (0)


Some fun from kernel hacker Anton Blanchard.

/other :: Link / Comments (0)


Sun, 18 Jun 2006

I believe in two things: sex and death,

and at least the last one you can not avoid.

Woody Allen films watching day.
And I tend to agree with him...

/life :: Link / Comments (0)


Alternative TCP/IP stack.


I've completed the following things today:

  • timestamp option.
  • PAWS.
  • window scaling option.
  • header prediction.
  • some work on retransmit queue.
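
As an illustration of the PAWS item in the list above, the core of the test from RFC 1323 is a wraparound-safe timestamp comparison. The sketch below uses hypothetical names (`ts_before`, `paws_reject`) and leaves out the special cases from the RFC (RST segments, connections idle for more than 24 days).

```c
#include <stdbool.h>
#include <stdint.h>

/* True if timestamp a is older than timestamp b, modulo 2^32. */
bool ts_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * PAWS check: a segment whose TSval is older than the most recent
 * timestamp accepted from the peer (ts_recent) is not acceptable.
 */
bool paws_reject(uint32_t tsval, uint32_t ts_recent)
{
	return ts_before(tsval, ts_recent);
}
```

The signed difference is what makes the comparison survive timestamp wraparound: a small value seen right after a very large one is treated as newer, not older.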

/devel/networking :: Link / Comments (0)


Sat, 17 Jun 2006

Alternative TCP/IP stack.


The stack supports out-of-order segment processing now.
OFO handling does not even require SACK generation, since the stack ACKs only when holes are filled. In theory that can lead to data starvation, but if the link is so broken that the sender stops transmitting and retransmits are lost too, so that the holes are never filled, SACK cannot help there either.
SACK would help on long fat pipes, where loss detection takes a lot of time, but I think this task does not have the highest priority right now.
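
The "ACK only when holes are filled" behaviour can be sketched with a small out-of-order queue. This is an illustration, not the stack's code: all names (`rx_state`, `rx_segment`, `ofo_drain`) and the queue depth are hypothetical, and overlapping or duplicate segments are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OFO_MAX 16 /* arbitrary queue depth for the sketch */

struct ofo_seg {
	uint32_t seq, end_seq;
};

struct rx_state {
	uint32_t rcv_nxt;              /* next in-order sequence number expected */
	struct ofo_seg queue[OFO_MAX]; /* segments waiting behind a hole */
	unsigned int queued;
};

/* Move rcv_nxt over every queued segment that has become contiguous. */
static void ofo_drain(struct rx_state *rx)
{
	bool progress = true;

	while (progress) {
		progress = false;
		for (unsigned int i = 0; i < rx->queued; i++) {
			if (rx->queue[i].seq != rx->rcv_nxt)
				continue;
			rx->rcv_nxt = rx->queue[i].end_seq;
			rx->queue[i] = rx->queue[--rx->queued]; /* drop entry i */
			progress = true;
			break;
		}
	}
}

/* Returns true when rcv_nxt advanced, i.e. when an ACK should be sent. */
bool rx_segment(struct rx_state *rx, uint32_t seq, uint32_t len)
{
	assert(rx != NULL);

	if (seq == rx->rcv_nxt) {
		rx->rcv_nxt = seq + len;
		ofo_drain(rx);
		return true;
	}
	if ((int32_t)(seq - rx->rcv_nxt) > 0 && rx->queued < OFO_MAX) {
		rx->queue[rx->queued].seq = seq;
		rx->queue[rx->queued].end_seq = seq + len;
		rx->queued++;
	}
	return false; /* hole still open (or old data): stay silent */
}
```

When the missing segment finally arrives, one ACK covers it and everything queued behind it.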

The stack grew up to 2240 lines and 51Kb, where TCP takes about 19Kb.

Also added nice input TCP options processor.

Strange timestamp issues sometimes happen with www.sun.com.
[Graph: timestamp value per packet.]

You can see here how the remote clock jumps for the last packet, although there were no noticeable delays on the receiving side.
It is possible that it is just some trivial jitter though.

/devel/networking :: Link / Comments (0)


Fri, 16 Jun 2006

Alternative TCP/IP stack.


Ok, there is another significant omission in my stack: out-of-order handling.
The most complex part is to find out how to generate ACKs and update various TCP-specific parameters when a received segment is out of order.
An interesting note: when trying to get the main page from www.microsoft.com, www.openbsd.org and some other interesting places, reordering happens only with www.sun.com.
My stack does not support SACK yet, so I'm a bit lost on how to deal with it.
Currently, for an out-of-order segment my stack generates an ACK as if it acknowledged the last perfectly valid segment, i.e. it sets the acknowledgement number to the sequence number of the previous (valid) packet plus the size of that packet.

Here is typical picture of sequence numbers with packet reordering.
[Graph: out of order packets.]

Green line is sequence numbers received from remote host, red line is correct in-order sequence numbers.

/devel/networking :: Link / Comments (0)


Thu, 15 Jun 2006

Alternative TCP/IP stack.


I've implemented the TCP MSS and timestamp options (without the PAWS receiving check), although the latter does not work with acknowledgements yet.

That's how passive OS fingerprinting, which I ported to netfilter as OSF, recognized my stack:

Your address is: xxx.xxx.xxx.xxx
Your system is recognized as:

xxx.xxx.xxx.xxx:1111 - UNKNOWN [4096:51:1:56:T,N,N,M1460:Z:?:?] (up: 3195 hrs) -> 
213.134.128.25:80 (link: ethernet/modem)

P0f did not recognize your system. 
We would really appreciate if you could tell us more about the system using the form below. 
Thanks!

I can even tune it to look like windows or something like palm.

What is really missing in my implementation is a retransmit queue, which is my main goal now.

/devel/networking :: Link / Comments (0)


Acrypto development.


Some brain-damaged hardware (like IXP4xx crypto processors) cannot handle key exchange at run-time, so it must somehow be called before sessions with a new key are queued for processing.

Yakov Lerner (iler.ml_gmail.com) gave me the idea of so-called crypto contexts, which hold information about the crypto operations performed for a given context, for example the key and mode for IPsec or dm-crypt. Such a context can be created when a new crypto user wants to start crypto processing, and it allows all drivers registered for notifications to be notified about various events. With this design IXP4xx hardware can register itself for key change notifications, which generally happen in process context (at least in dm-crypt and IPsec), and update its hardware structures to be able to process the flow of crypto requests.
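
The context idea can be sketched as a small notification mechanism. Nothing below is the acrypto API: every name (`crypto_ctx`, `ctx_register_listener`, `ctx_set_key`, `example_driver_notify`) is hypothetical and the fixed-size listener table is just a simplification for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define CTX_MAX_LISTENERS 8 /* arbitrary limit for the sketch */

/* Hypothetical event types a driver may be told about. */
enum ctx_event { CTX_EVENT_KEY_CHANGE };

/* Hypothetical crypto context: which key and mode the user works with. */
struct crypto_ctx {
	const unsigned char *key;
	size_t key_len;
	int mode;
};

/* Driver callback; a non-zero return value means the driver failed. */
typedef int (*ctx_notify_fn)(struct crypto_ctx *ctx, enum ctx_event ev, void *priv);

struct ctx_listener {
	ctx_notify_fn fn;
	void *priv;
};

static struct ctx_listener ctx_listeners[CTX_MAX_LISTENERS];
static unsigned int ctx_nr_listeners;

/* A driver subscribes to context events. Returns 0, or -1 if the table is full. */
int ctx_register_listener(ctx_notify_fn fn, void *priv)
{
	assert(fn != NULL);

	if (ctx_nr_listeners == CTX_MAX_LISTENERS)
		return -1;
	ctx_listeners[ctx_nr_listeners].fn = fn;
	ctx_listeners[ctx_nr_listeners].priv = priv;
	ctx_nr_listeners++;
	return 0;
}

/* Install a new key and let every registered driver update its state first. */
int ctx_set_key(struct crypto_ctx *ctx, const unsigned char *key, size_t key_len)
{
	ctx->key = key;
	ctx->key_len = key_len;

	for (unsigned int i = 0; i < ctx_nr_listeners; i++) {
		int err = ctx_listeners[i].fn(ctx, CTX_EVENT_KEY_CHANGE,
					      ctx_listeners[i].priv);
		if (err)
			return err; /* do not queue requests for this key */
	}
	return 0;
}

/* Example listener standing in for a driver: counts key changes it has seen. */
int example_driver_notify(struct crypto_ctx *ctx, enum ctx_event ev, void *priv)
{
	(void)ctx;
	if (ev == CTX_EVENT_KEY_CHANGE)
		++*(unsigned int *)priv;
	return 0;
}
```

The point of the design is the ordering: drivers learn about the new key in process context before any request using that key is queued.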

I've released new combined patchsets for the 2.6.15 and 2.6.16 trees with the above concept implemented. Patches can be found in archive.

/devel/acrypto :: Link / Comments (0)


Wed, 14 Jun 2006

Climbing.


Local gurus created a new trace on a negative slope in Skala-city, so I tried to complete it several times, but definitely failed. The whole trace is not that complex, and if it had been created on a vertical wall I would have finished it without serious problems, but I have always had trouble climbing on negative slopes, so it was not that easy.
But while climbing on negative slopes I definitely see major progress both in technique and power endurance, so I will continue my training there.

/life :: Link / Comments (0)


Alternative TCP/IP stack.


I've implemented TCP sending. One can create an echo server with my stack, although there is no accept() call: the network channel must be created without a wildcard remote system.

/devel/networking :: Link / Comments (0)


Woody Allen.


I've gotten 16 of his films. Excellent mood!

/life :: Link / Comments (0)


Tue, 13 Jun 2006

Initial results of alternative TCP/IP stack implementation.


After a 650MB transfer of the FC5 ISO using my userspace TCP/IP implementation (through a packet socket) the file's md5 checksum was correct.
The whole project fits in 1813 lines of code including comments, which is 42k of code, where the TCP part is about 12k.
Groovy!

The stack is fairly trivial: no options, no TCP extensions. Sending has not been tested yet.
The next item on the agenda is to check various loss and reordering issues.
Then I plan to implement PAWS and the timestamp TCP option. The MSS option would be useful too.

I want to run various tests with congestion to collect some statistics about generic TCP/IP behaviour.

Here is a test snippet from a running session. The remote sending part is Linux 2.6.16, the local receiving part is my TCP/IP stack. Netfilter on the local machine is configured to silently drop packets for the selected ports, so the normal stack does not interfere with the userspace TCP/IP state machine.

[s0mbre@uganda stack]$ sudo ./stack
create 192.168.0.48:1111 -> 192.168.4.78:1025, proto: 6, hit: 0, err: 0.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429707, ack: 0
 win: 1024, doff: 5, s: 1, a: 0, p: 0, r: 0, f: 0.
state change: 0 -> 2.
Connected.
+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 24.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804101, ack: 1619429708
 win: 5840, doff: 6, s: 1, a: 1, p: 0, r: 0, f: 0, len: 0.
state change: 2 -> 1.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804102
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 37.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804102, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 17.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804119
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [17]: qqqqqqqqqqqqqqqq

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 29.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804119, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 9.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804128
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [9]: aaaaaaaa

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 47.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804128, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 27.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804155
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [27]: qqqqqqqqqqqqqqqqqqqqqqqqqq

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 62.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804155, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 42.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804197
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [42]: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 25.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804197, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 5.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804202
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [5]: aaaa

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 28.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804202, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 8.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804210
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [8]: qweqqwe

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 42.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804210, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 22.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804232
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [22]: asdasdasdasdasdasdasd

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 50.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804232, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 30.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804262
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [30]: zzzzzzzzzzzzzzzzzzzzzzzzzzzzz

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 25.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804262, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 1, r: 0, f: 0, len: 5.
S 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 1619429708, ack: 3672804267
 win: 1024, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 0.
recv [5]: zzzz

+ 192.168.0.48:1111 <-> 192.168.4.78:1025, size: 20.
R 192.168.0.48:1111 <-> 192.168.4.78:1025 : seq: 3672804267, ack: 1619429708
 win: 5840, doff: 5, s: 0, a: 1, p: 0, r: 0, f: 1, len: 0.
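
The acknowledgement numbers in the log above follow the usual TCP rule: the ACK is the received sequence number plus the payload length, and SYN and FIN each consume one extra sequence number. A small sketch (the function name `ack_for_segment` is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Acknowledgement number to send in reply to a received segment. */
uint32_t ack_for_segment(uint32_t seq, uint32_t len, bool syn, bool fin)
{
	/* unsigned arithmetic wraps modulo 2^32, as sequence numbers do */
	return seq + len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
}
```

For the SYN+ACK with seq 3672804101 this gives 3672804102, and for the first 17-byte data segment with seq 3672804102 it gives 3672804119, matching the "S" lines in the log.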

/devel/networking :: Link / Comments (0)


Alternative TCP/IP stack.


Ok, I've just asked on netdev@ about the idea of using an alternative TCP/IP stack for netchannels.

Let's see how quickly someone suggests that I get myself a lobotomy.

/devel/networking :: Link / Comments (0)


Alternative TCP/IP stack.


The design is fairly simple and is similar to the Linux network stack. The following features are already implemented:

  • routing table. It includes static destination routing and an ARP cache. No dynamic ARP probes.
  • socket-like interface for data reading.
  • ethernet sending/receiving support.
  • IP sending/receiving support.
  • initial TCP state machine implementation (three-way handshake).
Currently it works in a userspace emulator over a packet socket.
Netchannel will only have the high-level protocol processing part, and will use existing in-kernel interfaces for low-level data sending/receiving.

Current code fits in 2k lines including packet socket processing and comments.

/devel/networking :: Link / Comments (0)


Site was down for weekend.

/other :: Link / Comments (0)


Fri, 09 Jun 2006

Climbing.


Today I shinned higher on my new favourite trace over striped holds. But the negative slope does not allow me to complete big parts of the trace without hanging, so I complete a couple of holds and hang, then another couple and hang again. But it looks like each day I increase the distance between hangs; hopefully I will finish that trace completely someday.
I also completed an old complex trace with a different finish, and ran several long traverses and a couple of old traces.
When I got home I was unable even to turn the laptop on... Excellent evening.

/life :: Link / Comments (0)


Alternative TCP/IP stack.


I've started own implementation of TCP/IP stack, which will be used for netchannels.

Stack internal structures and state machines are being designed with size and performance in mind, so it could be implemented in hardware.

/devel/networking :: Link / Comments (0)


World football championship 2006.


There is no football in Russia, since our team has been failing for several years already. But as usual there are a lot of empty ambitions.

/other :: Link / Comments (0)


Thu, 08 Jun 2006

Netchannels.


After some more tuning I was able to bring netchannel CPU usage down, and it is now frequently lower than the socket one.
Netchannel performance is always better than the socket one, and sometimes exceeds 84 MB/sec.
Although 1-2 MB/sec is not that big a difference, I expect it is the maximum that can be achieved using the existing socket code with the given network parameters.

[Graph: 1Gbit netchannel vs. socket benchmark.]

Interesting note, that netchannel copy_to_user() setup outperforms memcpy() setup.

/devel/networking :: Link / Comments (0)


Netchannels.


After some netchannel code tuning I was able to achieve new results.
Performance graph included.

[Graph: 1Gbit netchannel vs. socket benchmark.]

The maximum netchannel result exceeds 83.9 MB/sec, while CPU usage is 2-3% higher, which definitely has the roots described in the previous analysis.

There are two types of netchannel runs: with the copy_to_user() setup and with the memcpy() setup (the former is the /tmp/netchannel_tcp_netchannel.out graph and the latter is the /tmp/netchannel_tcp_netchannel_1.out graph). As you can see, the copy_to_user() setup behaves slightly worse than the memcpy() one, which is not surprising for a 1500-byte MTU.

New patch and userspace utilities are available in archive.

/devel/networking :: Link / Comments (0)


Wed, 07 Jun 2006

Climbing.


I've found a new favourite trace, the most complex one I have ever tried.
It runs on a negative slope with several relief and passive holds. Naturally I failed to complete it on-sight, but the trace itself is really exciting.
Training finished with the usual exercises on the campus board, a shower and an excellent mood.

/life :: Link / Comments (0)


Netchannels.


I've updated the performance test on the netchannels homepage.
As you can see, netchannel performs slightly better than socket with the same socket options, but netchannel CPU usage is higher too. Various oprofile tests show that there are no netchannel-specific calls at the top of the profiles, and the CPU usage difference is only about 5%, so I think the difference comes from accounting: in the netchannel case both the initial TCP flags processing code (sk->sk_backlog_rcv() aka tcp_v4_do_rcv()) and the receive processing code (tcp_recvmsg()) run in process context and are both counted in CPU usage, while in the socket case only the receive processing code (tcp_recvmsg()) is run there.

After some socket options tuning I was able to run both netchannel and socket code at about 82-83 MB/sec, but netchannels more frequently showed higher performance. Netchannel CPU usage is still higher than socket one due to above reason.

/devel/networking :: Link / Comments (0)


Mon, 05 Jun 2006

Climbing.


It was a hard training with the Skala-city haunters.
It started with an hour of various traverses, then several easy climbs, and I finished the day with a quite complex trace on a negative slope and a couple of similar vertical ones.
Then the usual exercises on the campus board, and I felt completely tired.
Good.

/life :: Link / Comments (0)


Sat, 03 Jun 2006

OpenBSD hackathon.


Very interesting articles on Kerneltrap about this year's OpenBSD hackathon. There are a lot of exciting things the hackers are going to implement.

There is one interesting thing: almost everyone says that he is tired of Linux and Linux was bad and Linux...

/other :: Link / Comments (0)


Does Linux suck?


It's hard to believe, but I've got a DVD which Linux always fails to read, but which can be easily read in OpenBSD. Here is the dmesg:

hdc: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
hdc: media error (bad sector): error=0x34 { AbortedCommand LastFailedSense=0x03 }
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 410420
Buffer I/O error on device hdc, logical block 102605
hdc: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
hdc: media error (bad sector): error=0x34 { AbortedCommand LastFailedSense=0x03 }
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 410424
But Linux can be booted on a 1024-way SMP system and on NUMA, and supports tons of hardware I never saw and other unknown cruft, while a usual DVD-ROM fails to read some disks which are properly readable on another system.
Crap.

The more I work with Linux, the more I think it is overbloated.
So I will go and hack yet another overbloating unneeded feature.

/other :: Link / Comments (0)


Fri, 02 Jun 2006

Climbing.


Today's hard training contained several simple long traverses, a relief traverse (3 runs in each direction without rest), and a new one I created - using holds only on the first shield for both legs and arms. In Skala-city that shield is about 1 meter high, so to complete the traverse you must not go outside that height. I found that it was the hardest trace I have done for quite a long time - especially for the legs.

/life :: Link / Comments (0)


Netchannel 1Gbit benchmark.


The benchmark is performed over a 1Gbit link for one TCP netchannel which uses memcpy() into a preallocated area, which could be mapped from userspace.

[Graph: 1Gbit netchannel vs. socket benchmark.]

As you can see, netchannels outperform sockets, but their CPU usage is higher too.
It is possible that this is the price, i.e. the socket code would increase its CPU usage too if it could increase its processing speed.

The implementation is still fairly ugly. Some changes are required in the generic TCP state machine processing logic to clean things up, so that processing would be performed not on top of sockets, but using skbs from queues with appropriate parameters (timeout, flags) provided either as a new structure (so it would be embedded into struct sock and netchannel) or as function parameters.

Netchannels currently use two queue dereferences to work with the socket's queue processing:

  • from netchannel's queue which is filled in interrupt
  • from socket's queue which is filled in process context
which is a source of some speed problems too.

It still requires some thinking...

/devel/networking :: Link / Comments (0)


New toy.


My new BOSCH PST 650.

[Photo: PST650.]

Soon, very soon I will start new development process...

/life :: Link / Comments (0)


Thu, 01 Jun 2006

Theatre.


I've visited the "Story of seven hanged" drama in the Tabakov theatre with Tanya Z.
A very interesting and hard story about the last hours of life of some very different people.
I want to thank Tanya and Abr for that great time.

/life :: Link / Comments (0)


Hash comparison, take 2.


I was involved in a linux-kernel@ discussion (did I say why I do not like the linux-kernel@ mail list? Because of all these non-technical flames) about the fairness of the Jenkins hash in my previous tests. There is an opinion that the following folding in my tests after jhash_2words():

	h = jhash_2words(faddr, laddr, ports);
	h ^= h >> 16;
	h ^= h >> 8;
leads to completely unfair hash distribution.
Well, I've created a new test with and without the above shifts and XORs and compared it with the XOR hash used in the TCP socket code.
Here is the result.

[Graph: hash_comparison_2.]

So you can see exactly the same distribution for the folded and non-folded Jenkins hash, and its artifacts compared to the XOR hash. Running the tests for 2^30 values confirms my results.
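
A test of this kind boils down to filling a bucket histogram. The sketch below is not the test program used here: it only shows the XOR side, the function names (`xor_hash`, `hash_histogram`) are hypothetical, and the XOR hash is assumed to resemble the one in the TCP socket code of that time, with the same folding as in the snippet above. The Jenkins side would call jhash_2words() in the same loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR hash over the connection tuple, folded like the jhash result above. */
uint32_t xor_hash(uint32_t faddr, uint32_t laddr, uint16_t fport, uint16_t lport)
{
	uint32_t h = (laddr ^ lport) ^ (faddr ^ fport);

	h ^= h >> 16;
	h ^= h >> 8;
	return h;
}

/*
 * Count how many of the first nports foreign ports land in each bucket.
 * nbuckets must be a power of two so the hash can simply be masked.
 */
void hash_histogram(uint32_t *hist, uint32_t nbuckets,
		    uint32_t faddr, uint32_t laddr, uint16_t lport,
		    uint32_t nports)
{
	assert(hist != NULL);
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);

	for (uint32_t i = 0; i < nbuckets; i++)
		hist[i] = 0;
	for (uint32_t p = 0; p < nports; p++)
		hist[xor_hash(faddr, laddr, (uint16_t)p, lport) & (nbuckets - 1)]++;
}
```

Plotting the resulting array for each hash function gives graphs of the kind compared here; the more even the bucket counts, the fairer the distribution.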

/devel/other :: Link / Comments (0)