Zbr's days.

Tue, 31 Jan 2006

Van Jacobson's LCA2006 slides.



Mon, 30 Jan 2006

Van Jacobson's Net Channels.


Here is a compilation from David Miller's blog about "A modest proposal to help speed up & scale up the linux networking stack" by Van Jacobson and Bob Felderman.

Underlying all of this talk was the basic Internet design precept of pushing work to the end nodes. This is the only way to design systems that truly scale. The middle is implemented as simply as possible, they just push packets around, and the real compute and "work" on the packet is done at the end hosts. The larger the network gets, the more compute power you have at the end nodes, and thus the system scales up.

With SMP systems this "end host" concept really should be extended to the computing entities within the system, that being cpus and threads within the box.

The system must stop doing so much work in interrupt (both hard and soft) context. Jamal Hadi Salim and others understood this quite well, and NAPI is a direct consequence of that understanding. But what Van is trying to show in his presentation is that you can take this further, in fact a _lot_ further.

A Van Jacobson channel is a path for network packets. It is implemented as an array'd queue of packets. There is state for the producer and the consumer, and it all sits in different cache lines so that it is never the case that both the consumer and producer write to shared cache lines. Network cards want to know purely about packets, yet for years we've been enforcing an OS determined model and abstraction for network packets upon the drivers for such cards. This has come in the form of "mbufs" in BSD and "SKBs" under Linux, but the channels are designed so that this is totally unnecessary. Drivers no longer need to know about what the OS packet buffers look like, channels just contain pointers to packet data.
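The queue described above can be sketched as a single-producer/single-consumer ring. This is only an illustrative model, not code from the slides: the names `Channel`, `put` and `get` are made up here, and Python cannot express cache line placement, so the separation is only indicated by which side writes which index.

```python
class Channel:
    """Toy single-producer/single-consumer ring of packet references."""

    def __init__(self, size):
        # One slot always stays empty so that head == tail means "empty".
        self.slots = [None] * size
        self.size = size
        self.head = 0  # consumer state: written only by the consumer
        self.tail = 0  # producer state: written only by the producer

    def put(self, packet):
        """Producer side (e.g. the driver). Returns False if the ring is full."""
        nxt = (self.tail + 1) % self.size
        if nxt == self.head:
            return False
        self.slots[self.tail] = packet
        self.tail = nxt
        return True

    def get(self):
        """Consumer side (e.g. the stack). Returns None if the ring is empty."""
        if self.head == self.tail:
            return None
        packet = self.slots[self.head]
        self.head = (self.head + 1) % self.size
        return packet
```

The producer only ever writes `tail` and the slots, the consumer only ever writes `head`, which is the property that lets a real implementation keep the two pieces of state in different cache lines.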

At the first step we just have one channel, that goes to a generic routine in the generic network device code that attaches the packet to an OS network packet data structure and passes it into the normal input path. So the driver interrupt handler just puts packets into the channel, and the software interrupt sucks them out and passes them into the stack. At this first step, drivers no longer need to know about OS packet buffer data structures. Van stated that a channel'ized e1000 driver gets 200 lines of code removed from the fast paths, a non-trivial feat.

The next step is to build channels to sockets. We need some intelligence in order to map packets to channels, and this comes in the form of a tiny packet classifier the drivers use on input. It reads the protocol, ports, and addresses to determine the flow ID and uses this to find a channel. If no matching flow is found, we fall back to the basic channel we created in the first step. As sockets are created, channel mappings are installed and thus the driver classifier can find them later. The socket wakes up, and does protocol input processing and copying into userspace directly out of the channel.
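A minimal sketch of such a classifier, assuming the flow ID is simply the tuple of protocol, addresses and ports. The names `Packet`, `Classifier`, `install` and `deliver` are hypothetical, and plain lists stand in for channels.

```python
from collections import namedtuple

# Hypothetical packet record; a real driver would read these fields from headers.
Packet = namedtuple("Packet", "proto src_addr src_port dst_addr dst_port payload")


class Classifier:
    """Maps a packet's flow ID to a channel, with a default fallback."""

    def __init__(self, default_channel):
        self.default_channel = default_channel
        self.flows = {}

    @staticmethod
    def flow_id(pkt):
        return (pkt.proto, pkt.src_addr, pkt.src_port, pkt.dst_addr, pkt.dst_port)

    def install(self, flow, channel):
        """Called when a socket is created: remember its channel."""
        self.flows[flow] = channel

    def deliver(self, pkt):
        """Put the packet into its flow's channel, or into the default one."""
        channel = self.flows.get(self.flow_id(pkt), self.default_channel)
        channel.append(pkt)
        return channel
```

Packets of unknown flows land in the default channel and go through the normal input path, so installing a mapping is purely an optimization.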

And in the next step you can have the socket ask for a channel ID (with a getsockopt or something like that), have it mmap() a receive ring buffer into user space, and the mapped channel just tosses the packet data into that mmap()'d area and wakes up the process. The process has a mini TCP receive engine in user space.



David Miller's keynote at linux.conf.au.


Linux TCP Developments & Kernel Developer Social Interactions:



Thu, 26 Jan 2006

Network asynchronous IO.


Design notes.
Network AIO is based on kevent and works as a usual kevent storage on top of an inode. When a new socket is created, it is associated with that inode, and when some activity is detected, the appropriate notifications are generated and kevent_naio_callback() is called.
When a new kevent is registered, the network AIO ->enqueue() callback simply marks itself as a usual socket event watcher. It also locks the physical userspace pages in memory and stores the appropriate pointers in the private kevent structure.
The network AIO callback gets pointers to the userspace pages and tries to copy data from the receiving skb queue into them using a protocol specific callback. This callback is very similar to ->recvmsg(); it is actually the same code but without MSG_* flags processing, so they could share a lot in the future.
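The copy step can be modelled roughly as follows. This is a userspace toy, not the kernel code: a `bytearray` stands in for the pinned user pages, a `deque` of byte strings stands in for the receiving skb queue, and the names `NaioRequest` and `naio_callback` are invented for the example.

```python
from collections import deque


class NaioRequest:
    """Stand-in for a registered request with pinned destination pages."""

    def __init__(self, size):
        self.buffer = bytearray(size)
        self.filled = 0

    @property
    def done(self):
        return self.filled == len(self.buffer)


def naio_callback(request, receive_queue):
    """Copy queued data into the request buffer; return the bytes copied."""
    copied = 0
    while receive_queue and not request.done:
        chunk = receive_queue[0]
        room = len(request.buffer) - request.filled
        take = chunk[:room]
        request.buffer[request.filled:request.filled + len(take)] = take
        request.filled += len(take)
        copied += len(take)
        if len(take) == len(chunk):
            receive_queue.popleft()          # chunk fully consumed
        else:
            receive_queue[0] = chunk[len(take):]  # keep the unread tail queued
    return copied
```

The callback runs whenever new data is queued and stops as soon as the buffer is full, leaving the remaining data in the queue for the next request.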



Wed, 25 Jan 2006

Climbed a lot today.


It was power training, so most of the time I tried different traverses and boulder problems. As expected, at the end I was completely unable to finish any somewhat complex route, which definitely means that the training was good.



Tue, 24 Jan 2006

"Election Day" performance.


I saw an excellent, fun performance today: "Election Day" at the "Kvartet I" theater. It is an amusing comedy about modern Russia. The plot revolves around a governor's election, in which the creative group of the radio station "Kak bi radio" must promote a completely stupid candidate. It is a sequel to the "Radio Day" performance, which depicts a day at a modern radio station.



linux.conf.au presentations.


I wish I could be there. There are at least 4 presentations I would like to listen to or read:

Looking forward to those publications.



Grange has a birthday!


Yeaah, my congratulations!



Mon, 23 Jan 2006

Updated W1 project.


Fixed compilation warnings in w1_test.c test module.



Sat, 21 Jan 2006

FreeBSD kqueue benchmark.


FreeBSD is a good system, but either its network stack behaves worse than Linux's or, more likely, I have not configured it right: under load the FreeBSD kqueue based server behaves noticeably worse than the Linux kevent one.

This set of tests was designed to compare FreeBSD kqueue latency with Linux's epoll() and kevent subsystems.

Hardware and software.
Server: Xeon 2.4 GHz, HT disabled, 1 GB RAM, 1 Gbit Intel e1000 adapter. FreeBSD 6.0-RELEASE. No firewall.
Client: AMD64 3500+ 2.2 GHz, 1 Gbit RealTek 8169 adapter. Linux FC4. httperf-0.8.
D-Link DGS-1216T gigabit switch.

Test description.
evserver_kqueue.c is a simple kqueue based static web server, which uses read()/send() for data transfer on FreeBSD, since its sendfile() interface is horrible. It runs with root privileges to eliminate the various per-user limits in *nix systems.

First test.
Run 10k connections one by one:

httperf --client=0/1 --server=freebsd --port=80 --uri=/ --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
Maximum connect burst length: 1

Total: connections 10000 requests 10000 replies 10000 test-duration 9.667 s

Connection rate: 1034.4 conn/s (1.0 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 0.3 avg 1.0 max 220.6 median 0.5 stddev 3.8
Connection time [ms]: connect 0.1
Connection length [replies/conn]: 1.000

Request rate: 1034.4 req/s (1.0 ms/req)
Request size [B]: 64.0

Reply rate [replies/s]: min 1013.9 avg 1013.9 max 1013.9 stddev 0.0 (1 samples)
Reply time [ms]: response 0.4 transfer 0.4
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=10000 5xx=0

CPU time [s]: user 2.30 system 7.06 (user 23.8% system 73.1% total 96.9%)
Net I/O: 4235.6 KB/s (34.7*10^6 bps)

Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

The request rate is 1034 requests per second, whereas Linux on the same hardware (it had only 512 MB of RAM, but that does not matter) with the same server software (the kevent based server used read()/send() syscalls) showed 1700 requests per second. The strange thing is that even Apache on FC4 runs faster.


Second test.
Run several simultaneous connections with different burst size.

1. 10k connections with maximum 1k connections in a burst with 1 second timeout.
httperf --timeout=1 --client=0/1 --server=freebsd --port=80 --uri=/ --rate=1000 --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
Maximum connect burst length: 62

Total: connections 10000 requests 10000 replies 9733 test-duration 10.000 s

Connection rate: 1000.0 conn/s (1.0 ms/conn, <=64 concurrent connections)
Connection time [ms]: min 0.1 avg 1.0 max 15.7 median 0.5 stddev 0.9
Connection time [ms]: connect 0.2
Connection length [replies/conn]: 1.000

Request rate: 1000.0 req/s (1.0 ms/req)
Request size [B]: 64.0

Reply rate [replies/s]: min 970.3 avg 970.3 max 970.3 stddev 0.0 (1 samples)
Reply time [ms]: response 0.4 transfer 0.4
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=9733 5xx=0

CPU time [s]: user 2.46 system 7.16 (user 24.6% system 71.6% total 96.2%)
Net I/O: 3987.1 KB/s (32.7*10^6 bps)

Errors: total 267 client-timo 0 socket-timo 0 connrefused 0 connreset 267
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

As we see here, the request rate is higher for FreeBSD (1000 vs 910), but the number of errors is higher too.

2. 20k connections with maximum 2k connections in a burst with 1 second timeout.
httperf --timeout=1 --client=0/1 --server=freebsd --port=80 --uri=/ --rate=2000 --send-buffer=4096 --recv-buffer=16384 --num-conns=20000 --num-calls=1
Maximum connect burst length: 4

Total: connections 20000 requests 20000 replies 14371 test-duration 10.000 s

Connection rate: 2000.0 conn/s (0.5 ms/conn, <=6 concurrent connections)
Connection time [ms]: min 0.2 avg 1.0 max 2.5 median 0.5 stddev 0.1
Connection time [ms]: connect 0.2
Connection length [replies/conn]: 1.000

Request rate: 2000.0 req/s (0.5 ms/req)
Request size [B]: 64.0

Reply rate [replies/s]: min 1534.9 avg 1534.9 max 1534.9 stddev 0.0 (1 samples)
Reply time [ms]: response 0.4 transfer 0.4
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=14371 5xx=0

CPU time [s]: user 2.32 system 7.68 (user 23.2% system 76.8% total 100.0%)
Net I/O: 5919.8 KB/s (48.5*10^6 bps)

Errors: total 5629 client-timo 0 socket-timo 0 connrefused 0 connreset 5629
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

A very good request rate for FreeBSD, but a very large number of "connection reset" errors.

3. 30k connections with maximum 3k connections in a burst with 1 second timeout.
httperf --timeout=1 --client=0/1 --server=freebsd --port=80 --uri=/ --rate=3000 --send-buffer=4096 --recv-buffer=16384 --num-conns=30000 --num-calls=1
Maximum connect burst length: 285

Total: connections 30000 requests 29964 replies 18608 test-duration 10.000 s

Connection rate: 2999.9 conn/s (0.3 ms/conn, <=329 concurrent connections)
Connection time [ms]: min 0.4 avg 1.5 max 105.2 median 1.5 stddev 0.9
Connection time [ms]: connect 0.4
Connection length [replies/conn]: 1.000

Request rate: 2996.3 req/s (0.3 ms/req)
Request size [B]: 64.0

Reply rate [replies/s]: min 1709.9 avg 1860.6 max 2011.3 stddev 213.2 (2 samples)
Reply time [ms]: response 0.6 transfer 0.6
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=18608 5xx=0

CPU time [s]: user 1.53 system 8.38 (user 15.3% system 83.8% total 99.1%)
Net I/O: 7690.1 KB/s (63.0*10^6 bps)

Errors: total 11392 client-timo 40 socket-timo 0 connrefused 0 connreset 11352
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

An excellent request rate for FreeBSD (3000 vs. 2500 on Linux with sendfile()), but an enormous number of errors: only 62% of connection requests were successfully finished.
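The 62% figure follows from the "Total:" line of the httperf output above (18608 replies for 30000 connections). A small helper to recompute it from any such line might look like this; the function name `completion_ratio` is made up for the example.

```python
import re


def completion_ratio(total_line):
    """Return replies / connections from an httperf 'Total:' line."""
    m = re.search(r"connections (\d+) requests (\d+) replies (\d+)", total_line)
    if m is None:
        raise ValueError("not an httperf Total line")
    connections, _requests, replies = map(int, m.groups())
    return replies / connections


line = "Total: connections 30000 requests 29964 replies 18608 test-duration 10.000 s"
print(f"{completion_ratio(line):.0%}")  # prints 62%
```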


Conclusion.
After various sysctls were changed (sysctl -a output is available here), things became slightly better (by the way, a default FreeBSD installation does not allow such tests at all due to its default network parameters), but the number of "connection reset" errors is still very high.
FreeBSD drops too many connections due to either misconfiguration or lack of resources.

Comparing FreeBSD and Linux, the number of connection errors on Linux is much smaller than on FreeBSD at a comparable or higher request rate.



Fri, 20 Jan 2006

Created web pages for receiving zero-copy and kevent subsystems.


Receiving zero-copy project page.
Kevent subsystem project page.

All pages contain a description of the project, its main goals, results and benchmarks.



Weather in Moscow.


It is currently about 11:00 AM in Moscow, and the temperature is 2 degrees higher than in Antarctica, where it is currently summer, though.



Thu, 19 Jan 2006

Kevent benchmarking. Step 5. New record!


Running httperf with 30k connections, a maximum burst size of 3k connections and a 1 sec timeout between bursts against a single-threaded handmade http server (using sendfile() for the file) on Xeon 2.4 GHz, 1 GB RAM, HT enabled, 1 Gbit network, with a modified version of the kevent_socket notification mechanism and userspace daemon.

[s0mbre@uganda httperf-0.8]$ ./httperf --server pcix --num-conns 30000 --rate 3000 --timeout 1
httperf --timeout=1 --client=0/1 --server=pcix --port=80 --uri=/ --rate=3000 --send-buffer=4096 --recv-buffer=16384 --num-conns=30000 --num-calls=1
Maximum connect burst length: 101

Total: connections 29197 requests 27739 replies 27101 test-duration 11.000 s

Connection rate: 2654.4 conn/s (0.4 ms/conn, <=1022 concurrent connections)
Connection time [ms]: min 5.1 avg 62.9 max 989.7 median 44.5 stddev 91.4
Connection time [ms]: connect 0.4
Connection length [replies/conn]: 1.000

Request rate: 2521.8 req/s (0.4 ms/req)
Request size [B]: 55.0

Reply rate [replies/s]: min 2539.9 avg 2701.8 max 2863.7 stddev 228.9 (2 samples)
Reply time [ms]: response 62.5 transfer 0.0
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=27101 5xx=0

CPU time [s]: user 0.22 system 10.32 (user 2.0% system 93.8% total 95.8%)
Net I/O: 10070.1 KB/s (82.5*10^6 bps)

Errors: total 2899 client-timo 2096 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 803 addrunavail 0 ftab-full 0 other 0

Both epoll and kevent_poll show about 1600-1700 requests per second in this setup.

As usual, the patch is available in the archive.



Did you hear that Russians can only drink vodka, herd bears and play the balalaika?


It is definitely true and it is time for that now - colder than -30 degrees Celsius in Moscow and even colder in Siberia and the Urals...
My brain is frozen.



Wed, 18 Jan 2006

Kevent benchmarking. Step 4.


Kevent vs. epoll.
To find the difference between kevent_poll and epoll, I've modified httperf to run 30k connections with a maximum burst size of 3k connections and a 1 sec timeout between bursts against a single-threaded handmade http server (using sendfile() for the file) on Xeon 2.4 GHz, 512 MB RAM, HT enabled, 1 Gbit network.
Here are the results:

$ ./httperf --server pcix --num-conns 30000 --rate 3000 --timeout 1
kevent_poll               : Request rate: 1718.8 req/s (0.6 ms/req)
kevent_poll (Jenkins hash): Request rate: 1749.9 req/s (0.6 ms/req)
      epoll               : Request rate: 1746.8 req/s (0.6 ms/req)
A subsequent run of any server always shows a temporary performance degradation, whose magnitude depends on the time elapsed since the previous run.

If we take into account that kevent_poll transfers 3.3 times more information to userspace (extended flags, the user's id, an additional 64 bits of data which can be used as hints, and so on), then kevent_poll is not so bad.
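The post does not say which of Bob Jenkins's hash functions the "Jenkins hash" row refers to, nor what exactly is hashed. Purely as an illustration of the idea, here is the well-known one-at-a-time variant used to pick a hash table bucket; the `bucket` helper and the choice of hashing a 4-byte descriptor number are assumptions, and the Linux kernel's own jhash is a different function from the same author.

```python
def one_at_a_time(key: bytes) -> int:
    """Bob Jenkins's one-at-a-time hash, truncated to 32 bits."""
    h = 0
    for byte in key:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


def bucket(fd: int, table_size: int) -> int:
    """Hypothetical helper: map a descriptor number to a hash table bucket."""
    return one_at_a_time(fd.to_bytes(4, "little")) % table_size
```

A mixing hash like this spreads consecutive descriptor numbers over the table instead of clustering them, which is the usual reason to prefer it over a plain modulo.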



Tue, 17 Jan 2006

Kevent benchmarking. Step 3.


I've implemented KEVENT_SOCKET_ACCEPT and KEVENT_SOCKET_RECV notifications, which is the first step towards fully asynchronous networking.
Here are the results.

Using 10k requests with a maximum of 1k requests per burst and a 1 second timeout between bursts.

Static index.html on a kevent based single-threaded handmade http server (using sendfile() for the file) on Xeon 2.4 GHz, 512 MB RAM, HT enabled, 1 Gbit network:

[s0mbre@uganda httperf-0.8]$ ./httperf --server pcix --num-conns 10000 --rate 1000 --timeout 1
httperf --timeout=1 --client=0/1 --server=pcix --port=80 --uri=/ --rate=1000 --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
Maximum connect burst length: 30

Total: connections 10000 requests 9947 replies 9816 test-duration 11.000 s

Connection rate: 909.1 conn/s (1.1 ms/conn, <=164 concurrent connections)
Connection time [ms]: min 0.3 avg 104.1 max 139.6 median 129.5 stddev 38.6
Connection time [ms]: connect 0.1
Connection length [replies/conn]: 1.000

Request rate: 904.3 req/s (1.1 ms/req)
Request size [B]: 55.0

Reply rate [replies/s]: min 966.8 avg 981.6 max 996.5 stddev 21.0 (2 samples)
Reply time [ms]: response 103.9 transfer 0.0
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=9816 5xx=0

CPU time [s]: user 0.35 system 10.30 (user 3.2% system 93.6% total 96.8%)
Net I/O: 3646.9 KB/s (29.9*10^6 bps)

Errors: total 184 client-timo 184 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

This means an almost complete absence of latency in accept() and the corresponding client processing, compared with the step 2 results and Apache2.
On the downside, I have to admit that only the first test runs at such a high request rate; subsequent rates are lower, down to 500 req/s, but recreating the main socket (and with it a new kevent_user structure, which holds the kevent hash table) brings things back to more than 900 req/s. I suspect a problem with hash unfairness and socket lifetime, since a socket can live on even after it has been closed by the user.
The patch and the new evserver are available in the archive.



Mon, 16 Jan 2006

Updated kevent patch.


The patch and a dumb http server, which uses kevent poll()/select() notifications, are available in the archive.



Kevent benchmarking. Step 1.


Groovy!

Static index.html on a kevent based single-threaded handmade http server (using read()/send() for the file) on Xeon 2.4 GHz, 512 MB RAM, HT enabled, 1 Gbit network:

httperf --client=0/1 --server=pcix --port=80 --uri=/ --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
Maximum connect burst length: 1

Total: connections 10000 requests 10000 replies 10000 test-duration 5.684 s

Connection rate: 1759.4 conn/s (0.6 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 0.2 avg 0.6 max 204.5 median 0.5 stddev 3.5
Connection time [ms]: connect 0.1
Connection length [replies/conn]: 1.000

Request rate: 1759.4 req/s (0.6 ms/req)
Request size [B]: 55.0

Reply rate [replies/s]: min 1810.8 avg 1810.8 max 1810.8 stddev 0.0 (1 samples)
Reply time [ms]: response 0.2 transfer 0.2
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=10000 5xx=0

CPU time [s]: user 1.46 system 4.23 (user 25.6% system 74.4% total 100.0%)
Net I/O: 7188.6 KB/s (58.9*10^6 bps)

Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

Using sendfile():
Connection rate: 2586.5 conn/s (0.4 ms/conn, <=1 concurrent connections)

The same index.html on Apache/2.0.54 from FC4 (default config) on P4 3.00 GHz, 512 MB RAM, HT enabled, 1 Gbit network.
[s0mbre@uganda httperf-0.8]$ ./httperf --server kano --num-conns 10000
httperf --client=0/1 --server=kano --port=80 --uri=/ --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
Maximum connect burst length: 1

Total: connections 10000 requests 10000 replies 10000 test-duration 7.947 s

Connection rate: 1258.4 conn/s (0.8 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 0.2 avg 0.8 max 561.2 median 0.5 stddev 5.6
Connection time [ms]: connect 0.2
Connection length [replies/conn]: 1.000

Request rate: 1258.4 req/s (0.8 ms/req)
Request size [B]: 55.0

Reply rate [replies/s]: min 1348.2 avg 1348.2 max 1348.2 stddev 0.0 (1 samples)
Reply time [ms]: response 0.6 transfer 0.0
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=10000 5xx=0

CPU time [s]: user 1.92 system 5.90 (user 24.2% system 74.3% total 98.4%)
Net I/O: 5141.7 KB/s (42.1*10^6 bps)

Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0



Kevent benchmarking. Step 2.


Using 10k requests with a maximum of 1k requests per burst and a 1 second timeout between bursts.

Static index.html on a kevent based single-threaded handmade http server (using sendfile() for the file) on Xeon 2.4 GHz, 512 MB RAM, HT enabled, 1 Gbit network:

[s0mbre@uganda httperf-0.8]$ ./httperf --server pcix --num-conns 10000 --rate 1000 --timeout 1
httperf --timeout=1 --client=0/1 --server=pcix --port=80 --uri=/ --rate=1000 --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
Maximum connect burst length: 207

Total: connections 10000 requests 8742 replies 8519 test-duration 11.001 s

Connection rate: 909.0 conn/s (1.1 ms/conn, <=453 concurrent connections)
Connection time [ms]: min 96.8 avg 149.0 max 854.3 median 141.5 stddev 36.5
Connection time [ms]: connect 0.2
Connection length [replies/conn]: 1.000

Request rate: 794.7 req/s (1.3 ms/req)
Request size [B]: 55.0

Reply rate [replies/s]: min 812.0 avg 851.8 max 891.6 stddev 56.3 (2 samples)
Reply time [ms]: response 148.8 transfer 0.0
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=8519 5xx=0

CPU time [s]: user 0.19 system 10.40 (user 1.7% system 94.5% total 96.3%)
Net I/O: 3165.3 KB/s (25.9*10^6 bps)

Errors: total 1481 client-timo 1481 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

The same index.html on Apache/2.0.54 from FC4 (default config) on P4 3.00 GHz, 512 MB RAM, HT enabled, 1 Gbit network.
[s0mbre@uganda httperf-0.8]$ ./httperf --server kano --num-conns 10000 --rate 1000 --timeout 1
httperf --timeout=1 --client=0/1 --server=kano --port=80 --uri=/ --rate=1000 --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=1
Maximum connect burst length: 21

Total: connections 10000 requests 4731 replies 4731 test-duration 10.976 s

Connection rate: 911.1 conn/s (1.1 ms/conn, <=813 concurrent connections)
Connection time [ms]: min 0.2 avg 1.2 max 20.9 median 0.5 stddev 1.5
Connection time [ms]: connect 0.1
Connection length [replies/conn]: 1.000

Request rate: 431.0 req/s (2.3 ms/req)
Request size [B]: 55.0

Reply rate [replies/s]: min 401.6 avg 472.9 max 544.3 stddev 100.9 (2 samples)
Reply time [ms]: response 1.1 transfer 0.0
Reply size [B]: header 198.0 content 3931.0 footer 0.0 (total 4129.0)
Reply status: 1xx=0 2xx=0 3xx=0 4xx=4731 5xx=0

CPU time [s]: user 0.20 system 10.55 (user 1.9% system 96.1% total 98.0%)
Net I/O: 1761.2 KB/s (14.4*10^6 bps)

Errors: total 5269 client-timo 5269 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0



Climbed today with Grange.


It was definitely a good time - I finished several complex old and new routes, although not everything was done right. It was excellent training.



Grange started w1 project.


He even has the first results:

gpioow1 at gpio1 pins 21: DQ[21] open-drain pull-up
onewire1 at gpioow1
onewire1: found ROM 0xf00008005343fb10
It took him two years after we bought the first couple of ds18b20 thermal sensors to start this project :)



Sun, 15 Jan 2006

Kevent hacking.


I've finished the poll()/select() kevent notification subsystem and started testing it. There is currently one problem. When a kevent has been reported as ready and userspace has caught it, the next ioctl(KEVENT_USER_WAIT) on that user sleeps until at least one kevent in the queue becomes ready again. However, an event that was previously marked as ready may already be ready again by the time ioctl(KEVENT_USER_WAIT) is called, and we cannot detect that until some activity is found in the corresponding kevent_storage. It is easily illustrated with sockets: a newly accepted socket is queued for the KEVENT_POLL_POLLIN event, data arrives, and ioctl(KEVENT_USER_WAIT) returns with that event. If the socket is closed after the recv() syscall but before ioctl(KEVENT_USER_WAIT) is called again, that activity will not be detected by the kevent_poll() subsystem, because ioctl(KEVENT_USER_WAIT) itself does not call the ->poll() method, which would return the appropriate event; instead it waits until some timeout expires (about 200 msecs in the socket code).
In the case of poll()/epoll(), the equivalent of ioctl(KEVENT_USER_WAIT) turns into callbacks that end up calling ->poll() for the given set of interests, so the new state is detected at the time of the call.
I need to extend the API to provide some kind of requeueing of events returned by ioctl(KEVENT_USER_WAIT), so that the next ioctl(KEVENT_USER_WAIT) returns immediately if the same event is ready.
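The problem and the proposed requeueing can be shown with a toy model. Everything here is a simplification under stated assumptions: `Source.ready` stands for what ->poll() would report, `wait()` is a non-blocking stand-in for ioctl(KEVENT_USER_WAIT), and the `requeue_check` switch is a hypothetical name for the API extension.

```python
class Source:
    """Stand-in for a socket; 'ready' is what ->poll() would report."""

    def __init__(self):
        self.ready = False


class KeventQueue:
    def __init__(self, requeue_check):
        self.requeue_check = requeue_check
        self.ready = []     # events that fired and were not yet returned
        self.returned = []  # events handed to the user by the previous wait()

    def notify(self, source):
        """Called when activity is noticed in the source's storage."""
        if source not in self.ready:
            self.ready.append(source)

    def wait(self):
        """Return the ready events (non-blocking model of the wait call)."""
        if self.requeue_check:
            # Re-poll what was returned last time, so a condition that is
            # already true again is reported without new activity.
            for source in self.returned:
                if source.ready and source not in self.ready:
                    self.ready.append(source)
        self.returned, self.ready = self.ready, []
        return list(self.returned)
```

Without the re-check, a source that is ready again but produced no new notification is missed by the second `wait()`; with it, the second call returns the source immediately.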



Sat, 14 Jan 2006

Kevent hacking.


I've added generic poll()/select() kevent notifications but have not stress tested them yet, as was done for inode notifications.
Kevent currently lacks support for removing kevents: since all kevents are linked into a single list, removal would take O(N), where N is the number of all kevents, which is unacceptable. So I need to add both modification and removal support to kevent before stress testing poll()/select() with, guess what?
Right, an http server. Every network kernel hacker must write at least one web server in his life.
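One common way to avoid the O(N) scan is to index kevents by a key in a hash table. The sketch below only illustrates that idea with a Python dict; the class name `KeventIndex` and the choice of key are assumptions, not the actual patch.

```python
class KeventIndex:
    """Registered kevents keyed by a caller-chosen id (hypothetical)."""

    def __init__(self):
        self._by_id = {}

    def add(self, kevent_id, interest):
        self._by_id[kevent_id] = interest

    def modify(self, kevent_id, interest):
        # Average O(1): a hash lookup instead of walking every kevent.
        if kevent_id not in self._by_id:
            return False
        self._by_id[kevent_id] = interest
        return True

    def remove(self, kevent_id):
        # Average O(1) as well; returns the removed interest or None.
        return self._by_id.pop(kevent_id, None)

    def __len__(self):
        return len(self._by_id)
```

With a single linked list, both operations would first have to find the entry by walking up to N elements.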



Fri, 13 Jan 2006

ABR, Happy Birthday!



Kevent hacking.


I've tried to add generic poll()/select() support to the kevent subsystem; the work is not finished yet, but the majority of issues are resolved. The main inconvenience is that the user can call poll_wait() for different kinds of return events and thus add different polling wait queue entries to different wait queues while requesting only one kevent for all events. This requires allocating a new structure with a kevent pointer each time the user calls poll_wait() and storing this structure somewhere in the kevent. Fortunately this can only happen in the ->poll() callback, which is called in process context and only when a new kevent is being queued.



Climbed a little today.


After such a long break, about a month since the previous short period of climbing, I've started again. I only finished several old traverses, which are quite simple, but I already feel that break in every piece of my body, fortunately without any fingers feeling broken. It was definitely a good time.



Thu, 12 Jan 2006

Kevent hacking.


Design notes.
Each kevent is now queued into three lists:

  • kevent_user->kevent_list - list of all registered kevents.
  • kevent_user->ready_list - list of ready kevents.
  • kevent_storage->list - list of all interests for given kevent_storage.

When a kevent is queued into a storage, it lives there until removed by kevent_dequeue(). When some activity is noticed in a given storage, the storage's kevent_storage->list is scanned for kevents that match the activity event. If such kevents are found and they are not already in kevent_user->ready_list, they are added to its end.
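The three lists and the scan can be modelled in a few lines. This is a sketch of the described behaviour only: the attribute names follow the post, while the bitmask matching and the `enqueue`, `dequeue` and `notify` methods are assumptions made for the example.

```python
class Kevent:
    def __init__(self, event_mask):
        self.event_mask = event_mask  # events this kevent is interested in


class KeventUser:
    def __init__(self):
        self.kevent_list = []  # all registered kevents
        self.ready_list = []   # ready kevents


class KeventStorage:
    def __init__(self):
        self.list = []  # all interests for this storage: (kevent, user) pairs

    def enqueue(self, kevent, user):
        user.kevent_list.append(kevent)
        self.list.append((kevent, user))

    def dequeue(self, kevent, user):
        self.list.remove((kevent, user))
        user.kevent_list.remove(kevent)
        if kevent in user.ready_list:
            user.ready_list.remove(kevent)

    def notify(self, activity):
        """Scan interests; append matching kevents to the ready list once."""
        for kevent, user in self.list:
            if kevent.event_mask & activity and kevent not in user.ready_list:
                user.ready_list.append(kevent)
```

Note that a kevent stays in the storage list after it fires; only `dequeue()` takes it out, and repeated activity does not add a second ready entry.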

Added a ->poll() method, which is quite trivial.
The patch is available in the archive. I'm also thinking about socket and generic poll()/select() notifications.



Wed, 11 Jan 2006

Kevent hacking. Notifications of inode events.


Here is evtest.c output after for i in `seq 1 10`; do touch /tmp/test/$i; done:

1136980589.00958284: Wait: num=1, ctl->num=1: diff=1857538 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00959683: Wait: num=1, ctl->num=1: diff=1351 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00961091: Wait: num=1, ctl->num=1: diff=1359 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00962455: Wait: num=1, ctl->num=1: diff=1313 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00963818: Wait: num=1, ctl->num=1: diff=1312 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00965233: Wait: num=1, ctl->num=1: diff=1363 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00966606: Wait: num=1, ctl->num=1: diff=1322 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00967983: Wait: num=1, ctl->num=1: diff=1328 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00969384: Wait: num=1, ctl->num=1: diff=1355 usec.
00000000: 00000004.00000000 - 00000000.00000000
1136980589.00970780: Wait: num=1, ctl->num=1: diff=1344 usec.
00000000: 00000004.00000000 - 00000000.00000000
evtest.c was asked to wait for the KEVENT_INODE_CREATE event, which is emitted each time a new file is created.
It is quite different from inotify: it does not send the filename as a parameter, and it does not allocate a new event of the requested type when an inode changes, but scans for events that the user requested earlier. So if a kevent was not marked as one-shot, it is requeued into the list of interests for the given storage (inode, socket, timer, anything) only after the user has read either the requested number of kevents or the timeout has elapsed. It is therefore possible to lose events, and when a kevent that is not one-shot is requeued, it must check whether the same condition occurred while the request was in the user's ready queue and, if so, signal it immediately.

A new version of the kevent subsystem patch is available in the archive.



Tue, 10 Jan 2006

Kevent hacking.


A new patch is available in the archive; it fixes all the mentioned issues and also includes various cleanups. This code has already been running for several hours with 30 timers, with userspace requesting a wakeup when at least 10 timers have fired or the timeout has elapsed.
The next version will support inode and then socket notifications.



Sun, 08 Jan 2006

Japanese food day.


I drank Japanese beer for the first time - an unfiltered "Ryusej" or something like that. It was not bad, but as with sake I expected something different.



Sat, 07 Jan 2006

Merry Christmas!



Kevent hacking.


Hacked a little on the kevent subsystem - found two issues:
1. there is an intentional BUG_ON() in __kqueue_dequeue_kevent() which fires under normal conditions.
2. user's exit is not fully implemented yet.

Both issues are trivial to resolve, but my remote test machine has frozen at BIOS startup. I even know that it wants F2 to be pressed to continue, so I definitely cannot work with it until Monday.

So, have a nice vacation!



Updated CARP module.


Added the carp_conn module, which broadcasts master/backup events through connector.
carp_conn_daemon catches these events and calls the appropriate programs.



Fri, 06 Jan 2006

Kevent hacking.


Here is the initial dmesg:

kevent_enqueue: k=de679400.
__kqueue_enqueue_kevent: into READY or ORIGIN k=de679400, q=dd70f9e4.
kevent_storage_enqueue: into storage k=de679400, st=dd70f9d8.
kevent_user_wait: requested: ready_num=10, timeout=1000.
kevent_timer_callback: k=de679400.
__kqueue_dequeue_kevent: from READY or ORIGIN k=de679400, q=dd70f9e4.
__kqueue_enqueue_kevent: into READY or ORIGIN k=de679400, q=de679514.
__kqueue_dequeue_one: from READY or ORIGIN k=de679400, q=de679514.
kqueue_dequeue_ready: dequeued from READY queue k=de679400.
__kqueue_dequeue_one: from READY or ORIGIN k=00000000, q=de679514.
kqueue_dequeue_ready: dequeued from READY queue k=00000000.
__kqueue_dequeue_one: from READY or ORIGIN k=00000000, q=de679514.
kqueue_dequeue_ready: dequeued from READY queue k=00000000.
It is not very informative, but it looks like timer events are somehow managed by the new kevent subsystem. Its initial draft has a horrible userspace interface, but the main logic has already been implemented. I will probably not work on this problem for the next couple of days, so it is postponed until Monday. The initial patch is available in the archive. It is quite well documented, which is unusual for me...



Thu, 05 Jan 2006

FreeBSD's kqueue.


One big disadvantage of Linux is the absence of an event notification mechanism generic enough for different types of events such as timers, socket changes, AIO completions, inode changes and so on. Connector could be such a mechanism, but netlink itself is not very fast, it depends on the network stack, and it is not always convenient for userspace, for example as a replacement for poll()/select(); it was actually created for a different kind of usage.
So there is no generic implementation.
For files one can use epoll, and that is actually all. Real-time POSIX signals have many disadvantages due to their queueing, and they cannot be used as a generic event notification mechanism, since it is impossible to easily connect an RT signal to an arbitrary event source. FreeBSD's kqueue is a very good design for an event notification subsystem.

I've started Linux implementation of similar functionality.
Initial design notes:
Each possible event source can have an embedded queue for each type of event it can generate (for example, a socket can generate data receive events, data send events, new connection accept events and so on), so that new event requests can be added to it. If one or more users want to know about newly received data, several kevents can be added to the receive event queue, and when a new data packet arrives, the callback for each kevent is invoked. The kevent can then be enqueued into a "ready" queue, which in turn can be checked by users. Such kevents can carry additional information, such as the number of bytes in the socket queue or the new file size.
After a kevent becomes ready it can either be removed (so-called level-triggered events), or it can be left in the event source's queue and gather a new portion of information the next time the event fires (so-called edge-triggered events). A user interested in several events can wait until one or several events are ready or a timeout elapses.
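These design notes can be turned into a small model. It is a sketch under assumptions, not the eventual kernel interface: the names `EventSource`, `add` and `fire` are invented, and the flag is called `remove_when_ready` to describe the behaviour directly rather than take a position on the triggering terminology.

```python
from collections import defaultdict


class Kevent:
    def __init__(self, remove_when_ready):
        self.remove_when_ready = remove_when_ready
        self.info = None  # e.g. bytes in the socket queue or new file size


class EventSource:
    """An event source with one queue of kevents per event type."""

    def __init__(self):
        self.queues = defaultdict(list)

    def add(self, event_type, kevent):
        self.queues[event_type].append(kevent)

    def fire(self, event_type, info, ready_queue):
        """Invoke each kevent for this event type and mark it ready."""
        remaining = []
        for kevent in self.queues[event_type]:
            kevent.info = info  # gather the new portion of information
            if kevent not in ready_queue:
                ready_queue.append(kevent)
            if not kevent.remove_when_ready:
                remaining.append(kevent)  # stays queued for the next firing
        self.queues[event_type] = remaining
```

A kevent that stays in the source queue sees its `info` refreshed on every firing, while a removed one keeps the value from the single time it fired.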

I have been working on this problem for some time already and plan to have something interesting to show tomorrow. The major plan is to implement networking AIO (aio_recv(), aio_accept(), aio_send() and aio_sendfile() are the most important networking AIO events); on top of this event notification mechanism it becomes a feasible task.



Wed, 04 Jan 2006

Still in a hacking land.


I've committed two simple bug fixes for UFS and MMC, which were queued into the -stable tree. I found them while hacking on receiving zero-copy. The first one lives in ufs_quota_write(), where the inode semaphore is not released in the error path; the second one is an atomic unmapping, which must work on a virtual address, not on a page.
The bugs are trivial, as are the fixes.

Working during the holidays is always good - no noise, no annoying conversations in unneeded meetings, I can do everything I want, so I've bought a couple of theatre tickets to Kvartet I's "Election Day".



Tue, 03 Jan 2006

Day of interesting hacking and movie watching.


I finished watching "Master and Margarita", based on Bulgakov's classic, yesterday; it is a really good movie that stays very close to the original.
So today is a "Die Hard" movie day.



Mon, 02 Jan 2006

Happy New Year!


I'm back from Saint-Petersburg, it was a really nice time there with my friends.
