Zbr's days.

Wed, 28 Dec 2005

Acrypto IPsec hacking.


Something has changed in the XFRM engine since the 2.6.14 days, so the acrypto IPsec port does not work now. The bug was found in the esp4 output callback, where the parent skb was not updated with the right auth length, so remote tcpdump showed something like this:

IP truncated-ip - 12 bytes missing! 192.168.4.78 > 192.168.4.79: ESP(spi=0x020bc674,seq=0x1)
A patch for the acrypto port of the XFRM/IPsec/ESP4 engine has been put into the archive.

I have a very interesting idea about the Linux network stack, so I started digging into high-performance event-driven networking interfaces. There are not so many solutions:
  • epoll or /dev/*poll
  • RT Posix signals
The latter seems to have smaller latencies, but has problems when the number of interests becomes large: there is no event batching per signal, which leads to signal queue overflow. The former (the /dev/poll implementation by Niels Provos and Charles Lever) is slightly slower according to the "Scalable Network I/O in Linux" paper. Epoll provides functionality similar to /dev/poll and is part of the 2.6 kernel.
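To make the readiness-notification model concrete, here is a minimal sketch using Python's standard selectors module, whose DefaultSelector is backed by epoll on Linux. The helper name wait_for_readable and the socketpair setup are made up for the illustration; this is not code from any of the projects mentioned here.

```python
import selectors
import socket

def wait_for_readable(socks, timeout=1.0):
    """Register sockets for read events and return the ones that are ready."""
    sel = selectors.DefaultSelector()  # epoll-backed on Linux
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        # One call reports all ready descriptors at once (event batching).
        return [key.fileobj for key, _events in sel.select(timeout)]
    finally:
        sel.close()  # closes the selector only, not the sockets

a, b = socket.socketpair()
c, d = socket.socketpair()
a.send(b"ping")                    # only b becomes readable
ready = wait_for_readable([b, d])
```

The point of the interface is that a single wait returns a batch of ready descriptors, which is exactly what per-signal delivery lacks.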
For statistics I've run a simple benchmark to determine the overhead of a system call in Linux.
It is about 0.2-0.25 usecs per syscall on my Xeon (2.4 GHz) and AMD64 3500+ (2.2 GHz) running the 2.6.15 Linux kernel.

:: Link / Comments (0)


Tue, 27 Dec 2005

I congratulate you on Catholic Christmas!



Mon, 26 Dec 2005

I've implemented priority queues for acrypto.


A new version is available, but there is a nitpick in the implementation.
I allocate a new priority queue and link it into the list of queues the first time a crypto session with that priority is allocated, and never free it until the device is removed. Such caching is done for performance reasons, but it has a disadvantage: when there are no crypto sessions with higher priority, access to the lower priority queues includes the overhead of traversing all higher priority lists and checking whether those lists are empty.

Climbed a little with Grange. Unfortunately aching fingers do not allow me to enjoy the process, so I only finished a couple of easy routes without much fun. I think I will not climb until the middle of January...
In theory, I would like to visit a tooth breaker and a bone straightener, but the probability of such events is very small...



Sat, 24 Dec 2005

Process...


I think the main point of eating sushi is the Process itself. It is quite tasty stuff, but each time I get these rolls and sushi with figs and wasabi and of course chopsticks, my first thought is how different it is from what I am used to. And I really enjoy the process, although I do not know much about it.

Hacked a little on acrypto priority queues, but froze the remote test machine, so it will stay untouched until Monday. All design goals are already implemented; the bug probably lives in list manipulations, so it will be easy to remove once I have access to the serial console.



Fri, 23 Dec 2005

Acrypto hacking day.


Added direct completion mode.
If the session's callback can be invoked in any context, or if the crypto provider can call complete_session() from process context, one can set SESSION_DIRECT in the session flags, which leads to callback invocation directly from complete_session() instead of from the workqueue.
This increases performance noticeably.
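The difference between the two completion modes can be modelled in a few lines. This is only a toy sketch in Python, not acrypto code: the names SESSION_DIRECT and complete_session() come from the text above, while the flag value, the Session class and the deque standing in for the kernel workqueue are assumptions.

```python
from collections import deque

SESSION_DIRECT = 0x1  # flag name from the post; the value is an assumption

class Session:
    def __init__(self, callback, flags=0):
        self.callback = callback
        self.flags = flags

workqueue = deque()  # stand-in for the kernel workqueue

def complete_session(session):
    """Run the callback right away if SESSION_DIRECT is set, else defer it."""
    if session.flags & SESSION_DIRECT:
        session.callback(session)
    else:
        workqueue.append(session)

def run_workqueue():
    """Drain deferred completions, as a worker thread would do later."""
    while workqueue:
        s = workqueue.popleft()
        s.callback(s)

log = []
direct = Session(lambda s: log.append("direct"), SESSION_DIRECT)
deferred = Session(lambda s: log.append("deferred"))
complete_session(deferred)
complete_session(direct)
after_complete = list(log)  # only the direct callback has run so far
run_workqueue()
```

The direct path skips one queueing step and one context switch per session, which is where the speedup would come from.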
Ok, here are dm-crypt port changes:

  • Reduced memory usage.
  • Use memory pools.
  • Removed several race conditions.
  • Code simplification.

New acrypto package has been released. It does not have priority queues yet, but I will implement them soon.
Patches and tarball are available in archive.



Thu, 22 Dec 2005

Acrypto hacking day.


Ok, I've fixed several bugs in the dm-crypt port and run several benchmarks against pure dm-crypt. Results are not as good as I expected, but there are a couple of places where it can be improved a little.
I'm almost sure that acrypto itself works quite well, but performance degrades due to several context switches, from session allocation to async_provider and from async_provider to callback invocation; CPU usage is decreased as well. I ran the bonnie++ benchmark and found that per-char sequential output and block sequential input performance is about 30% lower than with sync dm-crypt at the same CPU usage, but relative block sequential output is about 30% better and relative per-char sequential input performance is about the same.



Wed, 21 Dec 2005

Appetite comes at meal-times.


While I'm in acrypto land now, I've decided to revisit some design approaches. So I've removed the crypto load balancer thread, eliminated main_crypto_dev, removed unused session states, and simplified locking and reference counting. Today or tomorrow I plan to test the new acrypto with dm-crypt and IPsec, while currently it runs under heavy load from the "consumer" test module. The rate is about 12-13K sessions per second on a 2.4 Ghz Xeon (1+HT); each session requires 4k of AES-128 CBC data encryption, which is up to 420Mbit/sec. Only the SW async_provider is loaded.
After successful dm-crypt and IPsec tests I will start the priority queue implementation; with the current changes it will not take too much time, so the release agenda is still valid.
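The throughput figure follows from the session rate by simple arithmetic. A small sketch, assuming "4k" means 4096 bytes per session and counting megabits as 10^6 bits:

```python
SESSION_BYTES = 4 * 1024  # "4k" per session, assumed to mean 4096 bytes

def throughput_mbit(sessions_per_sec, session_bytes=SESSION_BYTES):
    """Encrypted payload rate in Mbit/s for a given session rate."""
    return sessions_per_sec * session_bytes * 8 / 1e6

low = throughput_mbit(12_000)   # about 393 Mbit/s
high = throughput_mbit(13_000)  # about 426 Mbit/s
```

So 12-13K sessions per second corresponds to roughly 390-425 Mbit/s, in line with the quoted figure.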

Climbed a lot with Grange. After a month of laziness it was especially hard, but I finished a couple of old routes, one of them quite complex, and tried a new one on-sight. It consists entirely of passive holds, which was real trouble for my aching fingers, so I finished it with some problems. It was a really good time today.



Tue, 20 Dec 2005

Acrypto hacking.


I've done several updates for the new acrypto release.
It includes a per-device crypto_route memory pool and various kmap* transformations. I also started speedup work which aims at removing the crypto_lb thread. This thread is used to serialize session callback invocation and session removal, but this can easily be done directly from the complete_session() call. It still has a slab corruption bug which must be resolved before release.
Ok, it looks like I've found where the bug lives. I thought about its resolution while going home and came up with a very nice idea about high-performance priority queues in acrypto. What it currently has is total crap actually, so I will create priority queues with O(1) session selection, and doing so will also fix the current slab corruption problem. It should also increase performance a little.
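One common way to get O(1) selection over a fixed set of priorities is a bitmap of non-empty queues plus one FIFO per priority. The sketch below shows that general technique in Python; it is not the acrypto design, and the number of levels, the class name and the convention that a larger number means higher priority are all assumptions.

```python
from collections import deque

NUM_PRIO = 32  # assumed number of priority levels

class PrioQueues:
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIO)]
        self.bitmap = 0  # bit p is set while queues[p] is non-empty

    def enqueue(self, prio, session):
        self.queues[prio].append(session)
        self.bitmap |= 1 << prio

    def dequeue(self):
        """Return a session from the highest non-empty priority, or None."""
        if not self.bitmap:
            return None
        prio = self.bitmap.bit_length() - 1  # index of the highest set bit
        q = self.queues[prio]
        session = q.popleft()
        if not q:
            self.bitmap &= ~(1 << prio)  # queue drained, clear its bit
        return session

pq = PrioQueues()
pq.enqueue(1, "low")
pq.enqueue(7, "high")
pq.enqueue(7, "high2")
order = [pq.dequeue() for _ in range(4)]
```

Empty higher-priority queues cost nothing here, because the bitmap lookup jumps straight to the first non-empty one instead of walking a list.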

If things go as I want, I plan to release the new acrypto version at the end of this week or just after the weekend. I do hope I will find time to test IPsec and dm-crypt from the 2.6.15 git tree with the new acrypto before the New Year holidays. I will be in Saint-Petersburg until Jan 2.



Mon, 19 Dec 2005

Receiving zero-copy support has been finished.


I've fixed the bug with inode truncation, so the zero-copy subsystem now writes exactly the number of bytes received into the file.
Truncation failed when the socket was not connected and one tried to close it: the system received data (an RST packet) which was put into an skb, the skb was then trimmed and the fragment page's reference counter was decremented, so later vmtruncate() failed to free that page.
The patch is available in the archive.

Climbed a little today - everything is still in pain, especially the fingers, so I only did several traverses. A strange thing I found: if I can finish a route on-sight, I can not repeat it after some time.



Sat, 17 Dec 2005

Fighting with receiving zero-copy.


It does not write extended pages to the file now, so the size is always equal to the number of bytes transferred, but there is one problem. If the connection was not established and the zero-copy socket is closed through either process exit or the close() system call, preallocated pages can not be safely truncated to zero size. It looks like, with several preallocated pages, the first one has a smaller reference counter and thus page unlocking in vmtruncate() fails.

Finally climbed today!
I'm definitely not in my best form, since I did not climb for about a month, so today's training is just a start without major results except pain in the whole body. But it's cool.



Fri, 16 Dec 2005

I've run receiving zero-copy benchmark.


Here is a CPU usage graph [1gb transfer using 8139 adapter]:
[Figure: recv_zerocopy_cpu_usage]



Thu, 15 Dec 2005

Receiving zero-copy support. TCP support.


It has been found that there is no data loss, but page reordering happens.
Groovy, I've found the root of the problem: sequence numbers are set consistently with each other, i.e. the sequence number difference between two pages is always the same, while the page's index, which VFS translates into an index inside the file mapping, was allocated sequentially. So when different pages were committed and then grabbed from VFS again, their sequence number difference remained the same, but the page index difference changed. Now this bug is closed and tests do not show any data corruption.

While testing receiving zero-copy I found that my journaling/ext3 hack is not so innocent: sock_sendfile() freezes somewhere inside ->commit_write(), aka ext3_ordered_commit_write(), after a 400Mb transfer, so I just remounted the test partition in read-only mode and reran the test, which finished without any problems.

I've updated the zero-copy patch and sent it to netdev@.



Wed, 14 Dec 2005

Ok, it looks like only one problem remains - data integrity.


All journaling changes just adjust h_buffer_credits when starting a journal with an existing handle, which happens when ->prepare_write() is called several times before ->commit_write().
The ext3 changes are simple too: I commented out the check for the current journaling handle in ext3_write_inode(), which is related to the above changes.

With all these changes receiving zero-copy has only one problem - data corruption. Sometimes several pages are eaten from the flow, and it looks like a race between the commit code running in process context and the data copying running in interrupt context.



Tue, 13 Dec 2005

Crap


It looks like receiving zero-copy will put its dirty hands into the Holy Grail of Linux FS - the journaling code aka JBD.
As far as I can see, one may not call ->prepare_write() several times before calling ->commit_write(), since the journaling transaction, at least for ext3, is set up only for the first requested number of blocks. When the zero-copy code grabs a page from VFS for the first time and calls ->prepare_write() for that page, the journaling code allocates its structures and reserves a number of blocks according to the inode state. If zero-copy then grabs the next page and calls ->prepare_write() for it, no new blocks are reserved and only the journal's reference counter is incremented. When ext3 then tries to do the actual page preparation, it fails in journal_dirty_metadata(), since blocks were counted only for the first page. This happens every time I run the zero-copy test without any logs, but did not happen when tons of messages were written to the serial console, so this process seems to race with the journal committing code.
At least this is how I can explain the permanent assertion failure at fs/jbd/transaction.c:1114: "handle->h_buffer_credits > 0" when running without slow debug output.
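The suspected failure mode can be reproduced with a toy model. This Python sketch follows only the description above, not the real JBD code: the Handle class, the simplified journal_start() and journal_dirty_metadata() functions and the per-page credit number are all made up for the illustration.

```python
class Handle:
    """Toy journal handle: a credit budget plus a reference count."""
    def __init__(self, credits):
        self.h_buffer_credits = credits
        self.h_ref = 1

def journal_start(current, blocks_needed):
    # A nested start reuses the existing handle and reserves nothing new.
    if current is not None:
        current.h_ref += 1
        return current
    return Handle(blocks_needed)

def journal_dirty_metadata(handle):
    # Mirrors the failing check: each dirtied buffer consumes one credit.
    if handle.h_buffer_credits <= 0:
        raise RuntimeError("handle->h_buffer_credits > 0")
    handle.h_buffer_credits -= 1

CREDITS_PER_PAGE = 2  # made-up number for the illustration

handle = journal_start(None, CREDITS_PER_PAGE)    # first ->prepare_write()
handle = journal_start(handle, CREDITS_PER_PAGE)  # second one, no new credits
failed = False
try:
    for _ in range(2 * CREDITS_PER_PAGE):  # dirty metadata for both pages
        journal_dirty_metadata(handle)
except RuntimeError:
    failed = True
```

In this model the credits reserved for the first page run out while the second page is being prepared, which matches the assertion text.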

Thinking in an expansive way, I started to hack the ext3 code, i.e. I do not understand why things happen this or that way, but try to change some simple stuff based on my assumption about the above journaling problem. This is a very wrong way, but I want to test the networking stuff first and then switch to the other problem, so the backing storage can wait.



Mon, 12 Dec 2005

I congratulate you on the Day of the Constitution of the Russian Federation.




Receiving zero-copy support.


I've finally fixed the TCP sequence number check, so there are the following issues to consider before pushing this stuff up:

  • Data integrity check.
  • Moving the committing code into a workqueue, since the waiting queue seems to be quite slow.
Fixed the data corruption issue, removed all debug and found that it crashes in kmap_atomic() when highmem is turned on. Tried to merge the zero-copy changes with the latest 2.6 git tree and completely broke my setup - it crashes on the first received packet when it tries to free the skb and somehow decides to free it through the zero-copy path, which is obviously wrong.



Sat, 10 Dec 2005

Some kind of this year hacking agenda:

  • Finish the receiving zero-copy implementation. Only the TCP sequence number check is not done yet.
  • Check acrypto and IPsec/dm-crypt under _very_ high load, as far as I remember there were some problems.
  • Start the HIFN driver rewrite. Actually it only requires initialisation and ISR processing fixes.



I've updated OSF iptables module.


The update includes a README file with bits of documentation on using the passive OS fingerprinting iptables module. The new version is available in the archive.
I've also sent a bunch of small OSF cleanups for the 2.4 kernel version and a documentation update, and resent the OSF netlink related changes for the 2.6 kernel version to netfilter-devel@.
All this stuff has already been added into the previous OSF release.



Wed, 07 Dec 2005

W1 project has been updated.


Now it has a masters/slaves drivers split, a new DS2482 I2C <-> W1 bridge and a nice set of documentation. Great thanks to Ben Gardner.
One can find the updated version either in the archive or, hopefully, in the next -mm release. Eventually this will be pushed upstream in the 2.6.16 timeframe.



Tue, 06 Dec 2005

Receiving zero-copy hacking.


Found a new problem in the zero-copy TCP "implementation": when all pages are filled but not committed, the remote side starts resending packets and waits for ACKs for them, but zero-copy drops them, since it knows that the same packets are in flight and will be committed and ACKed soon, so the receiving TCP state machine does not even know that there was some unACKed data. I will try to accept those retransmits, but without actual data copying.



Mon, 05 Dec 2005

"Here I am in Rio de Janeiro, idiot's dream has come true", as Ostap Bender said in "The Golden Calf" by Ilf and Petrov.


I've finished quite a big part of one "project" which required a lot of money, debts, time and problems. It is not completely done, and probably will not be for the next three years, but the major part is already behind me.
And what? Nothing: no emotions, no fun, no expectation, no excitement, nothing of what I imagined several years ago.

Is it the bright future I wanted? What next?



Fri, 02 Dec 2005

Groovy, I've found New Year present!


So I will have a bit of drink today, and may be tomorrow...
Presents are rox!



Wed, 30 Nov 2005

Receiving zero-copy hacking.


TCP is still missing - if I preallocate 32 pages from VFS, a 100 Mbit network fills them faster than the userspace application, which calls sendfile(), commits pages and grabs new ones, can keep up. So it is possible that part of a packet is moved into the last page while the first page is still not committed, so the rest of the data will be dropped and the sending side will retransmit the whole TCP packet. There are at least two solutions:
1. Do not permit packets whose data size is not equal to the page size.
2. Preallocate more pages so no overruns can happen.
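To get a feeling for how little time the application has, here is a rough estimate of how long a fully loaded link needs to fill the preallocated pages. It is a sketch only: a 4096-byte page size is assumed and protocol headers are ignored.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def fill_time_ms(pages, link_mbit, page_size=PAGE_SIZE):
    """Milliseconds a link running at full rate needs to fill `pages` pages."""
    bits = pages * page_size * 8
    return bits / (link_mbit * 1e6) * 1e3

t = fill_time_ms(32, 100)  # 32 pages on a 100 Mbit link
```

With these assumptions 32 pages last for about 10.5 ms, so the process calling sendfile() has to commit and regrab pages within that window.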

In case of an uncommitted page, we very likely hit the following problem: part of the packet has been written into the previous page, but the next page contains old data which is not committed to VFS, and we can not overwrite it. In this case we must roll back all writes to the previous pages: we start from the beginning, select one by one the same pages that were selected for writing, and decrease each page's zp->used counter, so the page looks like it did before.
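The rollback can be sketched as follows. This is a Python toy model, not the patch itself: only the used counter mirrors zp->used from the text, while the ZPage class, the busy flag (a page still holding uncommitted old data) and the 4096-byte page size are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size

class ZPage:
    def __init__(self, busy=False):
        self.used = 0     # bytes already filled, like zp->used
        self.busy = busy  # True: still holds old data not committed to VFS

def write_packet(pages, start, length):
    """Spread `length` bytes over pages[start:], undoing everything on failure."""
    touched = []  # (page, bytes written) in selection order
    idx = start
    while length > 0:
        page = pages[idx] if idx < len(pages) else None
        if page is None or page.busy:
            for p, n in touched:  # walk the same pages from the beginning
                p.used -= n       # and restore their counters
            return False
        n = min(length, PAGE_SIZE - page.used)
        page.used += n
        touched.append((page, n))
        length -= n
        idx += 1
    return True

pages = [ZPage(), ZPage(busy=True)]
pages[0].used = 3000
ok_spill = write_packet(pages, 0, 1460)  # would spill into the busy page
used_after_rollback = pages[0].used
ok_fit = write_packet(pages, 0, 1000)    # fits into the first page
```

After a failed write the first page looks exactly as before, so the retransmitted packet can be placed later without leaving a half-written hole.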



Mon, 28 Nov 2005

Grange reminds me that William Blake was born today.


Some are born to sweet delight,
Some are born to endless night.



Sun, 27 Nov 2005

Ok, receiving zero-copy support has come to its finish line.


I think so...
Today I've fixed a bunch of various bugs related to the TCP sequence number check.
Now it looks almost finished, but I've crashed the remote system, so even the serial console does not work. Hopefully tomorrow I will push the last changes in and send an updated patch for review to netdev@.

The receiving zero-copy implementation with TCP sequence number check support can be found in this patch. It is not well tested yet.



Mda... Sometimes moving forward is a result of kick from behind.



Sat, 26 Nov 2005

Kernel hacking config option.


Turning most of its menu on drops networking performance by half.

Normal networking performance is not affected by the receiving zero-copy patch, as I found today in my tests. Of course this can not be 100% right, since receiving zero-copy adds several protocol checks and hash calculations, and with a 100Mbit adapter it is not possible to distinguish that overhead from noise, but the fact is that at 100Mbit speeds on a 2.4 Ghz 2-way (1 + HT) Xeon performance is the same as without the zero-copy patch.

Changed the locking scheme a little - now the fast path uses one per-socket read lock held for reading and one per-page spinlock.

Fixed a nasty bug in my 8139too.c driver hack, which led to indescribable bugs and oopses in highmem code...
The current TCP sequence number check is broken, but I really do not want to check all those TCP RFCs to find out why the seq/ack pair is not updated. I do think the solution is very simple, I just need to analyze a couple of tcpdump files...
Ok, I found where the problem is: the frame itself contains the right data with updated seq and ack TCP fields, but the zero-copy stack somehow obtains a header with old values, so it drops the whole packet based on the fact that the same packet was already processed.



Fri, 25 Nov 2005

Fine grained receiving zero-copy locking must be redesigned.


I found that the Linux journaling code is not protected against interrupts and even soft interrupts, so it is just not permitted to try to commit a page from BH context when the socket is closed...
Fortunately it is quite an easy task - the zero-copy socket will continue to have fine grained locking and a reference counter, but VFS page finalizing, which is atomic with respect to itself, will be called only from the very end of the sendfile() syscall or from sk_free().

Hmm, found an interesting usage for receiving zero-copy.
It can be used to DMA data not only into the VFS cache but, in theory, into any area - userspace pages, a different network adapter's DMA ring or different physical memory, for example a buffer in IB or iSCSI.



Thu, 24 Nov 2005

Cleaning receiving zero-copy patch up.


Removed debug, committed things into the local git tree, changed sendfile()'s wakeup logic: to speed up ACK processing, kernel_recvmsg() should be called not only when the whole page is ready but as fast as possible to eliminate retransmits, so I added a flags field to the zsock structure and reserved a bit there to show whether at least something was copied from the net. While I was doing this optimisation remotely, the machine caught a bug...

Tomorrow I plan to finish the TCP sequence number check and try to set up a PPC32 test machine to check receiving zero-copy on a different arch.



Wed, 23 Nov 2005

My laziness should start worrying me.


I've released new zero-copy patch and sent it to netdev@ for design review.



Tue, 22 Nov 2005

I've started the greatest project ever!


I plan to add at least some bits of documentation to all my projects, so they become somewhat more user-friendly.
CARP is the first one.
I've added a simple README which describes what it is and how it operates. It does not have usage examples though, so an administrator will have to set it up using some Yoda-style advice, although it is quite easy after running carpctl -h.



Mon, 21 Nov 2005

Zero-copy abstraction layer.


I've almost finished a simple layer which will be used for allocation method lookup and dataflow control. Since there will be far fewer zero-copy sockets than usual ones, I decided to use a simplified hashing technique.



Sat, 19 Nov 2005

Thinking about new zero-copy abstraction...


I like this idea more and more - it would allow having one zsocket for data, and if zero-copy fails, data could be queued into the original socket and later, when new pages are grabbed from the VFS cache, copied from the original socket into the VFS pages according to its sequence number mapping.

Today Grange took part in a music jam. According to the photos from their forum, the jam was really fun.
I listened to Grange's punk-rock in the past, when he was the leader, vocalist and guitarist of his band "Facultet". I still remember their last concert in cellophane raincoats :)
If you like something like RHCP with a light punk bias, I can recommend listening to all the "Facultet" songs; you can find them in mp3 format on the band's homepage.



Fri, 18 Nov 2005

Zero-copy hackfest.


Not much progress - there is a problem with hashing/unhashing, socket releasing and the interrupts where the socket is used. I do think that searching the whole stack and changing each assumption that packets can be processed only in BH context is a very badly broken idea, which will hurt performance and give birth to tons of bugs.
An interesting idea could be to hold the zero-copy socket until sendfile() is interrupted and to check in interrupts not the real socket, but only some reference to it, which will be removed with interrupts disabled at the very end of the sendfile() syscall. It could even be a new abstraction which exists only for zero-copy capable sockets, but if this approach proves to be effective, it will require all those hashing algorithms which the current network stack has.



Thu, 17 Nov 2005

There are two types of systems: those that have downtime and those that will.


Today I read Berd Kivi's book "Gigabytes of power" (in Russian).
It is interesting reading about total control of governments and corporations over private life and the new mechanisms to achieve it - from RFID and data-mining to masons in the FBI :). A technically competent man can avoid many of these issues, but unfortunately several of them are just not under our control.
And as the l0pht group said several years ago, the main threat is not corporations or governments themselves, but uninformed people.



Wed, 16 Nov 2005

Linux, why, why do you steal my interrupt?


2.6 kernel calls me "system with badly broken firmware"...



Tue, 15 Nov 2005

Receiving zero-copy.


I've found an interesting issue deep in the socket internals of the Linux kernel tree - they are heavily based on the assumption that all socket processing can only happen in bottom-half or process context, so hashing/unhashing is not protected against hard irqs, and thus a socket can not be safely used from interrupt context.
I've added a sequence number check mechanism into the allocation path: each preallocated page has a TCP sequence window associated with it and only data with appropriate sequence numbers can be moved into that page.



Sun, 13 Nov 2005

Old friends meeting.


Mephody returned from Ireland today and will stay in Moscow for about a week, so we celebrated his past birthday and the meeting itself. There were Meph and Ira, Wijo and Sasha, Fedor and Ira, Ivan Gammel aka Unexy, Masha and me. All this crowd somehow sat down in Ira and Masha's small kitchen. We took a little fire water, ate something, and saw Mephody's photos of his Ireland road trip from Limerick to Dublin along the south coast with many old fortresses and castles.
It was definitely a very good time.



Sat, 12 Nov 2005

Receiving zero-copy implementation.


I've put the next release of the receiving zero-copy patch in the archive. The whole day was spent searching for a very strange thing - it was possible to get and hold a socket, but half of its fields were filled with red-zone 0x6b markers. This issue was resolved after a deep comparison of the Linux kernel TCP socket internals with what I had - RCU locking and holding a socket reference were not enough, but following TCP's __inet_lookup() usage I found an additional bh_lock_sock()/bh_unlock_sock() pair there.

I've also added some checks which prevent the system from using the same backend file for several zero-copy sockets at the same time.

I've removed the ability to post comments in my blog, since the blog is flooded only with spam, without comments from readers.



Fri, 11 Nov 2005

Elections.


There will be Moscow Duma elections in a month, so various political blocs have started their election campaigns.
All of them are very similar: "We are cool, they are crap."
Sigh, is pouring shitloads on others really such a powerful election campaign?

I even think that if I ever meet a political bloc which says: "Hey, in the previous elections we promised to do A and B, and we did C and D, and now we promise to do E and F.", I will definitely give them my vote.

Ok, it looks like all the uninteresting stuff has been done, so I plan to resurrect zero-copy hacking tomorrow and want to finally create a well-working implementation.



Wed, 09 Nov 2005

Working-working-working...


Now it looks like I've finished almost all cases, although it could be not so; tomorrow will show if I am wrong.

Spent half of the day searching for New Year presents - have not found a particular variant, but modeled some tricky affair, which can in theory help me with searching for the bright future. The interesting thing is that after spending many hours with computer and application bugs, the brain turns into an alternative mode where completely different and absolutely unrelated ideas are born, which in a common situation would require a lot of thinking or even braindamaging to be transformed into this simple and elegant form.



Sat, 05 Nov 2005

Hack on passive OS fingerprinting module aka OSF.


Added version check and fixed compilation for 2.6.14+ kernels, where netlink API was changed.

New release is available in archive.



Fri, 04 Nov 2005

Hacked receiving zero-copy.


The finish line is near already - there is one main problem now - TCP retransmits, which basically break the whole idea. And even using a socket option to turn the socket into receiving zero-copy mode does not help with retransmits. I have an idea how to remove them: the simplest one is just to drop a packet if its TCP header does not match the one the system expects, but I suspect it will hurt performance a lot, so I need to implement some heuristic on top of TCP sequence numbers to allow receiving packets with a sequence number above the currently expected one. So it can be done in the following way:
1. check if the sequence number is less than we expect, then just drop the packet.
2. check if the sequence number is beyond the pages we allocated, then drop the packet.
3. if the sequence number is inside the allocated window, then select the right page for the packet and move the data there.

This will raise a new problem: some page inside the window is committed, but later the sending side retransmits data with the same sequence number, which corresponds to the already committed page.
I see the solution in having an array of sequence windows which can be zero-copied into existing preallocated pages, i.e. just link some sequence number window to the newly grabbed page, and when the packet's header is received, check if it can be placed somewhere in the allocated pages.
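The three-step check above can be sketched like this. It is a Python illustration under stated assumptions, not the patch code: the function names, the 4096-byte page size and the layout where page i covers the i-th page-sized slice of the sequence window are hypothetical, and only the first byte of a segment is placed (a real segment may span two pages).

```python
PAGE_SIZE = 4096   # assumed page size
MASK = 0xFFFFFFFF  # TCP sequence numbers are 32-bit and wrap around

def seq_diff(a, b):
    """Signed distance a - b in 32-bit sequence space."""
    d = (a - b) & MASK
    return d - (1 << 32) if d & 0x80000000 else d

def place_segment(seq, window_start, num_pages):
    """Return (page index, offset) for a segment's first byte, or None to drop."""
    off = seq_diff(seq, window_start)
    if off < 0:                       # 1. older than expected: drop
        return None
    if off >= num_pages * PAGE_SIZE:  # 2. beyond the preallocated pages: drop
        return None
    return divmod(off, PAGE_SIZE)     # 3. inside the window: pick the page

in_window = place_segment(1000 + 4096 + 10, 1000, 32)
too_old = place_segment(999, 1000, 32)
too_far = place_segment(1000 + 32 * 4096, 1000, 32)
wrapped = place_segment(0x00000100, 0xFFFFFF00, 32)
```

The signed difference keeps the comparison correct when the sequence space wraps, which a plain less-than on the raw numbers would get wrong.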



Thu, 03 Nov 2005

I've started zero-copy cleanup.


First of all, I moved zero-copy initialization away from sock_sendfile() into a new socket option to eliminate the startup race, when the initial TCP portion can be dropped and will be retransmitted later. Second on the agenda is a check for skb usage after sock_sendfile() is finished and the socket has been marked as not zero-copy capable. In theory nothing prevents such usage, since zero-copied skbs and their buffers are not intended to be used in stuff like netfilter and IPsec, whose overhead is not compatible with the high speed setups where receiving zero-copy could be used, although it is a grey area now...

Another unpleasant issue is that Linux performs a write in two operations, i.e. it is guaranteed to be nonatomic, and with the current zero-copy design this situation is permanent: we prepare several pages in process context, wait until they are filled in interrupt context, and then commit the filled pages back to the VFS cache. Between the prepare and commit stages one can not use the selected file's inode, and thus an operation like "ls" will be locked.



Wed, 02 Nov 2005

The whole day has been devoted to my paid work.


Yep, I want to eat and thus need to do my work, although it was not so interesting. I plan to finish the zero-copy receiving implementation this week and present it to netdev@vger.kernel.org and the Linux kernel network hackers. If things go very smoothly it can happen even tomorrow.

Climbed a little with Grange today after one idle week. It looks like either my finger or wrist is broken or, hopefully, just badly bruised, since it hurts very much when climbing and especially when taking passive holds.



Tue, 01 Nov 2005

Receiving zero-copy hacking.


This release allows transferring a huge amount of data without errors (probably), but it has one very unpleasant nitpick - if sendfile() is interrupted, its reference counter never drops to zero, and I have not implemented a fine-grained reference counter with a proper skb->descriptor() yet.
The other issue is data reordering - obviously the sendfile() system call is called well after the socket is connected, so some data has already been sent; it is dropped and writing into the file starts from some arbitrary position in the dataflow. But this can be solved easily by moving zero-copy setup into the socket() system call. Retransmits can pollute the dataflow too, since they can potentially be inserted into the dataflow at a position different from their position in the original file. Actually retransmits could end up grabbing the right page from the VFS cache, but page grabbing is quite a long operation which sleeps, and we can not know in advance, i.e. when grabbing new pages from the VFS cache in process context, where the data which will be received in the next interrupt must be placed.

A potential solution for this problem is page remapping, i.e. we could remap a page from an skb fragment into, for example, a VFS cache page, or into a userspace page, which I implemented in the zero-copy sniffer. This is how zero-copy receiving is implemented in FreeBSD. But it is not a very elegant way, since such a transformation requires TLB flushing, which can be much more expensive than data copying with small sizes but can be up to 5-15 times faster using even 1500 byte chunks.

The latest patch (revision 6) is available in the archive.



Mon, 31 Oct 2005

Halloween.


Be aware.
[Image: witch.jpg]



Receiving zero-copy hacking.


Ok, I found why TCP stalled and no ACKs were sent - the system always fell back to the slow path and then dropped packets due to an invalid checksum. The root of this problem is TCP options, precisely the TCP timestamp, which were not copied into the header part and thus the TCP stack never had a valid checksum. After the 8139too.c driver update, which now checks for TCP/IP options, it looks like a TCP stream can be established in the right way.

I congratulate Mephody aka Alexander Boykov on his birthday and really look forward to him coming to Russia in two weeks. That will be a very drinkfun time, so the main things must be done before that black hole period.

:: Link / Comments (0)


Sun, 30 Oct 2005

Weekend rest.


Something should be done, so I've started to search for New Year presents. The mechanism is turned on, so I think next week I will have something interesting.

Hacked this HTML page a little - now it does not have annoying horizontal scrolling and it can be accessed directly from the main page.

:: Link / Comments (0)


Fri, 28 Oct 2005

Receiving zero-copy.


As you probably know, my implementation already works with non-page-aligned data, and the only problem is generic stack processing itself - TCP ACKs, socket accounting and so on. It has been implemented using the fragmentation array in the skb's shared area, i.e. the header is placed into skb->data, and all real data goes directly into the VFS cache, whose page pointers are stored in skb_shinfo(skb)->frags. Unfortunately it looks like either fragmented input skbs are not allowed (at least skb_put() may not be called on a nonlinear skb), or, which is more likely, the header data and the corresponding skb fields are set up incorrectly, so the system crashes somewhere in netif_receive_skb().

Ok, I've shed some light on this - putting netif_receive_skb() into a workqueue allowed me to scroll the console and see EIP, which pointed at netif_receive_skb() itself, not into its internals, and the error was a wrong pointer dereference at address 0x106. Looking more closely at the skb setup in the original 8139 interrupt path, I found that the ->dev field was not set up at all, so setting it fixed that bug. I can see ACKs from zero-copy capable hosts, but after some period of time they stop and the connection stalls.

I've released a new version of the receiving zero-copy concept for the interested reader; the patch is available in archive.

:: Link / Comments (0)


Thu, 27 Oct 2005

Timer interrupts and signal delivery on PPC.


I've found a very strange thing - a SIGALRM handler can not safely call sleep(), although man signal says that sleep() is a signal safe function. A couple of other safe functions also cause the SIGALRM handler to deadlock.
This happens only on our PPC405GPr boards, and is 100% reproducible. So I spent the whole day moving the timer's state machine in userspace out of the signal handler. It looks like it is rock stable now.
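
A minimal sketch of the usual way to keep a state machine out of a signal handler: the handler only sets a flag, and the main loop does the real work. This is not the actual board code; timer_state, timer_install() and timer_poll() are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t timer_tick_pending = 0;

/* Hypothetical timer state, advanced only outside signal context. */
struct timer_state {
	unsigned long ticks;
};

static void on_sigalrm(int signo)
{
	(void)signo;
	timer_tick_pending = 1;	/* the only work done in signal context */
}

/* Install the SIGALRM handler; returns 0 on success, -1 on error. */
static int timer_install(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigalrm;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Called from the main loop. Runs one state machine step if a tick
 * arrived since the last call; returns 1 if a step was made, 0 otherwise.
 * Ticks that arrive between two polls are coalesced into one step.
 */
static int timer_poll(struct timer_state *st)
{
	assert(st != NULL);
	if (!timer_tick_pending)
		return 0;
	timer_tick_pending = 0;
	st->ticks++;	/* free to sleep or block here, we are not in a handler */
	return 1;
}
```

The main loop would call timer_poll() after every wakeup, for example after pause() or select() returns with EINTR.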

Met with Abr and his family today - I saw his son Anton for the first time - a nice small man with a very smart face and a naturalist's, very demanding temper - he tried to eat everything he could get and definitely tried to tell us that he does not like our attempts to prevent this. Unfortunately a 6 month old man can not express himself in any other way than screaming and laughing, but his parents can already understand many of his signals.
I spent a very nice evening with them.

:: Link / Comments (0)


Wed, 26 Oct 2005

First snow in Moscow... Winter has come.


Ok, receiving zero-copy can handle non-page-aligned data now, but this has opened a new problem: when using zero-copy, the system grabs data frames away from the stack, so the receiving side can not acknowledge received data and TCP gets stuck.
This is how it looks from the sending point of view:

12:46:25.439777 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 4213170362:4213171810(1448) ack 1304099580 win 1448 
12:47:18.690968 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448 
12:49:05.193624 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448 
12:51:05.201123 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448 

I have an idea about how to fix this: the skb could be allocated with the header in skb->data and all data, even non-page-aligned, in the fragmentation list, and then passed into the stack. Such an skb may not be freed using the original destructor and can not be mangled by stuff like netfilter or IPsec; anyway, no one should use IPsec and netfilter in the fast zero-copy path. The destructor for such an skb should call kfree_skb_zerocopy() instead of blindly releasing the fragmentation list. This requires some thinking...

:: Link / Comments (0)


Tue, 25 Oct 2005

Old friends meeting.


Today Fedor, Irin, Yuliana, Pavel, Alexander and I met Abr, who has returned from England to his family; they are going to move to London in a couple of days. It was very good to see them again.

:: Link / Comments (0)


Mon, 24 Oct 2005

Third zero-copy patchset.


I've put a new receiving zero-copy patch in archive.
It works - the first receiving zero-copy implementation for the Linux kernel.
I will not publish any results now, since the test is only preliminary, but a dataflow can already be established between the zero-copy receiving sendfile and the original server.
Setup with a small MTU is the main problem now - I do want to have a zero-copy receiving path for the standard 1500 MTU, and I very much doubt my high-end Realtek 8139 chip supports jumbo frames of 4k. Currently it is not solved, and the dataflow stops at a page boundary.

:: Link / Comments (0)


Sat, 22 Oct 2005

Day of new things.


I've bought myself tons of new acquisitions. Then Grange and I celebrated it in 5 oborotov, which has not changed over the years and is still a very good or even the best beer restaurant.
While getting there, we found that the "Mayakovskaya" subway station has changed greatly - it looks like it was completely re-dug, it has a modern interior, long semi-mirrored corridors, transparent police rooms and a great mosaic ceiling with a very interesting drawing.
I do think only in Russia you will find such lovely designed subway stations.

:: Link / Comments (0)


Thu, 20 Oct 2005

Preliminary receiving zero-copy support for Linux kernel 2.6 network stack.


I've put the first release of the receiving zero-copy mechanism into archive.
This patch has not been tested yet, since I have no high-end Realtek 8139 network card with its outstanding MMIO interface here; I will try it later today at home.

For interested reader - here one can find a photo of my workplace with small comments (1.2 Mb). Enjoy.

Ok, I've updated the patch; the update only includes several cleanups in the 8139too.c driver.
System without zero-copy capable sockets works without any problems.
Fortunately I have two such cards and one e100 Intel network adapter in my home router, so I can easily remove one and continue testing at work.

:: Link / Comments (0)


Wed, 19 Oct 2005

Some stars moved in special sign,

so tomorrow promises to be a very interesting day.
Real decision has been made, so let's put it into practice.

:: Link / Comments (0)


Receiving zero-copy.


Hacked the receiving zero-copy mechanism at work a little; it is almost ready for tests, so hopefully tomorrow I will crash my system for the first time using it.

The design is quite simple - in the sendfile() system call I grab several pages from the VFS cache and provide them to the appropriate socket, which is determined using copied headers in the driver's interrupt handler when it is going to allocate a new skb and DMA/copy data into it.

Climbed a little today - several traverses and starts, and I've finished my favorite route today with only four falls.

:: Link / Comments (0)


Tue, 18 Oct 2005

HIFN driver is broken.


Either my hardware or my driver (which is more likely) is broken, since just after interrupts are enabled, my test system freezes due to interrupt storm from HIFN card.
As far as I remember, it did not behave this way when I wrote this driver, but nevertheless this driver is based on source that I never tested, so I decided to rewrite the HIFN driver.

Ok, after some debug work I found that it was not a good idea to request the IRQ before setting DMA up, so after moving some functions up and down, it can successfully run the FIPS test, but after the IRQ is requested, the first interrupt fires (why?) and the system freezes at the very end of request_irq(), I suspect in local_irq_restore().
This driver must be rewritten, since it not only looks like a dead man, but is beginning to smell bad...

:: Link / Comments (0)


Mon, 17 Oct 2005

I'm back from vacation and going to work.


Groovy, I've fixed (finally) acrypto and input IPsec issues. New patch is available in archive.

New version of Acrypto has been released. Tarball is available in archive.
It includes locking cleanups and fixes, atomic bitops usage and general cleanups. It also has a simple load balancer embedded into the acrypto module, so the acrypto system can work right after module insertion without waiting for a load balancer module, although one can still insert one's own load balancer module.

The 2.6.15 network tree has been opened. David Miller is going on vacation for two weeks, and during this time Arnaldo Carvalho de Melo will be the network maintainer.

Probably I will have something to show in 2.6.15 timeframe from zero-copy receiving side.

Good climbing day - I finally finished the hardest route I have ever climbed; it is quite complex and is on the negative slope. Although I fell several times, I've done it, and it was definitely very good climbing.

:: Link / Comments (0)


Fri, 14 Oct 2005

OpenBSD has a birthday today - Saturday 16:36 MST, 1995!


My congratulations - 10 years!

Updated acrypto-ipsec patch: removed some debug, added missing diff for xfrm.h header.
Hmm... I've broken something with today's git pulling/committing, so acrypto+ESP just freezes the machine under load. I will investigate this issue today.

:: Link / Comments (0)


Thu, 13 Oct 2005

Input ESP4 IPsec processing engine has been ported to acrypto!


I'm cool. XFRM not. Porting was ugly.
Patch can be found in archive.
It has also been sent to linux-crypto@ and netdev@ for the interested reader.

Acrypto now supports full ESP4 input/output processing crypto operations.

Ok, for the first zero-copy release I decided to drop the SKB if the zero-copy capable socket can not provide a page. If this approach does not work, or works worse than no zero-copy at all, I will think about queueing.

I've updated w1 driver - synced with in-kernel driver, various small fixes and whitespace cleanup. New version is available in archive.

:: Link / Comments (0)


Wed, 12 Oct 2005

DARPA Grand challenge results.


A Stanford-designed robotic car has driven away with the $2M prize in the second DARPA Challenge, a 175-mile race for autonomous vehicles held this weekend in the Mojave desert south of Las Vegas.
Four of 23 vehicles completed the course.
The car "Stanley" is based on a VW Touareg SUV, with seven Pentium M-powered computers mounted in the trunk in a fault-tolerant configuration.

Cleaned boring stuff at work, so next two days will be devoted completely to acrypto and zero-copy receiving. Hope I will have something to show at the end of the week.

Climbed a little today, but it was hard training - I found a new route, which is not too complex, but its hardest part is that it has a permanent negative slope - so the hands get very tired, and when it hurts I feel alive.
And it makes me feel good.

:: Link / Comments (0)


Mon, 10 Oct 2005

Acrypto hacking.


Bug with mangled TCP content when doing input ESP4 IPsec processing in acrypto has been narrowed down to be in scatterlist crypto setup, since acrypto itself produces right data, which has been verified using consumer.c test module from acrypto package.

Ok, the problem has been found - the IV was not set up correctly. Now input ESP IPsec processing works with acrypto, although there are some bugs in it - XFRM code definitely does not allow asynchronous processing, since xfrm_state can be changed. The only solution I see is to audit every xfrm_state usage and change the whole thing to not flush its data until some reference counter is dropped...

The Megapixel+ project is delayed for one week - there are always boring things which can be assigned to be done first.

Zero-copy receiving runs into several problems:
since an SKB can be allocated in hard IRQ context, while grabbing a page from VFS can take too long and must happen in process context, there is a yet unresolved race:
if we want to keep the received data, then we should queue it as before, and later, when a page is grabbed and ready for writing, somehow find out that data from the skb must be copied into it, and that this page should not go to the allocation routine.
The queueing mechanism also creates a problem if the network driver asks for an skb each time just before a page is grabbed in some process context: the original skb will be allocated with a kmalloc()'ed data area and queued, and this will happen again and again without any possibility to actually turn zero-copy on. This problem can be solved easily though - just check if there is data pending in the socket queue, and grab a page with an advanced file pointer.
All these problems disappear if we just decide to drop a packet when there is no free page allocated in advance in process context, but I do not want to take this approach into account yet.

My sister has a birthday today, wow! Marina, I congratulate you and wish the best of everything!
Although I hope you do not read this flow of madness.

:: Link / Comments (0)


Sun, 09 Oct 2005

Zero-copy receiving.


Drawing various zero-copy fast paths in my head. The main design goal is clear - fetch some headers from the network card and provide them to the list of registered zerocopy handlers, one of which can decide that this packet belongs to it, so it can call its private allocator and return data.
For example, let's have a TCP socket marked as receive zerocopy capable in receiving sock_sendfile(). Using the IP and TCP headers we can find the corresponding socket and decide if it is zerocopy capable or not. If yes, then the handler will grab a page from the VFS cache and return its address as skb->data.
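
The handler list described above can be modelled in userspace roughly like this. It is only a sketch under stated assumptions, not kernel code and not the interface from the patch: zc_headers, zc_register(), zc_alloc() and zc_sock are hypothetical names, and a static buffer stands in for a page grabbed from the VFS cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZC_MAX_HANDLERS 8

/* Minimal view of the copied headers; all field names are hypothetical. */
struct zc_headers {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* A handler returns a buffer for the payload, or NULL if the packet is not its own. */
typedef void *(*zc_alloc_fn)(const struct zc_headers *h, size_t len, void *priv);

struct zc_handler {
	zc_alloc_fn alloc;
	void *priv;
};

static struct zc_handler zc_handlers[ZC_MAX_HANDLERS];
static size_t zc_nr_handlers;

/* Add a handler to the list; returns 0 on success, -1 if the list is full. */
static int zc_register(zc_alloc_fn alloc, void *priv)
{
	assert(alloc != NULL);
	if (zc_nr_handlers == ZC_MAX_HANDLERS)
		return -1;
	zc_handlers[zc_nr_handlers].alloc = alloc;
	zc_handlers[zc_nr_handlers].priv = priv;
	zc_nr_handlers++;
	return 0;
}

/*
 * Offer the headers to every handler in registration order. The first one
 * that claims the packet provides the data area; NULL means the caller
 * should fall back to the usual allocation.
 */
static void *zc_alloc(const struct zc_headers *h, size_t len)
{
	size_t i;

	for (i = 0; i < zc_nr_handlers; i++) {
		void *buf = zc_handlers[i].alloc(h, len, zc_handlers[i].priv);

		if (buf)
			return buf;
	}
	return NULL;
}

/* Example handler: a "socket" bound to one port with one preallocated page. */
struct zc_sock {
	uint16_t dport;
	unsigned char page[4096];
};

static void *zc_sock_alloc(const struct zc_headers *h, size_t len, void *priv)
{
	struct zc_sock *sk = priv;

	if (h->dport != sk->dport || len > sizeof(sk->page))
		return NULL;
	return sk->page;
}
```

A driver-like caller would copy the headers, call zc_alloc(), and use the returned buffer as the data area if it is not NULL.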

Things are clear on paper, but there is some trouble in real life.
First, I have no card which is capable of delivering only the headers from its RX ring. This problem was solved easily - 8139too copies packet data using MMIO (that is why it is so slow), so I will just copy the packet headers first, provide them to the zerocopy capable SKB allocation function and then copy the rest of the packet into skb->data, which will be either a grabbed VFS page or the usual kmalloc()'ed area.
Second, TCP socket lookup can only run from BH/process context, not from hard IRQ context, where skb allocation can happen. I did not investigate this problem deeply, and for tests I just leave it as is.

So, after several cups of tea, beer and some food I've written a couple of bytes of code, which can be used as a base for further receiving zerocopy development. The handler interface and TCP socket lookup have been finished; receiving sock_sendfile() I will take from my previous patches.

Things are moving along not badly.

:: Link / Comments (0)


Sat, 08 Oct 2005

Acrypto hacking.


Hacked ESP input processing a little from home - the remote machine froze, and there are no clues why asynchronous processing with the same call chain produces wrong data. The worst situation will be if it is the acrypto core itself, but it was tested many times under heavy load and data was never corrupted... I need to investigate it, but will be able to do it only after the weekend. I'm sure it is something simple and stupid I forgot.

:: Link / Comments (0)


Fri, 07 Oct 2005

New project codename "Megapixel+" started.


The main aim of this project is to create a standalone implementation of a simple network stack on top of an unknown board/system using an Intel 82541PI adapter. Of course I will not write a whole TCP implementation, but will only create a UDP/UDP-Lite stack without any socket-like interfaces. The system does not have userspace, so the driver will be simple - it only must read some memory and send it over the 82541PI adapter; it will also receive some control messages. The highest transmit performance is a must - up to 1Gbps using large frames from memory. Unfortunately the system will not be able to get data directly into send buffers, so there will be at least one copy.

First milestone is to create simple driver for Linux, which will send/receive UDP frames from/to shared memory without any usage of existing network stack. It must be as small as possible, so existing e1000 driver will be truncated significantly.

New unpaid project - a real zero-copy network receive path.
The main idea is to mark a socket as being a pipe into a file, using sock_sendfile() or sock_recvfile(), so when the network driver allocates a new skb for received data, this skb will be allocated using the file's cached pages if its data should go into the specified socket, which is determined from the packet header.

But this will be started after I finish porting ESP4 IPsec input processing path to acrypto, which should happen very soon.

Excellent climbing after excellent day - what could be better?
Not many routes, and the new one was not finished, but nevertheless, it was really cool.

Gee, Grange's GPIO framework has been committed into NetBSD CVS tree. My congratulations!

:: Link / Comments (0)


Thu, 06 Oct 2005

XFRM hacking.


Updated the XFRM engine to support a callback mechanism - it works very well with synchronous ESP crypto processing, but with acrypto the TCP header is mangled, while the IP header is always fine. Magic.

Playing with wavelets a little - created a simple program to decompose and reconstruct black-and-white images - it is a first step towards wavelet interpolation.
Results are contradictory - the image became completely different, and the nonstandard transformation produced negative factors, which never existed in the original basis.
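
For reference, a minimal sketch of one level of a 1D Haar decomposition and its inverse. It is an assumption that a Haar-like basis is a fair stand-in here, the entry does not say which wavelet the program uses; haar_forward() and haar_inverse() are hypothetical names. Note that in this basis the detail factors are signed even when all input pixels are non-negative.

```c
#include <assert.h>
#include <stddef.h>

/*
 * One Haar level over n samples (n must be even).
 * out[0 .. n/2)  holds pairwise averages,
 * out[n/2 .. n)  holds pairwise half-differences (details).
 */
static void haar_forward(const double *in, double *out, size_t n)
{
	size_t i, half = n / 2;

	assert(n % 2 == 0);
	for (i = 0; i < half; i++) {
		out[i] = (in[2 * i] + in[2 * i + 1]) / 2.0;
		out[half + i] = (in[2 * i] - in[2 * i + 1]) / 2.0;
	}
}

/* Exact inverse of haar_forward(): rebuild each pair from average and detail. */
static void haar_inverse(const double *in, double *out, size_t n)
{
	size_t i, half = n / 2;

	assert(n % 2 == 0);
	for (i = 0; i < half; i++) {
		out[2 * i] = in[i] + in[half + i];
		out[2 * i + 1] = in[i] - in[half + i];
	}
}
```

A 2D image transform would apply this along rows and columns; if reconstruction gives a different image, comparing haar_inverse(haar_forward(x)) with x on a single row is a cheap first check.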

:: Link / Comments (0)


Wed, 05 Oct 2005

Acrypto hacking.


Input IPsec processing ported to acrypto hangs much more rarely now, but can still freeze the machine completely. And the packet is not delivered to the higher layer yet, but that is not a big problem. The main issue is a crash, which happens at an absolutely unpredictable time...

Added I2C RTC support for the D16 board - it uses a DS1338 chip, and the driver was copied from the lm_sensors tree.

It was a day of relaxed climbing - only half of one new route on the negative slope; that route really deserves its category, since I definitely failed there on-sight.

:: Link / Comments (0)


Tue, 04 Oct 2005

Acrypto hacking.


I've moved forward in porting input IPsec processing mode to acrypto - the system already hangs, which is not always a bad sign!
The easy way, where I assumed that an skb can be processed asynchronously without being cloned, was wrong, as expected, so I will experiment with a more complex approach to SKB usage tomorrow.

Hacked the PPC board a little - I finally found a way to log all output to stderr and stdout, i.e. console output, into my own tty driver, which moves that data in the same way all kernel messages go through the console driver. Debugging became much easier.

:: Link / Comments (0)


Mon, 03 Oct 2005

Weekend status.


1. Without good company, half a liter of tequila does not go down very well.
2. I found that I do not eat for a day after big drinking, and the next day is spent almost only eating.
3. Good drinking refreshes the brain very well. Only good climbing can be compared in this respect.
4. It was not bad.

XFRM hacks.
I've changed the input XFRM engine to support recursive callbacks, so with the right shared structure it can be used with acrypto. The main problem is to pass it somehow into the ESP/AH input methods; probably I will add some pointer into the XFRM state structure...
The patch has been sent to the netdev@ and linux-crypto@ mailing lists for review; today or tomorrow I plan to implement acrypto support for the input IPsec crypto processing path.
Herbert Xu, the current Linux kernel crypto maintainer, announced support for several crypto algorithms in the current CryptoAPI stack. It is a first step towards asynchronous crypto processing support in this stack. Like acrypto and OCF it now supports priority of algorithm implementations.
XFRM Patch can also be found in archive.

Climbing was very good today - I tried two new complex routes; one has been finished, but I completely failed to finish the second one - I just do not know how to move there. The bad thing is that I did not even see how other people did it; maybe no one even tried it except the author, and that was quite a long time ago, so I do not remember it at all, except that it was a hard route.

:: Link / Comments (0)


Thu, 29 Sep 2005

Userspace init hacking.


A complex chain has been created to satisfy our embedded/box needs - three steps must be taken to run userspace on the D16 board.
The first init runs from the initrd and is linked into the kernel image, so it lives either on flash or is loaded over PCI. This first init multicasts its IP address into a predefined group, so that a remote initialisation application can connect to it and send several loopback images and files, which are then mounted into tmpfs; root is changed into this in-RAM directory, where the second init is started. It parses the initial configs and starts other daemons, one of which begins to multicast the board's state into a predefined group, so that the remote initialisation application can load firmware into the DSPs and configure them.
The design looks complex, but it allows us to completely eliminate the NFS server, and thus to put any set of embedded boards into a remote network without trashing their infrastructure, since only one application is required to fully configure all devices.

:: Link / Comments (0)


Tue, 27 Sep 2005

DARPA Grand Challenge rox!


DARPA Grand Challenge is a field test intended to accelerate research and development in autonomous ground vehicles.
Engineering forces produced different machine types - from production models like VW, Hummer and Nissan, equipped with an extremely high-tech electronic driver which can even switch the gearbox using a mechanical "arm", to heavy gear specially designed for the California desert and even motorcycles from the Berkeley team. There is even one team from a high school in an Acura MDX car.
I wish Russian universities could have something even similar. I'm not talking about a fully equipped car, but at least some kind of automotive robotics. In MIPT, where I studied, my friend worked with one Panasonic robot, which had a simple video camera, gears and very weak batteries. There was RedHat Linux in this box, and it could only automatically turn its camera after a laser pointer. It had an operator's console which allowed one to manipulate the camera and move the gears.
I suspect this robot now lives in some box under a thick layer of dust somewhere...

Netlink in the current 2.6 kernel has an issue, which can be described in different ways: by default all netlink sockets can only broadcast data from kernelspace to the first few groups (read only to one group), i.e. 1, 2 and 3, but if someone wants to broadcast to, for example, group 0x123, it is required to call the multicast binding socket option from userspace. I've created a patch which allows broadcasting to the group number specified at bind() time, as well as to additional multicast groups. The patch is quite simple - it was sent to netdev@ for review, since I did not investigate its impact too deeply.

:: Link / Comments (0)


Mon, 26 Sep 2005

Userspace hacking day.


Hacked init to create rootfs without NFS - the basic idea is to multicast its IP address, obtained either from DHCP or assigned in some other way, and create a listening socket, through which the whole rootfs, compressed or not, as files or as an image, can be loaded from the network, so init can create a filesystem in RAM and pivot into it.

Unfortunately I have absolutely no time to finish asynchronous input IPsec processing, but I will finish it very soon.

Climbing today was very good - although I was alone and climbed only traverses and boulderings, many of them were very good. Good physical training after good hacking day - that is exactly what is needed.

:: Link / Comments (0)


Sun, 25 Sep 2005

Several days of total stupidity.


Absolute inactivity, like sitting in a swamp. Nothing.

Today I found several one-man performances by Evgeniy Grishkovets - watching them was a really good time - I recommend them to everyone.

	... they are busy with something over there, well, let them be busy,
	maybe today they will forget to wake me up...
	But you can hear it - they did not forget. They are coming to wake you.
	And the thing is, it is not far to walk.
	But you manage to put on your most pitiful face,
	a face like... Like this...
	Well, the kind of face which means that an angel is sleeping here,
	he must not be woken... Or he will fly away...
	But they came in, did not look at the face,
	did not look at the wings neatly folded behind the back,
	just switched on the light and left.

:: Link / Comments (0)


Thu, 22 Sep 2005

Initrd linked with ppc405 embedded image.


It is the first time I put an initrd into an embedded image, and with all respect to kernel developers, it has been done very smoothly.

Grange has ported OpenBSD to the NEC MobilePro 780 Handheld PC with LE MIPS64. It is the first and unique port of OpenBSD to this processor, written from scratch, since OpenBSD never supported LE MIPS.
Here is a screenshot of the board.
My congratulations!

Read the math fundamentals of wavelet processing - some time ago I had an idea to implement an algorithm which could sharpen zoomed images. Since a wavelet basis has information not only about the current point or pixel, but about a set of them, classical interpolation of transformation factors can lead to factor prediction and thus image quality improvements. At that time I implemented a simple program which selects a real image from the database based on a hand-written, scanned or damaged one, by comparing the most significant factors of the wavelet transformation. It worked quite well with images processed by Photoshop using some "corrupting" filters.

:: Link / Comments (0)


Wed, 21 Sep 2005

Map analyzer hacks or power of math.


The analyzer already works with Dijkstra's algorithm, so it easily finds the shortest path between any two points of the road on the original bitmap map from a scanner or the internet, but it only produces uninteresting pieces along the path - so I've written a least-squares interpolation for those points. The result can be found on the screenshot, where the blue line is a 3rd-order polynomial interpolation for the corner points of the shortest path (red circles on the map), and the green line is a Bezier interpolation from Gnuplot.
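
A minimal sketch of a least-squares fit of a 3rd-order polynomial, assuming the plain normal-equations method with Gaussian elimination; the entry does not say how the analyzer solves the system, and polyfit3() is a hypothetical name. For real map coordinates the x values would likely need to be shifted and scaled first, since normal equations become ill-conditioned with large inputs.

```c
#include <assert.h>
#include <stddef.h>

#define DEG 3			/* cubic, as in the entry above */
#define NC  (DEG + 1)		/* number of coefficients */

static double dabs(double v)
{
	return v < 0 ? -v : v;
}

/*
 * Fit y ~ c[0] + c[1]*x + c[2]*x^2 + c[3]*x^3 in the least-squares sense.
 * Returns 0 on success, -1 if the normal equations look singular.
 */
static int polyfit3(const double *x, const double *y, size_t n, double c[NC])
{
	double a[NC][NC + 1] = { { 0 } };
	size_t i;
	int r, k, j;

	assert(n >= NC);

	/* Build the augmented normal equations (V^T V | V^T y). */
	for (i = 0; i < n; i++) {
		double p[2 * DEG + 1];

		p[0] = 1.0;
		for (k = 1; k <= 2 * DEG; k++)
			p[k] = p[k - 1] * x[i];
		for (r = 0; r < NC; r++) {
			for (k = 0; k < NC; k++)
				a[r][k] += p[r + k];
			a[r][NC] += p[r] * y[i];
		}
	}

	/* Gaussian elimination with partial pivoting. */
	for (k = 0; k < NC; k++) {
		int piv = k;

		for (r = k + 1; r < NC; r++)
			if (dabs(a[r][k]) > dabs(a[piv][k]))
				piv = r;
		if (dabs(a[piv][k]) < 1e-12)
			return -1;
		for (j = 0; piv != k && j <= NC; j++) {
			double t = a[k][j];

			a[k][j] = a[piv][j];
			a[piv][j] = t;
		}
		for (r = k + 1; r < NC; r++) {
			double f = a[r][k] / a[k][k];

			for (j = k; j <= NC; j++)
				a[r][j] -= f * a[k][j];
		}
	}

	/* Back substitution. */
	for (k = NC - 1; k >= 0; k--) {
		double s = a[k][NC];

		for (j = k + 1; j < NC; j++)
			s -= a[k][j] * c[j];
		c[k] = s / a[k][k];
	}
	return 0;
}
```

Feeding it the corner points of the shortest path as (x, y) pairs gives the four coefficients of the fitted curve.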

:: Link / Comments (0)


Tue, 20 Sep 2005

Hacking on sock_sendfile(). Continue.


I've created nice CPU usage graph for 2.6.13-rc6 and 2.6.14-rc1-git trees for recv()/write() and receiving sendfile() usage.
[graph: sock_sendfile_cpu_usage]

Patch is available in archive.

Acrypto hacking - input asynchronous IPsec ESP processing implementation goes very slowly - due to possibility to have several encapsulated headers, each XFRM state should be processed synchronously after previous, so it must be called from acrypto callback, which should be stackable...

:: Link / Comments (0)


Mon, 19 Sep 2005

Hacking on sock_sendfile().


The whole day was spent trying to understand, why sock_sendfile() is slower than recv()/write() sometimes.
It looks like the root of this lives in CPU usage - with sendfile(), and thus sock_sendfile(), it is always lower than with recv()/write(). Oprofile data shows that poll_idle is lower and third place is taken by __copy_from_user_ll() in the recv()/write() case. CPU usage probably does not go up because the remote side just can not fill the pipe.

Climbed a little today - the shoes still damage my feet, so some routes look more complex, but I'm sure that in a couple of trainings things will get better.

:: Link / Comments (0)


Sat, 17 Sep 2005

sock_sendfile() hacking.


I've created a new version of sock_sendfile() which can be used for receiving data into a file without meaningless interaction with userspace.
The previous implementation works in the following way:
it allocates a page, receives data into it using recvmsg() and then calls file_send_actor(), which is basically a wrapper over the ->sendpage() method; I implemented the ->sendpage() method to grab a page from VFS and then memcpy() data from the given page.
Such an approach removes copying data from userspace when doing the write operation, and the performance improvement was about 5% per calling thread, which was about 20 seconds when downloading a 650 MB ISO.

In the new approach the system grabs a page from VFS and receives data directly into it, which should completely eliminate any data copying.
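
For comparison, a minimal sketch of the userspace recv()/write() loop that both versions are measured against: every chunk is copied from the kernel into a user buffer and then back into the kernel. recv_write_copy() is a hypothetical helper name and the 4096-byte buffer size is an arbitrary choice, not something taken from the test program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Baseline path: recv() copies data from the socket into a userspace
 * buffer, write() copies it back into the kernel (page cache for a file).
 * Returns the number of bytes copied, or -1 on error.
 */
static long recv_write_copy(int sock, int out_fd)
{
	char buf[4096];
	long total = 0;

	assert(sock >= 0 && out_fd >= 0);
	for (;;) {
		ssize_t n = recv(sock, buf, sizeof(buf), 0);
		ssize_t off = 0;

		if (n == 0)
			return total;		/* peer closed the connection */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		while (off < n) {		/* write() may be short */
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

Both copies in this loop are what a receiving sendfile() tries to avoid.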

:: Link / Comments (0)


Fri, 16 Sep 2005

Acrypto input hacking.


Basic part is done - decrypting part is localized and infrastructure for acrypto has been created, so it could be as simple as output path, but input XFRM engine is much more complex and this part is not even touched yet.
The most complex part is data decapsulating, which happens after decryption.

Hmm, the Linux networking input path is completely synchronous once IP processing has started; it does not have any kind of dst_entry there, so it can not be split into several pieces, some of which could be offloaded into hardware. This means that it is not possible to create simple asynchronous IPsec crypto processing for the input networking path.

Although everything can be much easier...
What if we just say to the system, that packet has been delivered - as far as I can see nothing prevents from it, if so, acrypto can just call all xfrm processing code in asynchronous mode, and then call netif_rx(skb) with the new SKB itself.
It is an idea.

On the embedded side there is good news: kernel and userspace compiled with gcc-3.4.4 work excellently together, so I definitely recommend using gcc3 instead of gcc4 for PowerPC cross-compilation.

Climbed a little in new climbing shoes - Boreal Spider - their base is stronger, so I think they will stay better on small holds. These shoes are really good, but until they are broken in they damage my feet a lot.

:: Link / Comments (0)


Wed, 14 Sep 2005

Connector update.


If the input message rate from userspace is too high, do not drop messages, but try to deliver them using work queue allocation. Failing there is some kind of congestion control. It also removes the warn_on on this condition, which scares people.

The updated version was sent for inclusion and is available in archive.

Started working on input esp4 processing support for acrypto.

:: Link / Comments (0)


Tue, 13 Sep 2005

Connector is in Linus' tree.


:: Link / Comments (0)


PPC init problems.


The bug was narrowed down to the ELF interpreter - it somehow misses the brk adjustment, so later it falls into do_page_fault() without a proper VMA.
The magic thing is that if I add any memory barrier into load_elf_binary() inside the binary mapping loop, the binary is loaded properly with the right brk adjustment.

I found the problem - it is GCC-4.1-20050716.
There is the following piece of code in load_elf_binary():

		k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
		if (k > elf_bss)
			elf_bss = k;
		if ((elf_ppnt->p_flags & PF_X) && end_code < k)
			end_code = k;
		if (end_data < k)
			end_data = k;
		k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
Reading the "k" variable at the end of this block shows that it is not equal to elf_ppnt->p_vaddr + elf_ppnt->p_memsz:
load_elf_binary: k=00000000, elf_brk=00000000, p_vaddr=10070000, p_filesz=00001b0c, p_memsz=00002780.
Making "k" volatile fixes the init problem on ppc405gpr with the Linux 2.6 kernel.
What puzzles me is that a 2.4 kernel compiled with the same compiler works fine.

:: Link / Comments (0)


Mon, 12 Sep 2005

PPC init problems.


Today's git pull has brought me a nice debug message in dmesg:

init has generated signal 11 but has no handler for it
Kernel panic - not syncing: Attempted to kill init!
 <0>Rebooting in 180 seconds..
After reading git commits I found this patch from Paul Mackerras:
    [PATCH] ppc32: Kill init on unhandled synchronous signals

    
    This is a patch that I have had in my tree for ages.  If init causes
    an exception that raises a signal, such as a SIGSEGV, SIGILL or
    SIGFPE, and it hasn't registered a handler for it, we don't deliver
    the signal, since init doesn't get any signals that it doesn't have a
    handler for.  But that means that we just return to userland and
    generate the same exception again immediately.  With this patch we
    print a message and kill init in this situation.
    
    This is very useful when you have a bug in the kernel that means that
    init doesn't get as far as executing its first instruction. :)
    Without this patch the system hangs when it gets to starting the
    userland init; with it you at least get a message giving you a clue
    about what has gone wrong.
This means that my 2.6 ppc kernel is broken, and that is quite good news, since digging into compiler problems could be much worse.



Sun, 11 Sep 2005

Connector in -mm tree again.


It has a strange life - the first time it was added into the -mm tree was more than a year ago, by GregKH together with my SuperIO patch, which provided generic access to standard SuperIO chips and the devices on their bus, namely GPIO and access bus on the scx100 board and pc8736x chips. The interface allows very easy addition of new devices and chips. It still lives in archive in the soekris directory, although I have not tried it for quite some time, since I do not have the hardware.

Then Greg decided to change his policy of adding stuff, so I needed to add it directly through Andrew Morton. I first tried to push connector, but people did not see any profit in having it; they thought that macros on top of netlink allocation could help create a real message bus, so it was removed again.

I hope this time I have convinced people that the whole connector subsystem is definitely required to have a powerful bidirectional event bus. It has a very simple mechanism of event allocation and notification, and a very convenient method of receiving messages from userspace based on callback registration.
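
The callback-registration idea can be modelled in a few lines of userspace C. This is only a sketch under my own assumptions: bus_id, bus_register(), bus_dispatch() and bus_count_bytes() are hypothetical names, and this is not the kernel connector API. A receiver registers a callback for an (idx, val) pair, and every message carrying that pair is handed to the callback.

```c
#include <stddef.h>

/* Hypothetical userspace model, not the kernel connector API. */
struct bus_id { unsigned int idx, val; };

typedef void (*bus_cb)(const void *data, size_t len, void *priv);

struct bus_entry { struct bus_id id; bus_cb cb; void *priv; };

#define BUS_MAX 16
static struct bus_entry bus_table[BUS_MAX];
static size_t bus_used;

/* Registers cb for id; returns 0, or -1 if the id is taken or the table is full. */
static int bus_register(struct bus_id id, bus_cb cb, void *priv)
{
	size_t i;

	for (i = 0; i < bus_used; i++)
		if (bus_table[i].id.idx == id.idx && bus_table[i].id.val == id.val)
			return -1;
	if (bus_used == BUS_MAX)
		return -1;
	bus_table[bus_used].id = id;
	bus_table[bus_used].cb = cb;
	bus_table[bus_used].priv = priv;
	bus_used++;
	return 0;
}

/* Hands a message to the callback registered for id; returns -1 if there is none. */
static int bus_dispatch(struct bus_id id, const void *data, size_t len)
{
	size_t i;

	for (i = 0; i < bus_used; i++) {
		if (bus_table[i].id.idx == id.idx && bus_table[i].id.val == id.val) {
			bus_table[i].cb(data, len, bus_table[i].priv);
			return 0;
		}
	}
	return -1;
}

/* Sample callback: adds the message length to the size_t counter behind priv. */
static void bus_count_bytes(const void *data, size_t len, void *priv)
{
	(void)data;
	*(size_t *)priv += len;
}
```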

Here is connector's homepage.

Connector will be pushed into Linus tree today.



Sat, 10 Sep 2005

sock_sendfile()/generic_file_sendpage() numbers.


Using recv()/write() is about 5% slower in one thread and about 8-10% slower using two threads compared to the new sendfile() on a 1+1 hyperthreaded machine. The server uses sendfile.

 
recv()/write(): 
3m45.424s  659905536
3m50.352s  659905536

receiving sendfile():
3m29.432s  659905536
3m33.065s  659905536
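
The relative slowdown in the timings above can be checked with a little arithmetic. The helper names are hypothetical, and I assume the rows pair up in order (first recv()/write() row against first sendfile() row, second against second); under that assumption the listed runs come out roughly 7.6% and 8.1% slower.

```c
/* Converts a "XmY.YYYs" timing, split into its two parts, to seconds. */
static double to_seconds(unsigned int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Relative slowdown of `slow` against `fast`, in percent. */
static double slowdown_percent(double slow, double fast)
{
	return (slow / fast - 1.0) * 100.0;
}
```

For example, slowdown_percent(to_seconds(3, 45.424), to_seconds(3, 29.432)) gives the figure for the first pair of rows.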



Fri, 09 Sep 2005

It looks like I found the root of the problem with not running userspace on PPC405.


IBM has the famous erratum 77, which requires a sync or dcbt instruction to be executed before stwcx. None of my compilers emit it, but the kernel itself has such a workaround.

lwarx/stwcx. are used for implementing atomic operations, so they are heavily used in linuxthreads/pthreads.

Or maybe not, AMCC affirms that the GPr core does not have this bug.
After some gcc/glibc hacks I can confirm that this was not the case - userspace starts, but does not run.
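
For reference, lwarx/stwcx. form a load-reserve/store-conditional pair, so every atomic operation built on them is a retry loop. Below is a portable C11 analogue of such a loop, not the linuxthreads or glibc code; the function name is hypothetical. On PowerPC a compiler would typically lower the compare-exchange to an lwarx/stwcx. loop, and that is where the erratum workaround would have to be inserted.

```c
#include <stdatomic.h>

/* Atomically adds delta to *v and returns the new value. */
static int atomic_add_return_sketch(atomic_int *v, int delta)
{
	int old = atomic_load(v);

	/* On failure `old` is refreshed with the current value, so just retry. */
	while (!atomic_compare_exchange_weak(v, &old, old + delta))
		;
	return old + delta;
}
```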

Climbing was good today - old traces were done - jumping, yellow, black, several boulderings and one new trace finished on-sight - it was good time, which cured the day.



Thu, 08 Sep 2005

PPC userspace cross-compilation.


The whole day was spent in attempts to run init on the ppc board. It does not run. After more than 12 hours spent on this problem, I can only create the following table:

gcc version 3.2.1 --with-cpu=405 --nfp --without-fp        userspace works

gcc version 4.1.0 20050702 (experimental) x86 build
--without-fp --nfp --with-cpu=405 
--enable-cxx-flags=-mcpu=405
--disable-nls --enable-threads=posix 
--enable-symvers=gnu --enable-__cxa_atexit 
--enable-languages=c,c++ --enable-shared 
--enable-c99 --enable-long-long                            userspace does not work

gcc version 4.1.0 20050716 (experimental) x86_64 build     userspace does not work
Linux kernel 2.6.13 is compiled with the second compiler and it works fine.

Binutils for the last two compilers is the same, 2.16.1.
Userspace does not crash and is not completely broken - I have one binary compiled with the 4.1.0 compiler which works. I can even create a magic sequence of statements in main(), and the new binary will also work, but if I then add an additional a = b;, it stops...
I've started compiling gcc-3.4.4 with new binutils, hopefully it will work better.



Wed, 07 Sep 2005

Init.


I'm stupid - I ported linux kernel 2.6 to the new board, rewrote the ppc BIOS and... can not compile init. I have one binary without source, which only remounts root read-write and creates one file in /tmp - it works. Now I have a source which does exit(-1) on startup; init is run by the kernel, but it does not get into main() and does not call exit().

With this stuff in brain I went climbing alone - rubbed fingers, found couple of new interesting boulderings and starts - it was fun.



Tue, 06 Sep 2005

PPC hacking.

It becomes boring - the previous digital recording board D16 had half of its DSPs on the low half of the address bus and the other half on the high addresses, so there was the following code:

if (((unsigned long)dsp->phys_pm & 0x00020000) == 0x00000000)
	*((uint32_t *) dsp->pm_base + 0x1) = (DSP_DM_ADDR + addr);
else
	*((uint32_t *) dsp->pm_base + 0x1) = ((DSP_DM_ADDR + addr) << 16);
...

if (((unsigned long)dsp->phys_pm & 0x00020000) == 0x00000000)
	value = (0xffff & tmp0) + (tmp1 << 16);
else
	value = (tmp0 >> 16) + (tmp1 & 0xffff0000);
The current board has the following address bus crossing, AABBCCDD -> BBAADDCC, so the new board has the following:
	*(volatile u16 *)dsp->vcs2 = addr;
	value = *(volatile u32 *)dsp->vcs1;
	val[0] = value & 0xffff;
	val[1] = (value >> 16) & 0xffff;
	mod_value = __le32_to_cpu((__le16_to_cpu(val[0]) | (__le16_to_cpu(val[1]) << 16)));

The hardware guys have real fun developing such schemes, I think.
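
The crossing itself, AABBCCDD -> BBAADDCC, is a byte swap inside each 16-bit half. A minimal sketch of just that permutation follows; the function name is hypothetical, and this is not the driver's full conversion, which also goes through the __le16_to_cpu()/__le32_to_cpu() helpers shown above.

```c
#include <stdint.h>

/* Swaps the two bytes inside each 16-bit half: AABBCCDD -> BBAADDCC. */
static uint32_t swap_bytes_in_halves(uint32_t v)
{
	return ((v & 0x00ff00ffu) << 8) | ((v & 0xff00ff00u) >> 8);
}
```

The permutation is its own inverse, so applying it twice returns the original word.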



Mon, 05 Sep 2005

dm-crypt hacking.


Unfortunately it is broken - device mapper itself was not designed for asynchronous operations: none of its shared objects can store private variables, since they can be accessed in parallel, and those which can are only allocated on the stack as local variables. So the most bug-free solution I see is to allocate my own objects on the heap, but this will add additional overhead. Here is an updated version of the dm-crypt port to acrypto.
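
A userspace sketch of the heap-allocated per-request object idea. All names here (async_req_ctx, async_req_alloc(), async_req_complete(), record_sector()) are hypothetical and this is not the code from the patch: the point is only that the state must outlive the submitting function's stack frame, because completion happens later, and that each request pays for one allocation and one free.

```c
#include <stdlib.h>

/* Hypothetical per-request state that must survive until completion. */
struct async_req_ctx {
	unsigned long sector;
	int error;
	void (*done)(struct async_req_ctx *ctx);
	void *owner_data;
};

/* Allocates the state on the heap, since the caller's stack frame will be gone. */
static struct async_req_ctx *async_req_alloc(unsigned long sector,
					     void (*done)(struct async_req_ctx *),
					     void *owner_data)
{
	struct async_req_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->sector = sector;
	ctx->error = 0;
	ctx->done = done;
	ctx->owner_data = owner_data;
	return ctx;
}

/* Called from the (simulated) completion path: runs the callback, then frees. */
static void async_req_complete(struct async_req_ctx *ctx, int error)
{
	ctx->error = error;
	if (ctx->done)
		ctx->done(ctx);
	free(ctx);
}

/* Sample callback: stores the finished sector in the unsigned long behind owner_data. */
static void record_sector(struct async_req_ctx *ctx)
{
	*(unsigned long *)ctx->owner_data = ctx->sector;
}
```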

Ronen Shitrit [rshitrit_marvell.com] has ported dm-crypt for acrypto to OCF.

Climbed a lot today - a bad thing happened - my new climbing shoes have been rubbed through, crap, I bought them less than a month ago. I climbed a couple of interesting traces, even with dynamic jumps - it was fine. Somehow finished a 6b+ on-sight - I do think it is either not 6b+, or I grabbed several wrong holds.



Sun, 04 Sep 2005

I congratulate you with Day of Moscow.


Central streets are closed to cars, there is artificially created good weather and many entertainments and fun.

Added couple of acrypto helpers for TFM bridge, like converters from TFM to acrypto mode and type.
As usual, updated version is available in archive.

I've updated the dm-crypt patch to fix an issue with multiple dm-crypt devices for different partitions. The bug could slow down processing, and probably could cause an oops. Its md5sum is 318261505489fa0f46d030cdf3844b35.



Sat, 03 Sep 2005

Drunking^W^W^WRelaxing day.


The whole night I drank Olmeca tequila with Grange. Then morning tea, and off to a meeting with friends, where Sauza tequila was flowing.
It is always nice to talk about interesting things with interesting people over a couple of glasses of fire water.



Fri, 02 Sep 2005

dm-crypt ported to acrypto.


I've announced dm-crypt-2.6.13.diff patch in linux-crypto@ and put it into archive.

I've implemented the generic_file_sendpage() method and sock_sendfile(), so now Linux users can use the sendfile() system call for communication between any file descriptors, like socket<->socket and socket<->file. The main advantage is that for socket->file one does not need to copy data from a userspace buffer into the kernel with the very slow copy_from_user() when doing write(). The patch was presented in netdev@.

Patch against 2.6.13 is available in archive.
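
For comparison, this is the userspace path that a socket->file sendfile() avoids: a plain recv()/write() loop in which every chunk is copied out of the kernel into a buffer and then back in. A minimal sketch; the function name and the 4096-byte buffer size are my own choices.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copies everything readable from sock into out_fd through a userspace
 * buffer. Returns the number of bytes copied, or -1 on error.
 */
static long long copy_recv_write(int sock, int out_fd)
{
	char buf[4096];
	long long total = 0;

	for (;;) {
		ssize_t n = recv(sock, buf, sizeof(buf), 0);
		ssize_t off = 0;

		if (n == 0)
			return total;	/* peer closed the connection */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		while (off < n) {
			/* write() copies the data from buf back into the kernel */
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```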



Thu, 01 Sep 2005

Day of knowledge.


People start searching for the knowledge today, and I started to port dm-crypt to acrypto. It is almost done, so I plan to clean it up tomorrow and announce in linux-crypto@, next point is input path of in-kernel IPsec stack.

I've finally finished the new complex trace and feel very good about it. Also climbed several old traces, but they were not as cool as the first one. Tried for the second time the trace called "Beaujolais"; I do not like wine, so I failed it again, but I'm quite sure next time I will uncork this bottle.



Tue, 30 Aug 2005

2.6.14 kernel tree will be called "Affluent Albatross".


Fighting with the EMAC driver on our new PPC405GPR based D16 board - it does not work with my Dlink switch (model DES-1008D): permanent FCS errors, and if I omit the FCS check, the frames themselves are broken too. With something from CNet it works fine. I know the switch is crap, and now I have yet another confirmation, on top of autonegotiation not working with the Realtek 8169 PHY and Marvell 88e1111, managed by the r8169 and forcedeth modules.

Climbed with Grange today, he finally returned from vacations. It was very good - people finally do not scream, that I shin without a rope, so several old traces were done, although not clean - next time it will be better.



Mon, 29 Aug 2005

PPC hacking day.


Ok, the new D16 board has been booted - the main change from the PPC405GP initialisation was only the data cache size - it is now 16K, while in PPC405GP it was 8K, so the BIOS caught an exception when it tried to write into a half-invalidated cache. Other changes were mostly setting different GPIO values. Now linux kernel 2.6.13 works ok. The next task is to change the DSP drivers to the new addressing scheme.

I've implemented simple (in no way zero-copy) sendfile support for socket <-> socket connections.
The patch is quite stupid - it uses kernel_recvmsg() into a preallocated page and then hands that page to the given actor() method.
If this approach is considered useful I will probably implement a sendpage() method for filesystems, so it could be really useful for huge ftp uploads and so on.
Until then you can find the patch in archive.



Sun, 28 Aug 2005

The laziest day.


I feel myself as being a soulsick - why, should I ask, I spent several hours in my bed... watching all three parts of Harry Potter?
In russian it is spelled similar to "Potniy", i.e. Harry Potniy, which is Harry Wet in rough translation.
And I lie in my bed and ... and watching Harry "Wet" Potter.

Crap.



Fri, 26 Aug 2005

Connector updated.


Ok, I've sent an updated netlink connector version to the netdev@ mailing list and updated the archive.

4 hours of climbing on two traces - I found a new very complex bouldering and worked on one traverse. It was a very good time. I found that the pain after training has moved from the hands into the fingers and their first phalanges - the finger pads are just rubbed.



Wed, 24 Aug 2005

Lazy day.


Nothing interesting happened today - the new board does not boot - the ppc405GP BIOS can not be started on ppc405GPr, probably it freezes on data cache initialisation, and for now it is impossible to reprogram the EEPROM, so it will wait until Monday when the programmer will be delivered.

Climbed a little today - it was huge crowd of people in Skala-city, so I only did several old traverses and starts. I've bought quarter season ticket to the climbing zone, so it could be called regular training start.



Tue, 23 Aug 2005

PPC hacking.


I've finished the ppc port to the old D16 board. Today I hacked miscellaneous tools and host drivers for our new D16 board. Here is a photo (Warning, 4Mb!). It is a PPC405GPr based embedded board with 32Mb SDRAM and 2MB flash memory installed. It carries 4 Analog Devices ADSP-2185 and a set of i2c devices - RTC, thermal sensor. Tomorrow I will start porting the 2.4/2.6 kernel to this board.



Mon, 22 Aug 2005

Linux kernel 2.6 PPC ported to D16 digital board.


Ok, it works!
The most complex and interesting part was hacking the board's kernel loader, and searching head.S and relocate_kernel.S for the place where the kernel stops, using a stub like this:

        lis 0,0xaabb
        ori 0,0,52445
        lis 9,0x80
        stw 0,0(9)
_loop_:
	b _loop_
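
In C terms the stub builds one 32-bit marker, stores it at a fixed address and then spins. A sketch of the arithmetic only; the helper names are hypothetical, and the store goes to a caller-supplied word here instead of the real address the stub uses.

```c
#include <stdint.h>

/* Value built by `lis 0,0xaabb` followed by `ori 0,0,52445`. */
static uint32_t stub_marker(void)
{
	return ((uint32_t)0xaabb << 16) | 52445u;	/* 52445 == 0xccdd */
}

/* Address built by `lis 9,0x80`: the immediate lands in the upper 16 bits. */
static uint32_t stub_address(void)
{
	return (uint32_t)0x80 << 16;
}

/* Models `stw 0,0(9)` against a caller-supplied word instead of real memory. */
static void stub_store(volatile uint32_t *word)
{
	*word = stub_marker();
}
```

So the marker is 0xaabbccdd and the target address is 0x00800000; moving the stub through the boot code and checking whether the marker appears tells how far execution got.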
Here is complete dmesg:
<5>Linux version 2.6.13-rc6 (s0mbre@uganda) 
(gcc version 4.1.0 20050716 (experimental)) #54 Mon Aug 22 14:40:49 MSD 2005
<4>Running as PCI slave, kernel PCI disabled !
<4>PCLIO_BASE = 0xe7ffe000
<4>PCI bridge regs before fixup 
<4>            ma       la    pcila    pciha
<4> pmm0 00000000 fffe0000 fffe0000 00000000
<4> pmm1 00000000 00000000 00000000 00000000
<4> pmm2 00000000 00000000 00000000 00000000
<4> ptm1 ms: fe000001 la: 00000000
<4> ptm2 ms: fe000001 la: fe000000
<4>PCI bridge regs after fixup 
<4>            ma       la    pcila    pciha
<4> pmm0 c0000001 80000000 80000000 00000000
<4> pmm1 00000000 00000000 00000000 00000000
<4> pmm2 00000000 00000000 00000000 00000000
<4> ptm1 ms: fe000001 la: 00000000
<4> ptm2 ms: fe000001 la: fe000000
<4>Message 1
<0>Message 2
<4>sk: mem_ipc_setup() done
<4>       D16 port (C) 2000-2005
<4>       machine: D16
<4>
<4>  bi_s_version: 
<4>  bi_r_version: 
<4>    bi_memsize: 0x00ff0000	 16320KBytes
<4>bi_enetaddr 0: 732069-703d3a
<4>bi_enetaddr 1: 3a3a3a-643136
<4> pin strapping: 0x6aaa9000
<4>    bi_intfreq: 198Mhz
<4> plb bus clock: 33MHz
<4>bi_pci_busfreq: 33MHz
<4> opb bus clock: 33MHz
<4>cs0 CR: ff09a000 AP: 03840200
<4>cs1 CR: fe07c000 AP: 80000380
<4>cs2 CR: 00000000 AP: 00000000
<4>cs3 CR: 00000000 AP: 00000000
<4>cs4 CR: 00000000 AP: 00000000
<4>cs5 CR: 00000000 AP: 00000000
<4>cs6 CR: 00000000 AP: 00000000
<4>cs7 CR: 00000000 AP: 00000000
<4>EBC0_CFG: 80400000
<7>On node 0 totalpages: 4096
<7>  DMA zone: 4096 pages, LIFO batch:1
<7>  Normal zone: 0 pages, LIFO batch:1
<7>  HighMem zone: 0 pages, LIFO batch:1
<4>Built 1 zonelists
<5>Kernel command line: root=/dev/nfs ip=::::d16-0.net:eth0:any console=ttyS binfo=0xabcd 
idsp=ixpro,d16,8,0xfe020000,0x8000,0,0,0xfe028000,0x8000,0xfe060000,0x8000,0,0,0xfe068000,
0x8000,0xfe0a0000,0x8000,0,0,0xfe0a8000,0x8000,0xfe0d0000,0x8000,0,0,0xfe0d8000,0x8000,
0xfe110000,0x8000,0,0,0xfe118000,0x8000,0xfe150000,0x8000,0,0,0xfe158000,0x8000,0xfe190000,
0x8000,0,0,0xfe198000,0x8000,0xfe1d0000,0x8000,0,0,0xfe1d8000,0x8000 amb=ixpro,d16,1,0xfe300000,
0x1ffff,0x10000,0x18000 l_tx_ptr=0x80000000 l_tx_len=0x1000
<4>binfo_setup: MAC 00:00:dd:dd:ab:cd
<4>PID hash table entries: 128 (order: 7, 2048 bytes)
<4>Console: colour dummy device 80x25
<4>Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
<4>Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
<4>Memory: 13184k available (2260k kernel code, 636k data, 108k init, 0k highmem)
<7>Calibrating delay loop... 196.09 BogoMIPS (lpj=98048)
<4>Mount-cache hash table entries: 512
<6>NET: Registered protocol family 16
<6>PCI: Probing PCI hardware
<3>Memory resource not set for host bridge 0
<4>D16 serial port emulator driver.
<4>mem_con	d16 char io device registered to major: 253 minor:  0
<6>Initializing Cryptographic API
<4>vty_init
<4>tty_register_driver: register_chrdev_region() failed with error=0.
<4>tty_register_driver: register_chrdev_region() failed with error=0.
<4>tty_register_driver: register_chrdev_region() failed with error=0.
<6>Serial: 8250/16550 driver $Revision: 1.90 $ 32 ports, IRQ sharing enabled
<4>tty_register_driver: register_chrdev_region() failed with error=0.
<4>ttyS0 at MMIO 0x0 (irq = 0) is a 16550A
<4>ttyS1 at MMIO 0x0 (irq = 1) is a 16550A
<6>io scheduler noop registered
<6>io scheduler cfq registered
<4>RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
<6>loop: loaded (max 8 devices)
<6>mal0: Initialized, 1 tx channels, 1 rx channels
<6>emac: IBM EMAC Ethernet driver, version 2.0
<6>Maintained by Benjamin Herrenschmidt 
<4>eth0: IBM emac, MAC 00:00:dd:dd:ab:cd
<6>eth0: Found Generic MII PHY (0x02)
<4>netconsole: not configured, aborting
<6>mice: PS/2 mouse device common for all mice
<6>i2c /dev entries driver
<6>IBM IIC driver v2.1
<6>ibm-iic0: using standard (100 kHz) mode
<6>oprofile: using timer interrupt.
<6>NET: Registered protocol family 2
<4>IP route cache hash table entries: 256 (order: -2, 1024 bytes)
<4>TCP established hash table entries: 1024 (order: 1, 8192 bytes)
<4>TCP bind hash table entries: 1024 (order: 2, 28672 bytes)
<6>TCP: Hash tables configured (established 1024 bind 1024)
<6>TCP reno registered
<6>TCP bic registered
<6>NET: Registered protocol family 1
<6>NET: Registered protocol family 17
<6>eth0: Link is Up
<6>eth0: Speed: 100, Full duplex.
<6>eth0: Link is Up
<6>eth0: Speed: 100, Full duplex.
<5>Sending DHCP and RARP requests ., OK
<4>IP-Config: Got DHCP answer from 192.168.0.202, my address is 192.168.0.243
<4>IP-Config: Complete:
<4>      device=eth0, addr=192.168.0.243, mask=255.255.248.0, gw=192.168.0.1,
<4>     host=d16-0, domain=, nis-domain=net,
<4>     bootserver=192.168.0.202, rootserver=192.168.0.188, rootpath=/home/pwp/work/nfsbox
<5>Looking up port of RPC 100003/2 on 192.168.0.188
<5>Looking up port of RPC 100005/1 on 192.168.0.188
<4>VFS: Mounted root (nfs filesystem) readonly.
<4>Freeing unused kernel memory: 108k init
After a good hacking day it was definitely a good climbing evening - I did not shin up very high, but instead found a couple of very interesting starts and small traverses. I also saw one little bouldering, but was too tired to complete it.



Sat, 20 Aug 2005

Embedded PPC hacking.


Finally it boots, although not the whole kernel. Here is first dmesg:

version 2.6.13-rc6 (s0mbre@uganda) 
(gcc version 4.1.0 20050716 (experimental)) #24 Sat Aug 20 21:11:46 MSD 2005
<4>Running as PCI slave, kernel PCI disabled !
<7>On node 0 totalpages: 4096
<7>  DMA zone: 4096 pages, LIFO batch:1
<7>  Normal zone: 0 pages, LIFO batch:1
<7>  HighMem zone: 0 pages, LIFO batch:1
<4>Built 1 zonelists
<5>Kernel command line:
<4>PID hash table entries: 128 (order: 7, 2048 bytes)
<4>Console: colour dummy device 80x25
<4>Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
<4>Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
This required hacking the board's BIOS, i.e. the program which is loaded from the EEPROM before the kernel is loaded and even copied through PCI. It looks like the problem lives in the PCI initialisation part, since the board should not look like a PCI slave. Probably the BIOS fixup is called too late.

Not bad for two days, I think.



Fri, 19 Aug 2005

PPC embedded development.


Ok, the whole morning I've been fighting with the 2.6 kernel on our custom embedded PPC405GP/PPC405GPr board. Now it almost runs!

Linux/PPC load: .... ..Uncompressing Linux.....done.
..initrd moved:  .Now booting the kernel
.exit
...pause
..

..

 -- System halted.0123456789ABCDEF....bad gzipped data
...gunzip: ran out of data in header
..oops... out of memory

On this nice news I climbed very well - I finished a complex old trace and did several interesting starts and boulderings. I'm entering quite a nice stripe in life, with excellent results in almost every aspect of it. I'm quite sure it will end, but that will be a well-deserved rest from the active time.



Thu, 18 Aug 2005

PPC development day.


It was not so easy to extract my PPC tree from BitKeeper, since bk refuses to work at all after Jul 1, so I found Andrew Tridgell's sourcepuller and extracted the tree. Then I found that the latest version I had worked with was 2.4.26, which is definitely not what I wanted to work with, so after several hours of merging/reading/writing/thinking I compiled a 2.4.26 kernel for our ppc405 based platform from pure 2.4.26, without the huge Montavista patches. It works, which is very nice. Later today I will merge this project with the latest 2.4 tree.
The main purpose of these steps is to port the 2.6 kernel to this platform. The size of the resulting patch is about 1.5 Mbytes.



Wed, 17 Aug 2005

Acrypto and asynchronous IPsec.


Something major happened between 2.6.12-rc2 and the current kernel in the XFRM processing engine. My old proof-of-concept patch can only send 5 packets now, but with the old kernel its asynchronous performance was almost the same as that of the synchronous stack. So I need to investigate which changes in the XFRM stack can cause such behaviour.

Testing shows that it is ICMP which behaves so strangely - TCP ssh over asynchronous IPsec works perfectly without any stalls.

Ok, ICMP problem found - raw_sendmsg()->ip_append_data()->sock_alloc_send_skb()-> sock_alloc_send_pskb()->atomic_read(&sk->sk_wmem_alloc): sk_wmem_alloc is never decreased enough to free space in the socket queue. It is decremented in sock_wfree(), which is called from kfree_skb(), so it looks like raw skbs do not take the same path as TCP/UDP skbs...
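
A toy userspace model of that accounting shows why a sender stalls after a fixed number of packets when the uncharge side never runs. Everything here is a simplified sketch with hypothetical names; the kernel's real check and sizes differ, and the limit of five packets in the usage note is picked only to make the effect visible.

```c
#include <stddef.h>

/* Hypothetical, simplified model of socket send-buffer accounting. */
struct sock_sketch {
	size_t wmem_alloc;	/* bytes charged to packets still in flight */
	size_t sndbuf;		/* upper limit for wmem_alloc */
};

/* Charges a packet to the socket; returns 0, or -1 when the buffer is full. */
static int sketch_alloc_send(struct sock_sketch *sk, size_t truesize)
{
	if (sk->wmem_alloc + truesize > sk->sndbuf)
		return -1;
	sk->wmem_alloc += truesize;
	return 0;
}

/* Uncharges a packet; stands in for sock_wfree() running when the skb is freed. */
static void sketch_wfree(struct sock_sketch *sk, size_t truesize)
{
	sk->wmem_alloc -= truesize;
}

/*
 * Tries to send `count` packets. If `release` is zero the packets are never
 * uncharged, which models skbs that do not reach the normal free path.
 * Returns how many packets could be allocated.
 */
static unsigned int sketch_send_many(struct sock_sketch *sk, unsigned int count,
				     size_t truesize, int release)
{
	unsigned int sent = 0;

	while (sent < count && sketch_alloc_send(sk, truesize) == 0) {
		sent++;
		if (release)
			sketch_wfree(sk, truesize);
	}
	return sent;
}
```

With sndbuf set to five packets' worth of bytes, a run with release == 0 stops after five packets, while a run with release != 0 sends all of them.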

Ok, this problem has been fixed and I released a new patch. Its md5sum is 506ad2ad7148199c29e25d8b0afe0c66.



Tue, 16 Aug 2005

Acrypto hacking.


The first half of the day was spent on PPC cross-platform compilation. Due to a simple typo in the very first config file I needed to recompile glibc with different compilers, faking its configure to allow part of it to be compiled using x86_64 gcc-4.0.1 and the rest using x86_32 gcc-3.2... Crap, but finally it was finished, many thanks to Dan Kegel for his excellent cross-development toolchain.

The second half of the day was spent on acrypto hacking - I replaced yesterday's diff with a new one, but it still has some issues. It is probably broken more deeply than my local version, so do not use it for now, I will investigate it further. The local version has one big problem - it only sends 5 ICMP packets, and then dst->output() is not even called. Tomorrow I will set up my very old proof-of-concept patch with asynchronous IPsec processing and verify that it still works. If it does not, then something major happened in the core network stack between 2.6.12-rc2 and 2.6.13-rc6, which is about 4 months, and I will investigate the current XFRM deeply. If the old patch still works, and I think it will, then something small and stupid sneaked into my code.



Mon, 15 Aug 2005

Acrypto and IPsec.