Mon, 31 Oct 2005
Halloween.
Be aware.
:: Link / Comments (0)
Receiving zero-copy hacking.
Ok, I found why TCP stalled and no ACKs were sent -
the system always fell back to the slow path and then dropped
packets due to an invalid checksum. The root of this
problem is TCP options, specifically the TCP timestamp,
which were not copied into the header part, so the
TCP stack never saw a valid checksum. After the 8139too.c
driver update, which now checks for TCP/IP options,
it looks like a TCP stream can be established
the right way.
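To illustrate why missing options break the checksum, here is a minimal userspace sketch in Python. It is not the actual 8139too.c change. It reads the TCP data offset field to find the real header length (20 bytes plus options) and shows that an Internet checksum computed without the option bytes differs from the one computed over the full segment. The helper names and the sample segment are made up for this example, and the pseudo-header is left out for brevity.

```python
import struct

TCP_MIN_HEADER = 20  # TCP header length without any options


def tcp_header_len(segment: bytes) -> int:
    """Real TCP header length in bytes, taken from the data offset field."""
    data_offset_words = segment[12] >> 4  # upper 4 bits of byte 12
    return data_offset_words * 4


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum (RFC 1071), no pseudo-header."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_segment_with_timestamp() -> bytes:
    """Hypothetical segment: 20-byte header, NOP, NOP, timestamp, payload."""
    header = struct.pack(
        "!HHIIBBHHH",
        12345, 48192,  # source and destination ports (arbitrary)
        1, 1,          # sequence and acknowledgement numbers
        8 << 4,        # data offset: 8 words = 32 bytes
        0x10,          # ACK flag
        1448, 0, 0,    # window, checksum placeholder, urgent pointer
    )
    options = bytes([1, 1, 8, 10]) + struct.pack("!II", 1, 2)
    return header + options + b"data"
```

Copying only `TCP_MIN_HEADER` bytes as "the header" would leave the 12 option bytes out of the checksummed data, which is the kind of mismatch described above.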
I congratulate Mephody aka Alexander Boykov on his birthday and
am really looking forward to his coming to Russia in two weeks.
That will be a very drinkfun time, so the main
things must be done before that black hole period.
:: Link / Comments (0)
Sun, 30 Oct 2005
Weekend rest.
Something should be done, so I've started
to search for New Year presents.
Mechanism is turned on, so I think
next week I will have something interesting.
Hacked this HTML page a little - now it does not have
annoying horizontal scrolling and it can be accessed
directly from the main page.
:: Link / Comments (0)
Fri, 28 Oct 2005
Receiving zero-copy.
As you probably know, my implementation already works
with non-page-aligned data, and the only problem is
generic stack processing itself - TCP ACKs, socket accounting
and so on. It has been implemented using the fragment array
in the skb's shared area, i.e. the header is placed into skb->data,
and all real data goes directly into the VFS cache, whose page pointers
are stored in skb_shinfo(skb)->frags.
Unfortunately it looks like either fragmented input skbs are not
allowed (at least skb_put() may not be called on a nonlinear skb),
or, which is more likely, the header data and the appropriate skb fields
are set up incorrectly, so the system crashes somewhere in netif_receive_skb().
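The layout described above can be modelled in a few lines of Python. This is only a userspace sketch, not kernel code: `Skb`, `Frag` and `add_payload` are hypothetical stand-ins, and a `bytearray` plays the role of a VFS cache page. The `len` / `data_len` bookkeeping mirrors how the kernel distinguishes the linear part from the paged part.

```python
from dataclasses import dataclass, field
from typing import List

PAGE_SIZE = 4096


@dataclass
class Frag:
    page: bytearray  # stands in for a VFS cache page
    offset: int      # where the payload starts inside the page
    size: int        # number of payload bytes stored in this page


@dataclass
class Skb:
    data: bytes                                  # linear part: headers only
    frags: List[Frag] = field(default_factory=list)

    @property
    def data_len(self) -> int:
        """Bytes that live in page fragments, not in the linear area."""
        return sum(frag.size for frag in self.frags)

    @property
    def len(self) -> int:
        """Total packet length: linear headers plus paged payload."""
        return len(self.data) + self.data_len

    def is_nonlinear(self) -> bool:
        return self.data_len > 0


def add_payload(skb: Skb, page: bytearray, offset: int, payload: bytes) -> None:
    """Write payload straight into the page and reference it from the skb."""
    if offset + len(payload) > len(page):
        raise ValueError("payload does not fit into the page")
    page[offset:offset + len(payload)] = payload
    skb.frags.append(Frag(page, offset, len(payload)))
```

Code that assumes all data sits in the linear area would see only the headers of such an skb, which is why nonlinear input is a likely source of trouble.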
Ok, I've shed some light on this - putting netif_receive_skb()
into a workqueue allowed me to scroll the console and see the EIP, which
pointed at netif_receive_skb() itself, not into its internals,
and the error was a wrong pointer dereference at address 0x106.
Looking more closely at the skb setup in the original 8139 interrupt path,
I found that the ->dev field was not set up at all,
so setting it fixed that bug. I can see ACKs from zero-copy capable hosts,
but after some period of time they stop and the connection stalls.
I've released a new version of the receiving zero-copy concept for the interested reader,
patch
is available in archive.
:: Link / Comments (0)
Thu, 27 Oct 2005
Timer interrupts and signal delivery on PPC.
I've found a very strange thing - a SIGALRM handler
can not safely call sleep(), although man signal says
that sleep() is a signal-safe function.
Also a couple of other safe functions actually cause
the SIGALRM handler to deadlock.
This happens only on our PPC405GPr boards, and is 100%
reproducible. So I spent the whole day moving
the timer's state machine in userspace out of the signal handler.
It looks like it is rock stable now.
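The general pattern of moving work out of a signal handler can be sketched as follows. This is a structural model in Python, not the code running on the PPC boards: the state names and function names are invented, and Python already defers handlers to the main interpreter loop, so the sketch only shows the shape of the solution. In C the flag would typically be a `volatile sig_atomic_t`.

```python
import signal

alarm_pending = False  # the only thing the handler is allowed to touch
state = "idle"         # hypothetical timer state machine

TRANSITIONS = {"idle": "armed", "armed": "fired", "fired": "idle"}


def on_alarm(signum, frame):
    """Signal handler: record that the timer ticked and do nothing else."""
    global alarm_pending
    alarm_pending = True


def poll_timer():
    """Called from the normal program flow; advances the state machine."""
    global alarm_pending, state
    if alarm_pending:
        alarm_pending = False
        state = TRANSITIONS[state]
    return state


def install(interval_seconds: float) -> None:
    """Arm a periodic SIGALRM (Unix only)."""
    signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, interval_seconds, interval_seconds)
```

With this split, several ticks arriving between two `poll_timer()` calls are coalesced into one, which is the usual trade-off of a flag-based design.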
Met with Abr and his family today - I've seen his son Anton
for the first time - a nice small man with a very smart face
and a naturalist and very demanding temper -
he tried to eat everything he could get and definitely
tried to tell us that he does not like our attempts to
prevent this. Unfortunately a 6 month old man can not
express himself in any other way than screaming and laughing,
but his parents can already understand many of his signals.
I spent a very nice evening with them.
:: Link / Comments (0)
Wed, 26 Oct 2005
First snow in Moscow... Winter has come.
Ok, receiving zero-copy can handle non-page-aligned data now,
but it has opened a new problem - when using zero-copy,
the system grabs data frames away from the stack, so the receiving side
can not acknowledge received data and TCP gets stuck.
This is how it looks from the sender's point of view:
12:46:25.439777 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 4213170362:4213171810(1448) ack 1304099580 win 1448
12:47:18.690968 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448
12:49:05.193624 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448
12:51:05.201123 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448
I have an idea about how to fix this:
an skb could be allocated with the header in skb->data
and all the data, even non-page-aligned, in the fragment list,
and then be passed into the stack. Such an skb may not be freed
using the original destructor and can not be mangled in stuff
like netfilter or IPsec; anyway, no one should use IPsec and netfilter
in the fast zero-copy path. The destructor for such an skb should
call kfree_skb_zerocopy() instead of
blindly releasing the fragment list. This requires some thinking...
:: Link / Comments (0)
Tue, 25 Oct 2005
Old friends meeting.
Today Fedor, Irin, Yuliana, Pavel, Alexander and I met Abr,
who returned from England to his family, and they are
going to move to London in a couple of days. It was a very
good time to see them again.
:: Link / Comments (0)
Mon, 24 Oct 2005
Third zero-copy patchset.
I've put a new receiving zero-copy patch in
archive.
It works - the first receiving zero-copy implementation for the Linux kernel.
I will not publish any results now, since testing is only preliminary,
but dataflow can already be established between the zero-copy receiving sendfile
and the original server.
Setup with a small MTU is the main problem now - I do want to have a zero-copy receiving path
for the standard 1500 MTU, and I very much doubt my high-end Realtek 8139 chip supports
4k jumbo frames. Currently it is not solved, and dataflow stops at a
page boundary.
:: Link / Comments (0)
Sat, 22 Oct 2005
Day of new things.
I've bought myself tons of new acquisitions.
Then Grange and I
celebrated it in 5 oborotov,
which has not changed over the years and is still a very good, or even
the best, beer restaurant.
While getting there, we found that the "Mayakovskaya" subway station
has changed greatly - it looks like it was completely redug,
it has a modern interior, long semi-mirror corridors, transparent
police rooms and a great mosaic ceiling with a very interesting drawing.
I do think only in Russia will you find such lovely designed subway stations.
:: Link / Comments (0)
Thu, 20 Oct 2005
Preliminary receiving zero-copy support for Linux kernel 2.6 network stack.
I've put the first release of the receiving zero-copy mechanism into
archive.
This patch has not been tested yet, since I have no high-end Realtek 8139 network card
with its outstanding MMIO interface here; I will try it later today at home.
For the interested reader - here
one can find a photo of my workplace with small comments (1.2 Mb). Enjoy.
Ok, I've updated the patch; the update only includes several cleanups in the 8139too.c driver.
A system without zero-copy capable sockets works without any problems.
Fortunately I have two such cards and one Intel e100 network adapter in my
home router, so I can easily remove one and continue testing at work.
:: Link / Comments (0)
Wed, 19 Oct 2005
Some stars moved in special sign,
so tomorrow promises to be a very interesting day.
Real decision has been made, so let's put it into practice.
:: Link / Comments (0)
Receiving zero-copy.
Hacked the receiving zero-copy mechanism at work a little,
it is almost ready for tests, so hopefully tomorrow I
will crash my system for the first time using it.
The design is quite simple - in the sendfile() system call
I grab several pages from the VFS cache and provide them
to the appropriate socket, which is determined
using copied headers in the driver's interrupt handler
when it is going to allocate a new skb and DMA/copy
data into it.
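A rough userspace model of that design, assuming plain IPv4 + TCP headers without an Ethernet header in front. `flow_key`, `ZeroCopyTable` and its methods are names invented for this sketch, and `bytearray` objects stand in for VFS cache pages; nothing here is the real patch.

```python
import struct
from collections import deque

PAGE_SIZE = 4096


def flow_key(headers: bytes):
    """(src ip, src port, dst ip, dst port) from IPv4 + TCP headers."""
    ihl = (headers[0] & 0x0F) * 4  # IPv4 header length in bytes
    src_ip, dst_ip = struct.unpack_from("!4s4s", headers, 12)
    src_port, dst_port = struct.unpack_from("!HH", headers, ihl)
    return (src_ip, src_port, dst_ip, dst_port)


class ZeroCopyTable:
    """Maps a flow to pages handed over by a (modelled) sendfile() call."""

    def __init__(self):
        self.pages = {}

    def provide_pages(self, key, count: int) -> None:
        """Process context: grab pages in advance for this socket."""
        queue = self.pages.setdefault(key, deque())
        for _ in range(count):
            queue.append(bytearray(PAGE_SIZE))

    def alloc_rx_buffer(self, headers: bytes, size: int):
        """Driver side: use a provided page if the headers match a flow."""
        queue = self.pages.get(flow_key(headers))
        if queue:
            return queue.popleft(), True  # data lands directly in the page
        return bytearray(size), False     # ordinary buffer, normal path
```

The second return value tells the caller whether the packet took the zero-copy path, so the ordinary path stays untouched for every other socket.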
Climbed a little today - several traverses and starts,
and I've finished my favorite trace today with only
four fails.
:: Link / Comments (0)
Tue, 18 Oct 2005
HIFN driver is broken.
Either my hardware or my driver (which is more likely)
is broken, since just after interrupts are enabled, my
test system freezes due to an interrupt storm from the HIFN card.
As far as I remember, it did not behave this way when I wrote
this driver, but nevertheless this driver is based on source
that I never tested, so I decided to rewrite the HIFN driver.
Ok, after some debug work I found that it was not a good idea
to request the IRQ before setting DMA up, so after moving some functions up and
down, it can successfully run the FIPS test, but after the IRQ is requested, the first
interrupt fires (why?) and the system freezes at the
very end of request_irq(), I suspect in local_irq_restore().
This driver must be rewritten since it not only looks like a dead man,
but starts to smell badly...
:: Link / Comments (0)
Mon, 17 Oct 2005
I'm back from vacation, and back to work.
Groovy, I've (finally) fixed the acrypto and input IPsec issues.
New patch
is available in archive.
New
version of Acrypto has been released. Tarball is available in
archive.
It includes locking cleanups and fixes, atomic bitops usage and general cleanups.
It also has a simple load balancer embedded into the acrypto module, so the acrypto system
can work right after module insertion without waiting for a load balancer module, although
one can still insert one's own load balancer module.
The 2.6.15 network tree has been opened. David Miller goes on vacation for two weeks,
and during this time Arnaldo Carvalho de Melo will be the network maintainer.
Probably I will have something to show in the 2.6.15 timeframe from the zero-copy receiving side.
Good climbing day - I finally finished the hardest trace I ever climbed,
it is quite complex and is on the negative slope. Although I fell down
several times, I've done it, and it was definitely very good climbing.
:: Link / Comments (0)
Fri, 14 Oct 2005
OpenBSD has a birthday today - Saturday 16:36 MST, 1995!
My congratulations - 10 years!
Updated acrypto-ipsec patch: removed some debug, added missing diff
for xfrm.h header.
Hmm... I've broken something with today's git pulling/committing, so
acrypto+ESP just freezes the machine under load. I will investigate this
issue today.
:: Link / Comments (0)
Thu, 13 Oct 2005
Input ESP4 IPsec processing engine has been ported to acrypto!
I'm cool. XFRM not. Porting was ugly.
Patch
can be found in archive.
It has been also sent to linux-crypto@ and netdev@ for interested reader.
Acrypto now supports full ESP4 input/output processing crypto operations.
Ok, for the first zero-copy release I decided to drop the SKB if a zero-copy capable
socket can not provide a page; if such an approach does not work or works
worse than without it, I will think about queueing.
I've updated the w1 driver - synced with the in-kernel driver, various small fixes and whitespace
cleanup. New version is available in archive.
:: Link / Comments (0)
Wed, 12 Oct 2005
DARPA Grand challenge results.
A Stanford-designed robotic car has driven away with the $2M prize
in the second DARPA Challenge, a 175-mile race for autonomous vehicles
held this weekend in the Mojave desert south of Las Vegas.
Four of 23 vehicles completed the course.
The car "Stanley" is based on a VW Touareg SUV,
with seven Pentium M-powered computers mounted in the trunk in a
fault-tolerant configuration.
Cleaned up boring stuff at work, so the next two days will be devoted completely
to acrypto and zero-copy receiving. Hope I will have something to show at
the end of the week.
Climbed a little today, but it was hard training - I found a new trace,
which is not too complex, but its hardest part is that it has a permanent
negative slope - so hands get tired a lot, and when it hurts I feel alive.
And it makes me feel good.
:: Link / Comments (0)
Mon, 10 Oct 2005
Acrypto hacking.
The bug with mangled TCP content when doing
input ESP4 IPsec processing in acrypto has been narrowed down
to the scatterlist crypto setup, since acrypto itself
produces the right data, which has been verified using the consumer.c
test module from the acrypto package.
Ok, the problem has been found - the IV was not set up correctly.
Now input ESP IPsec processing works with acrypto, although
there are some bugs in it - the XFRM code definitely does not
allow asynchronous processing, since xfrm_state can be changed.
The only solution I see is to audit every xfrm_state usage and
change the whole thing to not flush its data until some reference
counter is dropped...
Megapixel+
project is delayed for one week - there are always boring things which
can be assigned to be done before.
Zero-copy receiving runs into several problems:
since an SKB can be allocated in hard IRQ context,
and grabbing a page from VFS can take too long and must
happen in process context, there is a yet unresolved race there:
if we want to store received data, then we should queue it like
before, and then later, when a page is grabbed and ready for writing,
somehow find that data from the skb must be copied into it, and this page
should not go into the allocation routine.
The queueing mechanism also creates a problem if the network driver asks
for an skb each time just before a page is grabbed in some process context,
so the original skb will be allocated with a kmalloc()'ed data area,
it will be queued, and this will happen again and again without any possibility
to actually turn zero-copy on. This problem can be solved easily though -
just check if there is data pending in the socket queue, and grab a page
with an advanced file pointer.
All these problems disappear if we just decide to drop a packet
if there is no free page allocated in advance in process context,
but I do not want to take this approach into account yet.
My sister has a birthday today, wow! Marina, I congratulate you and wish the best of everything!
Although I hope you do not read this flow of madness.
:: Link / Comments (0)
Sun, 09 Oct 2005
Zero-copy receiving.
Drawing various zero-copy fast paths in my head.
The main design goal is clear - fetch some headers
from the network card, provide them to the list of registered
zerocopy handlers, one of which can decide that
this packet belongs to it, so it can call its private
allocator and return data.
For example, let's have a TCP socket, marked as receive zerocopy capable in
receiving sock_sendfile().
Using IP and TCP headers we can find the corresponding socket
and decide if it is zerocopy capable or not.
If yes, then the handler will grab a page from the VFS cache
and return its address as skb->data.
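The handler list can be pictured with a small Python model. Everything here is hypothetical: `register_zc_handler`, `alloc_for_packet` and the port-matching handler are invented names, the headers are assumed to start at the IPv4 header, and a `bytearray` stands in for a page grabbed from the VFS cache.

```python
from typing import Callable, List, Optional

Handler = Callable[[bytes], Optional[bytearray]]

zc_handlers: List[Handler] = []


def register_zc_handler(handler: Handler) -> None:
    """Add a handler to the list consulted for every received packet."""
    zc_handlers.append(handler)


def alloc_for_packet(headers: bytes, size: int):
    """Ask each handler in turn; fall back to an ordinary allocation."""
    for handler in zc_handlers:
        buf = handler(headers)
        if buf is not None:
            return buf, handler   # some handler claimed the packet
    return bytearray(size), None  # nobody claimed it: usual buffer


def make_tcp_port_handler(port: int, page_size: int = 4096) -> Handler:
    """Hypothetical handler claiming TCP packets to one destination port."""
    def handler(headers: bytes) -> Optional[bytearray]:
        ihl = (headers[0] & 0x0F) * 4  # IPv4 header length in bytes
        is_tcp = headers[9] == 6       # IPv4 protocol field
        dst_port = int.from_bytes(headers[ihl + 2:ihl + 4], "big")
        if is_tcp and dst_port == port:
            return bytearray(page_size)  # private allocator: a "VFS page"
        return None
    return handler
```

A real handler would look up the socket rather than compare one port, but the control flow is the same: the first handler that recognizes the headers supplies the memory.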
Things are clear on paper, but there is some trouble
in real life.
First, I have no card which is capable of fetching only some header
from its RX ring. The problem has been solved easily - 8139too
copies packet data using MMIO (that is why it is so slow), so
I will just copy packet headers first, provide them to the zerocopy
capable SKB allocation function and then copy the rest of the packet
into skb->data, which will be either a grabbed VFS page or the usual
kmalloc()'ed area.
Second, TCP socket lookup can only run from BH/process context,
but not from hard IRQ context, where skb allocation can happen.
I did not investigate this problem deeply and for tests just
left it as is.
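The two-stage copy from the first point can be sketched like this. It is a userspace model under stated assumptions: `HEADER_PEEK` is an arbitrary bound chosen for the example, `rx_ring` is just a bytes object, and `alloc` is a stand-in for the zerocopy capable skb allocation function.

```python
HEADER_PEEK = 64  # assumed upper bound for the headers we need to inspect


def receive_packet(rx_ring: bytes, pkt_len: int, alloc):
    """Two-stage copy: headers first, then the rest into the chosen buffer.

    `alloc(headers, size)` must return a writable buffer of at least
    `size` bytes (a grabbed page or an ordinary allocation).
    """
    peek = min(HEADER_PEEK, pkt_len)
    headers = bytes(rx_ring[:peek])            # stage 1: copy headers only
    buf = alloc(headers, pkt_len)              # allocator decides where data goes
    buf[:peek] = headers
    buf[peek:pkt_len] = rx_ring[peek:pkt_len]  # stage 2: copy the remainder
    return buf
```

The allocator sees only the first bytes of the packet, so the decision about the destination is made before the bulk of the data is moved.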
So, after several cups of tea, beer and some food I've written a couple of bytes
of code, which can be used as a base for further receiving zerocopy
development. The handler interface and TCP socket lookup have been finished,
receiving sock_sendfile() I will take from my
previous
patches.
Things are moving not badly.
:: Link / Comments (0)
Sat, 08 Oct 2005
Acrypto hacking.
Hacked ESP input processing a little from home - the remote
machine froze and there are no clues why asynchronous processing
with the same call chain produces wrong data.
The worst situation would be if it is the acrypto core
itself, but it was tested many times under heavy load
and data was never corrupted...
I need to investigate it, but will be able to do it
only after the weekend. I'm sure it is something simple
and stupid I forgot.
:: Link / Comments (0)
Fri, 07 Oct 2005
New project codename "Megapixel+" started.
The main aim of this project is to create a standalone
implementation of a simple network stack on top of
an unknown board/system using an Intel 82541PI adapter.
Of course I will not write a whole TCP implementation,
but will only create a UDP/UDP-Lite stack without
any socket-like interfaces. The system does not have
userspace, so the driver will be simple -
it only must read some memory and send it over the 82541PI
adapter, it also will receive some control messages.
The highest transmit performance of the system is a must -
up to 1Gbps using large frames from memory. Unfortunately
the system will not be able to get data directly into send
buffers, so there will be at least one copy.
The first milestone is to create a simple driver for Linux,
which will send/receive UDP frames from/to shared memory
without any usage of the existing network stack.
It must be as small as possible, so the existing e1000 driver
will be truncated significantly.
New unpaid project - real zero-copy network receive path.
The main idea is to mark a socket as being a pipe into a file,
using sock_sendfile() or sock_recvfile(),
so when the network driver allocates a new skb for received data,
this skb will be allocated using the file's cached pages if
its data should go into the specified socket, which is determined
from the packet header.
But this will be started after I finish porting the
ESP4 IPsec input processing path to acrypto, which
should happen very soon.
Excellent climbing after excellent day - what could be better?
Not many traces, new one was not finished, but nevertheless,
it was really cool.
Gee, Grange's GPIO framework
has been committed into NetBSD CVS tree. My congratulations!
:: Link / Comments (0)
Thu, 06 Oct 2005
XFRM hacking.
Updated the XFRM engine to support a callback mechanism -
it works very well with synchronous ESP crypto processing,
but with acrypto the TCP header is mangled, while the IP header is always
fine. Magic.
Playing with wavelets a little - created a simple program
to decompose and reconstruct black-and-white images - it is a first
step towards wavelet interpolation.
Results are contradictory - the image became absolutely different
and the nonstandard transformation produced negative coefficients, which
never existed in the original basis.
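For reference, here is one level of the simplest wavelet transform, the unnormalised Haar transform, applied to a row of pixels. It is only an illustration and not the program mentioned above; the choice of Haar and the sample values are assumptions. It shows that detail coefficients can be negative even when every input pixel is non-negative, and that reconstruction is exact as long as both averages and details are kept.

```python
def haar_decompose(signal):
    """One level of the unnormalised Haar transform (even-length input)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]  # low-pass part
    details = [(a - b) / 2 for a, b in pairs]   # high-pass part, may be < 0
    return averages, details


def haar_reconstruct(averages, details):
    """Inverse of haar_decompose."""
    out = []
    for avg, det in zip(averages, details):
        out.extend([avg + det, avg - det])
    return out
```

For a two-dimensional image the same step is applied along rows and columns; the order in which that is done distinguishes the standard from the nonstandard decomposition.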
:: Link / Comments (0)
Wed, 05 Oct 2005
Acrypto hacking.
Input IPsec processing ported to acrypto hangs much more rarely now,
but can still freeze the machine completely.
And the packet is not delivered to the higher layer yet,
but it is not a big problem. The main issue is a crash,
which happens at an absolutely unpredictable time...
Added I2C RTC support for the D16 board - it uses a DS1338 chip,
the driver for which was copied from the lm_sensors tree.
It was a day of relaxing climbing - only half of one new
trace on the negative slope, that trace really deserves
its category, since I definitely failed there on-sight.
:: Link / Comments (0)
Tue, 04 Oct 2005
Acrypto hacking.
I've moved forward in porting input IPsec processing mode
to acrypto - the system already hangs, which is not always a bad sign!
The easy way, where I decided that an skb can be asynchronously processed without
being cloned, was wrong, as expected, so I will experiment with a more complex
approach to SKB usage tomorrow.
Hacked the PPC board a little - I finally found a way to log all output to
stderr and stdout, i.e. console output, into my own tty driver, which
moves that data in the same way all kernel messages go through the console driver.
Debugging became much easier.
:: Link / Comments (0)
Mon, 03 Oct 2005
Weekend status.
1. Without good company half a liter of tequila does not taste very good.
2. I found that I do not eat for one day after big drinking, and the next day
is spent almost only eating.
3. Good drinking refreshes the brain very well. Only good climbing can be compared
to it.
4. It was not bad.
XFRM hacks.
I've changed the input XFRM engine to support recursive callbacks,
so with the right shared structure it can be used with acrypto.
The main problem is to provide it somehow to the ESP/AH input methods,
probably I will add some pointer into the XFRM state structure...
The patch has been sent to the netdev@ and linux-crypto@ mailing lists for review,
today or tomorrow I plan to implement acrypto support for the input
IPsec crypto processing path.
Herbert Xu, the current Linux kernel crypto maintainer, announced
support for several crypto algorithms in the current CryptoAPI stack.
It is a first step towards asynchronous crypto processing support in
this stack. Like acrypto and OCF it now supports priorities
of algorithm implementations.
XFRM Patch
can also be found in archive.
Climbing was very good today - I tried two new complex traces,
one has been finished, but I completely failed to finish the second one -
I just do not know how to move there. The bad thing is that I did not
even see how other people did it, maybe no one even tried it except the author,
and it was quite a long time ago, so I do not remember it at all, except
that it was a hard trace.
:: Link / Comments (0)