Wed, 28 Dec 2005
Acrypto IPsec hacking.
Something has changed in the XFRM engine since the 2.6.14 days, so the port does not work now.
The bug was found in the esp4 output callback, where the parent skb was not updated with the
right auth length, so remote tcpdump showed something like this:
IP truncated-ip - 12 bytes missing! 192.168.4.78 > 192.168.4.79: ESP(spi=0x020bc674,seq=0x1)
A patch for the XFRM/IPsec/ESP4 engine acrypto port has been put into
the archive.
I have a very interesting idea about the Linux network stack, so I have started digging
into high-performance event-driven networking interfaces. There are not so many solutions:
- epoll or /dev/*poll
- RT POSIX signals
The latter seems to have smaller latencies, but has problems when the number
of interests becomes large: there is no event batching per signal, which
leads to signal queue overflow.
The former (the /dev/poll implementation by Niels Provos and Charles Lever) is slightly slower
according to the
"Scalable Network I/O in Linux" paper.
Epoll provides
functionality similar to /dev/poll and is part of the 2.6 kernel.
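To make the batching point concrete, here is a minimal userspace sketch of the epoll interface: a single epoll_wait() call can return several ready descriptors at once, instead of one queued signal per event. The function name epoll_pipe_demo() and the pipe used as an event source are illustrative assumptions, and epoll_create1() is the modern entry point (kernels of the 2.6.15 era only had epoll_create()).

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers the read end of a pipe with epoll, makes it readable and
 * returns how many ready events epoll_wait() reports (-1 on error).
 */
static int epoll_pipe_demo(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event ready_events[8];
    int pipefd[2];
    int epfd;
    int ready = -1;

    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_epoll;

    /* Make the read end readable so there is something to report. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out_epoll;

    /* One call can return a batch of up to 8 ready descriptors. */
    ready = epoll_wait(epfd, ready_events, 8, 1000);
    if (ready == 1 && ready_events[0].data.fd != pipefd[0])
        ready = -1;

out_epoll:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return ready;
}
```

With many descriptors registered, the same loop scales by enlarging the ready_events array rather than by queueing more signals.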
For statistics I've run a simple benchmark to determine the overhead of a system call in Linux.
It is about 0.2-0.25 usecs per syscall on my Xeon (2.4 GHz) and AMD64 3500+ (2.2 GHz) running the 2.6.15 Linux kernel.
:: Link / Comments (0)
Tue, 27 Dec 2005
Congratulations on Catholic Christmas!
:: Link / Comments (0)
Mon, 26 Dec 2005
I've implemented priority queues for acrypto.
A new version is available, but there is a nitpick in the implementation.
I allocate a new priority queue and link it into the list of queues
the first time a crypto session with that priority is allocated, and
never free it until the device is removed. This caching is done for performance
reasons, but it has a disadvantage: when there are no crypto sessions
with higher priority, access to the lower-priority queues includes
the overhead of traversing all higher-priority lists and checking whether those lists
are empty.
Climbed a little with Grange. Unfortunately, aching fingers do not allow me
to enjoy the process, so I only finished a couple of easy routes without
much fun. I think I will not climb until the middle of January...
In theory, I would like to visit the tooth breaker and the bone straightener,
but the probability of such events is very small...
:: Link / Comments (0)
Sat, 24 Dec 2005
Process...
I think the main point of eating sushi is the Process itself.
It is quite tasty stuff, but each time I get these rolls and sushi
with figs and wasabi and of course chopsticks,
my first thought is how different it is from what I am used to.
And I really enjoy the process, although I do not know
much about it.
Hacked a little on the acrypto priority queues, but froze the remote test
machine, so it will stay untouched until Monday. All design goals are already
implemented; the bug probably lives in the list manipulations, so it will be
easy to remove once I have access to the serial console.
:: Link / Comments (0)
Fri, 23 Dec 2005
Acrypto hacking day.
Added direct completion mode.
If a session's callback can be invoked in any context,
or if the crypto provider can call complete_session() from
process context, one can set SESSION_DIRECT in the session flags,
which leads to the callback being invoked directly from complete_session()
rather than from a workqueue.
This increases performance noticeably.
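The dispatch described above can be modelled in a few lines of userspace C. Only the names SESSION_DIRECT and complete_session() come from the post; the flag value, the struct layout, the deferred list standing in for a kernel workqueue, and the helpers run_deferred() and mark_completed() are assumptions made for this sketch.

```c
#include <stddef.h>

#define SESSION_DIRECT 0x1  /* name from the post; the value is assumed */

struct session {
    unsigned int flags;
    void (*callback)(struct session *s);
    int completed;
    struct session *next_deferred;
};

/* Stand-in for a kernel workqueue: a simple list of pending sessions. */
static struct session *deferred_head;

static void complete_session(struct session *s)
{
    if (s->flags & SESSION_DIRECT) {
        s->callback(s);             /* direct mode: no context switch */
        return;
    }
    s->next_deferred = deferred_head;   /* default: defer the callback */
    deferred_head = s;
}

/* What the worker would do later, in process context. */
static void run_deferred(void)
{
    while (deferred_head) {
        struct session *s = deferred_head;

        deferred_head = s->next_deferred;
        s->callback(s);
    }
}

/* Example callback used only for demonstration. */
static void mark_completed(struct session *s)
{
    s->completed = 1;
}
```

The saving in direct mode is the skipped hand-off to the worker; the price is that the callback must be safe in whatever context complete_session() runs in.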
Ok, here are dm-crypt port changes:
- Reduced memory usage.
- Use memory pools.
- Removed several race conditions.
- Code simplification.
A new acrypto package has been released. It does not have priority queues yet, but I will implement them soon.
Patches and a tarball are available in the archive.
:: Link / Comments (0)
Thu, 22 Dec 2005
Acrypto hacking day.
Ok, I've fixed several bugs in the dm-crypt port
and run several benchmarks against pure dm-crypt.
The results are not as good as I expected, but there are
a couple of places where it can be improved a little.
I'm almost sure that acrypto itself works quite well, but
due to several context switches, from session allocation
to async_provider and from async_provider to callback invocation,
performance degrades, although CPU usage also decreases.
I ran the bonnie++ benchmark and found that per-char sequential output
and block sequential input performance is about 30% lower than with sync dm-crypt
at the same CPU usage, but relative block sequential output is about 30% better
and relative per-char sequential input performance is about the same.
:: Link / Comments (0)
Wed, 21 Dec 2005
Appetite comes at meal-times.
While I'm in acrypto land, I've decided to revisit some design approaches.
So I've removed the crypto load balancer thread, eliminated main_crypto_dev,
removed unused session states, and simplified locking and reference counting.
Today or tomorrow I plan to test the new acrypto with dm-crypt and IPsec;
currently it runs under heavy load from the "consumer" test module. The rate is about
12-13K sessions per second on a 2.4 GHz Xeon (1+HT);
each session requires AES-128 CBC encryption of 4k of data, which is up to 420 Mbit/sec.
Only the SW async_provider is loaded.
After successful dm-crypt and IPsec tests I will start the priority queue implementation;
with the current changes it will not take too much time, so the release agenda
is still valid.
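The throughput figure quoted above follows directly from the session rate. A small helper makes the arithmetic explicit; it assumes that "4k" means 4096 bytes and that Mbit means 10^6 bits, and the function name is made up for the example.

```c
/* Throughput in Mbit/s (10^6 bits) for a session rate and payload size. */
static double crypto_mbit_per_sec(double sessions_per_sec,
                                  double bytes_per_session)
{
    return sessions_per_sec * bytes_per_session * 8.0 / 1e6;
}
```

Under those assumptions 12000 sessions/s gives about 393 Mbit/s and 13000 sessions/s about 426 Mbit/s, so "up to 420 Mbit/sec" corresponds to roughly 12.8K sessions per second.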
Climbed a lot with Grange.
After a month of laziness it was especially hard, but I finished a couple of old
routes, one of them quite complex, and tried a new one on-sight - completely
passive, which was real trouble for my aching fingers, so I finished it
with some problems. It was a really good time today.
:: Link / Comments (0)
Tue, 20 Dec 2005
Acrypto hacking.
I've done several updates for the new acrypto release.
It includes a per-device crypto_route memory pool and various kmap* transformations.
Also started speedup work which aims at crypto_lb thread removal.
This thread is used to serialize session callback invocation and session removal,
but this can easily be done directly from the complete_session() call.
It still has a slab corruption bug which must be resolved before release.
Ok, it looks like I've found where the bug lives. I thought about its
resolution while going home and came up with a very nice idea about
high-performance priority queues in acrypto. What it currently has is total
crap actually, so I will create priority queues with O(1) session selection,
and doing so will also fix the current slab corruption problem. It should also
increase performance a little.
If things go as I want, I plan to release a new acrypto version at the end of this
week or just after the weekend. I do hope I will find time to test IPsec and dm-crypt
from the 2.6.15 git tree with the new acrypto before the New Year's holidays. I will be
in Saint Petersburg until Jan 2.
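The post does not say how the O(1) selection is meant to work. One common way to get it, shown here purely as an assumption, is a bitmap of non-empty priorities plus a find-highest-bit instruction, similar in spirit to the Linux O(1) scheduler. All names below are hypothetical, per-priority counters stand in for the real session lists, and __builtin_clz() is a GCC/Clang builtin.

```c
#include <stdint.h>

#define NR_PRIO 32  /* hypothetical number of priority levels */

struct prio_queues {
    uint32_t nonempty;              /* bit p set => queue p has sessions */
    unsigned int count[NR_PRIO];    /* stand-in for per-priority lists */
};

/* Queue one session at the given priority (0 <= prio < NR_PRIO). */
static void pq_add(struct prio_queues *q, unsigned int prio)
{
    q->count[prio]++;
    q->nonempty |= UINT32_C(1) << prio;
}

/*
 * Take one session from the highest non-empty priority.
 * Returns that priority, or -1 if every queue is empty.
 * No list of queues is walked: the cost does not depend on
 * how many priorities exist or how many of them are empty.
 */
static int pq_pop_highest(struct prio_queues *q)
{
    int prio;

    if (!q->nonempty)
        return -1;

    prio = 31 - __builtin_clz(q->nonempty);
    if (--q->count[prio] == 0)
        q->nonempty &= ~(UINT32_C(1) << prio);
    return prio;
}
```

This also avoids traversing empty higher-priority queues before reaching a lower-priority one.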
:: Link / Comments (0)
Mon, 19 Dec 2005
Receiving zero-copy support is finished.
I've fixed the bug with inode truncation, so the zero-copy subsystem now writes
exactly the number of bytes received into the file.
Truncation failed when the socket was not connected and one tried to close it:
the system received data (an RST packet) which was put into an skb that was
then trimmed, and the fragment page's reference counter was decremented,
so later vmtruncate() failed to free that page.
The patch is available in the archive.
Climbed a little today - everything is still in pain, especially the fingers,
so I only did several traverses. A strange thing I found: if I can finish a
route on-sight, I cannot repeat it after some time.
:: Link / Comments (0)
Sat, 17 Dec 2005
Fighting with receiving zero-copy.
It does not write extended pages to the file now,
so the size is always equal to the number of bytes transferred,
but there is one problem. If the connection was not established
and the zero-copy socket is closed through either process exit or the close()
system call, preallocated pages cannot be safely truncated
to zero size. It looks like, with several preallocated pages,
the first one has a smaller reference counter and thus
page unlocking in vmtruncate() fails.
Finally climbed today!
I'm definitely not in my best form, since I have not climbed
for about a month, so today's training is just a start without
major results except pain in the whole body. But it's cool.
:: Link / Comments (0)
Fri, 16 Dec 2005
I've run a receiving zero-copy benchmark.
[Figure: CPU usage graph, 1 GB transfer using an 8139 adapter]
:: Link / Comments (0)
Thu, 15 Dec 2005
Receiving zero-copy support. TCP support.
It has been found that there is no data loss, but page reordering happens.
Groovy, I've found the root of the problem: sequence numbers are set consistently
with each other, i.e. the sequence number difference between two pages is always the same,
while the page's index, which in VFS is translated into an index inside the file mapping,
was allocated sequentially, so when different pages were committed and
then grabbed from VFS, their sequence number difference remained the same, but the page
index difference changed. Now this bug is closed and tests do not show any data corruption.
While testing receiving zero-copy I found that my journaling/ext3 hack is not
so innocent: sock_sendfile() freezes somewhere inside ->commit_write(),
aka ext3_ordered_commit_write(),
after a 400Mb transfer, so I just remounted the test partition in read-only mode
and reran the test, which finished without any problems.
I've updated the zero-copy patch
and sent it to netdev@.
:: Link / Comments (0)
Wed, 14 Dec 2005
Ok, it looks like only one problem remains - data integrity.
All journaling changes just adjust h_buffer_credits
when starting a journal with an existing handle, which happens when
->prepare_write() is called several times before ->commit_write().
The ext3 changes are simple too: I commented out the check for the current journaling handle
in ext3_write_inode(), which is related to the above changes.
With all these changes receiving zero-copy has only one problem - data corruption.
Sometimes several pages are eaten from the flow, and it looks like a race between
the commit code running in process context and the data copying running in interrupt context.
:: Link / Comments (0)
Tue, 13 Dec 2005
Crap
It looks like receiving zero-copy will put its dirty hands into
the Holy Grail of Linux FS - the journaling code aka JBD.
As far as I can see, one may not call ->prepare_write() several times
before calling ->commit_write(), since the journaling transaction, at least for ext3,
is set up only for the first requested number of blocks. That is, when the zero-copy code
first grabs a page from VFS and calls ->prepare_write() for that
page, the journaling code allocates its structures and reserves a number of blocks
according to the inode state; if zero-copy grabs the next page and calls ->prepare_write()
for that page again, no new blocks are reserved and only the journal's reference counter
is incremented. When ext3 then tries to do the actual page preparation, it fails
in journal_dirty_metadata(), since blocks were counted only for the first page.
This happens every time I run the zero-copy test without any logging, but did not happen
when tons of messages were written to the serial console, so this process
seems to race with the journal committing code.
At least this is how I can explain the permanent assertion failure at
fs/jbd/transaction.c:1114: "handle->h_buffer_credits > 0" when running
without slow debug output.
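The credit accounting described above can be illustrated with a toy model. This is not JBD code: the struct and the toy_* functions are invented for the example and only mimic the behaviour the post describes, where a nested start on an existing handle bumps a reference counter without reserving more blocks, and each dirtied metadata buffer consumes one credit.

```c
/* Toy stand-in for a journal handle; field names echo the post. */
struct toy_handle {
    int h_ref;              /* nesting depth of journal starts */
    int h_buffer_credits;   /* blocks still allowed to be dirtied */
};

/* Only the first start reserves credits; nested starts just nest. */
static void toy_journal_start(struct toy_handle *h, int nblocks)
{
    if (h->h_ref++ == 0)
        h->h_buffer_credits = nblocks;
}

/*
 * Consumes one credit. Returns 0 on success and -1 when the credits
 * are exhausted, which is where the real code hits its assertion.
 */
static int toy_dirty_metadata(struct toy_handle *h)
{
    if (h->h_buffer_credits <= 0)
        return -1;
    h->h_buffer_credits--;
    return 0;
}
```

Starting twice with 2 blocks each leaves only 2 credits in total, so the third dirtied buffer fails even though the caller believes it reserved 4.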
Thinking in an extensive way, I started to hack on the ext3 code, i.e. I do not understand
why things happen this or that way, but try to change some simple stuff based
on an assumption about the above journaling problem. This is a very wrong way, but I want
to test the networking stuff first and then switch to a different problem,
so backing storage can wait.
:: Link / Comments (0)
Mon, 12 Dec 2005
Congratulations on the Constitution Day of the Russian Federation.
:: Link / Comments (0)
Receiving zero-copy support.
I've finally fixed the TCP sequence number check,
so the following issues remain to consider before pushing this stuff up:
- Data integrity check.
- Moving the committing code into a workqueue, since the wait queue seems to be quite slow.
Fixed the data corruption issue, removed all debug, and found that it crashes
in kmap_atomic() when highmem is turned on. Tried to merge the zero-copy changes with
the latest 2.6 git tree and completely broke my setup - it crashes on the first
received packet when it tries to free the skb and somehow decides to free it through the zero-copy
path, which is obviously wrong.
:: Link / Comments (0)
Sat, 10 Dec 2005
A kind of hacking agenda for the rest of this year:
- Finish the receiving zero-copy implementation. Only the TCP sequence number check is not done yet.
- Check acrypto and IPsec/dm-crypt under _very_ high load; as far as I remember there were some problems.
- Start the HIFN driver rewrite. Actually it only requires initialisation and ISR processing fixes.
:: Link / Comments (0)
I've updated the OSF iptables module.
The update includes a README file with bits of documentation on using the passive OS fingerprinting iptables
module. The new version is available in the archive.
I've also sent a bunch of small OSF cleanups for the 2.4 kernel version, a documentation update,
and resent the OSF netlink-related changes for the 2.6 kernel version to netfilter-devel@.
All this stuff was already included in the previous OSF release.
:: Link / Comments (0)
Wed, 07 Dec 2005
The W1 project has been updated.
Now it has a master/slave driver split, a new DS2482 I2C <-> W1 bridge and a nice set of
documentation. Great thanks to Ben Gardner.
One can find the updated version either in the archive
or, hopefully, in the next -mm release. Eventually this will be pushed upstream in the 2.6.16 timeframe.
:: Link / Comments (0)
Tue, 06 Dec 2005
Receiving zero-copy hacking.
Found a new problem in the zero-copy TCP "implementation":
when all pages are filled but not committed, the remote side
starts resending packets and waits for ACKs for them, but
zero-copy drops them, since it knows that the same packets are in
flight and will be committed and ACKed soon, so the
receiving TCP state machine does not even know that there was
some unACKed data. I will try to accept those retransmits, but without
actual data copying.
:: Link / Comments (0)
Mon, 05 Dec 2005
"Here I am in Rio de Janeiro, idiot's dream has come true" as said Ostap Bender in "Golden calf" by Ilf and Petrov.
I've finished quite a big part of one "project" which required a lot of money, debts, time and problems.
It is not completely done, and probably will not be for the next three years, but the major part is already behind me.
And what? Nothing: no emotions, no fun, no expectation, no excitement, nothing of what I imagined several years ago.
Is this the bright future I wanted? What next?
:: Link / Comments (0)
Fri, 02 Dec 2005
Groovy, I've found a New Year present!
So I will have a bit of a drink today, and maybe tomorrow...
Presents are rox!
:: Link / Comments (0)
Wed, 30 Nov 2005
Receiving zero-copy hacking.
TCP is still missing - if I preallocate 32 pages from VFS,
a 100 Mbit network overflows them faster than the userspace application,
which calls sendfile(), commits pages and grabs new ones, so it is possible
to move part of a packet into the last page while the first one is still
not committed, so the data will be dropped, but the sending side will retransmit
the whole TCP packet. There are at least two solutions:
1. Do not permit packets whose data size is not equal to the page size.
2. Preallocate more pages so no overruns can happen.
In the case of an uncommitted page, we very likely hit the following problem:
part of the packet has been written into the previous page, but the next page
contains old data which is not committed to VFS, and we cannot overwrite it.
In this case we must roll back all writes to the previous pages, so we start
from the beginning, select one by one the same pages as were selected for writing,
and decrease their zp->used counters, so each page starts looking like it did before.
:: Link / Comments (0)
Mon, 28 Nov 2005
Grange reminds me that William Blake was born on this day.
Some are born to sweet delight,
Some are born to endless night.
:: Link / Comments (0)
Sun, 27 Nov 2005
Ok, receiving zero-copy support has come to its finish line.
I think so...
Today I've fixed a bunch of various bugs related to the TCP sequence number check.
Now it looks almost finished, but I've crashed the remote system, so even the serial console
does not work. Hopefully tomorrow I will push the last changes in and send an updated
patch for review to netdev@.
The receiving zero-copy implementation with TCP sequence number check support can be found
in this patch.
It is not well tested yet.
:: Link / Comments (0)
Mda... Sometimes moving forward is the result of a kick from behind.
:: Link / Comments (0)
Sat, 26 Nov 2005
The kernel hacking config option.
Turning most of its menu on drops networking performance by half.
Normal networking performance is not affected by the receiving zero-copy patch,
as I found today in my tests. It of course cannot be 100% right, since
receiving zero-copy adds several protocol checks and hash calculations,
and with a 100 Mbit adapter it is not possible to distinguish that overhead
from noise, but the fact is that at 100 Mbit speeds on a 2.4 GHz 2-way (1 + HT) Xeon
performance is the same as without the zero-copy patch.
Changed the locking scheme a little - now it uses one per-socket read lock held for reading
and one per-page spinlock in the fast path.
Fixed a nasty bug in my 8169too.c driver hack, which led to indescribable bugs and oopses
in the highmem code...
The current TCP sequence number check is broken, but I really do not want to go through all those
TCP RFCs to find why the seq/ack pair is not updated. I do think the solution is very simple; I just need
to analyze a couple of tcpdump files...
Ok, I found where the problem is: the frame itself contains the right data with updated seq and ack TCP fields,
but the zero-copy stack somehow obtains a header with old values, so it drops the whole packet
on the grounds that the same packet was already processed.
:: Link / Comments (0)
Fri, 25 Nov 2005
Fine-grained receiving zero-copy locking must be redesigned.
I found that the Linux journaling code is not protected against interrupts or even
soft interrupts, so it is just not permitted to try to commit a page
from BH context when the socket is closed...
Fortunately the fix is quite easy - the zero-copy socket will continue to have fine-grained
locking and a reference counter, but VFS page finalizing, which is atomic with
respect to itself, will be called only from the very end of the sendfile() syscall
or from sk_free().
Hmm, found an interesting use for receiving zero-copy.
It can be used to DMA data not only into the VFS cache but, in theory,
into any area - userspace pages, a different network adapter's DMA ring
or different physical memory, for example a buffer in IB or iSCSI.
:: Link / Comments (0)
Thu, 24 Nov 2005
Cleaning up the receiving zero-copy patch.
Removed debug, committed things into the local git tree,
changed sendfile()'s wakeup logic - to speed up ACK processing,
kernel_recvmsg() should be called not only when the whole page
is ready but as fast as possible to eliminate retransmits,
so I added a flags field to the zsock structure and reserved a bit there
to show whether at least something was copied from the net. While I was doing
this optimisation remotely, the machine hit a bug...
Tomorrow I plan to finish the TCP sequence number check and try to set up
a PPC32 test machine to check receiving zero-copy on a different arch.
:: Link / Comments (0)
Wed, 23 Nov 2005
My laziness should start worrying me.
I've released a new zero-copy patch
and sent it to netdev@ for design review.
:: Link / Comments (0)
Tue, 22 Nov 2005
I've started the greatest project ever!
I plan to add at least some bits of documentation to all my projects,
so they can be somewhat more user-friendly.
CARP is the first one.
I've added a simple README which describes what it is and how it operates.
It does not have usage examples though, so an administrator will have to set it up
following some Yoda-style advice, although it is quite easy after running carpctl -h.
:: Link / Comments (0)
Mon, 21 Nov 2005
Zero-copy abstraction layer.
I've almost finished a simple layer which will be
used for allocation method lookup and dataflow
control. Since there will be far fewer zero-copy sockets
than usual ones, I decided to use a simplified hashing technique.
:: Link / Comments (0)
Sat, 19 Nov 2005
Thinking about new zero-copy abstraction...
I like this idea more and more - it would allow
having one zsocket for data, and if zero-copy fails,
data could be queued into the original socket and
then later, when new pages are grabbed from the VFS cache,
copied from the original socket into the VFS pages according to its sequence
number mapping.
Today Grange took part in
a music jam. According to the photos
from their forum, the jam was really fun.
I listened to Grange's punk rock
in the past, when he was the leader, vocalist and guitarist of his band
"Facultet".
I still remember their last concert in cellophane raincoats :)
If you like something like RHCP with a light punk bias or something similar,
I recommend listening to all the "Facultet" songs;
you can find them in mp3 format on the band's homepage.
:: Link / Comments (0)
Fri, 18 Nov 2005
Zero-copy hackfest.
Not much progress - there is a problem with hashing/unhashing,
socket releasing and interrupts where the socket is used. I do think
that searching through the whole stack and changing each assumption
that packets can be processed only in BH context is a very badly
broken idea, which will hurt performance and breed tons of bugs.
An interesting idea could be to hold the zero-copy socket until sendfile()
is interrupted and to check in interrupts not the real socket but
only some reference to it, which will be removed with interrupts disabled
at the very end of the sendfile() syscall. It could even be a new abstraction
which exists only for zero-copy capable sockets, but if this
approach proves effective, it will require all those
hashing algorithms which the current network stack has.
:: Link / Comments (0)
Thu, 17 Nov 2005
There are two types of systems: those that have downtime and those that will.
Today I read Berd Kivi's book "Gigabytes of power"
(in Russian).
Interesting reading about total government and corporate control over
private life and new mechanisms to achieve it - from RFID and data mining
to masons in the FBI :).
A technically competent person can avoid many of these issues,
but unfortunately several of them are just not under our control.
And as the l0pht group said several years ago, the main threat is not
corporations or governments themselves, but uninformed people.
:: Link / Comments (0)
Wed, 16 Nov 2005
Linux, why, why do you steal my interrupt?
2.6 kernel calls me "system with badly broken firmware"...
:: Link / Comments (0)
Tue, 15 Nov 2005
Receiving zero-copy.
I've found an interesting issue deep in the socket internals
of the Linux kernel tree - they are heavily based on the assumption
that all socket processing can only happen in bottom-half or process
context, so hashing/unhashing is not protected against
hard irqs, and thus a socket cannot be safely used from interrupt
context.
I've added a sequence number check mechanism to the allocation path:
each preallocated page has a TCP sequence window associated
with it, and only data with appropriate sequence numbers can be
moved into that page.
:: Link / Comments (0)
Sun, 13 Nov 2005
Old friends meeting.
Mephody returned from Ireland today and will stay in Moscow for about a week,
so we celebrated his past birthday and the meeting itself. There were Meph and Ira,
Wijo and Sasha, Fedor and Ira, Ivan Gammel aka Unexy, Masha and me. This whole crowd
somehow fit into Ira and Masha's small kitchen. We had a little fire water,
ate something, and looked at Mephody's photos of his Ireland road trip from Limerick to Dublin along
the south coast with many old fortresses and castles.
It was definitely a very good time.
:: Link / Comments (0)
Sat, 12 Nov 2005
Receiving zero-copy implementation.
I've put the next release of the receiving zero-copy
patch
in the archive. The whole day was spent searching for a
very strange thing - it was possible to get and hold a socket,
but half of its fields were filled with
red-zone 0x6b markers. This issue was resolved
after a deep comparison of the Linux kernel TCP socket internals
with what I had - RCU locking and holding a socket reference were not enough,
but following TCP's __inet_lookup() usage I found an
additional bh_lock_sock()/bh_unlock_sock() pair there.
I've also added some checks which prevent the system from using
the same backend file for several zero-copy sockets
at the same time.
I've removed the ability to post comments in my blog, since the blog is flooded only
with spam, without comments from readers.
:: Link / Comments (0)
Fri, 11 Nov 2005
Elections.
There will be Moscow Duma elections in a month,
so various political blocs have started their
election campaigns.
All of them are very similar: "We are cool, they are crap."
Sigh, is dumping shitloads on others really such a powerful election campaign?
I even think that if I ever meet a political bloc
which says: "Hey, at the previous elections we promised to do A and B,
and we did C and D, and now we promise to do E and F",
I will definitely give them my vote.
Ok, it looks like all the uninteresting stuff has been done, so I plan
to resurrect zero-copy hacking tomorrow and want to finally create a
well-working implementation.
:: Link / Comments (0)
Wed, 09 Nov 2005
Working-working-working...
Now it looks like I've finished almost all cases,
although that may not be so; tomorrow
will show if I am wrong.
Spent half of the day searching for New Year presents -
have not found a particular variant, but worked out in my head some
tricky affair which can in theory help me with
searching for the bright future. An interesting thing is that after
spending many hours with computer and application bugs,
the brain switches into an alternative mode where new, completely
different and absolutely unrelated ideas are born,
which in a normal situation would
require a lot of thinking or even brain damage
to be transformed into this simple and elegant form.
:: Link / Comments (0)
Sat, 05 Nov 2005
Hacked on the passive OS fingerprinting module aka OSF.
Added a version check and fixed compilation for 2.6.14+ kernels, where the netlink API was changed.
The new release is available in the archive.
:: Link / Comments (0)
Fri, 04 Nov 2005
Hacked receiving zero-copy.
The finish line is already near - there is one main problem
now - TCP retransmits,
which basically break the whole idea. And even using a
socket option to turn the socket into receiving zero-copy mode
does not help with retransmits. I have an idea of how to
remove them: the simplest one is just to drop a packet
if its TCP header does not match the one the system expects,
but I suspect it will hurt performance a lot, so I need
to implement some heuristic on top of TCP sequence numbers
to allow receiving packets with a sequence number above the
currently expected one. So it can be done in the following way:
1. check if the sequence number is less than we expect, then just drop
the packet.
2. check if the sequence number is beyond the number of pages
we allocated, then drop the packet.
3. if the sequence number is inside the allocated window, then select
the right page for the packet and move the data there.
This will raise a new problem: some page inside the window
is committed, but later the sending side retransmits data with
the same sequence number, which corresponds to the page already committed.
I see a solution in having an array of sequence windows which
can be zero-copied into existing preallocated pages, i.e. just
link some sequence number window to the newly grabbed page,
and when the packet's header is received, check if it can be
placed somewhere in the allocated pages.
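The three-step check above can be sketched as follows. This is an illustration under assumptions, not the code from the patch: the enum, the function names and the parameters are hypothetical, the window is taken to start at the expected sequence number and to span nr_pages pages of page_size bytes, and only the first byte of the packet is classified (a packet that straddles two pages is not handled).

```c
#include <stdint.h>

enum zc_verdict { ZC_DROP_OLD, ZC_DROP_AHEAD, ZC_ACCEPT };

/* Wrap-safe "a is before b" for 32-bit TCP sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * Classify a packet by its starting sequence number.
 * On ZC_ACCEPT, *page_idx is the preallocated page the data belongs to.
 */
static enum zc_verdict zc_classify(uint32_t seq, uint32_t expected,
                                   uint32_t nr_pages, uint32_t page_size,
                                   uint32_t *page_idx)
{
    uint32_t off;

    if (seq_before(seq, expected))
        return ZC_DROP_OLD;         /* step 1: below the expected number */

    off = seq - expected;           /* unsigned math survives wraparound */
    if (off >= nr_pages * page_size)
        return ZC_DROP_AHEAD;       /* step 2: beyond the allocated pages */

    *page_idx = off / page_size;    /* step 3: pick the page in the window */
    return ZC_ACCEPT;
}
```

Doing the comparison on the 32-bit difference rather than on the raw values keeps the check correct when the sequence space wraps around zero.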
:: Link / Comments (0)
Thu, 03 Nov 2005
I've started zero-copy cleanup.
First of all, I moved zero-copy initialization
away from sock_sendfile() into a new socket option
to eliminate the startup race, when the initial TCP portion
can be dropped and will be retransmitted later.
Second on the agenda is a check for skb usage after
sock_sendfile() has finished and the socket has been marked
as not zero-copy capable. In theory nothing prevents
such usage, since zero-copied skbs and their buffers
are not intended to be used in stuff like netfilter
and IPsec, since their overhead is not compatible
with high-speed setups where receiving zero-copy could
be used, although it is a grey area now...
Another unpleasant issue is that Linux performs a write
in two operations, i.e. it is guaranteed to be nonatomic,
and with the current zero-copy design this situation is
permanent, i.e. we prepare several pages in process context,
wait until they are filled in interrupt context,
and then commit the filled pages back to the VFS cache. Between
the prepare and commit stages one cannot use the selected file's
inode, and thus operations like "ls" will block.
:: Link / Comments (0)
Wed, 02 Nov 2005
The whole day has been devoted to my paid work.
Yep, I want to eat and thus need to do my work -
although it was not so interesting. I plan to finish the
zero-copy receiving implementation this week
and present it to netdev@vger.kernel.org and
the Linux kernel network hackers. If things go
very smoothly it can happen even tomorrow.
Climbed a little with Grange
today after one idle week; it looks like either my finger
or wrist is broken or, hopefully, just badly bruised,
since it hurts very much when climbing and especially
when taking passive holds.
:: Link / Comments (0)
Tue, 01 Nov 2005
Receiving zero-copy hacking.
This release allows transferring a huge amount of data
without errors (probably), but it has one very unpleasant
nitpick - if sendfile() is interrupted, its reference counter
never drops to zero, and I have not implemented a fine-grained
reference counter with a proper skb->destructor() yet.
Another issue is data reordering - obviously the sendfile() system
call is called long after the socket is connected, so some data
is sent but dropped, and writing into the file starts
from some arbitrary position in the dataflow, but it can be solved
easily by moving zero-copy setup into the socket() system call.
Retransmits can pollute the dataflow too, since they can potentially
be inserted into the dataflow at a position other than their position
in the original file.
Actually retransmits could end up
grabbing the right page from the VFS cache, but page grabbing is
quite a long operation which sleeps, and we cannot know in advance,
i.e. when grabbing new pages from the VFS cache in process context,
where the data which will be received in the next interrupt must be placed.
A potential solution for this problem is page remapping, i.e.
we could remap a page from an skb fragment into, for example, a VFS cache page,
or into a userspace page, which I implemented in the
zero-copy sniffer.
This is how zero-copy receiving is implemented in FreeBSD.
But it is not a very elegant way, since such a transformation requires
TLB flushing, which can be much more expensive than data copying
with small sizes but can be up to
5-15 times faster
using even 1500-byte chunks.
The latest patch
(revision 6) is available in the archive.
:: Link / Comments (0)
Mon, 31 Oct 2005
Halloween.
Be aware.
:: Link / Comments (0)
Receiving zero-copy hacking.
Ok, I found why TCP stalled and no ACKs were sent -
the system always fell into the slow path and then dropped
packets due to an invalid checksum. The root of this
problem is TCP options, specifically the TCP timestamp,
which were not copied into the header part, and thus the
TCP stack never had a valid checksum. After the 8139too.c
driver update, which now checks for TCP/IP options,
it looks like a TCP stream can be established
in the right way.
Happy birthday to Mephody aka Alexander Boykov; I
really look forward to his coming to Russia in two weeks.
That will be a very drinkfun time, so the main
things must be done before that black hole period.
:: Link / Comments (0)
Sun, 30 Oct 2005
Weekend rest.
Something should be done, so I've started
to search for New Year presents.
The mechanism is turned on, so I think
next week I will have something interesting.
Hacked a little on this html page - now it does not have
annoying horizontal scrolling and it can be accessed
directly from the main page.
:: Link / Comments (0)
Fri, 28 Oct 2005
Receiving zero-copy.
As you probably know, my implementation already works
with non-page-aligned data and the only problem is
generic stack processing itself - TCP ACKs, socket accounting
and so on. It has been implemented using the fragment array
in the skb's shared area, i.e. the header is placed into skb->data,
and all real data goes directly into the VFS cache, whose page pointers
are stored in skb_shinfo(skb)->frags.
Unfortunately it looks like either fragmented input skbs are not
allowed (at least skb_put() may not be called with a nonlinear skb)
or, which is more likely, the header data and the appropriate skb fields
are incorrectly set up, so the system crashes somewhere in netif_receive_skb().
Ok, I've shed some light on this - putting netif_receive_skb()
into a workqueue allowed me to scroll the console and see the EIP, which
pointed to netif_receive_skb() itself, not into its internals,
and the error was a wrong pointer dereference at address 0x106.
Looking more closely at the skb setup in the original 8139 interrupt path,
I found that the ->dev field was not set up correctly - it was not set up at all -
so setting it fixed that bug. I can see ACKs from zero-copy capable hosts,
but after some period of time they stop and the connection stalls.
I've released the new version of the receiving zero-copy concept for the interested reader;
the patch
is available in the archive.
:: Link / Comments (0)
Thu, 27 Oct 2005
Timer interrupts and signal delivery on PPC.
I've found a very strange thing - a SIGALRM handler
cannot safely call sleep(), although man signal says
that sleep() is a signal-safe function.
A couple of other safe functions also cause
the SIGALRM handler to deadlock.
This happens only on our PPC405GPr boards, and is 100%
reproducible. So I spent the whole day trying to move the
timer's state machine in userspace out of the signal handler.
It looks like it is rock stable now.
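A minimal sketch of the usual way to keep a state machine out of a signal handler, assuming a single-threaded process: the handler only bumps a counter, and the main loop picks the ticks up in normal process context. This is not the code running on the boards; timer_install() and timer_take_ticks() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Written by the handler, consumed by the main loop. */
static volatile sig_atomic_t timer_ticks;

/* The handler does nothing except record that the timer fired. */
static void on_alarm(int signo)
{
	(void)signo;
	timer_ticks++;
}

/* Install the SIGALRM handler; returns 0 on success, -1 on error. */
int timer_install(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Called from the main loop: returns the number of pending ticks and
 * clears the counter. SIGALRM is blocked around the read-and-clear so
 * that no tick is lost in between.
 */
int timer_take_ticks(void)
{
	sigset_t block, old;
	int n;

	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	sigprocmask(SIG_BLOCK, &block, &old);
	n = (int)timer_ticks;
	timer_ticks = 0;
	sigprocmask(SIG_SETMASK, &old, NULL);
	return n;
}
```

The main loop would then call timer_take_ticks() and step the state machine once per returned tick, where sleeping and any other library call are safe.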
Met with Abr and his family today - I've seen his son Anton
for the first time - a nice small man with a very smart face
and a naturalist's and very demanding temper -
he tried to eat everything he could get and definitely
tried to tell us that he does not like our attempts to
prevent this. Unfortunately a 6-month-old man cannot
express himself in any way other than screaming and laughing,
but his parents can already understand many of his signals.
I spent a very nice evening with them.
:: Link / Comments (0)
Wed, 26 Oct 2005
First snow in Moscow... Winter has come.
OK, receiving zero-copy can handle non-page-aligned data now,
but this has opened a new problem: when using zero-copy, the
system grabs data frames away from the stack, and thus the receiving side
cannot acknowledge received data and TCP gets stuck.
This is how it looks from the sender's point of view:
12:46:25.439777 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 4213170362:4213171810(1448) ack 1304099580 win 1448
12:47:18.690968 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448
12:49:05.193624 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448
12:51:05.201123 IP 192.168.0.48.12345 > 192.168.4.78.48192: . 0:1448(1448) ack 1 win 1448
I have an idea about how to fix this:
the skb could be allocated with the header in skb->data
and all data, even non-page-aligned, in the fragment list,
and then be passed into the stack. Such an skb may not be freed
using the original destructor and cannot be mangled by things
like netfilter or IPsec; anyway, no one should use IPsec and netfilter
in the fast zero-copy path. The destructor for such an skb should
call kfree_skb_zerocopy() instead of
blindly releasing the fragment list. This requires some thinking...
:: Link / Comments (0)
Tue, 25 Oct 2005
Old friends meeting.
Today Fedor, Irin, Yuliana, Pavel, Alexander and I met Abr,
who returned from England to his family; they are
going to move to London in a couple of days. It was a very
good time to see them again.
:: Link / Comments (0)
Mon, 24 Oct 2005
Third zero-copy patchset.
I've put a new receiving zero-copy patch in the
archive.
It works - the first receiving zero-copy implementation for the Linux kernel.
I will not publish any results now, since testing is only preliminary,
but a dataflow can already be established between the zero-copy receiving sendfile
and the original server.
Setup with a small MTU is the main problem now - I do want to have a zero-copy receiving path
for the standard 1500 MTU, and I very much doubt my high-end Realtek 8139 chip supports
jumbo frames of 4k. Currently it is not solved, and the dataflow stops at a
page boundary.
:: Link / Comments (0)
Sat, 22 Oct 2005
Day of new things.
I've bought myself tons of new acquisitions.
Then Grange and I
celebrated it in 5 oborotov,
which has not changed over the years and is still a very good or even
the best beer restaurant.
While getting there, we found that the "Mayakovskaya" subway station
has changed greatly - it looks like it was completely re-dug,
it has a modern interior, long semi-mirror corridors, transparent
police rooms and a great mosaic ceiling with a very interesting drawing.
I do think only in Russia will you find such lovely designed subway stations.
:: Link / Comments (0)
Thu, 20 Oct 2005
Preliminary receiving zero-copy support for Linux kernel 2.6 network stack.
I've put the first release of the receiving zero-copy mechanism into the
archive.
This patch has not been tested yet, since I have no high-end Realtek 8139 network card
with its outstanding MMIO interface here; I will try it later today at home.
For the interested reader - here
one can find a photo of my workplace with small comments (1.2 Mb). Enjoy.
OK, I've updated the patch; the update only includes several cleanups in the 8139too.c driver.
A system without zero-copy capable sockets works without any problems.
Fortunately I have two such cards and one e100 Intel network adapter in my
home router, so I can easily remove one and continue testing at work.
:: Link / Comments (0)
Wed, 19 Oct 2005
Some stars moved in special sign,
so tomorrow promises to be a very interesting day.
Real decision has been made, so let's put it into practice.
:: Link / Comments (0)
Receiving zero-copy.
Hacked the receiving zero-copy mechanism at work a little,
it is almost ready for tests, so hopefully tomorrow I
will crash my system for the first time using it.
The design is quite simple - in the sendfile() system call
I grab several pages from the VFS cache and provide them
to the appropriate socket, which is determined
using copied headers in the driver's interrupt handler
when it is going to allocate a new skb and DMA/copy
data into it.
Climbed a little today - several traverses and starts,
and I've finished my favorite route today with only
four fails.
:: Link / Comments (0)
Tue, 18 Oct 2005
HIFN driver is broken.
Either my hardware or my driver (which is more likely)
is broken, since just after interrupts are enabled, my
test system freezes due to an interrupt storm from the HIFN card.
As far as I remember, it did not behave this way when I wrote
this driver, but nevertheless this driver is based on source
that I never tested, so I decided to rewrite the HIFN driver.
OK, after some debug work I found that it was not a good idea
to request the IRQ before setting DMA up, so after moving some functions up and
down, it can successfully run the FIPS test, but after the IRQ is requested, the first
interrupt fires (why?) and the system freezes at the
very end of request_irq(), I suspect in local_irq_restore().
This driver must be rewritten since it not only looks like a dead man,
but is starting to smell bad...
:: Link / Comments (0)
Mon, 17 Oct 2005
I'm back from vacation and go back to work.
Groovy, I've (finally) fixed the acrypto and input IPsec issues.
New patch
is available in archive.
New
version of Acrypto has been released. Tarball is available in
archive.
It includes locking cleanups and fixes, atomic bitops usage and general cleanups.
It also has a simple load balancer embedded into the acrypto module, so the acrypto system
can work right after module insertion without waiting for a load balancer module, although
one can still insert one's own load balancer module.
The 2.6.15 network tree has been opened. David Miller goes on vacation for two weeks,
and during this time Arnaldo Carvalho de Melo will be the network maintainer.
Probably I will have something to show in the 2.6.15 timeframe from the zero-copy receiving side.
Good climbing day - I finally finished the hardest route I have ever climbed,
it is quite complex and is on a negative slope. Although I fell
several times, I've done it, and it was definitely very good climbing.
:: Link / Comments (0)
Fri, 14 Oct 2005
OpenBSD has a birthday today - Saturday 16:36 MST, 1995!
My congratulations - 10 years!
Updated the acrypto-ipsec patch: removed some debug, added a missing diff
for the xfrm.h header.
Hmm... I've broken something with today's git pulling/committing, so
acrypto+ESP just freezes the machine under load. I will investigate this
issue today.
:: Link / Comments (0)
Thu, 13 Oct 2005
Input ESP4 IPsec processing engine has been ported to acrypto!
I'm cool. XFRM not. Porting was ugly.
Patch
can be found in archive.
It has been also sent to linux-crypto@ and netdev@ for interested reader.
Acrypto now supports full ESP4 input/output processing crypto operations.
OK, for the first zero-copy release I decided to drop the SKB if the zero-copy capable
socket cannot provide a page; if this approach does not work, or works
worse than no zero-copy at all, I will think about queueing.
I've updated the w1 driver - synced with the in-kernel driver, various small fixes and whitespace
cleanup. The new version is available in the archive.
:: Link / Comments (0)
Wed, 12 Oct 2005
DARPA Grand challenge results.
A Stanford-designed robotic car has driven away with the $2M prize
in the second DARPA Challenge, a 175-mile race for autonomous vehicles
held this weekend in the Mojave desert south of Las Vegas.
Four of 23 vehicles completed the course.
The car "Stanley" is based on a VW Touareg SUV,
with seven Pentium M-powered computers mounted in the trunk in a
fault-tolerant configuration.
Cleaned up boring stuff at work, so the next two days will be devoted completely
to acrypto and zero-copy receiving. Hope I will have something to show at
the end of the week.
Climbed a little today, but it was hard training - I found a new route,
which is not too complex, but its hardest part is that it has a permanent
negative slope - so hands get very tired, and when it hurts I feel alive.
And it makes me feel good.
:: Link / Comments (0)
Mon, 10 Oct 2005
Acrypto hacking.
The bug with mangled TCP content when doing
input ESP4 IPsec processing in acrypto has been narrowed down
to the scatterlist crypto setup, since acrypto itself
produces the right data, which has been verified using the consumer.c
test module from the acrypto package.
OK, the problem has been found - the IV was not set up correctly.
Now input ESP IPsec processing works with acrypto, although
there are some bugs in it - the XFRM code definitely does not
allow asynchronous processing, since xfrm_state can be changed.
The only solution I see is to audit every xfrm_state usage and
change the whole thing to not flush its data until some reference
counter is dropped...
Megapixel+
project is delayed for one week - there are always boring things which
can be assigned to be done first.
Zero-copy receiving runs into several problems:
since an SKB can be allocated in hard IRQ context,
and grabbing a page from VFS can take too long and must
happen in process context, there is a yet unresolved race there:
if we want to store received data, then we should queue it like
before, and then later, when a page is grabbed and ready for writing,
somehow find that data from the skb must be copied into it, and this page
should not go into the allocation routine.
The queueing mechanism also creates a problem if the network driver asks
for an skb each time just before a page is grabbed in some process context:
the original skb will be allocated with a kmalloc()'ed data area,
it will be queued, and this will happen again and again without any possibility
to actually turn zero-copy on. This problem can be solved easily though -
just check if there is data pending in the socket queue, and grab a page
with an advanced file pointer.
All these problems disappear if we just decide to drop a packet
if there is no free page allocated in advance in process context,
but I do not want to take this approach into account yet.
My sister has a birthday today, wow! Marina, I congratulate you and wish the best of everything!
Although I hope you do not read this flow of madness.
:: Link / Comments (0)
Sun, 09 Oct 2005
Zero-copy receiving.
Drawing various zero-copy fast paths in my head.
The main design goal is clear - fetch some headers
from the network card, provide them to the list of registered
zerocopy handlers, one of which can decide that
this packet belongs to it, so it could call its private
allocator and return data.
For example, let's have a TCP socket, marked as receive zerocopy capable in
receiving sock_sendfile().
Using the IP and TCP headers we can find the corresponding socket
and decide if it is zerocopy capable or not.
If yes, then the handler will grab a page from the VFS cache
and return its address as skb->data.
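A minimal userspace sketch of such a handler list, under heavy assumptions: all names (zc_handler, zc_register(), zc_alloc(), zc_tcp_alloc()) are hypothetical and nothing here is taken from the patch. The example handler matches on the TCP destination port only, where a real lookup would use the full address/port tuple, and it hands out a pre-grabbed buffer instead of a VFS page.

```c
#include <stddef.h>
#include <stdint.h>

/* All names are hypothetical; this only illustrates the dispatch idea. */
struct zc_handler {
	/* Returns a payload buffer if the headers match, otherwise NULL. */
	void *(*alloc)(struct zc_handler *h, const uint8_t *hdr, size_t hdr_len);
	struct zc_handler *next;
	uint16_t dport;		/* claimed TCP destination port, host order */
	void *page;		/* pre-grabbed destination buffer, may be NULL */
};

static struct zc_handler *zc_handlers;

/* Add a handler to the head of the list. */
void zc_register(struct zc_handler *h)
{
	h->next = zc_handlers;
	zc_handlers = h;
}

/* Example handler: claims IPv4/TCP packets sent to h->dport. */
void *zc_tcp_alloc(struct zc_handler *h, const uint8_t *hdr, size_t hdr_len)
{
	size_t ihl;
	uint16_t dport;

	if (hdr_len < 20 || (hdr[0] >> 4) != 4 || hdr[9] != 6)
		return NULL;
	ihl = (size_t)(hdr[0] & 0x0f) * 4;
	if (ihl < 20 || hdr_len < ihl + 4)
		return NULL;
	/* Destination port is bytes 2-3 of the TCP header, big endian. */
	dport = (uint16_t)((hdr[ihl + 2] << 8) | hdr[ihl + 3]);
	return dport == h->dport ? h->page : NULL;
}

/*
 * Offer the copied headers to each handler in turn. NULL means nobody
 * claimed the packet and the caller should allocate as usual.
 */
void *zc_alloc(const uint8_t *hdr, size_t hdr_len)
{
	struct zc_handler *h;
	void *buf;

	for (h = zc_handlers; h; h = h->next) {
		buf = h->alloc(h, hdr, hdr_len);
		if (buf)
			return buf;
	}
	return NULL;
}
```

A driver-side caller would use the returned buffer as the data area when it is non-NULL, and fall back to the normal kmalloc()'ed area otherwise.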
Things are clear on paper, but there is some trouble
in real life.
First, I have no card which is capable of getting only some headers
from its RX ring. The problem has been solved easily - 8139too
copies packet data using MMIO (that is why it is so slow), so
I will just copy packet headers first, provide them to the zerocopy
capable SKB allocation function and then copy the rest of the packet
into skb->data, which will be either a grabbed VFS page or the usual
kmalloc()'ed area.
Second, TCP socket lookup can only run from BH/process context,
but not from hard IRQ context, where skb allocation can happen.
I did not investigate this problem deeply and for tests just
leave it as is.
So, after several cups of tea, beer and some food I've written a couple of bytes
of code, which can be used as a base for further receiving zerocopy
development. The handler's interface and TCP socket lookup have been finished,
receiving sock_sendfile() I will take from my
previous
patches.
Things are moving along not badly.
:: Link / Comments (0)
Sat, 08 Oct 2005
Acrypto hacking.
Hacked ESP input processing a little from home - the remote
machine froze and I have no clue why asynchronous processing
with the same call chain produces wrong data.
The worst situation will be if it is the acrypto core
itself, but it was tested many times under heavy load
and data was never corrupted...
I need to investigate it, but will be able to do it
only after the weekend. I'm sure it is something simple
and stupid I forgot.
:: Link / Comments (0)
Fri, 07 Oct 2005
New project codename "Megapixel+" started.
The main aim of this project is to create a standalone
implementation of a simple network stack on top of
an unknown board/system using the Intel 82541PI adapter.
Of course I will not write a whole TCP implementation,
but will only create a UDP/UDP-Lite stack without
any socket-like interfaces. The system does not have
userspace, so the driver will be simple -
it only must read some memory and send it over the 82541PI
adapter; it will also receive some control messages.
The highest transmit performance is a must -
up to 1Gbps using large frames from memory. Unfortunately
the system will not be able to get data directly into send
buffers, so there will be at least one copy.
The first milestone is to create a simple driver for Linux,
which will send/receive UDP frames from/to shared memory
without any usage of the existing network stack.
It must be as small as possible, so the existing e1000 driver
will be truncated significantly.
New unpaid project - a real zero-copy network receive path.
The main idea is to mark a socket as being a pipe into a file,
using sock_sendfile() or sock_recvfile(),
so when the network driver allocates a new skb for received data,
this skb will be allocated using the file's cached pages if
its data should go into the specified socket, which is determined
from the packet header.
But this will be started after I finish porting the
ESP4 IPsec input processing path to acrypto, which
should happen very soon.
Excellent climbing after an excellent day - what could be better?
Not many routes, and the new one was not finished, but nevertheless
it was really cool.
Gee, Grange's GPIO framework
has been committed into NetBSD CVS tree. My congratulations!
:: Link / Comments (0)
Thu, 06 Oct 2005
XFRM hacking.
Updated the XFRM engine to support a callback mechanism -
it works very well with synchronous ESP crypto processing,
but with acrypto the TCP header is mangled, while the IP header is always
fine. Magic.
Playing with wavelets a little - created a simple program
to decompose and reconstruct black-and-white images - it is a first
step towards wavelet interpolation.
The results are contradictory - the image became absolutely different
and the nonstandard transformation produced negative factors, which
never existed in the original basis.
:: Link / Comments (0)
Wed, 05 Oct 2005
Acrypto hacking.
Input IPsec processing ported to acrypto hangs much more rarely now,
but can still freeze the machine completely.
And the packet is not delivered to the higher layer yet,
but that is not a big problem. The main issue is the crash,
which happens at absolutely unpredictable times...
Added I2C RTC support for the D16 board - it uses a DS1338 chip,
whose driver was copied from the lm_sensors tree.
It was a day of relaxed climbing - only half of one new
route on the negative slope; that route really deserves
its category, since I definitely failed there on-sight.
:: Link / Comments (0)
Tue, 04 Oct 2005
Acrypto hacking.
I've moved forward in porting the input IPsec processing mode
to acrypto - the system already hangs, which is not always a bad sign!
The easy way, where I decided that an skb can be asynchronously processed without
being cloned, was wrong, as expected, so I will experiment with a more complex
approach to SKB usage tomorrow.
Hacked the PPC board a little - I finally found a way to log all output to
stderr and stdout, i.e. console output, into my own tty driver, which
moves that data in the same way all kernel messages go through the console driver.
Debugging has become much easier.
:: Link / Comments (0)
Mon, 03 Oct 2005
Weekend status.
1. Without good company half a liter of tequila does not go down very well.
2. I found that I do not eat for one day after big drinking, and the next day
is spent almost only eating.
3. Good drinking refreshes the brain very well. Only good climbing can be compared
in this respect.
4. It was not bad.
XFRM hacks.
I've changed the input XFRM engine to support recursive callbacks,
so with the right shared structure it can be used with acrypto.
The main problem is to provide it somehow to the ESP/AH input methods,
probably I will add some pointer into the XFRM state structure...
The patch has been sent to the netdev@ and linux-crypto@ mailing lists for review,
today or tomorrow I plan to implement acrypto support for the input
IPsec crypto processing path.
Herbert Xu, current linux kernel crypto maintainer, announced
support for several crypto algorithms in current CryptoAPI stack.
It is a first step of asynchronous crypto processing support in
this stack. Like acrypto and OCF it now supports priority
of algorithm implementations.
XFRM Patch
can also be found in archive.
Climbing was very good today - I tried two new complex routes,
one has been finished, but I completely failed to finish the second one -
I just do not know how to move there. The bad thing is that I did not
even see how other people did it, maybe no one even tried it except the author,
and it was quite a long time ago, so I do not remember it at all, except
that it was a hard route.
:: Link / Comments (0)
Thu, 29 Sep 2005
Userspace init hacking.
A complex chain has been created to satisfy
our embedded/box needs - three steps must be taken
to run userspace on the D16 board.
The first init runs from an initrd that is linked into the
kernel image, so it lives either on flash or is loaded
using PCI. This first init multicasts its IP address
to a predefined group, so a remote initialisation application
can connect to it and send several loopback images and
files, which are then mounted into tmpfs, and root is changed
to this in-RAM directory, where the second init is started.
It parses the initial configs and starts other daemons,
one of which begins to multicast the board's state to a predefined
group, so the remote initialisation application can load
firmware into the DSPs and configure them.
The design looks complex, but it allows us to completely eliminate
the NFS server, and thus to put any set of embedded boards into
a remote network without disturbing its infrastructure,
since only one application is required to fully configure all
devices.
:: Link / Comments (0)
Tue, 27 Sep 2005
DARPA Grand Challenge rox!
DARPA Grand Challenge
is a field test intended to accelerate research and development in autonomous ground vehicles.
Engineering forces produced different machine types - from production
models like VW, Hummer, Nissan equipped with an extremely high-tech electronic driver
which can even shift gears using a mechanical "arm",
to heavy vehicles specially designed for the California desert and even
motorcycles from the Berkeley team. There is even one team from a high school with an Acura MDX car.
I wish Russian universities had something similar.
I'm not talking about a fully equipped car, but at least some
kind of automotive robotics. At MIPT,
where I studied, my friend worked with a Panasonic robot,
which had a simple video camera, gears and very weak batteries.
There was RedHat Linux in this box, and it could only
automatically turn its camera to follow a laser pointer.
It had an operator's console which allowed one to manipulate the camera
and move the gears.
I suspect this robot lives in some box under a thick layer of dust
somewhere now...
Netlink in the current 2.6 kernel has an issue, which can be
described in different ways - by default all netlink sockets
can only broadcast data from kernelspace to the first several groups
(read: only to one group), i.e. 1, 2 and 3, but if someone wants to broadcast to,
for example, group 0x123, this requires calling the multicast binding socket
option from userspace. I've created a patch which allows
broadcasting to the group number specified at bind() time, as well as
to additional multicast groups.
The patch is quite simple - it was sent to netdev@ for review, since I
did not investigate its impact too deeply.
:: Link / Comments (0)
Mon, 26 Sep 2005
Userspace hacking day.
Hacked init to create rootfs without NFS -
the basic idea is to multicast its IP address,
obtained either from DHCP or assigned in some other way,
and create a listening socket, through which the whole
rootfs, compressed or not, as files or as an image, can be
loaded from the network, so init can create a filesystem
in RAM and pivot into it.
Unfortunately I have absolutely no time to finish
asynchronous input IPsec processing, but I will
finish it very soon.
Climbing today was very good - although I was alone
and climbed only traverses and boulder problems,
many of them were very good. Good physical
training after a good hacking day - that is
exactly what is needed.
:: Link / Comments (0)
Sun, 25 Sep 2005
Several days of total stupidity.
Absolute inactivity, like sitting in a swamp. Nothing.
Today I found several performances by the solo actor
Evgeniy Grishkovets -
it was a really good time watching them - I recommend them to everyone.
... they are busy with something out there, well, let them be,
maybe today they will forget to wake me up...
But you can hear it - they have not forgotten. They are coming to wake me.
And, above all, it is not far to walk.
But you still manage to put on your most pitiful face,
a face like... Like this...
Well, the kind of face that means an angel is sleeping here,
he must not be woken... Or he will fly away...
But they came in, did not look at the face,
did not look at the wings lying neatly behind the back,
just turned on the light and left.
:: Link / Comments (0)
Thu, 22 Sep 2005
Initrd linked with ppc405 embedded image.
It is the first time I put an initrd into an embedded image,
and with all respect to the kernel developers, it went
very smoothly.
Grange has ported
OpenBSD to the NEC MobilePro 780 Handheld PC with LE MIPS64.
It is the first and unique port of OpenBSD to this processor,
written from scratch, since OpenBSD never supported LE MIPS.
Here
is a screenshot of the board.
My congratulations!
Read up on the math fundamentals of wavelet processing - some time ago
I had an idea to implement an algorithm which could
sharpen zoomed images. Since a wavelet basis has information
not only about the current point or pixel, but about a set of them,
classical interpolation of the transformation factors
can lead to factor prediction and thus image quality improvements.
Back then I implemented a simple program which selects
a real image from a database based on a hand-written, scanned
or damaged one, by comparing the most significant factors of the wavelet transformation.
It worked quite well with images processed by Photoshop
using some "corrupting" filters.
:: Link / Comments (0)
Wed, 21 Sep 2005
Map analyzer hacks or power of math.
Analyzer
already works with the Dijkstra algorithm, so it easily finds the shortest path between any
points of the road on the original bitmap map from a scanner or the internet,
but it only produces uninteresting
pieces along the path - so I've written a least squares interpolation for
those points - the result can be found on the
screenshot,
where the blue line is a third-order polynomial interpolation for the corner points
of the shortest path (red circles on the map), and the green line is a Bezier
interpolation from Gnuplot.
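A minimal sketch of a third-order least squares fit, assuming the corner points are given as plain x/y arrays. This is not the analyzer's code: fit_cubic() is a hypothetical name, and solving the normal equations by Gaussian elimination is just one simple way to do it, adequate for a handful of points with small coordinates.

```c
#include <stddef.h>

#define NC 4	/* number of coefficients of a cubic */

static double dabs(double v)
{
	return v < 0 ? -v : v;
}

/*
 * Fit y = c[0] + c[1]*x + c[2]*x^2 + c[3]*x^3 to n points in the
 * least squares sense. Returns 0 on success, -1 if the system is
 * singular (for example fewer than 4 distinct x values).
 */
int fit_cubic(const double *x, const double *y, size_t n, double c[NC])
{
	double a[NC][NC + 1] = {{0}};	/* augmented normal equations */
	size_t i;
	int r, k, j;

	/* Accumulate sums of x^(r+k) and of x^r * y. */
	for (i = 0; i < n; i++) {
		double p[2 * NC - 1];

		p[0] = 1.0;
		for (k = 1; k < 2 * NC - 1; k++)
			p[k] = p[k - 1] * x[i];
		for (r = 0; r < NC; r++) {
			for (k = 0; k < NC; k++)
				a[r][k] += p[r + k];
			a[r][NC] += p[r] * y[i];
		}
	}

	/* Forward elimination with partial pivoting. */
	for (k = 0; k < NC; k++) {
		int piv = k;

		for (r = k + 1; r < NC; r++)
			if (dabs(a[r][k]) > dabs(a[piv][k]))
				piv = r;
		if (dabs(a[piv][k]) < 1e-12)
			return -1;
		for (j = 0; j <= NC; j++) {
			double t = a[k][j];

			a[k][j] = a[piv][j];
			a[piv][j] = t;
		}
		for (r = k + 1; r < NC; r++) {
			double f = a[r][k] / a[k][k];

			for (j = k; j <= NC; j++)
				a[r][j] -= f * a[k][j];
		}
	}

	/* Back substitution. */
	for (k = NC - 1; k >= 0; k--) {
		double s = a[k][NC];

		for (j = k + 1; j < NC; j++)
			s -= a[k][j] * c[j];
		c[k] = s / a[k][k];
	}
	return 0;
}
```

For map pixel coordinates in the hundreds or thousands the normal equations become badly conditioned, so shifting and scaling x before fitting would be advisable.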
:: Link / Comments (0)
Tue, 20 Sep 2005
Hacking on sock_sendfile(). Continue.
I've created a nice CPU usage graph for the 2.6.13-rc6 and 2.6.14-rc1-git
trees for recv()/write() and receiving sendfile() usage.

The patch is available in the archive.
Acrypto hacking - the input asynchronous IPsec ESP processing implementation goes
very slowly - since there can be several encapsulated headers,
each XFRM state should be processed synchronously after the previous one,
so it must be called from the acrypto callback, which should be stackable...
:: Link / Comments (0)
Mon, 19 Sep 2005
Hacking on sock_sendfile().
The whole day was spent trying to understand why
sock_sendfile() is sometimes slower than recv()/write().
It looks like the root of this lies in CPU usage - with sendfile(),
and thus sock_sendfile(), it is always lower than with
recv()/write(). Oprofile data shows that poll_idle is lower
and third place is taken by __copy_from_user_ll() in the recv()/write() case.
CPU usage probably does not go up because
the remote side just cannot fill the pipe.
Climbed a little today - the shoes still hurt my feet,
so some routes look more complex, but I'm sure after a
couple of training sessions things will get better.
:: Link / Comments (0)
Sat, 17 Sep 2005
sock_sendfile() hacking.
I've created a new version of sock_sendfile() which can be used
for receiving data into a file without meaningless interaction with
userspace.
Previous
implementation works in the following way:
it allocates a page, receives data into it using recvmsg() and then
calls file_send_actor(), which is basically a wrapper over the ->sendpage() method.
I implemented the ->sendpage() method to grab a page from VFS and then
memcpy() data from the given page.
Such an approach removes copying data from userspace when doing the write operation,
and the performance improvement was about 5% per calling thread,
which was about 20 seconds when downloading a 650 Mb ISO.
In the new approach the system grabs a page from VFS and receives data
directly into it, which should completely eliminate any data copying.
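For comparison, here is a minimal sketch of the userspace baseline that receiving sendfile() is measured against: every byte is copied from the kernel into a user buffer and back again. It uses read() so that it works on any descriptor (on a socket, read() behaves like recv() with no flags). The name copy_via_userspace() is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything from descriptor 'in' to descriptor 'out' through a
 * userspace buffer. Returns the number of bytes copied, or -1 on error.
 */
long copy_via_userspace(int in, int out)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(in, buf, sizeof(buf));
		ssize_t off = 0;

		if (n == 0)
			return total;		/* end of stream */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* write() may be short, so loop until the chunk is out. */
		while (off < n) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

Each loop iteration costs two system calls and two copies across the kernel boundary, which is the overhead the in-kernel path avoids.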
:: Link / Comments (0)
Fri, 16 Sep 2005
Acrypto input hacking.
The basic part is done - the decrypting part is localized and the infrastructure
for acrypto has been created, so it could be as simple as the output path, but
the input XFRM engine is much more complex and this part has not even been touched yet.
The most complex part is data decapsulation, which happens after decryption.
Hmm, the Linux networking input path is completely synchronous after IP processing has started,
it does not have some kind of dst_entry there, so it cannot be split into several
pieces, some of which could be offloaded into hardware. This means that it is not possible
to create simple asynchronous IPsec crypto processing for the input networking path.
Although everything can be much easier...
What if we just tell the system that the packet has been delivered - as
far as I can see nothing prevents this. If so, acrypto can just call
all the xfrm processing code in asynchronous mode, and then call
netif_rx(skb) with the new SKB itself.
It is an idea.
On the embedded side there is good news: kernel and userspace compiled with gcc-3.4.4
work excellently together, so I definitely recommend using gcc3 instead of gcc4 for PowerPC
cross-compilation.
Climbed a little in new climbing shoes - Boreal Spider - their sole is stiffer,
so I think they will stay better on small holds. These shoes are really good,
but until they are broken in they hurt my feet a lot.
:: Link / Comments (0)
Wed, 14 Sep 2005
Connector update.
If the input message rate from userspace is too high, do not drop messages, but try
to deliver them using work queue allocation. Failure there acts as a kind of congestion control.
It also removes the warn_on on this condition, which scares people.
The updated version was sent for inclusion and is available in the
archive.
Started working on input esp4 processing support for acrypto.
:: Link / Comments (0)
Tue, 13 Sep 2005
Connector is in Linus' tree.
:: Link / Comments (0)
PPC init problems.
The bug was narrowed down to the ELF interpreter - it somehow misses the
brk adjustment, so later it falls into do_page_fault() without a proper VMA.
The magic thing is that if I add any memory barrier into load_elf_binary() inside the binary mapping loop,
the binary is loaded properly with the right brk adjustment.
I found the problem - it is GCC-4.1-20050716.
There is the following piece of code in load_elf_binary():
    k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
    if (k > elf_bss)
        elf_bss = k;
    if ((elf_ppnt->p_flags & PF_X) && end_code < k)
        end_code = k;
    if (end_data < k)
        end_data = k;
    k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
Reading the "k" variable at the end of this block shows that
it does not equal elf_ppnt->p_vaddr + elf_ppnt->p_memsz:
load_elf_binary: k=00000000, elf_brk=00000000, p_vaddr=10070000, p_filesz=00001b0c, p_memsz=00002780.
Making "k" volatile fixes the init problem on ppc405gpr with the linux 2.6 kernel.
What puzzles me is that a 2.4 kernel compiled with the same compiler works fine.
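A minimal userspace sketch of that workaround, using a hypothetical cut-down segment structure instead of the kernel's ELF program header type. The loop body mirrors the excerpt above, plus the elf_brk update that follows it in the kernel source. With a correct compiler the volatile qualifier changes nothing in the result; it only forces every store to and load from k to actually happen.

```c
#include <stddef.h>

#define PF_X 0x1	/* executable segment flag, same value as in <elf.h> */

/* Hypothetical cut-down program header: only the fields used below. */
struct seg {
	unsigned long p_vaddr, p_filesz, p_memsz, p_flags;
};

struct layout {
	unsigned long elf_bss, elf_brk, end_code, end_data;
};

/* Compute the bss/brk/code/data boundaries over all segments. */
void scan_segments(const struct seg *s, size_t n, struct layout *l)
{
	volatile unsigned long k;	/* the workaround: k is volatile */
	size_t i;

	l->elf_bss = l->elf_brk = l->end_code = l->end_data = 0;
	for (i = 0; i < n; i++) {
		k = s[i].p_vaddr + s[i].p_filesz;
		if (k > l->elf_bss)
			l->elf_bss = k;
		if ((s[i].p_flags & PF_X) && l->end_code < k)
			l->end_code = k;
		if (l->end_data < k)
			l->end_data = k;
		k = s[i].p_vaddr + s[i].p_memsz;
		if (k > l->elf_brk)
			l->elf_brk = k;
	}
}
```

Fed with the values from the log line (p_vaddr=0x10070000, p_filesz=0x1b0c, p_memsz=0x2780), a correct build yields elf_brk=0x10072780 rather than zero.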
:: Link / Comments (0)
Mon, 12 Sep 2005
PPC init problems.
Today's git pull has brought me a nice debug message in dmesg:
init has generated signal 11 but has no handler for it
Kernel panic - not syncing: Attempted to kill init!
<0>Rebooting in 180 seconds..
After reading git commits I found this patch from Paul Mackerras:
[PATCH] ppc32: Kill init on unhandled synchronous signals
This is a patch that I have had in my tree for ages. If init causes
an exception that raises a signal, such as a SIGSEGV, SIGILL or
SIGFPE, and it hasn't registered a handler for it, we don't deliver
the signal, since init doesn't get any signals that it doesn't have a
handler for. But that means that we just return to userland and
generate the same exception again immediately. With this patch we
print a message and kill init in this situation.
This is very useful when you have a bug in the kernel that means that
init doesn't get as far as executing its first instruction. :)
Without this patch the system hangs when it gets to starting the
userland init; with it you at least get a message giving you a clue
about what has gone wrong.
This means that my 2.6 ppc kernel is broken, and that is quite good news, since
digging into compiler problems could be much worse.
:: Link / Comments (0)
Sun, 11 Sep 2005
Connector in -mm tree again.
It has a strange life - first time it was added more than
a year ago into -mm tree by GregKH with my SuperIO patch,
which provided generic access to standart SuperIO chips and
devices on it's bus, namely GPIO and access bus on scx100 board and pc8736x chips.
Interface allows very easy addition of new devices and chips.
It still lives in archive in soekris directory,
although I did not try it for quite some time, since I does not have
hardware.
Then Greg decided to change his policy of adding stuff, so I
needed to add it directly through Andrew Morton. I first tried to
push connector, but people did not see any benefit in having it;
they thought that macros on top of netlink allocation could help
create a real message bus, so it was removed again.
I hope this time I have convinced people that the whole connector
subsystem is definitely required to have a powerful bidirectional
event bus. It has a very simple mechanism of event allocation and
notification, and a very convenient method of receiving messages from
userspace based on callback registration.
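The idea of "receiving messages based on callback registration" can be sketched as a small userspace toy. This is not the connector API: every name below (toy_id, toy_add_callback, toy_deliver, toy_count_bytes) is hypothetical and only models a table of callbacks keyed by an (idx, val) pair, with delivery looking up the matching entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message id: a pair of numbers identifying one listener. */
struct toy_id {
	unsigned int idx;
	unsigned int val;
};

typedef void (*toy_cb)(const void *msg, size_t len, void *priv);

#define TOY_MAX_CB 8

static struct {
	struct toy_id id;
	toy_cb cb;
	void *priv;
} toy_table[TOY_MAX_CB];
static size_t toy_count;

/* Register a callback for one id; returns -1 when the table is full. */
static int toy_add_callback(struct toy_id id, toy_cb cb, void *priv)
{
	assert(cb != NULL);
	if (toy_count == TOY_MAX_CB)
		return -1;
	toy_table[toy_count].id = id;
	toy_table[toy_count].cb = cb;
	toy_table[toy_count].priv = priv;
	toy_count++;
	return 0;
}

/* Deliver a message to the callback registered for id, if any. */
static int toy_deliver(struct toy_id id, const void *msg, size_t len)
{
	for (size_t i = 0; i < toy_count; i++) {
		if (toy_table[i].id.idx == id.idx &&
		    toy_table[i].id.val == id.val) {
			toy_table[i].cb(msg, len, toy_table[i].priv);
			return 0;
		}
	}
	return -1;	/* nobody listens on this id */
}

/* Example callback: adds the message length to a counter passed as priv. */
static void toy_count_bytes(const void *msg, size_t len, void *priv)
{
	(void)msg;
	*(size_t *)priv += len;
}
```

A caller would register toy_count_bytes once and then only call toy_deliver(); messages for ids without a listener are rejected with -1.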
Here
is connector's homepage.
Connector will be pushed into Linus' tree today.
:: Link / Comments (0)
Sat, 10 Sep 2005
sock_sendfile()/generic_file_sendpage() numbers.
Using recv()/write() is about 5% slower in one thread and about 8-10%
slower using two threads compared to the new sendfile() on a 1+1 hyperthreaded
machine. The server uses sendfile.
recv()/write():
3m45.424s 659905536
3m50.352s 659905536
receiving sendfile():
3m29.432s 659905536
3m33.065s 659905536
:: Link / Comments (0)
Fri, 09 Sep 2005
It looks like I found the root of the problem with userspace not running on PPC405.
IBM has the famous erratum 77, which requires a sync or dcbt instruction
to be executed before stwcx. Neither of my compilers has it,
but the kernel itself has such a workaround.
lwarx/stwcx. are used to implement atomic operations,
so they are heavily used in linuxthreads/pthreads.
Or maybe not, AMCC affirms that the GPr core does not have this bug.
After some gcc/glibc hacks I can confirm that this was not the cause - userspace starts,
but does not run.
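For illustration, here is a minimal sketch of an atomic increment built on lwarx/stwcx. with a dcbt placed before the stwcx., which is one of the two workarounds named above. The helper name atomic_inc_return_sketch is hypothetical and the code is not taken from glibc or the kernel; on non-PowerPC builds it falls back to a compiler builtin so that the snippet still compiles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomically increment *p and return the new value. */
static inline uint32_t atomic_inc_return_sketch(volatile uint32_t *p)
{
	/* lwarx/stwcx. require a word-aligned address */
	assert(((uintptr_t)p & 3) == 0);
#if defined(__powerpc__) || defined(__PPC__)
	uint32_t tmp;

	__asm__ __volatile__(
	"1:	lwarx	%0,0,%2\n"	/* load word, take a reservation */
	"	addi	%0,%0,1\n"	/* "b" constraint keeps %0 out of r0 */
	"	dcbt	0,%2\n"		/* erratum 77 workaround before stwcx. */
	"	stwcx.	%0,0,%2\n"	/* store only if the reservation holds */
	"	bne-	1b\n"		/* reservation lost, retry */
	: "=&b" (tmp), "+m" (*p)
	: "r" (p)
	: "cc");
	return tmp;
#else
	/* Fallback for other architectures, so the example stays buildable. */
	return __atomic_add_fetch(p, 1, __ATOMIC_RELAXED);
#endif
}
```

The point of the sketch is only where the extra instruction sits: between the reserved load and the conditional store, inside the retry loop.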
Climbing was good today - old routes were done (jumping, yellow, black), several boulder problems,
and one new route finished on-sight - it was a good time, which cured the day.
:: Link / Comments (0)
Thu, 08 Sep 2005
PPC userspace cross-compilation.
The whole day was spent in attempts to run init on the PPC board.
It does not run. After more than 12 hours spent on this problem,
I can only produce the following table:
gcc version 3.2.1
  options: --with-cpu=405 --nfp --without-fp
  result:  userspace works
gcc version 4.1.0 20050702 (experimental), x86 build
  options: --without-fp --nfp --with-cpu=405
           --enable-cxx-flags=-mcpu=405
           --disable-nls --enable-threads=posix
           --enable-symvers=gnu --enable-__cxa_atexit
           --enable-languages=c,c++ --enable-shared
           --enable-c99 --enable-long-long
  result:  userspace does not work
gcc version 4.1.0 20050716 (experimental), x86_64 build
  result:  userspace does not work
Linux kernel 2.6.13 is compiled with the second compiler
and it works fine.
Binutils for the last two compilers are the same, 2.16.1.
It is not that userspace crashes or fails completely - I have one binary
compiled with the 4.1.0 compiler which works. I can even create a
magic sequence of operations in main() so that the new binary also works,
but if I then add one more a = b;, it stops...
I've started compiling gcc-3.4.4 with the new binutils,
hopefully it will work better.
:: Link / Comments (0)
Wed, 07 Sep 2005
Init.
I'm stupid - I ported Linux kernel 2.6 to the new board,
rewrote the PPC BIOS and... cannot compile init.
I have one binary without source, which only remounts root
read-write and creates one file in /tmp - it works. Now
I have a source which does exit(-1) on startup;
init is run by the kernel, but it does not reach main() and does not
call exit().
With this stuff in my head I went climbing alone - rubbed my fingers raw,
found a couple of new interesting boulder problems and starts - it was fun.
:: Link / Comments (0)
Tue, 06 Sep 2005
PPC hacking.
It is getting boring - the previous D16 digital recording board had half of its DSPs
on the low half of the address bus and the other half on the high addresses, so
there was the following code:
if (((unsigned long)dsp->phys_pm & 0x00020000) == 0x00000000)
	*((uint32_t *) dsp->pm_base + 0x1) = (DSP_DM_ADDR + addr);
else
	*((uint32_t *) dsp->pm_base + 0x1) = ((DSP_DM_ADDR + addr) << 16);
...
if (((unsigned long)dsp->phys_pm & 0x00020000) == 0x00000000)
	value = (0xffff & tmp0) + (tmp1 << 16);
else
	value = (tmp0 >> 16) + (tmp1 & 0xffff0000);
The current board has the following address bus crossing, AABBCCDD -> BBAADDCC,
so the new board has the following:
*(volatile u16 *)dsp->vcs2 = addr;
value = *(volatile u32 *)dsp->vcs1;
val[0] = value & 0xffff;
val[1] = (value >> 16) & 0xffff;
mod_value = __le32_to_cpu((__le16_to_cpu(val[0]) | (__le16_to_cpu(val[1]) << 16)));
Hardware guys have real fun developing such schemes, I think.
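The AABBCCDD -> BBAADDCC crossing on its own is just a byte swap inside each 16-bit half of the word. A minimal sketch of that mapping follows; the names cross_halves and cross_halves_check are hypothetical, and the code deliberately models only the crossing itself, not the driver's endianness handling.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes inside each 16-bit half: AABBCCDD -> BBAADDCC. */
static inline uint32_t cross_halves(uint32_t v)
{
	return ((v & 0x00ff00ffu) << 8) | ((v >> 8) & 0x00ff00ffu);
}

/* The crossing is its own inverse: applying it twice restores the word. */
static inline void cross_halves_check(uint32_t v)
{
	assert(cross_halves(cross_halves(v)) == v);
}
```

Because the mapping is an involution, the same helper can be used both for values written to the crossed bus and for values read back from it.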
:: Link / Comments (0)
Mon, 05 Sep 2005
dm-crypt hacking.
Unfortunately it is broken - device mapper itself was
not designed for asynchronous operations: its shared objects
cannot store private variables since they can be accessed
in parallel, and those which can are only allocated on the stack
as local variables. So the most bug-free solution I see is to allocate
my own objects on the heap, but this will add overhead.
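The trade-off described above can be sketched in plain userspace C. This is not dm-crypt or acrypto code; all names (shared_target, request_ctx, submit_async, complete_async) are hypothetical. The submit function returns before the work is finished, so the per-request state cannot live in the shared object or on the submitter's stack and is allocated on the heap instead, at the cost of one malloc()/free() pair per request.

```c
#include <assert.h>
#include <stdlib.h>

/* Shared between all requests and accessed in parallel:
 * nothing request-specific may be stored here. */
struct shared_target {
	int key_id;
};

/* Private state of one in-flight request, allocated on the heap. */
struct request_ctx {
	struct shared_target *target;
	unsigned long sector;
	int status;				/* -1 while pending */
	void (*done)(struct request_ctx *ctx);	/* optional notifier */
};

/* Submit path: returns immediately, the context outlives this call. */
static struct request_ctx *submit_async(struct shared_target *t,
					unsigned long sector,
					void (*done)(struct request_ctx *))
{
	struct request_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->target = t;
	ctx->sector = sector;
	ctx->status = -1;
	ctx->done = done;
	return ctx;
}

/* Completion path: would run later, possibly from another context. */
static int complete_async(struct request_ctx *ctx, int status)
{
	assert(ctx != NULL);
	ctx->status = status;
	if (ctx->done)
		ctx->done(ctx);
	status = ctx->status;
	free(ctx);		/* the extra overhead: one alloc/free per request */
	return status;
}
```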
Here
is an updated version of the dm-crypt port to acrypto.
Ronen Shitrit [rshitrit_marvell.com] has ported dm-crypt for acrypto
to OCF.
Climbed a lot today - a bad thing happened - my new climbing shoes
have worn through, crap, I bought them less than a month ago.
I climbed a couple of interesting routes, even with dynamic jumps - it was fine.
Somehow finished a 6b+ on-sight - I do think it is either not 6b+,
or I grabbed several wrong holds.
:: Link / Comments (0)
Sun, 04 Sep 2005
Congratulations on the Day of Moscow.
Central streets are closed to cars,
there is artificially created good weather and
lots of entertainment and fun.
Added a couple of acrypto helpers for the TFM bridge,
like converters from TFM to acrypto mode and type.
As usual, the updated version is available in the
archive.
I've updated the
dm-crypt
patch, which fixes an issue with multiple dm-crypt devices for different partitions.
The bug could slow down processing, and could probably cause an oops.
Its md5sum is 318261505489fa0f46d030cdf3844b35.
:: Link / Comments (0)
Sat, 03 Sep 2005
Drinking^W^W^WRelaxing day.
The whole night I drank Olmeca tequila with Grange.
Then morning tea, and off to a meeting with friends,
where Sauza tequila was flowing.
It is always nice to talk about interesting things
with interesting people over a couple of glasses of
fire water.
:: Link / Comments (0)
Fri, 02 Sep 2005
dm-crypt ported to acrypto.
I've announced the
dm-crypt-2.6.13.diff
patch in linux-crypto@ and put it into the archive.
I've implemented the generic_file_sendpage() method and sock_sendfile(),
so now Linux users can use the sendfile() system call for communication between
any file descriptors, like socket<->socket and socket<->file. The main advantage
is that for socket->file one does not need to copy data from a userspace
buffer into the kernel when doing write(), using the very slow copy_from_user().
The patch was presented in netdev@.
Patch
against 2.6.13 is available in the archive.
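For comparison, this is the conventional userspace path that a socket->file sendfile() avoids: every chunk is copied from the kernel into a user buffer by recv() and then copied back into the kernel by write(). A minimal sketch using only standard POSIX calls; the helper name copy_recv_write and the 4096-byte buffer size are arbitrary choices for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy everything from a connected socket into a file descriptor.
 * Returns the number of bytes copied, or -1 on error. */
static long long copy_recv_write(int sock, int fd)
{
	char buf[4096];
	long long total = 0;

	assert(sock >= 0 && fd >= 0);
	for (;;) {
		/* first copy: kernel socket buffer -> userspace buffer */
		ssize_t n = recv(sock, buf, sizeof(buf), 0);

		if (n == 0)
			break;			/* peer closed the connection */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		for (ssize_t off = 0; off < n; ) {
			/* second copy: userspace buffer -> kernel */
			ssize_t w = write(fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
	return total;
}
```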
:: Link / Comments (0)
Thu, 01 Sep 2005
Day of knowledge.
People start searching for knowledge today,
and I started to port dm-crypt to acrypto.
It is almost done, so I plan to clean it up tomorrow
and announce it in linux-crypto@; the next point is the
input path of the in-kernel IPsec stack.
I've finally finished a new complex route and feel
very good about it. I also climbed several old routes,
but they were not as cool as the first one.
For the second time I tried the route called "Beaujolais"; I do not like
wine, so I failed it again, but I'm quite sure that next time
I will uncork this bottle.
:: Link / Comments (0)
Tue, 30 Aug 2005
2.6.14 kernel tree will be called "Affluent Albatross".
Fighting with the EMAC driver on our new PPC405GPR based D16 board -
it does not work with my Dlink switch (model DES-1008D): permanent FCS errors,
and if I omit the FCS check, the frames themselves are broken too. With something from CNet
it works fine. I know, the switch is crap,
and now I have yet another confirmation on top of
autonegotiation not working with the Realtek 8169 PHY and Marvell 88e1111,
managed by the r8169 and forcedeth modules.
Climbed with Grange today, he has finally
returned from vacation. It was very good - people finally do not yell
that I climb without a rope, so several old routes were done, although
not cleanly - next time it will be better.
:: Link / Comments (0)
Mon, 29 Aug 2005
PPC hacking day.
Ok, the new D16 board has been booted - the main change from PPC405GP initialisation
was only the data cache size - it is now 16 KB, while in PPC405GP it was 8 KB,
so the BIOS caught an exception when it tried to write into a half-invalidated
cache. Other changes were mostly setting different GPIO values.
Now Linux kernel 2.6.13 works ok. The next task is to change
the DSP drivers to the new addressing scheme.
I've implemented simple no-way-zero-copy support for sendfile
for socket <-> socket connections.
The patch is quite stupid - it uses kernel_recvmsg() into a
preallocated page and then hands that page to the given actor() method.
If this approach is considered useful, I will
probably implement a sendpage() method for filesystems, so it could be
really useful for huge ftp uploads and so on.
Until then you can find the patch in the
archive.
:: Link / Comments (0)
Sun, 28 Aug 2005
The laziest day.
I feel soul-sick - why, I should ask,
did I spend several hours in my bed... watching all three
parts of Harry Potter?
In Russian it sounds similar to "Potniy", i.e.
Harry Potniy, which is Harry Wet in rough translation.
And I lie in my bed and ... and watch Harry "Wet" Potter.
Crap.
:: Link / Comments (0)
Fri, 26 Aug 2005
Connector updated.
Ok, I've sent an updated
netlink connector
version to the netdev@ mailing list and
updated the archive.
4 hours of climbing on two routes - I found a new, very complex boulder problem
and worked on one traverse. It was a very good time. I found that the pain after
training has moved from the hands into the fingers and their first phalanges -
the fingertip pads are simply rubbed raw.
:: Link / Comments (0)
Wed, 24 Aug 2005
Lazy day.
Nothing interesting happened today - the new board does not boot -
the ppc405GP BIOS cannot be started on ppc405GPr, probably it
freezes on data cache initialisation, and for now it is impossible
to reprogram the EEPROM, so it will wait until Monday, when the
programmer will be delivered.
Climbed a little today - there was a huge crowd of people
in Skala-city, so I only did several old traverses and starts.
I've bought a quarter-season ticket to the climbing zone, so
it could be called the start of regular training.
:: Link / Comments (0)
Tue, 23 Aug 2005
PPC hacking.
I've finished the PPC port to the old D16 board.
Today I hacked miscellaneous tools and host drivers for our
new D16 board.
Here is a photo (Warning, 4 MB!).
It is a PPC405GPr based embedded board with 32 MB SDRAM and 2 MB flash memory
installed. It carries 4 Analog Devices ADSP-2185 DSPs and a set of I2C devices - RTC, thermal sensor.
Tomorrow I will start porting the 2.4/2.6 kernel to this board.
:: Link / Comments (0)
Mon, 22 Aug 2005
Linux kernel 2.6 PPC ported to D16 digital board.
Ok, it works!
The most complex and interesting part was hacking the board's kernel loader
and searching head.S and relocate_kernel.S for the place where the kernel stops,
using this kind of stub:
lis 0,0xaabb
ori 0,0,52445
lis 9,0x80
stw 0,0(9)
_loop_:
b _loop_
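Read instruction by instruction, the stub builds a 32-bit marker in r0, builds an address in r9, stores the marker there and then spins, presumably so that the marker can be inspected from outside to see whether this point in the boot code was reached. The C below only models the arithmetic of lis and ori; the function names are hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* lis rD,imm: load imm into the upper 16 bits, clear the lower 16. */
static inline uint32_t lis(uint16_t imm)
{
	return (uint32_t)imm << 16;
}

/* ori rA,rS,imm: OR a zero-extended 16-bit immediate into rS. */
static inline uint32_t ori(uint32_t rs, uint16_t imm)
{
	return rs | imm;
}

/* Value that "lis 0,0xaabb; ori 0,0,52445" leaves in r0. */
static inline uint32_t stub_marker(void)
{
	uint32_t r0 = ori(lis(0xaabb), 52445);

	assert(r0 == 0xaabbccddu);	/* 52445 is 0xccdd */
	return r0;
}
```

With these helpers, lis(0x80) gives 0x00800000, the address that "stw 0,0(9)" writes the marker to before the endless "b _loop_".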
Here is the complete dmesg:
<5>Linux version 2.6.13-rc6 (s0mbre@uganda)
(gcc version 4.1.0 20050716 (experimental)) #54 Mon Aug 22 14:40:49 MSD 2005
<4>Running as PCI slave, kernel PCI disabled !
<4>PCLIO_BASE = 0xe7ffe000
<4>PCI bridge regs before fixup
<4> ma la pcila pciha
<4> pmm0 00000000 fffe0000 fffe0000 00000000
<4> pmm1 00000000 00000000 00000000 00000000
<4> pmm2 00000000 00000000 00000000 00000000
<4> ptm1 ms: fe000001 la: 00000000
<4> ptm2 ms: fe000001 la: fe000000
<4>PCI bridge regs after fixup
<4> ma la pcila pciha
<4> pmm0 c0000001 80000000 80000000 00000000
<4> pmm1 00000000 00000000 00000000 00000000
<4> pmm2 00000000 00000000 00000000 00000000
<4> ptm1 ms: fe000001 la: 00000000
<4> ptm2 ms: fe000001 la: fe000000
<4>Message 1
<0>Message 2
<4>sk: mem_ipc_setup() done
<4> D16 port (C) 2000-2005
<4> machine: D16
<4>
<4> bi_s_version:
<4> bi_r_version:
<4> bi_memsize: 0x00ff0000 16320KBytes
<4>bi_enetaddr 0: 732069-703d3a
<4>bi_enetaddr 1: 3a3a3a-643136
<4> pin strapping: 0x6aaa9000
<4> bi_intfreq: 198Mhz
<4> plb bus clock: 33MHz
<4>bi_pci_busfreq: 33MHz
<4> opb bus clock: 33MHz
<4>cs0 CR: ff09a000 AP: 03840200
<4>cs1 CR: fe07c000 AP: 80000380
<4>cs2 CR: 00000000 AP: 00000000
<4>cs3 CR: 00000000 AP: 00000000
<4>cs4 CR: 00000000 AP: 00000000
<4>cs5 CR: 00000000 AP: 00000000
<4>cs6 CR: 00000000 AP: 00000000
<4>cs7 CR: 00000000 AP: 00000000
<4>EBC0_CFG: 80400000
<7>On node 0 totalpages: 4096
<7> DMA zone: 4096 pages, LIFO batch:1
<7> Normal zone: 0 pages, LIFO batch:1
<7> HighMem zone: 0 pages, LIFO batch:1
<4>Built 1 zonelists
<5>Kernel command line: root=/dev/nfs ip=::::d16-0.net:eth0:any console=ttyS binfo=0xabcd
idsp=ixpro,d16,8,0xfe020000,0x8000,0,0,0xfe028000,0x8000,0xfe060000,0x8000,0,0,0xfe068000,
0x8000,0xfe0a0000,0x8000,0,0,0xfe0a8000,0x8000,0xfe0d0000,0x8000,0,0,0xfe0d8000,0x8000,
0xfe110000,0x8000,0,0,0xfe118000,0x8000,0xfe150000,0x8000,0,0,0xfe158000,0x8000,0xfe190000,
0x8000,0,0,0xfe198000,0x8000,0xfe1d0000,0x8000,0,0,0xfe1d8000,0x8000 amb=ixpro,d16,1,0xfe300000,
0x1ffff,0x10000,0x18000 l_tx_ptr=0x80000000 l_tx_len=0x1000
<4>binfo_setup: MAC 00:00:dd:dd:ab:cd
<4>PID hash table entries: 128 (order: 7, 2048 bytes)
<4>Console: colour dummy device 80x25
<4>Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
<4>Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
<4>Memory: 13184k available (2260k kernel code, 636k data, 108k init, 0k highmem)
<7>Calibrating delay loop... 196.09 BogoMIPS (lpj=98048)
<4>Mount-cache hash table entries: 512
<6>NET: Registered protocol family 16
<6>PCI: Probing PCI hardware
<3>Memory resource not set for host bridge 0
<4>D16 serial port emulator driver.
<4>mem_con d16 char io device registered to major: 253 minor: 0
<6>Initializing Cryptographic API
<4>vty_init
<4>tty_register_driver: register_chrdev_region() failed with error=0.
<4>tty_register_driver: register_chrdev_region() failed with error=0.
<4>tty_register_driver: register_chrdev_region() failed with error=0.
<6>Serial: 8250/16550 driver $Revision: 1.90 $ 32 ports, IRQ sharing enabled
<4>tty_register_driver: register_chrdev_region() failed with error=0.
<4>ttyS0 at MMIO 0x0 (irq = 0) is a 16550A
<4>ttyS1 at MMIO 0x0 (irq = 1) is a 16550A
<6>io scheduler noop registered
<6>io scheduler cfq registered
<4>RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
<6>loop: loaded (max 8 devices)
<6>mal0: Initialized, 1 tx channels, 1 rx channels
<6>emac: IBM EMAC Ethernet driver, version 2.0
<6>Maintained by Benjamin Herrenschmidt
<4>eth0: IBM emac, MAC 00:00:dd:dd:ab:cd
<6>eth0: Found Generic MII PHY (0x02)
<4>netconsole: not configured, aborting
<6>mice: PS/2 mouse device common for all mice
<6>i2c /dev entries driver
<6>IBM IIC driver v2.1
<6>ibm-iic0: using standard (100 kHz) mode
<6>oprofile: using timer interrupt.
<6>NET: Registered protocol family 2
<4>IP route cache hash table entries: 256 (order: -2, 1024 bytes)
<4>TCP established hash table entries: 1024 (order: 1, 8192 bytes)
<4>TCP bind hash table entries: 1024 (order: 2, 28672 bytes)
<6>TCP: Hash tables configured (established 1024 bind 1024)
<6>TCP reno registered
<6>TCP bic registered
<6>NET: Registered protocol family 1
<6>NET: Registered protocol family 17
<6>eth0: Link is Up
<6>eth0: Speed: 100, Full duplex.
<6>eth0: Link is Up
<6>eth0: Speed: 100, Full duplex.
<5>Sending DHCP and RARP requests ., OK
<4>IP-Config: Got DHCP answer from 192.168.0.202, my address is 192.168.0.243
<4>IP-Config: Complete:
<4> device=eth0, addr=192.168.0.243, mask=255.255.248.0, gw=192.168.0.1,
<4> host=d16-0, domain=, nis-domain=net,
<4> bootserver=192.168.0.202, rootserver=192.168.0.188, rootpath=/home/pwp/work/nfsbox
<5>Looking up port of RPC 100003/2 on 192.168.0.188
<5>Looking up port of RPC 100005/1 on 192.168.0.188
<4>VFS: Mounted root (nfs filesystem) readonly.
<4>Freeing unused kernel memory: 108k init
After a good hacking day it was definitely a good climbing evening -
although I did not climb very high, I found a couple of
very interesting starts and small traverses. I also saw one little boulder problem,
but was too tired to complete it.
:: Link / Comments (0)
Sat, 20 Aug 2005
Embedded PPC hacking.
Finally it boots, although not the whole kernel.
Here is the first dmesg:
version 2.6.13-rc6 (s0mbre@uganda)
(gcc version 4.1.0 20050716 (experimental)) #24 Sat Aug 20 21:11:46 MSD 2005
<4>Running as PCI slave, kernel PCI disabled !
<7>On node 0 totalpages: 4096
<7> DMA zone: 4096 pages, LIFO batch:1
<7> Normal zone: 0 pages, LIFO batch:1
<7> HighMem zone: 0 pages, LIFO batch:1
<4>Built 1 zonelists
<5>Kernel command line:
<4>PID hash table entries: 128 (order: 7, 2048 bytes)
<4>Console: colour dummy device 80x25
<4>Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
<4>Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
This required hacking the board's BIOS, i.e. the program which is loaded from the EEPROM
before the kernel is loaded or even copied through PCI. It looks like the problem lives
in the PCI initialisation part, since the board should not look like a PCI slave. Probably
the BIOS fixup is called too late.
Not bad for two days, I think.
:: Link / Comments (0)
Fri, 19 Aug 2005
PPC embedded development.
Ok, the whole morning I've been fighting with the 2.6 kernel on
our custom embedded PPC405GP/PPC405GPr board. Now it almost runs!
Linux/PPC load: .... ..Uncompressing Linux.....done.
..initrd moved: .Now booting the kernel
.exit
...pause
..
..
-- System halted.0123456789ABCDEF....bad gzipped data
...gunzip: ran out of data in header
..oops... out of memory
With this nice news I climbed very well - I've finished a complex old route,
done several interesting starts and boulder problems. I'm entering quite a nice
streak in life, which brings excellent results in almost every aspect
of life. I'm quite sure it will end, but that will be a well-deserved
rest from the active time.
:: Link / Comments (0)
Thu, 18 Aug 2005
PPC development day.
It was not so easy to extract my PPC tree from BitKeeper, since
bk refuses to work at all after Jul 1, so I found Andrew Tridgell's
sourcepuller and extracted the tree, then found that the latest version
I had worked with was 2.4.26, which is definitely not what I wanted to work with.
So after several hours of merging/reading/writing/thinking I compiled a
2.4.26 kernel for our ppc405 based platform from pure 2.4.26, without the
huge Montavista patches. It works, which is very nice. Later today I will
merge this project with the latest 2.4 tree.
The main purpose of these steps is to port the 2.6 kernel to this platform.
The size of the resulting patch is about 1.5 Mbytes.
:: Link / Comments (0)
Wed, 17 Aug 2005
Acrypto and asynchronous IPsec.
Something major happened between 2.6.12-rc2 and the current kernel in the XFRM processing
engine. My old proof-of-concept patch can only send 5 packets now, but with the old
kernel its asynchronous performance was almost the same as that of the synchronous stack.
So I need to investigate what changes in the XFRM stack can cause such behaviour.
Tests show that it is ICMP which behaves so strangely - TCP ssh over asynchronous
IPsec works perfectly without any stalls.
Ok, ICMP problem found -
raw_sendmsg()->ip_append_data()->sock_alloc_send_skb()->
sock_alloc_send_pskb()->atomic_read(&sk->sk_wmem_alloc),
this means that sk_wmem_alloc is never decreased enough to free space in the socket queue.
It is decremented in sock_wfree(), which is called from kfree_skb(),
so it looks like raw skbs do not take the same path as TCP/UDP skbs...
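The accounting behind this stall can be shown with a toy model. It is not kernel code and all names (toy_sock, toy_alloc_send, toy_wfree, toy_send_burst) are hypothetical; it only mirrors the idea that each packet is charged to the socket when allocated and refunded when freed, and that allocation is refused once the charge reaches the send buffer limit. The sizes used in any run are arbitrary and say nothing about the real numbers in the stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy socket: only the write-memory accounting is modelled. */
struct toy_sock {
	int wmem_alloc;	/* bytes charged to packets still in flight */
	int sndbuf;	/* limit for wmem_alloc */
};

/* Allocation step: refuse when the budget is used up, else charge. */
static bool toy_alloc_send(struct toy_sock *sk, int truesize)
{
	assert(truesize > 0);
	if (sk->wmem_alloc >= sk->sndbuf)
		return false;	/* real code would sleep or fail here */
	sk->wmem_alloc += truesize;
	return true;
}

/* Destructor step, run when a packet is freed: refund the charge. */
static void toy_wfree(struct toy_sock *sk, int truesize)
{
	sk->wmem_alloc -= truesize;
}

/* Count how many packets get out, with or without the refund step. */
static int toy_send_burst(struct toy_sock *sk, int truesize,
			  int attempts, bool freed)
{
	int sent = 0;

	for (int i = 0; i < attempts; i++) {
		if (!toy_alloc_send(sk, truesize))
			break;
		sent++;
		if (freed)
			toy_wfree(sk, truesize);
	}
	return sent;
}
```

If the refund never runs, the burst stops after a fixed number of packets no matter how many are attempted; with the refund in place, every attempt succeeds.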
Ok, this problem has been fixed and I have released a new
patch.
Its md5sum is 506ad2ad7148199c29e25d8b0afe0c66.
:: Link / Comments (0)
Tue, 16 Aug 2005
Acrypto hacking.
The first half of the day was spent on PPC cross-platform compilation.
Due to a simple typo in the very first config file I had to recompile
glibc with different compilers, faking its configure to allow part of it
to be compiled using x86_64 gcc-4.0.1 and the rest using x86_32 gcc-3.2...
Crap, but finally it was finished, many thanks to Dan Kegel for his excellent
cross-development toolchain.
The second half of the day was spent on acrypto hacking - I replaced yesterday's diff
with a new one, but it still has some issues. It is probably broken more deeply than my local version,
so do not use it for now, I will investigate it further. The local version
has one big problem - it only sends 5 ICMP packets, and then dst->output() is not even called.
Tomorrow I will set up my very old proof-of-concept patch with asynchronous IPsec processing
and verify that it still works. If it does not, then something major happened between 2.6.12-rc2
and 2.6.13-rc6 in the core network stack (it's about 4 months), and I will investigate the current
XFRM deeply. If the old patch still works, and I think it will, then something small and
stupid has sneaked into my code.
:: Link / Comments (0)
Mon, 15 Aug 2005
Acrypto and IPsec.