<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Zbr's days.</title>
    <link>http://tservice.net.ru/~s0mbre/blog</link>
    <description>Zbr's days.</description>
    <language>en</language>

  <item>
    <title>Midnight creatiff. Casted by LHC start.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//other/2008_07_05</link>
    <description>
- Shit! There are no more M8 screw-nuts.&lt;br/&gt;
- What? Use M12, boson should pass through.&lt;br/&gt;
- We all will be fucked this Monday!&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/lhc_ready.gif&quot; alt=&quot;Building LHC&quot;&gt;&lt;br/&gt;&lt;br/&gt;

Good night. Actually, as a former physicist I can say
that at least two out of the four killing theories are really
stupid, but nevertheless it's interesting!

Comments (2)
   </description>
 </item>
  <item>
    <title>In case we will die this Monday...</title>
    <link>http://tservice.net.ru/~s0mbre/blog//other/2008_07_04</link>
    <description>
I've started a countdown...&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/clontarf_bottle.jpg&quot; alt=&quot;Countdown has been started&quot;&gt;&lt;br/&gt;&lt;br/&gt;

&lt;a href=&quot;http://lhc.web.cern.ch/lhc/&quot;&gt;Large Hadron Collider&lt;/a&gt; will be started in 3 days...

Comments (0)
   </description>
 </item>
  <item>
    <title>POHMELFS crypto support has been completed.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_07_03</link>
    <description>
&lt;pre&gt;kernel$ git commit -a
Created commit b07e3ed: Added crypto support.
 9 files changed, 1534 insertions(+), 221 deletions(-)
 create mode 100644 fs/pohmelfs/crypto.c

fserver$ git commit -a -m &quot;Aded crypto support.&quot;
Created commit f916b2f: Aded crypto support.
 3 files changed, 788 insertions(+), 94 deletions(-)&lt;/pre&gt;

I implemented a pool of crypto processing threads (their number
is a mount option), each of which has a pool of pages to
encrypt data into. A crypto thread is not released until the server
acknowledges that the data was successfully written, so one
should tune the number of threads and the page pool (the number of pages
in each thread is the maximum number of pages per transaction,
and this limit has its own mount option too) according to the desired behaviour.&lt;br/&gt;&lt;br/&gt;
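The thread and page pool scheme above can be sketched roughly like this. It is only a userspace Python model, not the kernel code: &lt;code&gt;CryptoPool&lt;/code&gt;, &lt;code&gt;CryptoThread&lt;/code&gt;, &lt;code&gt;submit()&lt;/code&gt; and &lt;code&gt;ack()&lt;/code&gt; are hypothetical names, the page size is assumed, and a XOR stands in for the real cipher.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
from collections import deque

PAGE_SIZE = 4096  # assumed page size


def toy_encrypt(data, key=0x5A):
    # XOR stand-in for a real cipher, for illustration only (not secure).
    return bytes(b ^ key for b in data)


class CryptoThread:
    def __init__(self, ident, pages_per_thread):
        self.ident = ident
        # Pages are allocated once, up front (at "mount time").
        self.pages = [bytearray(PAGE_SIZE) for _ in range(pages_per_thread)]


class CryptoPool:
    def __init__(self, num_threads, pages_per_thread):
        self.pages_per_thread = pages_per_thread
        self.free = deque(CryptoThread(i, pages_per_thread)
                          for i in range(num_threads))
        self.busy = {}  # maps transaction id to the thread awaiting the ack

    def submit(self, trans_id, pages):
        if len(pages) > self.pages_per_thread:
            raise ValueError("transaction exceeds per-thread page limit")
        if not self.free:
            return None  # a real caller would block here
        thread = self.free.popleft()
        # Encrypt into the thread's own pages, leaving the source untouched.
        for dst, src in zip(thread.pages, pages):
            dst[:len(src)] = toy_encrypt(src)
        self.busy[trans_id] = thread
        return thread

    def ack(self, trans_id):
        # Only the server acknowledgement makes the thread reusable.
        self.free.append(self.busy.pop(trans_id))
```
&lt;/pre&gt;
With two threads, a third &lt;code&gt;submit()&lt;/code&gt; returns nothing until one of the earlier transactions is acknowledged, which is why both the thread count and the page limit need tuning.&lt;br/&gt;&lt;br/&gt;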

Testing shows that write performance dropped noticeably with this
approach: with 4 encryption threads and 4 receiving threads in the server,
performance dropped by around 30%, from 65+ MB/s down to 46+ MB/s,
but I think it can be improved with a larger number of encryption threads.
During the iozone write/rewrite test each of the 4 crypto threads ate about 20-30%
of CPU, while the server ate about 130% (4 threads in total). In all previous iozone tests,
the larger the number of userspace threads used, the worse the results were
(this is somewhat expected, since iozone is a single-threaded benchmark,
so a larger number of threads only leads to performance degradation),
so I will test different setups (namely a larger number of crypto threads
and a smaller number of server threads).&lt;br/&gt;&lt;br/&gt;

But this behaviour is not a problem, and I expect it to be tuned; the real
problem is read performance. Right now there is only a single thread,
which reads from one socket. It was done intentionally, since reading
data from a socket takes longer than searching for a page in the radix tree
or any other operation performed by that thread, so there is no way
to saturate its capabilities. Until we start encryption, that is, which is slow:
subsequent data cannot be read from the socket in parallel
with crypto processing, and overall read performance drops to the ground.&lt;br/&gt;&lt;br/&gt;

This problem has to be fixed, so I plan to use the same crypto
processing threads to decrypt and/or perform hash check for received data
and push it up to the VFS stack.

Comments (0)
   </description>
 </item>
  <item>
    <title>POHMELFS crypto: feel incredibly stupid.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_07_02</link>
    <description>
First,
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=pohmelfs&quot;&gt;POHMELFS&lt;/a&gt;
does need to have encryption. Because I plan to use
distributed hash table approach in server (well, consider POHMELFS
kernel client as a kind of bittorrent filesystem client), and as in any
non-centralized system, content transferred via uncontrolled data channels
has to be encrypted.&lt;br/&gt;&lt;br/&gt;

But... I'm incredibly stupid: I implemented encryption and decryption in place,
i.e. a VFS page is encrypted before being written to the servers, so
a subsequent read leads to... Yes, it reads encrypted content.&lt;br/&gt;
To fix this issue I plan to encrypt data into different pages and send those,
leaving the VFS ones as is. There are two approaches I consider:&lt;ul&gt;
&lt;li&gt;allocate and send pages at writeback time - we want to send 5 pages, so allocate
5 pages, encrypt data into them and broadcast them to all needed servers.&lt;/li&gt;
&lt;li&gt;allocate a (potentially large) pool of pages at mount time per crypto thread
and encrypt data into them. This will have about zero run-time overhead for VFS,
except for write completion being slightly delayed by encryption.&lt;/li&gt;&lt;/ul&gt;
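&lt;br/&gt;
The in-place bug and the fix can be shown with a toy model. This is a Python sketch, not the POHMELFS code: the dictionary stands in for the page cache, the XOR for a real cipher, and all function names are made up for illustration.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
def toy_encrypt(data, key):
    # XOR stand-in for a real cipher, for illustration only (not secure).
    return bytes(b ^ key for b in data)


def writeback_in_place(page_cache, index, key):
    # Buggy variant: the cached page itself is overwritten with ciphertext.
    page_cache[index] = toy_encrypt(page_cache[index], key)
    return page_cache[index]


def writeback_into_copy(page_cache, index, key):
    # Fixed variant: ciphertext goes into a separate buffer,
    # the cached page keeps its plaintext.
    return toy_encrypt(page_cache[index], key)


KEY = 0x5A

cache = {0: b"hello"}
writeback_in_place(cache, 0, KEY)
read_after_in_place = cache[0]   # a later read sees ciphertext

cache = {0: b"hello"}
wire = writeback_into_copy(cache, 0, KEY)
read_after_copy = cache[0]       # a later read still sees plaintext
```
&lt;/pre&gt;
Both approaches from the list above are the copy variant; they differ only in when the destination pages are allocated.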

Comments (7)
   </description>
 </item>
  <item>
    <title>Louis Maggio trumpet school: never smile.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//life/2008_07_02</link>
    <description>

Comments (0)
   </description>
 </item>
  <item>
    <title>Holy shit: kernel summit.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/other/2008_07_01</link>
    <description>
&lt;blockquote&gt;We would like to invite you to the 2008 Kernel summit, and we hope that
you will be able to join us...&lt;/blockquote&gt;

I'm trying to recall previous kernel summit:&lt;br/&gt;&lt;br/&gt;
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/gallery2/main.php?g2_itemId=2685&quot;&gt;
&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery2/main.php?g2_view=core.DownloadItem&amp;g2_itemId=2690&amp;g2_serialNumber=2&quot;&gt;&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;

That was fun, but no one wanted to play football &lt;s&gt;instead of talking about whatever we talked about&lt;/s&gt;.&lt;br/&gt;&lt;br/&gt;

For that year I only committed a
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/acrypto/hifn/index.html&quot;&gt;HIFN driver&lt;/a&gt;
into the tree, and there was no &lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=kevent&quot;&gt;kevent&lt;/a&gt; :)&lt;br/&gt;&lt;br/&gt;

This time in US, thinking...

Comments (5)
   </description>
 </item>
  <item>
    <title>Why is blocking sending considered harmful?</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/networking/2008_07_01_1</link>
    <description>
I frequently hear that whatever server you implement, it has to
be non-blocking, since with parallel sending this allows sending
multiple requests to fast servers while not sending data to
a slow server, because a non-blocking socket will return EAGAIN.&lt;br/&gt;&lt;br/&gt;

This is only a half-right solution: when we have to put given data on
all servers, and cannot free it until all servers have replied with an acknowledgement,
non-blocking mode can bring more damage than gain.&lt;br/&gt;&lt;br/&gt;

Mainly because it
allows requests which are still queued for the slow server,
but were already sent to the fast ones, to eat all the memory.
In this case the higher-level application (consider a simple application which generates
some data and writes it into a file in a distributed filesystem, which writes
the file to several servers) will never block, since the transfer
to fast servers completes quickly, and it will provide more and more data,
which will consume all RAM.&lt;br/&gt;&lt;br/&gt;

It is possible to deadlock the system in this case,
since to send some data to a remote server we always have to allocate at least some
memory to put network headers into. With the non-blocking solution the system will consume
all memory and kick itself into a coma.
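&lt;br/&gt;&lt;br/&gt;
The memory growth is easy to see in a small simulation. This is a Python sketch under simplified assumptions: one request is produced per tick, only the slow server is modelled, and &lt;code&gt;simulate()&lt;/code&gt; with its parameters is a made-up helper, not code from any real server.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
from collections import deque


def simulate(total_requests, slow_every, max_pending):
    """Return the peak number of requests held in memory.

    The slow server acknowledges one request every `slow_every` ticks.
    max_pending=None models non-blocking sends (the producer never waits);
    a number models a blocking sender that stalls once the limit is hit.
    """
    pending = deque()   # requests that cannot be freed yet
    produced = 0
    peak = 0
    tick = 0
    while produced != total_requests or pending:
        tick += 1
        may_produce = produced != total_requests and (
            max_pending is None or len(pending) != max_pending)
        if may_produce:
            pending.append(produced)
            produced += 1
            peak = max(peak, len(pending))
        if pending and tick % slow_every == 0:
            pending.popleft()   # the slow server finally acknowledged one
    return peak


unbounded_peak = simulate(1000, 10, None)  # grows with the amount of data
bounded_peak = simulate(1000, 10, 8)       # capped by the blocking limit
```
&lt;/pre&gt;
In the unbounded case the peak scales with how much data the application generates; with a limit it stays at the limit, at the price of the producer waiting for the slow server.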

Comments (2)
   </description>
 </item>
  <item>
    <title>Passive OS fingerprinting.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/networking/2008_07_01</link>
    <description>
I've updated &lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=osf&quot;&gt;OSF&lt;/a&gt;
modules to xtables, so you have to enable its support in kernel config and get
recent iptables (I tested with 1.4.1.1, which is the latest release to date).&lt;br/&gt;&lt;br/&gt;

OSF allows you to match incoming packets against different sets of SYN-packet parameters and determine
which system is on the remote end, so you can make decisions based on OS type
and even, to some degree, version.&lt;br/&gt;&lt;br/&gt;

Installation instructions, an example and source code can be found on the
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=osf&quot;&gt;homepage&lt;/a&gt;.&lt;br/&gt;&lt;br/&gt;

I've also sent it to the netfilter-devel@ and netdev@ mailing lists, since my previous mails never appeared
there, likely because of spam filters.

Comments (0)
   </description>
 </item>
  <item>
    <title>Filesystem development rumors.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_30</link>
    <description>
Rumor number one. &lt;a href=&quot;http://www.swsoft.com&quot;&gt;SWsoft&lt;/a&gt;
aka Parallels actively searches for Linux kernel hackers in
leading Moscow universities, namely MSU and MIPT. I saw their
posters, where among other (wanted) requirements there is
distributed filesystem knowledge.&lt;br/&gt;&lt;br/&gt;

Rumor number two. Alexey Kuznetsov (if you do not know,
he's the guy who wrote the major part of the Linux network stack,
namely TCP/UDP/IP and socket implementations, and although
there were lots of changes in the stack since then, I think it will not
be an exaggeration to call him the author), who also worked
on Virtuozzo and OpenVZ (and its interesting VFS parts, which
AFAICS are not in kernel, maybe yet), works on some
filesystem too. The last time we 'confronted' was a couple
of years ago, when I first implemented
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=netchannel&quot;&gt;netchannels&lt;/a&gt;
and tried to convince network community (and namely Alexey Kuznetsov
and &lt;a href=&quot;http://vger.kernel.org/~davem/&quot;&gt;David Miller&lt;/a&gt;)
that the netchannel idea is worth further investigation and implementation.
IIRC I did not succeed, although results were very
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/networking/unetstack/2008_06_14.html&quot;&gt;impressive&lt;/a&gt;.&lt;br/&gt;
Let's see what will happen with filesystems :)&lt;br/&gt;
&lt;br/&gt;

Rumor number three. SWsoft recently started to actively search
for a kernel hacker for a 'new interesting open source project'. They
always searched for kernel programmers, but never told anything
about projects; now something has changed.&lt;br/&gt;&lt;br/&gt;

Rumor number four. OpenVZ and Virtuozzo have serious problems with NFS
(especially when server dies), probably because of very ugly NFS protocol
(yes it is), so it's hard to properly virtualize it (or not?). There are
no alternatives for NFS right now in major productions, but you all know about
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=pohmelfs&quot;&gt;POHMELFS&lt;/a&gt;
which right now can be used as really good replacement.&lt;br/&gt;&lt;br/&gt;

Rumor number five. SWsoft has a long history of PhD defences (at least in MIPT) based on
a theoretical FS called TorFS (namely Tormasov FileSystem). A year ago it was still
not a very alive project in practice,
but I heard that it was very impressive in theory. This rumor has existed
for really many years.&lt;br/&gt;&lt;br/&gt;

So, I have a quite clear picture, that SWsoft started development of the new
distributed filesystem, which is aimed at first to replace NFS in virtualized
environments. I can also imagine very interesting distributed parallel facilities
needed for virtualized systems. And they try to attract lots of people to the
project, as well as really heavy artillery like Alexey Kuznetsov.&lt;br/&gt;&lt;br/&gt;

Which basically means that sooner or later my development will meet strong
competition from this company, which has lots of really good professionals.&lt;br/&gt;
And that's very interesting and cool :)&lt;br/&gt;&lt;br/&gt;

P.S. Or it may be complete bullshit and the delirium of my fevered consciousness.&lt;br/&gt;&lt;br/&gt;

And one fact about
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=pohmelfs&quot;&gt;POHMELFS&lt;/a&gt;:
today I finished client support for padded crypto processing of all requests
and started to work out the server bits. I expect to finish them in a day or so,
so a new release is very close.

Comments (3)
   </description>
 </item>
  <item>
    <title>Listened to how my trumpet can sound.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//other/2008_06_28</link>
    <description>
It was really interesting. Although it is a very simple student
model, a friend produced very good sounds. He has not practiced
for many years already, but nevertheless it was not that bad.&lt;br/&gt;&lt;br/&gt;

My daily half-hour to hour exercises usually produce worse sound, although
sometimes I do find really cool notes. Unfortunately I still do not
know some magic bit about how to catch that sound, it is born and
disappears on its own, but I'm sure I will find it, and I think I'm close
to where it hides :)

Comments (0)
   </description>
 </item>
  <item>
    <title>Need to rethink POHMELFS crypto a bit.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_28</link>
    <description>
1. Because of an encryption problem - data to be encrypted has to be
blocksize aligned - some information about padding has to
be added into the network command, as well as the crypto data size.&lt;br/&gt;&lt;br/&gt;

2. IV generation. I decided to extend the network command and put a
64 bit IV for the given packet there. Using a simple sequence number is enough
to protect against replay attacks.&lt;br/&gt;&lt;br/&gt;

3. Encryption/hashing of data. I decided not to encrypt/hash network headers,
and only do it for transmitted data. If a transaction contains several
commands, data for all commands will be encrypted/hashed; in case of a hash,
a single digest/hmac will be generated and placed into the transaction header.&lt;br/&gt;&lt;br/&gt;

4. It is possible, that I will add strong header checksum, which will be generated
only for header and placed into special field. It will be calculated
assuming checksum field is zero. This step is optional so far, but network header
has 32 reserved bits, which can be used for it.&lt;br/&gt;&lt;br/&gt;

Right now hashing and encryption work, but are not checked on the server (although generated);
because of the crypto alignment ugliness I decided to rethink the approach a bit.&lt;br/&gt;
Evolution process in action...
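&lt;br/&gt;&lt;br/&gt;
Points 1 and 2 can be sketched as a framing step. This is a Python illustration only: the 16-byte block size and the header layout (64 bit IV, 32 bit padded size, 32 bit padding length) are assumptions and not the real POHMELFS network command, and the encryption itself is left out.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
import struct

BLOCK = 16          # assumed cipher block size in bytes
HEADER = "!QII"     # hypothetical layout: IV, padded size, padding length
HEADER_SIZE = struct.calcsize(HEADER)


def pad_to_block(data):
    # Number of zero bytes needed to reach the next block boundary.
    pad = -len(data) % BLOCK
    return data + bytes(pad), pad


def build_command(seq, data):
    padded, pad = pad_to_block(data)
    # The sequence number doubles as the 64 bit IV for this packet.
    header = struct.pack(HEADER, seq, len(padded), pad)
    return header + padded


def parse_command(packet):
    seq, size, pad = struct.unpack(HEADER, packet[:HEADER_SIZE])
    payload = packet[HEADER_SIZE:HEADER_SIZE + size]
    # Strip the padding recorded in the header to recover the data.
    return seq, payload[:size - pad]


packet = build_command(7, b"hello")
```
&lt;/pre&gt;
The receiver needs both numbers: the padded size to know how much to decrypt, and the padding length to cut the result back to the original data.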

Comments (0)
   </description>
 </item>
  <item>
    <title>0:3</title>
    <link>http://tservice.net.ru/~s0mbre/blog//other/2008_06_26</link>
    <description>
That really sucked - yes, we played badly. Just like it was before.
It is not really surprising.&lt;br/&gt;
But what was that fucking abnormal thing a week ago against Holland? &lt;b&gt;That&lt;/b&gt;
was new, was cool, was bloody great, but not today. Tired or whatever...
What's the difference right now, we lost.&lt;br/&gt;&lt;br/&gt;

Yes, Spain played really well, my congratulations.&lt;br/&gt;
But our team showed that it is possible.&lt;br/&gt;
That there is &lt;b&gt;nothing&lt;/b&gt; impossible.&lt;br/&gt;
We can, when we want. You can, when you want.&lt;br/&gt;&lt;br/&gt;

Thanks a lot for the games!

Comments (0)
   </description>
 </item>
  <item>
    <title>POHMELFS server got initial crypto processing capabilities.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_26</link>
    <description>
POHMELFS server is able to handshake hash/cipher names and operation
modes, to initialize appropriate algorithms and perform basic operations
(like a more generic &lt;code&gt;hash_update()&lt;/code&gt; instead of different
functions with different arguments used to hash data depending on operation mode,
either simple digest or hmac: &lt;code&gt;EVP_DigestUpdate()/HMAC_Update()&lt;/code&gt;).
I'm working on the right way of doing crypto processing, i.e. without serious changes in the code,
since how it is done right now is a bit hairy.&lt;br/&gt;
I already hate the OpenSSL API: &lt;code&gt;EVP_get_cipherbyname(), EVP_MD_CTX, EVP_DigestFinal_ex()&lt;/code&gt;.
It looks like the above functions were written by three different people who
never actually talked to each other about how to make them look similar... But it is
a minor issue of course.&lt;br/&gt;&lt;br/&gt;

So, when things settle down, I will make a new release; likely it will see the light this week.
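&lt;br/&gt;&lt;br/&gt;
The idea of one generic update call for both modes looks roughly like this. The sketch uses the Python standard library instead of the OpenSSL C API, and the &lt;code&gt;Hasher&lt;/code&gt; class is a hypothetical name, not the server's code.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
import hashlib
import hmac


class Hasher:
    """One interface for both operation modes: plain digest or HMAC."""

    def __init__(self, name, key=None):
        if key is None:
            self._ctx = hashlib.new(name)            # simple digest mode
        else:
            self._ctx = hmac.new(key, digestmod=name)  # keyed HMAC mode

    def update(self, data):
        # Callers never need to know which mode was negotiated.
        self._ctx.update(data)

    def final(self):
        return self._ctx.digest()


plain = Hasher("sha1")
plain.update(b"some data")

keyed = Hasher("sha1", key=b"secret")
keyed.update(b"some data")
```
&lt;/pre&gt;
The mode is picked once, at handshake time, and the rest of the code only sees &lt;code&gt;update()&lt;/code&gt; and &lt;code&gt;final()&lt;/code&gt;.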

Comments (0)
   </description>
 </item>
  <item>
    <title>Hacking your ISP for fun and profit.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/other/2008_06_26</link>
    <description>
My ISP again blocked my account and cannot unblock it although there
is money on the deposit. There are serious problems in its billing
system which require manual intervention by the operator. Unfortunately
it is a real challenge to call them: it already took more than half an hour
yesterday, without success.&lt;br/&gt;
So, I decided to implement an interesting idea on how to bypass its blocking.&lt;br/&gt;&lt;br/&gt;

It is based on a security 'hole' in its DNS configuration (and I think the vast majority
of ISPs do the same), which allows
requesting any DNS record even if the account is blocked. It will be fetched from
a remote DNS server if there are no records in the ISP's cache.&lt;br/&gt;
Thus the attack vector becomes visible: implement an IP over DNS tunnel network device
and set up local routing to use it by default. One has to control at least one
remote machine which hosts DNS records for a given domain name, since it is required
to parse incoming DNS requests and process them accordingly.&lt;br/&gt;&lt;br/&gt;

There are at least two known IP over DNS tunnel solutions:
&lt;a href=&quot;http://savannah.nongnu.org/projects/nstx/&quot;&gt;NSTX&lt;/a&gt;
(&lt;a href=&quot;http://thomer.com/howtos/nstx.html&quot;&gt;howto&lt;/a&gt;) and
&lt;a href=&quot;http://www.doxpara.com/&quot;&gt;OzymanDNS&lt;/a&gt;
(&lt;a href=&quot;http://dnstunnel.de/&quot;&gt;howto&lt;/a&gt;). Both solutions require that you own a
server to run the ip-over-dns tunnel server on.
Unfortunately I have only a single machine with a static IP address which is not protected
by lots of firewalls and allows incoming connections.&lt;br/&gt;&lt;br/&gt;

The simplest solution for this problem is to create iptables input target rule
for the server, which will parse incoming DNS requests and redirect usual queries up
the network stack to the userspace server, and handle 'poisoned' queries as tunnel.&lt;br/&gt;
Client can be TUN/TAP based, but can also be a tunnel network device.&lt;br/&gt;
I believe the more weird it looks, the more interesting it is, so I will likely think
more about kernel based tunnels.&lt;br/&gt;&lt;br/&gt;

DNS queries are limited enough not to allow binary data (IIRC,
the most interesting are DNS TXT records), but it can be appropriately
encoded and enciphered. So, I will put it into the
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=notes&amp;item=todo&quot;&gt;todo&lt;/a&gt; list.&lt;br/&gt;
I even think that it is not that bad an idea to have such modules in kernel :)
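&lt;br/&gt;&lt;br/&gt;
Encoding binary data into a query name could look roughly like this. It is a Python sketch of the encoding step only, with no encryption and no actual DNS traffic: &lt;code&gt;encode_query()&lt;/code&gt;, &lt;code&gt;decode_query()&lt;/code&gt; and the &lt;code&gt;tunnel.example.com&lt;/code&gt; domain are made up, and base32 is my choice here because it survives case-insensitive DNS names.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
import base64

MAX_LABEL = 63    # DNS limit on the length of one label
MAX_NAME = 253    # DNS limit on a full name in text form


def encode_query(payload, domain):
    # Base32 uses only letters and digits, which are safe in host names.
    text = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    name = ".".join(labels + [domain])
    if len(name) > MAX_NAME:
        raise ValueError("payload too large for one DNS query")
    return name


def decode_query(name, domain):
    # Drop the tunnel domain, glue the labels back together.
    text = name[:-(len(domain) + 1)].replace(".", "").upper()
    text += "=" * (-len(text) % 8)   # restore the stripped base32 padding
    return base64.b32decode(text)


DOMAIN = "tunnel.example.com"   # hypothetical domain we host DNS for
query = encode_query(bytes(range(40)), DOMAIN)
```
&lt;/pre&gt;
The server side would do the reverse on every query for its domain and send the answer back in the reply records.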

Comments (3)
   </description>
 </item>
  <item>
    <title>POHMELFS input crypto processing engine is ready for testing.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_25_1</link>
    <description>
But testing cannot be done without appropriate server support, which
is now the main task. POHMELFS uses a lazy crypto engine: each network state
(it represents the connection between the client and one server) contains
a number of fields used exclusively for semi-lockless input data processing
(it locks the state when it performs actual reading, but does not
hold that lock when processing incoming messages, since it is the only
path which receives data). Now it also has crypto information about
how to handle reply messages (they include the read page reply, for example),
so it does not queue work to be done by crypto threads, but does that itself
instead. It may or may not be the bottleneck of the input path; tests will
provide facts. So far I do not have plans to change it, but it can be done
of course if performance sucks.&lt;br/&gt;&lt;br/&gt;

After I finish crypto processing in both client (it has been written, but requires lots
of testing with the server) and server (I have just started to recall how to work with
OpenSSL. Well, I've read how HMAC works in OpenSSL, found it to be simple enough
and then started to read how to parse binary data in LISP :)
But anything which is interesting for me now ends up in good results for all other
projects), I will switch to something different for a while.&lt;br/&gt;
Some voices in my brain ask to spread out in lots of interesting directions :)

Comments (0)
   </description>
 </item>
  <item>
    <title>POHMELFS crypto performance.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_25</link>
    <description>
I ran read/reread and write/rewrite tests as described
in &lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/fs/2008_06_19_1.html&quot;&gt;previous&lt;/a&gt; run,
now with HMAC(SHA1) of all outgoing transactions (note that reading response data is not yet
encrypted and does not contain a digital signature; the server also does not support either operation).
Essentially only writing should be affected by this, but I also ran reading tests for completeness.&lt;br/&gt;&lt;br/&gt;

Results show zero performance overhead of the full data SHA1 hashing, but note that quite fast
machines were used (two 3 GHz Xeons (2 physical and 2 logical CPUs, HT enabled) with 1 GB of RAM). All the time only
two crypto threads were actively hashing data, since there are only two &lt;code&gt;pdflush&lt;/code&gt; threads on this machine.&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/read.png&quot; alt=&quot;Read&quot;&gt;
&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/reread.png&quot; alt=&quot;Reread&quot;&gt;&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/write.png&quot; alt=&quot;Write&quot;&gt;
&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/rewrite.png&quot; alt=&quot;Rewrite&quot;&gt;&lt;br/&gt;&lt;br/&gt;

Writing is even faster with hashing, but results drifted around, so essentially performance is the same.

Comments (0)
   </description>
 </item>
  <item>
    <title>VM gotcha: forbidden double kmapping.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/other/2008_06_24</link>
    <description>
I've just learned that it is impossible to map the same page
twice: for example the first time using &lt;code&gt;kmap()/kunmap()&lt;/code&gt;
and the second one via &lt;code&gt;kmap_atomic()/kunmap_atomic()&lt;/code&gt;.&lt;br/&gt;
Although the mechanisms are a bit different in both mappings, it is
forbidden to do so and the system will panic like this:&lt;pre&gt;
IP: [&lt;c0114089&gt;] kmap_atomic_prot+0x1b/0xc5
*pdpt = 0000000031c79001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP 

Pid: 6478, comm: pohmelfs-crypto Not tainted (2.6.25 #27)
EIP: 0060:[&lt;c0114089&gt;] EFLAGS: 00010202 CPU: 2
EIP is at kmap_atomic_prot+0x1b/0xc5
EAX: ebc7c000 EBX: 00000003 ECX: 00000000 EDX: 00000003
ESI: 00000fdc EDI: 00000163 EBP: 80000000 ESP: ebc7dee4
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process pohmelfs-crypto (pid: 6478, ti=ebc7c000 task=f25040b0 task.ti=ebc7c000)
Stack: 00000000 00000003 00000fdc f7cf4078 00000fdc c0114144 00000163 80000000 
       c01991b1 ebc7df44 f70e3580 00000000 ebc7dfa8 ebc7df40 f70e3580 00000003 
       00000000 f7cf4000 f70e3580 f70ff8b0 f70ff880 f7096c00 c019a771 f70e3580 
Call Trace:
 [&lt;c0114144&gt;] kmap_atomic+0x11/0x14
 [&lt;c01991b1&gt;] update2+0x7c/0x13f
 [&lt;c019a771&gt;] hmac_update+0x49/0x50
 [&lt;f8b64e01&gt;] pohmelfs_crypto_thread_func+0x304/0x3e8 [pohmelfs]
 [&lt;c011813c&gt;] hrtick_set+0x7a/0xd7
 [&lt;c012af08&gt;] autoremove_wake_function+0x0/0x2b
 [&lt;f8b64afd&gt;] pohmelfs_crypto_thread_func+0x0/0x3e8 [pohmelfs]
 [&lt;c012ae45&gt;] kthread+0x38/0x5f
 [&lt;c012ae0d&gt;] kthread+0x0/0x5f
 [&lt;c01046b7&gt;] kernel_thread_helper+0x7/0x10&lt;/pre&gt;

This happened in exactly the above case, when the page was first mapped via
&lt;code&gt;kmap()&lt;/code&gt; in POHMELFS and then via
&lt;code&gt;kmap_atomic()&lt;/code&gt; in HMAC crypto processing code.&lt;br/&gt;
I wonder what will happen if we ever try to send kmapped pages
over an IPsec tunnel. Likely it will oops too...&lt;br/&gt;
This can happen for example when pages are mapped in
&lt;code&gt;tcp_sendpage()&lt;/code&gt; when calling &lt;code&gt;sendfile()&lt;/code&gt;
over an interface which does not support hardware checksumming
and scatter-gather: mapped pages are pushed down the network stack
where they will eventually be encrypted/hashed in IPsec, which
will in turn call &lt;code&gt;kmap_atomic()&lt;/code&gt;.&lt;br/&gt;&lt;br/&gt;

So, if you find an obscure oops in &lt;code&gt;kmap_atomic()&lt;/code&gt;
and friends, first check that the calling stack did not map the page
earlier.

Comments (0)
   </description>
 </item>
  <item>
    <title>POHMELFS client got initial part of multithreaded crypto/checksum processing.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_23_1</link>
    <description>
So far it only includes encryption and hash calculation for outgoing
transactions. The system has a number of threads per superblock (a mount option),
which are responsible for encryption/hashing (each thread has its own crypto structure,
so there are no additional allocations in the fast path, although I think
they would not harm performance since they should be a small enough
fraction on top of crypto processing overhead) and subsequent data sending,
so the original caller (like writeback/readahead code) will not block if there
are ready threads, otherwise it will wait until some thread finishes its current crypto work.&lt;br/&gt;&lt;br/&gt;

I decided to implement a kind of continuation for such transactions: network sending
code (which is supposed to be started after crypto processing) will be invoked from the threads
which performed crypto operations, without returning back to the original caller context.
For massively multiqueue NICs that should be a benefit, but so far I did not test its performance.&lt;br/&gt;
The next step is receiving crypto support and userspace changes.

Comments (0)
   </description>
 </item>
  <item>
    <title>Crypto processing in POHMELFS. OpenSSL vs GNU TLS.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_23</link>
    <description>
If I did not miss something,
&lt;a href=&quot;http://www.gnu.org/software/gnutls/&quot;&gt;GNU TLS&lt;/a&gt; (I never worked with it)
supports a very limited number of ciphers and hashes, so it is not appropriate for
a filesystem data protection layer.&lt;br/&gt;
According to its
&lt;a href=&quot;http://www.gnu.org/software/gnutls/manual/html_node/All-the-supported-ciphersuites-in-GnuTLS.html#All-the-supported-ciphersuites-in-GnuTLS&quot;&gt;documentation&lt;/a&gt;
GNU TLS only supports AES, RC4 and 3DES ciphers and SHA1 and MD5 hashes. There is also only CBC
chaining mode and several hash/cipher schemes.&lt;br/&gt;&lt;br/&gt;

So, POHMELFS server will use OpenSSL for data protection. Sooner or later OpenSSL
will get hardware crypto support on Linux too (well, Linux crypto stack should first
implement userspace API, which does not exist yet, although there is a
&lt;a href=&quot;http://marc.info/?l=linux-crypto-vger&amp;m=121261522619909&amp;w=2&quot;&gt;work&lt;/a&gt;
by Loc Ho from AMCC to add such support).&lt;br/&gt;&lt;br/&gt;

So far I decided to implement the following protection scheme: checksum or encryption
will cover full transaction data, but will be applied by chunks:&lt;ul&gt;
&lt;li&gt;Transaction 'first-level' data, i.e. header and data placed immediately after the transaction
header. For all commands except page writing that will be all.&lt;/li&gt;
&lt;li&gt;For the write pages command, each header is generated dynamically and does not exist
until data is really being sent, so crypto code will run over all pages and update the checksum,
processing headers and data pages separately. Checksum update should be simple enough, since
there are crypto helpers to update and finalize a checksum, but encryption is more complex:
it requires all chunks to be set up in advance in a single scatterlist chain, and with dynamic header
generation that is too big an overhead (it requires not only scatterlist allocation, but also
header allocation just for encryption), so encryption will be done separately for headers and pages,
and I will have to create some IV propagation scheme (like the last bytes of the previous unencrypted chunk
becoming the IV for the next chunk, or something like that). I understand that it may not be a very
secure approach though.&lt;/li&gt;
&lt;li&gt;Reading data back from the server is simpler, since there are no transactions,
and data will be encrypted/checksummed like in the first step above. It is possible that it will
force me to increase the network header structure a bit (32 or 16 bits to store the size of the attached checksum).&lt;/li&gt;&lt;/ul&gt;
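&lt;br/&gt;
The checksum half of the write pages case can be shown in a few lines. This is a Python sketch with made-up header and page contents, using the standard &lt;code&gt;hmac&lt;/code&gt; module rather than the kernel crypto helpers; it only demonstrates that updating chunk by chunk gives the same result as hashing everything at once, so headers do not have to exist in advance.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
import hashlib
import hmac


def sign_transaction(key, chunks):
    # Feed headers and pages one by one, as they are generated.
    ctx = hmac.new(key, digestmod=hashlib.sha1)
    for chunk in chunks:
        ctx.update(chunk)
    return ctx.digest()


KEY = b"secret"
# Hypothetical transaction: each page is preceded by its own header.
chunks = [b"hdr:page0", b"A" * 4096, b"hdr:page1", b"B" * 4096]

incremental = sign_transaction(KEY, chunks)
one_shot = hmac.new(KEY, b"".join(chunks), hashlib.sha1).digest()
```
&lt;/pre&gt;
Encryption has no such shortcut, which is where the IV propagation question from the list above comes from.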

Comments (2)
   </description>
 </item>
  <item>
    <title>3:1. It is fucking unbelievable, but we wooooon!</title>
    <link>http://tservice.net.ru/~s0mbre/blog//other/2008_06_22_1</link>
    <description>
We won against Holland, one of the strongest football teams!&lt;br/&gt;
There was blood on the field, fouls and other crap, but we made it!&lt;br/&gt;&lt;br/&gt;

3:1 against Holland. &quot;Russia rose from its knees&quot;, as people screamed.
We are in the semifinal. You have been warned, Russians are coming! :)&lt;br/&gt;&lt;br/&gt;

Pivo and vodka, we will not sleep today!

Comments (3)
   </description>
 </item>
  <item>
    <title>We do not believe in penalties!</title>
    <link>http://tservice.net.ru/~s0mbre/blog//other/2008_06_22</link>
    <description>
It is fucking unbelievable, but Russia is playing Holland
and the score is 1:1. Not only is it level, we do play cool football!&lt;br/&gt;
And Holland equalized in the 87th minute, we were so close, but
it is not over yet. We can win. We will win!&lt;br/&gt;
I do not understand how the hell our team started to play &lt;b&gt;that&lt;/b&gt;
well, we can. We fucking can, when we want. We play not for the goal, not
for the money, not for fucking anything, we play just for the game.
And the game wins!&lt;br/&gt;&lt;br/&gt;

The first half of extra time has ended. Russia vs Holland 1:1.&lt;br/&gt;
We can. Just because we can.

Comments (0)
   </description>
 </item>
  <item>
    <title>POHMELFS and HMAC/crypto operations.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_19_2</link>
    <description>
As I found with
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=dst&quot;&gt;distributed storage&lt;/a&gt;
project, any communication channels which involve huge amounts of data transfer
have to have an additional strong checksum embedded in the protocol, since the TCP one is not
enough in some cases. There are some options, like TCP MD5 signatures or IPsec transformations,
but they are not always available.&lt;br/&gt;&lt;br/&gt;

&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=pohmelfs&quot;&gt;POHMELFS&lt;/a&gt;
will be able to encrypt the whole data channel and/or only digitally
sign all messages. This will be implemented at the transaction level, so no higher-layer code
(like the data reading/writing functions) will ever be affected.&lt;br/&gt;
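A minimal Python sketch of what per-message signing could look like, assuming HMAC-SHA256 and a made-up frame layout (length, payload, tag); neither the hash nor the layout is taken from POHMELFS itself:

```python
import hashlib
import hmac
import struct

HEADER_SIZE = 4  # hypothetical header: payload length as a 4-byte big-endian integer


def sign_message(key, payload):
    """Frame a payload as header + payload + HMAC tag computed over both."""
    header = struct.pack("!I", len(payload))
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def verify_message(key, frame):
    """Return the payload if the tag matches, otherwise raise ValueError."""
    header = frame[:HEADER_SIZE]
    (length,) = struct.unpack("!I", header)
    payload = frame[HEADER_SIZE:HEADER_SIZE + length]
    tag = frame[HEADER_SIZE + length:]
    expected = hmac.new(key, header + payload, hashlib.sha256).digest()
    # compare_digest runs in constant time, so it does not leak where the tags differ
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: message corrupted or forged")
    return payload
```

Because signing and checking happen on whole frames, the code that produces and consumes payloads never has to know about the tag, which is the point of doing it at the transaction level.&lt;br/&gt;&lt;br/&gt;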
POHMELFS will also have mount-time self-configuration, i.e. the client will send the server
information about the capabilities requested by the administrator, and if the server does not
support some of them (for example, it can only do HMAC and not encryption, and both operations were
requested at mount time), they will be dropped (or, optionally, the mount will fail).
In the future it will be possible to extend this with additional flags if needed.&lt;br/&gt;&lt;br/&gt;
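The negotiation described above can be sketched in a few lines of Python. The flag names and values here are hypothetical, and the strict switch is only an assumption about how the optional mount failure might be selected:

```python
# Hypothetical capability flags; the real POHMELFS names and values may differ.
CAP_HMAC = 1
CAP_ENCRYPTION = 2


def negotiate(requested, server_supported, strict=False):
    """Return the set of capabilities that both sides can use.

    requested and server_supported are sets of flags. With strict=True the
    mount fails if the server lacks anything the administrator asked for;
    otherwise unsupported capabilities are silently dropped.
    """
    agreed = requested.intersection(server_supported)
    dropped = requested.difference(server_supported)
    if dropped and strict:
        raise RuntimeError("mount failed: server lacks %s" % sorted(dropped))
    return agreed
```

With a server that only does HMAC, a client asking for both operations ends up with HMAC only, or with a failed mount when strict is set.&lt;br/&gt;&lt;br/&gt;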

&lt;code&gt;mount&lt;/code&gt; is not a very convenient command for transferring crypto information (like binary keys)
to the kernel, so I use the same infrastructure as for the initial server group initialization (i.e. the
existing POHMELFS configuration utility).&lt;br/&gt;&lt;br/&gt;

Support for HMAC and encryption will force the server to depend on &lt;a href=&quot;http://www.openssl.org/&quot;&gt;OpenSSL&lt;/a&gt;,
but I do not think that is a problem. At some point I can write autoconfiguration, which will
allow the server to be compiled without crypto support (so that it does not accept encrypted clients and
does not check signatures) if there is no OpenSSL.&lt;br/&gt;&lt;br/&gt;

After the crypto operations are implemented (I expect to finish them this week), I will release, as promised, a
new &lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=netchannel&quot;&gt;netchannel&lt;/a&gt;
version (with unneeded functionality like NAT removed), and add some interesting bits (like async
processing) to &lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=dst&quot;&gt;distributed storage&lt;/a&gt;,
so expect a new release of it soon too.&lt;br/&gt;&lt;br/&gt;

Stay tuned!

Comments (2)
   </description>
 </item>
  <item>
    <title>CLISP socket streams.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/other/2008_06_19</link>
    <description>
Excellent &lt;a href=&quot;http://clisp.cons.org/impnotes/socket.html&quot;&gt;documentation&lt;/a&gt; with examples.
I expect that it is implementation-specific (i.e. CLISP only) and will not work with SBCL or Allegro,
for example, but nevertheless I want to learn it and use it a bit.&lt;br/&gt;
If it turns out to be good for my use cases, guess what my next userspace server will be written in? :)

Comments (0)
   </description>
 </item>
  <item>
    <title>POHMELFS, NFS, Ext4 and XFS in iozone benchmark. Graphs.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_19_1</link>
    <description>
Hardware used in testing: 4-way Intel E7520 system (two logical and two physical CPUs),
3 GHz 32-bit Xeons with 1 GB of RAM, Adaptec AIC7902 Ultra320 SCSI adapter with a SEAGATE
ST3300007LC 10k rpm 300 GB test disk. Its linear read speed is about 90 MB/s.&lt;br/&gt;&lt;br/&gt;

Software used in testing: 2.6.25 kernels (on server and client), in-kernel async NFS server,
userspace POHMELFS server.&lt;br/&gt;&lt;br/&gt;

Tests were performed with 8 GB files (the amount of RAM was reduced to 1 GB to eliminate the influence
of caching) and record sizes from 8 to 1024 KB. I ran write/rewrite, read/reread and
random read and write tests.&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/read.png&quot; alt=&quot;Read&quot;&gt;
&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/reread.png&quot; alt=&quot;Reread&quot;&gt;&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/write.png&quot; alt=&quot;Write&quot;&gt;
&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/rewrite.png&quot; alt=&quot;Rewrite&quot;&gt;&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/random-read.png&quot; alt=&quot;Random read&quot;&gt;
&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/pohmelfs_iozone/random-write.png&quot; alt=&quot;Random write&quot;&gt;

Comments (0)
   </description>
 </item>
  <item>
    <title>CRFS got metadata cache coherency support.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_19</link>
    <description>
&lt;a href=&quot;http://www.zabbo.net&quot;&gt;Zach Brown&lt;/a&gt; has 
&lt;a href=&quot;http://oss.oracle.com/mercurial/zab/crfs/rev/c7122a7d42d3&quot;&gt;committed&lt;/a&gt;
cache coherency support to the CRFS repository.&lt;br/&gt;
The cache coherency protocol works by broadcasting special messages from
the server, and each client invalidates the appropriate inodes (and dentries if needed)
before sending back a reply.&lt;br/&gt;
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/fs/index.html&quot;&gt;POHMELFS&lt;/a&gt;
uses a slightly different mechanism: the client does not send acks back to the server,
so all such messages are kind of advisory-only, but I have not yet completed the locking design (well,
I did not even think about this problem this week), so it can change.&lt;br/&gt;&lt;br/&gt;

The main problem with sync cache coherency support is its absolute non-scalability.
While a number of use cases might require such behaviour, I expect that a noticeable,
if not major, part of users do not want performance degradation as the price for
POSIX-like coherency. This approach is worse than a write-through cache,
since there is a whole round-trip of the cache coherency request instead of just
sending the data while it is written. One-way sending is faster than sending plus waiting,
so for me it is still a questionable approach.&lt;br/&gt;&lt;br/&gt;

I will think a lot about this problem later this week(end), so that the solution
satisfies both the high-performance and safety camps (although only to some degree, I think).

Comments (0)
   </description>
 </item>
  <item>
    <title>LISP macros rox!</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/other/2008_06_18</link>
    <description>
&lt;pre&gt;&lt;tt&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;defmacro with&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;output&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;dir &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;((&lt;/font&gt;out pos dir flags&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;&amp;amp;&lt;/font&gt;body form&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
  `&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;let &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;((,&lt;/font&gt;pos &lt;font color=&quot;#993399&quot;&gt;2&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;))&lt;/font&gt;
     &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;dolist &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;operation &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;nthcdr &lt;font color=&quot;#993399&quot;&gt;2&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;*&lt;/font&gt;iozone&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;tests&lt;font color=&quot;#990000&quot;&gt;*))&lt;/font&gt;
       &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;let&lt;font color=&quot;#990000&quot;&gt;*&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;((&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;dir &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;pathname&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;as&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;directory dir&lt;font color=&quot;#990000&quot;&gt;))&lt;/font&gt;
	     &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;output&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;file &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;make&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;pathname
			 &lt;font color=&quot;#990000&quot;&gt;:&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;directory &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;pathname&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;directory &lt;font color=&quot;#990000&quot;&gt;,&lt;/font&gt;dir&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
			 &lt;font color=&quot;#990000&quot;&gt;:&lt;/font&gt;name operation
			 &lt;font color=&quot;#990000&quot;&gt;:&lt;/font&gt;type &lt;font color=&quot;#FF0000&quot;&gt;&quot;gnuplot&quot;&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)))&lt;/font&gt;
        &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;with&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;open&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;file &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(,&lt;/font&gt;out output&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;file &lt;font color=&quot;#990000&quot;&gt;:&lt;/font&gt;direction &lt;font color=&quot;#990000&quot;&gt;:&lt;/font&gt;output &lt;font color=&quot;#990000&quot;&gt;:&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#0000FF&quot;&gt;if&lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;exists &lt;font color=&quot;#990000&quot;&gt;,&lt;/font&gt;flags&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
	  &lt;font color=&quot;#990000&quot;&gt;,&lt;/font&gt;@form&lt;font color=&quot;#990000&quot;&gt;))&lt;/font&gt;
       &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;incf pos&lt;font color=&quot;#990000&quot;&gt;))))&lt;/font&gt;

&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;defun write&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;gnuplot&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;headers &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;dir&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
  &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;with&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;output&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;dir &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;out pos dir &lt;font color=&quot;#990000&quot;&gt;:&lt;/font&gt;supersede&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
		    &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;set title \&quot;Iozone performance: ~a, KB/s\&quot;~%&quot;&lt;/font&gt; operation&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
	            &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;set terminal png small size 450 350~%&quot;&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
	            &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;set logscale x~%&quot;&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
	            &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;set xlabel \&quot;Record size in KBytes\&quot;~%&quot;&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
	            &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;set ylabel \&quot;Kbytes/sec\&quot;~%&quot;&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
	            &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;set output \&quot;~a.png\&quot;~%&quot;&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;elt &lt;font color=&quot;#990000&quot;&gt;*&lt;/font&gt;iozone&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;tests&lt;font color=&quot;#990000&quot;&gt;*&lt;/font&gt; pos&lt;font color=&quot;#990000&quot;&gt;))&lt;/font&gt;
		    &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;plot &quot;&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)))&lt;/font&gt;

&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;defun update&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;gnuplot&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;headers &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;dir file&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
  &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;with&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;output&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;dir &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;out pos dir &lt;font color=&quot;#990000&quot;&gt;:&lt;/font&gt;append&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
		   &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;unless &lt;font color=&quot;#990000&quot;&gt;*&lt;/font&gt;first&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;file&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;p&lt;font color=&quot;#990000&quot;&gt;*&lt;/font&gt;
		     &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;, &quot;&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;))&lt;/font&gt;
		   &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;let&lt;font color=&quot;#990000&quot;&gt;*&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;((&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;fstype &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;pathname&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;name file&lt;font color=&quot;#990000&quot;&gt;))&lt;/font&gt;
			  &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;name &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;make&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;output&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;name file&lt;font color=&quot;#990000&quot;&gt;)))&lt;/font&gt;
		     &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;format out &lt;font color=&quot;#FF0000&quot;&gt;&quot;\&quot;~a\&quot; using 1:~d title \&quot;~a\&quot; with lines&quot;&lt;/font&gt; &lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;name &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;font color=&quot;#993399&quot;&gt;1&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;+&lt;/font&gt; pos&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt; fstype&lt;font color=&quot;#990000&quot;&gt;))))&lt;/font&gt;
&lt;/tt&gt;&lt;/pre&gt;

Macros are really the coolest feature of LISP. Now I believe I have started to understand LISP kung-fu.&lt;br/&gt;
The iozone parser is essentially ready. I was a bit pessimistic yesterday: it took only half a day and several
hours today, and the code itself is rather ugly (and frequently really ugly, likely far from the LISP way), but it works:
it walks over the given dir, searches there for files with the given extensions, parses them (removing unneeded iozone information),
and writes the result to the specified directory. It also walks over the iozone test names and generates gnuplot scripts for them, which
will build a graph based on the filesystem info gathered while traversing the tree above, so the results look like this:&lt;pre&gt;
$ ./parser.lisp
Processing: /tmp/iozone/tmpfs/nfs.out ... done
Processing: /tmp/iozone/tmpfs/pohmelfs.out ... done
$ cat /tmp/iozone/tmpfs/out/read.gnuplot 
set title &quot;Iozone performance: read, KB/s&quot;
set terminal png small size 450 350
set logscale x
set xlabel &quot;Record size in KBytes&quot;
set ylabel &quot;Kbytes/sec&quot;
set output &quot;read.png&quot;
plot &quot;/tmp/iozone/tmpfs/nfs.out.data&quot; using 1:5 title &quot;nfs&quot; with lines,
	&quot;/tmp/iozone/tmpfs/pohmelfs.out.data&quot; using 1:5 title &quot;pohmelfs&quot; with lines&lt;/pre&gt;

Comments (2)
   </description>
 </item>
  <item>
    <title>LISP development zen.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/other/2008_06_17</link>
    <description>
&lt;pre&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;defun &lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;string_to_list &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;str&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
  &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;let &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;((&lt;/font&gt;num &lt;font color=&quot;#993399&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;ret '&lt;font color=&quot;#990000&quot;&gt;())&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;string_len &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;length str&lt;font color=&quot;#990000&quot;&gt;)))&lt;/font&gt;
    &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;dotimes &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;i string_len&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
      &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;let &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;((&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;sym &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;elt str i&lt;font color=&quot;#990000&quot;&gt;)))&lt;/font&gt;
        &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;cond
	  &lt;font color=&quot;#990000&quot;&gt;((&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#0000FF&quot;&gt;not&lt;/font&gt;&lt;/b&gt; &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;char&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;number&lt;font color=&quot;#990000&quot;&gt;-&lt;/font&gt;p sym&lt;font color=&quot;#990000&quot;&gt;))&lt;/font&gt;
            &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;unless &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;eql num &lt;font color=&quot;#993399&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
	      &lt;font color=&quot;#990000&quot;&gt;;(&lt;/font&gt;format t &lt;font color=&quot;#FF0000&quot;&gt;&quot;: ~d~%&quot;&lt;/font&gt; num&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
	      &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;push num ret&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt;
              &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;setf num &lt;font color=&quot;#993399&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)))&lt;/font&gt;
          &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;t &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;setf &lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;num &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(+&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;(*&lt;/font&gt; num &lt;font color=&quot;#993399&quot;&gt;10&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;)&lt;/font&gt; &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;to_number sym&lt;font color=&quot;#990000&quot;&gt;)))&lt;/font&gt;
	     &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;&lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;when  &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;eql &lt;b&gt;&lt;font color=&quot;#000000&quot;&gt;i &lt;/font&gt;&lt;/b&gt;&lt;font color=&quot;#990000&quot;&gt;(-&lt;/font&gt; string_len &lt;font color=&quot;#993399&quot;&gt;1&lt;/font&gt;&lt;font color=&quot;#990000&quot;&gt;))&lt;/font&gt;
	       &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;push num ret&lt;font color=&quot;#990000&quot;&gt;))))))&lt;/font&gt;
  &lt;font color=&quot;#990000&quot;&gt;(&lt;/font&gt;nreverse ret&lt;font color=&quot;#990000&quot;&gt;)))&lt;/font&gt;&lt;/pre&gt;

This is a part of my LISP parser for iozone output files. So far it is able to convert the output numbers (performance in KB/sec)
into LISP lists (one list per record), so a single line of iozone output becomes a single list of numbers
(ugh, I was forced to write a string-to-number conversion function).&lt;br/&gt;
It is likely not that serious an achievement, and it took the whole day, but nevertheless I like it,
although I would have written the same in C much faster :)&lt;br/&gt;&lt;br/&gt;
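For comparison, roughly the same conversion sketched in Python. This is not part of the parser, and it differs from the LISP function in one detail: it does not use 0 as the "no number seen yet" marker, so zero values in the input are kept:

```python
import re


def string_to_list(line):
    """Return every run of decimal digits in line as an int, in order."""
    return [int(run) for run in re.findall(r"[0-9]+", line)]


# One data line in the iozone output format becomes one list of numbers.
print(string_to_list("8388608       8   53210   57769"))
```

Like the LISP version, it treats anything that is not a digit as a separator, so it only suits the integer columns that iozone prints.&lt;br/&gt;&lt;br/&gt;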

The main problem with Lisp for me is its nested, functional style of conditionals. Converted to C it looks like:&lt;pre&gt;
if (a) {
  if (b) {
    if (c) {
      do_stuff();
    }
  }
}&lt;/pre&gt;

While I would write:&lt;pre&gt;
if (!a)
  return;
if (!b)
  return;
if (!c)
  return;
do_stuff();&lt;/pre&gt;

So far I have not used macros at all, and I kept looking into the
&lt;a href=&quot;http://www.gigamonkeys.com/book/&quot;&gt;Practical Common Lisp&lt;/a&gt; book
(and frankly took the directory processing functions from there, although
I modified them a bit), but what would you expect from a first project? Tomorrow I will extend it to
write a gnuplot-compatible file and finally generate some graphs (I do not know
how to call external programs from LISP though).&lt;br/&gt;
Frankly, I'm not yet excited about how cool LISP is, but I like it, since it is different.
Just like I like my &lt;s&gt;neverending&lt;/s&gt;
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/flat/index.html&quot;&gt;apartment development process&lt;/a&gt;.&lt;br/&gt;
Ugh, and with proper automatic vim highlighting I am not afraid of parentheses.&lt;br/&gt;&lt;br/&gt;

Interested readers can grab my &lt;a href=&quot;http://tservice.net.ru/~s0mbre/archive/tmp/lisp/iozone_parser.lisp&quot;&gt;sources&lt;/a&gt;
and comment on their ugliness.&lt;br/&gt;&lt;br/&gt;

Also found an 'interesting' article at IEEE about LISP:
&lt;a href=&quot;http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&amp;arnumber=4145042&amp;isnumber=4145008&quot;&gt;Migration of Common Lisp Programs to the Java Platform -The Linj Approach&lt;/a&gt; :)

Comments (2)
   </description>
 </item>
  <item>
    <title>Meanwhile, on the apartment development side.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/flat/2008_06_15</link>
    <description>
I decided to work in a completely different area than usual
today, so it was &lt;s&gt;neverending&lt;/s&gt; apartment development.&lt;br/&gt;&lt;br/&gt;

Today I painted the whole ceiling in the kitchen, and I want to believe
that it is the last time. It was not that quick, but it took noticeably less
of the day than before.&lt;br/&gt;&lt;br/&gt;

The main task was the floor in the hall. I finally covered it with ceramic granite.&lt;br/&gt;
It was supposed to be a seamless granite installation, but... the tiles have such precise
dimensions that the difference between them was never more than &lt;b&gt;half a centimeter&lt;/b&gt;
on each side, so I was forced to make small seams and move tiles around for quite
a while before they formed somewhat straight lines, although there are still
lots of non-straight crossings.&lt;br/&gt;
Nevertheless it looks cool, and I'm glad I finished this part.&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/hall_ceramic_granite.jpg&quot; alt=&quot;Hall ceramic granite&quot;&gt;

Comments (0)
   </description>
 </item>
  <item>
    <title>Passive OS fingerprinting.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/networking/2008_06_14_1</link>
    <description>
Ever dreamt of blocking all Linux users on your network from accessing the
internet and giving full bandwidth to a Windows worm? We have to care about
our smaller brothers, so this iptables extension module allows you to do
so.&lt;br/&gt;&lt;br/&gt;

OSF, which stands for OS Fingerprint, allows you to make the usual iptables
decisions on incoming TCP packets based on the sender's OS; the initial handshake packet with the SYN
bit set is enough to determine what the remote OS is. The original idea belongs to
&lt;a href=&quot;http://lcamtuf.coredump.cx/&quot;&gt;Michal Zalewski&lt;/a&gt;.&lt;br/&gt;
This iptables module was
implemented almost 5 years ago and lived in the patch-o-matic iptables tree (the userspace
library is still there). Now I have updated it to Xtables
and sent it for review.&lt;br/&gt;&lt;br/&gt;

Installation steps are described on the
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=osf&quot;&gt;homepage&lt;/a&gt;,
but they are trivial: the usual make/make lib build, then loading the fingerprint rules into the module
via a procfs file.&lt;pre&gt;
# insmod ./ipt_osf.ko
# ./load ./pf.os /proc/sys/net/ipv4/osf
# iptables -I INPUT -j ACCEPT -p tcp -m osf --genre Linux --log 0 --ttl 2 --connector&lt;/pre&gt;

You will find something like this in syslog:&lt;pre&gt;
ipt_osf: Windows [2000:SP3:Windows XP Pro SP1, 2000 SP3]: 11.22.33.55:4024 -&gt; 11.22.33.44:139&lt;/pre&gt;

Comments (0)
   </description>
 </item>
  <item>
    <title>New userspace network stack release.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/networking/unetstack/2008_06_14</link>
    <description>
Fixed a bug found by Salvatore Del Popolo (delpopolo_dit.unitn.it)
in the TCP implementation: in some cases the system checked the sending window, determined
that a packet was not allowed to be sent, and nevertheless tried to send it.&lt;br/&gt;&lt;br/&gt;

&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=unetstack&quot;&gt;Userspace network stack&lt;/a&gt;
is a very fast (when working on top of
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=netchannel&quot;&gt;netchannels&lt;/a&gt;;
packet sockets are also supported) and very small network stack (TCP/UDP/IP/ethernet) implemented
entirely in userspace. Because it lives at the very end of the peer (i.e. very close
to or even embedded into the application), it allows much faster processing of some workloads, namely
small packet sending and receiving, where
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/networking/2006_10_26.html&quot;&gt;it&lt;/a&gt;
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/networking/2006_12_21.html&quot;&gt;outperforms&lt;/a&gt;
the vanilla Linux TCP/IP stack by 3 times in performance and 4 times in CPU usage (sending and receiving vary).&lt;br/&gt;&lt;br/&gt;

&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/atcp_speed_gigabit.png&quot; alt=&quot;ATCP gigabit test&quot;&gt;&lt;br/&gt;&lt;br/&gt;

Compare netchannels+unetstack versus Linux sockets (numbers from 2006).&lt;br/&gt;&lt;br/&gt;

This is not about problems in the Linux stack, but about the overhead of syscalls, which in turn
results from data sending and reply processing being too separate in the existing model.

Comments (0)
   </description>
 </item>
  <item>
    <title>CARP: Common Address Redundancy Protocol for Linux kernel.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/networking/2008_06_14</link>
    <description>
I've finally made a new release of
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=carp&quot;&gt;CARP&lt;/a&gt;
for the Linux kernel.&lt;br/&gt;&lt;br/&gt;

CARP is an improved version of the Virtual Router Redundancy Protocol (VRRP) standard.
The latest protocol to help provide high availability and network redundancy, it was
developed because router giant Cisco Systems believes that its Hot Standby Router
Protocol (HSRP) patent covers some of the same technical areas as VRRP.&lt;br/&gt;&lt;br/&gt;

This project allows you to build highly available clusters of multiple machines with
balanced master selection between them. Installation and setup are pretty trivial:&lt;pre&gt;
$ tar -zxf carp_latest.tar.gz
$ cd carp
$ make

# insmod ip_carp.ko
# modprobe cn
# insmod carp_conn.ko
# ifconfig carp0 up
# carp_conn_daemon -m master.sh -b backup.sh&lt;/pre&gt;

And the same on all other machines.&lt;br/&gt;
Each script, as you can guess from its name, is executed when the node becomes a master or a backup;
you can put firewall rule changes, traffic shaping setup, network daemon start/stop
scripts and whatever you like there.&lt;br/&gt;&lt;br/&gt;

Its main advantage over other existing open master/backup solutions (like Heartbeat or userspace CARP;
well, it also behaves much more robustly than Cisco VRRP) is the ability to set up a multicast address (via the usual
&lt;code&gt;/sbin/ifconfig&lt;/code&gt; command) and thus not confuse some &lt;s&gt;crappy&lt;/s&gt; Cisco
hardware, which will not understand that the node has changed.&lt;br/&gt;&lt;br/&gt;

One can get the latest sources from the &lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=carp&quot;&gt;CARP homepage&lt;/a&gt;.&lt;br/&gt;
Enjoy!

Comments (0)
   </description>
 </item>
  <item>
    <title>The latest iozone benchmark of POHMELFS, NFS, XFS and Ext4.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_13_1</link>
    <description>
1 GB of RAM, 8 GB files. SEAGATE ST3300007LC 10k rpm 300 GB disk on an Adaptec AIC7902 Ultra320 SCSI adapter.&lt;br/&gt;&lt;br/&gt;

Performance in KB/s.&lt;br/&gt;&lt;br/&gt;

NFS:&lt;pre&gt;
                                                   random  random
     KB  reclen   write rewrite    read    reread    read   write
8388608       8   53210   57769    24304    24448    1360    4775
8388608      16   54577   57481    23871    24080    2592    7937
8388608      32   54736   56203    24015    24114    4738   12637
8388608      64   52075   54051    23653    23555    7610   18475
8388608     128   52307   54636    23305    23375   13017   26584
8388608     256   52189   53030    23585    23531   15615   34390
8388608     512   52938   54063    23709    23882   17524   42781
8388608    1024   57458   57006    24187    24292   29701   43892&lt;/pre&gt;

POHMELFS:&lt;pre&gt;
                                                   random  random
     KB  reclen   write rewrite    read    reread    read   write
8388608       8   66473   63721    74232    74288    1103    4953
8388608      16   52604   62339    73423    74259    2001    8438
8388608      32   53278   62283    73497    74115    3360   13849
8388608      64   56931   61370    73135    74077    5076   21063
8388608     128   59419   62743    72736    74122    8068   30279
8388608     256   60861   63094    73284    74554   10848   38869
8388608     512   59438   62081    73329    74441   17290   48722
8388608    1024   62790   62130    73322    74100   27741   46470&lt;/pre&gt;

POHMELFS write speed is about 10% faster than NFS, and read speed is roughly 3 times faster
(essentially the disk/local fs IO limit, see below).
POHMELFS random read speed is lower, and fixing that is the task with the highest priority now,
especially compared to the local FS results. POHMELFS random write is slightly faster than NFS.&lt;br/&gt;&lt;br/&gt;
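The sequential numbers can be checked with a few lines of Python. This is a quick sketch over values copied from the write and read columns of the two tables above; averaging over all record sizes is my own choice here, not something iozone reports:

```python
# Sequential throughput in KB/s, one value per record size (8 KB to 1024 KB),
# copied from the NFS and POHMELFS tables.
nfs_write = [53210, 54577, 54736, 52075, 52307, 52189, 52938, 57458]
pohmelfs_write = [66473, 52604, 53278, 56931, 59419, 60861, 59438, 62790]
nfs_read = [24304, 23871, 24015, 23653, 23305, 23585, 23709, 24187]
pohmelfs_read = [74232, 73423, 73497, 73135, 72736, 73284, 73329, 73322]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def speedup(new, old):
    """Ratio of the mean throughput of new to the mean throughput of old."""
    return mean(new) / mean(old)


print("write: %.2fx" % speedup(pohmelfs_write, nfs_write))  # prints "write: 1.10x"
print("read:  %.2fx" % speedup(pohmelfs_read, nfs_read))    # prints "read:  3.08x"
```

The same two functions work for the random read and write columns if those lists are filled in from the tables.&lt;br/&gt;&lt;br/&gt;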

For comparison, here is the local filesystem (XFS) used for the tests.&lt;br/&gt;
&lt;code&gt;mkfs.xfs -d agcount=75 -l size=64m /dev/sdc1;&lt;br/&gt;
mount -o logbufs=8,nobarrier,noatime,nodiratime,osyncisdsync /dev/sdc1 /mnt/&lt;/code&gt;:&lt;pre&gt;
                                                   random  random
     KB  reclen   write rewrite    read    reread    read   write
8388608       8   75124   60560    77672    77797    1860    5059
8388608      16   75044   60036    77754    77775    3601    8772
8388608      32   75958   62038    77593    77765    6821   14781
8388608      64   74728   59384    77688    77782   12475   23228
8388608     128   74889   59676    77731    77736   21734   32241
8388608     256   75022   59285    77676    77718   28833   40324
8388608     512   74885   59187    77653    77713   40013   48057
8388608    1024   74838   64217    77796    77765   55100   46104&lt;/pre&gt;

And Ext4 joins the group (mount options: &lt;code&gt;rw,noatime,data=writeback,extents&lt;/code&gt;):&lt;pre&gt;
                                                   random  random
     KB  reclen   write rewrite    read    reread    read   write
8388608       8   72107   73017    77276    77335    1849    5015
8388608      16   72276   73849    77304    77287    3577    8666
8388608      32   72680   73647    77284    77326    6755   14394
8388608      64   71965   74287    77327    77288   12366   22513
8388608     128   72660   73864    77207    77343   21617   31160
8388608     256   72813   74058    77296    77338   28652   42003
8388608     512   72985   73317    77284    77343   40572   50619
8388608    1024   72184   74131    77264    77250   55649   50365&lt;/pre&gt;

Nice graphs will follow once I write a Lisp (no less :) parser for these results.&lt;br/&gt;
Stay tuned!

Comments (3)
   </description>
 </item>
  <item>
    <title>New POHMELFS release: doing it wrong fast is at least better than doing it wrong slowly.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_13</link>
    <description>
Via &lt;a href=&quot;http://www.ashleighbrilliant.com/&quot;&gt;Ashleigh Brilliant&lt;/a&gt; and bits of Tullamore Dew.&lt;br/&gt;&lt;br/&gt;

Here we go, short changelog for this release:&lt;ul&gt;
&lt;li&gt;Balancing of read requests (data reads, directory listings, lookups) between multiple servers.&lt;/li&gt;
&lt;li&gt;Write requests are sent to multiple servers and completed only when all of them have sent an ack.&lt;/li&gt;
&lt;li&gt;Ability to add and/or remove servers from the working set at run-time from userspace (via netlink;
the same command could be processed from the real network too, but since the server does not support it
yet, I dropped the network part).&lt;/li&gt;
&lt;li&gt;Documentation (overall view and protocol commands)!&lt;/li&gt;
&lt;li&gt;Rename command (oops, forgot it in previous releases :)&lt;/li&gt;
&lt;li&gt;Several new mount options to control client behaviour instead of hardcoded numbers.&lt;/li&gt;
&lt;li&gt;Bug fixes.&lt;/li&gt;&lt;/ul&gt;
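To make the read balancing and the write-to-all-servers rule from the changelog concrete, here is a toy Python model, not the kernel code. Round-robin read selection is an assumption (the changelog does not say how reads are balanced), and &lt;code&gt;WorkingSet&lt;/code&gt; and the &lt;code&gt;send&lt;/code&gt; callback are hypothetical names.&lt;pre&gt;
class WorkingSet:
    """Toy model of the client-side set of servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.next_reader = 0

    def add(self, server):
        # Run-time addition of a server to the working set.
        self.servers.append(server)

    def remove(self, server):
        # Run-time removal of a server from the working set.
        self.servers.remove(server)

    def pick_reader(self):
        # Round-robin is an assumption made for this sketch.
        server = self.servers[self.next_reader % len(self.servers)]
        self.next_reader += 1
        return server

    def write(self, data, send):
        # send(server, data) is a hypothetical transport call returning True on ack.
        acks = [send(server, data) for server in self.servers]
        # The write is complete only when every server in the set has acked.
        return all(acks)


ws = WorkingSet(["server-a", "server-b"])
print(ws.pick_reader(), ws.pick_reader(), ws.pick_reader())
print(ws.write(b"data", lambda server, data: True))
&lt;/pre&gt;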

I will complete the documentation in a few moments and send this release to the mailing lists.&lt;br/&gt;
Very likely it is the last non-bug-fixing release of the kernel client side; the next release will incorporate
features needed for distributed parallel data processing (like the ability to add new servers via a network
command from other servers), so most of the work will be devoted to server code.

Comments (0)
   </description>
 </item>
  <item>
    <title>Preparing for the next (last non-bug-fixing?) release.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_11</link>
    <description>
Essentially that's it, I really believe most of the features I wanted
from a network distributed parallel filesystem, which should live
in the client, are already implemented in POHMELFS.&lt;br/&gt;&lt;br/&gt;

The client has the following features (if I did not forget something interesting;
only those interesting from the parallel point of view are listed):&lt;ul&gt;
&lt;li&gt;Automatic failover reconnect to the same server.&lt;/li&gt;
&lt;li&gt;Run-time addition/removal of the servers from the working set
(only via userspace command, since server does not support that yet,
but addition is trivial).&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/fs/2008_05_17.html&quot;&gt;Coherent&lt;/a&gt; data and metadata cache&lt;/li&gt;
&lt;li&gt;Transactions support. Full failover for all operations. Resending transactions to different servers on timeout or error.&lt;/li&gt;
&lt;li&gt;Load balancing of reading (directory reading and lookups inclusive) requests and
simultaneous writing to all servers in current working set.&lt;/li&gt;&lt;/ul&gt;

It is damn fast (but remember that random reading
is not yet optimal enough, and in
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/blog/devel/fs/2008_06_06.html&quot;&gt;the last tests&lt;/a&gt; it was slower than NFS).&lt;br/&gt;&lt;br/&gt;

Meanwhile the userspace server does not support lots of features it has to support
to be called a complete parallel distributed solution, and the main work should now
be concentrated on it.&lt;br/&gt;
Main missing (and the most complex) features are:&lt;ul&gt;
&lt;li&gt;Distributed data coherency protocol like PAXOS for server data, stored on multiple machines.&lt;/li&gt;
&lt;li&gt;Ability to mirror data itself on multiple machines.&lt;/li&gt;&lt;/ul&gt;

So the release will likely see the light tomorrow or on Friday.

Comments (0)
   </description>
 </item>
  <item>
    <title>Sun and water. Sasha and Masha.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//life/2008_06_10</link>
    <description>
&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/masha_wed.jpg&quot;&gt;
&lt;img src=&quot;http://tservice.net.ru/~s0mbre/gallery/grange_wed.jpg&quot;&gt;&lt;br/&gt;&lt;br/&gt;

Thank you, that was great!

Comments (2)
   </description>
 </item>
  <item>
    <title>Contributors we are losing and kernel summit talk about it.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/other/2008_06_06</link>
    <description>
By 'we' I mean the kernel community, although I do not think
I personally win or lose if someone decides not to hack
on the Linux kernel.&lt;br/&gt;&lt;br/&gt;

I even found myself in a
&lt;a href=&quot;http://kerneltrap.org/mailarchive/linux-kernel/2008/5/30/1985104&quot;&gt;'contributors we are losing'&lt;/a&gt;
list :)&lt;br/&gt;&lt;br/&gt;

And yes, very likely Linux kernel community lost me (and I do believe
none cares as long as me).
But not Linux kernel, it is definitely the place I like.&lt;br/&gt;&lt;br/&gt;

People who want to hack on the Linux kernel will do that without all
those empty talks and brilliant ideas, all of which are aimed in
a single direction: do what we ask you to do for us. Be fair and
admit that you do not want new ideas implemented, you only want old bugs (introduced
by someone else) fixed, so that the kernel gets more respect without
possible additional work for you.&lt;br/&gt;&lt;br/&gt;

It is not how interested people work; instead they just decide themselves
how and what to do. That's why the kernel janitor project did not succeed:
it is not interesting for anyone. The same applies to its refocus on bugfixes.&lt;br/&gt;
And I do know what kernel janitorial work is: I started with that not so long ago, fixing
trivial error checks like &lt;code&gt;request_region()/check_region()&lt;/code&gt; code
and other minor things like PCI remap errors.&lt;br/&gt;
That was a hell of crap. Frequently I fixed lots of drivers (like 20 or more)
in one go and submitted a patch,
only to be asked to split it into separate patches, add each driver maintainer
to the copy, wait for their ACKs, resubmit and so on. And it frequently happened
(especially when a new feature was introduced and lots of small pieces of code had to be changed
a little) that while I did that, some other known kernel hacker did the same, and his
patch was immediately applied.&lt;br/&gt;&lt;br/&gt;

Janitorial and all hypocrisy about 'we want more developers' just suck.&lt;br/&gt;&lt;br/&gt;

My advice for those who really want to hack on kernel: just do what you like,
try yourself in whatever subsystem you want, implement your ideas, be creative and do
whatever you like with kernel and not what all those kernel heads tell you to do.&lt;br/&gt;
The only way to succeed is to move forward!&lt;br/&gt;&lt;br/&gt;

Argh, and do not listen to any advice of this kind at all :)

Comments (3)
   </description>
 </item>
  <item>
    <title>POHMELFS development status.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_06</link>
    <description>
&lt;a href=&quot;http://tservice.net.ru/~s0mbre/old/?section=projects&amp;item=pohmelfs&quot;&gt;POHMELFS&lt;/a&gt;
got the ability to add/remove servers at run-time via the netlink interface (not yet via a
network command, since I do not know how to test that yet). The same
message can be passed over the network though, so it will be simple to extend.&lt;br/&gt;
Also, POHMELFS got readahead support via &lt;code&gt;-&gt;readpages()&lt;/code&gt;
callback. I removed AIO reading from POHMELFS in favour of readahead
and got excellent result in sequential reading: 3-3.5 times faster than NFS
and essentially reaching disk IO bandwidth (a bit less though),
but random reading dropped to miserable numbers.&lt;br/&gt;
Also, the rewritten reading method should let the system balance reads between multiple servers
better, but it will not show any benefit in the single-threaded
iozone benchmark, since it reads data via a single call to &lt;code&gt;read()&lt;/code&gt;,
which gets sequential data access, which in turn is faster than network bandwidth.
So a multithreaded load should greatly benefit from read balancing, but I have not
tested that yet.&lt;br/&gt;&lt;br/&gt;

I ran sequential read/reread, write/rewrite and random read/write tests for
XFS, Ext4, NFS (over XFS) and POHMELFS (over XFS) with 1 GB of RAM and 8 GB
of test files (to eliminate the influence of VFS caching) with 8 KB to 1 MB record sizes.&lt;br/&gt;
The results exist as text files in standard iozone output format, but since I'm learning
Lisp I decided to write a graph generator (via gnuplot) using my very basic
knowledge of the language, so nice graphs can take a while...&lt;br/&gt;&lt;br/&gt;

Also, tomorrow morning I will fly away to my friend's wedding and will only
return on Monday the 9th. I will not have internet access there, only lots of fun.

Comments (0)
   </description>
 </item>
  <item>
    <title>Travelling to Uganda.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//life/2008_06_05</link>
    <description>
A friend is inviting me to go to Uganda this September. Promises beautiful nature and a very interesting trip in itself.&lt;br/&gt;&lt;br/&gt;

Thinking...

Comments (2)
   </description>
 </item>
  <item>
    <title>Optimized POHMELFS transactions.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_04</link>
    <description>
Now they eat less memory, and a single writing transaction can accumulate
up to 1024 pages. This can be tuned further, especially for small requests
mixed with sync. Currently a writing transaction is allocated at its maximum
size, and then page pointers are written into the allocated area, so
if the number of dirty pages requiring writeback is small, quite a lot of
space will be wasted.&lt;br/&gt;
It is a task for the next optimization, nevertheless currently sequential
writing is only limited by disk throughput or network bandwidth in case of
multiple servers, since link
is shared between machines, so effective bandwidth becomes equal
to GigE/number of servers, or about 60 MB/s in my environment with two servers
and single client.&lt;br/&gt;&lt;br/&gt;
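The bandwidth estimate above is simple arithmetic; a minimal Python sketch of it follows. It assumes the raw GigE line rate of 1 Gbit/s (125 MB/s) and ignores protocol overhead, which is why the result lands a bit above the measured 60 MB/s. The function name is made up for this illustration.&lt;pre&gt;
# Assumption: raw GigE line rate, 1000 Mbit/s = 125 MB/s, before any protocol overhead.
GIGE_MB_PER_S = 1000 / 8

def per_server_write_bandwidth(link_mb_per_s, servers):
    """Every written byte is sent to each server, so the client link is split evenly."""
    return link_mb_per_s / servers

print(per_server_write_bandwidth(GIGE_MB_PER_S, 2))  # 62.5
&lt;/pre&gt;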

Also, the reading path was not changed at all (only the transaction
internals) - there is still no readahead
and a new transaction is allocated for each page to be read. Nevertheless,
see how reading has improved: POHMELFS not only outperformed NFS again,
but reached the disk bandwidth limit already for 16 KB requests (almost two
times faster than NFS). The table shows IO throughput in KB/s.&lt;pre&gt;
                                                    random  random
      KB  reclen   write rewrite    read    reread    read   write
 8388608       8   74058   68392    40130    79509   43588    4818
 8388608      16   62332   66978    73714   122074   42160    8434
 8388608      32   64775   67073   109357   171139  145416   14183
 8388608      64   66962   66602   147350   217323  227962   22257
 8388608     128   67724   67133   185574   266855  321060   32681
 8388608     256   68233   67922   201591   283567  474657   40944
 8388608     512   68339   66514   213513   295995  646897   50303
 8388608    1024   67744   67384   220858   297748  676582   48796&lt;/pre&gt;

I will create nice graphs out of these tables and will also include
optimized reading tests (tomorrow, likely) and results for two data servers.&lt;br/&gt;&lt;br/&gt;

What should also be done is testing with either bigger files or a smaller
amount of RAM and thus a smaller VFS cache. As you saw in all tests, when
lots of reads start to hit the cache, the picture says nothing
about filesystem behaviour. So I want to limit all three testing machines
to 1 GB of RAM (booting with the mem=1G parameter) and run the same iozone
bench on an 8 GB file. Results should be more realistic.&lt;br/&gt;&lt;br/&gt;

In parallel I will implement the userspace run-time server addition/removal
command, which will also be used as-is for a network message from one
or another server connected before. Together with optimized reading transactions
it will be a good ground for the next POHMELFS release. So I plan to schedule
it for Thursday or the middle of next week, since I will be on a small vacation
June 6-9.

Comments (0)
   </description>
 </item>
  <item>
    <title>AppArmor and path-based security approaches vs object bound policies.</title>
    <link>http://tservice.net.ru/~s0mbre/blog//devel/fs/2008_06_02_1</link>
    <description>
&lt;blockquote&gt;- So again, can you offer an alternative?&lt;br/&gt;
- Just give up on this dumb idea completely.&lt;/blockquote&gt;

It is not about AppArmor in general (although maybe about it too), but about security hooks which provide
path information into inode callbacks. There are pros and cons for this decision,
but things look like path based security hooks will not be accepted.&lt;br/&gt;&lt;br/&gt;

There is a really trivial way to fix it. No kidding, it is simple: create your own
name cache and do not bind it to dentries, but instead index it by inode number.
This allows you to have whatever callbacks and information you want in strictly
bound VFS operations. Need path info in &lt;code&gt;-&gt;inode_create()&lt;/code&gt;?
Put it into your own tree indexed by the parent's inode number, look that data up in
the security hook and make a decision. Yes, it is slower, but active security was never
a fast solution. It still goes against the rules others created for security-based
systems, but formally it stays within the boundaries of the existing (maybe ugly
for someone) interfaces.&lt;br/&gt;&lt;br/&gt;
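The idea can be sketched as a toy userspace model in Python (real code would be C inside the kernel). Everything here is hypothetical: &lt;code&gt;NameCache&lt;/code&gt;, &lt;code&gt;inode_create_hook&lt;/code&gt; and the prefix-based policy are made-up names for illustration, not an existing kernel or AppArmor interface.&lt;pre&gt;
class NameCache:
    """Toy name cache indexed by inode number instead of being bound to dentries."""

    def __init__(self):
        self.paths = {}  # inode number to path string

    def remember(self, ino, path):
        self.paths[ino] = path

    def forget(self, ino):
        self.paths.pop(ino, None)

    def lookup(self, ino):
        return self.paths.get(ino)


def inode_create_hook(cache, parent_ino, name, denied_prefixes):
    """Hypothetical stand-in for a security hook that only sees the parent inode."""
    parent_path = cache.lookup(parent_ino)
    if parent_path is None:
        # Unknown parent: this sketch allows; a real policy would have to choose.
        return True
    full_path = parent_path.rstrip("/") + "/" + name
    return not any(full_path.startswith(prefix) for prefix in denied_prefixes)


cache = NameCache()
cache.remember(2, "/")
cache.remember(128, "/etc")
print(inode_create_hook(cache, 128, "shadow", ["/etc/"]))  # False, creation denied
print(inode_create_hook(cache, 2, "tmpfile", ["/etc/"]))   # True, creation allowed
&lt;/pre&gt;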

And I will not point to the project which already uses such an approach in a different area
though :)&lt;br/&gt;
It is interesting to implement your ideas not by breaking something (although sometimes
that is needed, but that's likely an exception, or when you are hacking a deeply internal kernel
part), but instead by hacking around existing limitations.

Comments (4)
   </description>
 </item>
  </channel>
</rss>