Wed, 23 Jul 2008
Manager's thoughts: unused extensibility and used de-facto standards.
After some before-sleep reading (this time the DNS RFC specifications) I found
that the DNS protocol is so extensible that it can perfectly cover not only its own area,
but also help with a lot of related problems. It already has (though completely
unused) many interesting RRs and types which have nothing to do with DNS
(like the NULL RR, which allows transmitting binary data, or the TXT RR, which is also not
related to the DNS area). And the most popular RRs are A, PTR, SOA, CNAME and MX. That's all,
out of about 20 others. The same applies to (q)type and class (I read about the Hesiod class
for the first time, for example). And DNS allows introducing one's own classes, types and resource
records.
It is just not used, but we could create a distributed DNS system with new types.
It would be really simple (and actually it can be done even without new DNS extensions).
But it is not actually needed, since people are used to having DNS just the way it is.
Another example is internet video. There is a de-facto Adobe standard, and no matter what W3C
puts into its new standard, everyone will continue to use the existing one. Just because it works
OK. Not excellent or perfect or whatever, it just works the way we are used to.
And there are lots and lots of similar examples.
People are so inert in these questions (and I think in most areas, just because
it is convenient not to build something better when the existing solution just works, even
if not perfectly and even if not well), that no one will ever bother to change something
dramatically, because it would require not only a huge amount of money, but also changes in the
way people are used to thinking about the given area, which is likely an even more complex (and money-hungry)
problem.
All this talk is about a simple thing I have just discovered for myself: when you create something
completely new, even if it is not the best solution for the given problem, and you start
pushing it to a wide audience, you are able to get the whole 'market'. Conversely,
when you bring something new
to a market where most of the users are already used to working with one or another solution
(even if your project is potentially very good and definitely much better than the existing solutions),
there will not be any major gain, only a handful of completely new users.
This is probably told to first-year MBA students, but I was both quite excited and disappointed
by this issue: the first new idea, when properly presented, can get all the users even if it is not
the best solution for the given problem, and after that they will not switch to a new one just because they
are used to having it this way.
/devel/other :: Link / Comments (1)
Tue, 22 Jul 2008
POHMELFS distributed facilities design notes.
Since I'm quite busy with visa/hotel/tickets and overall preparations
for the Kernel Summit, there is no development progress, but the preparations should be
completed very soon I think, so I will write here some design notes
I have in mind about how the POHMELFS server will be designed. It is not a
finished draft, but rather a rough sketch of the direction.
POHMELFS will use a distributed hash table approach, i.e. the storage
will support the ability to get an object based on some key attached to it.
In a local filesystem we already work with a hash table: a directory
lookup is no more than a lookup of an inode object based on its name, i.e.
a lookup of the value based on the attached key. And although the key in this
case is not derived from the object itself (like a hash of the content or
some other function), it still is a (turn on your imagination here) table lookup.
The cloud of POHMELFS servers will use a similar approach. Consider a single
server in the system. When it joins the cloud for the first time (I omit this process for now
and will describe it below), it is empty, so it gets some unique
id, either via administrator steps or randomly, or it just waits in the queue
to be filled with new data and gets its id at that time. It does not matter
for now how it gets its id, but this id is propagated to some cloud of its
neighbours (or, if it were BitTorrent or Napster, to the main server).
There are two ideas on how to treat this id: either as a part of the filename,
or as a nameless pointer in the abstract namespace. I will show below that it actually
does not matter.
Now, let's check what will happen when a user wants to perform some IO on a given file.
Every file access actually happens to an inode stored on disk. In our case it can be stored
somewhere we do not know yet, so we need to perform a lookup to get the address
of the node in the cluster which contains our data. In existing schemes like BitTorrent
or Lustre there is a server (or a small cloud of servers) which contains mapping information
about where this or that object is placed in the data cloud, so a simple lookup to this server(s)
returns the needed info. This approach does not scale to really large numbers of nodes and is failure-prone.
Instead I consider completely distributed metadata storage. Let's check how the system will look up
the whole path in our case.
Each path starts from the root directory, which is '/', which in turn is an id in the global
namespace (or a hash of this string, or whatever other mapping), so we first need to look up
the node which is responsible for the content of this directory. Each node contains routes only
to a very limited set of neighbour nodes (in various designs this number varies, but the idea
lies in the fact that the node performing the lookup does not know which node contains the needed info).
The Gnutella system just broadcast this lookup request to all of its neighbours, so each one
broadcast it to its neighbours and so on until one of the systems replied that it contained
the needed info. The amount of unneeded broadcasts killed Gnutella the next day after Napster was closed.
So this approach does not scale, and instead we need to map the needed directory to a node address
in a more intelligent way. There are at least two especially appealing design choices: the ring-based
structure implemented in Chord and the multidimensional torus implemented in CAN.
Right now it does not matter; let's assume that we found a node which has information
about the content of the needed directory. When we have that data, we can find the next node (or this
info can be cached on the 'parent' directory node) and so on until we get to the node which is responsible for
storing the content of the needed object.
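The component-by-component lookup described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses a Chord-like ring where a key belongs to the first node id at or after the key's hash, it cheats by keeping a global sorted list of nodes (a real Chord node knows only a few neighbours and routes in several hops), and the names Ring, ring_id and resolve, the node names and the 32-bit id space are all made up for the example.

```python
import bisect
import hashlib

RING_BITS = 32  # hypothetical size of the id space


def ring_id(name: str) -> int:
    """Map a string onto the ring; SHA-1 is an arbitrary choice here."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[: RING_BITS // 8], "big")


class Ring:
    """Simplified ring: every caller sees the full sorted node list."""

    def __init__(self, node_names):
        self.nodes = sorted((ring_id(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def owner(self, key: str) -> str:
        """Return the node whose id is the first one at or after the key's id."""
        idx = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ids)
        return self.nodes[idx][1]


def resolve(ring: Ring, path: str):
    """Walk the path from '/', returning (directory, owner node) pairs."""
    hops = [("/", ring.owner("/"))]
    current = ""
    for part in path.strip("/").split("/"):
        if not part:
            continue
        current += "/" + part
        hops.append((current, ring.owner(current)))
    return hops
```

For example, resolve(Ring(["node-a", "node-b", "node-c"]), "/home/zbr") yields one hop for '/', one for '/home' and one for '/home/zbr', each paired with the node that would hold that directory's content.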
When a new node joins the cloud, it connects to one or another known node (provided either by a public
service or by the administrator), sends information about its available space there, gets an ID
and just waits until some client connects to it and starts writing data.
When a node joins with some content which was written to it by the system before, or written by
local users bypassing the distributed mechanism, the node has to report this to the node which
holds the parent directory. The parent should be recorded in each directory the node exports, or it
can be provided by the administrator. For example, this node exports the dir '/zbr', which is actually a subdir
of '/home', so the node will look up the owner of the '/home' directory content and update its records to say that
it now contains a new dir. There is a problem here: what if there is already another node which also
claims to have the dir '/zbr' in '/home'? This can be handled via an extended attribute attached to each object
which tells us the last modification date, so the system can select either the last modified '/zbr'
dir or the node which contains the dir with the biggest number of identical replicas. The policy can be set up by the
administrator.
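A minimal sketch of that conflict resolution, assuming each claim carries the modification time from the extended attribute and a replica count. The Claim record, the pick_winner function and the policy names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One node's claim to hold a given directory (hypothetical record)."""
    node: str
    mtime: float    # last modification time taken from the extended attribute
    replicas: int   # number of identical replicas known in the cloud


def pick_winner(claims, policy="mtime"):
    """Choose which claim wins; the policy is what the administrator sets up."""
    if policy == "mtime":
        return max(claims, key=lambda c: c.mtime)
    if policy == "replicas":
        return max(claims, key=lambda c: c.replicas)
    raise ValueError("unknown policy: " + policy)
```

With two claims for '/home/zbr', the "mtime" policy picks the most recently modified copy, while "replicas" picks the copy that exists on the most nodes.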
The main advantage of this joining scheme is the fact that we actually do not need to know the content of any
object in the exported directory: we publish only the high-level object, which may or may not contain some
inner files or dirs. Thus we do not need to hash millions of files in the exported directory and publish
them one by one, we do not need to store information about each inner object,
we do not need to attach the full path to each object, and so on.
When we decide to split the same object between multiple nodes, we will need not only
name-based lookup, but also to extend it to the offset inside the object. This can be done by introducing
a system-wide 'block size', so each file is actually a set of blocks of the given size, and when we have found the node
responsible for storing information about the directory where the file is located, this node can also contain
information about where each part of the object is stored.
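The offset part of such a lookup is plain arithmetic. A small sketch, assuming a 4 MiB system-wide block size (the value and the function name blocks_for_range are made up for illustration):

```python
BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical system-wide block size


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Split a byte range of a file into (block index, offset in block, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // block_size
        within = offset % block_size
        chunk = min(block_size - within, end - offset)
        pieces.append((index, within, chunk))
        offset += chunk
    return pieces
```

Each (name, block index) pair can then be looked up separately, so different parts of the same file may live on different nodes.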
Looks quite simple, but... The devil is in the details.
I obviously missed some bits in the design (and I created it in my mind during a talk, being
under the 'impression' of the Greek spirit, while talking with asm@, who suggested looking
at the Kademlia project), like redundancy management of the nodes, splitting of the node content between
multiple nodes and other bits, but it is one of the first drafts, so things can be changed if needed.
Stay tuned, I will be back to the development process very soon
(DST first :), since the paperwork for the Kernel Summit travel
seems to be reaching its end.
/devel/fs :: Link / Comments (0)
Mon, 21 Jul 2008
Foot, fingers, shins, knees, thighs, shoulder, back.
No, these are not just the body parts I know (half of them I looked up
in the dictionary), this is what is aching right now.
It's called football.
Yes, sounds a bit scary, but that was one hell of a super game today.
We were much stronger, but I have to admit that it was mostly because we made
the right transfer decision and selected the right players at the beginning,
so our previous team was strengthened. I managed to score a goal,
make a couple of nice saves and even pull off some quite technical plays,
which was quite surprising, since I have not played football for 5 years.
I would not say I'm getting into shape, but there is progress.
/life :: Link / Comments (0)
Sun, 20 Jul 2008
Crazy security idea.
I've just realized that I do not know a way to make
some (running) application encrypt all of its data
which hits the disk (either via swap or the usual way, like
an editor writing the file and all its temporary files).
I actually consider this a very useful feature for
editors, browsers, instant messengers and mail clients,
download applications, music players and
so on. This is especially relevant for temporary files, when
one expects the editor to be highly secure (or even to be working on
an encrypted partition), while its temporary files are stored
somewhere in /tmp, which is not encrypted.
It could be started via some wrapper, which will tell the
kernel the encryption algorithm, key, IV and all other needed info,
and will attach a crypto processing callback to the process,
so when disk activity is started by the given pid (swap, or data writing
or reading), the data is encrypted/decrypted in flight.
The kernel should check all file descriptors opened by the given
process and process them appropriately. There may be some problems
with communication with unprotected applications, which should
be thought out, but overall I like the idea...
I have put it on the todo
list.
/devel/other :: Link / Comments (0)
Project presentation.
I've just realized that lots of my blog posts
are valid enough presentation abstracts; at least they contain
enough words describing the problem, a possible solution
and the topics of general interest in the given area. But I have
never presented such projects in English before, although quite frankly
I'm not that bad a speaker in Russian; at least I
am not afraid to talk and probably even like contact with an interesting
audience. After all there is this blog :) and I have even given a number
of similar presentations, from 15 minutes to a couple of hours
including the question/answer part.
The English I use in the blog is rather ugly, and I rarely (if at all)
fix the errors which I detect on subsequent reading of the text
in the browser (and I detect lots of them), the same as in mails
and other posts.
So probably eventually we will have interesting
talks about different areas, but expect to 'listen' to the world-wide
language of gestures :)
/devel/other :: Link / Comments (0)
Sat, 19 Jul 2008
Distributed storage is dead, long live the Distributed storage!
As you may know, the DST
project was an attempt to implement a redundant, failover-resistant, flexible block level storage
subsystem. Among other features it supported the ability to map multiple remote nodes via linear
or mirroring algorithms to a single node, reconnection to a failed node, read balancing,
parallel writing to multiple nodes (in case of mirroring) and
so on.
Now it is gone. There is no more distributed storage as you knew it; instead there is
a completely new project being developed, whose main goal is to provide a transport layer for
block requests only. Consider it as Network Block Device on huge steroids. Consider it
as iSCSI on huge steroids. Consider it as ATA-over-Ethernet on even more huge steroids.
It is just an example of what all those protocols should have. And only that.
And if it does not sound very ambitious: previous DST versions already supported lots of features
which never existed (and in some cases were impossible to add) in other block level
network storages.
DST moves further.
There will be no mirroring and no ability to map multiple devices into a single one at all;
instead one should use Device Mapper for this, since its features were simply mirrored
(although I tried to optimize them sometimes) in DST, and the number of targets was noticeably smaller.
Now DST is just a simple block device which operates on top of a network connection. With just a
single exception: it's done right.
Features planned for the new Distributed Storage:
- kernelspace client and server
- initial autoconfiguration between client and server nodes
- automatic reconnect to failed target
- transaction model: resending, timeout error completion, full rollback of the failed transaction
- wire speed performance
- data channel encryption, strong checksumming
- cryptographic authentication
- ability to work on top of any network protocol
- barrier support (if and when Device Mapper starts to support barriers, DST will not need to be changed)
- flexible protocol with simple ability to extend it to needed functionality
- trivial configuration
The project is being written from scratch, but it is actually very simple
and should be quite small, so expect its first release quite soon.
It will be pushed upstream when ready.
/devel/dst :: Link / Comments (8)
Fri, 18 Jul 2008
Completed distributed storage redesign.
I also managed to play second octave F# and sometimes the whole chromatic scale
down to small (minor?) octave F on my trumpet, and I believe I have started to understand
the overall trumpet kung-fu, but I expect this is not what you wanted to read under
the DST tag.
So, DST becomes smaller, cleaner and simpler. Notably, I decided to drop the userspace
target completely for now.
The kernel part now operates on a transaction entity, which holds a reference to the node
where data should be sent to/received from. There can be at most two such nodes, if the block IO
request spans a node boundary. In case of mirroring (which will be dropped for the first release)
the list of nodes to mirror this data to will be maintained by the first node, so the transaction
will not need to know about them.
In theory a block request can be as large as BIO_MAX_PAGES pages,
which is 256 for now, but I decided to require the minimum node size to be not smaller than
the above bio limit, so there will always be at most two nodes per request.
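The 'at most two nodes' property is easy to check with a toy model. The sketch below is not DST code: it assumes nodes are concatenated linearly, measures everything in bytes, and assumes a 4096-byte page, so the largest request is 256 * 4096 bytes. The function name split_request and the constants are made up for the example.

```python
PAGE_SIZE = 4096                  # assumed page size
MAX_REQUEST = 256 * PAGE_SIZE     # BIO_MAX_PAGES pages, as stated in the text


def split_request(start, length, node_sizes):
    """Split [start, start + length) across linearly concatenated nodes.

    Returns (node index, offset inside node, length) pieces.
    """
    pieces = []
    base = 0
    end = start + length
    for index, size in enumerate(node_sizes):
        lo = max(start, base)
        hi = min(end, base + size)
        if lo < hi:
            pieces.append((index, lo - base, hi - lo))
        base += size
    return pieces
```

As long as every entry in node_sizes is at least MAX_REQUEST, a request of up to MAX_REQUEST bytes can cross at most one boundary, so the result never has more than two pieces.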
Each node has either a block device behind it (so it will just call generic_make_request()
with a different block device for the given bio), or a network state machine.
The network state will have two threads: RX and TX. The receiving one is used to get replies for the
read/write messages, find the appropriate transaction and complete it.
In case of the DST server it will also handle read/write requests and generate replies, but the whole
processing will be exactly the same: the client node will also have a switch to process read/write requests from
the network, but those should only be received by the server.
The sending thread is tricky.
It is used as a fallback for the non-blocking sockets, which are tried first at generic_make_request()
time, i.e. when the higher-level user performs a read or write. If the block was not fully sent,
it is queued to this thread, which will try to send the rest of the data when
polling allows. The ->make_request_fn() function returns in this case and the higher
layer can proceed with its own operations.
A transaction is not freed until a reply is received from the remote side or the resend retry
count runs out.
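The transaction lifetime can be modelled roughly like this. It is a userspace toy, not the kernel implementation: the Transaction class, its state names, the default of three retries and the send callback are all assumptions made for the illustration.

```python
class Transaction:
    """Toy model: a transaction lives until it is acked or runs out of retries."""

    def __init__(self, tid, payload, max_retries=3):
        self.tid = tid
        self.payload = payload
        self.retries_left = max_retries
        self.state = "pending"

    def on_reply(self):
        """The remote side acknowledged the request: the transaction can be freed."""
        if self.state == "pending":
            self.state = "completed"

    def on_timeout(self, send):
        """No reply in time: resend, or complete with an error when retries are gone."""
        if self.state != "pending":
            return
        if self.retries_left == 0:
            self.state = "failed"
            return
        self.retries_left -= 1
        send(self.payload)
```

Here send is whatever pushes the data to the network again; after max_retries resends without a reply, the next timeout completes the transaction with an error.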
A transaction is always allocated (from the appropriate memory pool), and that is actually
the only allocation in DST itself. When it works with block devices, it may need to clone a bio
when it crosses the boundaries (or even always, I have to check it, but that is essentially
what device mapper does, with lots of additional allocations of its own), but this should be a very rare condition.
The network stack will allocate data itself too.
That was the theory. Practice tells me that essentially 90% of the code should be rewritten
from scratch, so I recloned the tree and so far have implemented the generic bits: registering
the block device, creating various sysfs files and directories and other similar trivial things.
I still plan to finish it this weekend (without mirroring), but things may turn out differently though...
/devel/dst :: Link / Comments (0)
Have sent all documents for US visa.
Checked my passports and decided that if other countries let me in
with those photos, then US customs officers should not frown too much upon
the current ones.
So, waiting for the results. I am almost sure that I will get the visa and
will meet interesting people at the kernel summit and the Plumbers conference,
but anyway I would like to draw the line.
For instance, Zach Brown will
talk
about CRFS (as well as show
some chocolate and cocktail bars around; imho the only good cocktail is rum with cola
(less cola) and ice), so there will be something to listen to.
/life :: Link / Comments (0)
Thu, 17 Jul 2008
"The Gun Seller" by Hugh Laurie.
Just finished reading this excellent
detective novel (at Amazon,
electronic version in Russian).
People call it the best English humor novel for a reason: it is indeed fun and interesting,
although I suspect lots of its witty satire was a bit lost in translation, but nevertheless
I do recommend it for easy reading.
And of course if you like House M.D., you have
to read this novel, and you will not waste your time for sure.
/other :: Link / Comments (0)
Morning trumpet exercises.
This morning I tortured my ears for almost two hours,
and at the end managed to play a chromatic scale (glide?)
from second octave D (E trumpet) down to minor octave F (E trumpet),
at least that is what my Korg tuner showed. Much more often
I was able to play just the first octave (only in the descending direction
though, I have not yet tried to go up).
The Korg AW1 tuner does not show octaves, but I really do not think
that it is possible to play one octave lower than my lowest
sound was, and I am pretty sure it is possible to go at least one octave higher
than my highest tone, so I decided that I play several tones around the first octave.
Ugh, I was supposed to leave for the office earlier (before the heat and traffic jams),
but instead I fucked my brain via my ears (and probably the neighbours were not happy
either, although I did not play at full volume).
/life :: Link / Comments (0)
Wed, 16 Jul 2008
New toys: Korg AW1 tuner.
I believe I can produce enough sounds out of my trumpet,
so I need a tuner which will tell me how bad those sounds are.

So, now I'm starting to seriously tune my sounds.
So far I think that I can play at least two octaves; actually I mean not to play,
but to produce a sound, and it is still not that simple and not always very clean. But since
I have 'played' (or better say, studied) my trumpet for only a couple of months, never played any instrument before (not counting
a couple of guitar riffs at the university) and do not play with a teacher, I think this tuner will
be a very good addition.
/other :: Link / Comments (4)
Tue, 15 Jul 2008
Distributed storage development roadmap.
Yes, the DST
project is alive and will beat the crap out of things very soon, since I decided to change its
underlying architecture and switch to a transaction model just like
POHMELFS.
This basically means that as long as the system has enough RAM, write operations will be
extremely fast, reading can be balanced between multiple nodes (in a mirror), transactions
can be resent, the failover mechanism becomes much simpler,
and the system overall will be much more robust to failures.
The transaction model also means that the system requires an explicit acknowledgement from the remote side,
and there are two possibilities here: to handle the implicit ack which comes with TCP ack
packets, like I experimented with
before, or to send an explicit ack from the server for each client request.
The former approach, although it has smaller performance overhead, still suffers from
the fact that pages sent via DST are always stateless, i.e. at this layer there is
no knowledge about who sent the page. We can determine the inode the page belongs to, and can
even get the socket when the page is about to be released after the ack has been received,
but we cannot know from exactly which pipe it was submitted into the given socket,
so when multiple threads send the same page via multiple sendfile()
calls, we do not know when and how the page will be released. We could put the pipes this page belongs
to into a singly-linked list (since the page has only two pointers unused at this point, those of the LRU
list head, and one of them is used to determine that this page belongs to the sendfile()/splice codepath),
and likely traversing this list will not hurt usual users, but a malicious one could
create a local DoS with this approach. After some experiments with the splice code
today I decided to drop the implementation of this idea for now.
There is a strong argument in favour of explicit acks from the server: they allow asynchronous transaction
processing (with implicit acks we cannot hook into the processing path, since we do not know where exactly
the skb with our pages is chained), and they do not hurt performance (which was proven by
POHMELFS benchmarks).
So, the overall plan for DST development is to switch to the transaction model and perform async processing
of all events (there are actually only two: reading and writing of the given pages from/to the given
locations).
This task is not that complex, so I expect some new results later this week. Stay tuned!
/devel/dst :: Link / Comments (5)
Football match has made the day!
That was an exceptionally bloody cool evening!
We had three teams of 6 players each and played
on a small mini-football field for about 2.5 hours; each match lasted
either 7 minutes or until 2 goals were scored in one goal. It drained my energy
so well that even the exceptional tiredness right now brings
a kind of masochistic pleasure.
My breathing system really sucks, and actually it is not a surprise,
since I have not played football for more than 5 years already, but nevertheless
the shoes and ball are in good shape.
I managed to damage my knees, shoulder and toes in various
'contacts' during the game, but that's not a problem.
Our team was not really the best one, but we firmly hold second place
and can actually fight for the first one, since all our players had long
enough pauses in their own games, while the first team's players regularly train
in their own teams (including a youth football champion).
That was a super time!
/life :: Link / Comments (0)
Mon, 14 Jul 2008
ParaLLels concert.

Visited ParaLLels concert this weekend...
Mixed feelings, but saw lots of old friends (musicians by a coincidence), which made the day.
/life :: Link / Comments (0)
Sun, 13 Jul 2008
Hermite interpolation.

This interpolation uses the cardinal spline approach, namely
Catmull-Rom splines. The next task is to test how Kochanek-Bartels splines (also called TCB-splines)
behave. The latter are used in all popular 3D modelling engines. Since the math behind them
is very non-trivial, I will just try to use the existing formulas for Hermite tangents, which
are quite simple.
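For reference, the Catmull-Rom case boils down to cubic Hermite interpolation with tangents taken from the neighbouring points. A minimal one-dimensional sketch (for 2D points apply it to each coordinate separately), assuming uniformly spaced control points; the function names are made up and this is not the code of the GTK application:

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation between p0 and p1 with tangents m0 and m1, 0 <= t <= 1."""
    t2 = t * t
    t3 = t2 * t
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def catmull_rom(points, i, t):
    """Interpolate between points[i] and points[i + 1], for 1 <= i <= len(points) - 3."""
    m0 = (points[i + 1] - points[i - 1]) / 2  # Catmull-Rom tangent at points[i]
    m1 = (points[i + 2] - points[i]) / 2      # Catmull-Rom tangent at points[i + 1]
    return hermite(points[i], points[i + 1], m0, m1, t)
```

The curve passes through points[i] at t = 0 and through points[i + 1] at t = 1. Kochanek-Bartels splines use the same hermite() function and differ only in how the tangents m0 and m1 are computed from the tension, continuity and bias parameters.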
Now it's time to think about how to use this knowledge and how to apply the given approach
to detect and decode letters in an image...
/devel/math/bezier :: Link / Comments (0)
Sat, 12 Jul 2008
Monday evening prognosis.
It promises to be just bloody excellent!

I have not played football for several years already, but I've found people
who do like it, so our games promise to be really fun and interesting!
There are already three teams (4+1 players in each).
This will be my first game after about 5 years of football silence,
and likely there is nothing left in my legs which can help with playing football;
climbing likely does not correlate with it, and neither does my experience in other physical training,
but nevertheless I'm looking forward to this game, which promises to be exceptionally cool!
/life :: Link / Comments (0)
Fri, 11 Jul 2008
Spline graphical interpolation fun.

Playing with different spline interpolation methods. So far they seem to be quite simple
when written in matrix form, so I cooked up a simple GTK application to test various methods.
There is no interpolation implementation yet, since I devoted the last two days to reading lots of
materials about Bezier and Hermite interpolation techniques (as well as lots of papers about
distributed hash tables, which I will use as a filesystem storage base for
POHMELFS).
/devel/captcha :: Link / Comments (6)
Wed, 09 Jul 2008
Captcha transformation algorithms.
Couple of first ideas. Pretty trivial.

The next step is to squeeze the images, so that all bold lines are reduced to single-pixel ones.
In theory it should not be very complex (I have an algorithm in mind), but in practice
it will be; I am starting to recall why the hell I
learnt LISP.
The basic idea is to transform the above BW pictures into a simple binary format, which will be read by a LISP
application, since I do not know and do not want to devote much time to learning how to parse/process
various image formats in LISP; instead that is done by a GTK application written in C. I believe LISP was
called the best language for artificial intelligence development for a reason, so I will try to
find out why.
Slacking - rox :)
/devel/captcha :: Link / Comments (4)
Tue, 08 Jul 2008
Anecdotes and allegories.
I'm not a major kernel contributor, but I was invited 3 times in the last
3 years to the kernel summit.
And I will try to get to this year's
one
in Portland, Oregon; at least I have started some preparations and contacted the needed people.
I hope I will also participate in the
Plumbers conference.
As before I will bring a bottle of vodka (the number of people
who wanted to talk suddenly dropped to the ground) and will greatly appreciate your
contact and discussion topics :)
That's of course if the stars line up, but I will push
them a bit.
/devel/other :: Link / Comments (0)
Mon, 07 Jul 2008
New POHMELFS release.
Irish 'Clontarf' and Scotch 'Grant's' helped to roll this release out.
This POHMELFS release features
include:
- Strong cryptography support. One can encrypt whole data channel (except headers) and/or hash/digest it.
System will try to autoconfigure itself and if server does not support requested algorithms, mount will either
fail (if special mount option is specified) or disable appropriate algorithm usage.
- Bug fixes.
Cryptography support is an essential addition to the POHMELFS core. It was implemented with performance
in mind, so that processing speed would not drop noticeably even in case of very CPU-hungry operations
(one can check the performance graphs).
POHMELFS uses a pool of crypto threads (their number can be specified via a mount option), which perform data crypto
processing and submit the data either to the network or to the VFS layer.
Now I will concentrate mostly on userspace server features, mainly its distributed facilities. The current ability
to write data to multiple servers and balance reading among them is not enough for POHMELFS, but it will be an
essential building block of the fully distributed fault-tolerant parallel filesystem.
If this development requires some changes on the kernel side (namely network protocol extensions), they will be
done in the upcoming releases, along with fixes for any bugs found.
As usual, you can grab sources from
archive or via
GIT tree.
You can also check POHMELFS homepage
to get more details on its design and supported features.
P.S. I think I will take a rest from this project for several days, which will allow me to concentrate on
the main POHMELFS features and work out rough edges. I will switch to DST
and netchannels (mainly to make new releases)
and then will devote some time to captcha cracking algorithms.
/devel/fs :: Link / Comments (4)
POHMELFS crypto processing performance.
If you expected a miracle, it did not happen, so I just present a picture, where
I compared plain async in-kernel NFS server (no encryption, no checksumming)
versus POHMELFS, which performed SHA1 hashing and AES-128-CBC encryption of the whole
data channel.
Block size used in iozone test is 8KB, filesize - 8GB, 1GB of RAM.
/devel/fs :: Link / Comments (4)
Sun, 06 Jul 2008
Vodka drinks.
Vodka itself is a very interesting drink, but depending on the
situation it can be either the cheapest way to become very drunk,
or a possibility to have a long and fun time in good company.
Frequently (and likely most of the time) vodka is used for the first
case only, which is sad of course.
I do not know when and how vodka became popular in Russia, but
I think it is always associated with my country now. Actually
every nation has some kind of vodka in its own history of drinks,
and likely still has it. For example UK/Ireland has whiskey, which
is effectively vodka, but aged in oak barrels. This brings a very
interesting taste, which allows using it as a kind of long drink
(especially with ice). After having a whiskey shot one can start breathing
air in (especially via the nose), which brings the aftertaste directly into the brain
and to every part of the body. I do not know any cocktails based on whiskey.
In my opinion, Irish whiskey is much more tasty and interesting than
(probably the original) Scotch, although the latter has many more labels.
The USA also drinks whiskey, but most of the time it is its own
labels, which I have not tried yet. The USA does not have its own popular
drink though, or at least I do not know of one.
Europe also has lots and lots of different kinds of vodka.
The French drink cognac. I do not like it, and believe that it is only
coloured tasteless vodka, likely even the best labels like Rémy Martin and Hennessy
(although the latter was founded by an Irishman :), but it is only a matter of taste
of course. The cognac creation process is a bit more complex than
that of vodka, and it also has a very different taste, which (for me) is still very similar
to pure vodka. Cognac is one of the most popular strong drinks. The culture of
drinking it is forgotten, but nevertheless it is very interesting. Cognac should be
drunk only at a special temperature (16 degrees Celsius) in a glass of a special
shape, which concentrates its aroma. Cognac is not swallowed immediately, but
'stored' in the mouth for a while to get all the taste.
The French also created absinthe. This is a very strong drink (up to 90 degrees),
but its main feature is thujone. History tells us that thujone was the main reason
why absinthe was forbidden in Europe, and that it was quite a strong hallucinogen.
History also tells us that its concentration never exceeded 10%, so it is unlikely
that it had any kind of strong effect. Vincent Van Gogh liked it very much;
there is even a theory that he cut off his ear during absinthe intoxication, but likely
it was some special absinthe, since a thujone concentration of 10% or less does
not have any significant effect. Right now absinthe is allowed in most of the countries
where it was forbidden about 100 years ago.
Eastern Europe drinks various kinds of vodka, which go by local names.
For example the so-called Cha-Cha, which is quite a strong (up to 80 degrees) drink, but usually
very clear, so it can be drunk without dilution.
The New World (mostly Mexico) brings us a very interesting vodka-like drink
called tequila. It is frequently called Mexican vodka, although the US also produces its own labels.
There are also types which are made using French cognac barrels.
Usually it is drunk with salt, lime (sometimes lemon) and
mulatto female. Process is very interesting: you lick mulatto's hip, cover it with salt,
lick it, get a tequila shot and eat a piece of lime. Even without mulatto it is still a very
tasty drink. Tequila is made out of special agave sorts; the more agave it has, the higher
the quality.
One of the best known vodka-like drinks from the Caribbean is rum. It is also quite a strong
drink, but because of its oil-like elements it is sweeter and very tasty.
Rum is likely one of the most widely used strong drinks for cocktails.
I know that Koreans also really like their own kind of vodka, which has a lower alcohol concentration,
namely 20 degrees. It is made out of rice.
It is a very popular drink to mix with beer. Blows your roof off
after just a couple of shots.
Ukrainians have a very interesting drink called 'Gorilka', which is effectively
vodka with pepper. It is very tasty, but never eat the Gorilka pepper, or you
risk getting a peptic poisoning.
There are several vodka mixes.
First, and likely the best known, is the 'Screwdriver', which is vodka mixed with juice. It
is not very tasty imho. One of the strongest roof-blowing drinks is the so-called
'ruff', a mixture of vodka and beer. Do not try it if you do not know what it is.
I also know one vodka long drink: vodka with Martini mixed one to one. Although it looks
quite strong, it is a very tasty drink with an excellent sweet and slightly dry taste.
Using my small cellar I created (or at least tried for the first time) another long drink,
which consists of vodka mixed with 'Malibu' rum. It is also possible to add juice
or cold tea to it.
Weekend...
/other :: Link / Comments (9)
Multithreaded POHMELFS crypto processing.
Meanwhile, having a rest from various celebrations, I managed
to complete multithreaded crypto processing on the receiving side
in POHMELFS.
So far it was only tested in a debug environment (i.e. zillions
of logs and overall miserable performance), but it shows that
different threads pick up the work, in both the sending and receiving
directions.
There is a limitation though: the same crypto threads are used for both
the receive and transmit paths, so it is possible to saturate them
all with, for example, receiving, so that sending will stall. If there are
insufficient crypto threads, waiting for RX crypto processing can take
too long, so the watchdog transmit scanner will fire and complete transactions
with errors. One can work around this by specifying a big enough number of
crypto threads or a long enough transaction scanning timeout; both are provided
via mount options.
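The starvation described above can be shown with a tiny deterministic model. This is only a sketch, not POHMELFS code: the function names, the job cost and the timeout values are made up for illustration, and real threads are replaced by a heap of "free at" timestamps.

```python
import heapq


def tx_wait_time(num_threads, rx_jobs, rx_cost):
    """Time a transmit job has to wait when rx_jobs were queued before it.

    Every job (RX or TX) needs one crypto thread from the same shared pool.
    """
    free_at = [0.0] * num_threads  # moment each crypto thread becomes free
    heapq.heapify(free_at)
    for _ in range(rx_jobs):
        start = heapq.heappop(free_at)  # earliest free thread takes the job
        heapq.heappush(free_at, start + rx_cost)
    return heapq.heappop(free_at)  # earliest moment the TX job can start


def tx_outcome(num_threads, rx_jobs, rx_cost, scan_timeout):
    """What the (hypothetical) transaction scanner would see for the TX job."""
    wait = tx_wait_time(num_threads, rx_jobs, rx_cost)
    return "ok" if wait <= scan_timeout else "timed out"


if __name__ == "__main__":
    # 2 threads flooded with 8 RX jobs: the TX job waits 4 time units.
    print(tx_outcome(num_threads=2, rx_jobs=8, rx_cost=1.0, scan_timeout=3.0))
    # Either knob helps: more threads or a longer scanning timeout.
    print(tx_outcome(num_threads=8, rx_jobs=8, rx_cost=1.0, scan_timeout=3.0))
    print(tx_outcome(num_threads=2, rx_jobs=8, rx_cost=1.0, scan_timeout=5.0))
```

The two workarounds map directly onto the two arguments: raising `num_threads` shortens the wait, raising `scan_timeout` tolerates it.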
I would like to test it in a more production-like environment and run various
stress tests on it, but I'm far from my workplace, so I can not do it right now.
Which means the release will be postponed until tomorrow (if testing does not show
regressions or bugs).
This will not be the last feature release though: for example POHMELFS does not support
extended attributes and ACLs, there is no header checksum (although there is a reserved
32-bit field), and there may be some features in different areas too,
but I am in no hurry to implement them, since I need something to put into future
POHMELFS changelogs. I think sending the same kernel patch with different words
about userspace server changes is not the way to go, so there should be some kernel
changes too :)
I will draw up some design notes on how I plan to implement the POHMELFS server, and namely
how the distributed facilities will be done. So far I have quite a clear picture in mind,
but it needs to be worked out 'on paper' to find the rough corners.
Stay tuned!
/devel/fs :: Link / Comments (0)
Sat, 05 Jul 2008
Midnight creatiff. Casted by LHC start.
- Shit! There are no more M8 screw-nuts.
- What? Use M12, the boson should pass through.
- We all will be fucked this Monday!

Good night. Actually, as a former physicist I can say
that at least two out of the four killing theories are really
stupid, but nevertheless it's interesting!
/other :: Link / Comments (2)
Fri, 04 Jul 2008
In case we will die this Monday...
I've started a countdown...

Large Hadron Collider will be started in 3 days...
/other :: Link / Comments (0)
Thu, 03 Jul 2008
POHMELFS crypto support has been completed.
kernel$ git commit -a
Created commit b07e3ed: Added crypto support.
9 files changed, 1534 insertions(+), 221 deletions(-)
create mode 100644 fs/pohmelfs/crypto.c
fserver$ git commit -a -m "Aded crypto support."
Created commit f916b2f: Aded crypto support.
3 files changed, 788 insertions(+), 94 deletions(-)
I implemented a pool of crypto processing threads (their number
is a mount option), each of which has a pool of pages to
encrypt data into. A crypto thread is not released until the server
acknowledges that the data was successfully written, so one
should tune the number of threads and the page pool (the number of pages
in each thread is the maximum number of pages per transaction;
this limit has its own mount option too) according to the desired behaviour.
Testing shows that write performance dropped noticeably with this approach:
with 4 encryption threads and 4 receiving threads in the server,
performance dropped by around 30%, from 65+ MB/s down to 46+ MB/s,
but I think it can be improved with a larger number of encryption threads.
During the iozone write/rewrite test each of the 4 crypto threads ate about 20-30%
of CPU, while the server ate about 130% (4 threads in total). In all previous iozone tests,
the larger the number of userspace threads used, the worse the results were
(this is somewhat expected, since iozone is a singlethreaded benchmark,
so a larger number of threads leads only to performance degradation),
so I will test different setups (namely a larger number of crypto threads
and a smaller number of server threads).
But this behaviour is not a problem, and I expect it can be tuned; the real
problem is read performance. Right now there is only a single thread
which reads from one socket: this was done intentionally, since reading
data from the socket is a longer operation than searching for a page in the radix tree
or any other operation performed by that thread, so there is no way
to saturate its capabilities. That is, until we start encryption, which is slow,
so subsequent reading from the socket can not be done in parallel
with crypto processing, and overall read performance drops to the ground.
This problem has to be fixed, so I plan to use the same crypto
processing threads to decrypt and/or perform the hash check for received data
and push it up the VFS stack.
/devel/fs :: Link / Comments (0)
Wed, 02 Jul 2008
POHMELFS crypto: feel incredibly stupid.
First,
POHMELFS
does need to have encryption. Because I plan to use
distributed hash table approach in server (well, consider POHMELFS
kernel client as a kind of bittorrent filesystem client), and as in any
non-centralized system, content transferred via uncontrolled data channels
has to be encrypted.
But... I'm incredibly stupid: I implemented encryption and decryption in place,
i.e. the VFS page is encrypted prior to being written to the servers, so
subsequent reading leads to... Yes, it reads encrypted content.
To fix this issue I plan to encrypt data into different pages and send those,
leaving the VFS ones as is. There are two approaches I consider:
- allocate and send pages at writeback time - we want to send 5 pages, so allocate
5 pages, encrypt data into them and broadcast them to all needed servers.
- allocate a (potentially large) pool of pages at mount time per crypto thread
and encrypt data into them. This will have about zero run-time overhead for VFS,
except for write completion being slightly delayed by encryption.
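The bug and the second approach can be illustrated in a few lines. This is a sketch under loud assumptions: a single-byte XOR stands in for the real cipher (it is not encryption in any serious sense), a `bytearray` stands in for a VFS page, and `PagePool` is a hypothetical name for the per-thread pool allocated at mount time.

```python
def encrypt_in_place(page: bytearray, key: int) -> bytearray:
    """The buggy variant: the cached page itself is overwritten."""
    for i in range(len(page)):
        page[i] ^= key
    return page


class PagePool:
    """Buffers preallocated once (think: at mount time) for one crypto thread."""

    def __init__(self, num_pages: int, page_size: int):
        self.pages = [bytearray(page_size) for _ in range(num_pages)]

    def encrypt_into(self, index: int, src: bytes, key: int) -> bytes:
        """Encrypt src into pool page `index`, leaving src untouched."""
        dst = self.pages[index]
        for i, byte in enumerate(src):
            dst[i] = byte ^ key
        return bytes(dst[:len(src)])


if __name__ == "__main__":
    key = 0x5A

    cached = bytearray(b"page cache data")
    encrypt_in_place(cached, key)
    print(cached == b"page cache data")   # False: readers now see ciphertext

    cached = bytearray(b"page cache data")
    pool = PagePool(num_pages=4, page_size=4096)
    wire = pool.encrypt_into(0, cached, key)
    print(cached == b"page cache data")   # True: cache is intact
    print(bytes(b ^ key for b in wire))   # server side can still decrypt
```

The pool variant does no allocation on the write path, which is where the "about zero run-time overhead" comes from; the price is that a pool page stays busy until the data it holds is no longer needed.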
/devel/fs :: Link / Comments (7)
Louis Maggio trumpet school: never smile.
/life :: Link / Comments (0)
Holy shit: kernel summit.
We would like to invite you to the 2008 Kernel summit, and we hope that
you will be able to join us...
I'm trying to recall previous kernel summit:

That was fun, but no one wanted to play football instead of talking about whatever we talked about.
For that year I only committed a
HIFN driver
into the tree, and there was no kevent :)
This time in US, thinking...
/devel/other :: Link / Comments (5)
Tue, 01 Jul 2008
Why is blocking sending considered harmful?
I frequently hear that whatever server you implement, it has to
be non-blocking, since in case of parallel sending it allows
sending multiple requests to fast servers while not sending data to
a slow server, since the non-blocking socket will return EAGAIN.
This is only a half-right solution: when we have to put given data on
all servers, and can not free it until all servers have replied with an acknowledgement,
non-blocking mode can bring more damage than gain.
Mainly because it
allows requests which are still in the queue to be sent to the slow server,
and which were already sent to the fast ones, to eat all the memory.
In this case the higher-level application (consider a simple application which generates
some data and writes it into a file in a distributed filesystem, which writes
the file to several servers) will never block, since the transfer
to fast servers completes quickly, and will provide more and more data,
which will consume all RAM.
It is possible to deadlock the system in this case,
since to send some data to a remote server we always have to allocate at least some
memory to put network headers into. With the non-blocking solution we will consume
all memory and kick the system into a coma.
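A small tick-based simulation makes the memory argument concrete. It is a sketch with invented numbers: the writer produces one chunk per tick, the fast server is assumed to acknowledge instantly, and the slow server acknowledges one chunk every `slow_every` ticks. `limit=None` models the never-blocking writer, a finite `limit` models a writer that blocks once that many chunks are waiting.

```python
def peak_queued(chunks, slow_every, limit=None):
    """Peak number of chunks held in memory waiting for the slow server."""
    queued = 0      # chunks already sent to the fast server, pending on the slow one
    peak = 0
    produced = 0
    tick = 0
    while produced < chunks:
        tick += 1
        # The slow server acknowledges one chunk every slow_every ticks.
        if tick % slow_every == 0 and queued > 0:
            queued -= 1
        # The writer adds a chunk unless it is blocked by the limit.
        if limit is None or queued < limit:
            queued += 1
            produced += 1
        peak = max(peak, queued)
    return peak


if __name__ == "__main__":
    print(peak_queued(1000, slow_every=10))            # grows with the input size
    print(peak_queued(1000, slow_every=10, limit=16))  # bounded by the limit
```

Without a limit the backlog grows with the amount of data written; with a limit the writer simply runs at the speed of the slowest server, which is exactly what blocking sending gives for free.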
/devel/networking :: Link / Comments (2)
Passive OS fingerprinting.
I've updated OSF
modules to xtables, so you have to enable its support in kernel config and get
recent iptables (I tested with 1.4.1.1, which is the latest release to date).
OSF allows you to match incoming packets against different sets of SYN-packet signatures and determine
which operating system is on the remote end, so you can make decisions based on OS type
and even, to some degree, version.
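To give a feeling for what matching a SYN packet against signatures means, here is a toy matcher. It is not the OSF module's code or its signature format: the set of fields, the `SIGNATURES` table, the OS names and all the values in it are made up for illustration only.

```python
from typing import NamedTuple, Optional, Tuple


class Syn(NamedTuple):
    window: int               # TCP window size from the SYN
    ttl: int                  # IP TTL as observed (or initial TTL in a signature)
    df: bool                  # "don't fragment" bit
    options: Tuple[str, ...]  # TCP option kinds, in order


# Hypothetical signatures; real databases are much larger and more detailed.
SIGNATURES = {
    "os-a": Syn(5840, 64, True, ("mss", "sackok", "ts", "nop", "ws")),
    "os-b": Syn(65535, 128, True, ("mss", "nop", "nop", "sackok")),
}

MAX_HOPS = 32  # assumed upper bound on how much TTL can drop along the path


def match_os(syn: Syn) -> Optional[str]:
    """Return the name of the first signature the SYN fits, or None."""
    for name, sig in SIGNATURES.items():
        ttl_ok = 0 <= sig.ttl - syn.ttl < MAX_HOPS  # TTL only decreases in transit
        if (syn.window == sig.window and syn.df == sig.df
                and syn.options == sig.options and ttl_ok):
            return name
    return None


if __name__ == "__main__":
    seen = Syn(5840, 57, True, ("mss", "sackok", "ts", "nop", "ws"))
    print(match_os(seen))
```

It is passive because only packets the remote side sends anyway are inspected; nothing is sent to probe it.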
Installation instructions, an example and the source code can be found on the
homepage.
I've also sent it to the netfilter-devel@ and netdev@ mailing lists, since my previous mails never appeared
there, likely because of spam filters.
/devel/networking :: Link / Comments (0)
Mon, 30 Jun 2008
Filesystem development rumors.
Rumor number one. SWsoft
aka Parallels is actively searching for Linux kernel hackers in
leading Moscow universities, namely MSU and MIPT. I saw their
posters, where among other (wanted) requirements there is
distributed filesystem knowledge.
Rumor number two. Alexey Kuznetsov (if you do not know,
he is the guy who wrote the major part of the Linux network stack,
namely the TCP/UDP/IP and socket implementations, and although
there have been lots of changes in the stack since then, I think it will not
be an exaggeration to call him the author), who also worked
on Virtuozzo and OpenVZ (and its interesting VFS parts, which
AFAICS are not in the kernel, maybe yet), now works on some
filesystem too. The last time we 'confronted' each other was a couple
of years ago, when I first implemented
netchannels
and tried to convince the network community (namely Alexey Kuznetsov
and David Miller)
that the netchannel idea is worth further investigation and implementation.
IIRC I did not succeed, although the results were very
impressive.
Let's see what will happen with filesystems :)
Rumor number three. SWsoft recently started to actively search
for a kernel hacker for a 'new interesting open source project'. They
have always searched for kernel programmers, but never said anything
about the projects; now something has changed.
Rumor number four. OpenVZ and Virtuozzo have serious problems with NFS
(especially when the server dies), probably because of the very ugly NFS protocol
(yes it is), so it's hard to properly virtualize it (or not?). There are
no alternatives to NFS right now in major production setups, but you all know about
POHMELFS
which right now can be used as a really good replacement.
Rumor number five. SWsoft has a long history of PhD defences (at least in MIPT) based on
a theoretical FS called TorFS (namely Tormasov FileSystem). A year ago it was still
not a very alive project in practice,
but I heard that it was very impressive in theory. This rumor has been around
for really many years.
So, I have quite a clear picture that SWsoft has started development of a new
distributed filesystem, which is aimed at first at replacing NFS in virtualized
environments. I can also imagine very interesting distributed parallel facilities
needed for virtualized systems. And they are trying to attract lots of people to the
project, as well as really heavy artillery like Alexey Kuznetsov.
Which basically means that sooner or later my development will meet strong
competition from this company, which has lots of really good professionals.
And that's very interesting and cool :)
P.S. or it may be complete bullshit and the delirium of my fevered consciousness.
And one fact about
POHMELFS:
today I finished client support for padded crypto processing of all requests
and started to work out the server bits. I expect to finish it in a day or so,
so a new release is very close.
/devel/fs :: Link / Comments (3)
Sat, 28 Jun 2008
Listened how my trumpet can sound.
It was really interesting. Although it is a very simple student
model, a friend produced very good sounds. He has not practiced
for many years already, but nevertheless it was not that bad.
My everyday half-hour to hour exercises usually produce a worse sound, although
sometimes I do find really cool notes. Unfortunately I still do not
know the magic bit about how to catch that sound: it is born and
disappears on its own, but I'm sure I will find it, and I think I'm close
to where it hides :)
/other :: Link / Comments (0)
Need to rethink POHMELFS crypto a bit.
1. Because of the encryption problem (data to be encrypted has to be
blocksize aligned), some information about padding has to
be added into the network command, as well as the crypto data size.
2. IV generation. I decided to extend the network command and put a
64 bit IV for the given packet there. Using a simple sequence number is enough
to protect against replay attacks.
3. Encryption/hashing of data. I decided not to encrypt/hash network headers,
and only do it for transmitted data. If a transaction contains several
commands, data for all commands will be encrypted/hashed; in case of a hash,
a single digest/HMAC will be generated and placed into the transaction header.
4. It is possible that I will add a strong header checksum, which will be generated
only for the header and placed into a special field. It will be calculated
assuming the checksum field is zero. This step is optional so far, but the network header
has 32 reserved bits, which can be used for it.
Right now hashing and encryption work, but are not checked on the server (although generated);
because of the crypto alignment ugliness I decided to rethink the approach a bit.
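The four points above can be put together in a short sketch. It is not the POHMELFS wire format: the header layout, the 16-byte block size, the field order and the function names are assumptions made for illustration. CRC32 stands in for the "strong header checksum", the HMAC is simply appended instead of being placed into a transaction header, and actual encryption is left out (only padding, IV, HMAC and checksum are shown).

```python
import hashlib
import hmac
import struct
import zlib

BLOCK = 16  # assumed cipher block size

# Hypothetical header: cmd, padded data size, padding, checksum (u32 each), IV (u64).
HEADER = struct.Struct(">IIIIQ")


def pad(data: bytes, block: int = BLOCK):
    """Zero-pad data to a multiple of block; return (padded, padding length)."""
    padding = -len(data) % block
    return data + b"\0" * padding, padding


def build_packet(cmd: int, seq: int, data: bytes, key: bytes) -> bytes:
    padded, padding = pad(data)
    # The checksum is calculated over the header with the checksum field set to zero.
    csum = zlib.crc32(HEADER.pack(cmd, len(padded), padding, 0, seq))
    header = HEADER.pack(cmd, len(padded), padding, csum, seq)  # seq doubles as IV
    digest = hmac.new(key, padded, hashlib.sha1).digest()       # data only, no header
    return header + padded + digest


def parse_packet(packet: bytes, key: bytes):
    cmd, size, padding, csum, iv = HEADER.unpack_from(packet)
    if zlib.crc32(HEADER.pack(cmd, size, padding, 0, iv)) != csum:
        raise ValueError("bad header checksum")
    padded = packet[HEADER.size:HEADER.size + size]
    digest = packet[HEADER.size + size:]
    expected = hmac.new(key, padded, hashlib.sha1).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("bad data digest")
    return cmd, iv, padded[:size - padding]


if __name__ == "__main__":
    packet = build_packet(cmd=1, seq=42, data=b"hello", key=b"secret")
    print(parse_packet(packet, b"secret"))
```

Carrying the padding length in the command is what lets the receiver strip the alignment bytes again; a receiver that remembers the last sequence number seen can reject a replayed packet.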
Evolution process in action...
/devel/fs :: Link / Comments (0)
Fri, 27 Jun 2008
0:3
That really sucked - yes, we played badly. Just like it was before.
It is not really surprising.
But what was that fucking abnormal game a week ago against Holland? That
was new, was cool, was bloody great, but not today. Tired or whatever...
What's the difference right now, we lost.
Yes, Spain played really well, my congratulations.
But our team showed that it is possible.
That there is nothing impossible.
We can, when we want. You can, when you want.
Thanks a lot for the games!
/other :: Link / Comments (0)
Thu, 26 Jun 2008
POHMELFS server got initial crypto processing capabilities.
POHMELFS server is able to handshake hash/cipher names and operation
modes, to initialize the appropriate algorithms and perform basic operations
(like a more generic hash_update() instead of different
functions with different arguments used to hash data depending on the operation mode,
either simple digest or HMAC: EVP_DigestUpdate()/HMAC_Update()).
I'm working on the right way of doing crypto processing, i.e. one that does not need
serious changes in the code, since how it is done right now is a bit hairy.
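The idea of one generic hash_update() over two operation modes can be sketched like this. Python's `hashlib` and `hmac` are used here purely as stand-ins for OpenSSL's EVP_DigestUpdate()/HMAC_Update(); the `hash_init()` and `hash_final()` helpers are hypothetical names, and none of this is the server's actual code.

```python
import hashlib
import hmac


def hash_init(name: str, key: bytes = None):
    """Create a context: plain digest without a key, HMAC with one."""
    if key is None:
        return hashlib.new(name)
    return hmac.new(key, digestmod=name)


def hash_update(ctx, data: bytes) -> None:
    """Single entry point, whatever the operation mode is."""
    ctx.update(data)


def hash_final(ctx) -> bytes:
    return ctx.digest()


if __name__ == "__main__":
    for key in (None, b"secret"):
        ctx = hash_init("sha1", key)
        for chunk in (b"first part, ", b"second part"):
            hash_update(ctx, chunk)
        print(hash_final(ctx).hex())
```

The callers only ever see the context and `hash_update()`, so the mode is decided once, at handshake time, instead of at every place that hashes data.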
I already hate the OpenSSL API: EVP_get_cipherbyname(), EVP_MD_CTX, EVP_DigestFinal_ex().
It looks like the above functions were written by three different people who
never actually talked to each other about how to make them look similar... But it is
a minor issue of course.
So, when things settle down, I will make a new release; likely it will see the light this week.
/devel/fs :: Link / Comments (0)
Hacking your ISP for fun and profit.
My ISP has again blocked my account and can not unblock it although there
is money on the deposit. There are serious problems in its billing
system, which require manual intervention by an operator. Unfortunately
it is a real challenge to call them: it already took more than half an hour
yesterday, without success.
So, I decided to implement an interesting idea on how to bypass its blocking.
It is based on a security 'hole' in its (and I think the vast majority
of ISPs do the same) DNS configuration, which allows one
to request any DNS record even if the account is blocked. It will be fetched from
a remote DNS server if there is no record in the ISP's cache.
Thus the attack vector becomes visible: implement an IP over DNS tunnel network device
and set up local routing to use it by default. One has to control at least one
remote machine which hosts DNS records for a given domain name, since it is required
to parse incoming DNS requests and process them accordingly.
There are at least two known IP over DNS tunnel solutions:
NSTX
(howto) and
OzymanDNS
(howto). Both solutions require that you own some
server to run the IP over DNS tunnel server on.
Unfortunately I have only a single machine with a static IP address which is not protected
by lots of firewalls and allows incoming connections.
The simplest solution for this problem is to create an iptables input target rule
for the server, which will parse incoming DNS requests and pass usual queries up
the network stack to the userspace server, and handle 'poisoned' queries as the tunnel.
The client can be TUN/TAP based, but can also be a tunnel network device.
I believe the more weird it looks, the more interesting it is, so I will likely think
more about kernel based tunnels.
DNS queries are limited enough not to allow binary data (IIRC,
the most interesting are DNS TXT records), but the data can be appropriately
encoded and enciphered. So, I will put it into the
todo list.
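How binary data could be "appropriately encoded" into a query name can be sketched as follows. This is only an illustration of the general idea and not how NSTX or OzymanDNS do it: base32 is my choice of encoding, `t.example.com` is a placeholder domain, and enciphering is left out. The limits of 63 bytes per label and 253 characters per name come from the DNS specifications.

```python
import base64

MAX_LABEL = 63   # maximum length of one DNS label
MAX_NAME = 253   # maximum length of a full name in text form


def encode_query(payload: bytes, domain: str) -> str:
    """Pack binary payload into labels in front of the tunnel domain."""
    if not payload:
        raise ValueError("nothing to send")
    # base32 survives case-insensitive DNS handling; padding is dropped.
    text = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    name = ".".join(labels + [domain])
    if len(name) > MAX_NAME:
        raise ValueError("payload too large for a single DNS name")
    return name


def decode_query(name: str, domain: str) -> bytes:
    """Server side: strip the domain, join the labels, undo base32."""
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError("not a tunnel query")
    text = name[:-len(suffix)].replace(".", "").upper()
    text += "=" * (-len(text) % 8)  # restore the padding base32 expects
    return base64.b32decode(text)


if __name__ == "__main__":
    query = encode_query(b"\x00\x01 raw IP packet bytes \xff", "t.example.com")
    print(query)
    print(decode_query(query, "t.example.com"))
```

With base32 every 5 payload bytes cost 8 characters, so one query name carries well under 150 bytes; the reply direction (for example TXT records) needs its own encoding, which is not shown here.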
I even think that it is not that bad idea to have such modules in kernel :)
/devel/other :: Link / Comments (6)
Wed, 25 Jun 2008
POHMELFS input crypto processing engine is ready for testing.
But testing can not be done without appropriate server support, which
is now the main task. POHMELFS uses a lazy crypto engine: each network state
(it represents the connection between the client and one server) contains
a number of fields used exclusively for semi-lockless input data processing
(it locks the state when it performs actual reading, but does not
hold that lock when processing incoming messages, since it is the only
path which receives data). Now it also has crypto information about
how to handle reply messages (they include read page replies, for example),
so it does not queue work for the crypto threads, but does it itself
instead. It may or may not be the bottleneck of the input path; tests will
provide facts. So far I do not have plans to change it, but it can be done
of course if performance sucks.
After I finish crypto processing in both the client (it has been written, but requires lots
of testing with the server) and the server (I have just started to recall how to work with
OpenSSL. Well, I've read how HMAC works in OpenSSL, found it to be simple enough
and then started to read how to parse binary data in LISP :)
But anything which is interesting for me now ends up in good results for all other
projects), I will switch to something different for a while.
Some voices in the brain ask to be spread in lots of interesting directions :)
/devel/fs :: Link / Comments (0)
POHMELFS crypto performance.
I've run read/reread and write/rewrite tests as described
in previous run,
now with HMAC(SHA1) of all outgoing transactions (note that read response data is not yet
encrypted and does not contain a digital signature; the server does not support either operation yet),
so essentially only writing should be affected by this, but I also ran the reading tests for completeness.
Results show zero performance overhead from full data SHA1 hashing, but note that quite fast
machines were used (two 3 GHz Xeons (2 physical and 2 logical CPUs, HT enabled) with 1 GB of RAM). All the time only
two crypto threads were actively hashing data, since there are only two pdflush threads on this machine.


Writing is even faster with hashing, but results drifted around, so essentially performance is the same.
/devel/fs :: Link / Comments (0)
Tue, 24 Jun 2008
VM gotcha: forbidden double kmapping.
I've just learned that it is impossible to map the same page
twice: for example the first time using kmap()/kunmap()
and the second time via kmap_atomic()/kunmap_atomic().
Although the mechanisms are a bit different in both mappings, it is
forbidden, and the system will panic like this:
IP: [] kmap_atomic_prot+0x1b/0xc5
*pdpt = 0000000031c79001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Pid: 6478, comm: pohmelfs-crypto Not tainted (2.6.25 #27)
EIP: 0060:[] EFLAGS: 00010202 CPU: 2
EIP is at kmap_atomic_prot+0x1b/0xc5
EAX: ebc7c000 EBX: 00000003 ECX: 00000000 EDX: 00000003
ESI: 00000fdc EDI: 00000163 EBP: 80000000 ESP: ebc7dee4
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process pohmelfs-crypto (pid: 6478, ti=ebc7c000 task=f25040b0 task.ti=ebc7c000)
Stack: 00000000 00000003 00000fdc f7cf4078 00000fdc c0114144 00000163 80000000
c01991b1 ebc7df44 f70e3580 00000000 ebc7dfa8 ebc7df40 f70e3580 00000003
00000000 f7cf4000 f70e3580 f70ff8b0 f70ff880 f7096c00 c019a771 f70e3580
Call Trace:
[] kmap_atomic+0x11/0x14
[] update2+0x7c/0x13f
[] hmac_update+0x49/0x50
[] pohmelfs_crypto_thread_func+0x304/0x3e8 [pohmelfs]
[] hrtick_set+0x7a/0xd7
[] autoremove_wake_function+0x0/0x2b
[] pohmelfs_crypto_thread_func+0x0/0x3e8 [pohmelfs]
[] kthread+0x38/0x5f
[] kthread+0x0/0x5f
[] kernel_thread_helper+0x7/0x10
This happened in exactly the above case, when the page was first mapped via
kmap() in POHMELFS and then via
kmap_atomic() in the HMAC crypto processing code.
I wonder what will happen if we ever try to send kmapped pages
over an IPsec tunnel. Likely it will oops too...
This can happen for example when pages are mapped in
tcp_sendpage() when calling sendfile()
over an interface which does not support hardware checksumming
and scatter-gather: mapped pages are pushed down the network stack,
where they will eventually be encrypted/hashed in IPsec, which
will in turn call kmap_atomic().
So, if you find an obscure oops in kmap_atomic()
and friends, first check that the calling stack did not map the page
earlier.
/devel/other :: Link / Comments (0)