Thu, 31 Jul 2008
DNS cache poisoning client/server architecture.
So far I have only implemented a simple flooder, which takes the
number of destination ports as a parameter, plus two names and
addresses to put into the answer and additional sections of the DNS
reply. It uses a UDP socket, so the source address is not that of the
server which is supposed to answer the given query. This means the
application will not actually work yet: I need to implement sending
via a packet socket and substitute the source IP address with that of
the authoritative DNS server.
The poison flooder also should not use only one name/address pair in
the answer section; instead it should iterate together with the client,
so that each request and the corresponding answer stay synchronized.
So far, the initial design of the client/server architecture of this
small project looks like this. Depending on flags, either the client
connects to multiple flood servers or vice versa. The client then
sends each server a message which specifies the port and ID ranges to attack,
the IP address of the attacked DNS server, the query name, the source address
(that of the authoritative name server we pretend to be) and the additional
resource record data to put into replies (which will poison the cache).
Each server starts sending that data to the specified name server,
with the source address changed to that of the authoritative name server
and with the ID and port varied over the given ranges. When the client has
finished distributing the request data to all flood servers, it sends a request
to the attacked DNS server to resolve the given query name. Now the
flood servers race with the authoritative one to provide an answer. When
the client receives the answer, it checks whether it looks like the poisoned data
we want to get, or like the real answer (which should be NXDOMAIN, since we
resolve non-existent names). In the former case we exit the process and
enjoy the result; otherwise the client picks the next name to resolve and
the same procedure starts again.
Looks interesting...
/devel/networking/dns :: Link / Comments (0)
Wed, 30 Jul 2008
Simple DNS server/resolver.
The right time to hack on a DNS server is the middle of the night: it is 3 A.M. here
and I've just completed the initial draft of a trivial DNS server, which
is only capable of receiving a datagram on a predefined port, parsing it and
filling in a reply with a static "IN A" record (I think I will add a config file).
This record is placed into the 'answer' and 'additional' resource record sections,
then the whole reply is sent back to the client.
This is how it looks with the standard UNIX dig command:
$ dig @localhost -p 1025 www.google.com
;; Warning: query response not set
; <<>> DiG 9.4.2-P1 <<>> @localhost -p 1025 www.google.com
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51486
;; flags: rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 123456 IN A 195.178.208.66
;; ADDITIONAL SECTION:
www.google.com. 123456 IN A 195.178.208.66
;; Query time: 15 msec
;; SERVER: 127.0.0.1#1025(127.0.0.1)
;; WHEN: Wed Jul 30 02:56:23 2008
;; MSG SIZE rcvd: 64
There are several warnings, which I will fix later, but the main part is
the section content: www.google.com obviously does not have the IP address
of my blog site. Its TTL is usually not 123456 either.
Game continues, while I need some sleep...
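The first dig warning above ("query response not set") should go away once the QR
bit is set in the reply header. Here is a minimal sketch of that step, assuming the
reply is built in place from the received query buffer. dns_mark_response() is a
hypothetical helper and not a function of the actual server; the header layout
follows RFC 1035.

```c
#include <stddef.h>
#include <stdint.h>

/* Top bit of the first flags byte (byte 2 of the 12-byte DNS header). */
#define DNS_FLAG_QR 0x80 /* set in responses, clear in queries */

/*
 * Hypothetical helper: turn a received query header into a reply header
 * in place. The ID, opcode and RD bit are kept as they came in the query.
 * Returns 0 on success, -1 if the buffer is shorter than a DNS header.
 */
int dns_mark_response(uint8_t *msg, size_t len,
                      uint16_t ancount, uint16_t arcount)
{
    if (len < 12)
        return -1;

    msg[2] |= DNS_FLAG_QR;             /* this message is a response now */
    msg[3] &= 0xF0;                    /* RCODE = 0 (NOERROR) */

    /* Section counters are 16-bit big-endian values. */
    msg[6] = (uint8_t)(ancount >> 8);  /* ANCOUNT, bytes 6-7 */
    msg[7] = (uint8_t)(ancount & 0xff);
    msg[10] = (uint8_t)(arcount >> 8); /* ARCOUNT, bytes 10-11 */
    msg[11] = (uint8_t)(arcount & 0xff);
    return 0;
}
```

The second warning ("recursion requested but not available") is about the RA bit
in byte 3, which this sketch deliberately leaves untouched.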
/devel/networking/dns :: Link / Comments (0)
Distributed storage development progress report.
DST got:
- full transaction support (resending, timeout completion, error recovery,
memory pool allocation for all kinds of transactions, a single transaction
allocation per IO request),
- socket processing (initialization of the connected and listening sockets,
failover recovery of the connection, receiving thread, network helpers),
- crypto processing of the requests (thread pool utilization for crypto operations,
cipher/hash initialization, cached pages for crypto processing on sending).
I am thinking of moving the processing of receiving and listening/accepted sockets to the thread pool too.
It is likely the way to go; right now they have their own threads.
Missing bits include the actual data sending/receiving and accepting clients on the
listening socket (and appropriate initialization of all the needed infrastructure).
This is quite a major part, but it will likely be completed sooner rather than later.
/devel/dst :: Link / Comments (0)
Tue, 29 Jul 2008
Some DNS port distribution data.
Gathered late last night, so that the DNS server would
not be disturbed too much by other users.
The graphs below show the source port cloud and distribution
of some BIND server (I do not know the version) over a thousand
runs. Each request asked for a non-existent subdomain of
a domain whose name server I control, so I was able to capture dumps
and analyze them a bit.

These graphs show the source port cloud and its distribution.
Each histogram bar corresponds to the number of hits in a range of 100 ports;
the start of the range is shown in the X axis labels.
First, ports are randomly selected only from the 50k-65k range,
so one needs to guess among a much smaller number of ports.
Second, even in one thousand requests there are lots of
requests with the same port (stats show that there are 149 ports
which were used two or more times in the above 1000 runs,
and there is even a single port which was used 4 times).
If we look at ranges of 100 ports, the corresponding distribution
is shown on the graph.
Such behaviour allows limiting the source port range even more.
Now, DNS IDs.

The whole range of IDs is used, and their distribution (each histogram bar
corresponds to the number of IDs in the appropriate range of 100 IDs) is more uniform.
There were only 9 IDs used twice per 1000 runs.
But since I do not know the exact load of the analyzed DNS server (and it can be
high even at 3 A.M.), I cannot say if those numbers are due to the port/ID
selection algorithm or just because the load was high and there were
actually not only my 1000 requests.
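As a rough baseline for the repeat counts above, one can compute how many values
would be expected to repeat if they were picked uniformly and independently. This is
only a sketch under that assumption (and it ignores any other load on the server);
expected_repeated() is a name made up for this example.

```c
#include <assert.h>

/*
 * Expected number of distinct values drawn two or more times when
 * `draws` values are picked uniformly and independently from a space
 * of `space` values: space * (1 - P(no hit) - P(exactly one hit)).
 */
double expected_repeated(unsigned int draws, unsigned int space)
{
    double p, miss, p0 = 1.0, p1;
    unsigned int i;

    assert(space > 1 && draws > 0);
    p = 1.0 / space;
    miss = 1.0 - p;

    /* (1 - p)^(draws - 1), computed by plain multiplication */
    for (i = 0; i + 1 < draws; i++)
        p0 *= miss;
    p1 = draws * p * p0;  /* the value is hit exactly once */
    p0 *= miss;           /* the value is never hit: (1 - p)^draws */

    return space * (1.0 - p0 - p1);
}
```

For 1000 draws from all 65536 IDs this gives between 7 and 8 repeated values, which
is in line with the 9 observed. For 1000 draws from 15000 ports (assuming "50k-65k"
means about 15000 candidates) it gives a bit over 30, well below the 149 observed,
which hints that the ports are not spread uniformly over that whole range.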
To play further with DNS caches I decided to install a local
DNS server first and test things with it.
/devel/networking/dns :: Link / Comments (0)
Another excellent LISP book.
The Common LISP Cookbook covers such interesting
things as threads, sockets and the foreign function interface.
I believe "Common LISP Cookbook" and "Practical Common LISP"
form a must-have library for every LISP programmer. So far I think that is all that is needed,
since this set covers the vast majority of possible use cases. Even DSLs are covered there in detail.
/devel/other :: Link / Comments (0)
Mon, 28 Jul 2008
Distributed storage development progress. Thread pools.
Today I implemented a simple thread pool subsystem, which allows one
to create a set of threads, to add/remove threads to/from this set
at run-time, and to schedule work to be done by them. Work
is specified as two functions: setup(), which is called when the
system has selected a thread for execution, so the caller can
set up the needed data, and action(), which is called by the thread itself
and has access to the data provided at initialization time.
Work scheduling has a timeout parameter, which is the
time the system will wait for a free thread; if none becomes free, an error is returned.
The system is generic enough not to contain any notion of DST or crypto,
only two new data types: struct thread_pool and
struct thread_pool_worker, of which only the former is visible to the user.
The API looks like this:
void thread_pool_del_worker(struct thread_pool *p);
struct thread_pool_worker *thread_pool_add_worker(struct thread_pool *p,
        char *name,
        int (* init)(void *private),
        void (* cleanup)(void *private),
        void *private);
void thread_pool_destroy(struct thread_pool *p);
struct thread_pool *thread_pool_create(int num, char *name,
        int (* init)(void *private),
        void (* cleanup)(void *private),
        void *private);
int thread_pool_schedule(struct thread_pool *p,
        int (* setup)(void *private, void *data),
        int (* action)(void *private),
        void *data, long timeout);
The init() callback above is called after a new thread is created, so that
the user can initialize per-thread data (cleanup() is its counterpart for
tearing that data down). For example, it is used to allocate some cached
pages and initialize crypto algorithms.
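To illustrate the setup()/action() handoff, here is a small userspace sketch built
on pthreads. It is not the kernel implementation: all sketch_* and example_* names
are hypothetical, init()/cleanup() and thread names are left out, the per-thread
data is simply allocated by the pool, the timeout is in whole seconds, and most
error handling is omitted.

```c
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

struct sketch_pool {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    int num, stop;
    struct sketch_worker *workers;
};
struct sketch_worker {
    struct sketch_pool *pool;
    pthread_t thread;
    void *private;                 /* per-thread data */
    int (*action)(void *private);  /* pending work, NULL when idle */
};

static void *sketch_worker_main(void *arg)
{
    struct sketch_worker *w = arg;
    struct sketch_pool *p = w->pool;
    pthread_mutex_lock(&p->lock);
    for (;;) {
        while (!w->action && !p->stop)
            pthread_cond_wait(&p->wake, &p->lock);
        if (!w->action)
            break;                 /* stopping and nothing is pending */
        pthread_mutex_unlock(&p->lock);
        w->action(w->private);     /* nobody rewrites a non-NULL action */
        pthread_mutex_lock(&p->lock);
        w->action = NULL;          /* idle again */
        pthread_cond_broadcast(&p->wake);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Allocation and thread creation failures are not handled in this sketch. */
struct sketch_pool *sketch_pool_create(int num, size_t priv_size)
{
    struct sketch_pool *p = calloc(1, sizeof(*p));
    p->workers = calloc(num, sizeof(*p->workers));
    p->num = num;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake, NULL);
    for (int i = 0; i < num; i++) {
        struct sketch_worker *w = &p->workers[i];
        w->pool = p;
        w->private = calloc(1, priv_size);
        pthread_create(&w->thread, NULL, sketch_worker_main, w);
    }
    return p;
}

/* Returns 0 if a thread took the work, -1 on timeout or setup() failure. */
int sketch_pool_schedule(struct sketch_pool *p,
                         int (*setup)(void *private, void *data),
                         int (*action)(void *private), void *data, long timeout_sec)
{
    struct timespec until = { .tv_sec = time(NULL) + timeout_sec }; /* coarse */
    int i, err = -1;
    pthread_mutex_lock(&p->lock);
    for (;;) {
        for (i = 0; i < p->num && p->workers[i].action; i++)
            ;                      /* look for an idle thread */
        if (i < p->num) {
            /* a thread is selected: let the caller prepare its data */
            err = setup(p->workers[i].private, data);
            if (!err)
                p->workers[i].action = action;
            break;
        }
        if (pthread_cond_timedwait(&p->wake, &p->lock, &until))
            break;                 /* no thread became free in time */
    }
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
    return err ? -1 : 0;
}

void sketch_pool_destroy(struct sketch_pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->stop = 1;
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->num; i++) {
        pthread_join(p->workers[i].thread, NULL);
        free(p->workers[i].private);
    }
    free(p->workers);
    free(p);
}

/* Example callbacks: setup() stores the result slot, action() fills it. */
int example_setup(void *private, void *data) { *(int **)private = data; return 0; }
int example_action(void *private) { **(int **)private = 42; return 0; }
```

A caller would create the pool with sketch_pool_create(2, sizeof(int *)), pass
example_setup and example_action to sketch_pool_schedule() together with a pointer
to an int, and read the int after sketch_pool_destroy() has joined the threads.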
This thread pool system is used by the crypto processing code in
the distributed storage subsystem: when a block IO request is about to be sent,
or when the system has received a reply for a read request, it schedules
crypto processing work to the pool, which is initialized at DST node setup time.
Crypto processing does not work in DST yet, and neither do some other bits.
So far I have only played a bit with the initialization sequence, which was
split into network, crypto and security initialization and node start, which
registers the new storage in the block layer subsystem. These steps allow introducing
additional initialization steps later if needed, without breaking backward
compatibility.
Next steps include proper network initialization and processing and transaction
management helpers. Then I will combine all existing code and make a first
renewed release.
Stay tuned!
/devel/dst :: Link / Comments (0)
Sun, 27 Jul 2008
Lots of talks about DNS cache poisoning attack.
There are two types of this attack: DNS query ID guessing, and
request source port guessing for servers which use randomized source
ports, a feature which should be turned on after Dan Kaminsky's
alert.
The DNS ID is only 16 bits, so it can be guessed rather fast; one just needs to force someone
who uses the attacked DNS cache to issue appropriate requests. When a reply is received by
a DNS resolver, it is cached there for a predefined amount of time (the TTL parameter provided
by a higher-level DNS resolver or eventually the authoritative name server). Dan found that
the attacker does not have to ask for the attacked domain itself, but can ask for some subdomain of it
(if the attacker tries to point www.microsoft.com to his own IP, he can force sending DNS
requests for 1.microsoft.com, 2.microsoft.com and so on), and put data about the actual
target into additional resource records attached to all datagrams. So when he eventually
wins the race, he can store (among lots of subdomains) the needed pointers in the attacked DNS cache.
I've just thought that this attack would not be possible if all queries from DNS
resolvers to higher-level resolvers and/or authoritative name servers happened over
TCP instead of the more common UDP. There would be no need to issue requests from random ports anymore,
and no need to parse and drop additional resource records. There would be no problems with truncation
of large messages either... But to play a bit with the whole idea I'm implementing a simple DNS
query/response processor. Maybe I will play a bit with poisoning the local cache (the ISP at the office
uses only 6 different ports to send requests), although its main goal is an
IP-over-DNS tunnel.
This is a kind of real rest after the visa/hotel paperwork. I was told that if I am
called to the embassy for an interview, chances are high that the visa will be declined because of my
sense of humor :)
Update: zbr@gavana:~/aWork/tmp/dns$ ./query -a 195.178.208.66 -i 0x1234 -q tservice.net.ru
query: 'tservice.net.ru', class: 1, type: 1, server: 195.178.208.66:53, protocol: 17, id: 1234.
Connected to 195.178.208.66:53.
id: 1234: flags: resp: 0, opcode: 0, auth: 0, trunc: 0, RD: 1, RA: 0, rcode: 8.
: question: 1, answer: 1, auth: 2, addon: 2.
: question: name: 'tservice.net.ru.', type: 1, class: 1.
: name: 'tservice.net.ru.', type: 1, class: 1, ttl: 86400, rdlen: 4, rdata: 195.178.208.66
: name: 'tservice.net.ru.', type: 2, class: 1, ttl: 86400, rdlen: 14, rdata: ns.tservice.ru.
: name: 'tservice.net.ru.', type: 2, class: 1, ttl: 86400, rdlen: 7, rdata: dns2.tservice.ru.
: name: 'ns.tservice.ru.', type: 1, class: 1, ttl: 86400, rdlen: 4, rdata: 195.178.208.66
: name: 'dns2.tservice.ru.', type: 1, class: 1, ttl: 86400, rdlen: 4, rdata: 62.141.76.164
And the DNS protocol takes first prize among the ugliest crappies.
Now it's time to create the DNS server itself, which will receive requests (the dump above shows a BIND session),
parse them and perform appropriate actions, like sending a reply with specially crafted additional resource
records: either a NULL one, for example (it can contain up to 64k of data), or TXT (a length byte followed by
a character string; there may be multiple strings as long as the total length, including the length bytes themselves,
is less than 64k). Or an additional A resource record, which may contain information about the domain to poison...
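The TXT layout described above (a length byte followed by the characters, repeated
for every string) can be sketched like this. txt_rdata_pack() is a hypothetical
helper written for this example; the 255-byte per-string limit comes from the
single length byte, and the 65535 limit from the 16-bit RDLENGTH field of RFC 1035.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack `count` strings into TXT RDATA: each string is stored as one
 * length byte followed by its characters, without a terminating zero.
 * Returns the number of bytes written, or -1 if a string is longer than
 * 255 bytes or the result does not fit into `out` or into 65535 bytes.
 * On failure `out` may already contain the strings packed so far.
 */
long txt_rdata_pack(const char *const *strings, size_t count,
                    uint8_t *out, size_t out_size)
{
    size_t used = 0, i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(strings[i]);

        if (len > 255)
            return -1;
        if (used + 1 + len > out_size || used + 1 + len > 65535)
            return -1;

        out[used++] = (uint8_t)len;          /* length byte */
        memcpy(out + used, strings[i], len); /* string body */
        used += len;
    }
    return (long)used;
}
```

For an IP-over-DNS tunnel the payload would be arbitrary bytes rather than
C strings, so a real version would take explicit lengths instead of calling strlen().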
/devel/networking/dns :: Link / Comments (0)
Sat, 26 Jul 2008
New POHMELFS release.
This release was fully made by other developers. Thanks a lot for your work.
I only updated some trivial bits and fixed a bug in the server.
Short changelog:
- Documentation update by Adam Langley (agl_imperialviolet.org).
Now one can read a properly spelled POHMELFS design.
- Server and configuration utility IPv6 support by
Varun Chandramohan (varunc_linux.vnet.ibm.com). The kernel client
does not need these changes, since it supports any protocol.
Now one can create a POHMELFS cluster over IPv6.
- Server bug fix and small documentation update by me.
One can get more details about POHMELFS at its
homepage.
Sources can be downloaded from archive
or via GIT tree.
/devel/fs :: Link / Comments (0)
Fri, 25 Jul 2008
This was supposed to be a new POHMELFS release day.
I accumulated patches from Varun Chandramohan of the IBM Linux center,
which add IPv6 support to the POHMELFS
server and configuration utility. The kernel client does not need it, since it works
with any kind of address (by design).
I also wanted to add a documentation update from Adam Langley, but apparently
I accidentally deleted his patches, so the release is being postponed a bit.
Meanwhile I made a little progress on the DST
development side. I added trivial configuration bits and started to develop the cryptography part,
mainly configuration (which I will copy from POHMELFS) and a thread pool subsystem.
The latter is a rather simple patch, which will allow one to create a thread pool, to add/remove
threads on demand and to queue work to the pool. In theory this can be a generic
enough patch to be used by other users (I even saw some kind of topic proposal for the
kernel summit), but so far I'm not going to push it separately from DST. The main goal
of this system is crypto processing of the BIOs for the distributed storage.
/devel/fs :: Link / Comments (0)
Do you like when you are photographed?
I do not, so there are no photos of me on this site
(if you could see my passport photos...). But I like to take
photos, and sometimes I create really interesting ones.
People even print them and give me presents for them, and since the photos were
taken in public, I think I can publish them.
I frequently take photos of people in public when they do not expect it,
and these are really the best ones (never look at the photographer!).
Sometimes I take them to have a laugh at someone; this is of course private data,
which I only send to the 'model', if I do not delete it immediately.
My theory is that all people are very interesting from the photographer's point of view.
It just has to be found. I'm trying to do it, and sometimes I succeed.
I do not know if it is good or not
to publish such photos. I only save pictures which are definitely interesting to me.
Of course it's just a matter of taste.
So, I'm thinking about creating a new tag in my blog, where I will post photos taken by me.
Not that many, one or so pictures per week. So if you do not like the idea, you can always
read the development tag only.
/other :: Link / Comments (0)
Wed, 23 Jul 2008
Manager's thoughts: unused extensibility and used de-facto standards.
After some bedtime reading (this time the DNS RFC specifications) I found
that the DNS protocol is so extensible that it can perfectly cover not only its own area,
but also help with a lot of related problems. It already has many interesting (though completely
unused) RRs and types which have nothing to do with DNS
(like the NULL RR, which allows transmitting binary data, or the TXT RR, which is also not
related to the DNS area). And the most popular RRs are A, PTR, SOA, CNAME and MX. That's all,
out of about 20 others. The same applies to (q)type and class (I read about the Hesiod class
for the first time, for example). And DNS allows introducing one's own classes, types and resource
records.
All of this is just not used, but we could create a distributed DNS system with new types.
It would be really simple (and actually it can be done even without new DNS extensions).
But it is not actually needed, since people are used to having DNS just as it is.
Another example is internet video. There is a de-facto Adobe standard, and no matter what W3C
puts into its new standard, everyone will continue to use the existing one. Just because it works
ok. Not excellent or perfect or whatever, it just works the way we are used to.
And there are lots and lots of similar examples.
People are so inert in these questions (and I think in most areas, just because
it is convenient not to do something better when the existing solution just works, even
if not perfectly and even if not well), that no one will ever bother to change something
dramatically, because it would require not only a huge amount of money, but also changes in the
way people are used to thinking about the given area, which is likely an even more complex (and money-hungry)
problem.
All this talk is about a simple thing I have just discovered for myself: when you create something
completely new, even if it is not the best solution for the given problem, and you start
pushing it to a wide audience, you are able to get the whole 'market'. That's why
when you bring something new
to a market where most users are already used to working with one or another solution
(even if your project is potentially very good and definitely much better than the existing solutions),
there will not be any major gain, only a few completely new users.
This is probably told to first year MBA students, but I was quite excited and disappointed
by this issue: the first new idea, when properly presented, even if it is not the best solution for the given
problem, can get all the users, after which they will not switch to a newer one just because they
are used to having it this way.
/devel/other :: Link / Comments (1)
Tue, 22 Jul 2008
POHMELFS distributed facilities design notes.
Since I'm quite busy with visa/hotel/tickets and overall preparations
for the Kernel Summit, there is no development progress, but the preparations should be
completed very soon I think, so I will write here some design notes
I have in mind about how the POHMELFS server will be designed. It is not a
finished draft, but rather a rough sketch of the direction.
POHMELFS will utilize a distributed hash table approach, i.e. the storage
will support the ability to get an object based on some key attached to it.
In a local filesystem we already work with a hash table: a directory
lookup is no more than a lookup of an inode object based on its name, i.e.
a lookup of a value based on the attached key. And although the key in this
case is not created from the object itself (like a hash of the content or
some other function), it still is a (turn on your imagination here) table lookup.
The cloud of POHMELFS servers will utilize a similar approach. Consider a single
server in the system. When it joins the cloud for the first time (I omit this process for now,
and will describe it below), it is empty, so it gets some unique
id, either via administrator steps or randomly, or it just waits in the queue
to be filled with new data and gets its id at that time. It does not matter
for now how it gets its id, but this id is propagated to some cloud of its
neighbours (or, if this were BitTorrent or Napster, to the main server).
There are two ideas on how to treat this ID: either as a part of the filename,
or as a nameless pointer in the abstract namespace. I will show below that it actually
does not matter.
Now, let's check what will happen when a user wants to perform some IO on a given file.
Every file access actually happens to an inode stored on disk. In our case it can be stored
somewhere we do not know yet, so we need to perform a lookup to get the address
of the node in the cluster which contains our data. In existing schemes like BitTorrent
or Lustre there is a server (or a small cloud of servers) which contains mapping information
about where this or that object is placed in the data cloud, so a simple lookup to this server(s)
returns the needed info. This approach does not scale to really large numbers of nodes and is failure-prone.
Instead I consider completely distributed metadata storage. Let's check how the system will look up
the whole path in our case.
Each path starts from the root directory, which is '/', which in turn is an id in the global
namespace (or a hash of this string or whatever other mapping), so we first need to look up
the node which is responsible for the content of this directory. Each node contains routes only
to a very limited set of neighbour nodes (in various designs this number varies, but the idea
lies in the fact that the node performing the lookup does not know which node contains the needed info).
The Gnutella system just broadcast this lookup request to all of its neighbours, and each one
broadcast it to its neighbours and so on, until one of the systems replied that it contains
the needed info. The amount of unneeded broadcasts killed Gnutella the day after Napster was closed.
So, this approach does not scale, and instead we need to map the needed directory to a node address
in a more intelligent way. There are at least two most appealing design choices: the ring-based
structure implemented in CHORD and the multidimensional torus implemented in CAN.
Right now it does not matter; let's assume that we found a node which has information
about the content of the needed directory. When we have that data, we can find the next node (or this
info can be cached on the 'parent' directory node) and so on, until we get to the node which is responsible for
storing the content of the needed object.
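A minimal sketch of the ring idea mentioned above: every directory path is hashed
to a key, and the node responsible for it is the first node id at or after that key
on the ring. The assumptions are loud ones: the FNV-1a hash, 32-bit ids and the
function names are picked for this example only, and the whole sorted id list is
assumed to be known locally, while a real CHORD node only knows a few neighbours
and forwards the lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a path string; any well-mixing hash would do. */
uint32_t path_key(const char *path)
{
    uint32_t h = 2166136261u;

    while (*path) {
        h ^= (uint8_t)*path++;
        h *= 16777619u;
    }
    return h;
}

/*
 * Ring lookup: `ids` holds node ids sorted in ascending order and
 * `count` is at least 1. The node responsible for `key` is the first
 * one whose id is >= key, wrapping around to ids[0] past the largest
 * id. Returns an index into `ids`.
 */
size_t ring_successor(const uint32_t *ids, size_t count, uint32_t key)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {           /* binary search for the first id >= key */
        size_t mid = lo + (hi - lo) / 2;

        if (ids[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == count ? 0 : lo;
}
```

To resolve '/home/zbr', a client would call ring_successor(ids, count, path_key(p))
for p equal to "/", "/home" and "/home/zbr" in turn, asking each node found this way
for the directory content.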
When a new node joins the cloud, it connects to one or another known node (provided either by a public
service or by the administrator), sends it information about its available space, gets an ID
and just waits until some client connects to it and starts writing data.
When a node joins with some content, which was written to it by the system before or written by
local users bypassing the distributed mechanism, the node has to report this to the node which
holds the parent directory. This information should be stored in each directory it exports, or it
can be provided by the administrator. For example, this node exports dir '/zbr', which is actually a subdir
of '/home', so the node will look up the owner of the '/home' directory content and update its records to say that
it now contains a new dir. There is a problem here: what if there is already another node which also
claims to have dir '/zbr' in '/home'? This can be handled via an extended attribute attached to each object,
which tells us the last modification date, so the system can select either the most recently modified '/zbr'
dir or the node which contains the dir with the biggest number of identical replicas. This can be set up by
the administrator.
The main advantage of this joining scheme is the fact that we actually do not need to know the content of any
object in the exported directory. We publish only the high-level object, which may or may not contain some
inner file or dir. Thus we do not need to hash millions of files in the exported directory and publish
them one by one, we do not need to store information about each inner object,
we do not need to attach the full path to each object and so on.
When we decide to split the same object between multiple nodes, we will need not only a
name-based lookup, but also its extension to the offset inside the object. This can be done by introducing
a system-wide 'block size', so that each file is actually a set of blocks of the given size. When we find the node
responsible for storing information about the directory where the file is located, this node can also contain
information about where each part of the object is stored.
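With a system-wide block size, finding which parts of an object an IO touches is
plain integer arithmetic. A tiny sketch follows; struct block_range and blocks_for()
are made-up names, and the per-block table of storing nodes kept by the directory
node is left out.

```c
#include <stdint.h>

/* Inclusive range of block indices inside one object. */
struct block_range {
    uint64_t first;
    uint64_t last;
};

/*
 * Blocks covered by `len` bytes starting at byte `offset`.
 * Both `len` and `block_size` must be greater than zero.
 */
struct block_range blocks_for(uint64_t offset, uint64_t len,
                              uint64_t block_size)
{
    struct block_range r;

    r.first = offset / block_size;
    r.last = (offset + len - 1) / block_size;  /* block of the last byte */
    return r;
}
```

The directory node would then be asked where each block from first to last lives.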
Looks quite simple, but... The devil is in the details.
I obviously missed some bits in the design (I came up with it during a conversation,
under the 'impression' of the Greek spirit, while talking with asm@, who suggested looking
at the Kademlia project), like redundancy management of the nodes, splitting of the node content between
multiple nodes and other bits, but it is one of the first drafts, so things can be changed if needed.
Stay tuned, I will be back to the development process very soon
(DST first :), since the paperwork for the kernel summit travel
seems to be reaching its end.
/devel/fs :: Link / Comments (0)
Mon, 21 Jul 2008
Foot, fingers, shins, knees, thighs, shoulder, back.
No, this is not a list of the body parts I know (half of them I looked up
in the dictionary), it is what is aching right now.
It's called football.
Yes, it sounds a bit scary, but that was one hell of a game today.
We were much stronger, but I have to admit that it was mostly because we made
the right transfer decisions and selected the right players at the beginning,
so our previous team was strengthened. I managed to score a goal,
make a couple of nice saves and even outplay someone quite technically at times,
which was quite surprising, since I had not played football for 5 years.
I would not say I'm getting into shape, but I am making progress.
/life :: Link / Comments (0)
Sun, 20 Jul 2008
Crazy security idea.
I've just thought that I do not know a way to make
some (running) application encrypt all of its data
which hits the disk (either via swap or the usual way, like
an editor writing the file and all its temporary files).
I actually consider this a very useful feature for
editors, browsers, instant messengers and mail clients,
downloading applications, music players and
so on. This is especially valid for temporary files, when
one expects the editor to be highly secure (or even to work on
an encrypted partition), while its temporary files are stored
somewhere in /tmp, which is not encrypted.
The application could be started via some wrapper, which will tell the
kernel the encryption algorithm, key, IV and all the needed info, and
will attach a crypto processing callback to the process,
so when disk activity is started by the given pid (swap or data writing
or reading), the data is encrypted/decrypted in flight.
The kernel should check all file descriptors opened by the given
process and process them appropriately. There may be some problems
with communication with unprotected applications, which should
be thought out, but overall I like the idea...
I have put it into my todo
list.
/devel/other :: Link / Comments (0)
Project presentation.
I've just realized that lots of my blog posts
are valid enough presentation abstracts; at least they contain
enough words describing the problem, a possible solution
and overall topics of interest for the given area. But I have
never presented such projects in English before, although quite frankly
I'm not that bad a speaker in Russian; at least I
am not afraid to talk and probably like contact with an interesting
audience. After all there is this blog :) and I have even given a number
of similar presentations, from 15 minutes to a couple of hours
including the question/answer part.
The English I use in the blog is rather ugly, but I rarely (if at all)
fix the errors which I detect when rereading the text
in the browser (and I detect lots of them), and the same goes for mails
and other posts.
So probably we will eventually have interesting
talks about different areas, but expect to 'listen' to the world-wide
language of gestures :)
/devel/other :: Link / Comments (0)
Sat, 19 Jul 2008
Distributed storage is dead, long live the Distributed storage!
As you may know, the DST
project was an attempt to implement a redundant, failover-resistant, flexible block level storage
subsystem. Among other features it supported mapping multiple remote nodes to a single node via linear
or mirroring algorithms, reconnecting to a failed node, read balancing,
parallel writing to multiple nodes (in case of mirroring) and
so on.
Now it is gone. There is no more distributed storage as you knew it; instead there is a
completely new project being developed, whose main goal is to provide a transport layer for
block requests only. Consider it Network Block Device on huge steroids. Consider it
iSCSI on huge steroids. Consider it ATA-over-Ethernet on even bigger steroids.
It is just an example of what all those protocols should have. And only that.
If that does not sound very ambitious: previous DST versions already supported lots of features
which never existed (and in some cases were impossible to add) in other block level
network storages.
DST moves further.
There will be no mirroring and no ability to map multiple devices into a single one at all;
instead one should use Device Mapper for this, since its features were simply duplicated
(although I sometimes tried to optimize them) in DST, and the number of targets was noticeably smaller.
Now DST is just a simple block device which operates on top of a network connection. With just a
single exception: it's done right.
Features planned for the new Distributed Storage:
- kernelspace client and server
- initial autoconfiguration between client and server nodes
- automatic reconnect to failed target
- transaction model: resending, timeout error completion, full rollback of the failed transaction
- wire speed performance
- data channel encryption, strong checksumming
- cryptographic authentication
- ability to work on top of any network protocol
- barrier support (when, if ever, Device Mapper starts supporting them, DST will not need to be changed)
- flexible protocol which is simple to extend with needed functionality
- trivial configuration
The project is being written from scratch, but it is actually very simple
and should be quite small, so expect its first release quite soon.
It will be pushed upstream when ready.
/devel/dst :: Link / Comments (8)
Fri, 18 Jul 2008
Completed distributed storage redesign.
I also managed to play second octave F# and sometimes the whole chromatic scale
down to small (minor?) octave F on my trumpet, and I believe I have started to understand
the overall trumpet kung-fu, but I expect that is not what you wanted to read under the
DST tag.
So, DST becomes smaller, cleaner and simpler. Notably, I decided to drop the userspace
target completely for now.
The kernel part now operates on a transaction entity, which holds a reference to the node
where data should be sent to or received from. There can be at most two such nodes, if a block IO
request spans a node boundary. In case of mirroring (which will be dropped for the first release)
the list of nodes to mirror this data to will be maintained by the first node, so the transaction
will not need to know about them.
In theory a block request can be as large as BIO_MAX_PAGES pages,
which is 256 for now, but I decided to require the minimum node size to be no smaller than
the above bio limit, so there will always be at most two nodes per request.
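The 'at most two nodes' property is easy to see in code. Below is a sketch which
assumes, for simplicity, a linear mapping over equally sized nodes (real nodes may
differ in size); struct segment and split_request() are hypothetical names and the
units are whatever the request is measured in, for example pages.

```c
#include <assert.h>
#include <stdint.h>

struct segment {
    uint64_t node;    /* index of the node in the linear mapping */
    uint64_t offset;  /* offset inside that node */
    uint64_t len;     /* part of the request which goes to that node */
};

/*
 * Split request [start, start + len) over nodes of `node_size` units.
 * Because len <= node_size, the request can cross at most one node
 * boundary, so at most two segments are produced.
 * Returns the number of segments written to seg[] (1 or 2).
 */
int split_request(uint64_t start, uint64_t len, uint64_t node_size,
                  struct segment seg[2])
{
    uint64_t room;

    assert(len > 0 && len <= node_size);

    seg[0].node = start / node_size;
    seg[0].offset = start % node_size;
    room = node_size - seg[0].offset;  /* space left in the first node */
    if (len <= room) {
        seg[0].len = len;
        return 1;
    }

    seg[0].len = room;
    seg[1].node = seg[0].node + 1;     /* the rest starts the next node */
    seg[1].offset = 0;
    seg[1].len = len - room;
    return 2;
}
```

With a node size equal to the 256-page limit, a 256-page request starting at page
200 ends up as 56 pages on node 0 and 200 pages on node 1.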
Each node has either block device behind it (so it will just call generic_make_request()
with different block device for given bio), or network state machine.
Network state will have two threads: RX and TX. Receive one is used to get replies for the
read/write messages, search appropriate transaction and complete it.
In case of DST server it will also handle read/write requests and generate replies, but the whole
processing will be exactly the same, client node will have a switch to process read/write requests from
the network, but they should be only received by server.
Sending thread is tricky.
It is used as fallback for non-blocking sockets, which are used first at generic_make_request()
time, i.e. when higher level user performed read or write, if block was not fully sent,
then it is queued to this thread and it will try to send the rest of the data when
polling allows. ->make_request_fn() function returns in this case and higher
layer can proceed with own operations.
A transaction is not freed until a reply is received from the remote side or the resend retry
count is exhausted.
A transaction is always allocated (from the appropriate memory pool) and that is actually
the only allocation in DST itself. In case it works with block devices, it is possible that a bio has to be cloned
when it crosses the boundaries (or even always, I have to check it, but it is essentially
what device mapper does, with lots of its own additional allocations), but it should be a very rare condition.
The network stack will allocate data itself too.
That was the theory. Practice tells me that essentially 90% of the code should be rewritten
from scratch, so I recloned the tree and so far implemented the generic bits of registering
a block device, creating various sysfs files and directories and other similar trivial bits.
I still plan to finish it this weekend (without mirroring), but things may turn out differently...
/devel/dst :: Link / Comments (0)
Sent all the documents for the US visa.
Checked my passports and decided that if other countries let me in
with those photos, then US customs officers should not frown too much upon the
current ones.
So, waiting for the results. I am almost sure that I will get the visa and
will meet interesting people at the kernel summit and the Plumbers conference,
but anyway would like to draw the line.
For instance, Zach Brown will
talk
about CRFS (as well as show
some chocolate and cocktail bars around; imho the only good cocktail is rum with cola
(less cola) and ice), so there will be something to listen to.
/life :: Link / Comments (0)
Thu, 17 Jul 2008
"The Gun Seller" by Hugh Laurie.
Just finished reading this excellent
detective novel (at Amazon,
electronic version in Russian).
People call it the best English humor novel for a reason: it is indeed fun and interesting,
although I suspect a lot of its witty satire was lost in translation, but nevertheless
I do recommend it as easy reading.
And of course if you like House M.D., you have
to read this novel, and you will not waste your time for sure.
/other :: Link / Comments (0)
Morning trumpet exercises.
This morning I tortured my ears for almost two hours,
and at the end managed to play a chromatic scale (glide?)
from second octave D (E trumpet) down to minor octave F (E trumpet),
at least that is what my Korg tuner showed. Much more frequently
I was able to play a single first octave (only in the descending direction
though, I have not yet tried rising tones).
The Korg AW1 tuner does not show octaves, but I really do not think
that it is possible to play one octave lower than my lowest
sound, and I am pretty sure it is possible to go at least one octave higher
than my highest tone, so I decided that I play several tones around the first octave.
Ugh, I was supposed to leave for the office earlier (before the heat and traffic jams),
but instead I fucked my brain via ears (and the neighbours were probably not happy
either, although I did not play at full volume).
/life :: Link / Comments (0)
Wed, 16 Jul 2008
New toys: Korg AW1 tuner.
I believe I can produce enough sounds out of my trumpet,
so I need a tuner, which will tell me how bad those sounds are.

So, now I'm starting to seriously tune my sounds.
So far I think that I can play at least two octaves; actually I mean not to play,
but to produce a sound, it is still not that simple and not always very clean. But since
I have 'played' (or better say studied) my trumpet for only a couple of months, never played any instrument before (not counting
a couple of guitar riffs at the university) and do not play with a teacher, I think this tuner will
be a very good addition.
/other :: Link / Comments (4)
Tue, 15 Jul 2008
Distributed storage development roadmap.
Yes, the DST
project is alive and will beat out the crap very soon, since I decided to change its
underlying architecture and switch to a transaction model just like
POHMELFS.
This basically means that as long as the system has enough RAM, write operations will be
extremely fast, reads can be balanced between multiple nodes (in a mirror), transactions
can be resent, the failover mechanism becomes much simpler,
and the system overall will be much more robust to failures.
The transaction model also means that the system requires an acknowledgement from the remote side,
and there are two possibilities here: to handle the implicit ack which comes with TCP ack
packets, like I experimented with
before, or to send an explicit ack from the server for each client request.
The former approach, although it has a smaller performance overhead, still suffers from
the fact that pages sent via DST are always stateless, i.e. at this layer there is
no knowledge about who sends a given page. We can determine the inode the page belongs to, and can
even get the socket when the page is about to be released after the ack has been received,
but we can not know exactly which pipe it was submitted from into the given socket,
so when multiple threads send the same page via multiple sendfile()
calls we do not know when and how the page will be released. We can put the pipes this page belongs
to into a singly-linked list (since the page has only two pointers unused at this point, those of the LRU
list head, and one of them is used to determine that this page belongs to the sendfile()/splice codepath),
and traversing this list will likely not hurt usual users, but a malicious one can
create a local DoS with this approach. After some experiments with the splice code
today I decided to drop the implementation of this idea for now.
There is a strong argument in favour of explicit acks from the server: they allow asynchronous transaction
processing (with implicit acks we can not hook into the processing path, since we do not know where exactly
the skb with our pages is chained), and they do not hurt performance (which was proven by
POHMELFS benchmarks).
So, the overall plan for DST development is to switch to the transaction model and perform async processing
of all events (there are actually only two: reading and writing of the given pages from and to the given
locations).
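To make the transaction model with explicit acks more concrete, here is a toy sketch. The class name, the send callback and the retry counting are my assumptions for illustration, and real timeouts are left out: a transaction stays in the table until the server acknowledges it, and a periodic scan either resends it or completes it with an error.

```python
import itertools


class TransactionTable:
    """Keep transactions until the remote side explicitly acknowledges them."""

    def __init__(self, send, max_retries=3):
        self.send = send              # callable(tid, payload): hypothetical transport hook
        self.max_retries = max_retries
        self.pending = {}             # tid -> [payload, retries_left]
        self.failed = []              # tids completed with an error
        self._ids = itertools.count(1)

    def submit(self, payload):
        """Register a new transaction and send it for the first time."""
        tid = next(self._ids)
        self.pending[tid] = [payload, self.max_retries]
        self.send(tid, payload)
        return tid

    def ack(self, tid):
        """Explicit ack from the server: the transaction can be freed."""
        return self.pending.pop(tid, None) is not None

    def scan(self):
        """Periodic scanner: resend unacknowledged transactions or give up."""
        for tid in list(self.pending):
            entry = self.pending[tid]
            if entry[1] == 0:
                del self.pending[tid]
                self.failed.append(tid)
            else:
                entry[1] -= 1
                self.send(tid, entry[0])
```

Because completion is driven only by ack() and scan(), the submitting side never has to wait, which is the asynchronous processing described above.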
This task is not that complex, so I expect some new results later this week. Stay tuned!
/devel/dst :: Link / Comments (5)
Football match has made the day!
That was an exceptionally bloody cool evening!
We had three teams of 6 players each and played
on a small mini-football field for about 2.5 hours; each match lasted
either 7 minutes or until 2 goals were scored against one team. It sucked out
so much power that even the exceptional tiredness right now brings
a kind of masochistic pleasure.
My breathing system really sucks, and actually it is not a surprise:
I have not played football for more than 5 years, but nevertheless
shoes and ball are in good shape.
I managed to damage knees, a shoulder and toes in various
'contacts' during the game, but that's not a problem.
Our team was not really the best one, but we firmly hold second place,
and can actually fight for the first one, since all our players had long
enough breaks from playing, while the first team's players regularly train
in their own teams (including a youth football champion).
That was a super time!
/life :: Link / Comments (0)
Mon, 14 Jul 2008
ParaLLels concert.

Visited ParaLLels concert this weekend...
Mixed feelings, but saw lots of old friends (musicians by a coincidence), which made the day.
/life :: Link / Comments (0)
Sun, 13 Jul 2008
Hermite interpolation.

This interpolation uses the cardinal splines approach, namely
Catmull-Rom splines. The next task is to test how the Kochanek-Bartels splines (also called TCB-splines)
behave. The latter are used in all popular 3D modelling engines. Since the math behind them
is very non-trivial, I will just try to use the existing formulas for Hermite tangents, which
are quite simple.
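For reference, a minimal sketch of Catmull-Rom interpolation built on the cubic Hermite basis. The function names are mine, and the way tangents are taken at the two end points (clamped neighbour indexes) is an assumption, since there are several common conventions.

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation between p0 and p1 with tangents m0 and m1."""
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return tuple(h00 * a + h10 * ma + h01 * b + h11 * mb
                 for a, b, ma, mb in zip(p0, p1, m0, m1))


def catmull_rom(points, samples=8):
    """Sample a Catmull-Rom spline passing through all of `points`."""
    if len(points) < 2:
        raise ValueError("need at least two points")

    def tangent(i):
        # Catmull-Rom tangent: half the difference of the neighbouring points.
        prev = points[max(i - 1, 0)]
        nxt = points[min(i + 1, len(points) - 1)]
        return tuple((n - p) / 2.0 for p, n in zip(prev, nxt))

    curve = []
    for i in range(len(points) - 1):
        for s in range(samples):
            curve.append(hermite(points[i], points[i + 1],
                                 tangent(i), tangent(i + 1), s / samples))
    curve.append(tuple(float(c) for c in points[-1]))
    return curve
```

Kochanek-Bartels splines reuse the same hermite() function and only change how the tangents are computed from the tension, continuity and bias parameters.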
Now it's time to think how to use this knowledge and how to apply the given approach
to detect and decode letters in the image...
/devel/math/bezier :: Link / Comments (0)
Sat, 12 Jul 2008
Monday evening prognosis.
It promises to be just bloody excellent!

I have not played football for several years, but I've found people
who do like it, so our games promise to be really fun and interesting!
There are already three teams (4+1 players in a team).
This will be my first game after about 5 years of football silence,
and likely there is nothing left in the legs which can help playing football;
well, climbing likely does not correlate with it, nor does my experience in other physical training,
but nevertheless I'm looking forward to this game, which promises to be exceptionally cool!
/life :: Link / Comments (0)
Fri, 11 Jul 2008
Spline graphical interpolation fun.

Playing with different spline interpolation methods. So far they seem to be quite simple
when written in matrix form, so I cooked up a simple GTK application to test various methods.
There is no interpolation implementation yet, since I devoted the last two days to reading lots of
materials about Bezier and Hermite interpolation techniques (as well as lots of papers about
distributed hash tables, which I will use as a filesystem storage base for
POHMELFS).
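What "simple in matrix form" means can be shown in a few lines. This is only a sketch with names of my own choosing, not the GTK application: one evaluation function works for different spline types, and only the 4x4 basis matrix changes.

```python
# A curve point is [t^3 t^2 t 1] * M * [P0 P1 P2 P3]^T for a 4x4 basis matrix M.
BEZIER = [
    [-1.0,  3.0, -3.0, 1.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-3.0,  3.0,  0.0, 0.0],
    [ 1.0,  0.0,  0.0, 0.0],
]

# Catmull-Rom basis: the curve runs from P1 (t = 0) to P2 (t = 1).
CATMULL_ROM = [
    [-0.5,  1.5, -1.5,  0.5],
    [ 1.0, -2.5,  2.0, -0.5],
    [-0.5,  0.0,  0.5,  0.0],
    [ 0.0,  1.0,  0.0,  0.0],
]


def spline_point(matrix, control, t):
    """Evaluate a cubic spline segment given its basis matrix and 4 control points."""
    powers = [t ** 3, t ** 2, t, 1.0]
    # weights[j] is the blending weight of control point j at parameter t.
    weights = [sum(powers[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    dims = len(control[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, control)) for k in range(dims))
```

Testing another interpolation method then means supplying another matrix, for example spline_point(CATMULL_ROM, points, 0.5).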
/devel/captcha :: Link / Comments (6)
Wed, 09 Jul 2008
Captcha transformation algorithms.
Couple of first ideas. Pretty trivial.

The next step is to squeeze the images, so that all bold lines become single-pixel ones.
In theory it should not be very complex (I have an algorithm in mind), but in practice
it will be - starting to recall why the hell I
learnt LISP.
The basic idea is to transform the above BW pictures into a simple binary format, which will be read by a LISP
application, since I do not know and do not want to devote much time to learning how to parse/process
various image formats; instead that is done by a GTK application written in C. I believe LISP was
called the best language for artificial intelligence development for a reason, so I will try to
find out why.
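Such a "simple binary format" could look roughly like the sketch below. The layout is purely hypothetical (two little-endian 32-bit integers for width and height, then one byte per pixel); the real format used between the C and LISP parts is not described here.

```python
import struct


def pack_bitmap(rows):
    """Serialize a black-and-white image given as rows of 0/1 pixels."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    body = bytes(1 if px else 0 for row in rows for px in row)
    return struct.pack("<II", width, height) + body


def unpack_bitmap(blob):
    """Parse the format written by pack_bitmap() back into rows of 0/1 pixels."""
    width, height = struct.unpack_from("<II", blob, 0)
    body = blob[8:]
    if len(body) != width * height:
        raise ValueError("truncated or oversized bitmap")
    return [list(body[y * width:(y + 1) * width]) for y in range(height)]
```

A fixed header followed by raw pixels is trivial to read from any language, which is the whole point of not parsing real image formats on the LISP side.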
Slacking - rox :)
/devel/captcha :: Link / Comments (6)
Tue, 08 Jul 2008
Anecdotes and allegories.
I'm not a major kernel contributor, but I was invited 3 times in the last
3 years to the kernel summit.
And I will try to get to this year's
one
in Portland, Oregon; at least I started some preparation and contacted the needed people.
I hope I will also participate in the
Plumber's conference.
As before I will bring a bottle of vodka (the number of people
who wanted to talk suddenly dropped to the ground) and will greatly appreciate your
contact and discussion topics :)
That's of course if the stars line up, but I will push
them a bit.
/devel/other :: Link / Comments (0)
Mon, 07 Jul 2008
New POHMELFS release.
Irish 'Clontarf' and Scotch 'Grant's' helped to roll this release out.
The features of this POHMELFS release
include:
- Strong cryptography support. One can encrypt the whole data channel (except headers) and/or hash/digest it.
The system will try to autoconfigure itself, and if the server does not support the requested algorithms, the mount will either
fail (if a special mount option is specified) or disable the usage of the unsupported algorithm.
- Bug fixes.
Cryptography support is an essential addition to the POHMELFS core. It was implemented with performance
in mind, so that processing speed would not drop noticeably even in case of very CPU-hungry operations
(one can check the performance graphs).
POHMELFS utilizes a pool of crypto threads (their number can be specified via a mount option), which perform crypto
processing of the data and submit it either to the network or to the VFS layer.
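The idea of a pool of crypto threads can be sketched in userspace like this. It is an illustration only: the function names are mine, SHA1 hashing stands in for the whole crypto step, and a plain queue stands in for the network or VFS submission.

```python
import hashlib
import queue
import threading


def crypto_worker(jobs, done):
    """One crypto thread: take a chunk, hash it, hand the result further."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut this thread down
            break
        seq, data = job
        done.put((seq, hashlib.sha1(data).hexdigest()))


def process(chunks, num_threads=4):
    """Hash a list of byte chunks on `num_threads` threads, keeping the order."""
    jobs, done = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=crypto_worker, args=(jobs, done))
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for seq, data in enumerate(chunks):
        jobs.put((seq, data))
    for _ in threads:
        jobs.put(None)
    for thread in threads:
        thread.join()
    results = [done.get() for _ in chunks]
    return [digest for _, digest in sorted(results)]
```

The num_threads argument plays the role of the mount option: more threads let more chunks be processed in parallel, at the price of more CPU.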
Now I will concentrate mostly on userspace server features, mainly its distributed facilities: the current ability
to write data to multiple servers and balance reads among them is not enough for POHMELFS, but it will be an
essential building block of the fully distributed fault-tolerant parallel filesystem.
If this development requires some changes on the kernel side (namely a network protocol extension), they will be
done in the upcoming releases, together with fixes for any bugs found.
As usual, you can grab sources from
archive or via
GIT tree.
You can also check POHMELFS homepage
to get more details on its design and supported features.
P.S. I think I will take a rest from this project for several days, which will allow me to concentrate afterwards on the
main POHMELFS features and work out the rough edges. I will switch to DST
and netchannels (mainly to make new releases)
and then will devote some time to captcha cracking algorithms.
/devel/fs :: Link / Comments (4)
POHMELFS crypto processing performance.
If you expected a miracle, it did not happen, so I just present a picture where
I compared the plain async in-kernel NFS server (no encryption, no checksumming)
with POHMELFS, which performed SHA1 hashing and AES-128-CBC encryption of the whole
data channel.
The block size used in the iozone test is 8KB, the file size is 8GB, and the machine has 1GB of RAM.
/devel/fs :: Link / Comments (4)
Sun, 06 Jul 2008
Vodka drinks.
Vodka itself is a very interesting drink, and depending on the
situation it can be either the cheapest way to become very drunk,
or a chance to have a long and fun time in good company.
Frequently (and likely most of the time) vodka is used for the first
case only, which is sad of course.
I do not know when and how vodka became popular in Russia, but
I think it is always associated with my country now. Actually
every nation has some kind of vodka in its own history of drinks,
and likely still has it. For example UK/Ireland has whiskey, which
is effectively vodka aged in oak barrels. This brings a very
interesting taste, which allows using it as a kind of long drink
(especially with ice). After having a whiskey shot one can start breathing
air in (especially via the nose), which brings the aftertaste directly into the brain
and to every piece of the body. I do not know any cocktails based on whiskey.
In my opinion, Irish whiskey is much more tasty and interesting than
(probably the originals of) Scotch, although the former has many more labels.
The USA also drinks whiskey, but most of the time it is its own
labels, which I have not tried yet. The USA does not have its own popular
drink though, or at least I do not know of it.
Europe also has lots and lots of different kinds of vodka.
The French drink cognac. I do not like it, and believe that it is only
coloured tasteless vodka, likely even the best labels like Rémy Martin and Hennessy
(although the latter was founded by an Irishman :), but it is only a matter of taste
of course. The cognac creation process is a bit more complex than that of
vodka, and it also has a very different taste, which (for me) is very similar
to clean vodka. Cognac is one of the most popular strong drinks. The culture of
drinking it is forgotten, but nevertheless it is very interesting. Cognac should be
drunk only at a special temperature (16 degrees Centigrade) in a glass of a special
form, which concentrates its aroma. Cognac is not swallowed immediately, but
'stored' in the mouth for a while to get all the taste.
Frenchmen also created absinthe. This is a very strong drink (up to 90 degrees),
but its main feature is thujone. History tells us that thujone was the main reason
why absinthe was forbidden in Europe, and it was considered quite a strong hallucinogen.
History also tells us that its concentration never exceeded 10%, so it is unlikely
that it had any kind of strong effect. Vincent Van Gogh liked it very much;
there is even a theory that he cut off his ear during absinthe intoxication, but likely
it was some special absinthe, since a thujone concentration of 10% or less does
not have any significant effect. Right now absinthe is allowed in most of the countries
where it was forbidden about 100 years ago.
Eastern Europe drinks various kinds of vodka, which have local names.
For example the so-called Cha-Cha, which is a quite strong drink (up to 80 degrees), but usually
very clean, so it can be drunk without dilution.
The New World (mostly Mexico) brings us a very interesting vodka-like drink
called tequila. It is frequently called Mexican vodka, although the US also produces its own labels.
There are also types which are made using French cognac barrels.
Usually it is drunk with salt, lime (sometimes lemon) and
mulatto female. Process is very interesting: you lick mulatto's hip, cover it with salt,
lick it, get tequila shot and eat a lime portion. Even without mulatto it is still very
tasty drink. Tequila is made out of special sorts of agave; the more agave it has, the higher
the quality.
One of the best-known vodka-like drinks from the Caribbean is rum. It is also quite a strong
drink, but because of its oil-like elements it is sweeter and very tasty.
Rum is likely one of the most widely used strong drinks for cocktails.
I know that Koreans also like their own kind of vodka very much, which has a smaller spirit concentration,
namely 20 degrees. It is made out of rice.
It is very popular to mix it with beer. Blows your roof away
after just a couple of shots.
Ukrainians have a very interesting drink called 'Gorilka', which is effectively
vodka with pepper. It is very tasty, but never eat the Gorilka pepper, or you
risk getting a peptic poisoning.
There are several vodka mixes.
The first and likely the best known is the 'Screwdriver', which is vodka mixed with juice. It
is not very tasty imho. One of the strongest roof-blowing drinks is the so-called
'ruff', a mixture of vodka and beer. Do not try it if you do not know what it is.
I also know one vodka long drink: vodka with Martini mixed one to one. Although it looks
quite strong, it is a very tasty drink with an excellent sweet and slightly dry taste.
Using my small cellar I created (at least tried for the first time) another long drink,
which consists of vodka mixed with 'Malibu' rum. It is also possible to add juice
or cold tea to it.
Weekend...
/other :: Link / Comments (9)
Multithreaded POHMELFS crypto processing.
Meanwhile, having a rest from various celebrations, I managed
to complete multithreaded crypto processing on the receiving side
in POHMELFS.
So far it was only tested in a debug environment (i.e. zillions
of logs and overall miserable performance), but it shows that
different threads pick up the work, both in the sending and receiving
directions.
There is a limitation though: the same crypto threads are used both
for the receive and transmit paths, so it is possible to saturate them
all, for example with receiving, so that sending will stall. If there are
insufficient crypto threads, waiting for RX crypto processing can take
too long, so the watchdog transmit scanner will fire and complete transactions
with errors. One can work around this by specifying a big enough number of
crypto threads or a long enough transaction scanning timeout; both are provided
via mount options.
I would like to test it in a more production-like environment and perform various
stress tests on it, but I'm far from my working place, so I can not do it right now.
Which means the release will be postponed until tomorrow (if testing does not show
regressions or bugs).
This will not be the last feature release though: for example POHMELFS does not support
extended attributes and ACLs, there is no header checksum (although there is a reserved
32-bit field), and there may be some features in different areas too,
but I am in no hurry to implement them, since I need something to put into future
POHMELFS changelogs. I think sending the same kernel patch with different words
about userspace server changes is not the way to go, so there should be some kernel
changes too :)
I will draw up some design notes on how I plan to implement the POHMELFS server, and namely
how the distributed facilities will be done. So far I have quite a clear picture in mind,
but it needs to be worked out 'on paper' to find the rough corners.
Stay tuned!
/devel/fs :: Link / Comments (0)
Sat, 05 Jul 2008
Midnight creatiff. Casted by LHC start.
- Shit! There are no more M8 screw-nuts.
- What? Use M12, bozon should pass through.
- We all will be fucked this Monday!

Good night. Actually, as a former physicist I can say
that at least two out of the four killing theories are really
stupid, but nevertheless it's interesting!
/other :: Link / Comments (2)
Fri, 04 Jul 2008
In case we will die this Monday...
I've started a countdown...

Large Hadron Collider will be started in 3 days...
/other :: Link / Comments (0)
Thu, 03 Jul 2008
POHMELFS crypto support has been completed.
kernel$ git commit -a
Created commit b07e3ed: Added crypto support.
9 files changed, 1534 insertions(+), 221 deletions(-)
create mode 100644 fs/pohmelfs/crypto.c
fserver$ git commit -a -m "Aded crypto support."
Created commit f916b2f: Aded crypto support.
3 files changed, 788 insertions(+), 94 deletions(-)
I implemented a pool of crypto processing threads (their number
is a mount option), each of which has a pool of pages to
encrypt data into. A crypto thread is not released until the server
acknowledges that the data was successfully written, so one
should tune the number of threads and the page pool (the number of pages
in each thread is the maximum number of pages per transaction;
this limit has its own mount option too) according to the desired behaviour.
Testing shows that write performance was noticeably reduced with this approach:
with 4 encryption threads and 4 receiving threads in the server
performance dropped by around 30%, from 65+ MB/s down to 46+ MB/s,
but I think it can be improved with a larger number of encryption threads.
During the iozone write/rewrite test each of the 4 crypto threads ate about 20-30%
of CPU, while the server ate about 130% (4 threads in total). In all previous iozone tests
the larger the number of userspace threads used, the worse the results were
(this is somewhat expected, since iozone is a single-threaded benchmark,
so a larger number of threads leads only to performance degradation),
so I will test different setups (namely a larger number of crypto threads
and a smaller number of server threads).
But this behaviour is not a problem, and I expect it to be tuned; the real
problem is read performance. Right now there is only a single thread,
which reads from one socket: it was done intentionally, since reading
data from a socket is a longer operation than searching for a page in the radix tree
or any other operation performed by that thread, so there was no way
to saturate its capabilities. That is, until we start crypto processing, which is slow,
so subsequent data can not be read from the socket in parallel
with crypto processing, and overall read performance drops to the ground.
This problem has to be fixed, so I plan to use the same crypto
processing threads to decrypt and/or perform the hash check for received data
and push it up to the VFS stack.
/devel/fs :: Link / Comments (0)
Wed, 02 Jul 2008
POHMELFS crypto: feel incredibly stupid.
First,
POHMELFS
does need to have encryption, because I plan to use a
distributed hash table approach in the server (well, consider the POHMELFS
kernel client as a kind of bittorrent filesystem client), and as in any
non-centralized system, content transferred via uncontrolled data channels
has to be encrypted.
But... I'm incredibly stupid: I implemented encryption and decryption in place,
i.e. the VFS page is encrypted prior to being written to the servers, so
a subsequent read leads to... Yes, it reads encrypted content.
To fix this issue I plan to encrypt data into different pages and send those,
leaving the VFS ones as is. There are two approaches I consider:
- allocate and send pages at writeback time: we want to send 5 pages, so we allocate
5 pages, encrypt data into them and broadcast them to all needed servers.
- allocate a (potentially large) pool of pages at mount time per crypto thread
and encrypt data into them. This will have about zero run-time overhead for VFS,
except that write completion is slightly delayed because of the encryption.
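The second approach can be sketched as follows. Everything here is an illustration under assumptions: the class name is made up, the page size is taken as 4 KB, and a trivial XOR stands in for the real cipher. The point is only that the cached pages are read, never modified.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def toy_cipher(src, dst, key=0x5A):
    """Stand-in for a real cipher: XOR every byte of src into dst."""
    for i, byte in enumerate(src):
        dst[i] = byte ^ key


class CryptoThreadPages:
    """Scratch pages preallocated once (at 'mount time') for one crypto thread."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]

    def encrypt(self, vfs_pages):
        """Encrypt cached pages into the scratch pages; the cache stays untouched."""
        if len(vfs_pages) > len(self.pages):
            raise ValueError("transaction is larger than the page pool")
        out = []
        for scratch, page in zip(self.pages, vfs_pages):
            toy_cipher(page, scratch)
            out.append(scratch)
        return out
```

The scratch pages can be reused only after the server has acknowledged the data, which is why the pool size limits the number of pages per transaction.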
/devel/fs :: Link / Comments (7)
Louis Maggio trumpet school: never smile.
/life :: Link / Comments (0)
Holy shit: kernel summit.
We would like to invite you to the 2008 Kernel summit, and we hope that
you will be able to join us...
I'm trying to recall previous kernel summit:

That was fun, but no one wanted to play football instead of talking about whatever we talked about.
For that year I only committed a
HIFN driver
into the tree, and there was no kevent :)
This time in US, thinking...
/devel/other :: Link / Comments (5)
Tue, 01 Jul 2008
Why is blocking sending considered harmful?
I frequently hear that whatever server you implement, it has to
be non-blocking, since in case of parallel sending that allows
sending multiple requests to fast servers while not sending data to
a slow server, since the non-blocking socket will return EAGAIN.
This is only a half-right solution: when we have to put the given data on
all servers, and can not free it until all servers have replied with an acknowledgement,
non-blocking mode can bring more damage than gain.
Mainly because it
allows eating all the memory for requests which are still in the queue
to be sent to the slow server, and which were already sent to the fast ones.
In this case the higher-level application (consider a simple application which generates
some data and writes it into a file in a distributed filesystem, which writes the
file to several servers) will never block, since the transfer
to the fast servers completes quickly, and will provide more and more data,
which will consume all RAM.
It is possible to deadlock the system in this case,
since to send some data to a remote server we always have to allocate at least some
memory to put network headers into. With the non-blocking solution we will consume
all memory and kick ourselves into a coma.
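The memory growth is easy to see in a toy tick-based simulation. The model is my own simplification: the application writes one block per tick, the fast server acknowledges immediately, the slow one acknowledges one block every few ticks, and a block is freed only when both have acknowledged it.

```python
def simulate(ticks, slow_every=4, max_in_flight=None):
    """Return the peak number of blocks held in memory during the run."""
    produced = fast_acked = slow_acked = peak = 0
    for tick in range(1, ticks + 1):
        in_flight = produced - min(fast_acked, slow_acked)
        # With a limit the writer stalls (blocks) until memory is released.
        if max_in_flight is None or in_flight < max_in_flight:
            produced += 1
        # The fast server acknowledges everything sent so far.
        fast_acked = produced
        # The slow server acknowledges one block every `slow_every` ticks.
        if tick % slow_every == 0 and slow_acked < produced:
            slow_acked += 1
        peak = max(peak, produced - min(fast_acked, slow_acked))
    return peak
```

Without a limit the number of held blocks grows with the run time; with a limit (which is effectively what blocking gives) the writer is throttled to the speed of the slowest server and memory usage stays bounded.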
/devel/networking :: Link / Comments (2)
Passive OS fingerprinting.
I've updated the OSF
modules to xtables, so you have to enable its support in the kernel config and get a
recent iptables (I tested with 1.4.1.1, which is the latest release to date).
OSF allows you to match incoming packets against different SYN-packet signatures and determine
which system is on the remote end, so you can make decisions based on the OS type
and, to some degree, even the version.
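The matching idea itself is simple and can be sketched in a few lines. Note that the signatures below are made up purely for illustration and the field set is simplified; the real module loads fingerprints from a file and compares more details of the SYN packet.

```python
from collections import namedtuple

# Simplified view of the SYN-packet properties used for matching.
Syn = namedtuple("Syn", "ttl window df options")

# Made-up signatures, not real OS fingerprints.
SIGNATURES = [
    ("ExampleOS-A", Syn(ttl=64, window=5840, df=True,
                        options=("mss", "sackOK", "ts", "nop", "wscale"))),
    ("ExampleOS-B", Syn(ttl=128, window=65535, df=True,
                        options=("mss", "nop", "nop", "sackOK"))),
]


def initial_ttl(observed):
    """Round an observed TTL up to the nearest common initial value."""
    for initial in (32, 64, 128, 255):
        if observed <= initial:
            return initial
    raise ValueError("TTL does not fit into 8 bits")


def match(packet):
    """Return the name of the first matching signature, or None."""
    for name, sig in SIGNATURES:
        if (initial_ttl(packet.ttl) == sig.ttl
                and packet.window == sig.window
                and packet.df == sig.df
                and packet.options == sig.options):
            return name
    return None
```

The TTL is rounded up because every router on the path decrements it, so the observed value is slightly below what the remote system originally set.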
Installation instructions, an example and the source code can be found on the
homepage.
I've also sent it to the netfilter-devel@ and netdev@ mailing lists, since my previous mails never appeared
there, likely because of spam filters.
/devel/networking :: Link / Comments (0)