Zbr's days.

Thu, 31 Jul 2008

DNS cache poisoning client/server architecture.

So far I have only implemented a simple flooder of the requests, which takes a number of destination ports as a parameter, plus two names and addresses to put into the answer and additional sections of the DNS reply. It uses a UDP socket, so the source address is not the address of the server which is supposed to answer the given query. So this application will not actually work, and I need to implement sending via a packet socket and substitute the source IP address with the authoritative DNS server's one.
The poison flooder also should not use only one name/address in the answer section; instead it should iterate together with the client, so that the appropriate request and answer are synchronized.

So far, the initial design of the client/server architecture of this small project looks like this: depending on flags, either the client connects to multiple flood servers or vice versa. Then the client sends a message to each server, which specifies the port and ID ranges to attack, the attacked DNS server's IP, the query name being requested, the source address to use (to pretend to be an authoritative name server) and the additional resource record data to put into replies (which will poison the cache).
Each server starts sending that data to the specified name server, with the source address changed to the authoritative name server's one and with the ID and port varied within the given ranges. When the client has finished broadcasting the request data to all flood servers, it sends a request to the attacked DNS server to resolve the given query name. Now the flood servers race with the authoritative one to provide an answer. When the client receives the answer, it checks whether it looks like the poisoned data we want to get or the real answer (which should be NXDOMAIN, since we resolve non-existing names). In the former case we exit the process and enjoy the result, otherwise the client specifies the next name to resolve and the same starts again.

Looks interesting...

/devel/networking/dns :: Link / Comments (0)


Wed, 30 Jul 2008

Simple DNS server/resolver.

The right time to hack on a DNS server is the middle of the night: it is 3 A.M. here and I've just completed the initial draft of a trivial DNS server. It is only capable of receiving a datagram on a predefined port, parsing it and filling in a reply with a static "IN A" record (I think I will add a config file). This record is placed into the 'answer' and 'additional' resource record sections, then the whole message is sent back to the client.

This is how it looks to the standard UNIX dig command:

$ dig @localhost -p 1025 www.google.com
;; Warning: query response not set

; <<>> DiG 9.4.2-P1 <<>> @localhost -p 1025 www.google.com
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51486
;; flags: rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.google.com.			IN	A

;; ANSWER SECTION:
www.google.com.		123456	IN	A	195.178.208.66

;; ADDITIONAL SECTION:
www.google.com.		123456	IN	A	195.178.208.66

;; Query time: 15 msec
;; SERVER: 127.0.0.1#1025(127.0.0.1)
;; WHEN: Wed Jul 30 02:56:23 2008
;; MSG SIZE  rcvd: 64
There are several warnings, which I will fix later, but the main part is the section content: www.google.com obviously does not have the IP address of my blog site, and its TTL usually does not equal 123456 either.
Game continues, while I need some sleep...
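The two dig warnings above most likely come down to header flag bits: "query response not set" suggests the QR bit of the reply is still zero, and "recursion requested but not available" that RD came back set without RA. A minimal sketch of deriving the reply flags from the query flags, using the bit layout of the RFC 1035 header; dns_reply_flags() is a hypothetical helper for illustration, not the code of the server described here:

```c
#include <stdint.h>

/* Bit masks of the 16-bit flags word in the DNS header (RFC 1035, 4.1.1). */
#define DNS_FLAG_QR     0x8000u /* 0 = query, 1 = response */
#define DNS_OPCODE_MASK 0x7800u /* kind of query, copied into the reply */
#define DNS_FLAG_RD     0x0100u /* recursion desired, copied into the reply */
#define DNS_FLAG_RA     0x0080u /* recursion available */
#define DNS_RCODE_MASK  0x000fu /* response code */

/*
 * Hypothetical helper: build the flags word of a reply from the flags word
 * of the query. Both values are in host byte order.
 */
uint16_t dns_reply_flags(uint16_t query_flags, int recursion_available,
			 unsigned rcode)
{
	/* Keep only the fields which a reply echoes back: opcode and RD. */
	uint16_t flags = query_flags & (DNS_OPCODE_MASK | DNS_FLAG_RD);

	flags |= DNS_FLAG_QR;		/* mark the message as a response */
	if (recursion_available)
		flags |= DNS_FLAG_RA;
	flags |= rcode & DNS_RCODE_MASK;

	return flags;
}
```

The result still has to be converted with htons() before it is written into the datagram.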

/devel/networking/dns :: Link / Comments (0)


Distributed storage development progress report.

DST got full transaction support (resending, timeout completion, error recovery, memory pool allocation for all kinds of transactions, a single transaction allocation per IO request), socket processing (initialization of the connected and listening sockets, failover recovery of the connection, a receiving thread, network helpers) and crypto processing of the requests (thread pool utilization for crypto operations, cipher/hash initialization, cached pages for crypto processing of sent data).
I am thinking of moving the processing of receiving and listening/accepted sockets to the thread pool too; it is likely the way to go, but right now they have their own threads.

Missing bits include the actual data sending/receiving and accepting of clients by the listening socket (and the appropriate initialization of all the needed infrastructure). This is quite a major part, but it will likely be completed sooner rather than later.

/devel/dst :: Link / Comments (0)


Tue, 29 Jul 2008

Some DNS port distribution data.

Gathered late tonight, so that the DNS server would not be disturbed too much by other users.
The graphs below show the source port cloud and distribution of some BIND server (I do not know the version) for a thousand runs. Each request asked for a non-existent subdomain of a domain whose name server I control, so I was able to capture dumps and analyze them a bit.

[Figures: DNS source ports cloud; DNS source ports distribution]

These graphs show the source ports cloud and its distribution. Each histogram bar corresponds to the number of hits in a range of 100 ports; the start of the range is shown in the X axis labels.
First, the port is randomly selected within the 50k-65k range, so one needs to guess among a much smaller number of ports.
Second, even in 1 thousand requests there are lots of requests with the same port (stats show that there are 149 ports which were used 2 or more times in the above 1000 runs, and there is even a single port which was used 4 times). If we select ranges of 100 ports, the resulting distribution is the one shown on the graph.
Such behaviour allows the source port range to be limited even more.
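The two statistics used above (hits per 100-port range and the number of ports seen more than once) can be computed from a captured list of source ports roughly like this. It is a sketch only: the function names are made up for illustration, and it assumes the ports were already extracted from the dump into an array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PORT_SPACE   65536u
#define BUCKET_WIDTH 100u
#define NUM_BUCKETS  ((PORT_SPACE + BUCKET_WIDTH - 1) / BUCKET_WIDTH)

/* Count hits per 100-port range; hist must have NUM_BUCKETS entries. */
void port_histogram(const uint16_t *ports, size_t n, unsigned *hist)
{
	memset(hist, 0, NUM_BUCKETS * sizeof(*hist));
	for (size_t i = 0; i < n; i++)
		hist[ports[i] / BUCKET_WIDTH]++;
}

/* Number of distinct ports which were observed two or more times. */
size_t ports_seen_repeatedly(const uint16_t *ports, size_t n)
{
	static unsigned count[PORT_SPACE];
	size_t repeated = 0;

	memset(count, 0, sizeof(count));
	for (size_t i = 0; i < n; i++) {
		/* Count a port once, at the moment it is seen a second time. */
		if (++count[ports[i]] == 2)
			repeated++;
	}
	return repeated;
}
```

hist[500], for example, then holds the number of requests sent from ports 50000-50099.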

Now, DNS IDs.

[Figures: DNS ID cloud; DNS ID distribution]

The whole range of IDs is used, and their distribution (each histogram bar corresponds to the number of IDs in the appropriate range of 100 IDs) is more uniform. There were only 9 IDs used twice per 1000 runs.

But since I do not know the exact load of the analyzed DNS server (and it can be high even at 3 A.M.), I can not say whether those numbers are due to the implementation of the port/ID selection algorithm or just because the load was high and there were actually not only my 1000 requests.

To play further with DNS caches I decided to install a local DNS server first and test things with it.

/devel/networking/dns :: Link / Comments (0)


Another excellent LISP book.

Common LISP Cookbook covers such interesting things as threads, sockets and the foreign function interface.
I believe "Common LISP Cookbook" and "Practical Common LISP" form a must-have library for every LISP programmer. So far I think that this is all that is needed, since this set covers the vast majority of possible use cases. Even DSLs are covered there in detail.

/devel/other :: Link / Comments (0)


Mon, 28 Jul 2008

Distributed storage development progress. Thread pools.

Today I implemented a simple thread pool subsystem, which allows one to create a set of threads, to add/remove them from this set at run-time, and to schedule work to be done by them. Work is specified as two functions: setup(), which is called when the system has selected a thread for execution, so the caller can set up the needed data, and action(), which is called by the thread itself and has access to the data provided at initialization time.
Work scheduling has a timeout parameter, which is the time the system will wait for a free thread; if none becomes free, an error is returned.
The system is generic enough not to contain any notion of DST or crypto, only two new data types: struct thread_pool and struct thread_pool_worker, of which only the former is visible to the user.
The API looks like this:

void thread_pool_del_worker(struct thread_pool *p);
struct thread_pool_worker *thread_pool_add_worker(struct thread_pool *p,
	char *name,
	int (* init)(void *private),
	void (* cleanup)(void *private),
	void *private);

void thread_pool_destroy(struct thread_pool *p);
struct thread_pool *thread_pool_create(int num, char *name,
	int (* init)(void *private),
	void (* cleanup)(void *private),
	void *private);

int thread_pool_schedule(struct thread_pool *p,
	int (* setup)(void *private, void *data),
	int (* action)(void *private),
	void *data, long timeout);
The init() and cleanup() callbacks above are used after a new thread is created, so that the user can initialize per-thread data; for example, this is used to allocate some cached pages and initialize crypto algorithms.
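To make the setup()/action() split a bit more concrete, here is a tiny userspace sketch of the callback contract. Everything with a demo_ prefix is made up for illustration, and demo_schedule() is a synchronous stand-in without threads or timeout, not the real thread_pool_schedule(): it only shows that setup() gets both the worker's private data and the caller's data, while action() later works with the private data alone.

```c
#include <stddef.h>

/* Per-thread private data, which init() would normally prepare. */
struct demo_worker {
	const unsigned char *buf;	/* filled in by setup() */
	size_t len;
	unsigned sum;			/* result computed by action() */
};

/* Data which the caller passes to the scheduling call. */
struct demo_request {
	const unsigned char *buf;
	size_t len;
};

/* setup(): called once a worker has been selected, copies the work in. */
int demo_setup(void *priv, void *data)
{
	struct demo_worker *w = priv;
	struct demo_request *r = data;

	w->buf = r->buf;
	w->len = r->len;
	return 0;
}

/* action(): called by the worker itself, sees only its private data. */
int demo_action(void *priv)
{
	struct demo_worker *w = priv;

	w->sum = 0;
	for (size_t i = 0; i < w->len; i++)
		w->sum += w->buf[i];
	return 0;
}

/* Synchronous stand-in for scheduling: setup first, then action. */
int demo_schedule(void *worker_priv,
		  int (*setup)(void *priv, void *data),
		  int (*action)(void *priv),
		  void *data)
{
	int err = setup(worker_priv, data);

	if (err)
		return err;
	return action(worker_priv);
}
```

In the real pool, action() would run in the worker thread after the scheduling call has returned, which is why everything it needs has to be placed into the private data by setup().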

This thread pool system is used by the crypto processing code in the distributed storage: when a block IO request is about to be sent, or when the system has received a reply for a read request, it schedules crypto processing work to the pool, which is initialized at DST node setup time.

Crypto processing does not work in DST yet, and neither do some other bits. So far I have only played a bit with its initialization sequence, so it was split into network, crypto and security initializations and the node start, which registers the new storage in the block layer subsystem. These steps allow additional initialization steps to be introduced later if needed without breaking backward compatibility.

Next steps include proper network initialization and processing, and transaction management helpers. Then I will combine all the existing code and make the first renewed release.
Stay tuned!

/devel/dst :: Link / Comments (0)


Sun, 27 Jul 2008

Lots of talks about DNS cache poisoning attack.

There are two types of this attack: DNS query ID guessing, and request source port guessing for servers which use a randomized source port (which should be turned on after Dan Kaminsky's alert).

The DNS ID is only 16 bits, so it can be guessed rather fast; one just needs to force someone who uses the attacked DNS cache to issue the appropriate requests. When an answer is received by the DNS resolver, it is stored there for a predefined amount of time (the TTL parameter provided by a higher-level DNS resolver or eventually the authoritative name server). Dan found that the attacker can actually ask not for the attacked domain, but for some subdomain of it (if the attacker tries to point www.microsoft.com to its own IP, it can force sending DNS requests for 1.microsoft.com, 2.microsoft.com and so on), and put the data about the actual target into additional resource records attached to all datagrams. So, when it eventually wins the race, it can store (among lots of subdomains) the needed pointers in the attacked DNS cache.
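Some back-of-the-envelope arithmetic shows why 16 bits are not much. The sketch below assumes that the ID (and the port, if it is randomized) is picked uniformly, that every forged reply carries a distinct guess and arrives before the real answer, and that the races for different subdomains are independent; real resolvers and networks will deviate from this. Function names are made up for illustration.

```c
/*
 * Chance that one of 'forged' distinct (ID, port) guesses matches a single
 * outstanding query, when the resolver picks among 'ids' IDs and 'ports'
 * source ports.
 */
double single_race_hit(double forged, double ids, double ports)
{
	double space = ids * ports;

	return forged >= space ? 1.0 : forged / space;
}

/* Chance to win at least once in 'rounds' independent races. */
double any_race_hit(double p, unsigned rounds)
{
	double miss = 1.0;

	for (unsigned i = 0; i < rounds; i++)
		miss *= 1.0 - p;
	return 1.0 - miss;
}
```

With a fixed source port, 100 forged replies per query give single_race_hit(100, 65536, 1), about 0.15%, per race; any_race_hit() then tells how that adds up over many subdomains. Multiplying the space by several thousand possible source ports is exactly what port randomization buys.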

I've just thought that this attack would not be possible if all queries from DNS resolvers to higher-level resolvers and/or authoritative name servers happened over TCP instead of the more common UDP. There would be no need to issue requests from random ports anymore, no need to parse and drop additional resource records. There would be no problems with truncation of large messages... But to play a bit with the whole idea I'm implementing a simple DNS query/response processor. Maybe I will play a bit with poisoning of a local cache (the ISP at the office uses only 6 different ports to send requests), although its main goal is an IP-over-DNS tunnel.

This is kind of a real rest after the VISA/hotel paperwork. I was told that if I am called to the embassy for an interview, chances are high that the VISA will be declined because of my sense of humor :)

Update:

zbr@gavana:~/aWork/tmp/dns$ ./query -a 195.178.208.66 -i 0x1234 -q tservice.net.ru
query: 'tservice.net.ru', class: 1, type: 1, server: 195.178.208.66:53, protocol: 17, id: 1234.
Connected to 195.178.208.66:53.
id: 1234: flags: resp: 0, opcode: 0, auth: 0, trunc: 0, RD: 1, RA: 0, rcode: 8.
        : question: 1, answer: 1, auth: 2, addon: 2.
	: question: name: 'tservice.net.ru.', type: 1, class: 1.
	: name: 'tservice.net.ru.', type: 1, class: 1, ttl: 86400, rdlen: 4, rdata: 195.178.208.66
	: name: 'tservice.net.ru.', type: 2, class: 1, ttl: 86400, rdlen: 14, rdata: ns.tservice.ru.
	: name: 'tservice.net.ru.', type: 2, class: 1, ttl: 86400, rdlen: 7, rdata: dns2.tservice.ru.
	: name: 'ns.tservice.ru.', type: 1, class: 1, ttl: 86400, rdlen: 4, rdata: 195.178.208.66
	: name: 'dns2.tservice.ru.', type: 1, class: 1, ttl: 86400, rdlen: 4, rdata: 62.141.76.164
And the DNS protocol gets the first prize among the ugliest crappies.
Now it's time to create the DNS server itself, which will get requests (the above dump shows a BIND session), parse them and perform appropriate actions, like sending a reply with specially crafted additional resource records: either a NULL one, for example (it can contain up to 64k of data), or a TXT one (a length byte followed by a character string; there may be multiple strings as long as the total length, including the length bytes themselves, is less than 64k). Or an additional A resource record, which may contain information about the domain to poison...
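The TXT layout described above (one length byte followed by up to 255 bytes of string, repeated) is easy to get wrong by one, so here is a small sketch of packing an arbitrary buffer into TXT RDATA. txt_rdata_encode() is a hypothetical helper, not part of the server; the 65535 limit comes from the 16-bit RDLENGTH field, and a real UDP reply will usually have a much smaller limit for the whole message.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pack 'len' bytes into TXT RDATA: a sequence of character strings, each
 * one a length byte followed by up to 255 bytes of data.
 * Returns the RDATA size, or 0 if there is nothing to encode, 'out' is too
 * small, or the result would not fit into the 16-bit RDLENGTH field.
 */
size_t txt_rdata_encode(const unsigned char *data, size_t len,
			unsigned char *out, size_t out_size)
{
	size_t chunks = (len + 254) / 255;	/* number of length bytes */
	size_t need = len + chunks;
	size_t pos = 0;

	if (len == 0 || need > 65535 || need > out_size)
		return 0;

	while (len) {
		size_t n = len > 255 ? 255 : len;

		out[pos++] = (unsigned char)n;	/* length byte */
		memcpy(out + pos, data, n);	/* string itself */
		pos += n;
		data += n;
		len -= n;
	}
	return pos;
}
```

For example, 300 bytes of payload turn into 302 bytes of RDATA: a 255-byte string and a 45-byte string, each with its own length byte.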

/devel/networking/dns :: Link / Comments (0)


Sat, 26 Jul 2008

New POHMELFS release.

This release was fully made by other developers. Thanks a lot for your work.
I only updated some trivial bits and fixed a bug in the server.

Short changelog:

  • Documentation update by Adam Langley (agl_imperialviolet.org). Now one can read properly spelled POHMELFS design.
  • Server and configuration utility IPv6 support by Varun Chandramohan (varunc_linux.vnet.ibm.com). The kernel client does not need these changes, since it supports any protocol. Now one can create a POHMELFS cluster over IPv6.
  • Server bug fix and small documentation update by me.
One can get more details about POHMELFS at its homepage. Sources can be downloaded from the archive or via the GIT tree.

/devel/fs :: Link / Comments (0)


Fri, 25 Jul 2008

This was supposed to be a new POHMELFS release day.

I accumulated patches from Varun Chandramohan of the IBM Linux center, which add IPv6 support to the POHMELFS server and configuration utility. The kernel client does not need it, since it works with any kind of addresses (by design).
I also wanted to add a documentation update from Adam Langley, but apparently I accidentally deleted his patches, so the release is being postponed a bit.

Meanwhile I made a little progress on the DST development side. I added trivial configuration bits and started to develop the cryptography part, mainly configuration (which I will copy from POHMELFS) and the thread pool subsystem.
The latter is a rather simple patch, which will allow one to create a thread pool, to add/remove threads on demand and to queue work to the pool. In theory this can be a generic enough patch to be used by other users (I even saw some kind of topic proposal for the kernel summit), but so far I'm not going to push it separately from DST. The main goal of this system is crypto processing of the BIOs for the distributed storage.

/devel/fs :: Link / Comments (0)


Do you like when you are photographed?

I do not, so there are no photos with me on this site (if you would see my passport photos...). But I like to make them and sometimes I create really interesting ones.
People even print them and give me presents for it, and since photos were made in public, I think I can publish them.

I also frequently take photos of people in public when they do not expect it, and these ones are really the best (never look at the photographer!). I sometimes take them to laugh at someone; this is of course private data, which I only send to the 'model', if I do not delete it immediately.
My theory is that all people are very interesting from the photographer's point of view. This just has to be found. I'm trying to do it, and sometimes I succeed.
I do not know if it is good or not to publish such photos. I only save pictures which are definitely interesting to me. Of course it's just a matter of taste.

So, I'm thinking about creation of the new tag in my blog, where I will post photos made by me. Not that much, one or so pictures per week. So if you do not like the idea, you can always read development tag only.

/other :: Link / Comments (0)


Wed, 23 Jul 2008

Manager's thoughts: unused extensibility and used de-facto standards.

After some before-sleep reading (this time the DNS RFC specifications) I found that the DNS protocol is so extensible that it can perfectly cover not only its own area, but also help with really lots of nearby problems. It already has (though completely unused) many interesting RRs and types, which have nothing to do with DNS (like the NULL RR, which allows binary data to be transmitted, or the TXT RR, which is also not related to the DNS area). And the most popular RRs are A, PTR, SOA, CNAME and MX. That's all out of about 20 others. The same applies to (q)type and class (I read about the Hesiod class for the first time, for example). And DNS allows one to introduce own classes, types and resource records.
All this is just not used, but we could create a distributed DNS system with new types. It would be really simple (and actually it can be done even without new DNS extensions).
But it is not actually needed, since people are used to having DNS just like it is.

Another example is internet video. There is a de-facto Adobe standard, and no matter what W3C puts into its new standard, everyone will continue to use the existing one. Just because it works ok. Not excellent or perfect or whatever, it just works the way we are used to.

And there are lots and lots of similar examples.

People are so inert in these questions (and I think in most areas, just because it is convenient not to do something better when the existing solution just works, even if not perfectly and even if not well), that no one will ever bother to change something dramatically. It would not only require a huge amount of money, but also changes in the way people are used to thinking about the given area, which is likely an even more complex (and money-hungry) problem.

All this talk is about a simple thing I have just discovered for myself: when you create something completely new, even if it is not the best solution for the given problem, and start pushing it to a wide audience, you are able to get all of 'the market'. And that's why, when you bring something new to a market where most of the users are already used to working with one or another solution, there will not be any major gain (even if your project is potentially very good and definitely much better than the existing solutions), only a trickle of completely new users.
This is probably told to first year MBA students, but I was quite excited and disappointed by this issue: the first new idea, when properly presented, even if it is not the best solution for the given problem, can get all the users, after which they will not switch to a new one just because they are used to having it this way.

/devel/other :: Link / Comments (1)


Tue, 22 Jul 2008

POHMELFS distributed facilities design notes.

Since I'm quite busy with VISA/hotel/tickets and overall preparations for the Kernel Summit, there is no development progress, but the paperwork should be completed very soon I think, and so I will write here some design notes I have in mind about how the POHMELFS server will be designed. It is not a finished draft, but rather a rough sketch of the direction.

POHMELFS will utilize a distributed hash table approach, i.e. the storage will support the ability to get an object based on some key attached to it. In a local filesystem we already work with a hash table: a directory lookup is no more than a lookup for an inode object based on its name, i.e. a lookup for the value based on the attached key. And although the key in this case is not created from the object itself (like a hash of the content or some other function), it still is a (turn on your imagination here) table lookup.

The cloud of POHMELFS servers will utilize a similar approach. Consider a single server in the system. When it joins the cloud for the first time (I omit this process for now, and will describe it below), it is empty, so it gets some unique id, either via administrator steps or randomly, or it just waits in the queue to be filled with new data and gets its id at that time. It does not matter for now how it gets its id, but this id is propagated to some cloud of its neighbours (or, if this were bittorrent or napster, to the main server).
There are two ideas on how to treat this ID: either as a part of the filename, or as a nameless pointer in the abstract namespace. I will show below that it actually does not matter.

Now, let's check what will happen when a user wants to perform some IO on a given file.
Every file access actually happens to an inode stored on disk. In our case it can be stored somewhere we do not know yet, so we need to perform a lookup to get the address of the node in the cluster which contains our data. In existing schemes like bittorrent or Lustre there is a server (or a small cloud of servers) which contains the mapping information about where this or that object is placed in the data cloud, so a simple lookup to this server(s) returns the needed info. This approach does not scale to really lots of nodes and is failure-prone.
Instead I consider a completely distributed metadata storage. Let's check how the system will look up the whole path in our case.

Each path starts from the root directory, which is '/', which in turn is an id in the global namespace (or a hash of this string or whatever other mapping), so we first need to look up the node which is responsible for the content of this directory. Each node contains routes only to a very limited set of neighbour nodes (in various designs this number varies, but the idea lies in the fact that the node performing the lookup does not know which node contains the needed info). The Gnutella system just broadcasted this lookup request to all of its neighbours, each of which broadcasted it to its own neighbours and so on until one of the systems replied that it contained the needed info. The amount of unneeded broadcasts killed Gnutella the next day after Napster was closed.
So, this approach does not scale, and instead we need to map the needed directory into a node address in a more intelligent way. There are at least two most appealing design choices: the ring-based structure implemented in CHORD and the multidimensional torus implemented in CAN.
Right now it does not matter; let's assume that we found a node which has information about the content of the needed directory. When we have that data, we can find the next node (or this info can be cached on the 'parent' directory node) and so on until we get to the node which is responsible for storing the content of the needed object.
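To give a feeling of what "mapping a directory into a node address" means in the ring case, here is a minimal sketch of a CHORD-like assignment rule: the name is hashed into the same id space as the nodes, and the owner is the first node whose id is not smaller than the key, wrapping around the ring. This is an illustration only, not the POHMELFS design: it assumes a sorted array with all node ids is available in one place (real CHORD routes through per-node tables instead), and FNV-1a is just a conveniently short hash picked for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a path name; stands in for any name-to-id mapping. */
uint32_t name_hash(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/*
 * 'ids' holds n > 0 node ids sorted in ascending order.
 * Returns the index of the node responsible for 'key': the first node with
 * id >= key, or node 0 if the key is bigger than all ids (ring wrap).
 */
size_t ring_successor(const uint32_t *ids, size_t n, uint32_t key)
{
	size_t lo = 0, hi = n;

	/* Binary search for the first id which is not smaller than key. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ids[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo == n ? 0 : lo;
}
```

A lookup of '/home' would then be ring_successor(ids, n, name_hash("/home")), and when a node joins or leaves only the keys between it and its neighbour on the ring change their owner.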

When a new node joins the cloud, it connects to one or another known node (provided either by a public service or by the administrator) and sends there information about its available space, gets an ID and just waits until some client connects to it and starts writing data.
When a node joins with some content, which was written to it by the system before, or written by local users bypassing the distributed mechanism, the node has to tell this information to the node which holds the parent directory. This information should be stored in each directory it exports, or it can be provided by the administrator. For example, this node exports the dir '/zbr', which is actually a subdir of '/home', so the node will look up the owner of the '/home' directory content and update its records to show that it now contains a new dir. There is a problem here: what if there is already another node which also claims to have the dir '/zbr' in '/home'? This can be handled via an extended attribute attached to each object, which will tell us the last modification date, so the system can select either the last modified '/zbr' dir or the one which has the biggest number of identical replicas. The policy can be set up by the administrator.

The main advantage of this joining scheme is the fact that we actually do not need to know the content of any object in the exported directory; we publish only the high-level object, which may or may not contain some inner file or dir. Thus we do not need to hash millions of files in the exported directory and publish them one by one, we do not need to store information about each inner object, there is no need to attach the full path to each object and so on.

When we decide to split the same object between multiple nodes, we will need to extend the name based lookup with the offset inside the object. This can be done by introducing a system-wide 'block size', so that each file is actually a set of blocks of the given size. When we have found the node responsible for storing information about the directory where the file is located, this node can also contain information about where each part of the object was stored.
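A possible shape of such a per-object record, as a rough sketch: the directory node keeps, for every block of the file, the id of the node which stores it, and an offset is translated into a (node, offset inside block) pair. The structure and names are made up for illustration, nothing like this exists in POHMELFS yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object map kept by the node which owns the directory. */
struct object_map {
	uint32_t block_size;		/* system-wide block size in bytes */
	size_t num_blocks;		/* number of blocks in this object */
	const uint32_t *node_of_block;	/* id of the node storing each block */
};

/*
 * Translate an offset inside the object into the id of the node which
 * stores it and the offset inside that block.
 * Returns 0 on success, -1 if the offset is beyond the mapped blocks.
 */
int object_locate(const struct object_map *m, uint64_t offset,
		  uint32_t *node, uint32_t *block_offset)
{
	uint64_t index;

	if (m->block_size == 0)
		return -1;

	index = offset / m->block_size;
	if (index >= m->num_blocks)
		return -1;

	*node = m->node_of_block[index];
	*block_offset = (uint32_t)(offset % m->block_size);
	return 0;
}
```

The cost is one table entry per block, so the block size has to be big enough to keep such tables small for large files.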

Looks quite simple, but... Devil is in the details.
I obviously missed some bits in the design (I created it in my mind while talking with asm@, being under the 'impression' of the Greek spirit; he suggested looking at the Kademlia project), like redundancy management of the nodes, splitting of the node content between multiple nodes and other bits, but it is one of the first drafts, so things can be changed if needed.

Stay tuned, I will be back to the development process very soon (DST first :), since the paperwork for the kernel summit travel seems to be reaching its end.

/devel/fs :: Link / Comments (0)


Mon, 21 Jul 2008

Foot, fingers, shins, knees, thighs, shoulder, back.

No, these are not the parts of the body I know (half of them I looked up in the dictionary), this is what is aching right now.

It's called football.

Yes, sounds a bit scary, but that was a hell of a game today. We were much stronger, but I have to admit that this is mostly because we made the right transfer decision and selected the right players at the beginning, so our previous team was strengthened. I managed to score a goal, make a couple of nice saves and even some quite technical dribbles, which was quite surprising, since I have not played football for 5 years.
I would not say I'm getting into shape, but there is progress.

/life :: Link / Comments (0)


Sun, 20 Jul 2008

Crazy security idea.

I've just thought that I do not know a way to make some (running) application encrypt all of its data which hits the disk (either via swap or in the usual way, like an editor writing the file and all its temporary files).
I actually consider this a very useful feature for editors, browsers, instant messengers and mail clients, downloading applications, music players and so on. This is especially valid for temporary files, when one expects the editor to be highly secure (or even to work on an encrypted partition), while its temporary files are stored somewhere in /tmp, which is not encrypted.

The application could be started via some wrapper, which tells the kernel the encryption algorithm, key, IV and all the needed info and attaches a crypto processing callback to the process, so that when disk activity is started by the given pid (swap or data writing or reading), the data is encrypted/decrypted in flight.
The kernel should check all file descriptors opened by the given process and process them appropriately. There may be some problems with communication with unprotected applications, which should be thought out, but overall I like the idea...

Have put it on the todo list.

/devel/other :: Link / Comments (0)


Project presentation.

I've just realized that lots of my blog posts are valid enough presentation abstracts; at least they contain enough words describing the problem, a possible solution and overall topics of interest for the given area. But I have never presented such projects in English before, although quite frankly I'm not that bad a speaker in Russian, at least I am not afraid to talk and probably like contact with an interesting audience. After all there is this blog :) and I have even given a number of presentations of a similar kind, from 15 minutes to a couple of hours including the question/answer part.
The English I use in the blog is rather ugly, but I rarely (if at all) fix errors which I detect on subsequent reading of the text in the browser (and I detect lots of them), and the same goes for mails and other posts.
So probably eventually we will have interesting talks about different areas, but expect to 'listen' to the world-wide language of gestures :)

/devel/other :: Link / Comments (0)


Sat, 19 Jul 2008

Distributed storage is dead, long live the Distributed storage!

As you may know, the DST project was an attempt to implement a redundant, failover resistant, flexible block level storage subsystem. Among other features it supported the ability to map multiple remote nodes via linear or mirroring algorithms to a single node, reconnection to a failed node, read balancing and parallel writing to multiple nodes (in case of mirroring) and so on.

Now it is gone. There is no more distributed storage as you knew it before; instead there is a completely new project being developed, whose main goal is to provide a transport layer for the block requests only. Consider it as Network Block Device on huge steroids. Consider it as iSCSI on huge steroids. Consider it as ATA-over-Ethernet on even more huge steroids.
It is just an example of what all those protocols should have. And only that.
And if that does not sound very ambitious: previous DST versions already supported lots of features which never existed (and in some cases were impossible to add) in other block level network storages.
DST moves further.

There will be no mirroring and no ability to map multiple devices into a single one at all; instead one should use Device Mapper for this goal, since its features were simply mirrored in DST (although I tried to optimize them sometimes), and the number of targets was noticeably smaller.

Now DST is just a simple block device which operates on top of a network connection. With just a single exception: it's done right.

Features planned for the new Distributed Storage:

  • kernelspace client and server
  • initial autoconfiguration between client and server nodes
  • automatic reconnect to failed target
  • transaction model: resending, timeout error completion, full rollback of the failed transaction
  • wire speed performance
  • data channel encryption, strong checksumming
  • cryptographic authentication
  • ability to work on top of any network protocol
  • barriers support (when, if ever, Device Mapper starts to support them, DST will not need to be changed)
  • flexible protocol with simple ability to extend it to needed functionality
  • trivial configuration
The project is being written from scratch, but it is actually very simple and should be quite small, so expect its first release quite soon.
It will be pushed upstream when ready.

/devel/dst :: Link / Comments (8)


Fri, 18 Jul 2008

Completed distributed storage redesign.

I also managed to play second octave F# and sometimes the whole chromatic scale down to small (minor?) octave F on my trumpet, and I believe I have started to understand the overall trumpet kung-fu, but I expect this is not what you wanted to read under the DST tag.

So, DST becomes smaller, cleaner and simpler. Notably, I decided to drop userspace target completely for now.
Kernel part now operates on transaction entity, which holds a reference to the node, where data should be sent/received. There can be at most two such nodes if block IO request spans the boundary. In case of mirroring (which will be dropped for the first release) list of nodes to mirror this data to will be maintained by the first node, so transaction will not need to know about them.
In theory block request can be as much as BIO_MAX_PAGES pages, which is 256 for now, but I decided to limit minimum node size to be not smaller than above bio limit, so there will be always at most two nodes per request.
Each node has either block device behind it (so it will just call generic_make_request() with different block device for given bio), or network state machine.
Network state will have two threads: RX and TX. Receive one is used to get replies for the read/write messages, search appropriate transaction and complete it. In case of DST server it will also handle read/write requests and generate replies, but the whole processing will be exactly the same, client node will have a switch to process read/write requests from the network, but they should be only received by server.
Sending thread is tricky. It is used as fallback for non-blocking sockets, which are used first at generic_make_request() time, i.e. when higher level user performed read or write, if block was not fully sent, then it is queued to this thread and it will try to send the rest of the data when polling allows. ->make_request_fn() function returns in this case and higher layer can proceed with own operations.
A transaction is not freed until a reply is received from the remote side or the resend retry count is exhausted.
A transaction is always allocated (from the appropriate memory pool), and that is actually the only allocation in DST itself. When it works with block devices, it may have to clone a bio when it crosses node boundaries (or maybe even always, I have to check; that is essentially what device mapper does, with lots of its own additional allocations), but that should be a very rare condition.
The network stack will allocate data itself too.
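The transaction lifetime described above (kept until a reply arrives or the resend count runs out) can be sketched like this. It is a toy Python model under my own assumptions: the class names, the send callback and the default of 3 retries are hypothetical and are not taken from the DST sources.

```python
class Transaction:
    def __init__(self, tid, data, max_retries):
        self.tid = tid
        self.data = data
        self.retries_left = max_retries
        self.status = "pending"          # "pending", "done" or "failed"


class TransactionTable:
    """Keeps every transaction until it is acknowledged or gives up."""

    def __init__(self, send, max_retries=3):
        self.send = send                 # callback: send(tid, data)
        self.max_retries = max_retries
        self.pending = {}
        self.next_tid = 0

    def submit(self, data):
        trans = Transaction(self.next_tid, data, self.max_retries)
        self.next_tid += 1
        self.pending[trans.tid] = trans
        self.send(trans.tid, trans.data)
        return trans

    def on_reply(self, tid):
        """RX path: a reply completes the transaction and frees it."""
        trans = self.pending.pop(tid, None)
        if trans is not None:
            trans.status = "done"
        return trans

    def on_timeout(self, tid):
        """Resend until the retry count is exhausted, then fail and free."""
        trans = self.pending.get(tid)
        if trans is None:
            return None
        if trans.retries_left == 0:
            del self.pending[tid]
            trans.status = "failed"
        else:
            trans.retries_left -= 1
            self.send(trans.tid, trans.data)
        return trans


sent = []
table = TransactionTable(lambda tid, data: sent.append(tid), max_retries=2)
first = table.submit(b"write block 0")
second = table.submit(b"write block 1")
table.on_reply(first.tid)                # the server answered the first one
for _ in range(3):                       # the second one never gets a reply
    table.on_timeout(second.tid)
```

The only object allocated per request is the Transaction itself, which matches the idea that DST allocates nothing else on this path.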

That was the theory. Practice tells me that essentially 90% of the code should be rewritten from scratch, so I recloned the tree and so far implemented the generic bits: registering a block device, creating various sysfs files and directories and other similarly trivial things. I still plan to finish it this weekend (without mirroring), although things may turn out differently...

/devel/dst :: Link / Comments (0)


Have sent all documents for US visa.

Checked my passports and decided that if other countries let me in with those photos, then US customs officers should not frown too much upon the current ones.

So, waiting for the results. I am almost sure that I will get the visa and will meet interesting people at the kernel summit and the Plumbers conference, but anyway I would like to draw the line.

For instance, Zach Brown will talk about CRFS (as well as show some chocolate and cocktail bars around; imho the only good cocktail is rum with cola (less cola) and ice), so there will be something to listen to.

/life :: Link / Comments (0)


Thu, 17 Jul 2008

"The Gun Seller" by Hugh Laurie.

Just finished reading this excellent detective novel (at Amazon, electronic version in Russian).
People call it the best English humor novel for a reason: it is indeed fun and interesting, although I suspect a lot of its witty satire was a bit lost in translation. Nevertheless I do recommend it for easy reading.

And of course if you like House M.D., you have to read this novel, and you will not waste your time for sure.

/other :: Link / Comments (0)


Morning trumpet exercises.

This morning I raped my ears for almost two hours, and at the end managed to play a chromatic scale (glide?) from second octave D (E trumpet) down to minor octave F (E trumpet), at least that is what my Korg tuner showed. Much more frequently I was able to play a single first octave (only in the descending direction though; I have not yet tried rising tones).
The Korg AW1 tuner does not show octaves, but I really do not think that it is possible to play one octave lower than my lowest sound, and I am pretty sure it is possible to go at least one octave higher than my highest tone, so I decided that I play several tones around the first octave.

Ugh, I was supposed to leave for the office earlier (before the heat and traffic jams), but instead I fucked my brain via ears (and the neighbours were probably not happy either, although I did not play at full volume).

/life :: Link / Comments (0)


Wed, 16 Jul 2008

New toys: Korg AW1 tuner.

I believe I can produce enough sounds out of my trumpet, so I need a tuner which will tell me how bad those sounds are.

Korg AW1

So, now I'm starting to seriously tune my sounds.
So far I think that I can play at least two octaves; actually I mean not play, but produce a sound, and it is still not that simple and not always very clean. But since I have 'played' (or better to say, studied) my trumpet for only a couple of months, never played any instrument before (not counting a couple of guitar riffs at the university) and do not play with a teacher, I think this tuner will be a very good addition.

/other :: Link / Comments (4)


Tue, 15 Jul 2008

Distributed storage development roadmap.

Yes, the DST project is alive and will beat out the crap very soon, since I decided to change its underlying architecture and switch to a transaction model, just like POHMELFS. This basically means that as long as the system has enough RAM, write operations will be extremely fast, reads can be balanced between multiple nodes (in a mirror), transactions can be resent, the failover mechanism becomes much simpler, and the system overall will be much more robust to failures.

The transaction model also means that the system requires an explicit acknowledgement from the remote side, and there are two possibilities here: to handle the implicit ack which comes with TCP ack packets, like I experimented with before, or to send an explicit ack from the server for each client request.
The former approach, although it has smaller performance overhead, still suffers from the fact that pages sent via DST are always stateless, i.e. at this layer there is no knowledge about who sends the page. We can determine the inode a page belongs to, and can even get the socket when the page is about to be released after the ack has been received, but we can not know from exactly which pipe it was submitted into the given socket, so when multiple threads send the same page via multiple sendfile() calls, we do not know when and how the page will be released. We can put the pipes this page belongs to into a singly-linked list (since the page has only two pointers unused at this point, those of the LRU list head, and one of them is used to determine that the page belongs to the sendfile()/splice codepath), and traversing this list will likely not hurt usual users, but a malicious one can create a local DoS with this approach. After some experiments with the splice code today I decided to drop the implementation of this idea for now.
There is a strong argument in favour of explicit acks from the server: they allow asynchronous transaction processing (with implicit acks we can not hook into the processing path, since we do not know where exactly the skb with our pages is chained), and they do not hurt performance (which was proven by POHMELFS benchmarks).

So, the overall plan for DST development is to switch to the transaction model and perform async processing of all events (there are actually only two: reading and writing of the given pages from/to the given locations).
This task is not that complex, so I expect some new results later this week. Stay tuned!

/devel/dst :: Link / Comments (5)


Football match has made the day!

That was exceptionally bloody cool evening!
We had three teams of 6 players each and played on a small mini-football field for about 2.5 hours; each match lasted either 7 minutes or until 2 goals were scored into one goal. It sucked out so much power, and was so cool, that even the exceptional tiredness right now brings a kind of masochistic pleasure.
My breathing system really sucks, and actually that is not a surprise: I have not played football for more than 5 years, but nevertheless the shoes and ball are in good shape.

I managed to hurt my knees, shoulder and toes in various 'contacts' during the game, but that's not a problem.
Our team was not really the best one, but we firmly hold second place and actually can fight for the first, since all our players had long enough breaks from playing, while the first team's players regularly train in their own teams (including a youth football champion).

That was a super time!

/life :: Link / Comments (0)


Mon, 14 Jul 2008

ParaLLels concert.

ParaLLels

Visited ParaLLels concert this weekend...
Mixed feelings, but I saw lots of old friends (musicians, by coincidence), which made the day.

/life :: Link / Comments (0)


Sun, 13 Jul 2008

Hermite interpolation.

Hermite interpolation examples

This interpolation uses the cardinal splines approach, namely Catmull-Rom splines. The next task is to test how Kochanek-Bartels splines (also called TCB splines) behave. The latter are used in all popular 3D modelling engines. Since the math behind them is very non-trivial, I will just try to use the existing formulas for Hermite tangents, which are quite simple.
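For reference, a Catmull-Rom curve built on top of cubic Hermite interpolation can be sketched in a few lines of Python. This is not the code of my GTK application, just a minimal illustration: the function names are made up, and duplicating the end points to get tangents at the ends of the curve is one common choice among several.

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation between p0 and p1 with tangents m0 and m1."""
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return tuple(h00 * a + h10 * b + h01 * c + h11 * d
                 for a, b, c, d in zip(p0, m0, p1, m1))


def catmull_rom_tangent(prev_point, next_point):
    """Catmull-Rom tangent: half of the vector between the two neighbours."""
    return tuple((n - p) / 2.0 for p, n in zip(prev_point, next_point))


def catmull_rom(points, samples_per_segment=8):
    """Sample a Catmull-Rom curve passing through all the given points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Assumption: repeat the end points so that every segment has neighbours.
    padded = [points[0]] + list(points) + [points[-1]]
    curve = []
    for i in range(1, len(padded) - 2):
        p0, p1 = padded[i], padded[i + 1]
        m0 = catmull_rom_tangent(padded[i - 1], padded[i + 1])
        m1 = catmull_rom_tangent(padded[i], padded[i + 2])
        for step in range(samples_per_segment):
            curve.append(hermite(p0, p1, m0, m1, step / samples_per_segment))
    curve.append(tuple(float(c) for c in points[-1]))
    return curve


control = [(0, 0), (1, 2), (3, 3), (4, 0)]
curve = catmull_rom(control)
```

Kochanek-Bartels splines reuse the same hermite() function and only change how the tangents are computed (tension, continuity and bias parameters), which is why testing them should mostly be a matter of swapping catmull_rom_tangent().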

Now it's time to think about how to use this knowledge and how to apply the given approach to detect and decode letters in an image...

/devel/math/bezier :: Link / Comments (0)


Sat, 12 Jul 2008

Monday evening prognosis.

It promises to be just bloody excellent!

My old Select!

I have not played football for several years, but I've found people who do like it, so our games promise to be really fun and interesting! There are already three teams (4+1 players per team). This will be my first game after about 5 years of football silence; likely there is nothing left in the legs which can help with playing football (climbing likely does not correlate with it, nor does my experience in other physical training), but nevertheless I'm looking forward to this game, which promises to be exceptionally cool!

/life :: Link / Comments (0)


Fri, 11 Jul 2008

Spline graphical interpolation fun.

Bezier/Hermite interpolation interface

Playing with different spline interpolation methods. So far they seem to be quite simple when written in matrix form, so I cooked up a simple GTK application to test various methods.
There is no interpolation implementation yet, since I devoted the last two days to reading lots of materials about Bezier and Hermite interpolation techniques (as well as lots of papers about distributed hash tables, which I will use as the filesystem storage base for POHMELFS).
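To show what "matrix form" means here: a cubic curve is the row vector (t^3, t^2, t, 1) multiplied by a 4x4 basis matrix and then by four geometry vectors. The Python sketch below uses the standard Bezier and Hermite basis matrices; the evaluate() helper and the ordering of the Hermite geometry as (p0, p1, m0, m1) are my assumptions for this example.

```python
# Rows correspond to t^3, t^2, t and 1; columns to the four geometry vectors.
BEZIER = [
    [-1, 3, -3, 1],
    [3, -6, 3, 0],
    [-3, 3, 0, 0],
    [1, 0, 0, 0],
]

# Geometry order assumed here: start point, end point, start tangent, end tangent.
HERMITE = [
    [2, -2, 1, 1],
    [-3, 3, -2, -1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]


def evaluate(basis, geometry, t):
    """Compute (t^3, t^2, t, 1) * basis * geometry for one value of t."""
    powers = [t ** 3, t ** 2, t, 1.0]
    weights = [sum(powers[row] * basis[row][col] for row in range(4))
               for col in range(4)]
    dims = len(geometry[0])
    return tuple(sum(w * g[k] for w, g in zip(weights, geometry))
                 for k in range(dims))


def bezier_to_hermite(p0, p1, p2, p3):
    """Same curve in Hermite form: the end tangents are 3*(p1-p0) and 3*(p3-p2)."""
    m0 = tuple(3 * (b - a) for a, b in zip(p0, p1))
    m1 = tuple(3 * (b - a) for a, b in zip(p2, p3))
    return [p0, p3, m0, m1]


control = [(0, 0), (0, 1), (1, 1), (1, 0)]
middle = evaluate(BEZIER, control, 0.5)
same_middle = evaluate(HERMITE, bezier_to_hermite(*control), 0.5)
```

Switching the interpolation method then means switching the matrix (and the meaning of the geometry vectors), while the evaluation code stays the same.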

/devel/captcha :: Link / Comments (6)


Wed, 09 Jul 2008

Captcha transformation algorithms.

Couple of first ideas. Pretty trivial.

Average sliding and normalization algorithms

Next step is to squeeze the images, so that all bold lines are thinned to single-pixel ones. In theory it should not be very complex (I have an algorithm in mind), but in practice it will be - I am starting to recall why the hell I learnt LISP.
The basic idea is to transform the above BW pictures into a simple binary format, which will be read by a LISP application, since I do not know and do not want to devote much time to learning how to parse/process various image formats; instead that is done by a GTK application written in C. I believe LISP was called the best language for artificial intelligence development for a reason, so I will try to find out why.
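A rough Python sketch of such a pipeline follows: sliding average, normalization, thresholding to black and white, and dumping into a trivial binary file. It is only my guess at a reasonable shape of these steps, not the actual GTK code: the window radius, the 0.5 threshold and the binary layout (16-bit width and height followed by one byte per pixel) are all assumptions.

```python
import struct


def sliding_average(image, radius=1):
    """Replace every pixel with the mean of its (2*radius+1)^2 neighbourhood."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(sum(window) / len(window))
        result.append(row)
    return result


def normalize(image):
    """Stretch pixel values linearly into the [0, 1] range."""
    low = min(min(row) for row in image)
    high = max(max(row) for row in image)
    span = high - low
    if span == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(value - low) / span for value in row] for row in image]


def to_binary(image, threshold=0.5):
    """Turn a normalized image into a 0/1 bitmap."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


def pack_bitmap(bitmap):
    """Hypothetical file layout: big-endian width and height, then one byte per pixel."""
    height, width = len(bitmap), len(bitmap[0])
    pixels = bytes(value for row in bitmap for value in row)
    return struct.pack(">HH", width, height) + pixels


gray = [
    [200, 200, 200, 200],
    [200, 40, 40, 200],
    [200, 40, 40, 200],
    [200, 200, 200, 200],
]
bitmap = to_binary(normalize(sliding_average(gray)))
blob = pack_bitmap(bitmap)
```

A fixed-size header plus raw bytes is about the simplest thing a reader in another language can parse, which is the whole point of not touching real image formats on the LISP side.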

Slacking - rox :)

/devel/captcha :: Link / Comments (6)


Tue, 08 Jul 2008

Anecdotes and allegories.

I'm not a major kernel contributor, but I was invited 3 times in the last 3 years to the kernel summit.
And I will try to get to this year's one in Portland, Oregon; at least I have started some preparations and contacted the needed people. I hope I will also participate in the Plumbers conference.
As before, I will bring a bottle of vodka (the number of people who wanted to talk suddenly dropped to the ground) and will greatly appreciate your contacts and discussion topics :)
That's of course if the stars line up, but I will push them a bit.

/devel/other :: Link / Comments (0)


Mon, 07 Jul 2008

New POHMELFS release.

Irish 'Clontarf' and Scotch 'Grant's' helped to roll this release out.

Features of this POHMELFS release include:

  • Strong cryptography support. One can encrypt the whole data channel (except headers) and/or hash/digest it. The system will try to autoconfigure itself, and if the server does not support the requested algorithms, the mount will either fail (if a special mount option is specified) or disable usage of the appropriate algorithm.
  • Bug fixes.
Cryptography support is an essential addition to the POHMELFS core. It was implemented with performance in mind, so that processing speed would not drop noticeably even in case of very CPU-hungry operations (one can check the performance graphs).
POHMELFS utilizes a pool of crypto threads (their number can be specified via a mount option), which perform crypto processing of the data and submit it either to the network or to the VFS layer.
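The general idea of a crypto thread pool can be shown in userspace Python with nothing but the standard library. This is only an illustration of the pattern, not the kernel implementation: it does SHA1 hashing only (the standard library has no AES), and the names digest_page() and process(), as well as the "network"/"vfs" labels, are invented for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest_page(page):
    """The crypto work for one page; here only a SHA1 digest is computed."""
    return hashlib.sha1(page).digest()


def process(pages, direction, crypto_threads=4):
    """Hash all pages on a pool of threads, then say where the result goes.

    Written data continues to the network, received data goes up to the VFS.
    """
    with ThreadPoolExecutor(max_workers=crypto_threads) as pool:
        digests = list(pool.map(digest_page, pages))   # results keep page order
    destination = "network" if direction == "write" else "vfs"
    return destination, list(zip(pages, digests))


pages = [bytes([i]) * 4096 for i in range(8)]
destination, hashed = process(pages, "write", crypto_threads=4)
```

The number of workers plays the role of the mount option: it bounds how many pages are processed in parallel.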

Now I will concentrate mostly on userspace server features, mainly its distributed facilities. The current ability to write data to multiple servers and balance reading among them is not enough for POHMELFS, but it will be an essential building block of the fully distributed fault-tolerant parallel filesystem.

If this development requires some changes on the kernel side (namely a network protocol extension), they will be done in the upcoming releases, together with fixes for any bugs found.

As usual, you can grab sources from archive or via GIT tree.
You can also check POHMELFS homepage to get more details on its design and supported features.

P.S. I think I will have some rest from this project for several days, which will allow me to concentrate on the main POHMELFS features and work out rough edges. I will switch to DST and netchannels (mainly to make new releases) and then will devote some time to captcha cracking algorithms.

/devel/fs :: Link / Comments (4)


POHMELFS crypto processing performance.

If you expected a miracle, it did not happen, so I just present a picture, where I compared plain async in-kernel NFS server (no encryption, no checksumming) versus POHMELFS, which performed SHA1 hashing and AES-128-CBC encryption of the whole data channel.
The block size used in the iozone test is 8KB, the file size is 8GB, with 1GB of RAM.

Encrypted + hashed POHMELFS vs plain NFS

/devel/fs :: Link / Comments (4)


Sun, 06 Jul 2008

Vodka drinks.

Vodka itself is a very interesting drink, but depending on the situation it can be either the cheapest way to become very drunk, or a possibility to have a long and fun time in good company.
Frequently (and likely most of the time) vodka is used for the first case only, which is sad of course.

I do not know when and how vodka became popular in Russia, but I think it is always associated with my country now. Actually every nation has some kind of vodka in its own history of drinks, and likely still has it. For example, the UK and Ireland have whiskey, which is effectively vodka, but aged in oak barrels. This brings a very interesting taste, which allows one to use it as a kind of long drink (especially with ice). After having a whiskey shot one can start breathing air in (especially via the nose), which brings the aftertaste directly into the brain and to every piece of the body. I do not know any cocktails based on whiskey.
In my opinion, Irish whiskey is much more tasty and interesting than (probably the original) Scotch, although the former has many more labels.
The USA also drinks whiskey, but most of the time it is its own labels, which I have not tried yet. The USA does not have its own popular drink though, or at least I do not know of it.

Europe also has lots and lots of different vodka kinds.
The French drink cognac. I do not like it, and believe that it is only coloured, tasteless vodka, likely even the best labels like Remy Martin and Hennessy (although the latter was founded by an Irishman :), but it is only a matter of taste of course. The cognac creation process is a bit more complex than that of vodka, and it also has a very different taste, which (for me) is very similar to clean vodka. Cognac is one of the most popular strong drinks. The culture of drinking it is forgotten, but nevertheless it is very interesting. Cognac should be drunk only at a special temperature (16 degrees Centigrade) in a glass of a special shape, which concentrates its aroma. Cognac is not swallowed immediately, but held in the mouth for a while to get all the taste.
The French also created absinthe. This is a very strong drink (up to 90 degrees), but its main feature is thujone. History tells us that thujone, supposedly quite a strong hallucinogen, was the main reason why absinthe was forbidden in Europe. History also tells us that its concentration never exceeded 10%, so it is unlikely that it had any strong effect. Vincent van Gogh liked it very much; there is even a theory that he cut off his ear during absinthe intoxication, but likely it was some special absinthe, since a thujone concentration of 10% or less does not have any significant effect. Right now absinthe is allowed in most of the countries where it was forbidden 200 years ago.
Eastern Europe drinks various kinds of vodka, which have local names.
For example the so-called Cha-Cha, which is a quite strong (up to 80 degrees) drink, but usually very clean, so it can be drunk without dilution.

The New World (mostly Mexico) brings us a very interesting vodka-like drink called tequila. It is frequently called Mexican vodka, although the US also produces its own labels. There are also types which are aged in French cognac barrels.
Usually it is drunk with salt, lime (sometimes lemon) and a mulatto female. The process is very interesting: you lick the mulatto's hip, cover it with salt, lick it, have a tequila shot and eat a slice of lime. Even without the mulatto it is still a very tasty drink. Tequila is made out of special agave sorts; the more agave it has, the higher the quality.

One of the best-known vodka-like drinks from the Caribbean is rum. It is also quite a strong drink, but because of its oil-like elements it is sweeter and very tasty. Rum is likely one of the most widely used strong drinks for cocktails.

I know that Koreans also very much like their own kind of vodka, which has a lower alcohol concentration, namely 20 degrees. It is made out of rice. It is a very popular drink to mix with beer. Blows your roof off after just a couple of shots.

Ukrainians have a very interesting drink called 'Gorilka', which is effectively vodka with pepper. It is very tasty, but never eat the Gorilka pepper, or you risk getting a peptic poisoning.

There are several vodka mixes.
The first and likely the best known is the 'Screwdriver', which is vodka mixed with juice. It is not very tasty imho. One of the strongest roof-blowing drinks is the so-called 'ruff', a mixture of vodka and beer. Do not try it if you do not know what it is.
I also know one vodka long drink: vodka with Martini mixed one to one. Although it looks quite strong, it is a very tasty drink with an excellent sweet and slightly dry taste.
Using my small cellar I created (or at least tried for the first time) another long drink, which consists of vodka mixed with 'Malibu' rum. It is also possible to add juice or cold tea to it.

Weekend...

/other :: Link / Comments (9)


Multithreaded POHMELFS crypto processing.

While having a rest from various celebrations, I managed to complete multithreaded crypto processing for the receiving path in POHMELFS.
So far it has only been tested in a debug environment (i.e. zillions of logs and overall miserable performance), but it shows that different threads pick up the work, both in the sending and receiving directions.
There is a limitation though: the same crypto threads are used for both the receive and transmit paths, so it is possible to saturate them all, for example with receiving, so that sending will stall. If there are insufficient crypto threads, waiting for RX crypto processing can take too long, so the transmit watchdog scanner will fire and complete transactions with errors. One can work around this by specifying a big enough number of crypto threads or a long enough transaction scanning timeout; both are provided via mount options.

I would like to test it in a more production-like environment and perform various stress tests on it, but I'm far from my workplace, so I can not do it right now. This means the release will be postponed until tomorrow (if testing does not show regressions or bugs).

This will not be the last feature release though: for example, POHMELFS does not support extended attributes and ACLs, and there is no header checksum (although there is a reserved 32-bit field). There may be some features in different areas too, but I am in no hurry to implement them, since I need something to put into future POHMELFS changelogs. I think sending the same kernel patch with different words about userspace server changes is not the way to go, so there should be some kernel changes too :)

I will draw up some design notes on how I plan to implement the POHMELFS server, namely how the distributed facilities will be done. So far I have a quite clear picture in mind, but it needs to be worked out 'on paper' to find rough corners.

Stay tuned!

/devel/fs :: Link / Comments (0)


Sat, 05 Jul 2008

Midnight creatiff. Casted by LHC start.

- Shit! There are no more M8 screw-nuts.
- What? Use M12, bozon should pass through.
- We all will be fucked this Monday!

Building LHC

Good night. Actually, as a former physicist I can say that at least two out of the four killing theories are really stupid, but nevertheless it's interesting!

/other :: Link / Comments (2)


Fri, 04 Jul 2008

In case we will die this Monday...

I've started a countdown...

Countdown has been started

The Large Hadron Collider will be started in 3 days...

/other :: Link / Comments (0)


Thu, 03 Jul 2008

POHMELFS crypto support has been completed.

kernel$ git commit -a
Created commit b07e3ed: Added crypto support.
 9 files changed, 1534 insertions(+), 221 deletions(-)
 create mode 100644 fs/pohmelfs/crypto.c

fserver$ git commit -a -m "Aded crypto support."
Created commit f916b2f: Aded crypto support.
 3 files changed, 788 insertions(+), 94 deletions(-)
I implemented a pool of crypto processing threads (their number is a mount option parameter), each of which has a pool of pages to encrypt data into. A crypto thread is not released until the server acknowledges that the data was successfully written, so one should tune the number of threads and the page pool size (the number of pages in each thread is the maximum number of pages per transaction; this limit has its own mount option too) according to the desired behaviour.
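Why the number of threads needs tuning is easy to see in a toy model. The Python sketch below is not the POHMELFS code: it uses no real threads, the XOR "cipher" is a stand-in for real encryption, and CryptoThread, CryptoPool, start() and ack() are names invented here. It only models the rule that a thread and its pages stay busy until the server acknowledges the write.

```python
def toy_encrypt(page, key):
    """Stand-in for a real cipher: XOR every byte with a one-byte key."""
    return bytes(byte ^ key for byte in page)


class CryptoThread:
    def __init__(self, index, pages_per_thread):
        self.index = index
        self.capacity = pages_per_thread   # maximum pages per transaction
        self.pages = []                    # encrypted copies kept until the ack
        self.busy = False


class CryptoPool:
    def __init__(self, threads, pages_per_thread):
        self.threads = [CryptoThread(i, pages_per_thread) for i in range(threads)]

    def start(self, pages, key):
        """Encrypt a transaction on an idle thread; return None if all are busy."""
        for thread in self.threads:
            if len(pages) > thread.capacity:
                raise ValueError("transaction is larger than the page pool")
            if not thread.busy:
                thread.busy = True
                thread.pages = [toy_encrypt(page, key) for page in pages]
                return thread
        return None

    def ack(self, thread):
        """The server confirmed the write: release the thread and its pages."""
        thread.pages = []
        thread.busy = False


pool = CryptoPool(threads=2, pages_per_thread=4)
data = [b"a" * 16, b"b" * 16]
first = pool.start(data, key=0x5A)
second = pool.start(data, key=0x5A)
third = pool.start(data, key=0x5A)       # both threads are waiting for acks
pool.ack(first)
fourth = pool.start(data, key=0x5A)      # the released thread is reused
```

With N threads there can be at most N unacknowledged transactions, so a slow server turns directly into stalled writers unless the pool is large enough.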

Testing shows that write performance dropped noticeably with this approach: with 4 encryption threads and 4 receiving threads in the server, performance dropped by around 30%, from 65+ MB/s down to 46+ MB/s, but I think it can be improved with a larger number of encryption threads. During the iozone write/rewrite test each of the 4 crypto threads ate about 20-30% of CPU, while the server ate about 130% (4 threads in total). In all previous iozone tests, the larger the number of userspace threads used, the worse the results were (this is somewhat expected, since iozone is a single-threaded benchmark, so a larger number of threads only leads to performance degradation), so I will test different setups (namely a larger number of crypto threads and a smaller number of server threads).
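The arithmetic behind these numbers, as a tiny Python check. It uses only the figures quoted above and treats "65+" and "46+" as exactly 65 and 46 MB/s, which is an approximation.

```python
def relative_drop(before, after):
    """Fraction of throughput lost when going from 'before' to 'after'."""
    return (before - after) / before


write_drop = relative_drop(65.0, 46.0)            # MB/s before and after crypto
crypto_cpu_low, crypto_cpu_high = 4 * 20, 4 * 30  # four threads at 20-30% each

print(f"write throughput drop: {write_drop:.1%}")
print(f"crypto threads in total: {crypto_cpu_low}-{crypto_cpu_high}% CPU")
```

So the drop is a bit over 29%, and the four crypto threads together use roughly as much CPU as the whole userspace server.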

But this behaviour is not a problem, and I expect it to be tuned; the real problem is read performance. Right now there is only a single thread, which reads from one socket. This was done intentionally, since reading data from a socket takes longer than searching for a page in the radix tree or any other operation performed by that thread, so there is no way to saturate it. That is, until we add crypto processing, which is slow: subsequent data can not be read from the socket in parallel with crypto processing, and overall read performance drops to the ground.

This problem has to be fixed, so I plan to use the same crypto processing threads to decrypt and/or verify the hash of received data and push it up to the VFS stack.

/devel/fs :: Link / Comments (0)


Wed, 02 Jul 2008

POHMELFS crypto: feel incredibly stupid.

First, POHMELFS does need to have encryption, because I plan to use a distributed hash table approach in the server (well, consider the POHMELFS kernel client a kind of bittorrent filesystem client), and as in any non-centralized system, content transferred via uncontrolled data channels has to be encrypted.

But... I'm incredibly stupid: I implemented encryption and decryption in place, i.e. a VFS page is encrypted prior to being written to the servers, so a subsequent read leads to... Yes, it reads encrypted content.
To fix this issue I plan to encrypt data into different pages and send those, leaving the VFS ones as is. There are two approaches I consider:

  • allocate and send pages at writeback time: if we want to send 5 pages, allocate 5 pages, encrypt data into them and broadcast them to all needed servers.
  • allocate a (potentially large) pool of pages per crypto thread at mount time and encrypt data into them. This will have about zero run-time overhead for the VFS, except for write completion being slightly delayed because of encryption.
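The bug and the fix can be reproduced in a few lines of Python. A bytearray plays the role of a cached VFS page, and XOR with a one-byte key stands in for the real cipher; both are simplifications chosen for this sketch, and the function names are invented here.

```python
def toy_encrypt(data, key):
    """Stand-in for a real cipher: XOR with a one-byte key (it is its own inverse)."""
    return bytes(byte ^ key for byte in data)


def writeback_in_place(page, key, wire):
    """Buggy variant: the cached page itself is overwritten with ciphertext."""
    page[:] = toy_encrypt(page, key)
    wire.append(bytes(page))


def writeback_with_copy(page, key, wire):
    """Fixed variant: ciphertext goes into a separate buffer, the page is untouched."""
    scratch = bytearray(toy_encrypt(page, key))
    wire.append(bytes(scratch))


key = 0x5A
plain = b"hello, pohmelfs"

cached_buggy = bytearray(plain)
wire_buggy = []
writeback_in_place(cached_buggy, key, wire_buggy)

cached_fixed = bytearray(plain)
wire_fixed = []
writeback_with_copy(cached_fixed, key, wire_fixed)

read_after_buggy_write = bytes(cached_buggy)   # ciphertext, not the original data
read_after_fixed_write = bytes(cached_fixed)   # still the original data
```

Both variants put the same bytes on the wire; the only difference is what a later read from the page cache returns. The two approaches listed above differ only in where the scratch buffer comes from: a fresh allocation per writeback or a pool preallocated at mount time.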

/devel/fs :: Link / Comments (7)


Louis Maggio trumpet school: never smile.

/life :: Link / Comments (0)


Holy shit: kernel summit.

We would like to invite you to the 2008 Kernel summit, and we hope that you will be able to join us...
I'm trying to recall previous kernel summit:



That was fun, but no one wanted to play football instead of talking about whatever we talked about.

For that year I only committed a HIFN driver into the tree, and there was no kevent :)

This time in US, thinking...

/devel/other :: Link / Comments (5)


Tue, 01 Jul 2008

Why is blocking sending considered harmful?

I frequently hear that whatever server you implement, it has to be non-blocking, since in case of parallel sending this allows one to send multiple requests to fast servers while not sending data to a slow server, since the non-blocking socket will return EAGAIN.

This is only a half-right solution: when we have to put the given data on all servers and can not free it until all servers have replied with an acknowledgement, non-blocking mode can bring more damage than gain.

Mainly because it allows eating all the memory for requests which are still in the queue to be sent to the slow server and which were already sent to the fast ones. In this case the higher-level application (consider a simple application which generates some data and writes it into a file in a distributed filesystem, which writes the file to several servers) will never block, since the transfer to fast servers completes quickly, and it will provide more and more data, which will consume all RAM.

It is possible to deadlock the system in this case, since to send some data to a remote server we always have to allocate at least some memory to put network headers into. With a non-blocking solution we will consume all memory and kick the system into a coma.
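The effect is easy to see in a small simulation. This Python sketch is a deliberately crude model built on my own assumptions: time goes in ticks, the application produces a fixed number of blocks per tick, each server acknowledges a fixed number of blocks per tick, and a block is freed only when both the fast and the slow server have acknowledged it. The max_in_flight limit models a sender that blocks the writer.

```python
def simulate(ticks, produce_rate, fast_rate, slow_rate, max_in_flight=None):
    """Return the peak number of blocks held in memory during the run."""
    produced = acked_fast = acked_slow = 0
    peak = 0
    for _ in range(ticks):
        budget = produce_rate
        if max_in_flight is not None:
            # Blocking mode: the writer waits while too many blocks are unacked.
            in_flight = produced - min(acked_fast, acked_slow)
            budget = min(budget, max(0, max_in_flight - in_flight))
        produced += budget
        acked_fast = min(produced, acked_fast + fast_rate)
        acked_slow = min(produced, acked_slow + slow_rate)
        # A block can be freed only after every server has acknowledged it.
        peak = max(peak, produced - min(acked_fast, acked_slow))
    return peak


unbounded_peak = simulate(ticks=1000, produce_rate=10, fast_rate=10, slow_rate=1)
bounded_peak = simulate(ticks=1000, produce_rate=10, fast_rate=10, slow_rate=1,
                        max_in_flight=64)
```

Without a limit the amount of held memory grows linearly with time, no matter how fast the fast server is; with a limit it stays bounded and the writer simply slows down to the speed of the slowest server.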

/devel/networking :: Link / Comments (2)


Passive OS fingerprinting.

I've updated the OSF modules to xtables, so you have to enable its support in the kernel config and get a recent iptables (I tested with 1.4.1.1, which is the latest release to date).

OSF allows you to match incoming packets against different sets of SYN-packet parameters and determine which operating system is on the remote end, so you can make decisions based on OS type and, to some degree, even version.
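The principle of passive fingerprinting can be illustrated with a small Python sketch. Everything in it is hypothetical: the signature fields are the kind of SYN-packet properties such tools typically look at, but the table below contains made-up entries with made-up OS names, and it is not the format or content of the real OSF fingerprint database.

```python
from typing import NamedTuple, Optional, Tuple


class SynSignature(NamedTuple):
    window: int                  # TCP window size from the SYN packet
    ttl: int                     # initial TTL (guessed from the observed one)
    df: bool                     # "don't fragment" bit
    options: Tuple[str, ...]     # TCP options in the order they appear


# Made-up fingerprints, for illustration only.
FINGERPRINTS = {
    "ExampleOS 1.x": SynSignature(5840, 64, True,
                                  ("mss", "sackok", "timestamp", "nop", "wscale")),
    "OtherOS 2.x": SynSignature(65535, 128, True,
                                ("mss", "nop", "nop", "sackok")),
}


def initial_ttl(observed_ttl):
    """Round the observed TTL up to the nearest common initial value."""
    for candidate in (32, 64, 128, 255):
        if observed_ttl <= candidate:
            return candidate
    raise ValueError("TTL does not fit into one byte")


def match_os(window, observed_ttl, df, options) -> Optional[str]:
    """Return the name of the first fingerprint equal to the packet's signature."""
    seen = SynSignature(window, initial_ttl(observed_ttl), df, tuple(options))
    for name, fingerprint in FINGERPRINTS.items():
        if fingerprint == seen:
            return name
    return None


guess = match_os(5840, 57, True, ["mss", "sackok", "timestamp", "nop", "wscale"])
unknown = match_os(1024, 57, False, ["mss"])
```

The TTL has to be rounded up because every router on the way decrements it, so the observed value is the initial one minus the number of hops.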

Installation instructions, an example and source code can be found on the homepage.

I've also sent it to the netfilter-devel@ and netdev@ mailing lists, since my previous mails never appeared there, likely because of spam filters.

/devel/networking :: Link / Comments (0)