Zbr's days.

Tue, 23 Oct 2007

DST as shared disk storage.

Yes, it is possible. DST is a transport layer for a high-performance parallel filesystem, and everything needed for that is already completed. Consider the case where multiple filesystem nodes (nodes that hold only metadata, not data) connect to the same storage nodes (which hold the storage itself): several remote nodes can connect to a single local export node and perform concurrent read/write access. Similar storage is likely the base of all computing cluster filesystems (for example GPFS, PVFS, GFS).

Of course, this requires a higher-layer filesystem to manage locks and concurrency in order to preserve filesystem state, but the storage layer itself is already fully implemented in DST.
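
To show why that higher layer is needed, here is a minimal sketch in Python. It is not DST code: `SharedStorage`, `FsNode` and the one-byte counter are hypothetical names made up for illustration. Two filesystem nodes do a read-modify-write on the same block of shared storage, and the interleaving is forced by hand so that the lost update is deterministic.

```python
import threading


class SharedStorage:
    """Hypothetical stand-in for a storage node exported to several clients."""

    def __init__(self, size=16):
        self.blocks = bytearray(size)
        # A lock that a higher-layer filesystem would have to provide;
        # the storage layer itself never takes it.
        self.lock = threading.Lock()

    def read(self, offset):
        return self.blocks[offset]

    def write(self, offset, value):
        self.blocks[offset] = value


class FsNode:
    """Hypothetical filesystem node updating a counter on shared storage."""

    def __init__(self, storage):
        self.storage = storage
        self.cached = 0

    def load(self, offset):
        self.cached = self.storage.read(offset)

    def store_incremented(self, offset):
        self.storage.write(offset, self.cached + 1)

    def locked_increment(self, offset):
        # Read-modify-write done as one unit under the higher-layer lock.
        with self.storage.lock:
            self.load(offset)
            self.store_incremented(offset)


def unlocked_demo():
    storage = SharedStorage()
    a, b = FsNode(storage), FsNode(storage)
    # Forced interleaving: both nodes read before either one writes.
    a.load(0)
    b.load(0)
    a.store_incremented(0)
    b.store_incremented(0)
    return storage.read(0)  # one of the two increments is lost


def locked_demo():
    storage = SharedStorage()
    a, b = FsNode(storage), FsNode(storage)
    a.locked_increment(0)
    b.locked_increment(0)
    return storage.read(0)


if __name__ == "__main__":
    print("without locking:", unlocked_demo())
    print("with locking:", locked_demo())
```

The storage object happily accepts both writes in either case; only the layer above it can know that the two updates belong to one logical operation.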

/devel/dst :: Link / Comments (0)


Read balancing is not supported in the Linux RAID implementation.

I cannot believe it, but it looks like this is true.
Even distributed storage supports it.
It looks like this was added to the kernelnewbies projects after my announcements :)
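
For readers unfamiliar with the term, read balancing means spreading reads across mirrors that hold the same data. Below is a minimal round-robin sketch in Python; it is an illustration only, not how DST or any Linux RAID code does it, and `Mirror` and `BalancedMirrorSet` are hypothetical names.

```python
import itertools


class Mirror:
    """Hypothetical mirror leg holding a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = bytes(data)
        self.reads = 0  # how many reads this leg has served

    def read(self, offset, length):
        self.reads += 1
        return self.data[offset:offset + length]


class BalancedMirrorSet:
    """Sends each read to the next mirror in turn (round-robin)."""

    def __init__(self, mirrors):
        self.mirrors = list(mirrors)
        self._next = itertools.cycle(self.mirrors)

    def read(self, offset, length):
        return next(self._next).read(offset, length)


def demo():
    payload = b"0123456789abcdef"
    legs = [Mirror("m0", payload), Mirror("m1", payload)]
    array = BalancedMirrorSet(legs)
    # Four 4-byte reads covering the whole payload.
    chunks = [array.read(i, 4) for i in range(0, 16, 4)]
    return b"".join(chunks), [leg.reads for leg in legs]


if __name__ == "__main__":
    data, counts = demo()
    print(data, counts)
```

Round-robin is the simplest possible policy; a real implementation would more likely weigh things such as queue depth or head position, which this sketch ignores.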

/devel/other :: Link / Comments (0)


Studying existing distributed filesystems.

I have already written short notes about GoogleFS, Hadoop FS (HDFS) and DragonFlyBSD HAMMER as part of the preparation for the new filesystem development.

Now let's move into a slightly different area: IBM's GPFS (originally Tiger Shark FS) and PVFS (the second version, of course).

Here are my short notes about PVFS2, which I got from its design notes:

  • virtual filesystem: it works not as a real filesystem but as a userspace wrapper on top of an ordinary filesystem, just like GoogleFS.
  • not POSIX compliant: what I do not get is its interface, which is heavily MPI oriented. It is possible to use the usual POSIX syscalls with a special kernel module, but it does not have file semantics: there are no files, only references, which can be deleted without regard for others who already have them open.
  • no redundancy: this problem is handled either by having shared storage or by so-called lazy redundancy, which basically means a new helper for user applications that lets them force redundant writes for a given file.
  • lockless metadata updates: sounds like a really good idea, built on a strict state machine for the update process, but in practice it can lead to complex races and fallbacks, which may be complex enough that going lockless is not worth it.
  • userspace IO daemons: PVFS2 uses a traditional UNIX filesystem to store data and Berkeley DB to store metadata.
  • really bad at serving several types of load, such as executing off the filesystem, shared mmapping of files, and storing mail in mbox format. PVFS2 was designed for different loads.
This filesystem was designed for the sole purpose of handling heavy dataflows created by huge scientific MPI applications. Most of it works in userspace.
But what I really like is how it was written: with bits of fun, self-irony and an excellent description of what it is for, with no empty advertising words or other pathos crap.
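
For contrast with the missing file semantics noted in the list above, this is the behaviour an ordinary POSIX filesystem gives: a file that is deleted while open stays readable through the existing descriptor until it is closed. The sketch below uses only the Python standard library on a local temporary file; it assumes a POSIX system and says nothing about how PVFS2 itself behaves beyond what the notes state.

```python
import os
import tempfile


def read_after_unlink():
    """Delete a file while it is open, then read it through the open fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                 # the name is removed...
        gone = not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)    # ...but the open descriptor still works
        data = os.read(fd, 100)
    finally:
        os.close(fd)                    # only now can the data be reclaimed
    return gone, data


if __name__ == "__main__":
    print(read_after_unlink())
```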

/devel/fs :: Link / Comments (11)