Tue, 23 Oct 2007
DST as shared disk storage.
Yes, it is possible. DST can act as a transport layer
for a high-performance parallel filesystem, and everything needed
for that is already completed. Consider the case where multiple filesystem nodes,
i.e. nodes which hold only metadata and no data, connect
to the same storage nodes (which hold the storage itself): several remote
nodes can connect to a single local export node and perform
concurrent read/write access. Similar storage is the base of probably all
computing cluster filesystems (for example GPFS,
PVFS, GFS).
This of course requires a higher-layer filesystem to manage locks
and concurrency in order to preserve filesystem state, but the storage layer itself
is already fully implemented in DST.
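To illustrate why the higher layer is needed, here is a minimal Python sketch, not DST code. Threads stand in for filesystem nodes, `SharedStorage` for a storage node that does plain block reads and writes, and `LockManager` for the locking that the higher-layer filesystem would have to provide. All names and the block size are hypothetical.

```python
import threading

BLOCK_SIZE = 8  # bytes per block in this toy storage (hypothetical value)


class SharedStorage:
    """Stands in for a storage node: plain block reads and writes, no locking."""

    def __init__(self, blocks):
        self.data = bytearray(blocks * BLOCK_SIZE)

    def read(self, block):
        start = block * BLOCK_SIZE
        return bytes(self.data[start:start + BLOCK_SIZE])

    def write(self, block, payload):
        assert len(payload) == BLOCK_SIZE
        start = block * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = payload


class LockManager:
    """Stands in for the higher-layer filesystem: one lock per block."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, block):
        # The guard makes creation of a per-block lock atomic.
        with self._guard:
            return self._locks.setdefault(block, threading.Lock())


def increment_counter(storage, locks, block, times):
    """Read-modify-write of a counter kept in one block, under the block lock."""
    for _ in range(times):
        with locks.lock_for(block):
            value = int.from_bytes(storage.read(block), "big")
            storage.write(block, (value + 1).to_bytes(BLOCK_SIZE, "big"))


def run_nodes(nodes=4, times=1000):
    """Let several 'filesystem nodes' update the same block concurrently."""
    storage = SharedStorage(blocks=2)
    locks = LockManager()
    threads = [
        threading.Thread(target=increment_counter, args=(storage, locks, 0, times))
        for _ in range(nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return int.from_bytes(storage.read(0), "big")
```

Without the per-block lock the read and the write of two nodes can interleave and updates get lost; the storage layer cannot detect that on its own, which is exactly the job left to the filesystem above it.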
/devel/dst :: Link / Comments (0)
Read balancing is not supported in the Linux RAID implementation.
I cannot believe it, but it looks like this is true.
Even distributed storage
supports that.
It looks like after my announcements this was added to the
kernelnewbies projects :)
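For readers unfamiliar with the term, read balancing over a mirror can be sketched in a few lines of Python. This is only an illustration under simple assumptions (round-robin selection, in-memory replicas, a hypothetical `Mirror` class); it is not how DST or the Linux RAID code chooses a replica.

```python
import itertools


class Mirror:
    """Toy mirror: writes go to every replica, reads are spread round-robin."""

    def __init__(self, replicas):
        self.replicas = [dict() for _ in range(replicas)]
        self.reads = [0] * replicas  # per-replica read counters
        self._next = itertools.cycle(range(replicas))

    def write(self, block, payload):
        # Every replica must hold the same data, so a write hits all of them.
        for replica in self.replicas:
            replica[block] = payload

    def read(self, block):
        # Any replica can serve a read, so take the next one in turn.
        index = next(self._next)
        self.reads[index] += 1
        return self.replicas[index].get(block)
```

With two replicas, ten reads end up as five per replica, so the read load on each device is halved while writes still cost the same.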
/devel/other :: Link / Comments (0)
Studying existing distributed filesystems.
I already wrote short notes about
googlefs,
Hadoop FS (HDFS) and
DragonflyBSD hammer
as a part of the preparation for the new filesystem development.
Now, let's move into a slightly different area: IBM's
GPFS
(originally Tiger Shark FS) and
PVFS (the second version, of course).
Here are my short notes about PVFS2, taken from its design notes:
- virtual filesystem: it works not as a real filesystem, but as a userspace wrapper on top
of a usual filesystem, just like googlefs.
- not POSIX compliant: what I do not get is its interface, which is heavily MPI
oriented. It is possible to use the usual POSIX syscalls with a special kernel module,
but it does not have file semantics: there are no files, only references,
which can be deleted without regard for others who have already opened them.
- no redundancy: this problem is handled either by having shared storage,
or by using so-called lazy redundancy, which basically means a new helper for user applications
that allows forcing redundant writes for a given file.
- lockless metadata updates: sounds like a really good idea, based on a strict
state machine for the update process, but in practice it is possible to have complex
races and fallbacks, which can be complex enough that going lockless is not worth it.
- userspace IO daemons: PVFS2 uses a traditional UNIX filesystem to store data and Berkeley DB to
store metadata.
- really bad at serving several types of loads, like executing off the filesystem,
shared mmapping of files, or storing mail in mbox format. PVFS2 was designed for different loads.
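The split between data and metadata mentioned in the list can be pictured with a small Python sketch. It is only an illustration and not PVFS2's on-disk format: the standard `dbm.dumb` module stands in for Berkeley DB, and the `TinyStore` class, the SHA-1 based handles and the JSON records are all assumptions made for the example.

```python
import dbm.dumb
import hashlib
import json
import os
import tempfile


class TinyStore:
    """Data lives in ordinary files, metadata in a key/value database."""

    def __init__(self, root):
        os.makedirs(root, exist_ok=True)
        self.root = root
        # dbm.dumb is a stand-in for Berkeley DB (assumption of this sketch).
        self.meta = dbm.dumb.open(os.path.join(root, "meta"), "c")

    def put(self, name, payload):
        # Hypothetical handle scheme: the data file is named after a hash.
        handle = hashlib.sha1(name.encode()).hexdigest()
        with open(os.path.join(self.root, handle), "wb") as f:
            f.write(payload)
        self.meta[name] = json.dumps({"handle": handle, "size": len(payload)})

    def stat(self, name):
        # Answered from the metadata database alone, no data file is opened.
        return json.loads(self.meta[name])

    def get(self, name):
        record = self.stat(name)
        with open(os.path.join(self.root, record["handle"]), "rb") as f:
            return f.read()

    def close(self):
        self.meta.close()


def demo():
    """Store one object in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as root:
        store = TinyStore(root)
        store.put("results.dat", b"hello")
        data, size = store.get("results.dat"), store.stat("results.dat")["size"]
        store.close()
        return data, size
```

Calling `demo()` writes the payload into a plain file and answers the size query from the key/value store, which is the same division of labour the notes describe for the IO daemons.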
This filesystem was designed for the sole purpose of handling heavy dataflows created
by huge scientific MPI applications. Most of it works in userspace.
But what I really like is how it was written: with bits of fun, self-irony and an excellent
description of what it is for, with no empty advertising words or other pathos crap.
/devel/fs :: Link / Comments (11)