A recommended way to organize a MooseFS cluster is as follows:
MFSMaster --> two metaloggers --> a few chunk servers --> client machines. The MooseFS developers suggest a high-availability (HA) solution based on ucarp. From my own experience, ucarp is not the best solution for production, although in most cases it works.
To speed up the data servers (chunk servers), use RAID 0 arrays. RAID 1/5 arrays are redundant because MooseFS itself provides fault tolerance. The more disks a chunk server has, the higher its I/O throughput.
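The speed-up from RAID 0 comes from striping: consecutive stripe units of the logical volume land on different disks, so one large request is served by all disks at once. The sketch below maps a logical offset to a disk and counts how many bytes each disk serves for a sequential request. It assumes a hypothetical 64 KiB stripe unit; the real stripe size is configurable and depends on your RAID setup.

```python
from typing import Dict, Tuple


def raid0_locate(offset: int, disks: int, stripe_unit: int = 64 * 1024) -> Tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk) in RAID 0."""
    stripe_index, within = divmod(offset, stripe_unit)
    disk = stripe_index % disks   # stripe units are dealt out round-robin
    row = stripe_index // disks   # how many full rows precede this unit on the disk
    return disk, row * stripe_unit + within


def disks_touched(offset: int, length: int, disks: int,
                  stripe_unit: int = 64 * 1024) -> Dict[int, int]:
    """Bytes each disk must deliver for one sequential request."""
    per_disk: Dict[int, int] = {}
    pos, end = offset, offset + length
    while pos < end:
        disk, _ = raid0_locate(pos, disks, stripe_unit)
        step = min(stripe_unit - pos % stripe_unit, end - pos)  # stay inside one unit
        per_disk[disk] = per_disk.get(disk, 0) + step
        pos += step
    return per_disk
```

For a 1 MiB read, four disks each deliver 256 KiB and eight disks each deliver 128 KiB, which is why adding disks raises throughput.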
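MooseFS consists of four components: the managing server (master), metadata backup servers (metaloggers), data servers (chunk servers), and client machines that mount the file system with mfsmount.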
mfsmount is based on the FUSE mechanism (Filesystem in Userspace), so MooseFS is available on every operating system with a working FUSE implementation (Linux, FreeBSD, Mac OS X, etc.).
Metadata is stored in the memory of the managing server and simultaneously saved to disk (as a periodically updated binary file and immediately updated incremental logs). The main binary file as well as the logs are synchronized to the metaloggers (if present).
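The snapshot-plus-log scheme can be illustrated with a small in-memory model: every change is applied in memory and appended to a changelog at once, a periodic snapshot resets the log, and recovery (for example, from a metalogger's copy) replays the log on top of the last snapshot. The class, its operation names (`create`, `setlength`, `unlink`) and the dict-based record format are hypothetical and far simpler than real MooseFS metadata.

```python
import copy
from typing import Dict, List, Tuple

Change = Tuple[str, str, int]  # (operation, path, length): hypothetical log record


class MetadataStore:
    """Toy master metadata: in-memory dict, periodic snapshot, incremental changelog."""

    def __init__(self) -> None:
        self.files: Dict[str, int] = {}      # live metadata: path -> file length
        self.snapshot: Dict[str, int] = {}   # last periodically saved image
        self.changelog: List[Change] = []    # changes logged since that snapshot

    def apply(self, op: str, path: str, length: int = 0) -> None:
        """Apply a change in memory and append it to the changelog immediately."""
        change = (op, path, length)
        self._replay(self.files, change)
        self.changelog.append(change)

    def take_snapshot(self) -> None:
        """Periodic dump: save the whole state and start a fresh changelog."""
        self.snapshot = copy.deepcopy(self.files)
        self.changelog = []

    def recover(self) -> Dict[str, int]:
        """Rebuild metadata from the last snapshot plus the changelog."""
        state = copy.deepcopy(self.snapshot)
        for change in self.changelog:
            self._replay(state, change)
        return state

    @staticmethod
    def _replay(state: Dict[str, int], change: Change) -> None:
        op, path, length = change
        if op == "create":
            state[path] = 0
        elif op == "setlength":
            state[path] = length
        elif op == "unlink":
            state.pop(path, None)
        else:
            raise ValueError(f"unknown operation {op!r}")
```

Recovery never needs the live in-memory state, only the last snapshot and the log written since, which is why both are worth copying to metaloggers.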
File data is divided into fragments (chunks) of at most 64 MiB each. Each chunk is itself a file stored on selected disks of the data servers (chunk servers).
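Assuming chunks start at fixed 64 MiB boundaries within a file, finding the chunk that holds a given byte is plain integer arithmetic. The helper names below are illustrative, not part of any MooseFS API.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB, the maximum chunk size given above


def chunk_count(file_size: int) -> int:
    """Number of chunks a file of `file_size` bytes occupies (ceiling division)."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def locate(offset: int) -> Tuple[int, int]:
    """Map a byte offset in the file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunk_ranges(file_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (chunk index, first byte, end byte exclusive) for every chunk of a file."""
    for index in range(chunk_count(file_size)):
        start = index * CHUNK_SIZE
        yield index, start, min(start + CHUNK_SIZE, file_size)
```

A 200 MiB file therefore occupies four chunks: three full 64 MiB chunks and a last one holding 8 MiB.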
High reliability is achieved by running at least as many data servers as the "goal" value (the number of copies to keep) set for a given file, so that each copy can reside on a different server.
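The role of the goal can be sketched as choosing that many distinct chunk servers for each chunk, with usable space shrinking by the same factor. The placement policy here (most free space first, ties broken by name) is a hypothetical stand-in; MooseFS's real placement logic is not described in this text.

```python
from typing import Dict, List


def place_chunk(free_bytes: Dict[str, int], goal: int) -> List[str]:
    """Pick `goal` distinct chunk servers to hold the copies of one chunk.

    Hypothetical policy: most free space first, ties broken by server name.
    """
    if goal < 1:
        raise ValueError("goal must be at least 1")
    if goal > len(free_bytes):
        # Copies must live on different machines, so goal N needs N servers.
        raise ValueError(f"goal {goal} needs {goal} chunk servers, only {len(free_bytes)} available")
    ranked = sorted(free_bytes, key=lambda server: (-free_bytes[server], server))
    return ranked[:goal]


def usable_capacity(raw_bytes: int, goal: int) -> int:
    """Every byte is stored `goal` times, so usable space is about raw / goal."""
    return raw_bytes // goal
```

With a goal of 2, for example, 3 TB of raw chunk-server disk holds roughly 1.5 TB of file data.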
All file operations on a client computer that has mounted MooseFS are exactly the same as with any other file system. The operating system kernel passes every file operation to the FUSE module, which communicates with the mfsmount process. The mfsmount process in turn communicates over the network with the managing server and the data servers (chunk servers). The entire process is fully transparent to the user.
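mfsmount communicates with the managing server every time an operation on file metadata is required, for example when creating or deleting files, reading directories, or reading and changing attributes.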
For file data, mfsmount connects directly to the data server (chunk server) that stores the relevant chunk of the file. When a write finishes, mfsmount sends the managing server the file's new length and last modification time.
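The division of labor between the managing server and the data servers can be simulated in a single process: the client asks the master only where a chunk lives, moves the bytes directly to or from a chunk server, and after a write reports the new length and modification time back to the master. All classes and methods are hypothetical, each chunk has a single copy (goal 1), chunk servers are chosen round-robin, and `chunk_size` is a parameter so the behavior can be tried with tiny chunks.

```python
import time
from typing import Dict, List, Optional, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB maximum chunk size, as described above


class ChunkServer:
    """Data server: stores raw chunk contents keyed by chunk id (hypothetical API)."""

    def __init__(self, name: str):
        self.name = name
        self.chunks: Dict[int, bytearray] = {}

    def write(self, chunk_id: int, offset: int, data: bytes) -> None:
        buf = self.chunks.setdefault(chunk_id, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill a gap before the write
        buf[offset:offset + len(data)] = data

    def read(self, chunk_id: int, offset: int, size: int) -> bytes:
        return bytes(self.chunks.get(chunk_id, b"")[offset:offset + size])


class Master:
    """Managing server: keeps metadata only (length, mtime, chunk locations)."""

    def __init__(self, servers: List[ChunkServer], chunk_size: int = CHUNK_SIZE):
        self.servers = servers
        self.chunk_size = chunk_size
        self.files: Dict[str, dict] = {}
        self.next_chunk_id = 1

    def create(self, path: str) -> None:
        self.files[path] = {"length": 0, "mtime": time.time(), "chunks": {}}

    def locate(self, path: str, index: int, allocate: bool) -> Optional[Tuple[int, ChunkServer]]:
        """Return (chunk id, server) for chunk `index`; optionally allocate a new chunk."""
        chunks = self.files[path]["chunks"]
        if index not in chunks and allocate:
            server = self.servers[self.next_chunk_id % len(self.servers)]  # round-robin
            chunks[index] = (self.next_chunk_id, server)
            self.next_chunk_id += 1
        return chunks.get(index)

    def update_after_write(self, path: str, end: int) -> None:
        meta = self.files[path]
        meta["length"] = max(meta["length"], end)
        meta["mtime"] = time.time()


class Client:
    """Plays the role of mfsmount: metadata via the master, data directly to chunk servers."""

    def __init__(self, master: Master):
        self.master = master

    def write(self, path: str, offset: int, data: bytes) -> None:
        if not data:
            return
        size = self.master.chunk_size
        done = 0
        while done < len(data):
            index, inner = divmod(offset + done, size)
            piece = data[done:done + size - inner]  # never cross a chunk boundary
            chunk_id, server = self.master.locate(path, index, allocate=True)  # metadata request
            server.write(chunk_id, inner, piece)  # data goes straight to the chunk server
            done += len(piece)
        # Once the write is finished, report the new length and mtime to the master.
        self.master.update_after_write(path, offset + len(data))

    def read(self, path: str, offset: int, size: int) -> bytes:
        size = max(0, min(size, self.master.files[path]["length"] - offset))
        out = bytearray()
        while len(out) < size:
            index, inner = divmod(offset + len(out), self.master.chunk_size)
            want = min(size - len(out), self.master.chunk_size - inner)
            loc = self.master.locate(path, index, allocate=False)
            piece = loc[1].read(loc[0], inner, want) if loc else b""
            out += piece.ljust(want, b"\0")  # never-written regions read back as zeros
        return bytes(out)
```

Note that the master never sees file contents; it only answers "which chunk, on which server" and records the final length and mtime.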
Furthermore, data servers (chunk servers) communicate with each other to replicate data in order to achieve the appropriate number of copies of a file on different machines.
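Replication toward the goal can be pictured as a planning step: for each chunk with fewer copies than its goal, schedule a copy from a server that already holds it to one that does not. The function and its least-loaded-target policy are hypothetical illustrations, not MooseFS's actual replication algorithm.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def plan_replication(
    locations: Dict[str, Set[str]],
    goals: Dict[str, int],
    servers: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (chunk, source, target) copy tasks that bring each chunk up to its goal.

    Hypothetical policy: copy from any current holder to the server that
    currently holds the fewest chunks and does not yet have this one.
    """
    load = Counter({s: 0 for s in servers})
    for holders in locations.values():
        load.update(holders)  # count how many chunks each server already stores

    tasks: List[Tuple[str, str, str]] = []
    for chunk in sorted(locations):
        holders = set(locations[chunk])
        if not holders:
            continue  # no surviving copy to replicate from
        for _ in range(goals[chunk] - len(holders)):  # empty range if already at goal
            candidates = [s for s in servers if s not in holders]
            if not candidates:
                break  # fewer servers than the goal; it cannot be reached
            target = min(candidates, key=lambda s: (load[s], s))
            tasks.append((chunk, sorted(holders)[0], target))
            holders.add(target)
            load[target] += 1
    return tasks
```

Each task corresponds to one chunk server sending a chunk to another, so the master only has to decide what to copy where.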