Clustering


On the opposite end from virtualization is clustering, the art of bringing multiple servers together to work on a single task. While this challenge is primarily solved on Linux outside of the kernel (using projects such as the Apache Software Foundation's Hadoop), Linux 3.0 includes significantly improved support for clustered storage, including two new cluster filesystems.


A cluster filesystem is one of the two key building blocks of cluster storage. In short, this type of filesystem is designed to run on a block device that is shared among multiple hosts, for example over iSCSI or another network block device protocol. A cluster filesystem therefore has some of the semantics of a network filesystem, in that multiple hosts can safely access the same data, but it also offers additional benefits, such as redundant failover, which vary depending on the filesystem involved. Linux 3.0 includes two major new cluster filesystems. The first, the Oracle Cluster File System 2 (OCFS2), was, as its name implies, developed by Oracle and was initially optimized for clustering Oracle databases, but it now works well with other types of workloads. The second is the Global File System 2 (GFS2), which has been developed and supported by Red Hat as part of its Enterprise Linux product.
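
To make the idea more concrete, the sketch below shows how one node might mount a shared OCFS2 volume using the standard mount(2) system call. The device path, mount point, and the assumption that the OCFS2 cluster stack is already configured and running on every node are illustrative details, not taken from this document.

    /* Minimal sketch: one node mounting a shared OCFS2 volume.
     * /dev/sdb1 and /mnt/shared are placeholder names; the OCFS2
     * cluster stack must already be set up on all participating nodes. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* mount(2): source device, target directory, filesystem type,
         * mount flags, filesystem-specific options (none passed here) */
        if (mount("/dev/sdb1", "/mnt/shared", "ocfs2", 0, NULL) != 0) {
            perror("mount");
            return 1;
        }
        /* Every other node that shares the block device can issue the
         * same call; the cluster filesystem coordinates concurrent access. */
        return 0;
    }

The same pattern applies to GFS2, with a different filesystem type string and its own cluster infrastructure running underneath.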



The other key building block of cluster storage is the ability to share a block device, a disk, between multiple hosts. Linux already supports several ways of doing this (including NBD, the Network Block Device), but Linux 3.0 adds one more: the Distributed Replicated Block Device (DRBD). DRBD can be used either to replicate a physical disk to a second host on the network for backup purposes, or as the shared storage layer underneath a cluster filesystem like the ones described above.
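
To give a feel for how DRBD is set up, here is a rough sketch of a resource definition in the drbd.conf configuration format. The hostnames, IP addresses, and device paths are made-up placeholders, and the exact syntax can differ between DRBD releases.

    # Hypothetical DRBD resource replicating /dev/sdb1 between two hosts.
    # Hostnames, addresses, and device paths are placeholders only.
    resource r0 {
        protocol C;                  # synchronous replication
        on alpha {
            device    /dev/drbd0;    # replicated device presented to the OS
            disk      /dev/sdb1;     # local backing disk
            address   10.0.0.1:7789;
            meta-disk internal;
        }
        on bravo {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.2:7789;
            meta-disk internal;
        }
    }

Once such a resource is active on both hosts (and, in DRBD's dual-primary mode, promoted to primary on both), the resulting /dev/drbd0 device can carry a cluster filesystem such as OCFS2 or GFS2.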



Although this document covers only those aspects of the Linux kernel that are not marked as "beta" or "experimental" in Linux 3.0, one key technology to watch in upcoming versions is Ceph. This cluster filesystem is a rising star for enterprise applications. Unlike the simple disk-sharing filesystems described above, it can scale across many systems and contains the intelligence to ensure that every file is replicated multiple times within the storage pool. You can scale a Ceph cluster simply by adding a new host, and the software dynamically redistributes data and load to the new node; because each file is replicated, losing a single node does not mean losing data. Ceph has been designed from the ground up for scalability and early reviews are positive, but it is still under development and not yet recommended for production use.


Continue on to Performance and Scalability...