Wednesday, September 3, 2008

GPFS file system

The file system

A GPFS file system is built from a collection of disks that contain the file
system data and metadata. A file system can be built from a single disk or
contain thousands of disks, storing petabytes of data. A GPFS cluster can
contain up to 256 mounted file systems, and there is no limit placed on the
number of simultaneously opened files within a single file system. As an
example, current GPFS customers run single file systems up to 2 PB in
size, and others containing tens of millions of files.

Application interfaces

Applications can access files through standard UNIX® file system
interfaces or through enhanced interfaces available for parallel programs.
Parallel and distributed applications can be scheduled on GPFS clusters to
take advantage of the shared access architecture, which makes GPFS a key
component in many grid-based solutions. Parallel applications can
concurrently read or update a common file from multiple nodes in the cluster.
GPFS maintains the coherency and consistency of the file system using
sophisticated byte-level locking, token (lock) management, and logging.
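
For example, processes on different nodes can each write a disjoint region of
the same file using nothing more than ordinary POSIX calls, and GPFS's
byte-range locking keeps the concurrent writes consistent. The sketch below is
illustrative only: the file path and the RANK environment variable (standing in
for an MPI rank or scheduler-assigned node number) are assumptions, not
anything GPFS requires.

/* Minimal sketch: each node (or process) writes a disjoint region of one
 * shared file through the ordinary POSIX interface.  GPFS byte-range
 * locking lets these writes proceed in parallel without corrupting the
 * file.  RANK is a stand-in for whatever the job scheduler provides. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (4 * 1024 * 1024)   /* 4 MiB region per node */

int main(void)
{
    int rank = atoi(getenv("RANK") ? getenv("RANK") : "0");
    int fd = open("/gpfs/fs1/shared.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(CHUNK);
    if (buf == NULL) { close(fd); return 1; }
    memset(buf, 'A' + (rank % 26), CHUNK);

    /* Disjoint offsets mean no two nodes contend for the same byte range. */
    if (pwrite(fd, buf, CHUNK, (off_t)rank * CHUNK) != CHUNK)
        perror("pwrite");

    free(buf);
    close(fd);
    return 0;
}
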
In addition to the standard interfaces, GPFS provides a unique set of extended
interfaces that can be used to achieve high performance for applications with
demanding data access patterns. These extended interfaces are more
efficient for traversing a file system, for example, and provide more features
than the standard POSIX interfaces.
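
One such extended interface is gpfs_fcntl(), which lets an application pass
access hints to the file system. The following is a minimal sketch of
declaring the byte range an application is about to read so GPFS can prefetch
it; the structure and field names follow the GPFS programming reference but
may vary between releases, so treat this as an illustration rather than
copy-paste code.

#include <stdio.h>
#include <gpfs_fcntl.h>

/* Declare an access range hint: tell GPFS which byte range we are
 * about to read so it can start prefetching.  Requires the GPFS
 * headers and library (gpfs_fcntl.h, -lgpfs) to be installed. */
static int hint_read_range(int fd, long long start, long long length)
{
    struct {
        gpfsFcntlHeader_t hdr;
        gpfsAccessRange_t range;
    } hint;

    hint.hdr.totalLength   = sizeof(hint);
    hint.hdr.fcntlVersion  = GPFS_FCNTL_CURRENT_VERSION;
    hint.hdr.fcntlReserved = 0;

    hint.range.structLen  = sizeof(hint.range);
    hint.range.structType = GPFS_ACCESS_RANGE;
    hint.range.start      = start;   /* first byte we will touch */
    hint.range.length     = length;  /* how many bytes */
    hint.range.isWrite    = 0;       /* read, not write, access */

    return gpfs_fcntl(fd, &hint);    /* 0 on success */
}

The hint is purely advisory: an application that never calls gpfs_fcntl()
still works correctly, it just forgoes the extra prefetching.
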

Performance and scalability

GPFS provides unparalleled performance, especially for larger data objects,
and excellent performance for large aggregates of smaller objects. GPFS
achieves high-performance I/O by:
• Striping data across multiple disks attached to multiple nodes.
• Efficient client-side caching.
• Supporting a large block size, configurable by the administrator, to fit
I/O requirements.
• Utilizing advanced algorithms that improve read-ahead and write-behind
file functions.
• Using block-level locking based on a very sophisticated, scalable token
management system to provide data consistency while allowing
multiple application nodes concurrent access to the files.

GPFS recognizes typical access patterns such as sequential, reverse
sequential, and random, and optimizes I/O access for these patterns.
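
Applications can cooperate with these optimizations by issuing sequential
reads in multiples of the file system block size. On GPFS, as on other POSIX
file systems, the st_blksize value returned by fstat() reports the preferred
I/O granularity. A minimal sketch (the file path is a placeholder):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Path is a placeholder for any file on a GPFS mount. */
    const char *path = argc > 1 ? argv[1] : "/gpfs/fs1/data.bin";
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* On GPFS, st_blksize reflects the block size the administrator
     * chose when the file system was created. */
    size_t bs = (size_t)st.st_blksize;
    printf("preferred I/O size: %zu bytes\n", bs);

    char *buf = malloc(bs);
    if (buf == NULL) { close(fd); return 1; }

    /* Strictly sequential, block-sized reads let GPFS detect the
     * pattern and prefetch aggressively. */
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, bs)) > 0)
        total += n;

    printf("read %lld bytes\n", total);
    free(buf);
    close(fd);
    return 0;
}
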
GPFS token (lock) management coordinates access to shared disks,
ensuring the consistency of file system data and metadata when different
nodes access the same file. Multiple nodes can act as token managers for a
single file system, which allows greater scalability for high-transaction
workloads.
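
The toy sketch below illustrates the token idea only; it is not GPFS's actual
implementation. A request for a byte-range token conflicts with one already
held only if the ranges overlap and at least one side wants to write;
compatible requests can be granted to many nodes at once.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a byte-range token, purely for illustration. */
typedef struct {
    int       node;     /* which node holds the token */
    long long start;    /* first byte covered */
    long long length;   /* number of bytes covered */
    bool      write;    /* write (exclusive) vs. read (shared) */
} token_t;

/* Two tokens conflict only if their byte ranges overlap and at
 * least one of them grants write access. */
static bool conflicts(const token_t *held, const token_t *req)
{
    bool overlap = req->start < held->start + held->length &&
                   held->start < req->start + req->length;
    return overlap && (held->write || req->write);
}

int main(void)
{
    token_t held = { .node = 1, .start = 0,       .length = 1 << 20, .write = false };
    token_t req  = { .node = 2, .start = 1 << 19, .length = 1 << 20, .write = true  };

    printf("request %s\n", conflicts(&held, &req)
           ? "conflicts: manager must revoke or queue"
           : "granted: ranges are compatible");
    return 0;
}
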
Along with distributed token management, GPFS provides scalable
metadata management by allowing all nodes of the cluster that access the file
system to perform file metadata operations. This key and unique feature
distinguishes GPFS from other cluster file systems, which typically have a
centralized metadata server handling fixed regions of the file namespace. A
centralized metadata server can often become a performance bottleneck for
metadata-intensive operations and can represent a single point of failure.
GPFS solves this problem by managing metadata at the node that is using
the file or, in the case of parallel access to the file, at a dynamically
selected node that is using the file.
