Servers have been known to change over time, and so can the protocol that they use. So RPC provides a version number with each RPC request. This RFC describes version two of the NFS protocol. Even in the second version, there are various obsolete procedures and parameters, which will be removed in later versions. An RFC for version three of the NFS protocol is currently under preparation.
NFS assumes a hierarchical file system in which every file except those at the bottom level is a directory. Each entry in a directory (file, directory, device, etc.) has a string name. Different operating systems may restrict the depth of the tree or the names used, and may use different syntax to represent the "pathname", which is the concatenation of all the "components" (directory and file names) in the name. A "file system" is a tree on a single server (usually a single disk or physical partition) with a specified "root". Some operating systems provide a "mount" operation to make all file systems appear as a single tree, while others maintain a "forest" of file systems. Files are unstructured streams of uninterpreted bytes. Version 3 of NFS uses a slightly more general file system model.
NFS looks up one component of a pathname at a time. It may not be obvious why it does not just take the whole pathname, traipse down the directories, and return a file handle when it is done. There are several good reasons not to do this. First, pathnames need separators between the directory components, and different operating systems use different separators. We could define a Network Standard Pathname Representation, but then every pathname would have to be parsed and converted at each end. Other issues are discussed in NFS Implementation Issues below.
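As a minimal sketch of component-at-a-time lookup, the following Python fragment parses a pathname with the client's own separator and issues one lookup per component. The `FakeServer` class, its `lookup` method, and the integer handles are hypothetical stand-ins for NFSPROC_LOOKUP and real file handles; none of them is part of the protocol.

```python
class FakeServer:
    """Hypothetical in-memory stand-in for an NFS server (illustration only)."""

    def __init__(self):
        # Maps (directory handle, component name) -> child handle.
        self.entries = {(0, "usr"): 1, (1, "lib"): 2, (2, "libc.a"): 3}
        self.calls = 0

    def lookup(self, dir_handle, name):
        """Plays the role of NFSPROC_LOOKUP: one component per call."""
        self.calls += 1
        return self.entries.get((dir_handle, name))  # None models NFSERR_NOENT


def resolve(server, root_handle, pathname, separator="/"):
    """Walk a pathname one component at a time, parsing it on the client."""
    handle = root_handle
    for component in pathname.split(separator):
        if not component:  # skip empty pieces such as a leading separator
            continue
        handle = server.lookup(handle, component)
        if handle is None:
            return None
    return handle
```

Because the separator is applied only on the client, the same server can be used by a client that writes pathnames with a different separator without any conversion at the server end.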
Although files and directories are similar objects in many ways, different procedures are used to read directories and files. This provides a network standard format for representing directories. The same argument as above could have been used to justify a procedure that returns only one directory entry per call. The problem is efficiency. Directories can contain many entries, and a remote call to return each would be just too slow.
These are the sizes, given in decimal bytes, of various XDR structures used in the protocol:
/* The maximum number of bytes of data in a READ or WRITE request */
const MAXDATA = 8192;

/* The maximum number of bytes in a pathname argument */
const MAXPATHLEN = 1024;

/* The maximum number of bytes in a file name argument */
const MAXNAMLEN = 255;

/* The size in bytes of the opaque "cookie" passed by READDIR */
const COOKIESIZE = 4;

/* The size in bytes of the opaque file handle */
const FHSIZE = 32;
The following XDR definitions are basic structures and types used in other structures described further on.
enum stat {
    NFS_OK = 0,
    NFSERR_PERM = 1,
    NFSERR_NOENT = 2,
    NFSERR_IO = 5,
    NFSERR_NXIO = 6,
    NFSERR_ACCES = 13,
    NFSERR_EXIST = 17,
    NFSERR_NODEV = 19,
    NFSERR_NOTDIR = 20,
    NFSERR_ISDIR = 21,
    NFSERR_FBIG = 27,
    NFSERR_NOSPC = 28,
    NFSERR_ROFS = 30,
    NFSERR_NAMETOOLONG = 63,
    NFSERR_NOTEMPTY = 66,
    NFSERR_DQUOT = 69,
    NFSERR_STALE = 70,
    NFSERR_WFLUSH = 99
};
The stat type is returned with every procedure's results. A value of NFS_OK indicates that the call completed successfully and the results are valid. The other values indicate that some kind of error occurred on the server while servicing the procedure. The error values are derived from UNIX error numbers.
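The status values can be modeled on a client as an integer enumeration. The sketch below is one possible Python rendering; the class name `NfsStat` and the helper `status_from_wire` are illustrative names, not protocol identifiers. Several of the low-numbered values coincide with the host's own `errno` constants on common UNIX-like systems, although that correspondence is not guaranteed for every value on every platform.

```python
import enum
import errno


class NfsStat(enum.IntEnum):
    NFS_OK = 0
    NFSERR_PERM = 1
    NFSERR_NOENT = 2
    NFSERR_IO = 5
    NFSERR_NXIO = 6
    NFSERR_ACCES = 13
    NFSERR_EXIST = 17
    NFSERR_NODEV = 19
    NFSERR_NOTDIR = 20
    NFSERR_ISDIR = 21
    NFSERR_FBIG = 27
    NFSERR_NOSPC = 28
    NFSERR_ROFS = 30
    NFSERR_NAMETOOLONG = 63
    NFSERR_NOTEMPTY = 66
    NFSERR_DQUOT = 69
    NFSERR_STALE = 70
    NFSERR_WFLUSH = 99


def status_from_wire(value):
    """Return the matching NfsStat, or None for a value outside the enum."""
    try:
        return NfsStat(value)
    except ValueError:
        return None
```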
enum ftype {
    NFNON = 0,
    NFREG = 1,
    NFDIR = 2,
    NFBLK = 3,
    NFCHR = 4,
    NFLNK = 5
};
typedef opaque fhandle[FHSIZE];
struct timeval {
    unsigned int seconds;
    unsigned int useconds;
};
struct fattr {
    ftype        type;
    unsigned int mode;
    unsigned int nlink;
    unsigned int uid;
    unsigned int gid;
    unsigned int size;
    unsigned int blocksize;
    unsigned int rdev;
    unsigned int blocks;
    unsigned int fsid;
    unsigned int fileid;
    timeval      atime;
    timeval      mtime;
    timeval      ctime;
};
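To illustrate the wire layout of fattr, here is a rough Python decoder. It assumes the usual XDR encoding, in which each enum and unsigned int occupies one 4-byte big-endian word, so the structure is 17 words (68 bytes) long. The names `Fattr` and `decode_fattr` are invented for this sketch.

```python
import struct
from collections import namedtuple

FATTR_FIELDS = (
    "type mode nlink uid gid size blocksize rdev blocks fsid fileid "
    "atime_sec atime_usec mtime_sec mtime_usec ctime_sec ctime_usec"
)
Fattr = namedtuple("Fattr", FATTR_FIELDS)
FATTR_FORMAT = ">17I"  # 17 big-endian unsigned 32-bit words


def decode_fattr(data, offset=0):
    """Decode one fattr structure from XDR-encoded bytes."""
    return Fattr(*struct.unpack_from(FATTR_FORMAT, data, offset))
```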
"mode" is the access mode encoded as a set of bits. Notice that the file type is specified both in the mode bits and in the file type. This is really a bug in the protocol and will be fixed in future versions. The descriptions given below specify the bit positions using octal numbers.
+---------------------------------------------------------------------------+
| Bit      Description                                                      |
+---------------------------------------------------------------------------+
| 0040000  This is a directory; "type" field should be NFDIR.               |
| 0020000  This is a character special file; "type" field should be NFCHR.  |
| 0060000  This is a block special file; "type" field should be NFBLK.      |
| 0100000  This is a regular file; "type" field should be NFREG.            |
| 0120000  This is a symbolic link file; "type" field should be NFLNK.      |
| 0140000  This is a named socket; "type" field should be NFNON.            |
| 0004000  Set user id on execution.                                        |
| 0002000  Set group id on execution.                                       |
| 0001000  Save swapped text even after use.                                |
| 0000400  Read permission for owner.                                       |
| 0000200  Write permission for owner.                                      |
| 0000100  Execute and search permission for owner.                         |
| 0000040  Read permission for group.                                       |
| 0000020  Write permission for group.                                      |
| 0000010  Execute and search permission for group.                         |
| 0000004  Read permission for others.                                      |
| 0000002  Write permission for others.                                     |
| 0000001  Execute and search permission for others.                        |
+---------------------------------------------------------------------------+
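The bit assignments in the table can be exercised with a small Python sketch that derives the expected "type" value from a mode word and renders the nine permission bits. The mask 0170000 is an assumption taken from the conventional UNIX file-type mask rather than from the table itself, and the function names are illustrative.

```python
# File-type bit patterns from the table, keyed to the expected "type" value.
TYPE_BITS = {
    0o040000: "NFDIR",
    0o020000: "NFCHR",
    0o060000: "NFBLK",
    0o100000: "NFREG",
    0o120000: "NFLNK",
    0o140000: "NFNON",
}
TYPE_MASK = 0o170000  # assumption: the conventional UNIX file-type mask


def expected_ftype(mode):
    """Return the ftype name implied by the mode bits, or None if unknown."""
    return TYPE_BITS.get(mode & TYPE_MASK)


def permission_string(mode):
    """Render the low nine permission bits as owner, group, others."""
    out = []
    for shift in (6, 3, 0):
        bits = (mode >> shift) & 0o7
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)
```

A client that receives both fields could compare `expected_ftype(mode)` with the "type" field to detect a server that reports the two inconsistently.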
struct sattr {
    unsigned int mode;
    unsigned int uid;
    unsigned int gid;
    unsigned int size;
    timeval      atime;
    timeval      mtime;
};
typedef string filename<MAXNAMLEN>;
typedef string path<MAXPATHLEN>;
union attrstat switch (stat status) {
case NFS_OK:
    fattr attributes;
default:
    void;
};
struct diropargs {
    fhandle  dir;
    filename name;
};
union diropres switch (stat status) {
case NFS_OK:
    struct {
        fhandle file;
        fattr   attributes;
    } diropok;
default:
    void;
};
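The discriminated unions used for results share one pattern: a status word, followed by a body only when the status is NFS_OK. A hedged Python sketch of decoding diropres follows; it assumes standard XDR encoding (4-byte big-endian words, a fixed 32-byte opaque handle with no padding), and `decode_diropres` is a made-up helper name.

```python
import struct

NFS_OK = 0
FHSIZE = 32
FATTR_WORDS = 17  # ftype + 10 unsigned ints + 3 timevals of 2 words each


def decode_diropres(data):
    """Decode a diropres union: status, then handle and attributes if NFS_OK."""
    (status,) = struct.unpack_from(">I", data, 0)
    if status != NFS_OK:
        return status, None, None  # default arm: void
    fhandle = data[4:4 + FHSIZE]
    attrs = struct.unpack_from(">%dI" % FATTR_WORDS, data, 4 + FHSIZE)
    return status, fhandle, attrs
```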
The protocol definition is given as a set of procedures with arguments and results defined using the RPC language. A brief description of the function of each procedure should provide enough information to allow implementation.
All of the procedures in the NFS protocol are assumed to be synchronous. When a procedure returns to the client, the client can assume that the operation has completed and any data associated with the request is now on stable storage. For example, a client WRITE request may cause the server to update data blocks, filesystem information blocks (such as indirect blocks), and file attribute information (size and modify times). When the WRITE returns to the client, it can assume that the write is safe, even in case of a server crash, and it can discard the data written. This is a very important part of the statelessness of the server. If the server waited to flush data from remote requests, the client would have to save those requests so that it could resend them in case of a server crash.
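The synchronous-write rule can be pictured from the server's side as "do not reply until the data has been forced to disk". The following Python sketch shows that ordering on a POSIX host; `stable_write` is a hypothetical helper, and a real server would also deal with file handles, permissions, and error mapping.

```python
import os


def stable_write(path, offset, data):
    """Write data at offset and flush it to the device before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        written = 0
        while written < len(data):
            written += os.pwrite(fd, data[written:], offset + written)
        os.fsync(fd)  # block until the kernel reports the file is flushed
        return os.fstat(fd).st_size  # new size, as a reply would report it
    finally:
        os.close(fd)
```

Only after `stable_write` returns would the server send its reply, which is what lets the client discard its copy of the data.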
/*
 * Remote file service routines
 */
program NFS_PROGRAM {
    version NFS_VERSION {
        void        NFSPROC_NULL(void)              = 0;
        attrstat    NFSPROC_GETATTR(fhandle)        = 1;
        attrstat    NFSPROC_SETATTR(sattrargs)      = 2;
        void        NFSPROC_ROOT(void)              = 3;
        diropres    NFSPROC_LOOKUP(diropargs)       = 4;
        readlinkres NFSPROC_READLINK(fhandle)       = 5;
        readres     NFSPROC_READ(readargs)          = 6;
        void        NFSPROC_WRITECACHE(void)        = 7;
        attrstat    NFSPROC_WRITE(writeargs)        = 8;
        diropres    NFSPROC_CREATE(createargs)      = 9;
        stat        NFSPROC_REMOVE(diropargs)       = 10;
        stat        NFSPROC_RENAME(renameargs)      = 11;
        stat        NFSPROC_LINK(linkargs)          = 12;
        stat        NFSPROC_SYMLINK(symlinkargs)    = 13;
        diropres    NFSPROC_MKDIR(createargs)       = 14;
        stat        NFSPROC_RMDIR(diropargs)        = 15;
        readdirres  NFSPROC_READDIR(readdirargs)    = 16;
        statfsres   NFSPROC_STATFS(fhandle)         = 17;
    } = 2;
} = 100003;
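The program, version, and procedure numbers above are what a client places in each RPC call. As a rough illustration, the Python sketch below builds a call header using the ONC RPC message layout with a null credential and verifier; that layout comes from the RPC specification rather than from this section, and `rpc_call_header` is an invented name.

```python
import struct

NFS_PROGRAM = 100003
NFS_VERSION = 2
NFSPROC_NULL = 0
NFSPROC_GETATTR = 1

RPC_CALL = 0     # message type CALL (from the RPC specification)
RPC_VERSION = 2  # RPC protocol version (from the RPC specification)
AUTH_NULL = 0    # "no authentication" flavor (from the RPC specification)


def rpc_call_header(xid, procedure):
    """Build an RPC call header with AUTH_NULL credential and verifier."""
    return struct.pack(
        ">10I",
        xid, RPC_CALL, RPC_VERSION,
        NFS_PROGRAM, NFS_VERSION, procedure,
        AUTH_NULL, 0,  # credential: flavor, zero-length body
        AUTH_NULL, 0   # verifier: flavor, zero-length body
    )
```

The XDR-encoded procedure arguments, if any, would follow this 40-byte header.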
void NFSPROC_NULL(void) = 0;
attrstat NFSPROC_GETATTR (fhandle) = 1;
struct sattrargs {
    fhandle file;
    sattr   attributes;
};

attrstat NFSPROC_SETATTR (sattrargs) = 2;
Note: The use of -1 to indicate an unused field in "attributes" is changed in the next version of the protocol.
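One way to picture the -1 convention is an encoder that fills every field the caller leaves unset with -1, so that only the named attributes are presented as values to change. This Python sketch assumes standard XDR words and assumes that an unset timeval carries -1 in both of its words; `encode_sattr` is a hypothetical helper.

```python
import struct

UNUSED = 0xFFFFFFFF  # -1 viewed as an unsigned 32-bit value


def encode_sattr(mode=None, uid=None, gid=None, size=None,
                 atime=None, mtime=None):
    """Encode sattr, filling every field the caller left unset with -1."""
    def word(value):
        return UNUSED if value is None else value

    def timeval(value):
        # value is a (seconds, useconds) pair, or None for "unused"
        return (UNUSED, UNUSED) if value is None else tuple(value)

    fields = (word(mode), word(uid), word(gid), word(size))
    fields += timeval(atime) + timeval(mtime)
    return struct.pack(">8I", *fields)
```

For example, `encode_sattr(size=0)` marks every field except "size" as unused.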
void NFSPROC_ROOT(void) = 3;
diropres NFSPROC_LOOKUP(diropargs) = 4;
union readlinkres switch (stat status) {
case NFS_OK:
    path data;
default:
    void;
};

readlinkres NFSPROC_READLINK(fhandle) = 5;
Note: since NFS always parses pathnames on the client, the pathname in a symbolic link may mean something different (or be meaningless) on a different client or on the server if a different pathname syntax is used.
struct readargs {
    fhandle  file;
    unsigned offset;
    unsigned count;
    unsigned totalcount;
};

union readres switch (stat status) {
case NFS_OK:
    fattr attributes;
    opaque data<NFS_MAXDATA>;
default:
    void;
};

readres NFSPROC_READ(readargs) = 6;
Note: The argument "totalcount" is unused, and is removed in the next protocol revision.
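Since a single READ carries at most MAXDATA bytes, a client reading a larger file has to issue a series of calls at increasing offsets. The loop below is a minimal Python sketch of that pattern; `read_proc` is a hypothetical callable standing in for NFSPROC_READ that returns only the data bytes.

```python
MAXDATA = 8192


def read_whole_file(read_proc, fhandle, size):
    """Read `size` bytes by issuing READ-like calls of at most MAXDATA bytes."""
    chunks = []
    offset = 0
    while offset < size:
        count = min(MAXDATA, size - offset)
        data = read_proc(fhandle, offset, count)
        if not data:  # nothing returned: stop rather than loop forever
            break
        chunks.append(data)
        offset += len(data)  # advance by what was actually returned
    return b"".join(chunks)
```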
void NFSPROC_WRITECACHE(void) = 7;
struct writeargs {
    fhandle  file;
    unsigned beginoffset;
    unsigned offset;
    unsigned totalcount;
    opaque   data<NFS_MAXDATA>;
};

attrstat NFSPROC_WRITE(writeargs) = 8;
Note: The arguments "beginoffset" and "totalcount" are ignored and are removed in the next protocol revision.
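For illustration, writeargs could be encoded as in the Python sketch below. It assumes standard XDR rules for variable-length opaque data (a length word, the bytes, then zero padding to a 4-byte boundary) and sends zero in the two ignored fields; `encode_writeargs` is an invented helper, not part of the protocol.

```python
import struct

FHSIZE = 32
MAXDATA = 8192


def encode_writeargs(fhandle, offset, data):
    """Encode writeargs with zero in the ignored fields."""
    if len(fhandle) != FHSIZE:
        raise ValueError("file handle must be exactly FHSIZE bytes")
    if len(data) > MAXDATA:
        raise ValueError("data exceeds MAXDATA")
    padding = b"\0" * (-len(data) % 4)  # XDR pads opaque data to 4 bytes
    return (
        fhandle
        + struct.pack(">III", 0, offset, 0)  # beginoffset, offset, totalcount
        + struct.pack(">I", len(data)) + data + padding
    )
```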
struct createargs {
    diropargs where;
    sattr     attributes;
};

diropres NFSPROC_CREATE(createargs) = 9;
Note: This routine should pass an exclusive create flag, meaning "create the file only if it is not already there".
stat NFSPROC_REMOVE(diropargs) = 10;
Note: possibly non-idempotent operation.
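The note can be made concrete with a toy model. If the reply to a successful REMOVE is lost and the client retransmits the request, the second execution finds nothing to remove and reports an error even though the operation the client asked for has taken effect. In this Python sketch a dict stands in for a directory and `remove` is a hypothetical stand-in for NFSPROC_REMOVE.

```python
NFS_OK = 0
NFSERR_NOENT = 2


def remove(directory, name):
    """Stand-in for NFSPROC_REMOVE acting on a dict that models a directory."""
    if name not in directory:
        return NFSERR_NOENT
    del directory[name]
    return NFS_OK


directory = {"notes.txt": 17}
first = remove(directory, "notes.txt")   # original request
retry = remove(directory, "notes.txt")   # retransmission of the same request
```

Here `first` is NFS_OK while `retry` is NFSERR_NOENT, so a client that only sees the second reply cannot tell that its own request succeeded.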
struct renameargs {
    diropargs from;
    diropargs to;
};

stat NFSPROC_RENAME(renameargs) = 11;
Note: possibly non-idempotent operation.
struct linkargs {
    fhandle   from;
    diropargs to;
};

stat NFSPROC_LINK(linkargs) = 12;
A hard link should have the property that changes to either of the linked files are reflected in both files. When a hard link is made to a file, the attributes for the file should have a value for "nlink" that is one greater than the value before the link.
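The two properties described here can be observed on a local file system that supports hard links, which serves as an analogy for what a server is expected to provide. The Python sketch below uses only the local operating system, not NFS; `link_and_count` is a made-up helper.

```python
import os
import tempfile


def link_and_count(directory):
    """Create a file, hard-link it, and return nlink before and after."""
    original = os.path.join(directory, "original")
    alias = os.path.join(directory, "alias")
    with open(original, "w") as f:
        f.write("data")
    before = os.stat(original).st_nlink
    os.link(original, alias)
    after = os.stat(original).st_nlink
    # A change made through one name is visible through the other.
    with open(alias, "a") as f:
        f.write("+more")
    with open(original) as f:
        contents = f.read()
    return before, after, contents
```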
Note: possibly non-idempotent operation.
struct symlinkargs {
    diropargs from;
    path      to;
    sattr     attributes;
};

stat NFSPROC_SYMLINK(symlinkargs) = 13;
A symbolic link is a pointer to another file. The name given in "to" is not interpreted by the server, only stored in the newly created file. When the client references a file that is a symbolic link, the contents of the symbolic link are normally transparently reinterpreted as a pathname to substitute. A READLINK operation returns the data to the client for interpretation.
Note: On UNIX servers the attributes are never used, since symbolic links always have mode 0777.
diropres NFSPROC_MKDIR (createargs) = 14;
Note: possibly non-idempotent operation.
stat NFSPROC_RMDIR(diropargs) = 15;
Note: possibly non-idempotent operation.
struct readdirargs {
    fhandle   dir;
    nfscookie cookie;
    unsigned  count;
};

struct entry {
    unsigned  fileid;
    filename  name;
    nfscookie cookie;
    entry    *nextentry;
};

union readdirres switch (stat status) {
case NFS_OK:
    struct {
        entry *entries;
        bool   eof;
    } readdirok;
default:
    void;
};

readdirres NFSPROC_READDIR (readdirargs) = 16;
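A client typically lists a directory by calling READDIR repeatedly, passing back the cookie of the last entry it received, until "eof" is true. The Python sketch below shows that loop; it assumes a zero cookie requests entries from the start of the directory, and `readdir_proc` is a hypothetical callable that returns a list of (fileid, name, cookie) tuples and the eof flag.

```python
COOKIESIZE = 4


def list_directory(readdir_proc, dir_handle, count=512):
    """Collect all names by repeating READDIR-like calls until eof is set."""
    names = []
    cookie = b"\0" * COOKIESIZE  # assumption: zero cookie means "from the start"
    while True:
        entries, eof = readdir_proc(dir_handle, cookie, count)
        for fileid, name, entry_cookie in entries:
            names.append(name)
            cookie = entry_cookie  # resume after the last entry seen
        if eof or not entries:
            break
    return names
```

The client treats the cookie as opaque: it stores and returns the bytes without interpreting them.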
union statfsres switch (stat status) {
case NFS_OK:
    struct {
        unsigned tsize;
        unsigned bsize;
        unsigned blocks;
        unsigned bfree;
        unsigned bavail;
    } info;
default:
    void;
};

statfsres NFSPROC_STATFS(fhandle) = 17;
Note: This call does not work well if a filesystem has variable size blocks.
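As a small worked example, the block counts in a STATFS reply can be converted to bytes by multiplying by "bsize". This sketch assumes the usual reading of the fields ("tsize" as the preferred transfer size, "blocks" as the total, "bfree" as free blocks, "bavail" as free blocks available to unprivileged users) and a single fixed block size, which is exactly the assumption the note above warns about. The function name is illustrative.

```python
def summarize_statfs(tsize, bsize, blocks, bfree, bavail):
    """Convert STATFS block counts to byte counts, assuming fixed-size blocks."""
    return {
        "total_bytes": blocks * bsize,
        "free_bytes": bfree * bsize,
        "available_bytes": bavail * bsize,
        "preferred_transfer_bytes": tsize,
    }
```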