paper, section 3.

3. System facilities

This section discusses the system facilities that are not considered part of the kernel.

The system abstractions described are:

Directory contexts: A directory context is a position in the UNIX file system name space. Operations on files and other named objects in a file system are always specified relative to such a context.
Files: Files are used to store uninterpreted sequence of bytes on which random access reads and writes may occur. Pages from files may also be mapped into process address space.** A directory may be read as a file.
Communications domains: A communications domain represents an interprocess communications environment, such as the communications facilities of the UNIX system, communications in the INTERNET, or the resource sharing protocols and access rights of a resource sharing system on a local network.
Sockets: A socket is an endpoint of communication and the focal point for IPC in a communications domain. Sockets may be created in pairs, or given names and used to rendezvous with other sockets in a communications domain, accepting connections from these sockets or exchanging messages with them. These operations model a labeled or unlabeled communications graph, and can be used in a wide variety of communications domains. Sockets can have different types to provide different semantics of communication, increasing the flexibility of the model.
Terminals and other devices: Devices include terminals, providing input editing and interrupt generation and output flow control and editing, magnetic tapes, disks and other peripherals. They often support the generic read and write operations as well as a number of ioctls.
Processes: Process descriptors provide facilities for control and debugging of other processes.

3.1. Generic operations

Many system abstractions support the operations read, write and ioctl. We describe the basics of these common primitives here. Similarly, the mechanisms whereby normally synchronous operations may occur in a non-blocking or asynchronous fashion are common to all system-defined abstractions and are described here.

3.1.1. Read and write

The read and write system calls can be applied to communications channels, files, terminals and devices. They have the form:

cc = read(fd, buf, nbytes);
result int cc; int fd; result caddr_t buf; int nbytes;

cc = write(fd, buf, nbytes);
result int cc; int fd; caddr_t buf; int nbytes;

The read call transfers as much data as possible from the object defined by fd to the buffer at address buf of size nbytes. The number of bytes transferred is returned in cc, which is -1 if a return occurred before any data was transferred because of an error or use of non-blocking operations.

The write call transfers data from the buffer to the object defined by fd. Depending on the type of fd, it is possible that the write call will accept some portion of the provided bytes; the user should resubmit the other bytes in a later request in this case. Error returns because of interrupted or otherwise incomplete operations are possible.

Scattering of data on input or gathering of data for output is also possible using an array of input/output vector descriptors. The type for the descriptors is defined in <sys/uio.h> as:

struct iovec {
	caddr_t	iov_msg;	/* base of a component */
	int	iov_len;	/* length of a component */
};

The calls using an array of descriptors are:

cc = readv(fd, iov, iovlen);
result int cc; int fd; struct iovec *iov; int iovlen;

cc = writev(fd, iov, iovlen);
result int cc; int fd; struct iovec *iov; int iovlen;

Here iovlen is the count of elements in the iov array.

3.1.2. Input/output control

Control operations on an object are performed by the ioctl operation:

ioctl(fd, request, buffer);
int fd, request; caddr_t buffer;

This operation causes the specified request to be performed on the object fd. The request parameter specifies whether the argument buffer is to be read, written, read and written, or is not needed, and also the size of the buffer, as well as the request. Different descriptor types and subtypes within descriptor types may use distinct ioctl requests. For example, operations on terminals control flushing of input and output queues and setting of terminal parameters; operations on disks cause formatting operations to occur; operations on tapes control tape positioning.

The names for basic control operations are defined in <sys/ioctl.h>.

3.1.3. Non-blocking and asynchronous operations

A process that wishes to do non-blocking operations on one of its descriptors sets the descriptor in non-blocking mode as described in section 1.5.4. Thereafter the read call will return a specific EWOULDBLOCK error indication if there is no data to be read. The process may select the associated descriptor to determine when a read is possible.

Output attempted when a descriptor can accept less than is requested will either accept some of the provided data, returning a shorter than normal length, or return an error indicating that the operation would block. More output can be performed as soon as a select call indicates the object is writeable.

Operations other than data input or output may be performed on a descriptor in a non-blocking fashion. These operations will return with a characteristic error indicating that they are in progress if they cannot complete immediately. The descriptor may then be selected for write to find out when the operation has been completed. When select indicates the descriptor is writeable, the operation has completed. Depending on the nature of the descriptor and the operation, additional activity may be started or the new state may be tested.

3.2. File system

3.2.1. Overview

The file system abstraction provides access to a hierarchical file system structure. The file system contains directories (each of which may contain other sub-directories) as well as files and references to other objects such as devices and inter-process communications sockets.

Each file is organized as a linear array of bytes. No record boundaries or system related information is present in a file. Files may be read and written in a random-access fashion. The user may read the data in a directory as though it were an ordinary file to determine the names of the contained files, but only the system may write into the directories. The file system stores only a small amount of ownership, protection and usage information with a file.

3.2.2. Naming

The file system calls take path name arguments. These consist of a zero or more component file names separated by ``/'' characters, where each file name is up to 255 ASCII characters excluding null and ``/''.

Each process always has two naming contexts: one for the root directory of the file system and one for the current working directory. These are used by the system in the filename translation process. If a path name begins with a ``/'', it is called a full path name and interpreted relative to the root directory context. If the path name does not begin with a ``/'' it is called a relative path name and interpreted relative to the current directory context.

The system limits the total length of a path name to 1024 characters.

The file name ``..'' in each directory refers to the parent directory of that directory. The parent directory of the root of the file system is always that directory.

The calls

chdir(path);
char *path;

chroot(path)
char *path;

change the current working directory and root directory context of a process. Only the super-user can change the root directory context of a process.

3.2.3. Creation and removal

The file system allows directories, files, special devices, and ``portals'' to be created and removed from the file system.

3.2.3.1. Directory creation and removal

A directory is created with the mkdir system call:

mkdir(path, mode);
char *path; int mode;

where the mode is defined as for files (see below). Directories are removed with the rmdir system call:

rmdir(path);
char *path;

A directory must be empty if it is to be deleted.

3.2.3.2. File creation

Files are created with the open system call,

fd = open(path, oflag, mode);
result int fd; char *path; int oflag, mode;

The path parameter specifies the name of the file to be created. The oflag parameter must include O_CREAT from below to cause the file to be created. Bits for oflag are defined in <sys/file.h>:

#define	O_RDONLY	000	/* open for reading */
#define	O_WRONLY	001	/* open for writing */
#define	O_RDWR	002	/* open for read & write */
#define	O_NDELAY	004 	/* non-blocking open */
#define	O_APPEND	010	/* append on each write */
#define	O_CREAT	01000	/* open with file create */
#define	O_TRUNC	02000	/* open with truncation */
#define	O_EXCL	04000	/* error on create if file exists */

One of O_RDONLY, O_WRONLY and O_RDWR should be specified, indicating what types of operations are desired to be performed on the open file. The operations will be checked against the user's access rights to the file before allowing the open to succeed. Specifying O_APPEND causes writes to automatically append to the file. The flag O_CREAT causes the file to be created if it does not exist, owned by the current user and the group of the containing directory. The protection for the new file is specified in mode. The file mode is used as a three digit octal number. Each digit encodes read access as 4, write access as 2 and execute access as 1, or'ed together. The 0700 bits describe owner access, the 070 bits describe the access rights for processes in the same group as the file, and the 07 bits describe the access rights for other processes.

If the open specifies to create the file with O_EXCL and the file already exists, then the open will fail without affecting the file in any way. This provides a simple exclusive access facility. If the file exists but is a symbolic link, the open will fail regardless of the existence of the file specified by the link.

3.2.3.3. Creating references to devices

The file system allows entries which reference peripheral devices. Peripherals are distinguished as block or character devices according by their ability to support block-oriented operations. Devices are identified by their ``major'' and ``minor'' device numbers. The major device number determines the kind of peripheral it is, while the minor device number indicates one of possibly many peripherals of that kind. Structured devices have all operations performed internally in ``block'' quantities while unstructured devices often have a number of special ioctl operations, and may have input and output performed in varying units. The mknod call creates special entries:

mknod(path, mode, dev);
char *path; int mode, dev;

where mode is formed from the object type and access permissions. The parameter dev is a configuration dependent parameter used to identify specific character or block I/O devices.

3.2.3.4. Portal creation**

The call

fd = portal(name, server, param, dtype, protocol, domain, socktype)
result int fd; char *name, *server, *param; int dtype, protocol;
int domain, socktype;

places a name in the file system name space that causes connection to a server process when the name is used. The portal call returns an active portal in fd as though an access had occurred to activate an inactive portal, as now described.

When an inactive portal is accessed, the system sets up a socket of the specified socktype in the specified communications domain (see section 2.3), and creates the server process, giving it the specified param as argument to help it identify the portal, and also giving it the newly created socket as descriptor number 0. The accessor of the portal will create a socket in the same domain and connect to the server. The user will then wrap the socket in the specified protocol to create an object of the required descriptor type dtype and proceed with the operation which was in progress before the portal was encountered.

While the server process holds the socket (which it received as fd from the portal call on descriptor 0 at activation) further references will result in connections being made to the same socket.

3.2.3.5. File, device, and portal removal

A reference to a file, special device or portal may be removed with the unlink call,

unlink(path);
char *path;

The caller must have write access to the directory in which the file is located for this call to be successful.

3.2.4. Reading and modifying file attributes

Detailed information about the attributes of a file may be obtained with the calls:

#include <sys/stat.h>

stat(path, stb);
char *path; result struct stat *stb;

fstat(fd, stb);
int fd; result struct stat *stb;

The stat structure includes the file type, protection, ownership, access times, size, and a count of hard links. If the file is a symbolic link, then the status of the link itself (rather than the file the link references) may be found using the lstat call:

lstat(path, stb);
char *path; result struct stat *stb;

Newly created files are assigned the user id of the process that created it and the group id of the directory in which it was created. The ownership of a file may be changed by either of the calls

chown(path, owner, group);
char *path; int owner, group;

fchown(fd, owner, group);
int fd, owner, group;

In addition to ownership, each file has three levels of access protection associated with it. These levels are owner relative, group relative, and global (all users and groups). Each level of access has separate indicators for read permission, write permission, and execute permission. The protection bits associated with a file may be set by either of the calls:

chmod(path, mode);
char *path; int mode;

fchmod(fd, mode);
int fd, mode;

where mode is a value indicating the new protection of the file, as listed in section 2.2.3.2.

Finally, the access and modify times on a file may be set by the call:

utimes(path, tvp)
char *path; struct timeval *tvp[2];

This is particularly useful when moving files between media, to preserve relationships between the times the file was modified.

3.2.5. Links and renaming

Links allow multiple names for a file to exist. Links exist independently of the file linked to.

Two types of links exist, hard links and symbolic links. A hard link is a reference counting mechanism that allows a file to have multiple names within the same file system. Symbolic links cause string substitution during the pathname interpretation process.

Hard links and symbolic links have different properties. A hard link insures the target file will always be accessible, even after its original directory entry is removed; no such guarantee exists for a symbolic link. Symbolic links can span file systems boundaries.

The following calls create a new link, named path2, to path1:

link(path1, path2);
char *path1, *path2;

symlink(path1, path2);
char *path1, *path2;

The unlink primitive may be used to remove either type of link.

If a file is a symbolic link, the ``value'' of the link may be read with the readlink call,

len = readlink(path, buf, bufsize);
result int len; result char *path, *buf; int bufsize;

This call returns, in buf, the null-terminated string substituted into pathnames passing through path.

Atomic renaming of file system resident objects is possible with the rename call:

rename(oldname, newname);
char *oldname, *newname;

where both oldname and newname must be in the same file system. If newname exists and is a directory, then it must be empty.

3.2.6. Extension and truncation

Files are created with zero length and may be extended simply by writing or appending to them. While a file is open the system maintains a pointer into the file indicating the current location in the file associated with the descriptor. This pointer may be moved about in the file in a random access fashion. To set the current offset into a file, the lseek call may be used,

oldoffset = lseek(fd, offset, type);
result off_t oldoffset; int fd; off_t offset; int type;

where type is given in <sys/file.h> as one of:

#define	L_SET	0	/* set absolute file offset */
#define	L_INCR	1	/* set file offset relative to current position */
#define	L_XTND	2	/* set offset relative to end-of-file */

The call ``lseek(fd, 0, L_INCR)'' returns the current offset into the file.

Files may have ``holes'' in them. Holes are void areas in the linear extent of the file where data has never been written. These may be created by seeking to a location in a file past the current end-of-file and writing. Holes are treated by the system as zero valued bytes.

A file may be truncated with either of the calls:

truncate(path, length);
char *path; int length;

ftruncate(fd, length);
int fd, length;

reducing the size of the specified file to length bytes.

3.2.7. Checking accessibility

A process running with different real and effective user ids may interrogate the accessibility of a file to the real user by using the access call:

accessible = access(path, how);
result int accessible; char *path; int how;

Here how is constructed by or'ing the following bits, defined in <sys/file.h>:

#define	F_OK	0	/* file exists */
#define	X_OK	1	/* file is executable */
#define	W_OK	2	/* file is writable */
#define	R_OK	4	/* file is readable */

The presence or absence of advisory locks does not affect the result of access.

3.2.8. Locking

The file system provides basic facilities that allow cooperating processes to synchronize their access to shared files. A process may place an advisory read or write lock on a file, so that other cooperating processes may avoid interfering with the process' access. This simple mechanism provides locking with file granularity. More granular locking can be built using the IPC facilities to provide a lock manager. The system does not force processes to obey the locks; they are of an advisory nature only.

Locking is performed after an open call by applying the flock primitive,

flock(fd, how);
int fd, how;

where the how parameter is formed from bits defined in <sys/file.h>:

#define	LOCK_SH	1	/* shared lock */
#define	LOCK_EX	2	/* exclusive lock */
#define	LOCK_NB	4	/* don't block when locking */
#define	LOCK_UN	8	/* unlock */

Successive lock calls may be used to increase or decrease the level of locking. If an object is currently locked by another process when a flock call is made, the caller will be blocked until the current lock owner releases the lock; this may be avoided by including LOCK_NB in the how parameter. Specifying LOCK_UN removes all locks associated with the descriptor. Advisory locks held by a process are automatically deleted when the process terminates.

3.2.9. Disk quotas

As an optional facility, each file system may be requested to impose limits on a user's disk usage. Two quantities are limited: the total amount of disk space which a user may allocate in a file system and the total number of files a user may create in a file system. Quotas are expressed as hard limits and soft limits. A hard limit is always imposed; if a user would exceed a hard limit, the operation which caused the resource request will fail. A soft limit results in the user receiving a warning message, but with allocation succeeding. Facilities are provided to turn soft limits into hard limits if a user has exceeded a soft limit for an unreasonable period of time.

To enable disk quotas on a file system the setquota call is used:

setquota(special, file)
char *special, *file;

where special refers to a structured device file where a mounted file system exists, and file refers to a disk quota file (residing on the file system associated with special) from which user quotas should be obtained. The format of the disk quota file is implementation dependent.

To manipulate disk quotas the quota call is provided:

#include <sys/quota.h>

quota(cmd, uid, arg, addr)
int cmd, uid, arg; caddr_t addr;

The indicated cmd is applied to the user ID uid. The parameters arg and addr are command specific. The file <sys/quota.h> contains definitions pertinent to the use of this call.

3.3. Interprocess communications

3.3.1. Interprocess communication primitives

3.3.1.1. Communication domains

The system provides access to an extensible set of communication domains. A communication domain is identified by a manifest constant defined in the file <sys/socket.h>. Important standard domains supported by the system are the ``unix'' domain, AF_UNIX, for communication within the system, the ``Internet'' domain for communication in the DARPA Internet, AF_INET, and the ``NS'' domain, AF_NS, for communication using the Xerox Network Systems protocols. Other domains can be added to the system.

3.3.1.2. Socket types and protocols

Within a domain, communication takes place between communication endpoints known as sockets. Each socket has the potential to exchange information with other sockets of an appropriate type within the domain.

Each socket has an associated abstract type, which describes the semantics of communication using that socket. Properties such as reliability, ordering, and prevention of duplication of messages are determined by the type. The basic set of socket types is defined in <sys/socket.h>:

/* Standard socket types */
#define	SOCK_DGRAM	1	/* datagram */
#define	SOCK_STREAM	2	/* virtual circuit */
#define	SOCK_RAW	3	/* raw socket */
#define	SOCK_RDM	4	/* reliably-delivered message */
#define	SOCK_SEQPACKET	5	/* sequenced packets */

The SOCK_DGRAM type models the semantics of datagrams in network communication: messages may be lost or duplicated and may arrive out-of-order. A datagram socket may send messages to and receive messages from multiple peers. The SOCK_RDM type models the semantics of reliable datagrams: messages arrive unduplicated and in-order, the sender is notified if messages are lost. The send and receive operations (described below) generate reliable/unreliable datagrams. The SOCK_STREAM type models connection-based virtual circuits: two-way byte streams with no record boundaries. Connection setup is required before data communication may begin. The SOCK_SEQPACKET type models a connection-based, full-duplex, reliable, sequenced packet exchange; the sender is notified if messages are lost, and messages are never duplicated or presented out-of-order. Users of the last two abstractions may use the facilities for out-of-band transmission to send out-of-band data.

SOCK_RAW is used for unprocessed access to internal network layers and interfaces; it has no specific semantics.

Other socket types can be defined.

Each socket may have a specific protocol associated with it. This protocol is used within the domain to provide the semantics required by the socket type. Not all socket types are supported by each domain; support depends on the existence and the implementation of a suitable protocol within the domain. For example, within the ``Internet'' domain, the SOCK_DGRAM type may be implemented by the UDP user datagram protocol, and the SOCK_STREAM type may be implemented by the TCP transmission control protocol, while no standard protocols to provide SOCK_RDM or SOCK_SEQPACKET sockets exist.

3.3.1.3. Socket creation, naming and service establishment

Sockets may be connected or unconnected. An unconnected socket descriptor is obtained by the socket call:

s = socket(domain, type, protocol);
result int s; int domain, type, protocol;

The socket domain and type are as described above, and are specified using the definitions from <sys/socket.h>. The protocol may be given as 0, meaning any suitable protocol. One of several possible protocols may be selected using identifiers obtained from a library routine, getprotobyname.

An unconnected socket descriptor of a connection-oriented type may yield a connected socket descriptor in one of two ways: either by actively connecting to another socket, or by becoming associated with a name in the communications domain and accepting a connection from another socket. Datagram sockets need not establish connections before use.

To accept connections or to receive datagrams, a socket must first have a binding to a name (or address) within the communications domain. Such a binding may be established by a bind call:

bind(s, name, namelen);
int s; struct sockaddr *name; int namelen;

Datagram sockets may have default bindings established when first sending data if not explicitly bound earlier. In either case, a socket's bound name may be retrieved with a getsockname call:

getsockname(s, name, namelen);
int s; result struct sockaddr *name; result int *namelen;

while the peer's name can be retrieved with getpeername:

getpeername(s, name, namelen);
int s; result struct sockaddr *name; result int *namelen;

Domains may support sockets with several names.

3.3.1.4. Accepting connections

Once a binding is made to a connection-oriented socket, it is possible to listen for connections:

listen(s, backlog);
int s, backlog;

The backlog specifies the maximum count of connections that can be simultaneously queued awaiting acceptance.

An accept call:

t = accept(s, name, anamelen);
result int t; int s; result struct sockaddr *name; result int *anamelen;

returns a descriptor for a new, connected, socket from the queue of pending connections on s. If no new connections are queued for acceptance, the call will wait for a connection unless non-blocking I/O has been enabled.

3.3.1.5. Making connections

An active connection to a named socket is made by the connect call:

connect(s, name, namelen);
int s; struct sockaddr *name; int namelen;

Although datagram sockets do not establish connections, the connect call may be used with such sockets to create an association with the foreign address. The address is recorded for use in future send calls, which then need not supply destination addresses. Datagrams will be received only from that peer, and asynchronous error reports may be received.

It is also possible to create connected pairs of sockets without using the domain's name space to rendezvous; this is done with the socketpair call**:

socketpair(domain, type, protocol, sv);
int domain, type, protocol; result int sv[2];

Here the returned sv descriptors correspond to those obtained with accept and connect.

The call

pipe(pv)
result int pv[2];

creates a pair of SOCK_STREAM sockets in the UNIX domain, with pv[0] only writable and pv[1] only readable.

3.3.1.6. Sending and receiving data

Messages may be sent from a socket by:

cc = sendto(s, buf, len, flags, to, tolen);
result int cc; int s; caddr_t buf; int len, flags; caddr_t to; int tolen;

if the socket is not connected or:

cc = send(s, buf, len, flags);
result int cc; int s; caddr_t buf; int len, flags;

if the socket is connected. The corresponding receive primitives are:

msglen = recvfrom(s, buf, len, flags, from, fromlenaddr);
result int msglen; int s; result caddr_t buf; int len, flags;
result caddr_t from; result int *fromlenaddr;

and

msglen = recv(s, buf, len, flags);
result int msglen; int s; result caddr_t buf; int len, flags;

In the unconnected case, the parameters to and tolen specify the destination or source of the message, while the from parameter stores the source of the message, and *fromlenaddr initially gives the size of the from buffer and is updated to reflect the true length of the from address.

All calls cause the message to be received in or sent from the message buffer of length len bytes, starting at address buf. The flags specify peeking at a message without reading it or sending or receiving high-priority out-of-band messages, as follows:

#define	MSG_PEEK	0x1	/* peek at incoming message */
#define	MSG_OOB	0x2	/* process out-of-band data */

3.3.1.7. Scatter/gather and exchanging access rights

It is possible scatter and gather data and to exchange access rights with messages. When either of these operations is involved, the number of parameters to the call becomes large. Thus the system defines a message header structure, in <sys/socket.h>, which can be used to conveniently contain the parameters to the calls:

struct msghdr {
	caddr_t	msg_name;		/* optional address */
	int	msg_namelen;	/* size of address */
	struct	iov *msg_iov;	/* scatter/gather array */
	int	msg_iovlen;		/* # elements in msg_iov */
	caddr_t	msg_accrights;	/* access rights sent/received */
	int	msg_accrightslen;	/* size of msg_accrights */
};

Here msg_name and msg_namelen specify the source or destination address if the socket is unconnected; msg_name may be given as a null pointer if no names are desired or required. The msg_iov and msg_iovlen describe the scatter/gather locations, as described in section 2.1.3. Access rights to be sent along with the message are specified in msg_accrights, which has length msg_accrightslen. In the ``unix'' domain these are an array of integer descriptors, taken from the sending process and duplicated in the receiver.

This structure is used in the operations sendmsg and recvmsg:

sendmsg(s, msg, flags);
int s; struct msghdr *msg; int flags;

msglen = recvmsg(s, msg, flags);
result int msglen; int s; result struct msghdr *msg; int flags;

3.3.1.8. Using read and write with sockets

The normal UNIX read and write calls may be applied to connected sockets and translated into send and receive calls from or to a single area of memory and discarding any rights received. A process may operate on a virtual circuit socket, a terminal or a file with blocking or non-blocking input/output operations without distinguishing the descriptor type.

3.3.1.9. Shutting down halves of full-duplex connections

A process that has a full-duplex socket such as a virtual circuit and no longer wishes to read from or write to this socket can give the call:

shutdown(s, direction);
int s, direction;

where direction is 0 to not read further, 1 to not write further, or 2 to completely shut the connection down. If the underlying protocol supports unidirectional or bidirectional shutdown, this indication will be passed to the peer. For example, a shutdown for writing might produce an end-of-file condition at the remote end.

3.3.1.10. Socket and protocol options

Sockets, and their underlying communication protocols, may support options. These options may be used to manipulate implementation- or protocol-specific facilities. The getsockopt and setsockopt calls are used to control options:

getsockopt(s, level, optname, optval, optlen)
int s, level, optname; result caddr_t optval; result int *optlen;

setsockopt(s, level, optname, optval, optlen)
int s, level, optname; caddr_t optval; int optlen;

The option optname is interpreted at the indicated protocol level for socket s. If a value is specified with optval and optlen, it is interpreted by the software operating at the specified level. The level SOL_SOCKET is reserved to indicate options maintained by the socket facilities. Other level values indicate a particular protocol which is to act on the option request; these values are normally interpreted as a ``protocol number''.

3.3.2. UNIX domain

This section describes briefly the properties of the UNIX communications domain.

3.3.2.1. Types of sockets

In the UNIX domain, the SOCK_STREAM abstraction provides pipe-like facilities, while SOCK_DGRAM provides (usually) reliable message-style communications.

3.3.2.2. Naming

Socket names are strings and may appear in the UNIX file system name space through portals**.

3.3.2.3. Access rights transmission

The ability to pass UNIX descriptors with messages in this domain allows migration of service within the system and allows user processes to be used in building system facilities.

3.3.3. INTERNET domain

This section describes briefly how the Internet domain is mapped to the model described in this section. More information will be found in the document describing the network implementation in 4.3BSD.

3.3.3.1. Socket types and protocols

SOCK_STREAM is supported by the Internet TCP protocol; SOCK_DGRAM by the UDP protocol. Each is layered atop the transport-level Internet Protocol (IP). The Internet Control Message Protocol is implemented atop/beside IP and is accessible via a raw socket. The SOCK_SEQPACKET has no direct Internet family analogue; a protocol based on one from the XEROX NS family and layered on top of IP could be implemented to fill this gap.

3.3.3.2. Socket naming

Sockets in the Internet domain have names composed of the 32 bit Internet address, and a 16 bit port number. Options may be used to provide IP source routing or security options. The 32-bit address is composed of network and host parts; the network part is variable in size and is frequency encoded. The host part may optionally be interpreted as a subnet field plus the host on subnet; this is enabled by setting a network address mask at boot time.

3.3.3.3. Access rights transmission

No access rights transmission facilities are provided in the Internet domain.

3.3.3.4. Raw access

The Internet domain allows the super-user access to the raw facilities of IP. These interfaces are modeled as SOCK_RAW sockets. Each raw socket is associated with one IP protocol number, and receives all traffic received for that protocol. This allows administrative and debugging functions to occur, and enables user-level implementations of special-purpose protocols such as inter-gateway routing protocols.

3.4. Terminals and Devices

3.4.1. Terminals

Terminals support read and write I/O operations, as well as a collection of terminal specific ioctl operations, to control input character interpretation and editing, and output format and delays.

3.4.1.1. Terminal input

Terminals are handled according to the underlying communication characteristics such as baud rate and required delays, and a set of software parameters.

3.4.1.1.1. Input modes

A terminal is in one of three possible modes: raw, cbreak, or cooked. In raw mode all input is passed through to the reading process immediately and without interpretation. In cbreak mode, the handler interprets input only by looking for characters that cause interrupts or output flow control; all other characters are made available as in raw mode. In cooked mode, input is processed to provide standard line-oriented local editing functions, and input is presented on a line-by-line basis.

3.4.1.1.2. Interrupt characters

Interrupt characters are interpreted by the terminal handler only in cbreak and cooked modes, and cause a software interrupt to be sent to all processes in the process group associated with the terminal. Interrupt characters exist to send SIGINT and SIGQUIT signals, and to stop a process group with the SIGTSTP signal either immediately, or when all input up to the stop character has been read.

3.4.1.1.3. Line editing

When the terminal is in cooked mode, editing of an input line is performed. Editing facilities allow deletion of the previous character or word, or deletion of the current input line. In addition, a special character may be used to reprint the current input line after some number of editing operations have been applied.

Certain other characters are interpreted specially when a process is in cooked mode. The end of line character determines the end of an input record. The end of file character simulates an end of file occurrence on terminal input. Flow control is provided by stop output and start output control characters. Output may be flushed with the flush output character; and a literal character may be used to force literal input of the immediately following character in the input line.

Input characters may be echoed to the terminal as they are received. Non-graphic ASCII input characters may be echoed as a two-character printable representation, ``^character.''

3.4.1.2. Terminal output

On output, the terminal handler provides some simple formatting services. These include converting the carriage return character to the two character return-linefeed sequence, inserting delays after certain standard control characters, expanding tabs, and providing translations for upper-case only terminals.

3.4.1.3. Terminal control operations

When a terminal is first opened it is initialized to a standard state and configured with a set of standard control, editing, and interrupt characters. A process may alter this configuration with certain control operations, specifying parameters in a standard structure:**

struct ttymode {
	short	tt_ispeed;	/* input speed */
	int	tt_iflags;	/* input flags */
	short	tt_ospeed;	/* output speed */
	int	tt_oflags;	/* output flags */
};

and ``special characters'' are specified with the ttychars structure,

struct ttychars {
	char	tc_erasec;	/* erase char */
	char	tc_killc;	/* erase line */
	char	tc_intrc;	/* interrupt */
	char	tc_quitc;	/* quit */
	char	tc_startc;	/* start output */
	char	tc_stopc;	/* stop output */
	char	tc_eofc;	/* end-of-file */
	char	tc_brkc;	/* input delimiter (like nl) */
	char	tc_suspc;	/* stop process signal */
	char	tc_dsuspc;	/* delayed stop process signal */
	char	tc_rprntc;	/* reprint line */
	char	tc_flushc;	/* flush output (toggles) */
	char	tc_werasc;	/* word erase */
	char	tc_lnextc;	/* literal next character */
};

3.4.1.4. Terminal hardware support

The terminal handler allows a user to access basic hardware related functions; e.g. line speed, modem control, parity, and stop bits. A special signal, SIGHUP, is automatically sent to processes in a terminal's process group when a carrier transition is detected. This is normally associated with a user hanging up on a modem controlled terminal line.

3.4.2. Structured devices

Structures devices are typified by disks and magnetic tapes, but may represent any random-access device. The system performs read-modify-write type buffering actions on block devices to allow them to be read and written in a totally random access fashion like ordinary files. File systems are normally created in block devices.

3.4.3. Unstructured devices

Unstructured devices are those devices which do not support block structure. Familiar unstructured devices are raw communications lines (with no terminal handler), raster plotters, magnetic tape and disks unfettered by buffering and permitting large block input/output and positioning and formatting commands.

3.5. Process and kernel descriptors

The status of the facilities in this section is still under discussion. The ptrace facility of earlier UNIX systems remains in 4.3BSD. Planned enhancements would allow a descriptor-based process control facility.