PDP-7 Unix
Release Date: Developed from mid-1969 to the end of 1970
Released By: Never released outside of Bell Labs
Source Code: Reconstructed from original listings
Documentation: Reconstructed from original listings
Ken Thompson began the development of the system that was to become Unix, first as a file system on paper and then on a “little-used PDP-7” (Dennis Ritchie, The Evolution of the Unix Time-sharing System).
Details from "The Evolution of Unix"
Also during 1969, Thompson developed the game of `Space Travel.' First written on Multics, then transliterated into Fortran for GECOS (the operating system for the GE, later Honeywell, 635), it was nothing less than a simulation of the movement of the major bodies of the Solar System, with the player guiding a ship here and there, observing the scenery, and attempting to land on the various planets and moons. The GECOS version was unsatisfactory in two important respects: first, the display of the state of the game was jerky and hard to control because one had to type commands at it, and second, a game cost about $75 for CPU time on the big computer. It did not take long, therefore, for Thompson to find a little-used PDP-7 computer with an excellent display processor; the whole system was used as a Graphic-II terminal. He and I rewrote Space Travel to run on this machine. The undertaking was more ambitious than it might seem; because we disdained all existing software, we had to write a floating-point arithmetic package, the pointwise specification of the graphic characters for the display, and a debugging subsystem that continuously displayed the contents of typed-in locations in a corner of the screen. All this was written in assembly language for a cross-assembler that ran under GECOS and produced paper tapes to be carried to the PDP-7.
Space Travel, though it made a very attractive game, served mainly as an introduction to the clumsy technology of preparing programs for the PDP-7. Soon Thompson began implementing the paper file system (perhaps `chalk file system' would be more accurate) that had been designed earlier. A file system without a way to exercise it is a sterile proposition, so he proceeded to flesh it out with the other requirements for a working operating system, in particular the notion of processes. Then came a small set of user-level utilities: the means to copy, print, delete, and edit files, and of course a simple command interpreter (shell). Up to this time all the programs were written using GECOS and files were transferred to the PDP-7 on paper tape; but once an assembler was completed the system was able to support itself. Although it was not until well into 1970 that Brian Kernighan suggested the name `Unix,' in a somewhat treacherous pun on `Multics,' the operating system we know today was born.
The PDP-7 Unix file system (from "The Evolution of Unix")
Structurally, the file system of PDP-7 Unix was nearly identical to today's. It had
- An i-list: a linear array of i-nodes each describing a file. An i-node contained less than it does now, but the essential information was the same: the protection mode of the file, its type and size, and the list of physical blocks holding the contents.
- Directories: a special kind of file containing a sequence of names and the associated i-number.
- Special files describing devices. The device specification was not contained explicitly in the i-node, but was instead encoded in the number: specific i-numbers corresponded to specific files.
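The three structures listed above can be pictured with a small model. The Python sketch below is an illustration only and not the PDP-7 on-disk layout: the field names, the sample i-numbers, and the choice of i-number 1 for a terminal are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    mode: int                                   # protection mode
    kind: str                                   # "file", "directory" or "special"
    size: int = 0
    blocks: list = field(default_factory=list)  # physical block numbers


# The i-list is a linear array; an i-number is simply an index into it.
ilist = [
    Inode(mode=0o17, kind="directory", size=2, blocks=[10]),  # i-number 0
    Inode(mode=0o17, kind="special"),                         # i-number 1
    Inode(mode=0o17, kind="file", size=64, blocks=[11]),      # i-number 2
]

# Hypothetical table: specific i-numbers stand for specific devices,
# so the i-node itself carries no device specification.
SPECIAL = {1: "terminal"}

# A directory's contents: a sequence of (name, i-number) pairs.
directory = [("tty", 1), ("notes", 2)]


def lookup(entries, name):
    """Return the i-number for a simple name, or None if it is absent."""
    for entry_name, inumber in entries:
        if entry_name == name:
            return inumber
    return None


def describe(entries, name):
    """Resolve a name through the directory and the i-list."""
    inumber = lookup(entries, name)
    if inumber is None:
        return "no such file"
    inode = ilist[inumber]
    if inode.kind == "special":
        return "device: " + SPECIAL[inumber]
    return f"{inode.kind} of size {inode.size} in blocks {inode.blocks}"
```

In this model, `describe(directory, "tty")` resolves the name to i-number 1 and finds the device from the number alone, which mirrors the encoding described in the last list item.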
The important file system calls were also present from the start. Read, write, open, creat (sic), close: with one very important exception, discussed below, they were similar to what one finds now. A minor difference was that the unit of I/O was the word, not the byte, because the PDP-7 was a word-addressed machine. In practice this meant merely that all programs dealing with character streams ignored null characters, because null was used to pad a file to an even number of characters. Another minor, occasionally annoying difference was the lack of erase and kill processing for terminals. Terminals, in effect, were always in raw mode. Only a few programs (notably the shell and the editor) bothered to implement erase-kill processing.
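The null-padding convention can be made concrete with a short sketch. It assumes two 9-bit characters per 18-bit word, which is consistent with octal constants such as 056056 in the dsw listing later on this page; the function names `pack` and `unpack` are hypothetical.

```python
def pack(text):
    """Pack characters two per 18-bit word, padding with a null if needed."""
    codes = [ord(c) for c in text]
    if len(codes) % 2:
        codes.append(0)                     # null pads to an even count
    return [(hi << 9) | lo for hi, lo in zip(codes[0::2], codes[1::2])]


def unpack(words):
    """Recover the character stream, ignoring null characters."""
    chars = []
    for word in words:
        for code in ((word >> 9) & 0o777, word & 0o777):
            if code != 0:                   # readers skipped the padding
                chars.append(chr(code))
    return "".join(chars)
```

A three-character string such as `"abc"` occupies two words, the second holding `c` and a null, and `unpack(pack("abc"))` returns the original three characters.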
In spite of its considerable similarity to the current file system, the PDP-7 file system was in one way remarkably different: there were no path names, and each file-name argument to the system was a simple name (without `/') taken relative to the current directory. Links, in the usual Unix sense, did exist. Together with an elaborate set of conventions, they were the principal means by which the lack of path names became acceptable.
The link call took the form
link(dir, file, newname)
where dir was a directory file in the current directory, file an existing entry in that directory, and newname the name of the link, which was added to the current directory. Because dir needed to be in the current directory, it is evident that today's prohibition against links to directories was not enforced; the PDP-7 Unix file system had the shape of a general directed graph.
So that every user did not need to maintain a link to all directories of interest, there existed a directory called dd that contained entries for the directory of each user. Thus, to make a link to file x in directory ken, I might do
ln dd ken ken
ln ken x x
rm ken
This scheme rendered subdirectories sufficiently hard to use as to make them unused in practice. Another important barrier was that there was no way to create a directory while the system was running; all were made during recreation of the file system from paper tape, so that directories were in effect a nonrenewable resource.
The dd convention made the chdir command relatively convenient. It took multiple arguments, and switched the current directory to each named directory in turn. Thus
chdir dd ken
would move to directory ken. (Incidentally, chdir was spelled ch; why this was expanded when we went to the PDP-11 I don't remember.)
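The link call and the dd convention can be modelled with plain dictionaries. This is a sketch under stated assumptions, not the original implementation: directories are Python dicts mapping simple names to objects, a file is a list, every user directory is assumed to hold an entry named dd, and the helper names `link`, `unlink` and `ch` are chosen for the example.

```python
def link(cwd, dirname, filename, newname):
    """Model of link(dir, file, newname), relative to the directory cwd."""
    directory = cwd[dirname]              # dir must be an entry in cwd
    cwd[newname] = directory[filename]    # new entry refers to the same object


def unlink(cwd, name):
    """Remove one directory entry; the object itself may live on elsewhere."""
    del cwd[name]


def ch(cwd, *names):
    """Switch to each named directory in turn and return the final one."""
    for name in names:
        cwd = cwd[name]
    return cwd


# A directory graph: dd lists every user's directory, and each user
# directory links back to dd (an assumption of this sketch).
dd = {}
ken = {"dd": dd, "x": ["contents of x"]}
dmr = {"dd": dd}
dd["ken"] = ken
dd["dmr"] = dmr

# The three-command sequence from the text, run with dmr as current directory.
link(dmr, "dd", "ken", "ken")    # ln dd ken ken
link(dmr, "ken", "x", "x")       # ln ken x x
unlink(dmr, "ken")               # rm ken
```

Afterwards `dmr["x"]` and `ken["x"]` are the same object, the temporary entry `ken` is gone from `dmr`, and `ch(dmr, "dd", "ken")` returns `ken`. Because a link may point at a directory, the structure is a general directed graph rather than a tree.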
The most serious inconvenience of the implementation of the file system, aside from the lack of path names, was the difficulty of changing its configuration; as mentioned, directories and special files were both made only when the disk was recreated. Installation of a new device was very painful, because the code for devices was spread widely throughout the system; for example there were several loops that visited each device in turn. Not surprisingly, there was no notion of mounting a removable disk pack, because the machine had only a single fixed-head disk.
The operating system code that implemented this file system was a drastically simplified version of the present scheme. One important simplification followed from the fact that the system was not multi-programmed; only one program was in memory at a time, and control was passed between processes only when an explicit swap took place. So, for example, there was an iget routine that made a named i-node available, but it left the i-node in a constant, static location rather than returning a pointer into a large table of active i-nodes. A precursor of the current buffering mechanism was present (with about 4 buffers) but there was essentially no overlap of disk I/O with computation. This was avoided not merely for simplicity. The disk attached to the PDP-7 was fast for its time; it transferred one 18-bit word every 2 microseconds. On the other hand, the PDP-7 itself had a memory cycle time of 1 microsecond, and most instructions took 2 cycles (one for the instruction itself, one for the operand). However, indirectly addressed instructions required 3 cycles, and indirection was quite common, because the machine had no index registers. Finally, the DMA controller was unable to access memory during an instruction. The upshot was that the disk would incur overrun errors if any indirectly-addressed instructions were executed while it was transferring. Thus control could not be returned to the user, nor in fact could general system code be executed, with the disk running. The interrupt routines for the clock and terminals, which needed to be runnable at all times, had to be coded in very strange fashion to avoid indirection.
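The timing argument reduces to a comparison of two durations. The sketch below uses only the figures quoted above and a deliberately simplified rule, which is an assumption of the example: an overrun is reported when a single instruction holds memory for longer than the interval between disk words.

```python
MEMORY_CYCLE_US = 1          # PDP-7 memory cycle time
DISK_WORD_INTERVAL_US = 2    # the disk delivers one 18-bit word this often


def instruction_time_us(indirect):
    """Most instructions take 2 cycles; indirect addressing adds a third."""
    cycles = 3 if indirect else 2
    return cycles * MEMORY_CYCLE_US


def causes_overrun(indirect):
    """Simplified rule: the DMA controller is locked out for the whole
    instruction, so an instruction longer than the word interval loses data."""
    return instruction_time_us(indirect) > DISK_WORD_INTERVAL_US
```

Under this rule a directly addressed instruction (2 microseconds) just fits, while an indirectly addressed one (3 microseconds) does not, which matches the conclusion drawn in the text.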
Process control (from "The Evolution of Unix")
By `process control,' I mean the mechanisms by which processes are created and used; today the system calls fork, exec, wait, and exit implement these mechanisms. Unlike the file system, which existed in nearly its present form from the earliest days, the process control scheme underwent considerable mutation after PDP-7 Unix was already in use. (The introduction of path names in the PDP-11 system was certainly a considerable notational advance, but not a change in fundamental structure.)
Today, the way in which commands are executed by the shell can be summarized as follows:
1. The shell reads a command line from the terminal.
2. It creates a child process by fork.
3. The child process uses exec to call in the command from a file.
4. Meanwhile, the parent shell uses wait to wait for the child (command) process to terminate by calling exit.
5. The parent shell goes back to step 1).
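The modern sequence just listed can be sketched with the corresponding calls in Python's `os` module. This is a minimal illustration for a Unix-like host and not the code of any particular shell; the function names `run_command` and `shell`, and the use of status 127 for a failed exec, are choices made for the example.

```python
import os
import sys


def run_command(argv):
    """Fork a child, exec argv in it, and wait for it to exit."""
    pid = os.fork()                       # step 2: create a child process
    if pid == 0:
        try:
            os.execvp(argv[0], argv)      # step 3: call in the command
        except OSError:
            os._exit(127)                 # exec failed; leave the child at once
    _, status = os.waitpid(pid, 0)        # step 4: parent waits for the child
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1                             # child was ended by a signal


def shell(lines):
    """Run each command line in turn and collect the exit statuses."""
    statuses = []
    for line in lines:                    # steps 1 and 5: next command line
        argv = line.split()
        if argv:
            statuses.append(run_command(argv))
    return statuses
```

Note that the child keeps running the same program as its parent until `execvp` succeeds, which is the separation of fork and exec discussed later in this section.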
Processes (independently executing entities) existed very early in PDP-7 Unix. There were in fact precisely two of them, one for each of the two terminals attached to the machine. There was no fork, wait, or exec. There was an exit, but its meaning was rather different, as will be seen. The main loop of the shell went as follows.
1. The shell closed all its open files, then opened the terminal special file for standard input and output (file descriptors 0 and 1).
2. It read a command line from the terminal.
3. It linked to the file specifying the command, opened the file, and removed the link. Then it copied a small bootstrap program to the top of memory and jumped to it; this bootstrap program read in the file over the shell code, then jumped to the first location of the command (in effect an exec).
4. The command did its work, then terminated by calling exit. The exit call caused the system to read in a fresh copy of the shell over the terminated command, then to jump to its start (and thus in effect to go to step 1).
The most interesting thing about this primitive implementation is the degree to which it anticipated themes developed more fully later. True, it could support neither background processes nor shell command files (let alone pipes and filters); but IO redirection (via `<' and `>') was soon there; it is discussed below. The implementation of redirection was quite straightforward; in step 3) above the shell just replaced its standard input or output with the appropriate file. Crucial to subsequent development was the implementation of the shell as a user-level program stored in a file, rather than a part of the operating system.
The structure of this process control scheme, with one process per terminal, is similar to that of many interactive systems, for example CTSS, Multics, Honeywell TSS, and IBM TSS and TSO. In general such systems require special mechanisms to implement useful facilities such as detached computations and command files; Unix at that stage didn't bother to supply the special mechanisms. It also exhibited some irritating, idiosyncratic problems. For example, a newly recreated shell had to close all its open files both to get rid of any open files left by the command just executed and to rescind previous IO redirection. Then it had to reopen the special file corresponding to its terminal, in order to read a new command line. There was no /dev directory (because no path names); moreover, the shell could retain no memory across commands, because it was reexecuted afresh after each command. Thus a further file system convention was required: each directory had to contain an entry tty for a special file that referred to the terminal of the process that opened it. If by accident one changed into some directory that lacked this entry, the shell would loop hopelessly; about the only remedy was to reboot. (Sometimes the missing link could be made from the other terminal.)
Process control in its modern form was designed and implemented within a couple of days. It is astonishing how easily it fitted into the existing system; at the same time it is easy to see how some of the slightly unusual features of the design are present precisely because they represented small, easily-coded changes to what existed. A good example is the separation of the fork and exec functions. The most common model for the creation of new processes involves specifying a program for the process to execute; in Unix, a forked process continues to run the same program as its parent until it performs an explicit exec. The separation of the functions is certainly not unique to Unix, and in fact it was present in the Berkeley time-sharing system [2], which was well-known to Thompson. Still, it seems reasonable to suppose that it exists in Unix mainly because of the ease with which fork could be implemented without changing much else. The system already handled multiple (i.e. two) processes; there was a process table, and the processes were swapped between main memory and the disk. The initial implementation of fork required only
- Expansion of the process table
- Addition of a fork call that copied the current process to the disk swap area, using the already existing swap IO primitives, and made some adjustments to the process table.
In fact, the PDP-7's fork call required precisely 27 lines of assembly code. Of course, other changes in the operating system and user programs were required, and some of them were rather interesting and unexpected. But a combined fork-exec would have been considerably more complicated, if only because exec as such did not exist; its function was already performed, using explicit IO, by the shell.
The exit system call, which previously read in a new copy of the shell (actually a sort of automatic exec but without arguments), simplified considerably; in the new version a process only had to clean out its process table entry, and give up control.
Curiously, the primitives that became wait were considerably more general than the present scheme. A pair of primitives sent one-word messages between named processes:
smes(pid, message)
(pid, message) = rmes()
The target process of smes did not need to have any ancestral relationship with the receiver, although the system provided no explicit mechanism for communicating process IDs except that fork returned to each of the parent and child the ID of its relative. Messages were not queued; a sender delayed until the receiver read the message.
The message facility was used as follows: the parent shell, after creating a process to execute a command, sent a message to the new process by smes; when the command terminated (assuming it did not try to read any messages) the shell's blocked smes call returned an error indication that the target process did not exist. Thus the shell's smes became, in effect, the equivalent of wait.
A different protocol, which took advantage of more of the generality offered by messages, was used between the initialization program and the shells for each terminal. The initialization process, whose ID was understood to be 1, created a shell for each of the terminals, and then issued rmes; each shell, when it read the end of its input file, used smes to send a conventional `I am terminating' message to the initialization process, which recreated a new shell process for that terminal.
I can recall no other use of messages. This explains why the facility was replaced by the wait call of the present system, which is less general, but more directly applicable to the desired purpose. Possibly relevant also is the evident bug in the mechanism: if a command process attempted to use messages to communicate with other processes, it would disrupt the shell's synchronization. The shell depended on sending a message that was never received; if a command executed rmes, it would receive the shell's phony message, and cause the shell to read another input line just as if the command had terminated. If a need for general messages had manifested itself, the bug would have been repaired.
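The way smes stood in for wait, and the bug just described, can be shown with a toy model. Everything here is a simulation under assumptions: there are no real processes, blocking is represented by a recorded state rather than by suspension, and the class name, method names and process IDs are hypothetical.

```python
class Messages:
    """Toy model of unqueued one-word messages between processes."""

    def __init__(self):
        self.alive = set()
        self.blocked = {}   # target pid -> list of (sender pid, word)
        self.outcome = {}   # sender pid -> state of its last smes

    def spawn(self, pid):
        self.alive.add(pid)

    def smes(self, sender, target, word):
        """Send one word; the sender stays blocked until it is read."""
        if target not in self.alive:
            self.outcome[sender] = "no such process"
        else:
            self.outcome[sender] = "blocked"
            self.blocked.setdefault(target, []).append((sender, word))

    def rmes(self, receiver):
        """Read one waiting message as (sender, word), or None.
        The real call would block; the model just reports None."""
        waiting = self.blocked.get(receiver)
        if not waiting:
            return None
        sender, word = waiting.pop(0)
        self.outcome[sender] = "delivered"
        return sender, word

    def exit(self, pid):
        """Terminate pid; senders still blocked on it get an error."""
        self.alive.discard(pid)
        for sender, _ in self.blocked.pop(pid, []):
            self.outcome[sender] = "no such process"


SHELL, COMMAND = 2, 3   # hypothetical process IDs


def run(command_reads_messages):
    """Return the shell's smes state while the command runs and after it exits."""
    m = Messages()
    m.spawn(SHELL)
    m.spawn(COMMAND)
    m.smes(SHELL, COMMAND, 0)       # the shell's stand-in for wait
    if command_reads_messages:
        m.rmes(COMMAND)             # the bug: the command reads the phony message
    while_running = m.outcome[SHELL]
    m.exit(COMMAND)
    return while_running, m.outcome[SHELL]
```

With a well-behaved command, `run(False)` shows the shell blocked until the command exits and then released with an error, which is the effect of wait. With `run(True)` the shell is released while the command is still running, which is the loss of synchronization described above.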
At any rate, the new process control scheme instantly rendered some very valuable features trivial to implement; for example detached processes (with `&') and recursive use of the shell as a command. Most systems have to supply some sort of special `batch job submission' facility and a special command interpreter for files distinct from the one used interactively.
Although the multiple-process idea slipped in very easily indeed, there were some aftereffects that weren't anticipated. The most memorable of these became evident soon after the new system came up and apparently worked. In the midst of our jubilation, it was discovered that the chdir (change current directory) command had stopped working. There was much reading of code and anxious introspection about how the addition of fork could have broken the chdir call. Finally the truth dawned: in the old system chdir was an ordinary command; it adjusted the current directory of the (unique) process attached to the terminal. Under the new system, the chdir command correctly changed the current directory of the process created to execute it, but this process promptly terminated and had no effect whatsoever on its parent shell! It was necessary to make chdir a special command, executed internally within the shell. It turns out that several command-like functions have the same property, for example login.
Another mismatch between the system as it had been and the new process control scheme took longer to become evident. Originally, the read/write pointer associated with each open file was stored within the process that opened the file. (This pointer indicates where in the file the next read or write will take place.) The problem with this organization became evident only when we tried to use command files. Suppose a simple command file contains
ls
who
and it is executed as follows:
sh comfile>output
The sequence of events was
- The main shell creates a new process, which opens output to receive the standard output and executes the shell recursively.
- The new shell creates another process to execute ls, which correctly writes on file output and then terminates.
- Another process is created to execute the next command. However, the IO pointer for the output is copied from that of the shell, and it is still 0, because the shell has never written on its output, and IO pointers are associated with processes. The effect is that the output of who overwrites and destroys the output of the preceding ls command.
Solution of this problem required creation of a new system table to contain the IO pointers of open files independently of the process in which they were opened.
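The difference between the two arrangements can be reproduced in a few lines. The sketch below is a simulation, not kernel code: `OpenFile` stands for one entry holding a read/write pointer, the two `fork_*` helpers are hypothetical names for the old and new behaviour, and the sample outputs of ls and who are made up.

```python
class OpenFile:
    """One open-file entry: the file's data plus a read/write pointer."""

    def __init__(self, data, offset=0):
        self.data = data
        self.offset = offset

    def write(self, text):
        end = self.offset + len(text)
        self.data[self.offset:end] = text   # overwrite or extend in place
        self.offset = end


def fork_private(entry):
    """Original scheme: the child gets its own copy of the pointer."""
    return OpenFile(entry.data, entry.offset)


def fork_shared(entry):
    """Later scheme: parent and child refer to the same table entry."""
    return entry


def run_command_file(fork, outputs):
    """A sub-shell runs one process per command, all writing to one file."""
    data = bytearray()
    shell_out = OpenFile(data)          # the sub-shell never writes itself
    for text in outputs:
        child_out = fork(shell_out)
        child_out.write(text)
    return bytes(data)


LS = b"file1\nfile2\n"                  # made-up output of ls
WHO = b"ken tty0\ndmr tty1\n"           # made-up output of who
```

With `fork_private`, each child starts at pointer 0 and the second command overwrites the first, so `run_command_file(fork_private, [LS, WHO])` yields only the who output. With `fork_shared` the pointer advances across commands and the result is the ls output followed by the who output.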
IO Redirection (from "The Evolution of Unix")
The very convenient notation for IO redirection, using the `>' and `<' characters, was not present from the very beginning of the PDP-7 Unix system, but it did appear quite early. Like much else in Unix, it was inspired by an idea from Multics. Multics has a rather general IO redirection mechanism [3] embodying named IO streams that can be dynamically redirected to various devices, files, and even through special stream-processing modules. Even in the version of Multics we were familiar with a decade ago, there existed a command that switched subsequent output normally destined for the terminal to a file, and another command to reattach output to the terminal. Where under Unix one might say
ls>xx
to get a listing of the names of one's files in xx, on Multics the notation was
iocall attach user_output file xx
list
iocall attach user_output syn user_i/o
Even though this very clumsy sequence was used often during the Multics days, and would have been utterly straightforward to integrate into the Multics shell, the idea did not occur to us or anyone else at the time. I speculate that the reason it did not was the sheer size of the Multics project: the implementors of the IO system were at Bell Labs in Murray Hill, while the shell was done at MIT. We didn't consider making changes to the shell (it was their program); correspondingly, the keepers of the shell may not even have known of the usefulness, albeit clumsiness, of iocall. (The 1969 Multics manual [4] lists iocall as an `author-maintained,' that is non-standard, command.) Because both the Unix IO system and its shell were under the exclusive control of Thompson, when the right idea finally surfaced, it was a matter of an hour or so to implement it.
Extant PDP-7 Unix Source Code
For many years, the only extant source code from the PDP-7 version of Unix appeared to be the source code to the dsw command that Dennis posted on the net.unix-wizards Usenet newsgroup in 1984. He wrote:
I happened to dredge up an old notebook and found a listing of the PDP-7 version of dsw. Because several people have approached me recently about reviving a version of PDP-7 Unix as a sort of paleontological exhibit, and because the subject has been discussed here, I thought people might be interested in seeing the code. I first considered net.sources, but decided not to carry whimsy too far.
Notes:
- The assembler has Knuth-style temporary labels but no literals.
- The name of the current directory was evidently “..”
- Formatting is faithfully reproduced.
- “sys save” makes a core image.
" dsw lac djmp dac .-1 oas cla cma tad d1 dac t1 sys open; dd; 0 1: lac d2 sys read; dir; 8 sna sys exit lac dir sna jmp 1b isz t1 jmp 1b wr: lac d1 sys write; dir+1; 4 lac d1 sys write; o12; 1 sys save do: sys unlink; dir+1 sys exit d1: 1 d2: 2 o12: 012 t1: 0 djmp: jmp do dd: 056056; 040040; 040040; 040040 dir: .=.+8
In October 2009, Dennis sent Warren Toomey a private e-mail that said “In other news, I have found the book that has the [PDP-7] listings that I knew I had, that of (some) of the user-level commands. I wonder what's the best way to get it scanned?”. Unfortunately, Dennis passed away before he could get the listings scanned in.
In 2016, Norman Wilson discovered a set of paper copies of PDP-7 Unix listings that he had made while working at Bell Labs. Warren Toomey organized a project to resurrect PDP-7 Unix with source code derived from scans of the listings. The listings were partial, but there was enough to create a system that would boot and run.
In October 2019, another notebook of listings was discovered (possibly by Dennis Ritchie's heirs). These were scanned, proofread and corrected, yielding a more complete PDP-7 Unix that is closer to the original. A month later, the Living Computer Museum bootstrapped the reconstructed operating system on renovated PDP-7 hardware.
PDP-9 Unix
Several sources, including the CACM paper, mention Unix running on a PDP-9. In August 2002, Dennis Ritchie posted a message on the PUPS mailing list which said:
The [PDP-]7, 9, 15 were very compatible. I think the -15 had some scheme for using an index register, which the earlier ones didn't have, but it was otherwise pretty much identical in IS architecture.
There was very little rewriting to try Unix out on the -9 and -15; perhaps just some tweaks in the disk device commands. I don't think the system actually ran on either for more than a few hours. Ken was just playing around.
The -15 may have had an electrically different bus, but I'm reasonably sure it was not a Unibus. All of them used IOT instructions, not memory-mapped IO registers.
Both of the machines we tried were being used by other groups and we couldn't squat on them as with the PDP-7. I recall that the -15's main job was controlling a step-and-repeat camera that exposed LSI masks.