In the following discussion, the machine designations B5000, A Series, and ClearPath/MCP are used interchangeably, although this conflates the features and concepts of the various machines and blurs the distinctions between the 5000, 5500, 6500 et seq., and A Series.
Thus the B5000 was based on a very powerful language. Most other vendors could only dream of implementing an ALGOL compiler and most in the industry dismissed ALGOL as being unimplementable. However, a bright young student named Donald Knuth had previously implemented ALGOL-58 on an earlier Burroughs machine during the three months of his summer break. Many wrote ALGOL off, mistakenly believing that high-level languages could not have the same power as assembler, and thus not realizing ALGOL's potential as a systems programming language, an opinion not revised until the development of the C programming language.
The Burroughs ALGOL compiler was very fast — this impressed the Dutch scientist Edsger Dijkstra when he submitted a program to be compiled at the B5000 Pasadena plant. His deck of cards was compiled almost immediately, and he promptly wanted several machines for his university (Eindhoven University of Technology) back in Europe. The compiler was fast for several reasons, but the primary reason was that it was a "one-pass compiler." Early computers did not have enough memory to store the source code, so compilers (and even assemblers) usually needed to read the source code more than once. The ALGOL syntax requires that each variable (or other object) be declared before it is used, so it is feasible to write an ALGOL compiler that reads the data only once. This concept has profound theoretical implications, but it also permits very fast compiling. Burroughs large systems could compile as fast as they could read the source code from the punched cards, and they had the fastest card readers in the industry.
The powerful Burroughs COBOL compiler was also a one-pass compiler and equally fast. A 4000-card COBOL program compiled as fast as the 1000-card/minute readers could read the code. The program was ready to use as soon as the cards were through the reader.
| Model | Year | Notes |
|---|---|---|
| B5000 | 1961 | Initial system; 2nd-generation (transistor) computer |
| B5500 | 1964 | 3x speed improvement(?) |
| B6500 | 1969 | 3rd-generation computer (integrated circuits), up to 4 processors |
| B5700 | 1971 | New name for B5500 |
| B6700 | 1971 | New name/bug fix for B6500 |
| B7700 | 1972 | Faster processor, cache for stack, up to 8 processors |
| B6800 | 1977? | Semiconductor memory, NUMA architecture |
| B7800 | 1977? | Semiconductor memory, faster, up to 16? processors |
| A Series | 1984 | Re-implemented in custom-designed Motorola ECL MCA1, then MCA2 gate arrays |
| Micro A | 1989 | Desktop "mainframe" with single-chip SCAMP processor |
| ClearPath HMP NX 4000 | 198? | ?? |
| ClearPath HMP NX 5000 | 199? | ?? |
| ClearPath HMP LX 5000 | 1998 | Implements Burroughs large systems in emulation only (Xeon processors) |
| Libra 500 | 2005? | e.g. Libra 595 |
While the B5000 was designed specifically around ALGOL, this was only a starting point. Other business-oriented languages such as COBOL were also well supported, most notably by the powerful string operators which were included to support the development of fast compilers.
The ALGOL used on the B5000 is an extended ALGOL subset. It includes powerful string manipulation instructions but excludes certain ALGOL constructs, notably unspecified formal parameters. A DEFINE mechanism serves a similar purpose to the #defines found in C, but is fully integrated into the language rather than being a preprocessor. The EVENT data type facilitates coordination between processes, and ON FAULT blocks enable handling program faults.
The user level of ALGOL does not include many of the insecure constructs needed by the operating system and other system software. Two levels of language extensions provide the additional constructs: ESPOL and NEWP for writing the MCP and closely related software, and DCALGOL and DMALGOL to provide more specific extensions for specific kinds of system software.
NEWP programs that contain unsafe constructs are initially non-executable. The security administrator of a system is able to "bless" such programs and make them executable, but normal users are not able to do this. (Even "privileged users", who normally have essentially root privilege, may be unable to do this depending on the configuration chosen by the site.) While NEWP can be used to write general programs and has a number of features designed for large software projects, it does not support everything ALGOL does.
NEWP has a number of facilities to enable large-scale software projects, such as the operating system, including named interfaces (functions and data), groups of interfaces, modules, and super-modules. Modules group data and functions together, allowing easy access to the data as global within the module. Interfaces allow a module to import and export functions and data. Super-modules allow modules to be grouped.
MCSs are items of software worth noting: they control user sessions and keep track of user state without having to run per-user processes, since a single MCS stack can be shared by many users. Load balancing can also be achieved at the MCS level. For example, a site might choose to handle 30 users per stack, in which case 31 to 60 users require two stacks, 61 to 90 users require three stacks, and so on. This gives B5000 machines a great performance advantage in a server, since you don't need to start up another user process, and thus create a new stack, each time a user attaches to the system. Thus you can efficiently service users (whether they require state or not) with MCSs. MCSs also provide the backbone of large-scale transaction processing.
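The stack-per-N-users policy described above amounts to simple ceiling division. A minimal sketch in Python (Python stands in for the MCS's actual implementation; the 30-users-per-stack figure is just the example from the text, not a system constant):

```python
import math

def stacks_needed(users, users_per_stack=30):
    """MCS stacks required under an assumed N-users-per-stack policy."""
    return max(1, math.ceil(users / users_per_stack))

print(stacks_needed(31))  # -> 2 (31 to 60 users need two stacks)
print(stacks_needed(90))  # -> 3 (61 to 90 users need three stacks)
```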
The MCS talked with an external co-processor, the TCP (Terminal Control Processor). This was a 24-bit minicomputer with a conventional register architecture and hardware I/O capability to handle thousands of remote terminals. The TCP and the B6500 communicated by messages in memory, essentially packets in today's terms, and the MCS did the B6500-side processing of those messages. The TCP did have an assembler, but that assembler was the B6500 ALGOL compiler. There was one ALGOL function for each kind of TCP instruction, and if you called that function then the corresponding TCP instruction bits would be emitted to the output. A TCP program was an ALGOL program comprising nothing but a long list of calls on these functions, one for each assembly language statement. Essentially ALGOL acted like the macro pass of a macro assembler. The first pass was the ALGOL compiler; the second pass was running the resulting program (on the B6500) which would then emit the binary for the TCP.
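The "ALGOL as the macro pass of a macro assembler" idea can be sketched as follows: each assembly statement is a call on a function that appends the corresponding instruction bits to the output. This Python sketch uses invented opcodes and an invented encoding purely for illustration; it is not the real TCP instruction set:

```python
# Each "instruction" is a function that emits bits; running the compiled
# program (the second pass) produces the binary for the target machine.
output = []

def emit(opcode, operand=0):
    output.append((opcode << 8) | operand)

def LOAD(n):  emit(0x01, n)   # invented opcode
def ADD(n):   emit(0x02, n)   # invented opcode
def STORE(n): emit(0x03, n)   # invented opcode

# A "TCP program": nothing but a list of calls, one per assembly statement.
LOAD(5)
ADD(3)
STORE(7)
print([hex(w) for w in output])  # -> ['0x105', '0x203', '0x307']
```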
DMALGOL preprocessing includes variables and loops, and can generate names based on compile-time variables. This enables tailoring far beyond what can be done by preprocessing facilities which lack loops.
DMALGOL is used to provide tailored access routines for DMSII databases. After a database is defined using the Data Access and Structure Definition Language (DASDL), the schema is translated by the preprocessor into tailored DMALGOL access routines and then compiled. This means that, unlike in other DBMS implementations, there is often no need for database-specific if/then/else code at run-time. In the 1970s, this "tailoring" was used very extensively to reduce the code footprint and execution time. It became much less used in later years, partly because low-level fine tuning for memory and speed became less critical, and partly because eliminating the preprocessing made coding simpler and thus enabled more important optimizations.
Roy Guck of Burroughs was one of the main developers of DMSII.
In later years, with compiler code size being less of a concern, most of the preprocessing constructs were made available in the user level of ALGOL. Only the unsafe constructs and the direct processing of the database description file remain restricted to DMALGOL.
Multitasking is also very efficient on B5000 machines. There is one specific instruction to perform process switches – MVST (move stack). Each stack represents a process (task or thread) and tasks can become blocked waiting on resource requests (which includes waiting for a processor to run on if the task has been interrupted because of preemptive multitasking). User programs cannot issue an MVST, and there is only one line of code in the operating system where this is done.
So a process switch proceeds something like this – a process requests a resource that is not immediately available, maybe a read of a record of a file from a block which is not currently in memory, or the system timer has triggered an interrupt. The operating system code is entered and run on top of the user stack. It turns off user process timers. The current process is placed in the appropriate queue for the resource being requested, or the ready queue waiting for the processor if this is a preemptive context switch. The operating system determines the first process in the ready queue and invokes the instruction move_stack, which makes the process at the head of the ready queue active.
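The sequence above can be sketched with two queues. This is a toy model in Python (the process names and queue layout are invented for illustration; it is not MCP code):

```python
from collections import deque

ready_queue = deque(["B", "C"])   # processes waiting for a processor
io_queue = deque()                # processes waiting on a resource

def switch(current, blocked_on_resource):
    """Queue the current process, then 'MVST' to the head of the ready queue."""
    if blocked_on_resource:
        io_queue.append(current)      # wait for the requested resource
    else:
        ready_queue.append(current)   # preempted: back to the ready queue
    return ready_queue.popleft()      # move_stack makes this process active

active = switch("A", blocked_on_resource=True)
print(active)  # -> B ('A' now waits on its resource)
```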
Thus the designers of the current B5000 systems can optimize using whatever is the latest technique, and programmers do not have to adjust their code for it to run faster; they do not even need to recompile, thus protecting software investment. Some programs have been known to run for years over many processor upgrades. Such speedup is limited on register-based machines.
Another point for speed promoted by the RISC designers was that processor speed is considerably faster if everything is on a single chip. This was a valid point in the 1970s, when more complex architectures such as the B5000 required too many transistors to fit on a single chip. However, this is not the case today: every B5000 successor machine now fits on a single chip, together with performance-support techniques such as caches and instruction pipelines.
In fact, the A Series line of B5000 successors included the first single chip mainframe, the Micro-A of the late 1980s. This "mainframe" chip (named SCAMP for Single-Chip A-series Mainframe Processor) sat on an Intel-based plug-in PC board.
Here is an example of how programs map to the stack architecture
— This is lexical level 2 (level zero is reserved for the operating system and level 1 for code segments).
— At level 2 we place global variables for our program.
integer i, j, k
real f, g
array a [0:9]
procedure p (real p1, p2)
value p1 — p1 passed by value, p2 implicitly passed by reference.
— This block is at lexical level 3
real r1, r2
r2 := p1 * 5
p2 := r2 — This sets 'g' to the value of r2
p1 := r2 — This sets 'p1' to r2, but not 'f'
— Since this overwrites the original value of f in p1 it most likely indicates
— an error. Few of ALGOL's successors have corrected this situation by
— making value parameters read-only; most have not.
if r2 > 10 then
— A variable declared here makes this lexical level 4
— The declaration of a variable makes this a block, which will invoke some
— stack building code. Normally you won't declare variables here, in which
— case this would be a compound statement, not a block.
... <== sample stack is executing somewhere here.
p (f, g)
Each stack frame corresponds to a lexical level in the current execution environment. As you can see, lexical level is the static textual nesting of a program, not the dynamic call nesting. The visibility rules of ALGOL, a language designed for single pass compilers, mean that only variables declared before the current position are visible at that part of the code except for forward declarations. All variables declared in enclosing blocks are visible. Another case is that variables of the same name may be declared in inner blocks and these effectively hide the outer variables which become inaccessible.
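The visibility and shadowing rules read much like lexical scoping in any block-structured language. A small Python illustration (Python stands in for ALGOL here) of an inner declaration hiding an outer one:

```python
g = "outer"          # declared in the enclosing (outer) block

def inner_block():
    g = "inner"      # the same name declared in an inner block hides the outer 'g'
    def innermost():
        return g     # sees the nearest enclosing declaration
    return innermost()

print(inner_block())  # -> inner
print(g)              # -> outer (the outer variable was never touched)
```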
Since lexical nesting is static, it is very rare to find a program nested more than five levels deep, and it could be argued that such programs would be poorly structured. B5000 machines allow nesting of up to 32 levels. Procedures can be invoked in four ways – normal, call, process, and run.
The normal invocation invokes a procedure in the normal way any language invokes a routine, by suspending the calling routine until the invoked procedure returns.
The call mechanism invokes a procedure as a coroutine. Coroutines have partner tasks, where control is explicitly passed between the tasks by means of a CONTINUE instruction. These are synchronous processes.
The process mechanism invokes a procedure as an asynchronous task; in this case a separate stack is set up, starting at the lexical level of the processed procedure. As an asynchronous task, there is no control over exactly when control will be passed between the tasks, unlike coroutines. Note also that the processed procedure still has access to the enclosing environment, and this is a very efficient IPC (Inter-Process Communication) mechanism. Since two or more tasks now have access to common variables, the tasks must be synchronized to prevent race conditions; this is handled by the EVENT data type, where processes can WAIT on an event until it is caused by another cooperating process. EVENTs also allow for mutual-exclusion synchronization through the PROCURE and LIBERATE functions. If for any reason the child task dies, the calling task can continue; however, if the parent process dies, then all child processes are automatically terminated. On a machine with more than one processor, the processes may run simultaneously. This EVENT mechanism is a basic enabler for multiprocessing in addition to multitasking.
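In modern terms, the EVENT plus PROCURE/LIBERATE pairing maps roughly onto an event flag plus a mutex. A rough Python analogue (Python's threading primitives stand in for the hardware-supported MCP ones; this is not the actual MCP API):

```python
import threading

event = threading.Event()   # EVENT: WAIT ~ event.wait(), CAUSE ~ event.set()
lock = threading.Lock()     # PROCURE ~ lock.acquire(), LIBERATE ~ lock.release()
shared = []                 # a variable common to parent and child

def child():
    with lock:              # PROCURE ... LIBERATE around the shared data
        shared.append("done")
    event.set()             # CAUSE: wake any process WAITing on the event

t = threading.Thread(target=child)
t.start()
event.wait()                # WAIT until a cooperating process causes the event
t.join()
print(shared)               # -> ['done']
```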
The last invocation type is run. This runs a procedure as an independent task which can continue on after the originating process terminates. For this reason, the child process cannot access variables in the parent's environment, and all parameters passed to the invoked procedure must be call-by-value.
Thus Burroughs Extended ALGOL had all of the multi-processing and synchronization features of later languages like Ada, with the added benefit that support for asynchronous processes was built into the hardware level.
One last possibility is that a procedure may be declared INLINE; when the compiler sees a reference to it, the code for the procedure is generated inline to save the overhead of a procedure call. This is best done for small pieces of code and is like a DEFINE, except without the parameter problems that defines can have. This facility is available in NEWP.
In the example program only normal calls are used, so all the information will be on a single stack. For asynchronous calls, the stack would be split into multiple stacks so that the processes share data but run asynchronously.
A stack hardware optimization is the provision of D (or "display") registers. These are registers that point to the start of each called stack frame. These registers are updated automatically as procedures are entered and exited and are not accessible by any software. There are 32 D registers, which is what limits programs to 32 levels of lexical nesting.
Consider how we would access a lexical level 2 (D[2]) global variable from lexical level 5 (D[5]). Suppose the variable is 6 words away from the base of lexical level 2. It is thus represented by the address couple (2, 6). If we don't have D registers, we have to look at the control word at the base of the D[5] frame, which points to the frame containing the D[4] environment. We then look at the control word at the base of this environment to find the D[3] environment, and continue in this fashion until we have followed all the links back to the required lexical level. Note this is not the same path as the return path back through the procedures which have been called in order to get to this point. (The architecture keeps both the data stack and the call stack in the same structure, but uses control words to tell them apart.)
As you can see, this is quite inefficient just to access a variable. With D registers, the D[2] register points at the base of the lexical level 2 environment, and all we need to do to generate the address of the variable is to add its offset from the stack frame base to the frame base address in the D register. (There is an efficient linked list search operator LLLU, which could search the stack in the above fashion, but the D register approach is still going to be faster.) With D registers, access to entities in outer and global environments is just as efficient as local variable access.
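The two lookup strategies can be modeled in a few lines. This is an assumed, simplified model in Python (the frame layout is invented; real frames hold the static link in the MSCW), using the (2, 6) address couple from the example:

```python
# One frame per lexical level; 'static_link' models the MSCW link to the
# enclosing environment, 'base' the frame's base address in the stack.
frames = {
    2: {"static_link": None, "base": 0},    # global environment
    3: {"static_link": 2,    "base": 10},
    4: {"static_link": 3,    "base": 20},
    5: {"static_link": 4,    "base": 30},   # current environment
}

def resolve_by_links(frames, current_level, target_level, offset):
    """Chase static links from the current frame down to the target level."""
    level = current_level
    while level != target_level:
        level = frames[level]["static_link"]
    return frames[level]["base"] + offset

# D registers cache the base of every lexical level's frame:
d = {level: frame["base"] for level, frame in frames.items()}

def resolve_by_d(d, target_level, offset):
    """One addition: D[target] + offset."""
    return d[target_level] + offset

print(resolve_by_links(frames, 5, 2, 6))  # -> 6
print(resolve_by_d(d, 2, 6))              # -> 6 (same address, no link chasing)
```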
| D register | Tag | Word | Comment |
|---|---|---|---|
| | 0 | n | The integer 'n', address couple (4, 1) |
| D[4] ==> | 3 | MSCW | The Mark Stack Control Word containing the link to D[3] |
| | 0 | r2 | The real 'r2', address couple (3, 5) |
| | 0 | r1 | The real 'r1', address couple (3, 4) |
| | 1 | p2 | An SIRW reference to 'g' at (2, 6) |
| | 0 | p1 | The parameter 'p1' from the value of 'f', address couple (3, 2) |
| | 3 | RCW | A return control word |
| D[3] ==> | 3 | MSCW | The Mark Stack Control Word containing the link to D[2] |
| | 1 | a | The array 'a', address couple (2, 7) ====> [ten-word memory block] |
| | 0 | g | The real 'g', address couple (2, 6) |
| | 0 | f | The real 'f', address couple (2, 5) |
| | 0 | k | The integer 'k', address couple (2, 4) |
| | 0 | j | The integer 'j', address couple (2, 3) |
| | 0 | i | The integer 'i', address couple (2, 2) |
| | 3 | RCW | A return control word |
| D[2] ==> | 3 | MSCW | The Mark Stack Control Word containing the link to the previous stack frame |
| ========= | | | Stack bottom |
If we had invoked the procedure p as a coroutine, or by a process instruction, the D[3] environment would have become a separate D[3]-based stack. Note that this means that asynchronous processes still have access to the enclosing environment as implied in ALGOL program code. Taking this one step further, a totally different program could call another program's code, creating a stack frame pointing to another process's D[2] environment on top of its own process stack. At an instant the whole address space from the code's execution environment changes, making the D[2] environment on its own process stack not directly addressable and instead making the D[2] environment in the other process's stack directly addressable. This is how library calls are implemented. At such a cross-stack call, the calling code and called code could even originate from programs written in different source languages and be compiled by different compilers.
Note that the D[1] and D[0] environments do not occur in the current process's stack. The D[1] environment is the code segment dictionary, which is shared by all processes running the same code. The D[0] environment represents entities exported by the operating system.
Stack frames actually don't even have to exist in a process stack. This feature was used early on for file I/O optimization: the FIB (file information block) was linked into the display registers during I/O operations. In the early nineties, this ability was implemented as a language feature as STRUCTURE BLOCKs and, combined with library technology, as CONNECTION BLOCKs. The ability to link a data structure into the display register address scope implemented object orientation. Thus, the B5000 actually used a form of object orientation long before the term was ever used.
One nice thing about the stack structure is that if a program does happen to fail, a stack dump is taken and it is very easy for a programmer to find out exactly what the state of a running program was. Compare that to core dumps and exchange packages of other systems.
Another thing about the stack structure is that programs are implicitly recursive. FORTRAN was not a recursive language and perhaps one stumbling block to people's understanding of how ALGOL was to be implemented was how to implement recursion. On the B5000, this was not a problem – in fact, they had the reverse problem, how to stop programs from being recursive. In the end they didn't bother, even the Burroughs FORTRAN compiler was recursive, since it was unproductive to stop it being so.
Thus Burroughs FORTRAN was better than any other implementation of FORTRAN. In fact, Burroughs became known for its superior compilers and implementations of languages, including the object-oriented Simula (a superset of ALGOL), and Iverson, the designer of APL, declared that the Burroughs implementation of APL was the best he'd seen. John McCarthy, the language designer of LISP, disagreed: since LISP was based on modifiable code, he did not like the unmodifiable code of the B5000, but most LISP implementations would run in an interpretive environment anyway.
Note also that stacks automatically used as much memory as was needed by a process. There was no need to run SYSGENs on Burroughs systems, as on competing systems, in order to preconfigure memory partitions in which to run tasks. In fact, Burroughs really championed "plug and play" in that extra peripherals could be plugged into the system without having to recompile the operating system with new peripheral tables. Thus these machines could be seen as the forerunners of today's USB and FireWire devices.
In the original B5000, a bit in each word was set aside to identify the word as a code or data word. This was a security mechanism to stop programs from being able to corrupt code, in the way that crackers do today.
An advantage to unmodifiable code is that B5000 code is fully reentrant: it does not matter how many users are running a program, there will only be one copy of the code in memory, thus saving substantial memory; these machines are actually very memory and disk efficient.
Later, when the B6500 was designed, it was realized that the 1-bit code/data distinction was a powerful idea, and this was extended to a three-bit tag outside the 48-bit word. The data bits are bits 0–47 and the tag is in bits 48–50. Bit 48 was the read-only bit; thus odd tags indicated control words that could not be written by a user-level program. Code words were given tag 3. Here is a list of the tags and their function:
| Tag | Word | Description |
|---|---|---|
| 0 | Data | All kinds of user and system data (text data and single-precision numbers) |
| 2 | Double | Double-precision data |
| 4 | SIW | Step Index Word (used in loops) |
| 1 | IRW | Indirect Reference Word |
| 1 | SIRW | Stuffed Indirect Reference Word |
| 3 | Code | Program code word |
| 3 | MSCW | Mark Stack Control Word |
| 3 | RCW | Return Control Word |
| 3 | TOSCW | Top of Stack Control Word |
| 5 | Descriptor | Data block descriptors |
| 7 | PCW | Program Control Word |
Note: Internally, some of the machines had 60 bit words, with the extra bits being used for engineering purposes such as a Hamming code error-correction field, but these were never seen by programmers.
Note: The current incarnation of these machines, the Unisys ClearPath, has extended the tag further to four bits. The microcode level that specified four-bit tags was referred to as level Gamma.
Even-tagged words are user data which can be modified by a user program as user state. Odd-tagged words are created and used directly by the hardware and represent a program's execution state. Since these words are created and consumed by specific instructions or the hardware, the exact format of these words can change between hardware implementations, and user programs do not need to be recompiled: the same code stream will produce the same results, even though the system word format may have changed.
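The tag layout lends itself to simple bit manipulation. A hypothetical Python model of the 48-bit-data-plus-3-bit-tag word described above (the helper names are invented for illustration):

```python
TAG_SHIFT = 48                      # data in bits 0-47, tag in bits 48-50

def make_word(tag, data):
    assert 0 <= tag <= 7 and 0 <= data < (1 << 48)
    return (tag << TAG_SHIFT) | data

def tag_of(word):
    return word >> TAG_SHIFT

def is_control_word(word):
    # Bit 48 is the read-only bit: odd tags cannot be written by user code.
    return tag_of(word) & 1 == 1

code_word = make_word(3, 0x123456789ABC)   # tag 3: program code
data_word = make_word(0, 42)               # tag 0: ordinary data
print(tag_of(code_word), is_control_word(code_word))  # -> 3 True
print(tag_of(data_word), is_control_word(data_word))  # -> 0 False
```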
Tag 1 words represent on-stack data addresses. The normal IRW simply stores an address couple to data on the current stack. The SIRW references data on any stack by including a stack number in the address.
Tag 5 words are descriptors, which are more fully described in the next section. Tag 5 words represent off-stack data addresses.
Tag 7 is the program control word which describes a procedure entry point. When operators hit a PCW, the procedure is entered. The ENTR operator explicitly enters a procedure (non-value-returning routine). Functions (value-returning routines) are implicitly entered by operators such as value call (VALC). Note that global routines are stored in the D[2] environment as SIRWs that point to a PCW stored in the code segment dictionary in the D[1] environment. The D[1] environment is not stored on the current stack because it can be referenced by all processes sharing this code. Thus code is reentrant and shared.
Tag 3 represents code words themselves, which won't occur on the stack. Tag 3 is also used for the stack control words MSCW, RCW, TOSCW.
The figure to the left shows how the Burroughs Large System architecture was fundamentally a hardware architecture for Object-oriented programming, something that still doesn't exist in conventional architectures.
Notable operators are:
HEYU — send an interrupt to another processor
RDLK — Low-level semaphore operator: Load the A register with the memory location given by the A register and place the value in the B register at that memory location in a single uninterruptible cycle
WHOI — Processor identification
IDLE — Idle until an interrupt is received
B5000 machines were programmed exclusively in high-level languages; there is no assembler. This obviously would not apply to emode, which is intended to run on stock hardware and thus gives up the fundamental hardware basis of the architecture noted above.
The B5000 stack architecture inspired Chuck Moore, the designer of the programming language Forth, who encountered the B5500 while at MIT. In Forth - The Early Years, Moore described the influence, noting that Forth's DUP, DROP and SWAP came from the corresponding B5500 instructions (DUPL, DLET, EXCH).
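The correspondence Moore described is easy to see on a list used as a stack (top of stack at the end); a small Python sketch of those three operators:

```python
def dupl(s): s.append(s[-1])                 # B5500 DUPL -> Forth DUP
def dlet(s): s.pop()                         # B5500 DLET -> Forth DROP
def exch(s): s[-1], s[-2] = s[-2], s[-1]     # B5500 EXCH -> Forth SWAP

s = [1, 2]      # 2 is on top
exch(s)         # swap the top two: [2, 1]
dupl(s)         # duplicate the top: [2, 1, 1]
dlet(s)         # drop the top: [2, 1]
print(s)        # -> [2, 1]
```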
Hewlett-Packard systems were influenced by the B5000, since some Burroughs engineers found later employment designing machines for HP and these also were stack machines. Bob Barton's work on reverse Polish notation (RPN) found its way into HP calculators beginning with the 9100A, and notably the HP-35 and subsequent calculators.
The NonStop systems designed by Tandem Computers in the late 1970s and early 1980s were also stack machines, influenced by the B5000 indirectly through the HP connection, as several of the early Tandem engineers were formerly with HP. Around 1990, these systems migrated to a RISC architecture and now contain only odd vestiges of their stack architecture.
Bob Barton was also very influential on Alan Kay. Kay was also impressed by the data-driven tagged architecture of the B5000 and this influenced his thinking in his developments in object-oriented programming and Smalltalk.
Another facet of the B5000 architecture was that it was a secure architecture that runs directly on hardware. This technique has descendants in the virtual machines of today in their attempts to provide secure environments. One notable such product is the Java JVM which provides a secure sandbox in which applications run.
The value of the hardware-architecture binding that existed before emode would be substantially preserved on iAPX x86 to the extent that MCP was the one and only control program, but the support provided by stock hardware is still inferior to that of the native mode. A little-known Intel processor architecture that actually preceded iAPX x86, the Intel iAPX 432, would have provided an equivalent physical basis, as it too was essentially an object-oriented dataflow architecture.