Alpha: The History in Facts and Comments

Dig my grave both long and narrow
Make my coffin neat and strong 

(from an old American song)

Contents:
 

Foreword

Part 1.  PDP and VAX

Part 2.  The PRISM Project

Part 3.  The Alpha Project

Part 4.  EV4, LCA4, EV45, LCA45

Part 5.  EV5, EV56, PCA56, PCA57

Part 6.  The Fall of DEC

Part 7.  EV6, EV67, EV68C, EV68A

Part 8.  The Epoch of Compaq

Part 9.  EV7, EV79, EV7z, EV8

Epilogue

Paul V. Bolotoff

Released: 14th of April, 2005
Modified: 1st of June, 2005



 

Foreword

This work begins a series of articles dedicated to Alpha processors, the Alpha architecture, and related subjects. It is a series because presenting all the available material in a single overview would be problematic and, from the author's point of view, inadequate. Besides, the subject is vast and fundamental in many respects, and no earlier paper is comparable to what the author intends in completeness and scope: one that covers the architecture as a whole rather than individual products widely separated in time. Perhaps this article would have looked better had it been written and published several years ago, when Alpha processors were the true kings of performance and their future was expected to be very bright. However, only now does it seem the right time to draw the final line, to explain what happened, and why one of the most interesting and promising computer architectures has been cast into oblivion.

Generally, this paper is a historical overview with some elements of analysis, and it should be read as such. It does not claim to be comprehensive, though it contains a great deal of reference information. On the other hand, it is definitely not an obituary or a funeral prayer...

 

Part 1. PDP and VAX


Digital Equipment Corporation (abbreviated to DEC) was founded in 1957 by two engineers, Kenneth Olsen and Harlan Anderson, both graduates of the Massachusetts Institute of Technology, and was one of the oldest and best-known companies in the world computer industry.

Before founding the company, Olsen worked at MIT's Lincoln Laboratory, which was supported by the US Department of Defense, and took part in the development of one of the world's first transistor-based computers, TX-2. Initially the company produced and sold backplane modules for computers, but in 1960 it offered its first computer, the 18-bit PDP-1 (Programmable Data Processor - 1), capable of about 100 thousand operations per second. Incidentally, that machine ran the first computer game in known history, Spacewar by Steven Russell. The 12-bit PDP-8, introduced in 1964, deserved to be called the first mass-produced "minicomputer" (the size of a small wardrobe). Its price was attractive too: about 18000 USD (1965) for a standard configuration. Thanks to an excellent price/performance ratio, the PDP-8 could compete in earnest with IBM's famous mainframe systems. About 1450 machines were produced by 1968 (not counting the numerous modifications that followed). The 36-bit PDP-10 was ready in that same year, 1968; based upon the design of the experimental PDP-6, it was aimed at data processing centres, research laboratories, and military needs, and different versions of the PDP-10 were manufactured until 1983. Attempts to improve that 36-bit architecture were organised within the Unicorn project under the supervision of Leonard Hughes and David Rogers, but the project was closed in June of 1975, and all its resources were transferred to support another architecture, a 32-bit one.

The 16-bit PDP-11 entered production at the beginning of the 1970s. It was DEC's first computer to use 8-bit bytes, and a direct successor of the PDP-8 model line. Thanks to a simple and well-conceived Unibus-based architecture (or a modified one, based upon Q-bus), a reasonably efficient instruction set and, importantly, low manufacturing costs, the PDP-11 line was a success. Predictably, the PDP-11 was cloned all over the world, even in the "people's democracies": CM-4 (USSR, Bulgaria, Hungary), CM-1420 (USSR, Bulgaria, German Democratic Republic), CM-1600 (USSR), IZOT-1016 (Bulgaria), DVK (USSR). Many operating systems were developed for the PDP-11: DEC offered P/OS, RSX-11, RT-11, RSTS/E, and several derivatives of DOS; finally, the first release of UNIX was completed at Bell Laboratories on PDP-7 and PDP-11 machines in 1971, written in their assembly languages. The PDP-11 left the market during the 1980s for a single but unavoidable reason: the lack of address space. A new architecture, 32-bit though still CISC, was brought to the market.

That architecture was VAX (Virtual Address eXtension), officially approved at a session of the VAX Architecture Committee in April of 1975. It was developed over several months within the Star project, supervised by Gordon Bell, in parallel with the Unicorn project mentioned above. When both projects were complete, it was decided to cancel any further development of 36-bit systems and to concentrate the available resources on 32-bit VAXen (the plural of VAX). In effect, the Star project was meant to prove the need to widen the general registers of the PDP-11 to 32 bits, to increase their number from 8 to 16, and to redesign the instruction set significantly. The first VAX machine, model 11/780, was announced in October of 1977. A few months later, in February of 1978, a new operating system for VAXen was released: VMS (Virtual Memory System) v1.0. It was a multi-user, multi-tasking OS supporting up to 64Mb of main memory, networking (DECnet), an adaptive task scheduler, extended process management, and many other innovations rarely seen before. Renamed VAX/VMS, v2.0 was presented in April of 1980 with numerous improvements. In addition, classic UNIX was soon ported to VAX. VAXen were manufactured and sold with great success during the 1980s, and were still shipped in limited quantities under special contracts close to the end of the century. The whole model line included several dozen models, ranging from compact workstations to 6-processor mainframe-class servers. Even today, thousands of VAXen keep working in organisations of the Department of Defense and the NSA (National Security Agency), as well as in numerous commercial organisations. Nevertheless, the epoch of VAXen was the 1980s, if only because DEC bet on a new architecture in the 1990s.
[Images: VAX 11/780 brochure cover; VMS sales update cover]

 

Part 2. The PRISM Project

At the beginning of the 1980s DEC was at the peak of its financial prosperity, mostly thanks to high revenues from the steadily growing sales of VAX machines. However, nothing lasts forever, and it was obvious that some day VAX would have to leave the market in favour of a new architecture, as was happening with PDP-11. In those days many companies were paying more and more attention to RISC concepts and implementations, and DEC had no intention of ignoring that trend. Between 1982 and 1985 several groups inside DEC were actively researching the RISC area:

  • Titan, a high-speed design by Western Research Laboratory (DECwest) in Palo Alto (California), supervised by Forest Baskett, since 1982;
     
  • SAFE (Streamline Architecture for Fast Execution), supervised by Alan Kotok and David Orbits, since 1983;
     
  • HR-32 (Hudson RISC 32-bit), located at DEC's factory in Hudson (Massachusetts), supervised by Richard Witek and Daniel Dobberpuhl, since 1984;
     
  • CASCADE by David Cutler in Seattle (Washington), since 1984.

In 1985, following Cutler's initiative to create a "corporate RISC plan", all 4 projects were merged into one, PRISM (Parallel Instruction Set Machine), and the first draft of a new RISC processor was released in August of 1985. It is worth mentioning that around that time DEC took part in the development of the MIPS R3000 processor, and even initiated the creation of the Advanced Computing Environment consortium to promote that architecture.

No wonder the processor under development inherited many features of the MIPS architecture, yet the differences were obvious. All instructions had a fixed length of 32 bits: the upper 6 and the lower 5 bits formed the instruction code proper, and the remaining 21 were reserved for immediate data or addressing. There were 64 primary 32-bit general-purpose registers (MIPS specified 32), 16 additional 64-bit vector registers, and 3 control registers for vector operations: two 7-bit ones (vector length and vector count) and one 64-bit one (vector mask). There was no processor status register, so the result of comparing two scalar operands was placed into a general-purpose register, and the result of comparing two vector operands into the vector mask. There was no built-in floating-point unit. A set of special instructions (Epicode, or extended processor instruction code) was implemented in software, using loadable microcode, to handle special tasks required by a given environment or operating system and not otherwise supported by the standard instruction set. Later, this function was implemented in the Alpha architecture under the name of PALcode (Privileged Architecture Library code).

In 1988, while the project was still in development, DEC's senior management decided to close it, considering any further financial support a waste of money. In protest against that decision, Cutler resigned and went to Microsoft to lead the department developing Windows NT.

At the beginning of 1989, DEC presented its first RISC-powered workstations: DECstation 3100, with a 32-bit MIPS R2000 clocked at 16MHz, and DECstation 2100, with the same processor type clocked at 12MHz. Both machines ran Ultrix OS and were priced rather modestly (about 8000 USD (1990) for DECstation 2100).
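
The 6 + 21 + 5 bit split of a PRISM instruction word described above can be sketched as follows. This is only an illustration of the bit arithmetic: the function names are hypothetical, and the text does not say how the 21 middle bits were subdivided, so they are treated as one opaque field.

```python
def split_prism_word(word):
    """Split a 32-bit word into (upper 6 bits, middle 21 bits, lower 5 bits).

    Hypothetical helper: only the 6/21/5 split comes from the text.
    """
    if not 0 <= word < (1 << 32):
        raise ValueError("expected a 32-bit unsigned value")
    upper6 = (word >> 26) & 0x3F        # bits 31..26, part of the instruction code
    middle21 = (word >> 5) & 0x1FFFFF   # bits 25..5, immediate data or addressing
    lower5 = word & 0x1F                # bits 4..0, part of the instruction code
    return upper6, middle21, lower5


def join_prism_word(upper6, middle21, lower5):
    """Inverse of split_prism_word (hypothetical helper)."""
    return ((upper6 & 0x3F) << 26) | ((middle21 & 0x1FFFFF) << 5) | (lower5 & 0x1F)
```

The three widths add up to 32 bits, so splitting and joining are exact inverses of each other.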

 

Part 3. The Alpha Project

By 1989, the aging VAX architecture was hardly able to compete with 2nd-generation RISC architectures such as MIPS and SPARC, and it was obvious that the next generation of RISC hardware would leave VAX few chances to survive. In the middle of 1989, DEC engineers were given the task of creating a competitive RISC architecture with long-term potential, yet with a minimal set of incompatibilities with VAX. That was because VAX/VMS and all accompanying applications had to be ported to the new architecture, which was also defined as 64-bit right from the start, since competitors were about to release their own 64-bit solutions. A development group was formed, with Richard Witek and Richard Sites as the chief architects.

The Alpha architecture was first mentioned officially on the 25th of February 1992, at a conference in Tokyo. In addition, most key features of the new architecture were listed in a concise overview posted to comp.arch, a USENET newsgroup; it also mentioned that "Alpha" was an internal code-name and that an official name would be provided later. The new processor was a clean 64-bit RISC design executing fixed-length instructions (32 bits each), with 32 64-bit integer registers, and it operated with 43-bit virtual addresses (expandable up to 64 bits in future implementations). Like VAX, it used little-endian byte order (i.e. the low byte of a register occupies the lowest memory address when stored, as opposed to big-endian byte order, used by Motorola and many other processor architectures, where the low byte of a register occupies the highest memory address when stored). A floating-point unit was built into the core, with 32 64-bit floating-point registers that could be addressed directly in any order, unlike the primitive stack-based access of Intel x87 co-processors. The total lifetime of the new architecture was estimated at no less than 25 years.

The instruction set was simplified to make pipelining as easy as possible, and consisted of 5 groups:
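
The difference between the two byte orders is easy to see by storing the same 64-bit register value both ways. The snippet below is a small illustration using Python's built-in `int.to_bytes`; the helper name `stored_bytes` is made up for this example.

```python
def stored_bytes(value, byteorder):
    """Return the 8 bytes of a 64-bit value in memory order (lowest address first)."""
    return value.to_bytes(8, byteorder)


register = 0x0102030405060708  # the low byte of the register is 0x08

little = stored_bytes(register, "little")  # VAX and Alpha order
big = stored_bytes(register, "big")        # big-endian order

# Little-endian puts the low byte (0x08) at the lowest address,
# big-endian puts it at the highest address.
print(little.hex(), big.hex())
```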

  • integer instructions;
  • floating-point instructions;
  • branch and compare instructions;
  • load and store instructions;
  • PALcode instructions.

Notably, there were no integer divide instructions: being the most complex, they pipelined poorly, so division was emulated in software.

The Alpha architecture was a "real" RISC (in contrast to modern processors of the i386 architecture, which are RISC only internally). The conceptual difference between RISC (Reduced Instruction Set Computing) and CISC (Complex Instruction Set Computing) came down (and still comes down) to a few points:
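
To give an idea of what emulating division means, here is a minimal sketch of the classic shift-and-subtract method for unsigned 64-bit operands. It is a textbook algorithm chosen for illustration, not the actual routine used by Alpha compilers or run-time libraries, and the name `udiv64` is hypothetical.

```python
def udiv64(dividend, divisor):
    """Unsigned 64-bit division built only from shifts, compares and subtracts.

    Returns (quotient, remainder). Illustrative sketch, not DEC's routine.
    """
    if not 0 <= dividend < (1 << 64) or not 0 < divisor < (1 << 64):
        raise ValueError("operands must be unsigned 64-bit, divisor non-zero")
    quotient = 0
    remainder = 0
    for bit in range(63, -1, -1):
        # Bring down the next bit of the dividend, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

The loop always runs 64 iterations, each depending on the previous one, which hints at why a hardware divide was considered hard to pipeline.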
 
Feature            | CISC                                        | RISC
Instruction length | Variable, depends upon an instruction type  | Fixed, doesn't depend upon an instruction type
Instruction set    | Wide, adapted for programmer's needs        | Balanced, adapted for processor's execution convenience
Memory access      | Allowed for different kinds of instructions | Allowed for load/store instructions only

The processor was supposed to enter production at a very high frequency for the time, 150MHz, which was to be raised to 200MHz within the same engineering limits. This was possible thanks to a successful architecture, and also because the engineers refused to rely on automated design systems and did all the work by hand.

The project entered the manufacturing stage and was soon reorganised into a regular division of DEC.

Through the efforts of DEC's marketing department, the new architecture was named AXP (or Alpha AXP), though it is still not known for sure what exactly this abbreviation meant. Quite possibly, nothing at all: in the past, DEC had had legal problems with its VAX brand, because another company, a manufacturer of vacuum cleaners, laid claim to it, and that conflict went to court. Incidentally, it was also argued that DEC's equipment sales suffered because of the other company's slogan, "Nothing sucks like a Vax!" In the end, a joke appeared saying that AXP meant "Almost Exactly PRISM".

 

Part 4. EV4, LCA4, EV45, LCA45

The first processor of the Alpha family was called 21064 (21 indicated that Alpha was an architecture of the 21st century, 0 the processor's generation, 64 the computational width in bits), and was code-named EV4 (EV was [supposedly] an abbreviation of "Extended VAX", and 4 denoted the generation of the manufacturing process, CMOS4; CMOS, in turn, stood for "Complementary Metal Oxide Semiconductor"). A prototype of EV4 was ready in 1991; it used the CMOS3 process and therefore had reduced cache sizes and no floating-point unit, but it was valuable because it made it possible to tune the architecture and the software environment. EV4 was introduced in November of 1992 and was manufactured using a 3-layer 0.75µ process, advanced for those days (later it was moved to 0.675µ CMOS4S, an optical shrink of CMOS4). It was designed for a 3.3V supply, with core frequencies ranging from 150MHz to 200MHz (TDP from 21W to 27W). It consisted of 1.68 million transistors and had a die size of 233mm². Multi-processing was supported as one of the architecture's key features. Form-factor: PGA-431 (Pin Grid Array).

The L1 cache was integrated: 8Kb for instructions (I-cache, instruction cache), direct-mapped, and 8Kb for data (D-cache, data cache), direct-mapped and write-through. The read latency of D-cache was 3 ticks. Every line of I-cache consisted of 32 bytes of instructions, a 21-bit tag record, an 8-bit branch history field, and several auxiliary fields; every line of D-cache consisted of 32 bytes of data and a 21-bit tag record. The L2 cache (B-cache, backup cache) was a recommended option built from external synchronous or asynchronous SRAM chips: direct-mapped, write-back, write-ahead, up to 16Mb in size (usually from 512Kb to 2Mb). Every B-cache line consisted of 32 bytes of data or instructions with a 1-bit parity or 7-bit ECC field per long-word, a tag record of up to 17 bits with an additional parity bit, and a 3-bit condition flag with an additional parity bit. The read and write speed of B-cache was programmable in processor ticks. The system data bus was either 64 or 128 bits wide (programmable, with a 1-bit parity or 7-bit ECC field per long-word) and was multiplexed with the B-cache data bus, switching when necessary; the system address bus was 34 bits wide. B-cache was organised to be inclusive of D-cache, i.e. it contained a full copy of the latter. A mechanism called victim write was used to store data from B-cache to memory. Only the processor itself could perform read/write operations on B-cache; the system logic was merely granted permission to read B-tag data (this was of the utmost importance for multi-processor systems in particular, to maintain cache coherence among all processors in a machine).

The processor had one integer pipeline (E-box, 7 stages) and one floating-point pipeline (F-box, 10 stages). The instruction decoder and scheduler (I-box) could issue up to 2 instructions per tick, in order, to the functional units, namely E-box, F-box, and the load/store unit (A-box). The cache and system bus controller (C-box) worked in cooperation with A-box and supervised the integrated I-cache and D-cache as well as the external B-cache. The branch prediction unit maintained a 4096-entry branch prediction table, 2 bits per entry. There was an I-TLB with 8 entries for 8Kb pages and 4 entries for 4Mb pages, and a D-TLB with 32 entries; all were fully associative.
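
The L1 figures above fit together arithmetically. An 8Kb direct-mapped cache with 32-byte lines has 256 lines, so 5 address bits select a byte within a line and 8 bits select the line. If the tag is taken to cover the rest of a 34-bit address, it comes out at 21 bits, which matches the tag size quoted. That last link is an inference from the numbers, not something the text states, and the names below are made up for the sketch.

```python
CACHE_BYTES = 8 * 1024   # 8Kb L1 cache, from the text
LINE_BYTES = 32          # 32-byte lines, from the text
ADDRESS_BITS = 34        # assumption: tag derived from a 34-bit address

NUM_LINES = CACHE_BYTES // LINE_BYTES                 # 256 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1             # 5 bits select a byte in a line
INDEX_BITS = NUM_LINES.bit_length() - 1               # 8 bits select a line
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS    # 21 bits remain for the tag


def split_address(address):
    """Return (tag, line index, byte offset) for a direct-mapped lookup."""
    offset = address & (LINE_BYTES - 1)
    index = (address >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Two addresses exactly 8Kb apart, such as 0x0000 and 0x2000, get the same line index and different tags, so in a direct-mapped cache they evict each other.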
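
A table with 2 bits per entry is usually read as a table of 2-bit saturating counters. The sketch below shows that common scheme for a 4096-entry table. The text gives only the table size and entry width, so the counter encoding, the initial state and the indexing by low program counter bits are all assumptions, and the class name is hypothetical.

```python
class TwoBitPredictor:
    """Branch predictor made of 2-bit saturating counters (generic sketch).

    Counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".
    """

    def __init__(self, entries=4096):
        self.entries = entries
        self.table = [1] * entries  # assumed start state: weakly not taken

    def _index(self, pc):
        # Instructions are 4 bytes long, so drop the two low address bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

With such counters a single misprediction, for example at the exit of a loop, does not immediately flip the prediction for a branch that is normally taken.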
 
[Images: floor-plan of EV4; EV4 (front); EV4 (back)]

For all its excellent performance, EV4 was too expensive for most potential customers, so a low-priced sibling, 21066 (LCA4, or LCA4S), was released in September of 1993. It was based upon EV4's core, with integrated memory and PCI controllers and several secondary functions added. However, the system data bus width was reduced to 64 bits, with a negative impact on performance. LCA4 was manufactured using the 0.675µ CMOS4S process, with a die even smaller than that of EV4 (209mm² compared to 234mm²), and its clock frequencies were lowered (from 100MHz to 166MHz), presumably to avoid overheating in the poorly ventilated desktop cases of those days, and also to avoid creating an additional competitor to EV4. It contained 1.75 million transistors and required a 3.3V supply. The design of this processor was licensed to Mitsubishi, which manufactured LCA4 as well (including a 200MHz version).

21064A (EV45) was announced at Microprocessor Forum in October of 1993. It was a modified EV4, produced using the 4-layer 0.5µ CMOS5 process. 21066A (LCA45) was presented at COMDEX'94 in November of 1994; it was LCA4 modified in almost the same way as EV4 had been to become EV45. Incidentally, DEC's marketing people had developed a habit of adding a letter to a processor's model name after a redesign for a more advanced manufacturing process. The cores of EV45 and LCA45 did not change much. The I-cache and D-cache of EV45 were doubled in size (16Kb I-cache + 16Kb D-cache), and their data and tag fields gained a parity bit each; the branch history fields of I-cache were expanded to 16 bits; D-cache became 2-way set associative; and a 1-bit byte parity mode was added to the existing integrity modes of the system data bus. In addition, both EV45 and LCA45 received a modified F-box with optimised division: EV4 executed a divide in 34 ticks for single-precision operands and in 63 ticks for double-precision operands, regardless of the operands' values, whereas EV45 did the same in 19 to 34 ticks for single-precision operands and in 29 to 63 ticks for double-precision operands, depending upon the operands' values. LCA45 was also manufactured by Mitsubishi. Both die sizes decreased, to 164mm² for EV45 and 161mm² for LCA45. The transistor count increased to 2.85 million for EV45 and remained the same for LCA45 (1.75 million). Finally, power consumption per tick decreased for both processors, though the voltage did not change (3.3V). Core frequencies of EV45 ranged from 200MHz to 300MHz (TDP from 24W to 36W), and those of LCA45 from 166MHz to 233MHz.

DEC developed equipment for the Department of Defense, so 21068 (66MHz) and 21068A (100MHz) were introduced in 1994. They were derived from LCA4 and LCA45 respectively and adapted for military needs (passive cooling, extreme temperature conditions, etc.).

The first chipsets for EV4 supported the TURBOchannel, FutureBus+, and XMI peripheral buses. Though all of them were high-speed designs for those days (about 100Mb/s per bus), they did not gain any significant support, so only a very limited set of peripherals was available for them. DEC therefore turned its attention to industry-standard bus architectures such as PCI and ISA (EISA). A new chipset, DEC Apecs, was introduced in 1994 in two editions: for a 64-bit system data bus (21071) and for a 128-bit one (21072). The difference was that 21071 consisted of 4 chips (1 universal controller, 2 data slices, 1 PCI bus controller), while 21072 consisted of 6 (2 additional data slices). The chipset supported a 33MHz system bus frequency, up to 16Mb of B-cache, and up to 4Gb of FPM parity memory with access times from 100 down to 50ns. Support for ISA or EISA buses could be implemented through standard bridges, such as i82378IB (ISA) or i82378EB (EISA).

The first workstation of the Alpha architecture became available in November of 1992: DEC 3000 Model 500 AXP (code-named Flamingo), with a 150MHz EV4, 512Kb of B-cache, 32Mb of main memory, a 1Gb SCSI HDD, a SCSI CD-ROM, a built-in 10Mbit Ethernet controller (thick coaxial and twisted pair), built-in sound and ISDN controllers, and a 19" monitor (1280x1024, 8-bit). The price was shocking: 38995 USD.


[Image: drawing of DEC Apecs]

 

Part 5. EV5, EV56, PCA56, PCA57

DEC unveiled the very first information about its 2nd-generation Alpha processor at the Hot Chips conference in Palo Alto (California), which started on the 14th of August 1994. The official release of 21164 (EV5), however, is dated the 7th of September 1994, following the respective press release by DEC. The processor was based upon the core of EV45, and was an evolution of the latter rather than a revolutionary new design. The number of pipelines, both integer and floating-point, was doubled compared to EV4 or EV45. In addition, the floating-point pipelines were reworked to run through 9 stages rather than 10. The two integer pipelines were not identical, however: while both were capable of elementary arithmetic and logical operations, only the 1st could multiply and shift, and only the 2nd could process conditional and unconditional jumps; likewise, both pipelines could calculate virtual addresses for load instructions, but only the 1st could do so for stores. The floating-point pipelines differed as well: the 1st could execute any floating-point instruction except multiplies, which were the only instructions the 2nd pipeline could process. I-box was able to fetch and decode up to 4 instructions per tick to keep the execution units busy. EV5 was manufactured using the same 4-layer 0.5µ CMOS5 process as EV45, required a 3.3V supply, contained 9.3 million transistors (including 7.8 million for the integrated cache areas), and had a die size of 299mm², very close to the theoretical limits of the manufacturing process involved. Core frequencies ranged from 266MHz to 333MHz (TDP from 46W to 56W). Form-factor: IPGA-499 (Interstitial Pin Grid Array).

I-cache and D-cache were sized and organised just as in EV4, i.e. 8Kb each. D-cache remained write-through, but it was made dual-ported, i.e. able to deliver data for 2 load instructions per tick. Sacrificing transistors for the sake of performance, D-cache was physically composed of 2 absolutely identical parts of 8Kb each, so data could be read from either but had to be written to both. The processor had 96Kb of integrated L2 cache (S-cache, secondary cache), write-back and 3-way set associative, which C-box accessed through a dedicated 128-bit data bus; at the same time, B-cache remained available (still optional and built from external SRAMs, it could be as large as 64Mb, though usually from 1Mb to 4Mb). In other words, EV5 supported 3 cache levels. S-cache was accessed through a 4-stage pipeline: two ticks for tag search and modification, and two ticks for data access and delivery; every S-cache line was 64 bytes wide (though it could be addressed as two sublines of 32 bytes each), with one tag per line. D-cache read latency was reduced to 2 ticks, and S-cache could deliver a whole line in 7 ticks (as mentioned above, 4 ticks for the first 16 bytes, and 1 tick for every next 16 bytes). As in EV4, the contents of D-cache were duplicated at the next level, this time in S-cache; in turn, B-cache was inclusive of S-cache, regardless of the difference in associativity. I-TLB held 48 entries (for pages sized from 8Kb to 4Mb) and D-TLB held 64 entries, and the latter had become dual-ported like D-cache. The system data bus had a fixed width of 128 bits (with 16 additional bits for ECC protection) and was still multiplexed with the data path to B-cache; the system address bus was 40 bits wide, and the control bus 10 bits.
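
The S-cache numbers quoted above can be checked with a few lines of arithmetic. The 7-tick figure follows directly from the text (4 ticks for the first 16 bytes of a 64-byte line, then 1 tick for each of the remaining three 16-byte chunks). The number of sets, 512, is derived here from the size, associativity and line width and is not stated in the text; the function names are made up for the sketch.

```python
def scache_sets(total_bytes=96 * 1024, ways=3, line_bytes=64):
    """Number of sets in a set-associative cache of the given geometry."""
    return total_bytes // (ways * line_bytes)


def line_fill_ticks(line_bytes=64, chunk_bytes=16,
                    first_chunk_ticks=4, next_chunk_ticks=1):
    """Ticks needed to deliver a whole cache line, chunk by chunk."""
    chunks = line_bytes // chunk_bytes
    return first_chunk_ticks + (chunks - 1) * next_chunk_ticks


print(scache_sets())        # sets in a 96Kb, 3-way cache with 64-byte lines
print(line_fill_ticks())    # ticks for a full 64-byte line
print(line_fill_ticks(32))  # ticks for a single 32-byte subline
```

The unusual 96Kb size makes sense in this light: three ways of 32Kb each give a power-of-two number of sets.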
 
[Images: micrograph of EV5; floor-plan of EV5]

21164A (EV56) was introduced at Microprocessor Forum in October of 1995. It was a modified release of EV5 after a technology shrink to 0.35µ CMOS6, manufactured at the same factory in Hudson (in which DEC had previously invested about 450 million USD for modernisation). The most important architectural difference was BWX (Byte-Word Extension), a set of 6 additional instructions to load/store data in 8- or 16-bit quantities. From the very start, the Alpha architecture could load/store data only in 32- or 64-bit quantities, which caused certain difficulties when porting or emulating code belonging to other processor architectures, such as i386 or MIPS. A request to implement BWX in hardware was made by Richard Sites in June of 1994 and was approved in June of 1995. To make use of BWX, however, the chipset had to be aware of it as well. EV56 was manufactured with core frequencies ranging from 366MHz to 666MHz (TDP from 31W to 55W), starting from the summer of 1996. It was also produced by Samsung under a license agreement signed in June of 1996 (the 666MHz version was shipped by Samsung only). It contained 9.66 million transistors, had a die size of 209mm², and required dual voltage (2.5V for the core and 3.3V for input-output circuits).
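
To show why the lack of byte-sized loads and stores was inconvenient, the sketch below stores a single byte in a memory that can only be accessed in whole 64-bit words: the word is loaded, the target byte is masked out and replaced, and the word is stored back. This is a simplified model of the read-modify-write idea only; memory is represented as a Python list of little-endian 64-bit words, and the function names are hypothetical rather than Alpha instruction names.

```python
MASK64 = (1 << 64) - 1


def store_byte(memory, address, value):
    """Store one byte using only whole 64-bit word accesses."""
    word_index, byte_lane = divmod(address, 8)
    shift = 8 * byte_lane
    word = memory[word_index]             # load the whole word
    word &= ~(0xFF << shift) & MASK64     # clear the target byte
    word |= (value & 0xFF) << shift       # insert the new byte
    memory[word_index] = word             # store the whole word back


def load_byte(memory, address):
    """Load one byte using only a whole 64-bit word access."""
    word_index, byte_lane = divmod(address, 8)
    return (memory[word_index] >> (8 * byte_lane)) & 0xFF
```

A single byte store thus turns into a load, two logical operations and a store, and the sequence is not atomic; instructions that access bytes and 16-bit words directly avoid both problems.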
 
[Images: micrograph of EV56; EV56 (front); EV56 (back)]

21164PC (PCA56) was introduced on the 17th of March 1997. It was a low-cost version of EV56 designed by DEC and Mitsubishi in cooperation. S-cache and its accompanying logic were removed, but the I-cache size was doubled (to 16Kb). It contained 3.5 million transistors, had a die size of 141mm², and used the same manufacturing process and voltages as EV56, though the form-factor changed: IPGA-413 instead of IPGA-499. Core frequencies ranged from 400MHz to 533MHz (TDP from 26W to 35W). Later, a 0.28µ 21164PC (PCA57) was manufactured by Samsung, with I-cache and D-cache doubled in size and with a 2-way set associative D-cache. The transistor count increased to 5.7 million, but the die size decreased to 101mm². It required lower voltages: 2.0V for the core and 2.5V for input-output logic. Core frequencies ranged from 533MHz to 666MHz (TDP from 18W to 23W).

Besides the BWX instructions inherited from EV56, PCA56 supported a new set, MVI (Motion Video Instructions), intended to accelerate video and audio calculations using the SIMD (Single Instruction, Multiple Data) approach, somewhat comparable to MMX.

The first standard chipset developed for EV5 was DEC Alcor (21171). It supported a 33MHz system bus, up to 64Mb of B-cache, up to 8Gb of main memory (FPM ECC, using a 256-bit wide memory path), and a 64-bit PCI bus (33MHz). Support for an ISA or EISA bus could be added through a standard bridge, as before. There was no built-in IDE controller (one could be installed separately, using a third-party chip). Physically, the chipset consisted of 5 chips: 1 universal controller (including PCI bus support) and 4 data switches. After EV56 entered production, a new release of Alcor with BWX support was completed: Alcor 2 (21172). It was soon followed by Pyxis (21174), a single-chip solution supporting a 66MHz system bus and 66MHz SDRAM ECC memory (although using a 128-bit wide memory path). There was also VLSI Polaris, developed for PCA57-based systems.
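
The SIMD idea can be illustrated with one 64-bit register treated as 8 independent unsigned bytes, for example 8 pixels. The function below computes a byte-wise minimum of two such registers, the sort of packed operation that multimedia extensions perform in a single instruction. It is a plain software model written for this illustration; the name `packed_min_u8` is hypothetical and the sketch does not claim to reproduce any particular MVI instruction exactly.

```python
def packed_min_u8(a, b):
    """Byte-wise unsigned minimum of two 64-bit values holding 8 bytes each."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF   # byte from the first operand
        y = (b >> shift) & 0xFF   # byte from the second operand
        result |= min(x, y) << shift
    return result


pixels_a = 0x0102030405060708
pixels_b = 0x0807060504030201
print(hex(packed_min_u8(pixels_a, pixels_b)))
```

In software the loop takes 8 iterations; a SIMD instruction produces all 8 results at once, which is where the speed-up on video and audio data comes from.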

 

Part 6. The Fall of DEC

On the 26th of January, 1998, a news flashed all over the computer world that struggling financially DEC was purchased by Compaq, and the deal was about to be approved by shareholders' meetings of both the companies. DEC's shareholders ratified the agreement on the 2nd of February 1998. The amount of sale was 9.6 mlrd. USD, compared to DEC's estimated market capitalisation of about 7 mlrd. USD. The process of integrating DEC's functional units into Compaq's business structure was finished about half a year later with the DEC's legal end, when its shares were taken off the New York Stock Exchange on the 11th of June, 1998. To mention, negotiations between DEC and Compaq started in 1995, but finished unsuccessfully in 1996 because of DEC high management's position insisting on a merger, not an acquisition. Nevertheless, the question: how could it happen that a huge company (dated by 1989: almost 130 thousand of personnel, gross revenue of about 14 mlrd. USD per year, i.e. the second company in the industry after IBM), having very high R&D potential and significant manufacturing facilities, had to sell itself to a large computer-building company from Texas? There was no definite answer to this question, though reasons mentioned were various. About them, in detail.

A long time ago, Kenneth Olsen, a founder, president and CEO of DEC until almost the end, said that well-engineered products would sell themselves. Thus, have no need in any advertising campaigns or other instruments of market promotion. He also mentioned that there is no reason anyone would want a computer at home. Perhaps, these thoughts were correct in those "old good times", when computer equipment was manufactured in limited quantity by professionals and for professionals, and cost a hefty sum of money per unit. However, not close to the end of the 20th century when they were sold in million units per year, and a very regular computer could be taken together using a screwdriver and parts from the nearest computer shop for an hour maximum, and would cost more than 10 times less than mentioned previously. Finally, to purchase a whole working box right at that shop, with a free delivery. Considering that such a regular machine would be purchased most likely not by a professional manager, realising clearly what TCO (Total Cost of Ownership) means, but by an aunt Marge or a young prankster Johnny, making no difference between a transistor and a resistor, so such customers should be motivated definitely not by engineering advantages of a potential purchase. Mistake #1.

When at the very beginning of Alpha architecture's way, DEC's high management made a great strategic mistake. It was a known fact that first prototypes of EV4 were presented on a computer conference in February 1991. Among others, there were engineers of Apple Computer looking for a new processor architecture to power company's future computers, and they were impressed by advantages of EV4. John Sculley, the Apple's CEO of those days, met with Kenneth Olsen in June of the same year, and offered him to use the DEC's new processor in future Macs. Olsen refused the offer, motivating that the processor was not ready for the market, besides VAX architecture hadn't reached end-of-life yet. Several months later, rumours said that new Macs would be powered by PowerPC processors from the alliance of Apple, IBM and Motorola. William Demmer, a former VP of VAX and Alpha divisions who resigned in 1995, said later in his interview to the Business Week (the 28th of April 1997): "Ken did not want the company's future to run on Alpha." Mistake #2.

DEC manufactured Alpha processors, as well as the accompanying chipsets and numerous peripherals, at its own factory in Hudson (Massachusetts). It designed and produced mainboards exclusively for desktops (they were even named accordingly, Evaluation Board or AlphaPC), and only in a limited assortment; none of them supported SMP, although almost all Alpha servers by DEC were multi-processor. All the mainboards were very well engineered, but expensive, like the Alpha processors themselves. Their layout schemes were publicly available, so several companies (Aspen, Polywell, Enorex, etc.) manufactured fully qualified clones; the only company to develop and produce designs of its own was DeskStation. In general, it can be stated for sure that DEC's priority was to produce workstations and servers of its own, not to supply the market with components for such workstations and servers. It's possible to survive by following such an approach, but not possible to conquer the market. Mistake #3.

Despite all its attempts, DEC didn't manage to make the pricing of its products (processors, chipsets, and mainboards first of all) affordable for most potential customers. For example, at the beginning of 1995, the 266MHz and 300MHz EV5 were offered for 2052 and 2937 USD respectively, in lots of 1000 units -- both enormous prices, even taking into account average manufacturing costs (by the MPR model) of 430 USD per unit. In terms of price per SPECint92 point, EV5 cost about twice as much as competing RISC designs! At the same time, the standard chipset for EV5 (Alcor) was offered much cheaper, at 295 USD in lots of 5000 units, though the only Alcor-based mainboard from DEC (EB164, with 1Mb of B-cache), bundled with a processor and 16Mb of main memory (which, by the way, was not enough to run most applications even in those days), carried a list price of about 7500 USD. Mistake #4.

Although Alpha was declared an "open architecture" right from the start, there was no consortium to develop it. All R&D was handled by DEC itself, sometimes in cooperation with Mitsubishi. In fact, though the architecture was free de jure, its most important hardware designs were pretty much closed de facto, and had to be licensed for a fee (if that was possible at all). This did nothing to help promote the architecture. It is worth mentioning that soon after the introduction of EV4, DEC's top management offered to license manufacturing rights to Intel, Motorola, NEC, and Texas Instruments. But all these companies were involved in different projects and had little to no interest in EV4, so they refused. Perhaps the conditions were unacceptable as well, or there was some other reason. Mistake #5.

After all, even the fastest computer without an operating system and accompanying software is just an expensive source of noise and an environmental heater. DEC targeted its Alpha hardware at Windows NT, Digital UNIX, and OpenVMS, in exactly this order of priority. That might not have been bad, but...

Windows NT was an operating system designed for users right out of the box, not for programmers (no software development tools were supplied), hence it depended heavily upon precompiled applications, commercial ones notably. In fact, the numbers of Alpha-ready and i386-ready software titles differed many times over. There was FX!32, an excellent emulator and translator of x86 code to Alpha, completed by Anton Chernoff's team in 1996; while a useful solution in itself, it couldn't avoid a performance decrease of at least 40% compared to the same source code compiled natively. Next, there were drivers; FX!32 was of no help there, and considering that very few hardware manufacturers honoured the Alpha architecture enough to release any, users had to rely mostly upon Microsoft and DEC. Finally, Windows NT (3.51 as well as 4.0) was a 32-bit OS even when running on 64-bit Alpha hardware, and thus was unable to utilise it to the full extent. However, none of these issues prevented DEC from promoting its Alpha systems with the slogan "Born to run Windows NT". In brief, such an OS shouldn't have been positioned as the primary one for the Alpha architecture, though having it available as an option was a big plus for the architecture. Mistake #6.
 
OpenVMS and Digital UNIX (also known as DEC OSF/1, and later as Compaq Tru64 UNIX), two reliable and scalable commercial operating systems by DEC, didn't gain wide popularity because of high prices (for example, over 1000 USD for one copy of Digital UNIX in 1997) and closed source code. Though they had other drawbacks (such as an even more limited range of supported hardware compared to Windows NT), if either of these OS had been set free, together with DEC's excellent development tools, it could have increased the Alpha architecture's market share strongly. Mistake #7.

 
DEC didn't support free open-source operating systems, though the very first of them, NetBSD, was ported to Alpha in 1995, followed by Linux, OpenBSD and FreeBSD. This was strange, to say the least, because these OS were (and still are) very popular in the Alpha environment, and their market value was easy to estimate even in those days, and was increasing constantly. Besides, these OS offered performance no worse than that of the commercial Digital UNIX or OpenVMS, and hardware support comparable to Windows NT (much better nowadays), as well as the many other benefits you may expect from open-source software. Mistake #8.

The list of DEC's strategic mistakes could be continued, including a complete disregard for the revolution of cheap mass-market personal computers, an over-diversified business model, and others that are less important or not directly related to the Alpha architecture. The final conclusion, from the author's point of view, is therefore this: DEC made a great deal of effort to make as much money as possible with the Alpha architecture, but made almost no effort to help the architecture itself.

The board of directors, motivated by DEC's numerous failures during the late 1980s and early 1990s, removed Olsen from managing the company in June of 1992, and appointed Robert Palmer. In 1994 he made a serious attempt to reorganise the company's management model, turning the existing "matrix" model (in which functionally different departments cooperated to make a decision) into a traditional "vertical" one (with authorities and responsibilities defined clearly from the very top to the very bottom). From 1991 to 1994, DEC's net losses amounted to over 4 billion USD, including 2 billion just from July of 1993 to June of 1994 (which, in turn, included 1.2 billion spent on restructuring). The number of personnel was reduced to 85 thousand. According to Palmer's programme, the company was to get rid of many divisions considered non-priority, so a global sale began. In July of 1994, the Storage Business Unit, which manufactured disk and tape drives, was sold to Quantum for 400 million USD, soon after the fiasco of the first models of thin-film hard drives (RA90 and RA92), which entered the market too late because of design flaws, and didn't survive the competition. In August of 1994, the Database Software Unit was sold to Oracle for 100 million USD, and a 7.8% share in the Italian company Olivetti was redeemed for 140 million USD. In November of 1997, a deal was arranged to transfer the Network Product Business Unit to Cabletron for 430 million USD.

The fall of DEC was loud enough. It sued Intel in May of 1997, accusing it of infringing 10 patents issued for the Alpha architecture while designing the Pentium, Pentium Pro and Pentium II processors. Intel started a lawsuit against DEC in September of 1997, claiming that 14 of its patents had been infringed in the design of Alpha processors. Peace was finally reached on the 27th of October 1997: both companies withdrew their complaints; DEC licensed to Intel the manufacturing rights to all its available hardware (except for the Alpha segment) and agreed to support the future IA-64 architecture; Intel purchased from DEC the factory in Hudson, together with the design centres in Jerusalem (Israel) and Austin (Texas), for 625 million USD, and agreed to manufacture DEC's Alpha processors in the future. Additionally, an agreement was signed to cross-license their patents for 10 years. The deal was completed on the 18th of May 1998; by that time, Compaq had absorbed DEC's primary divisions, including 38 thousand personnel (compared to Compaq's 32 thousand before the acquisition), though many of them were laid off in the very near future.

It is worth mentioning that not long before DEC's end, and soon after it, many of the leading engineers who had in fact created DEC's realm left for other employers: Derrick Meyer went to AMD to design K7; James Keller also went to AMD, as an architect of K8; Daniel Leibholz was hired by Sun to create UltraSPARC V; Richard Sites, one of the primary Alpha architects during all the previous years, also abandoned the ship. Intel was much less lucky: the StrongARM architecture, inherited from DEC, seemed to be at a dead end because none of the chief architects who had designed StrongARM-110, such as Daniel Dobberpuhl, Richard Witek, Gregory Hoeppner and Liam Madden, decided to join the new owner. Moreover, Witek's team, which had been working in Austin on the second generation of the StrongARM core, resigned completely, so Intel had to design the core literally from scratch, using its own engineers who had worked on i960 before.

 

Part 7. EV6, EV67, EV68C, EV68A

Although the 21264 (EV6) processor was developed by DEC, and was first mentioned at Microprocessor Forum in October of 1996, the final silicon implementation was completed by February of 1998, when DEC was in the process of liquidation. The processor itself was a significant step forward compared to EV5, revolutionary in many aspects. One of the most important innovations was out-of-order execution, which implied a fundamental core redesign, and lowered the functional units' dependency upon the bandwidth of cache and main memory. EV6 could reorder up to 80 instructions on the fly, and that was much more than competing products could (say, Intel P6 architecture used out-of-order execution for up to 40 micro-operations, HP PA-8x00 -- up to 56 instructions, MIPS R12000 -- up to 48, IBM Power3 -- up to 32, but PowerPC G4 -- up to 5; Sun UltraSPARC II, already developed by that moment, didn't support instruction reordering at all). Out-of-order execution was supported by the register renaming technique, for which 48 integer and 40 floating-point additional physical registers were implemented (the number of logical registers, also referred to as programmable, remained unchanged).
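The register renaming idea mentioned above can be illustrated with a minimal sketch. It is not a model of the actual EV6 logic: the class and method names are hypothetical, the special role of the always-zero register R31 is ignored, and only the register counts (32 logical integer registers plus the 48 additional ones) come from the text.

```python
from collections import deque

LOGICAL_INT_REGS = 32   # Alpha integer registers R0..R31
EXTRA_INT_REGS = 48     # additional physical registers quoted in the text


class Renamer:
    """Toy rename stage: maps logical registers onto a larger physical file."""

    def __init__(self, logical=LOGICAL_INT_REGS, extra=EXTRA_INT_REGS):
        # Initially logical register i lives in physical register i.
        self.map = list(range(logical))
        # The remaining physical registers are free to hold new results.
        self.free = deque(range(logical, logical + extra))

    def rename(self, dst, srcs):
        """Rename one instruction; returns (phys_dst, phys_srcs, old_phys_dst)."""
        # Sources read the current mapping, i.e. the latest producer.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: rename must stall")
        old = self.map[dst]
        new = self.free.popleft()
        self.map[dst] = new
        return new, phys_srcs, old

    def retire(self, old_phys):
        """When an instruction retires, its destination's old home is recycled."""
        self.free.append(old_phys)


renamer = Renamer()
# Two back-to-back writes to R1 receive different physical registers,
# so the second write does not have to wait for readers of the first.
first = renamer.rename(dst=1, srcs=[2, 3])
second = renamer.rename(dst=1, srcs=[1, 4])
```

In this sketch the second instruction reads the physical register produced by the first, while each write lands in a fresh register; that is what lets independent instructions run ahead of older ones.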

The number of integer pipelines was increased to 4 (organised in 2 clusters), but they differed somewhat in function: the 2nd pipeline was capable of multiplying (7 ticks per instruction) and shifting (1 tick), and the 4th of executing MVI code (3 ticks) and shifting; besides, all 4 pipelines supported elementary arithmetical and logical operations (1 tick). Every cluster possessed an integer register file of its own (80 entries: the 32 logical registers plus the 48 additional ones mentioned above), and the two files were kept identical (synchronised). The 1st and the 3rd pipelines also handled some tasks of A-box, by calculating virtual addresses for load/store instructions. A-box itself worked with I-TLB and D-TLB (128 entries each), load and store queues (32 commands each), and 8 64-byte buffers (miss address file) for transactions with B-cache and main memory. The floating-point pipelines also differed in function: the 1st supported adding (4 ticks), dividing (12 ticks for single precision and 15 ticks for double precision), and square root calculation (15 and 30 ticks), but the 2nd was only capable of multiplying (4 ticks). By the way, the square root unit and the related instructions were new to the Alpha architecture. As before in EV5, the decoder was able to process up to 4 instructions per tick, and the scheduler distributed them between 2 queues: one for the integer pipelines (I-queue, 20 commands), and one for the floating-point pipelines (F-queue, 15 commands). Besides square root calculation, prefetch instructions were implemented, as well as commands to transfer data between integer and floating-point registers.

C-box was redesigned significantly, and supported only 2 cache levels. The integrated L1 consisted of a 64Kb I-cache and a 64Kb D-cache, both 2-way set associative with 64-byte lines. D-cache was write-back, though its contents were still duplicated in B-cache. Because of the larger size and the more complicated associativity, D-cache read/write latencies increased to 3 ticks (to/from an integer register) and 4 ticks (to/from a floating-point register). D-cache remained dual-ported, though unlike in EV5 it didn't consist of 2 equal synchronised parts, but of a single part clocked at double the core frequency. The external B-cache of 1Mb to 16Mb, direct-mapped and write-back, used an independent 128-bit bidirectional data bus (with additional 16-bit ECC protection) and an independent 20-bit unidirectional address bus; it consisted of LW SSRAM (late write) chips, or later of DDR SSRAM (double data rate) chips. B-cache's speed was programmable, from 2/3 to 1/8 of the core frequency, and unlike in the previous generations of Alpha processors, B-cache itself wasn't optional. The system data bus was only 64-bit wide (with additional 8-bit ECC protection), bidirectional, but used the DDR technique. The system address bus was 44-bit wide, implemented physically through two 15-bit unidirectional paths, with no DDR support. The system control bus was 15-bit, also without DDR. The basic working principle of the system bus was changed: the bus became dedicated (instead of shared), so every processor had its own path to the chipset.
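To make the cache parameters above more tangible, here is a small sketch that derives the number of sets and the address split from a cache's size, associativity and line size. The function names are hypothetical, "Kb" and "Mb" are read as kilobytes and megabytes, and the 64-byte line size of B-cache is an assumption (the text gives the line size for L1 only).

```python
KB = 1024
MB = 1024 * KB


def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2, sizes are powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# L1 D-cache as described: 64 KB, 2-way set associative, 64-byte lines.
l1_sets, l1_offset_bits, l1_index_bits = cache_geometry(64 * KB, 2, 64)

# Largest B-cache: 16 MB, direct-mapped (1 way); 64-byte lines are assumed.
b_sets, b_offset_bits, b_index_bits = cache_geometry(16 * MB, 1, 64)

# An arbitrary example address, split for the L1 D-cache.
tag, index, offset = split_address(0x12345, l1_offset_bits, l1_index_bits)
```

One consistency check that follows from the figures in the text: if every value on the 20-bit B-cache address bus selects one 128-bit (16-byte) transfer, then 2^20 addresses cover exactly 16 MB, the largest B-cache size quoted.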

The branch prediction logic was redesigned completely. It followed a 2-level scheme: a local history table of 1024 records of 10 bits each together with a local predictor of 1024 records of 3 bits each, and a global predictor of 4096 records of 3 bits each driven by a 12-bit path history. The two algorithms worked independently: the local one tracked each individual branch, while the global one tracked sequences of branches. The chooser analysed the results of both algorithms and recorded its conclusions in a separate choice predictor of 4096 records of 2 bits each, which supplied the preferred decision whenever the two predictions differed. Such a cooperative approach achieved better results than either algorithm used stand-alone.
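The scheme described above is commonly called a tournament predictor, and it can be sketched in a few dozen lines. The table sizes and counter widths below follow the figures in the text; everything else is an assumption made for illustration: the class names, the indexing of the local history table by the low bits of the instruction's word address, the initial counter values, and the exact update policy of the chooser.

```python
class Counters:
    """A table of saturating up/down counters."""

    def __init__(self, entries, bits):
        self.top = (1 << bits) - 1
        self.mid = 1 << (bits - 1)           # values >= mid mean "yes"
        self.table = [self.mid] * entries

    def predict(self, i):
        return self.table[i] >= self.mid

    def train(self, i, up):
        step = 1 if up else -1
        self.table[i] = max(0, min(self.top, self.table[i] + step))


class TournamentPredictor:
    def __init__(self):
        self.local_history = [0] * 1024      # 10 bits of history per record
        self.local = Counters(1024, 3)       # indexed by a 10-bit local history
        self.global_ = Counters(4096, 3)     # indexed by the 12-bit path history
        self.choice = Counters(4096, 2)      # "yes" means: trust the global one
        self.path = 0                        # outcomes of the last 12 branches

    def _slot(self, pc):
        # Assumption: index by the low bits of the 4-byte instruction address.
        return (pc >> 2) & 1023

    def predict(self, pc):
        local = self.local.predict(self.local_history[self._slot(pc)])
        glob = self.global_.predict(self.path)
        if local == glob:
            return local
        return glob if self.choice.predict(self.path) else local

    def update(self, pc, taken):
        slot = self._slot(pc)
        hist = self.local_history[slot]
        local = self.local.predict(hist)
        glob = self.global_.predict(self.path)
        if local != glob:
            # Move the chooser toward whichever predictor was right.
            self.choice.train(self.path, glob == taken)
        self.local.train(hist, taken)
        self.global_.train(self.path, taken)
        self.local_history[slot] = ((hist << 1) | int(taken)) & 0x3FF
        self.path = ((self.path << 1) | int(taken)) & 0xFFF


def mispredictions(predictor, pc, outcomes):
    """Feed the outcomes of one branch in order; count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        wrong += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return wrong
```

A simple way to try it is a loop branch that is taken three times and then falls through, repeated many times: after a short warm-up the history-indexed counters learn the pattern, which a single counter per branch could not do.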

While engineering EV6, given the large number of advanced functional units and other complications, the clock subsystem was redesigned entirely. A more efficient signal flow allowed the core to reach the frequencies of the much simpler EV56 core on the same manufacturing process. Overall, the power consumed by the clock subsystem of EV6 was about 32% of the total core power; for comparison, for EV56 it was about 25%, for EV5 -- about 37%, for EV4 -- about 40%.


Clock driver placements for Alpha CPUs

EV6 was manufactured using the same 6-layer 0.35µ CMOS6 process as EV56, consisted of 15.2 million transistors (including about 9 million spent on I-cache, D-cache, and branch predictors), occupied a die of 314mm², and required a 2.1V to 2.3V supply. The core frequencies ranged from 466MHz to 600MHz (TDP approx. from 80W to 110W). Form-factor: PGA-587 (Pin Grid Array).
 
Micrograph of EV6 Floor-plan of EV6
EV6 (front) EV6 (back)

21264A (EV67) entered the market at the end of 1999. It was produced by Samsung using a 0.25µ CMOS7 process, with a die size of 210mm², and required a lower supply of 2.0V. There were no architectural differences compared to EV6. The core frequencies ranged from 600MHz to 833MHz (TDP approx. from 70W to 100W), which allowed Alpha to regain the lead in integer tasks, lost not long before to Intel and AMD processors.

The first samples of 21264B (EV68C) were delivered at the beginning of 2000. It was produced by IBM using a 0.18µ CMOS8 process with copper conductors. Although there were still no architectural differences, the promising technology made it possible to raise the core frequencies right up to 1250MHz. In 2001, Samsung was able to manufacture 21264B (EV68A) in series using its own 0.18µ process, though with aluminium conductors, reducing the die size to 125mm² and the voltage to 1.7V, and placing the core frequencies between 750MHz and 940MHz (TDP approx. from 60W to 75W).
 
Prototype of EV68A (front) Prototype of EV68A (back)

Various sources mention 21264C and 21264D, code-named EV68CB and EV68DC, manufactured by IBM using the same technology as EV68C and running at the same frequencies, so they can be considered minor modifications. The only noticeable difference was a new form-factor, the "pinless" CLGA-675 (Ceramic Land Grid Array) instead of PGA-587.

There were 2 chipsets designed for processors of the 21264 series: DEC Tsunami (21272; also known as Typhoon) and AMD Irongate (AMD-751), though there could have been many more, taking into account that both 21264 and Athlon used almost the same system bus (it was licensed by AMD from DEC).

DEC Tsunami was a highly scalable chipset. It could be used to design single-processor as well as dual-processor and quad-processor systems, with a memory path ranging from 128 to 512 bits wide (SDRAM ECC registered, 83MHz), and supporting from one to several PCI buses (64-bit, 33MHz). Such flexibility was achieved by dividing the chipset into components: system bus controllers (C-chips, one per processor), memory bus controllers (D-chips, one per every 64 bits of bus width), and PCI bus controllers (P-chips, one per bus needed). So it is no wonder that some systems (for example, AlphaPC 264DP) had chipsets consisting of 12 chips...

Although AMD Irongate was developed to serve as a north bridge for Athlon-based mainboards, it was used in some Alpha mainboards (for example, in UP1000 and UP1100). Being a single-chip solution, it cost much less than DEC Tsunami, and consumed much less energy. However, it wasn't the best solution for 21264, because it lacked multi-processor support and had a narrow memory path (64-bit, SDRAM ECC unbuffered, 100MHz). Nevertheless, Irongate was the first chipset for Alpha to feature AGP bus support. And the last.

 

Part 8. The Epoch of Compaq

In fact, Compaq purchased the remains of DEC for its significant assembly facilities, its wide distribution network (in 98 countries), and that cross-licensing agreement with Intel (which, for example, allowed it to manufacture 8-processor Profusion servers). As it seemed, the division developing the Alpha architecture wasn't really welcome: Compaq had produced workstations and servers based upon Intel's processors for a very long time, and also paid much attention to AMD's processors. So, in June of 1998, Compaq established an alliance with Samsung to develop the architecture (it should be mentioned that DEC and Samsung had signed an agreement in February of 1998, which gave the latter full access to all Alpha-related patents, and allowed it to manufacture the Alpha processors already developed and even to design new ones on its own). A new jointly owned company, API (Alpha Processor Inc.), was incorporated to promote the architecture (someone seemed to have drawn the right conclusions from DEC's history). In the summer of 1998, EV6-based systems entered mass production, featuring the best price/performance ratios among the competing products available on the market. The serious problems with Intel's future Itanium were reason enough to conclude that this situation would remain unchanged in the near future. Besides Samsung, EV6 processors were manufactured by Intel at Fab-6 in Hudson, in accordance with the final agreement with the late DEC...

The year 1999 was unsuccessful for Compaq because of falling sales on the personal computer market. The most frequently named reason was an underestimation of the possibilities the Internet offered for promoting and selling PCs, unlike Dell, which adapted its business model and offered computer equipment at the most attractive prices among all the top brands. Compaq's CEO, Eckhard Pfeiffer, resigned after a financial disaster in the 1st quarter of 1999. Trying to reduce losses, Compaq started to minimise its presence in certain areas, and that affected Alpha systems: in May of 1999, it was announced that the AlphaServer assembly line in Salem (New Hampshire) would soon be shut down.

On the 23rd of August, 1999, a notorious event took place: Compaq announced that it would discontinue its participation in the development of Windows NT, stopped supplying this OS with its own Alpha systems, and, in effect, laid off a team of about 120 programmers from the former Western Research Laboratory of DEC (DECwest) who had been working on this project. According to Compaq's statistics, among all OS preinstalled on Alpha systems Tru64 held a share of 65%, OpenVMS -- of 35%, and Windows NT of just about 5%, so there was no reason to keep flogging a dead horse. A week later, Microsoft announced in return that there would be no release of Windows 2000 for Alpha. Considering that support of the PowerPC and MIPS architectures had been abandoned by Microsoft in 1997, the future of the "universal OS" was now tied to a single architecture, if IA-64 is not counted...

In December of 1999, Compaq and Samsung signed a memorandum to support the leadership of the Alpha architecture in the near future. The two sides agreed to invest 500 million USD in the architecture (Samsung was obliged to spend 200 million USD on developing and tuning new manufacturing processes, and Compaq was supposed to spend 300 million USD on designing new server solutions and developing Tru64 UNIX further). In addition, during the same month Compaq and IBM agreed that the latter would manufacture Alpha processors using its own copper-conductor technology, once it was completed. At the same time, Samsung was assured of remaining the primary supplier of Alpha processors. The year did not go well for Compaq, as is well illustrated by the change in its share price: from 51 USD in February to 28 USD in December. Many analysts, though, stated that it could have been worse.

The year 2000 passed quietly for Compaq. Samsung wasn't able to tune its 0.18µ process, unlike IBM, which started to supply EV68C to Compaq in limited quantities, and the market had to make do with the comparatively slow EV67. Development of 21364 (EV7, also known as Marvel) was delayed, though 21464 (EV8, also known as Araña) was mentioned here and there. The fall of the dot-coms affected Compaq's shares, which dropped to 15 USD per share by December, i.e. by 44% since January. It may seem strange, but that was a good result; other companies, more dependent upon e-commerce, lost much more: Gateway -- 75%, Apple -- 71%, Dell -- 65%. The dot-coms themselves were either bankrupt, or close to that; Yahoo.com lost 95% of its market value, Priceline.com -- 97%.

At the beginning of 2001, Samsung started to manufacture EV68A in quantity, but the right moment had gone. Compaq planned to ship EV68C-based systems (GS-class AlphaServers), and also to modernise those already in production. EV7 was still somewhere on the way when something completely unexpected happened: on the 25th of June, 2001 ("black Monday"), Compaq announced that it would move all its server solutions from Alpha to the IA-64 architecture by 2004. EV8 was cancelled immediately (though some details about its internals had been available at Microprocessor Forum in October of 1999), and EV7 was scheduled for release not earlier than the beginning of 2002. Afterwards, the Alpha Microprocessor Division was to be disbanded, and most of its personnel were to be employed by Intel. Samsung and IBM soon ceased producing Alpha processors. Later, the situation became even more interesting: on the 3rd of September 2001, Hewlett-Packard announced its intention to acquire Compaq, which was experiencing certain financial difficulties, and whose share price stood at 10 USD in December of 2001. The deal was approved by the shareholders' meetings of both corporations, as well as by the governments of the USA and Canada, and was completed in May of 2002.

On the 21st of October 2001, API (renamed API NetWorks by that moment) transferred all rights to support Alpha systems (including warranty service) to Microway, the largest builder of Alpha workstations and servers after Compaq, and an old partner of the late DEC. API itself left the market of Alpha products, and concentrated its efforts on network technologies, development of the HyperTransport bus, and data storage systems.

In conclusion, it can be said that although Compaq didn't repeat many of the mistakes made by DEC before, it didn't unleash the full power of the architecture. High-performance Alpha systems based upon 21264A and 21264B never came down to the price tag of 2000 USD, and a low-cost 21264PC never appeared. The possibility of producing low-priced mainboards in volume using AMD Irongate was ignored, and the pricey DEC Tsunami (offered by Compaq for over 1000 USD per set in OEM lots) left Alpha systems no chance to enter the mid-range computer category. Other manufacturers of chipsets for AMD Athlon didn't adapt them for 21264, though VIA had such an intention initially.

 

Part 9. EV7, EV79, EV7z, EV8

The first news about the architecture of 21364 (EV7) came from Microprocessor Forum in October of 1998; it said that the processor would be based upon the core of EV6, but with a Direct Rambus DRAM controller (presumably 4-channel) and an L2 cache (1.5Mb, 6-way set associative), both integrated. It was also mentioned that no changes to the core of EV6 were planned, though there could have been another reason for that: there was no one to handle such a hard task, because few design engineers remained employed by Compaq. The design was expected to be completed by 2000.

HP inherited the Alpha architecture after the acquisition of Compaq, though in fact it didn't need that bequest, because it had developed its own 64-bit PA-RISC architecture (Precision Architecture RISC), and was in an alliance with Intel to develop the IA-64 architecture (i.e. Itanium). So HP's actions regarding the Alpha architecture were limited to selling the EV6/EV67/EV68-based servers inherited from Compaq, and launching EV7 into production, presented finally in January of 2002.

As expected, EV7 contained the core of EV68 (absolutely unchanged), and several additionally integrated units: two memory controllers (two Z-boxes, for Direct Rambus DRAM PC800), a multi-functional router (R-box, for multi-processor support and networking), and a full-speed L2 cache (S-cache, 1.75Mb, 7-way set associative). The data path to S-cache was 128-bit wide, and the cache itself worked with significant latencies (12 ticks when reading). Both Z-boxes and R-box were clocked at 2/3 the core frequency. The memory channels' speed depended upon the Z-boxes and was 1/2 their frequency (1/3 the core frequency, respectively), but used DDR technology.
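The chain of clock ratios above is easy to follow with a short calculation. The sketch below uses exact fractions; the function name and the 1200MHz example frequency are hypothetical and chosen only because they give round numbers.

```python
from fractions import Fraction


def ev7_memory_clocks(core_mhz):
    """Derive Z-box and memory channel clocks from the core clock (in MHz)."""
    core = Fraction(core_mhz)
    zbox = core * Fraction(2, 3)      # Z-boxes and R-box: 2/3 of the core
    channel = zbox / 2                # channel clock: 1/2 of the Z-box clock
    transfers = channel * 2           # DDR: two transfers per channel clock
    return {
        "zbox_mhz": zbox,
        "channel_mhz": channel,
        "transfers_mt_s": transfers,  # millions of transfers per second
    }


round_case = ev7_memory_clocks(1200)  # hypothetical core clock
fast_case = ev7_memory_clocks(1150)   # fastest shipping part quoted in the text
```

With a 1200MHz core the channel would run at 400MHz and deliver 800 million transfers per second, which matches the nominal rate of PC800 memory; at 1150MHz the same ratios give slightly less.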

Every Z-box supported 5 memory channels (4 primary and 1 auxiliary), each 18 bits wide (16 for commands/data/addresses, 2 for ECC). The auxiliary channel was optional, and could be used to organise a fault-tolerant array in memory (roughly speaking, like RAID3); for example, when a quad-word (64 bits) was written to memory, it was divided into 4 words (16 bits each), each of them was sent through a dedicated channel, and the auxiliary channel was used to store a checksum. In addition, every Z-box could have up to 1024 memory pages open. The total theoretical memory bandwidth of one EV7 was about 12Gb/s. Since every EV7 in a multi-processor system had a memory area of its own, such a memory model was called NUMA (Non-Uniform Memory Access), in contrast to traditional SMP (Symmetric Multi-Processing), in which all installed processors have access to a single (common) memory area. Thus, every processor in a system (128 maximum) could access memory through its own controllers as well as through the controllers of other processors. R-box handled communication between processors, and also between a particular processor and its local peripherals. It supported 4 independent channels with a theoretical bandwidth of 6Gb/s each (one per neighbouring processor connected), and 1 additional channel for high-speed input/output transfers.
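The RAID3-like striping described above can be sketched as follows. The text only says that the auxiliary channel stores a checksum; treating that checksum as a bitwise XOR parity word is an assumption, as are all function names. The last function reproduces the bandwidth figure, reading "Gb" as gigabytes and assuming the 800 million transfers per second implied by the PC800 designation.

```python
def split_quadword(value):
    """Split a 64-bit quad-word into four 16-bit words, lowest word first."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(4)]


def parity_word(words):
    """Assumed checksum: bitwise XOR of the four data words."""
    parity = 0
    for word in words:
        parity ^= word
    return parity


def rebuild(words, parity, lost):
    """Recover the word of a failed channel from the other three plus parity."""
    value = parity
    for i, word in enumerate(words):
        if i != lost:
            value ^= word
    return value


def peak_memory_bandwidth(transfers_per_second, zboxes=2,
                          primary_channels=4, data_bits=16):
    """Peak bytes per second over all primary channels of one processor."""
    return zboxes * primary_channels * (data_bits // 8) * transfers_per_second


words = split_quadword(0x0123456789ABCDEF)
parity = parity_word(words)
recovered = rebuild(words, parity, lost=2)   # pretend channel 2 has failed
peak = peak_memory_bandwidth(800_000_000)    # bytes per second
```

Under these assumptions the peak comes to 12.8 billion bytes per second, in line with the "about 12Gb/s" quoted above, and any single lost word can be rebuilt from the remaining channels.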

Because EV7 inherited all the interfaces of EV6 internally, the processor must have included a unit able to handle the system bus interface of EV6. Although this part of the processor's design was nowhere documented or even mentioned, some assumptions about its performance can be made. Since the minimal bus multiplier supported by EV6 was equal to 3, the theoretical bandwidth of this core-to-unit interface was 3Gb/s for EV7, or 4 times less than the two Z-boxes together could deliver. Frankly speaking, this is one more argument that EV7 was designed for high-end multi-processor systems right from the start.

EV7 processors could be connected to each other in various ways, but only the so-called "torus" and "shuffle" interconnects were implemented in real hardware. The second one was potentially more effective in some situations (for example, in 8-processor systems "shuffle" allowed each processor to be connected directly to 4 others, whereas "torus" connected it to only 3 others; it is a fair guess that for 12-processor systems this difference vanished).
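The neighbour counts quoted for the "torus" can be checked with a small sketch. It assumes that 8 processors are arranged as a 2 x 4 grid and 12 processors as a 3 x 4 grid, with wrap-around links in both directions; the function name is hypothetical, and the wiring of "shuffle" is not modelled.

```python
def torus_neighbours(rows, cols):
    """Map each node (row, col) of a torus to its set of distinct neighbours."""
    result = {}
    for r in range(rows):
        for c in range(cols):
            around = {
                ((r - 1) % rows, c),
                ((r + 1) % rows, c),
                (r, (c - 1) % cols),
                (r, (c + 1) % cols),
            }
            around.discard((r, c))   # a dimension of size 1 wraps onto itself
            result[(r, c)] = around
    return result


eight_way = torus_neighbours(2, 4)
twelve_way = torus_neighbours(3, 4)
```

In the 2 x 4 case the "up" and "down" links of a node lead to the same partner, so each processor reaches only 3 distinct neighbours although it has 4 ports; in the 3 x 4 case all 4 neighbours are distinct, which agrees with the guess that the advantage of "shuffle" disappears there.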


Interconnections of 8-way EV7 systems

EV7 was manufactured using a 7-layer 0.18µ CMOS8 process and consisted of 152 million transistors (including 137 million for I-cache, D-cache and S-cache), hence a very large die (397mm²). Prototypes were clocked at 1250MHz (TDP of 155W), though the processors installed in systems produced by HP ran at 1000MHz to 1150MHz. From an engineering point of view, EV7 couldn't stay on par with the previous representatives of the Alpha architecture in the density of functional units placed on the die, which affected the maximal reachable core frequencies, the latencies of S-cache, and, in turn, performance.


The floor-plan of EV7

In December of 2002, HP issued a press release saying that the first EV7-based servers would be available in January of 2003. Later, EV79 was to be produced (using a 0.13µ SOI process), and no further Alpha processors were planned. In March of 2003, a prototype of EV79 was shown at ISSCC'2003, with a die size of 251mm², requiring a 1.2V supply, and clocked at 1450MHz (TDP of 100W). But in October of 2003 news about manufacturing problems leaked out of IBM, and half a year later the processor was finally cancelled.

In August of 2004, the last Alpha processor was announced: EV7z, clocked at 1300MHz and using the same 0.18µ process. Like EV7, it was intended for installation in HP's products only; it was also mentioned that servers and workstations of the Alpha architecture would be on sale until 2006, and supported until 2011, but no longer.

The cancelled 21464 (EV8) was supposed to be the successor to EV7, with the number of primary functional units doubled (8 integer and 4 floating-point pipelines) and with a 3Mb S-cache. It was to implement a new technology, SMT (Simultaneous Multi-Threading), which meant concurrent execution of up to 4 software threads inside one core (this technology may have been related somehow to HyperThreading by Intel). The die size was estimated at 420mm² if a 0.13µ SOI process was used.

 

Epilogue

At the moment of writing (April of 2005) Alpha systems were still offered, mostly through HP and Microway. The latter even listed relatively inexpensive workstations based upon 21164A and AlphaPC 164LX for Linux (2000 USD for a standard configuration). Many retired but still working workstations and servers, as well as their parts, were offered through "online flea markets"; most of those systems were running Windows NT, and many of them would accept neither Digital UNIX nor OpenVMS, and some not even *BSD (systems with no SRM console available), though it could still be possible to install Linux from ARC/AlphaBIOS. If you intend to purchase an Alpha system, clarify this question before handing over the money, unless you feel you have too few problems.

According to statistics, DEC and Compaq had sold about 800 thousand Alpha workstations and servers by June of 2001. There is no exact figure for how many systems were assembled and sold by others, but the number is estimated to be over 500 thousand.

Many people say that the Alpha architecture died on its own. Hopefully, after reading this article you will have no doubt that it was buried. Alive. Because it was more profitable to do so.

There have been many cases in history when a poorly crafted product prevailed over a better one. Maybe the first product cost much less than the second. Possibly the second product was promoted too passively. Or the license fees were incomparable. Anything is possible. Some would note that marketing boys and girls promoting goods whose poor functionality they understand well exert themselves to the utmost, realising clearly that otherwise their next salary could be their last.

Life goes on...



Powered by Digital

 

Many technical documents by DEC and Compaq have been used; this article wouldn't be complete without them.

The author gives credit to Wikipedia for information about DEC's early history and the products of those days, and to Terry Shannon for his regular and informative newsletter "Shannon Knows {DEC, Compaq, HPC}".

This paper contains information collected from many unofficial Internet resources, the full list of which is too long to be placed below; sincere appreciation goes to all their authors, especially for interesting facts, comments, points of view, etc.

The photographs of EV4 and EV6 are courtesy of cpu-collector.com.

Special thanks for the extremely useful notes and suggestions made during the preparation of this article go to ISA_user, VLev, Yury_Malich, Stranger_NN, and of course, to matik!

Copyright (c) Paul V. Bolotoff, 2005. All rights reserved.
A full or partial reprint without a reference given to the original is prohibited.