Chris Rowen, President and CEO, Tensilica, Inc.
Designing SOCs with Configured Cores: Unleashing the Tensilica Xtensa and Diamond Cores by Steve Leibson
Microprocessor cores used for SOC design are the direct descendants of Intel’s original 4004 microprocessor. Just as packaged microprocessor ICs vary widely in their attributes, so do microprocessors packaged as IP cores. However, SOC designers still compare and select processor cores the way they previously compared and selected packaged microprocessor ICs. The big problem with this selection method is that it assumes that the laws of the microprocessor universe have remained unchanged for decades. This assumption is no longer valid.
Processor cores for SOC designs can be far more plastic than microprocessor ICs for board-level system designs. Shaping these cores for specific applications produces much better processor efficiency and much lower system clock rates. Together, Tensilica’s Xtensa and Diamond processor cores constitute a family of software-compatible microprocessors covering an extremely wide performance range from simple control processors, to DSPs, to 3-way superscalar processors. Yet all of these processors use the same software-development tools so that programmers familiar with one processor in the family can easily switch to another.
This book emphasizes a processor-centric MPSOC (multiple-processor SOC) design style shaped by the realities of the 21st century and nanometer silicon. It advocates the assignment of tasks to firmware-controlled processors whenever possible to maximize SOC flexibility, cut power dissipation, reduce the size and number of hand-built logic blocks, shrink the associated verification effort, and minimize the overall design risk.
· An essential, no-nonsense guide to the design of 21st-century mega-gate SOCs using nanometer silicon.
· Discusses today's key issues affecting SOC design, based on the author's decades of personal experience in developing large digital systems as a design engineer at Hewlett-Packard's Desktop Computer Division and at EDA workstation pioneer Cadnetix, and on his coverage of such topics as an award-winning technology journalist and editor-in-chief of EDN magazine and the Microprocessor Report.
· Explores conventionally accepted boundaries and perceived limits of processor-based system design and then explodes these artificial constraints through a fresh outlook on and discussion of the special abilities of processor cores designed specifically for SOC design.
· Thorough exploration of the evolution of processors and processor cores used for ASIC and SOC design with a look at where the industry has come from, and where it's going.
· Easy-to-understand explanations of the capabilities of configurable and extensible processor cores through a detailed examination of Tensilica's configurable, extensible Xtensa processor core and six pre-configured Diamond cores.
· The most comprehensive assessment available of the practical aspects of configuring and using multiple processor cores to achieve very difficult and ambitious SOC price, performance, and power design goals.
Read an Excerpt
Designing SOCs with Configured Cores: Unleashing the Tensilica Xtensa and Diamond Cores
By Steve Leibson
MORGAN KAUFMANN
Copyright © 2006 Elsevier Inc.
All rights reserved.
Chapter One: Introduction to 21st-Century SOC Design
The past is prologue for the future —common saying, frequently ignored
Systems-on-chips (SOCs) are, by definition, electronic systems built on single chips. Also by definition, every SOC incorporates at least one microprocessor. Some SOCs use two or three microprocessors to accomplish required tasks and a few SOC designs employ many dozens of processors. To see how SOC design got where it is today and where market and technological forces will take it tomorrow, we start by first looking at how electronic system design has evolved since the microprocessor's introduction.
1.1 THE START OF SOMETHING BIG
The course of electronic systems design changed irreversibly on November 15, 1971, when Intel introduced the first commercial microprocessor, the 4004. Before that date, system design consisted of linking many hardwired blocks, some analog and some digital, with point-to-point connections. After the 4004's public release, electronic system design began to change in two important ways.
First, and most obvious, was the injection of software or firmware into the system-design lexicon. Prior to the advent of the microprocessor, the vast majority of system designers had only analog and digital design skills. If they had learned any computer programming, it was used for developing design-automation aids or simulation programs, not for developing system components. After the Intel 4004's introduction, system developers started to learn software programming skills, first in assembly language and then in high-level languages such as PL/1, Pascal, and C as compilers got better and memory became less expensive.
The other major change to system design caused by the advent of the microprocessor—a change that's often overlooked—is the use of buses to interconnect major system blocks. Figure 1.1 shows a block diagram of a Hewlett-Packard 3440A digital voltmeter that was designed in 1963, eight years before the microprocessor appeared. This block diagram, typical of the era, shows a mixture of analog and digital elements interconnected with point-to-point connections. Even the all-digital measurement counter, which stores the result of the analog-to-digital conversion, consists of four independent decade counters. Each decade counter communicates to the next counter over one wire and each counter drives its own numeric display. There are no buses in this design because none are needed. Note that there are no microprocessors in this design either. Microprocessors wouldn't appear for another eight years.
Figure 1.2 illustrates how a system designer might implement a digital voltmeter like the HP 3440A today. A microprocessor controls all of the major system components in this modern design implementation. One key change: the processor communicates with the other components over a common bus—the microprocessor's main bus.
From a systems-design perspective, there are significant differences between the digital voltmeter design from the early 1960s and the modern implementation. For the purposes of a systems-level discussion, the most significant difference is perhaps the massive amount of parallelism occurring in the early 1960s design versus the modern design's ability to perform only one operation at a time over the microprocessor bus. For example, if the microprocessor in Figure 1.2 is reading a word from RAM or ROM over the bus, it cannot also be reading a word from the A/D converter at the same time. The processor bus is a shared resource and can only support one operation at a time.
This loss of concurrent operation arose because of the economics of microprocessor packaging and printed-circuit board design. It's less expensive to use buses to move data into and out of packaged microprocessors and it's often much easier to route buses on a circuit board than to accommodate multiple point-to-point connections.
The consequence of creating a shared resource like a microprocessor bus is that the use of the shared resource must be multiplexed in time. As a result of the multiplexed operation, the operating frequency of the shared resource must increase to accommodate the multiple uses of the shared resource—the bus in this case. When the signal frequencies are low to start with, as they are for a voltmeter design that's more than 40 years old, then the final operating frequency of the shared resource places little strain on the system design.
However, as the tasks performed by a system become more ambitious, the work to be done every clock cycle increases and the aggregated requirements of all the tasks begin to approach the bandwidth limit of the shared resource. As this happens, system-design margins shrink and then vanish. At that point, the system design jumps the threshold from marginal to faulty.
As a reminder: the point-to-point architecture of the original HP 3440A digital voltmeter operated all of the various system blocks concurrently, which means that its interconnection scheme has far more design margin than the microprocessor-based version of the design. This loss of design margin is an engineering tradeoff. It undoubtedly reduces the implementation cost of the design, and it is acceptable as long as no future performance increases are envisioned that would further reduce design margin.
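The bandwidth argument in this section can be made concrete with a small calculation. The following Python sketch is only an illustration: the task names, transfer rates, and bus capacity are invented, and `bus_margin` is a hypothetical helper rather than anything taken from the text. It sums the transfer rates that all tasks place on one shared bus and compares the total with what the bus can carry.

```python
def bus_margin(task_rates_hz, bus_capacity_hz):
    """Return the fraction of shared-bus capacity left unused.

    task_rates_hz: transfers per second each task needs (hypothetical values).
    bus_capacity_hz: transfers per second the shared bus can carry.
    A negative result means the tasks demand more than the bus can deliver.
    """
    demand = sum(task_rates_hz.values())
    return (bus_capacity_hz - demand) / bus_capacity_hz


# Invented workload for a voltmeter-like system; every number is an assumption.
tasks = {"instruction_fetch": 200_000, "adc_read": 1_000, "display_update": 4_000}
capacity = 500_000  # bus transfers per second, also an assumption

margin_now = bus_margin(tasks, capacity)

# A more ambitious task set pushes aggregate demand past the bus limit.
tasks_later = dict(tasks, instruction_fetch=450_000, adc_read=60_000)
margin_later = bus_margin(tasks_later, capacity)

print(f"margin now:   {margin_now:.3f}")
print(f"margin later: {margin_later:.3f}")
```

With these made-up numbers the first workload leaves more than half of the bus idle, while the second asks for slightly more than the bus can supply, which is the point at which a marginal design becomes a faulty one.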
1.2 FEW PINS = MASSIVE MULTIPLEXING
As shown in Figure 1.3, the 4-bit Intel 4004 microprocessor was packaged in a 16-pin dual-inline package (DIP). Consequently, this microprocessor's 4-bit bus not only multiplexed access to the various components in the system, it also had to multiplex the bus-access addresses with the data on the same four wires. It took three clock cycles to pass a 12-bit address out over the bus and two or four more clock cycles to read back an 8- or 16-bit instruction. All instructions came from ROM in a 4004-based system. RAM accesses were even slower because they required one instruction to pass out the target address and then a second instruction to read data from or write data to the selected location. With a maximum operating frequency of 740 kHz and long, multi-cycle bus operations, the Intel 4004 microprocessor was far too slow to take on many system control tasks and the electronics design community largely ignored the world's first microprocessor.
The world's second commercial microprocessor, Intel's 8008 introduced in April, 1972, was not much better than the 4004 processor in terms of bus bandwidth. The 8008 microprocessor's 8-bit bus needed two cycles to pass a 14-bit address and one to three cycles to accept an 8- to 24-bit instruction. The 8008 microprocessor had a wider, 8-bit bus that Intel squeezed into an unconventional 18-pin DIP (shown in Figure 1.4). Although the instruction times for the 4004 and 8008 microprocessors were similar (10.5 µsec versus 12.5 to 20 µsec, respectively), the 8008 microprocessor's RAM accesses were faster than those of the Intel 4004 processor because the Intel 8008 processor used a more conventional RAM-access cycle that output an address and then performed the data transaction during the same instruction cycle. The 8008 microprocessor ran at clock rates of 500–800 kHz. Consequently, like its predecessor, it was too slow to fire the imagination of many system designers.
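The cycle counts quoted for the 4004 and 8008 follow directly from dividing each address or instruction width by the bus width and rounding up. The short Python sketch below reproduces that arithmetic from the figures given in the text; the helper name `bus_cycles` is an illustrative assumption, and the model ignores any other bus overhead.

```python
import math


def bus_cycles(bits, bus_width):
    """Cycles needed to move `bits` over a multiplexed bus `bus_width` bits wide."""
    return math.ceil(bits / bus_width)


# Intel 4004: 4-bit bus, 12-bit address, 8- or 16-bit instructions.
i4004 = {
    "address": bus_cycles(12, 4),
    "instruction": (bus_cycles(8, 4), bus_cycles(16, 4)),
}

# Intel 8008: 8-bit bus, 14-bit address, 8- to 24-bit instructions.
i8008 = {
    "address": bus_cycles(14, 8),
    "instruction": (bus_cycles(8, 8), bus_cycles(24, 8)),
}

print("4004:", i4004)
print("8008:", i8008)
```

The results match the text: three address cycles and two or four instruction cycles for the 4004, and two address cycles and one to three instruction cycles for the 8008.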
1.3 THIRD TIME'S A CHARM
Intel finally got it right in April, 1974 when the company introduced its third microprocessor, the 8-bit 8080. The 8080 microprocessor had a non-multiplexed bus with separate address and data lines. Its address bus was 16 bits wide, allowing a 64-Kbyte address range. The data bus was 8 bits wide. As shown in Figure 1.5, Intel used a 40-pin DIP to house the 8080 microprocessor. This larger package and the microprocessor's faster 2-MHz clock rate finally brought bus bandwidth up to usable levels. Other microprocessor vendors such as Motorola and Zilog also introduced microprocessors in 40-pin DIPs around this time and system designers finally started to adopt the microprocessor as a key system building block.
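The 64-Kbyte figure is simply two raised to the number of address bits. As a quick check, the Python sketch below applies the same rule to the address widths mentioned in this chapter; `addressable_locations` is an illustrative name, not something from the text.

```python
def addressable_locations(address_bits):
    """Number of distinct locations selectable with `address_bits` address bits."""
    return 2 ** address_bits


# Address widths as given in the text for each processor.
address_bits = {"4004": 12, "8008": 14, "8080": 16}

sizes = {name: addressable_locations(bits) for name, bits in address_bits.items()}

for name, size in sizes.items():
    print(f"{name}: {address_bits[name]}-bit address -> {size} locations ({size // 1024} K)")
```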
1.4 THE MICROPROCESSOR: A UNIVERSAL SYSTEM BUILDING BLOCK
Over the next 30 years, microprocessor-based design has become the nearly universal approach to systems design. Once microprocessors had achieved the requisite processing and I/O bandwidth needed to handle a large number of system tasks, they began to permeate system design. The reason for this development is simply engineering economics. Standard microprocessors offered as individual integrated circuits (ICs) provide a very economical way to package thousands of logic transistors in standard, testable configurations. The resulting mass-produced microprocessor ICs have become cheap, often costing less than $1 per chip, and they deliver abilities that belie their modest cost.
Microprocessors also dominate modern electronic system design because hardware is far more difficult to change than software or firmware. To change hardware, the design team must redesign and re-verify the logic, change the design of the circuit board (in pre-SOC times), and then re-run any required functional and environmental tests. Software or firmware developers can change their code, recompile, and then burn new ROMs or download the new code into the existing hardware.
In addition, a hardware designer can design a microprocessor-based system and build it before the system's function is fully defined. Pouring the software or firmware into the hardware finalizes the design and this event can occur days, weeks, or months after the hardware has been designed, prototyped, verified, tested, manufactured, and even fielded. As a consequence, microprocessor-based system design buys the design team extra time because hardware and firmware development can occur concurrently, which telescopes the project schedule (at least under ideal conditions).
With the advent of practical 8-bit microprocessors in the mid-1970s, the microprocessor's low cost and high utility snowballed and microprocessor vendors have been under great pressure to constantly increase their products' performance as system designers think of more tasks to execute on processors. There are some obvious methods to increase a processor's performance and processor vendors have used three of them.
The first and easiest performance-enhancing technique used was to increase the processor's clock rate. Intel introduced the 8086 microprocessor in 1978. It ran at 10 MHz, five times the clock rate of the 8080 microprocessor introduced in 1974. Ten years later, Intel introduced the 80386 microprocessor at 25 MHz, faster by another factor of 2.5. In yet another ten years, Intel introduced the Pentium II processor at 266 MHz, better than a ten times clock-rate increase yet again. Figure 1.6 shows the dramatic rise in microprocessor clock rate over time.
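The multipliers quoted in this paragraph can be checked with a few lines of arithmetic. The Python sketch below uses only the clock rates stated in the text (2, 10, 25, and 266 MHz) and computes each step-to-step ratio; the variable names are illustrative.

```python
# Clock rates in MHz as stated in the text, in order of introduction.
clock_mhz = [("8080", 2), ("8086", 10), ("80386", 25), ("Pentium II", 266)]

# Ratio of each processor's clock rate to that of its predecessor in the list.
ratios = {}
for (_, prev_mhz), (name, mhz) in zip(clock_mhz, clock_mhz[1:]):
    ratios[name] = mhz / prev_mhz

overall = clock_mhz[-1][1] / clock_mhz[0][1]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x its predecessor")
print(f"8080 to Pentium II: {overall:.0f}x")
```

The ratios come out to 5, 2.5, and roughly 10.6, which agrees with "five times," "another factor of 2.5," and "better than a ten times" increase.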
Note that Intel was not the only microprocessor vendor racing to higher clock rates. At different times, Motorola and AMD have also produced microprocessors that vied for the "fastest clock rate" title and Digital Equipment Corporation (DEC) hand-tuned its series of Alpha microprocessors to world-beating clock rates. (That is, until Compaq bought the company in 1998 and curtailed the Alpha development program. HP then announced its acquisition of Compaq in late 2001 and completed it in 2002.)
At the same time, microprocessor data-word widths and buses widened so that processors could move more data during each clock period. Widening the processor's bus is the second way to increase processing speed and I/O bandwidth. Intel's 16-bit 8086 microprocessor had a 16-bit data bus and the 32-bit 80386 microprocessor had a 32-bit data bus.
The third way to increase processor performance and bus bandwidth is to add more buses to the processor's architecture. Intel did exactly this with the addition of a separate cache-memory bus to its Pentium II processor. The processor could simultaneously run separate bus cycles to its high-speed cache memory and to other system components attached to the processor's main bus.
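The three techniques (a faster clock, a wider bus, and additional buses) each scale peak bandwidth in a simple way. The Python sketch below is an idealized model, not a description of any real processor: it assumes one transfer per clock on every bus, which real bus protocols do not achieve, and the widths and clock rates are chosen purely for illustration. `peak_bandwidth` is a hypothetical helper.

```python
def peak_bandwidth(buses):
    """Idealized peak bytes per second summed over independent buses.

    buses: list of (width_bits, clock_hz) pairs.
    Assumes one transfer per clock per bus (a simplifying assumption).
    """
    return sum((width_bits // 8) * clock_hz for width_bits, clock_hz in buses)


base = peak_bandwidth([(8, 2_000_000)])        # one 8-bit bus at 2 MHz
faster = peak_bandwidth([(8, 10_000_000)])     # technique 1: raise the clock rate
wider = peak_bandwidth([(16, 10_000_000)])     # technique 2: widen the bus
two_buses = peak_bandwidth([(16, 10_000_000), (16, 10_000_000)])  # technique 3: add a bus

for label, value in [("base", base), ("faster", faster),
                     ("wider", wider), ("two buses", two_buses)]:
    print(f"{label:>9}: {value / 1e6:.0f} MB/s peak")
```

In this model each technique multiplies peak bandwidth independently, which is why vendors could apply all three in combination.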
As processor buses widen and as processor architectures acquire extra buses, the microprocessor package's pin count necessarily increases. Figure 1.7 shows how microprocessor pin count has increased over the years. Like the rising curve plotted in Figure 1.6, the increasing pin count shown in Figure 1.7 is a direct result of system designers' demand for more processor performance.
Excerpted from Designing SOCs with Configured Cores by Steve Leibson. Copyright © 2006 by Elsevier Inc. Excerpted by permission of MORGAN KAUFMANN. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.