The Designer's Guide to the Cortex-M Processor Family
A Tutorial Approach
By Trevor Martin Elsevier
Copyright © 2013 Elsevier Ltd.
All rights reserved.
ISBN: 978-0-08-098299-1
CHAPTER 1
Introduction to the Cortex-M Processor Family
Cortex Profiles
In 2004, ARM introduced its new Cortex family of processors. The Cortex processor family is subdivided into three different profiles. Each profile is optimized for different segments of embedded systems applications.
The three profiles are application, real time, and microcontroller. The Cortex-A profile has been designed as a high-end application processor. Cortex-A processors are capable of running feature-rich operating systems such as WinRT and Linux. The key applications for Cortex-A are consumer electronics such as smart phones, tablet computers, and set-top boxes. The second profile is Cortex-R. This is the real-time profile, which delivers a high-performance processor that sits at the heart of an application-specific device. Very often a Cortex-R processor forms part of a "system-on-chip" design that is focused on a specific task such as hard disk drive (HDD) control, automotive engine management, or medical devices. The final profile is Cortex-M, the microcontroller profile. Unlike earlier ARM CPUs, the Cortex-M processor family has been designed specifically for use within a small microcontroller.
The Cortex-M processor currently comes in five variants: Cortex-M0, Cortex-M0+, Cortex-M1, Cortex-M3, and Cortex-M4. The Cortex-M0 and Cortex-M0+ are the smallest processors in the family. They allow silicon manufacturers to design low-cost, low-power devices that can replace existing 8-bit microcontrollers while still offering 32-bit performance. The Cortex-M1 has many of the same features as the Cortex-M0 but has been designed as a "soft core" to run inside a field-programmable gate array (FPGA) device. The Cortex-M3 is the mainstay of the Cortex-M family and was the first Cortex-M variant to be launched. It has enabled a new generation of high-performance 32-bit microcontrollers that can be manufactured at a very low cost. Today, there are many Cortex-M3-based microcontrollers available from a wide variety of silicon manufacturers. This represents a seismic shift in which Cortex-M-based microcontrollers are starting to replace the traditional 8/16-bit microcontrollers and even other 32-bit microcontrollers.
The highest performing member of the Cortex-M family is the Cortex-M4. It has all the features of the Cortex-M3, adds support for digital signal processing (DSP), and includes hardware floating-point support for single-precision calculations.
In the late 1990s, various manufacturers produced microcontrollers based on the ARM7 and ARM9 CPUs. While these microcontrollers were a huge leap in performance and competed in price with existing 8/16-bit architectures, they were not always easy to use. A developer would first have to learn how to use the ARM CPU and then have to understand how a specific manufacturer had integrated the ARM CPU into their microcontroller system. If you then moved to another ARM-based microcontroller, you would have to go through another learning curve for that microcontroller system before you could confidently start development. Cortex-M changes all that; it is a complete microcontroller unit (MCU) architecture, not just a CPU core. It provides a standardized bus interface, debug architecture, CPU core, interrupt structure, power control, and memory protection. More importantly, each Cortex-M processor is the same across all manufacturers, so once you have learned to use one Cortex-M-based processor you can reuse this knowledge with any other manufacturer's Cortex-M microcontrollers. Also, within the Cortex-M family, once you have learned the basics of how to use a Cortex-M3, you can use this experience to develop using a Cortex-M0, Cortex-M0+, or Cortex-M4 device. Throughout this book, we will use the Cortex-M3 as a reference device and then look at the differences between the Cortex-M3 and the Cortex-M0, Cortex-M0+, and Cortex-M4, so that you will have a practical knowledge of all the Cortex-M processors.
Cortex-M3
Today, the Cortex-M3 is the most widely used of all the Cortex-M processors. This is partly because it has been available for the longest period of time and partly because it meets the requirements for a general-purpose microcontroller. This typically means it has a good balance between high performance, low power consumption, and low cost.
The heart of the Cortex-M3 is a high-performance 32-bit CPU. Like the ARM7, this is a reduced instruction set computer (RISC) processor where most instructions will execute in a single cycle. This is partly made possible by a three-stage pipeline with separate fetch, decode, and execute units. So while one instruction is being executed, a second is being decoded, and a third is being fetched. The same approach was used on the ARM7. This is great when the code is going in a straight line. However, when the program branches, the pipeline must be flushed and refilled with new instructions before execution can continue. This made branches on the ARM7 quite expensive in terms of processing power. The Cortex-M3 and Cortex-M4, by contrast, include an instruction fetch unit that can handle speculative branch target fetches, which can reduce the branch penalty. This helps the Cortex-M3 and Cortex-M4 to have a sustained processing power of 1.25 DMIPS/MHz. In addition, the processor has a hardware integer math unit with hardware divide and single cycle multiply.
The Cortex-M3 processor also includes a nested vectored interrupt controller (NVIC) that can service up to 240 interrupt sources. The NVIC provides fast deterministic interrupt handling: the time from an interrupt being raised to reaching the first line of C code in the interrupt service routine is just 12 cycles every time. The NVIC also contains a standard timer called the SysTick timer. This is a 24-bit countdown timer with an auto reload. This timer is present on all of the different Cortex-M processors. The SysTick timer is used to provide regular periodic interrupts. A typical use of this timer is to provide a timer tick for small footprint real-time operating systems (RTOS). We will have a look at such an RTOS in Chapter 6.
Next to the NVIC is the wakeup interrupt controller (WIC); this is a small area of the Cortex-M processor that is kept alive when the processor is in low-power mode. The WIC can use the interrupt signals from the microcontroller peripherals to wake up the Cortex-M processor from a low-power mode. The WIC can be implemented in various ways and in some cases does not require a clock to function; it can also be in a separate power region from the main Cortex-M processor. This allows 99% of the Cortex-M processor to be placed in a low-power mode with just minimal current being used by the WIC.
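Because the SysTick timer is a 24-bit countdown timer with auto reload, the tick rate it can generate depends on the clock feeding it. The following is a minimal sketch of that arithmetic, not vendor code: the function name `systick_reload_for`, its signature, and the clock frequencies used with it are assumptions made for illustration. It computes the reload value for a requested tick rate and rejects rates whose period does not fit in 24 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 24-bit down-counter cannot hold a reload value larger than this. */
#define SYSTICK_MAX_RELOAD 0x00FFFFFFu

/*
 * Hypothetical helper (not a vendor API): compute the reload value that
 * yields tick_hz interrupts per second from a counter clocked at clock_hz.
 * The counter runs from the reload value down to zero, so one period lasts
 * reload + 1 clock cycles.
 * Returns 0 on success, -1 if the request cannot be met.
 */
int systick_reload_for(uint32_t clock_hz, uint32_t tick_hz, uint32_t *reload)
{
    assert(reload != NULL);

    if (tick_hz == 0u) {
        return -1;                      /* no tick rate requested */
    }

    uint32_t cycles = clock_hz / tick_hz;   /* clock cycles per tick */

    if (cycles < 2u) {
        return -1;                      /* reload would be 0: too fast */
    }
    if (cycles - 1u > SYSTICK_MAX_RELOAD) {
        return -1;                      /* period does not fit in 24 bits */
    }

    *reload = cycles - 1u;
    return 0;
}
```

With an assumed 72 MHz clock, a 1 kHz RTOS tick needs a reload value of 71999. A 1 Hz tick at the same clock would need roughly 72 million counts, which exceeds the 24-bit range, so slower periods are normally built by counting several ticks in software.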
The Cortex-M family has a very advanced debug architecture called CoreSight. It is consistent across the Cortex-M family and contains up to three real-time trace units in addition to the run control unit. The earlier ARM7/9 processors could be debugged through a joint test action group (JTAG) debug interface. This provided a means to download the application code into the on-chip flash memory and then exercise the code with basic run/stop debugging. While a JTAG debugger provided a low-cost way of debugging, it had two major problems. The first was a limited number of breakpoints, generally two, with one being required for single-stepping code. The second was that, when the CPU was executing code, the microcontroller became a black box, with the debugger having no visibility of the CPU, memory, or peripherals until the microcontroller was halted. The CoreSight debug architecture within the Cortex-M processors is much more sophisticated than that of the old ARM7 or ARM9 processors. It allows up to eight hardware breakpoints to be placed in code or data regions. CoreSight also provides three separate trace units that support advanced debug features without intruding on the execution of the Cortex CPU. The Cortex-M3 and Cortex-M4 are always fitted with a data watchpoint and trace (DWT) unit and an instrumentation trace macrocell (ITM) unit. The debug interface allows a low-cost debugger to view the contents of memory and peripheral registers "on the fly" without halting the CPU, and the DWT can export trace data for a number of watched data locations as they are accessed by the processor, without stealing any cycles from the CPU. The second trace unit is called the instrumentation trace. This trace unit provides a debug communication method between the running code and the debugger user interface. During development, the standard I/O channel can be redirected to a console window in the debugger.
This allows you to instrument your code with printf() debug messages which can then be read in the debugger while the code is running. This can be useful for trapping complex runtime problems. The instrumentation trace is also very useful during software testing as it provides a way for a test harness to dump data to the PC without needing any specific hardware on the target. The instrumentation trace is actually more complex than a simple UART, as it provides 32 communication channels which can be used by different resources within the application code. For example, we can provide extended debug information about the performance of an RTOS by placing code in the RTOS kernel that uses an instrumentation trace channel to communicate with the debugger. The final trace unit is called the embedded trace macrocell (ETM). This trace unit is an optional fit and is not present on all Cortex-M devices. Generally, a manufacturer will fit the ETM on their high-end microcontrollers to provide extended debug capabilities. The ETM provides instruction trace information that allows the debugger to build an assembler and high-level language trace listing of the code executed. The ETM also enables more advanced tools such as code coverage monitoring and timing performance analysis. These debug features are often a requirement for safety-critical and high-integrity code development.
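The idea of 32 independent instrumentation channels can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the type `trace_model_t`, the functions `trace_putc` and `trace_puts`, and the capture buffer are hypothetical stand-ins for the trace hardware and the debugger console, not the real register layout. On a real device the writes would go to the trace unit itself (CMSIS-Core, for example, offers `ITM_SendChar()` for channel 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CHANNELS 32u   /* the instrumentation trace offers 32 channels */
#define TRACE_BUF_SIZE 64u   /* hypothetical capture buffer size for this model */

/* Host-side stand-in for the trace hardware: one enable bit per channel,
 * plus a buffer that plays the role of the debugger console. */
typedef struct {
    uint32_t enable_mask;               /* bit n set = channel n enabled */
    uint8_t  channel[TRACE_BUF_SIZE];   /* channel of each captured byte */
    char     data[TRACE_BUF_SIZE];      /* captured bytes */
    size_t   count;                     /* number of captured bytes */
} trace_model_t;

/* Send one character on a channel. Returns 1 if it was captured, 0 if it was
 * dropped (channel out of range, channel disabled, or buffer full). */
int trace_putc(trace_model_t *t, unsigned ch, char c)
{
    assert(t != NULL);

    if (ch >= TRACE_CHANNELS) {
        return 0;
    }
    if ((t->enable_mask & (UINT32_C(1) << ch)) == 0u) {
        return 0;   /* disabled channels cost the application nothing more */
    }
    if (t->count >= TRACE_BUF_SIZE) {
        return 0;
    }

    t->channel[t->count] = (uint8_t)ch;
    t->data[t->count] = c;
    t->count++;
    return 1;
}

/* Send a string on a channel; returns the number of characters captured. */
size_t trace_puts(trace_model_t *t, unsigned ch, const char *s)
{
    size_t sent = 0;

    while (*s != '\0') {
        sent += (size_t)trace_putc(t, ch, *s++);
    }
    return sent;
}
```

In this model, application printf()-style output might use channel 0 while an RTOS kernel reports events on channel 31. Tagging each byte with its channel is what lets a debugger route the two streams to different views.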
Advanced Architectural Features
The Cortex-M3 and Cortex-M4 can also be fitted with another unit to aid high-integrity code execution. The memory protection unit (MPU) allows developers to segment the Cortex-M memory map into regions with different access privileges. We will look at the operating modes of the Cortex-M processor in Chapter 5, but to put it in simple terms, the Cortex CPU can execute code in a privileged mode or a more restrictive unprivileged mode. The MPU can define privileged and unprivileged regions over the 4 GB address space (i.e., code, RAM, and peripherals). If the CPU is running in unprivileged mode and it tries to access a privileged region of memory, the MPU will raise an exception and execution will vector to the MPU fault service routine. The MPU provides hardware support for more advanced software designs. For example, you can configure the system so that an RTOS and low-level device drivers have full privileged access to all the features of the microcontroller while the application code is restricted to its own region of code and data. Like the ETM, the MPU is an optional unit which may be fitted by the manufacturers during design of the microcontroller. The MPU is generally found on high-end devices which have large amounts of flash memory and SRAM. Finally, the Cortex-M3 and Cortex-M4 are interfaced to the rest of the microcontroller through a Harvard bus architecture. This means that they have a port for fetching instructions and constants from code memory and a second port for accessing SRAM and peripherals. We will look at the bus interface more closely in Chapter 5, but in essence, the Harvard bus architecture increases the performance of the Cortex-M processor but does not introduce any additional complexity for the programmer.
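The privileged and unprivileged regions described above can be pictured with a simplified host-side model of the access check. This is a sketch, not the real MPU register interface: the type `region_t`, the function `access_allowed`, and the region sizes used with it are assumptions for illustration. The model also assumes that privileged code may access everything and that an unprivileged access outside every defined region faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One protected region in this simplified model
 * (hypothetical layout, not the real MPU registers). */
typedef struct {
    uint32_t base;             /* first address in the region */
    uint32_t size;             /* region length in bytes */
    bool     privileged_only;  /* true: unprivileged access faults */
} region_t;

/*
 * Returns true if the access is allowed, false if this model would raise
 * a fault. Assumptions of the sketch: privileged code may access anything,
 * the first matching region decides, and an unprivileged access that
 * matches no region faults.
 */
bool access_allowed(const region_t *regions, size_t n,
                    uint32_t addr, bool privileged)
{
    assert(regions != NULL || n == 0);

    if (privileged) {
        return true;
    }

    for (size_t i = 0; i < n; i++) {
        /* Comparing the offset against the size avoids overflow for
         * regions that end at the top of the 4 GB address space. */
        if (addr >= regions[i].base &&
            (addr - regions[i].base) < regions[i].size) {
            return !regions[i].privileged_only;
        }
    }
    return false;
}
```

A map with kernel data at 0x20000000 marked privileged-only, application data at 0x20001000 open to unprivileged code, and the peripheral space at 0x40000000 marked privileged-only would confine application code to its own data, with the RTOS and drivers keeping full access.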
The earlier ARM CPUs, ARM7 and ARM9, supported two instruction sets. Code could be compiled either as 32-bit ARM code or as 16-bit Thumb code. The ARM instruction set would allow code to be written for maximum performance, while Thumb code would achieve a greater code density. During development, the programmer had to decide which functions should be compiled with the ARM 32-bit instruction set and which should be built using the Thumb 16-bit instruction set. The linker would then interwork the two instruction sets. While the Cortex-M processors are code compatible with the original Thumb instruction set, they are designed to execute an extended version of the Thumb instruction set called Thumb-2. Thumb-2 is a blend of 16- and 32-bit instructions that has been designed to be very C friendly and efficient. For even the smallest Cortex-M project, all of the code can be written in a high-level language, typically C, without any need to use an assembler.
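To make the "blend of 16- and 32-bit instructions" concrete, the sketch below shows how a decoder can tell the two lengths apart. In the Thumb-2 encoding, a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 starts a 32-bit instruction, and anything else is a complete 16-bit instruction. The function names `thumb2_length` and `thumb2_count` are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the Thumb-2 instruction whose first halfword is hw1. */
unsigned thumb2_length(uint16_t hw1)
{
    unsigned top5 = (unsigned)hw1 >> 11;   /* bits [15:11] */

    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Count the complete instructions in a stream of n halfwords. Counting
 * stops early if the stream ends in the middle of a 32-bit instruction. */
size_t thumb2_count(const uint16_t *code, size_t n)
{
    assert(code != NULL || n == 0);

    size_t i = 0;
    size_t count = 0;

    while (i < n) {
        size_t step = thumb2_length(code[i]) / 2u;   /* halfwords used */

        if (step > n - i) {
            break;   /* truncated 32-bit instruction */
        }
        i += step;
        count++;
    }
    return count;
}
```

For example, the four halfwords 0xB510, 0xF000, 0xF801, 0xBD10 contain three instructions: a 16-bit PUSH, a 32-bit BL, and a 16-bit POP. Mixing lengths this way is how Thumb-2 keeps code dense without the mode switching that ARM and Thumb interworking required.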
(Continues...)
Excerpted from The Designer's Guide to the Cortex-M Processor Family by Trevor Martin. Copyright © 2013 Elsevier Ltd. Excerpted by permission of Elsevier.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.