In part 1 we did the absolute minimal setup necessary to program our MCU. We manually defined the addresses of peripheral registers and invoked the compiler and debugger directly from the command line with a rather long list of arguments. In this post we are going to make things a bit easier for ourselves. In order to get definitions for all core and peripheral registers, we are going to add the Common Microcontroller Software Interface Standard (CMSIS) Core library from Arm along with a device header from STMicroelectronics. We are going to write a Makefile
and use the GNU Make (aka make
) build tool to facilitate the build process. Lastly, we are going to configure the system clock for maximum performance and revise our blink application.
CMSIS and register definitions
In part 1, when we wanted to access the GPIO registers, we had to manually look up the memory addresses in the reference manual and make a #define
for each register. If we had to do that for every single register for all peripherals, we would probably go insane. Luckily, someone has already done the legwork for us.
CMSIS is a collection of components for Arm Cortex-based microcontrollers, including an API to the core registers, a DSP library, an RTOS abstraction layer and more. The peripherals of the STM32F4 are covered in the STM32F4 CMSIS Device component made available by STMicroelectronics.
Let us go ahead and add the CMSIS Core(M) component and the STM32F4 Device component to our project by cloning the Git repositories. If you do not already have Git installed, go ahead and download Git for Windows. Open a Git Bash terminal, create a vendor
folder and clone CMSIS in there:
mkdir vendor
cd vendor
git clone https://github.com/ARM-software/CMSIS_5 CMSIS
Next, create an ST
folder inside the CMSIS/Device
folder and clone the Device component:
cd CMSIS/Device
mkdir ST
cd ST
git clone https://github.com/STMicroelectronics/cmsis_device_f4 STM32F4
You might notice that the CMSIS
folder takes up a bit of space (more than 300 MB on my machine), because it contains all the available CMSIS components. Since we only need the Core component (CMSIS/CMSIS/Core
) and the STM32F4 Device component (CMSIS/Device/ST/STM32F4
), we can safely delete everything else (including the .git
folder for each repository).
To use the register definitions, just #include "stm32f4xx.h"
which in turn includes stm32f410rx.h
– as long as we remember to #define STM32F410Rx
or pass it directly to the preprocessor with the -D
flag when compiling. In main.c
we can now remove all the peripheral register address definitions we typed in manually in part 1, and instead use the definitions from the CMSIS Device header:
#include <stdint.h>
#include "stm32f4xx.h"
#define LED_PIN 5
void main(void)
{
RCC->AHB1ENR |= (1 << RCC_AHB1ENR_GPIOAEN_Pos);
// do two dummy reads after enabling the peripheral clock, as per the errata
volatile uint32_t dummy;
dummy = RCC->AHB1ENR;
dummy = RCC->AHB1ENR;
GPIOA->MODER |= (1 << GPIO_MODER_MODER5_Pos);
while(1)
{
GPIOA->ODR ^= (1 << LED_PIN);
for (uint32_t i = 0; i < 1000000; i++);
}
}
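To make the `_Pos`/`_Msk` naming a little more concrete, here is a host-side sketch of how a CMSIS-style device header lays a struct over a peripheral's registers. The names `DEMO_GPIO_TypeDef`, `DEMO_MODER_MODER5_Pos`, `DEMO_MODER_MODER5_Msk`, `demo_set_pin5_output()` and `demo_toggle_pin5()` are made up for illustration, and the struct is cut down to two registers, so it does not match the real GPIO layout. The real header maps `GPIOA` to a fixed peripheral address, whereas this sketch takes a pointer so it can be tried on a PC.

```c
#include <stdint.h>

/* Hypothetical, cut-down register block in the style of a CMSIS device header.
   Only two registers are modelled; the real GPIO block has more registers
   between them, so the offsets here are NOT the real ones. */
typedef struct {
    volatile uint32_t MODER; /* mode register, two bits per pin */
    volatile uint32_t ODR;   /* output data register, one bit per pin */
} DEMO_GPIO_TypeDef;

/* Bit-field position and mask for pin 5, following the _Pos/_Msk convention. */
#define DEMO_MODER_MODER5_Pos 10U
#define DEMO_MODER_MODER5_Msk (0x3UL << DEMO_MODER_MODER5_Pos)

/* Configure pin 5 as a general-purpose output (mode 0b01):
   clear both mode bits first, then write the new value. */
void demo_set_pin5_output(DEMO_GPIO_TypeDef *gpio)
{
    uint32_t moder = gpio->MODER;
    moder &= (uint32_t)~DEMO_MODER_MODER5_Msk;
    moder |= (uint32_t)(0x1UL << DEMO_MODER_MODER5_Pos);
    gpio->MODER = moder;
}

/* Toggle pin 5 in the output data register. */
void demo_toggle_pin5(DEMO_GPIO_TypeDef *gpio)
{
    gpio->ODR ^= (uint32_t)(1UL << 5);
}
```

Clearing the field before setting it is a slightly more defensive variant of the plain `|=` used in main.c, which relies on the mode bits for PA5 being zero after reset.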
Now, when we compile we must add both the CMSIS/CMSIS/Core/Include
folder and CMSIS/Device/ST/STM32F4/Include
to our include path and also compile the source file CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c
. Additionally, we are going to specify which MCU we are using in order for stm32f4xx.h
to select the correct device header. Our complete compilation command is now:
arm-none-eabi-gcc main.c startup.c vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c -T linker_script.ld -o blink.elf -Ivendor/CMSIS/CMSIS/Core/Include -Ivendor/CMSIS/Device/ST/STM32F4/Include -mcpu=cortex-m4 -mthumb -nostdlib -DSTM32F410Rx
That’s quite a long command. Let us take a look at how we can make this a bit more manageable.
GNU make and the Makefile
GNU Make is a build automation tool commonly used on Linux (if you are a Linux user you have probably used it together with autotools
to install software with the commands ./configure
, make
and make install
). Basically, make
works by reading a Makefile
and executes shell commands based on the rules defined in that Makefile
. What we gain from this is, instead of typing in the very long arm-none-eabi-gcc
command to compile and then calling openocd
with the correct configuration files to flash the .elf
to the target, we can simply run make
or make flash
. Additionally, make
automatically keeps track of whether a source file has changed since the last time it was compiled, so we avoid having to recompile the entire project every time we build – very useful for larger projects.
To install make
, let us open an MSYS2 MINGW64 terminal and enter:
pacman -S make
Now, in the root of the project folder, we are going to create a file called Makefile
(no extension). In this file we are going to create rules for:
- Compiling our source files into object files
- Linking the object files into an executable
- Flashing the executable to the target
- Cleaning the build directory
A rule in the Makefile
is structured as follows:
target: prerequisites
	command
Note that the command must be indented by a tab. When invoking make
the target is given as an argument (e.g. make flash
or make install
). If no argument is given (i.e. just make
) then the first target in the Makefile
is selected, which by convention is named all
. We could write a really simple Makefile
by just copying the commands we typed in manually before:
all: blink.elf
blink.elf: main.c startup.c vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c
	arm-none-eabi-gcc main.c startup.c vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c -T linker_script.ld -o blink.elf -Ivendor/CMSIS/CMSIS/Core/Include -Ivendor/CMSIS/Device/ST/STM32F4/Include -mcpu=cortex-m4 -mthumb -nostdlib -DSTM32F410Rx
flash: blink.elf
	openocd -f interface/stlink.cfg -f target/stm32f4x.cfg -c "program blink.elf verify reset exit"
This will work just fine, but it looks quite messy and we are not really taking advantage of the power of make
. Let us improve the Makefile
by first defining a few variables:
CC=arm-none-eabi-gcc
CFLAGS=-mcpu=cortex-m4 -mthumb -nostdlib
CPPFLAGS=-DSTM32F410Rx \
-Ivendor/CMSIS/Device/ST/STM32F4/Include \
-Ivendor/CMSIS/CMSIS/Core/Include
LINKER_FILE=linker_script.ld
LDFLAGS=-T $(LINKER_FILE)
The variables CC
, CFLAGS
, CPPFLAGS
and LDFLAGS
are implicit variables used to define the C compiler, compiler flags, pre-processor flags and linker flags, respectively. These variables are used when executing implicit rules. To keep things simple for now, we are just going to use these variables explicitly in our rules. Let us now make a rule for each of our source files and an additional target to link them into a .elf
file:
all: blink.elf
blink.elf: main.o startup.o system_stm32f4xx.o
	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $^ -o blink.elf
main.o: main.c
	$(CC) $(CFLAGS) $(CPPFLAGS) main.c -c
startup.o: startup.c
	$(CC) $(CFLAGS) $(CPPFLAGS) startup.c -c
system_stm32f4xx.o: vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c
	$(CC) $(CFLAGS) $(CPPFLAGS) vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c -c
If you do not specify an output filename (with -o
), then gcc
defaults to using the same filename as the input file but with a .o
extension instead. The $^
in the blink.elf
rule is known as an automatic variable and is short-hand for “the names of all the prerequisites”. Similarly, we could have used $@
to insert the name of the target as the output filename, but I have written it explicitly here to keep it simple. If we run make
from the command line, we should see the three object files being built and lastly linked together to create blink.elf
. If we run make
again, we should see the message:
make: Nothing to be done for ‘all’.
Since none of the files have changed, make
decides that there is no need to rebuild anything. If you make a change in main.c
and then run make
again, you should see only main.o
and blink.elf
being rebuilt.
Suppose you want to clean your project of all output files and rebuild everything. We could write a rule for a phony target that deletes all .o
and .elf
files. A phony target is simply a target that does not produce an output file, but is just a name for a specific command to be executed. We will add a clean
target to the Makefile
:
.PHONY: clean
clean:
	rm -f *.o *.elf
Now, if you run make clean
all the output files will be deleted, and if you run make
again everything will be rebuilt.
Last thing we need to do is create a rule for flashing the .elf
file to the target MCU:
PROGRAMMER=openocd
PROGRAMMER_FLAGS=-f interface/stlink.cfg -f target/stm32f4x.cfg
flash: blink.elf
	$(PROGRAMMER) $(PROGRAMMER_FLAGS) -c "program blink.elf verify reset exit"
That should do it. Whenever we want to flash our firmware to the target, we can simply run make flash
from the command line. In fact, we do not even have to run make
after making changes to the source files – make
will figure it out based on the prerequisites we have given in the Makefile
rules.
Clock configuration
Now that we have a basic build system and easy access to all core and peripheral registers, let us set up the MCU for maximum performance by increasing the system clock frequency to its maximum value of 100 MHz.
By default, the MCU is configured to use its high-speed internal (HSI) oscillator as the system clock, which is a 16 MHz RC oscillator that can reach an accuracy of 1% at room temperature with user-trimming. It comes with a fair amount of clock jitter and is quite sensitive to temperature – the datasheet specifies -8% to 5.5% over the range of -40 °C to 125 °C! It might work fine for some things if trimmed and kept at room temperature, but for things like USB or CAN communication, you are probably better off using a more accurate high-speed external (HSE) oscillator.
Using a high-speed external (HSE) oscillator
An HSE can be connected either in the form of an oscillator between the OSC_IN
and OSC_OUT
pins or as an external clock signal connected directly to OSC_IN
. On the NUCLEO board I am using, the integrated ST-LINK is clocked by an 8 MHz crystal oscillator which (as per the bill-of-materials) has an accuracy of ± 20 ppm, which is a whole lot better than the HSI. The ST-LINK MCU outputs this clock on its MCO
pin and feeds it to OSC_IN
on the main MCU. To use this external clock signal we are going to set the HSE bypass bit and turn on the HSE in the RCC control register:
// Set HSE bypass (to use external clock on OSC_IN, not a crystal) and enable HSE
RCC->CR |= RCC_CR_HSEBYP_Msk | RCC_CR_HSEON_Msk;
while (!(RCC->CR & RCC_CR_HSERDY_Msk));
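The wait loop above spins forever if the external clock never becomes ready, for example if the clock signal is missing. As a minimal sketch of one way to guard against that, the hypothetical helper `wait_for_flag()` below (not part of CMSIS) polls a register a bounded number of times and reports whether the flag was seen. It takes a pointer to the register so it can be exercised on a PC with an ordinary variable.

```c
#include <stdint.h>
#include <stdbool.h>

/* Poll *reg until any bit in `mask` is set, giving up after `max_polls` reads.
   Returns true if the flag was seen, false on timeout.
   Hypothetical helper for illustration; not part of CMSIS. */
bool wait_for_flag(const volatile uint32_t *reg, uint32_t mask, uint32_t max_polls)
{
    while (max_polls > 0U) {
        if ((*reg & mask) != 0U) {
            return true;
        }
        max_polls--;
    }
    return false;
}
```

On the target this could be called as `wait_for_flag(&RCC->CR, RCC_CR_HSERDY_Msk, 100000)`, where the poll count is an arbitrary placeholder, falling back to the HSI if it returns false.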
So now we have a stable and accurate 8 MHz clock source, but how do we get that up to 100 MHz? We will get back to that in a second, but we must take care of a few other things first.
Voltage scaling and flash latency
When changing the system clock, we must make sure that we first configure both the internal voltage regulator and the embedded flash memory controller to support this clock.
The internal voltage regulator supplies roughly 1.2 V to all the digital circuitry in the MCU. To reduce power consumption, it is possible to scale down this voltage when using system clock frequencies less than the maximum. In order to run at 100 MHz, we must configure the power controller to scale mode 1 as specified in the reference manual under “PWR power control register”. Let us enable the power controller (and do a couple of dummy reads to ensure that it is enabled) and then select the scale mode:
// Enable power controller and set voltage scale mode 1
RCC->APB1ENR |= RCC_APB1ENR_PWREN_Msk;
volatile uint32_t dummy;
dummy = RCC->APB1ENR;
dummy = RCC->APB1ENR;
PWR->CR |= (0b11 << PWR_CR_VOS_Pos);
Next, the flash controller must be configured with the correct number of wait states, which depend on both the supply voltage and the core clock (see Table 6 in the reference manual). With a supply voltage of 3.3 V and a 100 MHz core clock, we must configure it for 3 wait states:
// Configure flash controller for 3V3 supply and 100 MHz -> 3 wait states
FLASH->ACR |= FLASH_ACR_LATENCY_3WS;
Increasing clock frequency with a phase-locked loop
Alright, back to the issue of increasing the 8 MHz HSE clock to 100 MHz for our system clock. The trick to achieving this is using the MCU’s internal phase-locked loop (PLL). Let us take a look at the clock tree in the reference manual (Figure 12 in the “Reset and clock control” section):
[Figure: clock tree from the reference manual, with the PLL source multiplexer and the SYSCLK multiplexer circled in red]

In the bottom left we see the PLL which can take either the HSI or HSE as input (the smaller of the two red circles). The clock is then divided by M, multiplied by N and divided by either P, Q or R depending on the PLL clock output. There are some constraints to the clock frequency at each step through the PLL, which we will get back to in a second. The P output is fed to the SYSCLK multiplexer (the larger of the red circles) where we can select which clock to use as our system clock. The system clock can then be divided further before clocking the AHB and the APBs. This last part is important, because the maximum clock frequency of the low-speed APB (APB1) is 50 MHz.
Now, when deciding the M, N and P values, we must consider not just the final PLL output frequency, but the frequency at each stage of the PLL (see the descriptions in “RCC PLL configuration register” in the reference manual). The input for the PLL must be between 1 and 2 MHz, preferably 2 MHz to reduce jitter. So we will choose M = 4
. Next, the output of the voltage-controlled oscillator (VCO) must be between 100 and 432 MHz, so let us bump up our 2 MHz to 400 MHz by setting the multiplier N = 200
. Lastly, to divide the VCO output down to 100 MHz for the PLL output, we will set P = 4
. Now that we have all the values we need, we can configure the PLL:
// Clear PLLM, PLLN and PLLP bits
RCC->PLLCFGR &= ~(RCC_PLLCFGR_PLLM_Msk |
RCC_PLLCFGR_PLLN_Msk |
RCC_PLLCFGR_PLLP_Msk);
// Set PLLM, PLLN and PLLP, and select HSE as PLL source
RCC->PLLCFGR |= ((4 << RCC_PLLCFGR_PLLM_Pos) |
(200 << RCC_PLLCFGR_PLLN_Pos) |
(1 << RCC_PLLCFGR_PLLP_Pos) |
(1 << RCC_PLLCFGR_PLLSRC_Pos));
// Set APB1 prescaler to 2
RCC->CFGR |= (0b100 << RCC_CFGR_PPRE1_Pos);
// Enable PLL and wait for ready
RCC->CR |= RCC_CR_PLLON_Msk;
while (! (RCC->CR & RCC_CR_PLLRDY_Msk));
// Select PLL output as system clock
RCC->CFGR |= (RCC_CFGR_SW_PLL << RCC_CFGR_SW_Pos);
while (! (RCC->CFGR & RCC_CFGR_SWS_PLL));
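The M, N and P arithmetic is easy to get wrong, so it can help to check a candidate combination on a PC before writing it to the register. The sketch below is only an illustration: the function names `pll_p_output_hz()` and `pllp_field_value()` are made up, and the limits are the ones quoted in the text above (1 to 2 MHz at the PLL input, 100 to 432 MHz at the VCO output). It also assumes that P can only be 2, 4, 6 or 8, stored as a 2-bit code, which is consistent with P = 4 being written as 1 in the code above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Frequency limits quoted in the text above (in Hz). */
#define VCO_IN_MIN_HZ    1000000UL
#define VCO_IN_MAX_HZ    2000000UL
#define VCO_OUT_MIN_HZ 100000000UL
#define VCO_OUT_MAX_HZ 432000000UL

/* Compute the PLL P output for a given input clock and M/N/P settings.
   Returns true and writes the result to *out_hz only if the intermediate
   frequencies stay within the limits above and P is 2, 4, 6 or 8. */
bool pll_p_output_hz(uint32_t in_hz, uint32_t m, uint32_t n, uint32_t p,
                     uint32_t *out_hz)
{
    if (m == 0U || (p != 2U && p != 4U && p != 6U && p != 8U)) {
        return false;
    }
    uint32_t vco_in = in_hz / m;            /* input clock divided by M */
    if (vco_in < VCO_IN_MIN_HZ || vco_in > VCO_IN_MAX_HZ) {
        return false;
    }
    uint64_t vco_out = (uint64_t)vco_in * n; /* multiplied by N */
    if (vco_out < VCO_OUT_MIN_HZ || vco_out > VCO_OUT_MAX_HZ) {
        return false;
    }
    *out_hz = (uint32_t)(vco_out / p);       /* divided by P */
    return true;
}

/* Assumed 2-bit encoding of P: 0 -> /2, 1 -> /4, 2 -> /6, 3 -> /8. */
uint32_t pllp_field_value(uint32_t p)
{
    return (p / 2U) - 1U;
}
```

With the values chosen above (8 MHz in, M = 4, N = 200, P = 4) the function reports 100 MHz, while M = 2 is rejected because it would put 4 MHz on the PLL input.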
That is it for the clock configuration! To clean things up a bit, I will wrap everything up in a clock_init()
function and call that from main()
. For good measure let us also call SystemCoreClockUpdate()
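so that CMSIS recalculates its internal SystemCoreClock variable from the new register settings. Note that system_stm32f4xx.c falls back to a default HSE_VALUE of 25 MHz, so for the result to be correct with our 8 MHz clock we should also pass -DHSE_VALUE=8000000 to the preprocessor. It does not really do anything for us at the moment, but it is important if we decide to use ST’s HAL library later on.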
Verifying the clock frequency with SysTick
To ensure that we have set up the clock correctly, we will use the Cortex-M4’s SysTick timer to create a simple busy waiting delay function. We can then use this in our blink application instead of the rather crude for
loop we used in part 1.
The idea is to set up the SysTick timer to trigger an interrupt every millisecond and then increment a counter variable. The CMSIS Core library includes the SysTick_Config()
function which handles all this configuration for us – we just need to specify the timer’s reload value. Since our core clock is 100 MHz and we want a 1 kHz interrupt rate, we need to divide down the clock by 100000, i.e. a reload value of 100000 - 1
. Of course, we must also enable global interrupts. The following code is added after the clock initialization:
SysTick_Config(100000);
__enable_irq();
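The value passed to SysTick_Config() is simply the number of core clock cycles per interrupt. SysTick is a 24-bit counter, so not every combination of core clock and tick rate is possible. As a small sketch of that calculation, the hypothetical helper `systick_cycles_per_tick()` below (not a CMSIS function) computes the value and returns 0 when the resulting reload value would not fit.

```c
#include <stdint.h>

/* SysTick is a 24-bit down-counter, so the reload value must fit in 24 bits. */
#define SYSTICK_MAX_RELOAD 0xFFFFFFUL

/* Number of core clock cycles per tick for a given tick rate, i.e. the value
   that would be passed to SysTick_Config(). Returns 0 if the request cannot
   be met. Hypothetical helper for illustration. */
uint32_t systick_cycles_per_tick(uint32_t core_hz, uint32_t tick_hz)
{
    if (tick_hz == 0U || tick_hz > core_hz) {
        return 0U;
    }
    uint32_t cycles = core_hz / tick_hz;
    if ((cycles - 1U) > SYSTICK_MAX_RELOAD) {
        return 0U; /* reload value (cycles - 1) would exceed 24 bits */
    }
    return cycles;
}
```

For a 100 MHz core clock and a 1 kHz tick rate this gives 100000, matching the argument used above. A 1 Hz tick rate at the same core clock is rejected, because 100000000 - 1 does not fit in 24 bits.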
Now, recall that the SysTick interrupt handler was defined as a weak alias to the default_handler()
in the startup code we wrote in part 1. Let us now override this by defining systick_handler()
in main.c
:
volatile uint32_t ticks; // modified in the interrupt handler, so it must be volatile
void systick_handler()
{
ticks++;
}
Next we can write a delay function that simply waits for a specified number of milliseconds (or ticks
):
void delay_ms(uint32_t milliseconds)
{
uint32_t start = ticks;
uint32_t end = start + milliseconds;
if (end < start) // handle overflow
{
while (ticks >= start); // wait for ticks to wrap around to zero
}
while (ticks < end);
}
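An alternative that avoids the explicit overflow branch is to compare the elapsed time instead of the end time. Unsigned subtraction wraps modulo 2^32, so `now - start` gives the right number of elapsed ticks even if the counter has overflowed in between. The helper name `ticks_elapsed()` below is made up for this sketch, and the counter value is passed in as an argument so the logic can be checked on a PC.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true once at least `duration` ticks have passed since `start`.
   Unsigned subtraction wraps modulo 2^32, so this stays correct when the
   counter overflows between `start` and `now`.
   Hypothetical helper for illustration. */
bool ticks_elapsed(uint32_t now, uint32_t start, uint32_t duration)
{
    return (uint32_t)(now - start) >= duration;
}
```

Inside delay_ms() this would reduce the body to recording `start` and then looping with `while (!ticks_elapsed(ticks, start, milliseconds));`.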
Lastly, in the super loop we will use this new delay_ms()
function to blink the LED:
while (1)
{
GPIOA->ODR ^= (1 << LED_PIN);
delay_ms(500);
}
If everything is configured correctly, the LED should now blink at a (very accurate) frequency of 1 Hz.
Next time
In part 3 we will take a look at how we can integrate the C standard library in our project and try to use printf()
for some primitive debugging.
I think this blog is becoming one of my favorite resources. Please keep on writing, the articles are very informative!
That means a lot to me! I learn a lot myself from writing these articles, and I am glad to hear that others benefit from them as well.
Hi Kristian,
I have some issues trying to flash my program to a Nucleo-F103RB using OpenOCD. I get this error:
Open On-Chip Debugger 0.12.0+dev-01168-g682f927f8 (2023-05-02-21:22)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
srst_only separate srst_nogate srst_open_drain connect_deassert_srst
Info : clock speed 1000 kHz
Info : STLINK V2J33M25 (API v2) VID:PID 0483:374B
Info : Target voltage: 3.260198
Info : [stm32f1x.cpu] Cortex-M3 r1p1 processor detected
Info : [stm32f1x.cpu] target has 6 breakpoints, 4 watchpoints
Info : starting gdb server for stm32f1x.cpu on 3333
Info : Listening on port 3333 for gdb connections
[stm32f1x.cpu] halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x0800014c msp: 0x20005000
** Programming Started **
Info : device id = 0x20036410
Info : flash size = 128 KiB
Error: Section at 0x080002f8 overlaps section ending at 0x08000314
Error: Flash write aborted.
** Programming Failed **
shutdown command invoked
I modified the linker script but I still get the same error. Could you, if possible, help figure out the source of the error?
I searched all over the internet and did not find another article this good that describes the details from scratch without any IDE. Keep going. Thank you for the articles. I’m waiting for the next one 😊