STM32 without CubeIDE (Part 2): CMSIS, make and clock configuration

In part 1 we did the absolute minimal setup necessary to program our MCU. We manually defined the addresses of peripheral registers and invoked the compiler and debugger directly from the command line with a rather long list of arguments. In this post we are going to make things a bit easier for ourselves. In order to get definitions for all core and peripheral registers, we are going to add the Common Microcontroller Software Interface Standard (CMSIS) Core library from Arm along with a device header from STMicroelectronics. We are going to write a Makefile and use the GNU Make (aka make) build tool to facilitate the build process. Lastly, we are going to configure the system clock for maximum performance and revise our blink application.

All code from this blog post is available on GitHub.

CMSIS and register definitions

In part 1, when we wanted to access the GPIO registers, we had to manually look up the memory addresses in the reference manual and make a #define for each register. If we had to do that for every single register for all peripherals, we would probably go insane. Luckily, someone has already done the legwork for us.

CMSIS is a collection of components for Arm Cortex-based microcontrollers, including an API to the core registers, a DSP library, an RTOS abstraction layer and more. The peripherals of the STM32F4 are covered in the STM32F4 CMSIS Device component made available by STMicroelectronics.

Let us go ahead and add the CMSIS Core(M) component and the STM32F4 Device component to our project by cloning the Git repositories. If you do not already have Git installed, go ahead and download Git for Windows. Open a Git Bash terminal, create a vendor folder and clone CMSIS in there:

mkdir vendor
cd vendor
git clone https://github.com/ARM-software/CMSIS_5 CMSIS

Next, create an ST folder inside the CMSIS/Device folder and clone the Device component:

cd CMSIS/Device
mkdir ST
cd ST
git clone https://github.com/STMicroelectronics/cmsis_device_f4 STM32F4

You might notice that the CMSIS folder takes up a bit of space (more than 300 MB on my machine), because it contains all the available CMSIS components. Since we only need the Core component (CMSIS/CMSIS/Core) and the STM32F4 Device component (CMSIS/Device/ST/STM32F4), we can safely delete everything else (including the .git folder for each repository).

To use the register definitions, just #include "stm32f4xx.h", which in turn includes stm32f410rx.h – as long as we remember to #define STM32F410Rx or pass it directly to the preprocessor with the -D flag when compiling. In main.c we can now remove all the peripheral register address definitions we typed in manually in part 1, and instead use the definitions from the CMSIS Device header:

#include <stdint.h>
#include "stm32f4xx.h"

#define LED_PIN 5

void main(void)
{
  RCC->AHB1ENR |= (1 << RCC_AHB1ENR_GPIOAEN_Pos);
  
  // do two dummy reads after enabling the peripheral clock, as per the errata
  volatile uint32_t dummy;
  dummy = RCC->AHB1ENR;
  dummy = RCC->AHB1ENR;

  GPIOA->MODER |= (1 << GPIO_MODER_MODER5_Pos);
  
  while(1)
  {
    GPIOA->ODR ^= (1 << LED_PIN);
    for (uint32_t i = 0; i < 1000000; i++);
  }
}
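
The _Pos and _Msk macros used above follow a simple convention: _Pos is the bit offset of a field within its register, and _Msk is the field's bits shifted into place. The sketch below mimics that convention on a plain variable so it can be compiled and run on a PC. The DEMO_ names and the demo_set_moder5() helper are made up for illustration and are not part of CMSIS; only the layout (MODER5 being a 2-bit field starting at bit 10) is taken from the device header.

```c
#include <stdint.h>

/* Hypothetical stand-ins that mimic the CMSIS naming convention:
   _Pos is the bit offset of a field, _Msk is the field's bits in place. */
#define DEMO_MODER5_Pos 10U
#define DEMO_MODER5_Msk (0x3UL << DEMO_MODER5_Pos)

/* Write a new value into the 2-bit MODER5 field without touching other bits. */
uint32_t demo_set_moder5(uint32_t moder, uint32_t mode)
{
  moder &= ~DEMO_MODER5_Msk;                            /* clear the field */
  moder |= (mode << DEMO_MODER5_Pos) & DEMO_MODER5_Msk; /* insert new value */
  return moder;
}
```

The plain |= in main() only ever sets bits, so it gives the intended result as long as the field is still zero. Clearing the field first, as in this sketch, is the safer habit when the previous contents are unknown.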

Now, when we compile, we must add both the CMSIS/CMSIS/Core/Include folder and the CMSIS/Device/ST/STM32F4/Include folder to our include path and also compile the source file CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c (all paths relative to the vendor folder). Additionally, we are going to specify which MCU we are using in order for stm32f4xx.h to select the correct device header. Our complete compilation command is now:

arm-none-eabi-gcc main.c startup.c vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c -T linker_script.ld -o blink.elf -Ivendor/CMSIS/CMSIS/Core/Include -Ivendor/CMSIS/Device/ST/STM32F4/Include -mcpu=cortex-m4 -mthumb -nostdlib -DSTM32F410Rx

That’s quite a long command. Let us take a look at how we can make this a bit more manageable.

GNU make and the Makefile

GNU Make is a build automation tool commonly used on Linux (if you are a Linux user you have probably used it together with autotools to install software with the commands ./configure, make and make install). Basically, make works by reading a Makefile and executing shell commands based on the rules defined in that Makefile. What we gain from this is that, instead of typing in the very long arm-none-eabi-gcc command to compile and then calling openocd with the correct configuration files to flash the .elf to the target, we can simply run make or make flash. Additionally, make automatically keeps track of whether a source file has changed since the last time it was compiled, so we avoid having to recompile the entire project every time we build – very useful for larger projects.

To install make, let us open an MSYS2 MINGW64 terminal and enter:

pacman -S make

Now, in the root of the project folder, we are going to create a file called Makefile (no extension). In this file we are going to create rules for:

Compiling our source files into object files
Linking the object files into an executable
Flashing the executable to the target
Cleaning the build directory

A rule in the Makefile is structured as follows:

target: prerequisites
	command

Note that the command must be indented by a tab. When invoking make the target is given as an argument (e.g. make flash or make install). If no argument is given (i.e. just make) then the first target in the Makefile is selected, which by convention is named all. We could write a really simple Makefile by just copying the commands we typed in manually before:

all: blink.elf

blink.elf: main.c startup.c vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c
	arm-none-eabi-gcc main.c startup.c vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c -T linker_script.ld -o blink.elf -Ivendor/CMSIS/CMSIS/Core/Include -Ivendor/CMSIS/Device/ST/STM32F4/Include -mcpu=cortex-m4 -mthumb -nostdlib -DSTM32F410Rx

flash: blink.elf
	openocd -f interface/stlink.cfg -f target/stm32f4x.cfg -c "program blink.elf verify reset exit"

This will work just fine, but it looks quite messy and we are not really taking advantage of the power of make. Let us improve the Makefile by first defining a few variables:

CC=arm-none-eabi-gcc
CFLAGS=-mcpu=cortex-m4 -mthumb -nostdlib
CPPFLAGS=-DSTM32F410Rx \
	 -Ivendor/CMSIS/Device/ST/STM32F4/Include \
	 -Ivendor/CMSIS/CMSIS/Core/Include

LINKER_FILE=linker_script.ld
LDFLAGS=-T $(LINKER_FILE)

The variables CC, CFLAGS, CPPFLAGS and LDFLAGS are implicit variables used to define the C compiler, compiler flags, pre-processor flags and linker flags, respectively. These variables are used when executing implicit rules. To keep things simple for now, we are just going to use these variables explicitly in our rules. Let us now make a rule for each of our source files and an additional target to link them into a .elf file:

all: blink.elf

blink.elf: main.o startup.o system_stm32f4xx.o
	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $^ -o blink.elf

main.o: main.c
	$(CC) $(CFLAGS) $(CPPFLAGS) main.c -c

startup.o: startup.c
	$(CC) $(CFLAGS) $(CPPFLAGS) startup.c -c

system_stm32f4xx.o: vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c
	$(CC) $(CFLAGS) $(CPPFLAGS) vendor/CMSIS/Device/ST/STM32F4/Source/Templates/system_stm32f4xx.c -c

If you do not specify an output filename (with -o) when compiling with -c, gcc defaults to using the same filename as the input file but with a .o extension instead, placed in the current directory. The $^ in the blink.elf rule is known as an automatic variable and is short-hand for “the names of all the prerequisites”. Similarly, we could have used $@ to insert the name of the target as the output filename, but I have written it explicitly here to keep it simple. If we run make from the command line, we should see the three object files being built and lastly linked together to create blink.elf. If we run make again, we should see the message:

make: Nothing to be done for 'all'.

Since none of the files have changed, make decides that there is no need to rebuild anything. If you make a change in main.c and then run make again, you should see only main.o and blink.elf being rebuilt.

Suppose you want to clean your project of all output files and rebuild everything. We could write a rule for a phony target that deletes all .o and .elf files. A phony target is simply a target that does not produce an output file, but is just a name for a specific command to be executed. We will add a clean target to the Makefile:

.PHONY: clean
clean:
	rm -f *.o *.elf

Now, if you run make clean all the output files will be deleted, and if you run make again everything will be rebuilt.

The last thing we need to do is create a rule for flashing the .elf file to the target MCU:

PROGRAMMER=openocd
PROGRAMMER_FLAGS=-f interface/stlink.cfg -f target/stm32f4x.cfg

flash: blink.elf
	$(PROGRAMMER) $(PROGRAMMER_FLAGS) -c "program blink.elf verify reset exit"

That should do it. Whenever we want to flash our firmware to the target, we can simply run make flash from the command line. In fact, we do not even have to run make after making changes to the source files – make will figure it out based on the prerequisites we have given in the Makefile rules.

Clock configuration

Now that we have a basic build system and easy access to all core and peripheral registers, let us set up the MCU for maximum performance by increasing the system clock frequency to its maximum value of 100 MHz.

By default, the MCU is configured to use its high-speed internal (HSI) oscillator as the system clock, which is a 16 MHz RC oscillator that can reach an accuracy of 1% at room temperature with user-trimming. It comes with a fair amount of clock jitter and is quite sensitive to temperature – the datasheet specifies -8% to 5.5% over the range of -40 °C to 125 °C! It might work fine for some things if trimmed and kept at room temperature, but for things like USB or CAN communication, you are probably better off using a more accurate high-speed external (HSE) oscillator.

Using a high-speed external (HSE) oscillator

An HSE can be connected either in the form of an oscillator between the OSC_IN and OSC_OUT pins or as an external clock signal connected directly to OSC_IN. On the NUCLEO board I am using, the integrated ST-LINK is clocked by an 8 MHz crystal oscillator which (as per the bill-of-materials) has an accuracy of ± 20 ppm, which is a whole lot better than the HSI. The ST-LINK MCU outputs this clock on its MCO pin and feeds it to OSC_IN on the main MCU. To use this external clock signal we are going to set the HSE bypass bit and turn on the HSE in the RCC control register:

// Set HSE bypass (to use external clock on OSC_IN, not a crystal) and enable HSE
RCC->CR |= RCC_CR_HSEBYP_Msk | RCC_CR_HSEON_Msk;
while (!(RCC->CR & RCC_CR_HSERDY_Msk));
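
If the external clock never shows up, the loop above spins forever. One possible way to guard against that is to bound the number of polls. The sketch below is a hypothetical helper, not something from CMSIS or from the code in this post: wait_for_flag() and its max_polls parameter are names I made up, and the poll limit is an arbitrary count rather than a calibrated timeout. It takes a pointer to the register so that it can be tried out on a PC with an ordinary variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll a status register until any bit in `mask` is set, giving up after
   `max_polls` reads. Returns true if the flag was seen, false on timeout. */
bool wait_for_flag(const volatile uint32_t *reg, uint32_t mask, uint32_t max_polls)
{
  while (max_polls-- > 0U)
  {
    if ((*reg & mask) != 0U)
    {
      return true;
    }
  }
  return false;
}
```

On the target this could be called as wait_for_flag(&RCC->CR, RCC_CR_HSERDY_Msk, 100000), falling back to the HSI or signalling an error if it returns false.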

So now we have a stable and accurate 8 MHz clock source, but how do we get that up to 100 MHz? We will get back to that in a second, but we must take care of a few other things first.

Voltage scaling and flash latency

When changing the system clock, we must make sure that we first configure both the internal voltage regulator and the embedded flash memory controller to support this clock.

The internal voltage regulator supplies roughly 1.2 V to all the digital circuitry in the MCU. To reduce power consumption, it is possible to scale down this voltage when using system clock frequencies less than the maximum. In order to run at 100 MHz, we must configure the power controller to scale mode 1 as specified in the reference manual under “PWR power control register”. Let us enable the power controller (and do a couple of dummy reads to ensure that it is enabled) and then select the scale mode:

// Enable power controller and set voltage scale mode 1
RCC->APB1ENR |= RCC_APB1ENR_PWREN_Msk;
volatile uint32_t dummy;
dummy = RCC->APB1ENR;
dummy = RCC->APB1ENR;
PWR->CR |= (0b11 << PWR_CR_VOS_Pos);

Next, the flash controller must be configured with the correct number of wait states, which depend on both the supply voltage and the core clock (see Table 6 in the reference manual). With a supply voltage of 3.3 V and a 100 MHz core clock, we must configure it for 3 wait states:

// Configure flash controller for 3V3 supply and 100 MHz -> 3 wait states
FLASH->ACR |= FLASH_ACR_LATENCY_3WS;

Increasing clock frequency with a phase-locked loop

Alright, back to the issue of increasing the 8 MHz HSE clock to 100 MHz for our system clock. The trick to achieving this is using the MCU’s internal phase-locked loop (PLL). Let us take a look at the clock tree in the reference manual (Figure 12 in the “Reset and clock control” section):

In the bottom left we see the PLL which can take either the HSI or HSE as input (the smaller of the two red circles). The clock is then divided by M, multiplied by N and divided by either P, Q or R depending on the PLL clock output. There are some constraints to the clock frequency at each step through the PLL, which we will get back to in a second. The P output is fed to the SYSCLK multiplexer (the larger of the red circles) where we can select which clock to use as our system clock. The system clock can then be divided further before clocking the AHB and the APBs. This last part is important, because the maximum clock frequency of the low-speed APB (APB1) is 50 MHz.

Now, when deciding the M, N and P values, we must consider not just the final PLL output frequency, but the frequency at each stage of the PLL (see the descriptions in “RCC PLL configuration register” in the reference manual). The input for the PLL must be between 1 and 2 MHz, preferably 2 MHz to reduce jitter. So we will choose M = 4. Next, the output of the voltage-controlled oscillator (VCO) must be between 100 and 432 MHz, so let us bump up our 2 MHz to 400 MHz by setting the multiplier N = 200. Lastly, to divide the VCO output down to 100 MHz for the PLL output, we will set P = 4. Now that we have all the values we need, we can configure the PLL:

// Clear PLLM, PLLN and PLLP bits
RCC->PLLCFGR &= ~(RCC_PLLCFGR_PLLM_Msk |
                  RCC_PLLCFGR_PLLN_Msk |
                  RCC_PLLCFGR_PLLP_Msk);
  
// Set PLLM = 4, PLLN = 200 and PLLP = 4 (encoded as 0b01), and select HSE as PLL source
RCC->PLLCFGR |= ((4 << RCC_PLLCFGR_PLLM_Pos) | 
                 (200 << RCC_PLLCFGR_PLLN_Pos) |
                 (1 << RCC_PLLCFGR_PLLP_Pos) |
                 (1 << RCC_PLLCFGR_PLLSRC_Pos));
  
// Set APB1 prescaler to 2
RCC->CFGR |= (0b100 << RCC_CFGR_PPRE1_Pos);
  
// Enable PLL and wait for ready
RCC->CR |= RCC_CR_PLLON_Msk;
while (! (RCC->CR & RCC_CR_PLLRDY_Msk));

// Select PLL output as system clock
RCC->CFGR |= (RCC_CFGR_SW_PLL << RCC_CFGR_SW_Pos);
while (! (RCC->CFGR & RCC_CFGR_SWS_PLL));
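
Before committing M, N and P to the register, it can be reassuring to check the arithmetic on a PC. The sketch below walks an input clock through the three PLL stages and tests each intermediate frequency against the limits quoted above (1 to 2 MHz at the PLL input, 100 to 432 MHz at the VCO output, at most 100 MHz for the system clock). The pll_check() function and pll_result_t type are hypothetical names made up for this illustration. It also shows why the code above writes 1 into the PLLP field: the field does not hold P itself, but (P / 2) - 1.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
  uint32_t vco_in_hz;  /* PLL input after the /M divider */
  uint32_t vco_out_hz; /* VCO output after the xN multiplier */
  uint32_t sysclk_hz;  /* P output, i.e. the system clock */
  uint32_t pllp_bits;  /* 2-bit register encoding of P */
  bool valid;          /* true if all limits quoted above are met */
} pll_result_t;

pll_result_t pll_check(uint32_t src_hz, uint32_t m, uint32_t n, uint32_t p)
{
  pll_result_t r = {0};

  /* M must be non-zero, and P can only be 2, 4, 6 or 8 */
  if (m == 0U || p < 2U || p > 8U || (p % 2U) != 0U)
  {
    return r;
  }

  uint64_t vco_out = (uint64_t)(src_hz / m) * n; /* 64-bit to avoid overflow */
  if (vco_out > UINT32_MAX)
  {
    return r;
  }

  r.vco_in_hz = src_hz / m;
  r.vco_out_hz = (uint32_t)vco_out;
  r.sysclk_hz = r.vco_out_hz / p;
  r.pllp_bits = (p / 2U) - 1U; /* 2 -> 0b00, 4 -> 0b01, 6 -> 0b10, 8 -> 0b11 */

  r.valid = (r.vco_in_hz >= 1000000U && r.vco_in_hz <= 2000000U) &&
            (r.vco_out_hz >= 100000000U && r.vco_out_hz <= 432000000U) &&
            (r.sysclk_hz <= 100000000U);
  return r;
}
```

Calling pll_check(8000000, 4, 200, 4) gives 2 MHz, 400 MHz and 100 MHz for the three stages and a PLLP encoding of 1, matching the values used in the configuration code. The sketch only checks the limits mentioned in this post; the reference manual remains the authority on the full set of constraints.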

That is it for the clock configuration! To clean things up a bit, I will wrap everything up in a clock_init() function and call that from main(). For good measure let us also call SystemCoreClockUpdate() to make CMSIS aware of the modifications and allow it to change its internal clock variable. It does not really do anything for us at the moment, but it is important if we decide to use ST’s HAL library later on.

Verifying the clock frequency with SysTick

To ensure that we have set up the clock correctly, we will use the Cortex-M4’s SysTick timer to create a simple busy waiting delay function. We can then use this in our blink application instead of the rather crude for loop we used in part 1.

The idea is to set up the SysTick timer to trigger an interrupt every millisecond and then increment a counter variable. The CMSIS Core library includes the SysTick_Config() function which handles all this configuration for us – we just need to specify the number of clock ticks between interrupts. Since our core clock is 100 MHz and we want a 1 kHz interrupt rate, we need to divide down the clock by 100000, i.e. a reload value of 100000 - 1. SysTick_Config() takes the tick count (100000) and subtracts one itself when writing the reload register. Of course, we must also enable global interrupts. The following code is added after the clock initialization:

SysTick_Config(100000);
__enable_irq();
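
The value passed to SysTick_Config() is just the core clock divided by the desired interrupt rate, but there is one limit worth keeping in mind: the SysTick reload register is only 24 bits wide, so very low interrupt rates do not fit. The sketch below does the division and the range check on a PC. The systick_ticks() helper and the SYSTICK_MAX_RELOAD macro are hypothetical names made up for this illustration, and rejecting rates that do not divide the clock evenly is my own choice to keep the tick period exact.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFUL /* SysTick reload register is 24 bits wide */

/* Compute the tick count to pass to SysTick_Config() for a given core clock
   and interrupt rate. Returns false if the rate is zero, the clock is not an
   exact multiple of the rate, or the reload value does not fit in 24 bits. */
bool systick_ticks(uint32_t core_hz, uint32_t rate_hz, uint32_t *ticks_out)
{
  if (rate_hz == 0U || (core_hz % rate_hz) != 0U)
  {
    return false;
  }

  uint32_t ticks = core_hz / rate_hz;
  if (ticks == 0U || (ticks - 1U) > SYSTICK_MAX_RELOAD)
  {
    return false;
  }

  *ticks_out = ticks;
  return true;
}
```

For a 100 MHz core clock and a 1 kHz rate this yields 100000, the value used above. Asking for a 1 Hz rate at 100 MHz fails the check, since 100000000 - 1 does not fit in 24 bits.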

Now, recall that the SysTick interrupt handler was defined as a weak alias to the default_handler() in the startup code we wrote in part 1. Let us now override this by defining systick_handler() in main.c:

volatile uint32_t ticks; // volatile: written in the interrupt handler, read in delay_ms()

void systick_handler()
{
  ticks++;
}

Next we can write a delay function that simply waits for a specified number of milliseconds (or ticks):

void delay_ms(uint32_t milliseconds)
{
  uint32_t start = ticks;
  uint32_t end = start + milliseconds;

  if (end < start) // handle overflow
  {
    while (ticks >= start); // wait for ticks to wrap around to zero
  }

  while (ticks < end);
}
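
An alternative to handling the overflow as a special case is to compare elapsed time instead of absolute tick values. Unsigned subtraction in C is performed modulo 2^32, so now - start gives the correct number of elapsed ticks even if the counter wrapped around in between. The sketch below shows just that comparison as a pure function so it can be tested on a PC; ticks_elapsed() is a hypothetical helper name, not part of the code in this post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true once at least `duration` ticks have passed since `start`.
   Unsigned subtraction is modulo 2^32, so the result stays correct even if
   the counter wrapped around between `start` and `now`. */
bool ticks_elapsed(uint32_t now, uint32_t start, uint32_t duration)
{
  return (uint32_t)(now - start) >= duration;
}
```

With this helper, the body of a delay function could shrink to a single loop along the lines of while (!ticks_elapsed(ticks, start, milliseconds));. It works for any delay shorter than one full wrap of the counter.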

Lastly, in the super loop we will use this new delay_ms() function to blink the LED:

while (1)
{
  GPIOA->ODR ^= (1 << LED_PIN);
  delay_ms(500);
}

If everything is configured correctly, the LED should now blink at a (very accurate) frequency of 1 Hz.

Next time

In part 3 we will take a look at how we can integrate the C standard library in our project and try to use printf() for some primitive debugging.

7 thoughts on “STM32 without CubeIDE (Part 2): CMSIS, make and clock configuration”

Sam says:

April 1, 2023 at 14:47

I think this blog is becoming one of my favorite resource. Please keep on writing, the articles are very informative !

1. Kristian Klein-Wengel says:
  
  April 1, 2023 at 19:30
  
  That means a lot to me! I learn a lot myself from writing these articles, and I am glad to hear that others benefit from them as well.
  
  1. Ricci says:
    
    May 3, 2023 at 18:24
    
    Hi Kristian,
    I have some issues trying to flash my program in Nucleo-F103rb using openOCD. But I got this error :
    
    Open On-Chip Debugger 0.12.0+dev-01168-g682f927f8 (2023-05-02-21:22)
    Licensed under GNU GPL v2
    For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
    Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
    srst_only separate srst_nogate srst_open_drain connect_deassert_srst
    Info : clock speed 1000 kHz
    Info : STLINK V2J33M25 (API v2) VID:PID 0483:374B
    Info : Target voltage: 3.260198
    Info : [stm32f1x.cpu] Cortex-M3 r1p1 processor detected
    Info : [stm32f1x.cpu] target has 6 breakpoints, 4 watchpoints
    Info : starting gdb server for stm32f1x.cpu on 3333
    Info : Listening on port 3333 for gdb connections
    [stm32f1x.cpu] halted due to debug-request, current mode: Thread
    xPSR: 0x01000000 pc: 0x0800014c msp: 0x20005000
    ** Programming Started **
    Info : device id = 0x20036410
    Info : flash size = 128 KiB
    Error: Section at 0x080002f8 overlaps section ending at 0x08000314
    Error: Flash write aborted.
    ** Programming Failed **
    shutdown command invoked
    
    I modified the linker script but still the same error. Could you if possible figure out the source of the error?
    
Ricci says:

May 1, 2023 at 13:31

I surf the all internet web, and I didn’t find such a good article describing details from scratch without any IDEs. Keep going. Thank you for the articles. I’m waiting for the next 😊

Nora says:

October 29, 2025 at 14:52

It’s an amazing blog, Kristian!
It really helps that you mention where to look for things too.
Related question – How do you recommend getting familiar with CMSIS syntax?

1. Kristian Klein-Wengel says:
  
  October 30, 2025 at 08:03
  
  Glad to hear it, Nora!
  
  The official CMSIS documentation is a good place to start:
  https://arm-software.github.io/CMSIS_6/latest/General/index.html
  It provides a brief overview and usage instructions for each of the “sub-libraries” (i.e. Core, DSP, and so on), as well as register mappings and API reference.
  
  If you mean the bit manipulation to set, clear, toggle and check registers, this thread explains the basics:
  https://stackoverflow.com/questions/47981/how-to-set-clear-and-toggle-a-single-bit
  