STM32 without CubeIDE (Part 3): The C Standard Library and printf()

In part 2 we wrote a Makefile to build the project, configured the system clock and created a basic blink application. So far, we have built the project without the C standard library by invoking gcc with the -nostdlib flag. In this article we are going to take a look at how to integrate the C standard library into our project and set up printf() to send messages to our host machine via UART for some primitive debugging capabilities.

As an aside: Since the last article I have switched OS on my development machine from Windows 10 to Ubuntu 22.04. However, since we were using MSYS2 on Windows, the commands should be pretty much the same.

All code from this blog post is available on GitHub.

The C standard library

If you are compiling a C program with gcc for a desktop computer you are probably using the GNU C library, or glibc. This is several megabytes in size and is not really suitable for use on microcontrollers. Therefore, most embedded toolchains (including the Arm GNU toolchain that we are using) ship with the smaller Newlib C library, which is made specifically for use on resource-constrained systems. With the Arm toolchain you also have the option to use the even smaller Newlib-nano (also referred to as Nanolib) which, apart from having a smaller code size, also uses less RAM than the regular Newlib. Note that it does not have floating point support for printf() and scanf() enabled by default, so you will have to enable this explicitly if you need it.

To get an idea of the size difference between the libraries, I compiled the project into three different binaries with no standard library, Newlib and Newlib-nano, respectively. Checking the size of the binaries with arm-none-eabi-size gave the following results. This is without compiler optimization.

Without a C library (-nostdlib):

$ arm-none-eabi-size blink.elf 
   text    data     bss     dec     hex filename
    984     460       4    1448     5a8 blink.elf

With Newlib (removed -nostdlib):

$ arm-none-eabi-size blink.elf 
   text    data     bss     dec     hex filename
   7132    1824     817    9773    262d blink.elf

With Newlib-nano (--specs=nano.specs):

$ arm-none-eabi-size blink.elf 
   text    data     bss     dec     hex filename
   4140     564     369    5073    13d1 blink.elf

Here, the binary linked with Newlib (~9.5 kB) is almost twice the size of the one linked with Newlib-nano (~5 kB). I did the same comparison again after implementing everything in the rest of this article and got binaries of about 45 kB and 20 kB.

Since I am working on an STM32F410 with 128 kB flash and 32 kB RAM, either one should be fine. For this article I am going to go with Newlib-nano as it seems to be the most popular choice.

Adding Newlib-nano

In order to link to Newlib-nano, the first thing we have to do is change our compiler and linker options. Instead of -nostdlib we are going to give gcc a spec file for linking to Newlib-nano by appending --specs=nano.specs both to our compiler flags and linker flags. The nano.specs file is included in the toolchain. On my installation it is located under /opt/gcc-arm-none-eabi/arm-none-eabi/lib.

In this article we are going to be retargeting printf() to a UART, but if we wanted to do semi-hosting instead we would also add the flag --specs=rdimon.specs. In short, semi-hosting lets you communicate with the target from your host machine via the debugger. If the debugger is not attached, the target is not going to run. If you want to know more about semi-hosting, you can read about it here.

System calls

If you try running make after adding the spec file, gcc will complain about undefined references to symbols, such as __bss_start__ and __bss_end__, and several functions including _read, _write and _sbrk. The two symbols are used by the library to mark the start and end of the .bss section. We already have such symbols in our linker script, but let’s just duplicate these with the required names:

  .bss :
  {
    . = ALIGN(4);
    _sbss = .;
    __bss_start__ = _sbss;
		
    *(.bss)
		
    . = ALIGN(4);
    _ebss = .;
    __bss_end__ = _ebss;
  } >SRAM

The missing functions are the system calls which are the interface between the library and our specific hardware. We must provide an implementation for these system calls, but since we will only be using a few of them, we can just implement stubs for the rest. You can find a list of all the system calls and suggested minimal implementations in the Newlib documentation. Let us start off by copying all of these stubs into a new file syscalls.c, adding a new target for it in the Makefile and making it a dependency of the BINARY target:

$(BINARY): main.o startup.o system_stm32f4xx.o syscalls.o
	$(CC) $(CFLAGS) $(LDFLAGS) $^ -o $(BINARY)

syscalls.o: syscalls.c
	$(CC) $(CFLAGS) $(CPPFLAGS) $^ -c

When we call printf() the characters of the formatted string will eventually be passed to the _write() system call, so we need to provide an actual implementation of that to send the characters over UART to the host machine. Until we get a UART set up, let’s just leave it blank.

Because printf() uses dynamic memory allocation, we also have to implement _sbrk() (which is called by malloc()) to keep track of the heap. Recall that our stack starts at the end of SRAM and thus grows “downwards” in memory, so we will start the heap at the beginning of SRAM (right after the .bss section) and let it grow “upwards”. With the stack and the heap growing towards each other, there is a risk that they overflow into each other – which is bad – so we are going to check for that. My _sbrk(), which is a slightly modified version of the one from the documentation, looks like this:

#include <sys/types.h>	/* caddr_t */

register char * stack_ptr asm("sp");

caddr_t _sbrk(int incr) {
  extern char __bss_end__;	/* Defined by the linker */
  static char *heap_end;
  char *prev_heap_end;
 
  if (heap_end == 0) {
    heap_end = &__bss_end__;
  }
  prev_heap_end = heap_end;
  if (heap_end + incr > stack_ptr) {
    while (1)
    {
        // Heap and stack collision
    }
  }

  heap_end += incr;
  return (caddr_t) prev_heap_end;
}
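
If you would rather have malloc() return NULL than hang the core, _sbrk() can report the failure instead. The conventional sbrk() behaviour is to return (void *)-1 and set errno to ENOMEM. The sketch below shows that logic on a fixed array so it can be tried on a host; heap_area, HEAP_SIZE and the name sbrk_sketch are made up for illustration. On the target the bounds would be &__bss_end__ and the stack pointer, as above.

```c
#include <errno.h>
#include <stddef.h>

#define HEAP_SIZE 64u                 /* hypothetical, tiny on purpose */

static char heap_area[HEAP_SIZE];     /* stands in for the SRAM between .bss and the stack */
static char *heap_end = heap_area;    /* current program break */

void *sbrk_sketch(int incr)
{
    char *prev_heap_end = heap_end;
    ptrdiff_t used = heap_end - heap_area;   /* bytes handed out so far */

    /* Reject requests that would leave the region in either direction. */
    if (incr > (ptrdiff_t)HEAP_SIZE - used || incr < -used) {
        errno = ENOMEM;
        return (void *)-1;
    }

    heap_end += incr;
    return prev_heap_end;
}
```

Whether failing softly is preferable to trapping in a loop depends on the application: the loop is easy to spot in a debugger, while the error return lets the caller decide what to do.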

If you are worried about dynamic memory allocation, there are several alternative printf() implementations available that rely only on static memory allocation. I have tried the one by Marco Paland, which is very easy to set up and seems to work fine.

Initialize the library

Before using the library in our application, we must call the initialization function __libc_init_array() in the startup code. Define a prototype for the function in the startup file and then call it just before main():

void main(void);
void __libc_init_array(void);

void reset_handler(void)
{
  // ... .data/.bss initialization left out
  
  __libc_init_array();
  main();
}

Revise the linker script

We should be able to compile our project now, but if we inspect the .elf file with arm-none-eabi-objdump -h blink.elf we will see a whole bunch of new sections in our executable. We will have to add these new sections to either .text, .data or .bss depending on the section type. This is important because we are explicitly using the boundary symbols of the .data and .bss sections to set up initialized and uninitialized data in the startup file. So if we leave it as is, these new sections will not be initialized. I have the three sections looking like this in my linker script:

  .text :
  {
    . = ALIGN(4);
		
    *(.text)
    *(.text.*)
    *(.rodata)
    *(.rodata.*)
    KEEP(*(.init))
    KEEP(*(.fini))
    *(.eh_frame)
    *(.ARM.exidx)
		
    . = ALIGN(4);
    _etext = .;
  } >FLASH

  _sidata = LOADADDR(.data);

  .data :
  {
    . = ALIGN(4);
    _sdata = .;
		
    *(.data)
    *(.data.*)
    KEEP(*(.init_array))
    KEEP(*(.fini_array))

    . = ALIGN(4);
    _edata = .;
  } >SRAM AT> FLASH

  .bss :
  {
    . = ALIGN(4);
    _sbss = .;
    __bss_start__ = _sbss;
		
    *(.bss)
    *(.bss.*)
		
    . = ALIGN(4);
    _ebss = .;
    __bss_end__ = _ebss;
  } >SRAM

The KEEP keyword is used to ensure that the sections will not be removed by the linker even if they seem unused. Notice that I have also defined a new symbol _sidata which contains the load address of the .data section. Back in part 1, we simply assumed that the end of the text section (_etext) would always be the same as the load address of the data section. This holds true as long as no other sections squeeze in between, but this new approach is more robust. The reference to _etext when initializing the data section in the startup code has of course been changed to use _sidata instead.
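
For illustration, the .data copy and .bss clear in the reset handler could then look roughly like the helper below. The function name and its parameters are hypothetical. On the target it would be called with the addresses of the linker symbols, e.g. init_ram(&_sidata, &_sdata, &_edata, &_sbss, &_ebss), with each symbol declared as extern uint32_t. Passing the bounds as arguments just keeps the sketch runnable off-target.

```c
#include <stdint.h>

/* Copy initialized data from its load address (flash) to its run
 * address (SRAM), then zero the .bss range. The section bounds are
 * word aligned because of the ALIGN(4) statements in the linker script. */
void init_ram(const uint32_t *sidata, uint32_t *sdata, uint32_t *edata,
              uint32_t *sbss, uint32_t *ebss)
{
    const uint32_t *src = sidata;

    for (uint32_t *dst = sdata; dst < edata; dst++) {
        *dst = *src++;
    }
    for (uint32_t *dst = sbss; dst < ebss; dst++) {
        *dst = 0;
    }
}
```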

Now we are ready to try calling printf() in our application. In main.c simply #include <stdio.h> and call printf("Hello, World!") in the super loop. After flashing it to the target, if all went well, you should still see the LED blinking. Of course, since we haven’t implemented _write() yet, it won’t actually send anything to the host. We will get to that next.

Setting up the UART

On the Nucleo-board I am using, the microcontroller’s USART2 pins PA2 (TX) and PA3 (RX) are connected to the on-board ST-LINK by default. This allows us to get a virtual COM port through the ST-LINK. The USART peripheral is not particularly complicated, and looking through the registers in the reference manual, it looks like we only have to worry about a few control registers and the baud rate register. For monitoring the status and sending data, we will use the status register and data register. To initialize the USART we will have to do the following:

Enable peripheral clocks for USART2 and GPIOA
Configure PA2 and PA3 for alternate function mode, functioning as TX and RX
Configure the USART control and baud rate register
Enable the USART for transmission

I will create a new module, usart.h and usart.c, for all of our USART functionality and create a function usart_init() for the initialization code.

Enabling the peripheral clocks is done in the same manner as we have done previously:

  /* Enable USART2 clock */
  RCC->APB1ENR |= (1 << RCC_APB1ENR_USART2EN_Pos);
  // do two dummy reads after enabling the peripheral clock, as per the errata
  volatile uint32_t dummy;
  dummy = RCC->APB1ENR;
  dummy = RCC->APB1ENR;

  /* Enable GPIOA clock */
  RCC->AHB1ENR |= (1 << RCC_AHB1ENR_GPIOAEN_Pos);
  // do two dummy reads after enabling the peripheral clock, as per the errata
  dummy = RCC->AHB1ENR;
  dummy = RCC->AHB1ENR;

Now the GPIO pins must be set to alternate function mode in the mode register and to alternate function 7 (AF7) in the alternate function register. The alternate functions for each GPIO pin can be found in the microcontroller’s datasheet. For the STM32F410 I found it in Table 10 on page 41.

  /* Set PA2 and PA3 to alternate function */
  GPIOA->MODER &= ~(GPIO_MODER_MODE2_Msk | GPIO_MODER_MODE3_Msk);
  GPIOA->MODER |= (0b10 << GPIO_MODER_MODE2_Pos) | (0b10 << GPIO_MODER_MODE3_Pos);

  // USART2 is AF7 (found in datasheet)
  GPIOA->AFR[0] &= ~(GPIO_AFRL_AFRL2 | GPIO_AFRL_AFRL3);
  GPIOA->AFR[0] |= (7 << GPIO_AFRL_AFSEL2_Pos) | (7 << GPIO_AFRL_AFSEL3_Pos);
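
Both registers follow the same clear-then-set pattern, which can be wrapped in a small helper. The sketch below works on plain integers so the resulting values can be checked on a host; set_field, moder_alternate and afrl_select are made-up names and are not part of the CMSIS headers.

```c
#include <stdint.h>

/* Replace a `width`-bit field at bit position `pos` with `value`,
 * leaving all other bits of `reg` untouched. Assumes width < 32. */
static inline uint32_t set_field(uint32_t reg, unsigned pos,
                                 unsigned width, uint32_t value)
{
    uint32_t mask = ((1u << width) - 1u) << pos;
    return (reg & ~mask) | ((value << pos) & mask);
}

/* MODER uses 2 bits per pin; 0b10 selects alternate function mode. */
uint32_t moder_alternate(uint32_t moder, unsigned pin)
{
    return set_field(moder, pin * 2u, 2u, 0x2u);
}

/* AFR[0] uses 4 bits per pin and covers pins 0 to 7. */
uint32_t afrl_select(uint32_t afrl, unsigned pin, uint32_t af)
{
    return set_field(afrl, pin * 4u, 4u, af);
}
```

On the target this would be used as GPIOA->MODER = moder_alternate(GPIOA->MODER, 2); and so on, which performs the same read-modify-write as the two statements per register above.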

When configuring the USART we only have to set the baud rate register. The default values in the control registers configure the USART with 16x oversampling, 1 start bit, 8 data bits, 1 stop bit and no parity bit. This will work fine for our purposes. When we have configured the baud rate register we can enable the USART and its transmitter in the control register:

  /* Configure and enable USART2 */
  USART2->BRR = 434; // 115200 baud @ 50 MHz APB1 clock and 16x oversampling
  USART2->CR1 |= USART_CR1_UE | USART_CR1_TE; // USART enable and transmitter enable

The value for the baud rate register might require some explanation. What we are actually configuring in the baud rate register is a clock divider in fixed point 12Q4 format. The USART is clocked by APB1 which we set to 50 MHz in part 2 of this series. If we choose a baud rate of 115200 bps and we know that the USART is oversampling by 16, we can calculate the value for the divider:

\[ \text{DIV} = \frac{50\ \text{MHz}}{16 \times 115200\ \text{bps}} \approx 27.127 \]

To convert this to 12Q4 format we simply left-shift by 4 (i.e. multiply by 2⁴), which gives 434.03, and round to the nearest integer to get a register value of 434. This is also explained in section 24.4.4 in the reference manual.
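
Since the register value is just DIV scaled by 16, the two factors of 16 cancel and the value reduces to the peripheral clock divided by the baud rate. A small helper can compute it with rounding. This is a sketch that assumes 16x oversampling, and usart_brr and the two accessor functions are hypothetical names, not something from the CMSIS headers.

```c
#include <stdint.h>

/* BRR for 16x oversampling: DIV * 16 = pclk / baud, rounded to nearest. */
static inline uint32_t usart_brr(uint32_t pclk_hz, uint32_t baud)
{
    return (pclk_hz + baud / 2u) / baud;
}

/* Split a BRR value into the 12-bit mantissa and 4-bit fraction. */
static inline uint32_t brr_mantissa(uint32_t brr) { return brr >> 4; }
static inline uint32_t brr_fraction(uint32_t brr) { return brr & 0xFu; }
```

For 50 MHz and 115200 baud this gives 434, i.e. a mantissa of 27 and a fraction of 2/16, so the divider that the hardware actually applies is 27.125.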

Redirecting printf() to the UART

Now that we have both a standard library and USART in place, we can finish implementing the _write() function and try out printf() for real.

I have opted for a very primitive implementation for now, where I place a character in the USART data register and then wait for the transmission complete flag to be raised in the status register before returning:

void usart_write(USART_TypeDef *usart, char c)
{
    usart->DR = c;
    while (!(usart->SR & USART_SR_TC));
}

This is definitely not an efficient way to do it, but it is good enough for a proof of concept. I’ll let you fiddle around with DMA on your own.
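
With usart_write() in place, _write() only has to loop over the buffer it is given. The sketch below shows the shape of it. So that it can be compiled off-target, usart_write() is replaced by a hypothetical stand-in, uart_putc(), which stores the bytes in an array; on the target that function would simply call usart_write(USART2, c).

```c
#include <stddef.h>

/* Host stand-in for the UART (hypothetical). On the target this would
 * be: usart_write(USART2, c); */
static char tx_log[64];
static size_t tx_len;

static void uart_putc(char c)
{
    if (tx_len < sizeof tx_log) {
        tx_log[tx_len++] = c;
    }
}

/* Called by Newlib with the bytes printf() has formatted.
 * The file descriptor is ignored: stdout and stderr both go to the UART. */
int _write(int file, char *ptr, int len)
{
    (void)file;

    for (int i = 0; i < len; i++) {
        uart_putc(ptr[i]);
    }
    return len;   /* report every byte as written */
}
```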

Let’s go back to main() and add a call to usart_init() and print a “Hello, World!” in the super loop with a tick timestamp to check that formatting is working correctly:

  usart_init(USART2);

  while(1)
  {
    GPIOA->ODR ^= (1 << LED_PIN);
    printf("[%d] Hello, World!\r\n", ticks);
    delay_ms(500);
  }

On Ubuntu, I am using minicom to open up a serial port with minicom -D /dev/ttyACM0 and I get:

[0] Hello, World!
[501] Hello, World!
[1003] Hello, World!
[1505] Hello, World!

If we change the time stamp to show seconds as a float instead of ticks like so:

printf("[%.3f] Hello, World!\r\n", (float)ticks/1000.0f);

The number is missing from the output:

[] Hello, World!
[] Hello, World!
[] Hello, World!
[] Hello, World!

However, if we enable float support by adding -u _printf_float to our linker flags in the Makefile, it should work as expected:

[0.000] Hello, World!
[0.502] Hello, World!
[1.004] Hello, World!
[1.506] Hello, World!
[2.008] Hello, World!
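
If the extra flash cost of float formatting is unwelcome, the same timestamp can be produced with integer arithmetic only. This is a minimal sketch, assuming that ticks counts milliseconds in a uint32_t (the article does not show its type). format_timestamp is a made-up helper that writes into a buffer so the result can be checked on a host, but the same format string works with printf() directly.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a millisecond tick count as seconds with three decimals,
 * without using any floating point. Returns snprintf()'s result. */
int format_timestamp(char *buf, size_t size, uint32_t ticks_ms)
{
    return snprintf(buf, size, "[%" PRIu32 ".%03" PRIu32 "] Hello, World!\r\n",
                    ticks_ms / 1000u, ticks_ms % 1000u);
}
```

The %03 width with zero padding matters here: without it, 2008 ms would be printed as 2.8 instead of 2.008.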

Conclusion

That’s all there is to it! If you want to take it a step further, you can also receive data from the host with scanf() by implementing the _read() system call.
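
As a starting point, a blocking _read() might look like the sketch below. It assumes a byte-wise receive function, here the hypothetical uart_getc(), which on the target would wait for the RXNE flag in the status register and then return the data register. The receiver also has to be switched on with USART_CR1_RE in usart_init(). For the sake of a runnable example the stand-in reads from a fixed string.

```c
#include <stddef.h>

/* Host stand-in for the UART receiver (hypothetical). On the target:
 *   while (!(USART2->SR & USART_SR_RXNE));
 *   return (char)USART2->DR;                                          */
static const char *rx_data = "42\nrest";

static char uart_getc(void)
{
    return *rx_data++;
}

/* Called by Newlib when scanf() and friends need input. Returns after
 * a full line or `len` bytes, whichever comes first. */
int _read(int file, char *ptr, int len)
{
    int n = 0;

    (void)file;
    while (n < len) {
        char c = uart_getc();
        ptr[n++] = c;
        if (c == '\n') {
            break;
        }
    }
    return n;   /* number of bytes placed in ptr */
}
```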

In part 4 we will be looking at integrating CMake as a meta-build system and perhaps adding some STM libraries to make working with the peripherals a bit easier.

2 thoughts on “STM32 without CubeIDE (Part 3): The C Standard Library and printf()”

Milos says:

July 7, 2023 at 07:59

Hi Kristian,
Thanks for a valuable and very useful information. Just a small remark, I have noticed your git archive for part 3 uses syscalls.c with undefined “_end” symbol and not the “__bss_end__” one mentioned in the Part 3 of your description. This causes error during linking. Maybe you would like to remark it (as you already have done in the Part 1) in your article that syscalls.c should be corrected if someone wants to use your git archive.
Anyway, great job and keep going ,,-:)

1. Kristian Klein-Wengel says:
  
  July 8, 2023 at 17:27
  
  Thanks for the heads up, Milos. I’ll get it fixed 🙂
  