I have recently been working on some Python code for interacting with an embedded device from a PC, e.g. setting and reading configuration parameters, reading measurements and doing firmware upgrades. The device uses our own communication protocol, for which we have already accumulated quite a few Python scripts. However, this protocol is used both for our main product as well as several customer projects, so we are looking to make the code a bit more “release-ready”. This includes:
- Reviewing the existing code and refactoring/re-designing for easier maintainability, extensibility and user friendliness
- Writing (in-code) documentation
- Testing and performing static analysis to ensure code quality
- Packaging everything up nicely and publishing it to The Python Package Index so it can be installed from anywhere with
pip install my_package
The purpose of this post is to give a general idea of my process of developing the package, and to provide you with inspiration for tools, language features, design patterns, etc. that you might want to use in your own projects. I will cover a lot of different subjects, so I will not go into depth with each of them, but I will provide links for further reading.
Setting up the development environment
I usually write Python code in PyCharm, which is free to use, has intelligent code completion, checks syntax and PEP8 conformity as you type, and makes it easy to set up and manage a virtual environment. You can also configure it to use Eclipse hotkey mappings – a good thing for an embedded developer who is used to Eclipse-based IDEs such as STM32CubeIDE or Code Composer Studio.
I created a new project with a new virtual environment using Python 3.10.
Virtual environment
I practically always create a new virtual environment when starting a new project. You can think of a virtual environment as an isolated Python installation that includes only the packages needed for your specific project – completely separated from your global Python installation. This allows you to:
- Use a specific Python version
- Use specific packages (perhaps you need an older version than the one installed globally)
- Keep track of which packages are required for your project
- Avoid polluting the global environment with packages for every single project you create
This is very practical if you want to clone another developer's environment or have another developer clone yours. PyCharm asks whether it should create a virtual environment for you when you create a new project, making the setup trivial. Even if you prefer to use the command line, the setup is as simple as running:
$ python -m venv .\venv
to create a virtual environment in the directory venv, and then activate it by running:
$ .\venv\Scripts\activate.bat
We’ll now see (venv) appear at the beginning of the command line. If we run pip list we will see that the environment only contains the packages pip and setuptools. If we run pip install <some_package>, the package will only be installed in the virtual environment and have no effect on the global environment. When sharing the project with other developers, we can generate a list (usually named requirements.txt) of all the packages installed in the virtual environment (and thus required to run the project) with the command pip freeze > requirements.txt. Other developers can now create their own virtual environment, install all the required packages with pip install -r requirements.txt and be up and running in no time.
Folder structure
When creating a Python package (described in detail here), there are a few mandatory files which must be added to the project:
- pyproject.toml, which tells the build tool how to build your package
- setup.cfg, which contains information about your package, such as name, author, dependencies, etc.
It is recommended to also include a README.md and a LICENSE file. A minimal example of the two configuration files is shown below.
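As a minimal sketch (the metadata values here are placeholders, not the actual package details), a pyproject.toml and setup.cfg for a setuptools-based build matching the src/ layout described below could look something like this:

pyproject.toml:
[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

setup.cfg:
[metadata]
name = my_package
version = 0.1.0
author = Your Name
description = PC-side tooling for our embedded devices
long_description = file: README.md
long_description_content_type = text/markdown

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.7

[options.packages.find]
where = src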
I will create the directory src/<my_package>/ where the source code will live. I will also add an __init__.py file to the directory to indicate that it is a package; the file will be left empty for now. In addition I will add a doc folder for documentation and a tests folder for unit tests, and of course we have a venv folder for the virtual environment, which PyCharm created for us. What we end up with is a structure like this:
my_project
|--doc/
|--src/
| |--my_package/
| |-- __init__.py
|--tests/
|--venv/
| pyproject.toml
| setup.cfg
| README.md
| LICENSE
In order to try out the package while developing it, we can install it as an editable package with pip install -e <path_to_package_root> inside the virtual environment. Now we can open up a Python console, type import my_package and test things out. When you make changes to the source code, just re-open the console and import the package again – no need to reinstall it.
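In practice, a quick check that the editable install works could look like this (assuming we run the commands from the project root with the virtual environment activated):

(venv) $ pip install -e .
(venv) $ python
>>> import my_package
>>>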
Now that everything is set up, we can get started on the code.
Re-design and refactoring
As with any project that starts out with modest requirements and then grows over time, there comes a time when it is a good idea to stop, re-evaluate the design and refactor the code before it becomes too unwieldy. If this process is omitted, each new feature will take increasingly long to implement, until the code is so spaghetti’d up that implementing even a simple feature takes an inordinate amount of time – and you will probably risk breaking something in the process. I took some time to discuss the existing code with the original developers and together we came up with a list of improvements and design ideas.
The existing code consisted of a single Python file containing mostly procedural code and just a few (fairly large) classes. The first steps were to group related functionality together, determine areas where we should be able to extend the code in the future and generally adopt a more object-oriented approach. Keeping the SOLID principles in mind is a good way to keep the code clean as you go along.
In the process of refactoring the code I made use of concepts such as abstract base classes, the factory design pattern and type hints, which I will describe below.
Abstract base class (or an “interface” class)
In our custom communication protocol, Device objects communicate with each other using Packet objects that consist of a Header and a Payload. The Header always contains the same type of information and is interpreted in the same way, but the interpretation of the Payload depends on the “payload type” indicated in the Header – and the protocol should be extendable with new user-defined payload types. When the user creates a Payload to send to a device, he/she knows about the specific type of Payload. The module responsible for transmitting Packets, however, does not care which specific type it is dealing with. It just needs a way to determine the length, get an array of bytes to transmit and perhaps a string representation it can print to the console. To make sure that all payload types conform to this common interface, we can create an abstract base class which all payload types must inherit from:
from abc import ABC, abstractmethod

class Payload(ABC):
    @abstractmethod
    def __len__(self):
        pass

    @abstractmethod
    def __str__(self):
        pass

    @abstractmethod
    def to_bytes(self):
        pass
To create a concrete implementation of the interface, we simply inherit from Payload:
class ConcretePayload(Payload):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __str__(self):
        return "I am a concrete payload"

    def to_bytes(self):
        return bytes(self.data)
If we forget to implement an @abstractmethod in the concrete implementation, Python will raise a TypeError as soon as an instance is created – as opposed to raising a NotImplementedError in the base class, which will only throw an exception if the method is actually called.
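As a quick illustration (using a deliberately broken subclass), leaving out one of the abstract methods makes instantiation fail immediately:

class IncompletePayload(Payload):
    def __len__(self):
        return 0

    def __str__(self):
        return "incomplete"
    # to_bytes() is missing on purpose

payload = IncompletePayload()
# TypeError: Can't instantiate abstract class IncompletePayload with abstract method to_bytes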
Using an abstract base class is not strictly necessary in Python, since the language uses duck typing where the philosophy is: “If it walks() like a Duck and quacks() like a Duck, it’s probably a Duck”. This means that if we expect to be able to call e.g. a to_bytes() method on the object we pass into the transmit function, any object that has a to_bytes() method will work – the type does not matter.
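To illustrate, here is a sketch of a hypothetical transmit() function (not the actual transmission module) that happily accepts any object providing the methods it needs, regardless of whether it inherits from Payload:

def transmit(payload) -> None:
    # Relies only on len() and to_bytes() being available – classic duck typing
    print(f"Transmitting {len(payload)} bytes: {payload.to_bytes().hex()}")

transmit(ConcretePayload([0x01, 0x02, 0x03]))  # works, because it walks and quacks like a payload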
However, I think creating an abstract base class is a good practice as it makes it very clear which methods are required and will make sure the developer actually implements them in the subclass.
Factory design pattern
Continuing the example from above, when receiving data from a device, I needed a way to create the correct Payload subclass depending on the payload type in the Header. The solution I chose was a factory function that takes the raw byte data and the payload type as arguments and returns an instance of the correct Payload subclass:
import array

def create_payload(data: array.array, payload_type: int) -> Payload:
    if payload_type in payload_subclass_map:
        return payload_subclass_map[payload_type](data)
The payload_subclass_map is simply a dictionary that maps payload_type to a specific subclass of Payload. Now, whenever we need to add a new payload type, we simply implement a new Payload subclass and add it to the payload_subclass_map.
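As a sketch (the numeric payload type values and the extra subclass name are made up for illustration), the map could look like this:

# Maps the payload type from the Header to the class that knows how to interpret it
payload_subclass_map: dict[int, type[Payload]] = {
    0x01: ConcretePayload,
    # 0x02: StatusPayload,  # hypothetical – add new payload types here
}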
Type hints and circular imports
Since I usually do most of my programming in statically typed languages such as C and C++, Python’s dynamic typing can make me a bit uncomfortable at times. I like being able to tell a function’s parameter types directly from its definition. Luckily, Python supports type hints which allow us to indicate the type for variables, objects, parameters and return values like so:
a: int = 42
b: float = 3.14
c: str = "Hello, World!"

def is_even(value: int) -> bool:
    return (value % 2 == 0)
It is important to remember that these are only hints, meaning that Python will not enforce type annotations and will happily reassign a variable from an int to a boolean. Type hinting does, however, allow us to perform static type checking (e.g. with mypy), makes it easier to follow objects passed into functions/methods and also nudges the code completion tool in the right direction.
One problem I have run into using type hints is circular imports. Say we have a class Foo with a helper class Bar in two separate modules. Foo imports Bar and creates an instance of Bar by passing a reference to itself in the constructor:
# foo.py
from bar import Bar

class Foo:
    def __init__(self):
        pass

    def create_bar(self) -> Bar:
        return Bar(self)

    def __str__(self) -> str:
        return "Foo"
If we write bar.py without type hints and run main.py, everything will work just fine:
# bar.py
class Bar:
    def __init__(self, foo):
        print(f"I am helping {foo}")

# main.py
from foo import Foo

my_foo = Foo()
my_bar = my_foo.create_bar()

Output:
I am helping Foo
However, if we specify that our foo parameter is of the type Foo – and thus need to import Foo – we get a circular import error:
from foo import Foo

class Bar:
    def __init__(self, foo: Foo):
        print(f"I am helping {foo}")

ImportError: cannot import name 'Foo' from partially initialized module 'foo' (most likely due to a circular import)
Since we are only importing Foo in order to perform static type checking, we can use the typing package along with postponed evaluation of annotations (introduced in Python 3.7) to ensure that the import only happens when we are running our static type checking tool. To avoid having to enter type hints as strings, we can use from __future__ import annotations and just enter type hints as usual:
from __future__ import annotations
import typing

if typing.TYPE_CHECKING:
    from foo import Foo

class Bar:
    def __init__(self, foo: Foo):
        print(f"I am helping {foo}")
And now the program runs as expected. We can let mypy analyze the source code and check for type errors.
Testing, type checking and linting
Unit testing
For writing unit tests I will be using the unit testing framework unittest, which is part of the standard library. To create a unit test, simply import the unittest package, create a class that inherits from unittest.TestCase (which will serve as a “test group”) and then implement each test as a method in that class. If the module is invoked directly, call unittest.main() to start the test runner. When using PyCharm, this boilerplate is generated automatically when you create a new Python unit test file:
import unittest

class MyTestCase(unittest.TestCase):
    def test_something(self):
        self.assertEqual(True, False)  # add assertion here

if __name__ == '__main__':
    unittest.main()
As the project grows we will have several test files in the tests folder. In order for unittest to automatically discover all the tests in the folder, we will prefix all the test files with test_ (e.g. test_something.py) and add an empty __init__.py file. The folder will now look like this:
|--tests/
| |-- __init__.py
| |-- test_something.py
We can now discover and execute all tests in the tests folder by running the following command from the command line:
$ python -m unittest discover -s tests
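As an example (reusing the hypothetical ConcretePayload from earlier – the import path depends on how the package is laid out), a test file in the tests folder could look like this:

# tests/test_payload.py
import unittest

from my_package.payload import ConcretePayload  # hypothetical module path

class TestConcretePayload(unittest.TestCase):
    def test_length_matches_data(self):
        self.assertEqual(len(ConcretePayload([1, 2, 3])), 3)

    def test_to_bytes(self):
        self.assertEqual(ConcretePayload([1, 2, 3]).to_bytes(), bytes([1, 2, 3]))

if __name__ == '__main__':
    unittest.main()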
Static type checking
Since we have added type hints to our code, we can run a static type check with a tool such as mypy. I installed the package with pip install mypy and can now run the static type check recursively on the src folder with:
$ mypy src
You can configure mypy by invoking it with command line arguments or by adding a [mypy] section with your configuration in either setup.cfg or a separate mypy.ini.
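For example (the specific options are just ones I might choose – adjust to your own taste), a [mypy] section in setup.cfg could look like this:

[mypy]
python_version = 3.10
disallow_untyped_defs = True
warn_unused_ignores = True
ignore_missing_imports = True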
Linting
Apart from static type checking, I’ll also do static code analysis with pylint. As described in the documentation, it “checks for errors, enforces a coding standard, looks for code smells, and can make suggestions about how the code could be refactored”.
Basically, it will keep your code a bit cleaner. By default it is very pedantic, so you will probably want to configure it by creating a .pylintrc configuration file specifying exactly which errors, warnings, etc. you are interested in.
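As a starting point (the disabled checks and line length here are just an example of the kind of tailoring you might do), a .pylintrc could contain something like:

[MESSAGES CONTROL]
disable = missing-module-docstring,
          too-few-public-methods

[FORMAT]
max-line-length = 120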
Automating it all with Jenkins
When writing code I tend to run both the unit test suite and mypy/pylint quite regularly – and (almost) always before pushing to the Git repository. However, to ensure that the tests and analyses are run on every commit, I added a job for the project on our Jenkins build server.
I created a Jenkinsfile at the root of the repository in which I defined a declarative pipeline with the following stages (a sketch of such a Jenkinsfile is shown after the list):
- Do a clean checkout from Git
- Set up virtual environment and install requirements
- Perform static type check (mypy)
- Perform static code analysis/linting (pylint)
- Run unit tests
- Build documentation
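A sketch of what such a Jenkinsfile might look like (the stage names and shell commands are illustrative – on a Windows agent you would use bat steps instead of sh, and the clean checkout is typically handled by the job configuration):

pipeline {
    agent any
    stages {
        stage('Setup') {
            steps {
                sh 'python -m venv venv && venv/bin/pip install -r requirements.txt'
            }
        }
        stage('Type check') {
            steps {
                sh 'venv/bin/mypy src'
            }
        }
        stage('Lint') {
            steps {
                sh 'venv/bin/pylint src'
            }
        }
        stage('Unit tests') {
            steps {
                sh 'venv/bin/python -m unittest discover -s tests'
            }
        }
        stage('Documentation') {
            steps {
                sh 'venv/bin/sphinx-build -b html docs/source docs/build/html'
            }
        }
    }
}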
The build server polls the Git repository periodically and executes the pipeline if it detects any new commits. The build results are published to our Microsoft Teams group, so we are immediately made aware if a commit breaks the build.
Documentation
For documentation I used Sphinx, which is undeniably the most widely used documentation tool for Python. Plain text source files are written in the markup language reStructuredText (or reST), which can then be compiled into pretty output in various formats such as HTML or PDF. The documentation can be written either as docstrings directly in the Python files or as separate .rst files.
In case we want to host the documentation online, this can be done for free at ReadTheDocs.org. They also provide a Sphinx theme that I like a lot better than the default theme and thus will be using for this project.
To install Sphinx and the ReadTheDocs theme, I open up a terminal in the virtual environment and run:
$ pip install sphinx sphinx-rtd-theme
Now the basic folder structure, configuration files and build files can be created in the docs folder by running:
$ sphinx-quickstart docs
The docs folder now looks like this:
|-- docs/
| |-- build/
| |-- source/
| | |-- conf.py
| | |-- index.rst
| |-- make.bat
| |-- Makefile
The source folder contains a configuration file conf.py and a single reST file index.rst which serves as the entry point for our documentation. The build folder will contain the compiled output, which is generated by running the batch script make.bat (or the Makefile on Linux/macOS).
To generate HTML output we can try to run make.bat html from the command line, and we will see that the build directory has been populated. We can open build/html/index.html to view the documentation.
To use the ReadTheDocs theme instead of the default theme, I’ll add the following to conf.py:
import sphinx_rtd_theme

extensions = [
    ...
    'sphinx_rtd_theme',
]

html_theme = "sphinx_rtd_theme"
In order to generate documentation from docstrings in our Python source code, we must also enable the autodoc feature and tell Sphinx where our Python package is located (i.e. in the parent directory). At the top of conf.py, in the “Path setup” section, make sure that these lines are uncommented and that the path points to the parent (..) instead of the current (.) directory:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
Lastly, add autodoc to the list of extensions:
extensions = [
    ...
    'sphinx_rtd_theme',
    'sphinx.ext.autodoc'
]
Now, we can document classes, methods, functions, enums, etc. directly in the Python code using docstrings and include this documentation in the reST files where we find it appropriate using autodoc directives. I find that keeping most of the documentation in the code (where the developers are forced to look at it) increases the probability of it being maintained properly.
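As a small sketch (the module path and docstring style are hypothetical – pick whichever docstring convention you prefer), a documented function in the package:

def create_payload(data, payload_type):
    """Create a Payload instance matching the given payload type.

    :param data: Raw payload bytes received from the device.
    :param payload_type: Payload type identifier taken from the Header.
    :return: An instance of the matching Payload subclass.
    """
    ...

can then be pulled into any .rst file with an autodoc directive:

.. automodule:: my_package.payload
   :members: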
Building and publishing a package
Building the package
When we get to a point where the code is ready for release, we first check setup.cfg to make sure that all package details are the way we want them – and remember to update the version number for each release. We then install the build package with pip install build and run the build command from the root of our package directory:
$ python -m build
A distribution folder dist/ is created containing a source archive (.tar.gz) and a wheel (.whl). The wheel is used to install the package (i.e. pip install my_package.whl), while the source archive contains the source code, LICENSE, README.md, setup.cfg and pyproject.toml. The source archive does not contain docs, tests or any other folders – those are just for the developers. We now have the files needed to publish the package.
Publishing the package
Whenever we are installing a Python package with pip, it searches for the package in the Python Package Index (PyPI) by default. You can also tell pip explicitly which index to use by specifying --index-url at the command line. For example, to install a package from Test PyPI – a separate index created to let you play around with the distribution tools – we can run pip like this:
$ pip install --index-url https://test.pypi.org/simple/ my-package
To upload our package to the index we must first create an account. I already have an account for Test PyPI, so I will use that for now.
The tool used to publish the package is called twine and can be installed with pip install twine. Publishing to Test PyPI is then as simple as calling twine with the repository name and the path to the local distribution files:
$ python -m twine upload --repository testpypi dist/*
After entering the credentials, the package should be visible at https://test.pypi.org/project/<my_package> after a few minutes.
And that’s it! I learned a lot about Python through this process and hope this post has given you some ideas that you can use in your own projects.