Tag Archive for: HIL

How to publish GitHub Actions summaries

Automated testing is core to maintaining a stable codebase. But running the tests isn’t enough. You also need to get useful information from the outcome. The Golioth Firmware SDK is now running 530 tests for each PR/merge, and recently we added a summary to the GitHub action that helps to quickly review and discover the cause of failures. This uses a combination of pytest (with or without Zephyr’s twister test runner), some Linux command line magic, and an off-the-shelf GitHub summary Action.

Why summaries are helpful

A table with a different hardware device in each row and columns for passed, failed, skipped, and time

Test summary shown on GitHub Actions page

From this summary you can pretty quickly figure out if failures are grouped around a particular type of hardware, or across the board. Clicking the name of a dev board jumps to another table summarizing the outcomes.

A table with test names in each row and columns for passed, failed, skipped, and time

JUnit XML board summary

Reviewing the summary for the mimxrt1024_evk board, we can see that six test suites were run, with one test in the “ota” suite failing. Clicking that test suite name jumps to the block under the table that shows the error. In this case, we test all possible OTA reason and state codes the device can send to the server. This test failed because there is a latency issue between the device sending the code and the test reading it from the server.

While this doesn’t show everything you might want to know, it’s enough to decide if you need to dig further or if this is just a fleeting issue.

It’s worth noting that we also use Allure Reports (here is the latest for our main branch) to track our automated tests. This provides quite a bit more information, like historic pass/fail reporting. I’ll write a future post to cover how we added those reports to our CI.

Generating test summaries

Today we’re discussing test summaries using JUnit XML reports. Pytest already includes report generation for JUnit XML which makes it a snap to add to your existing tests. Browsing through pytest --help we see a simple flag will generate the report.

--junit-xml=path      Create junit-xml style report file at given path
--junit-prefix=str    Prepend prefix to classnames in junit-xml output

For our testing, we simply added a name for the summary file. Here’s an example using pytest directly:

pytest --rootdir . tests/hil/tests/$test      \
  --some-other-flags-here                     \
  --junitxml=summary/hil-linux-${test}.xml

If you use Zephyr, running twister with pytest already automatically produces JUnit XML formatted test reports at twister-out/twister_suite_report.xml.

Gathering and post-processing reports

Gathering up Twister-generated reports is pretty easy since Twister already batches reports into suites that are uniquely named by test. Since we’re running matrix tests, we give the files a unique name and store them in a summary/ folder in order to use the artifact-upload action later.

- name: Prepare CI report summary
  if: always()
  run: |
    rm -rf summary
    mkdir summary
    cp twister-out/twister_suite_report.xml summary/samples-zephyr-${{ inputs.hil_board }}.xml

On the other hand, the integration tests we run using pytest directly require a bit more post-processing. Since those generate individual suites, a bit of xml hacking is necessary to group them by the board used in the test run.

- name: Prepare summary
  if: always()
  shell: bash
  run: |
    sudo apt install -y xml-twig-tools
    xml_grep \
      --pretty_print indented \
      --wrap testsuites \
      --descr '' \
      --cond "testsuite" \
      summary/*.xml \
      > combined.xml
    mv combined.xml summary/hil-zephyr-${{ inputs.hil_board }}.xml

This step installs the xml-twig-tools package so that we have access to xml_grep. This magical Linux tool is able to match xml summary output filename patterns, unwrapping the testsuites item in each and rewrapping them all into one testsuites entry in a new file. This way we get a summary entry for each board instead of individual entries for every suite that a board runs.

As noted before, we use the upload-artifact action to upload all of these XML files for summarization at the end of the CI run.

Publish the comprehensive summary

The final step is to download all the XML files we’ve collected and publish a summary. I found the test-reporting action is excellent for this. We use a trigger at the end of our test run to handle the summary.

publish_summary:
  needs:
    - hil_sample_esp-idf
    - hil_sample_zephyr
    - hil_sample_zephyr_nsim_summary
    - hil_test_esp-idf
    - hil_test_linux
    - hil_test_zephyr
    - hil_test_zephyr_nsim_summary
  if: always()
  uses: ./.github/workflows/reports-summary-publish.yml

Note that this uses the needs property to ensure all of the tests running in parallel finish before trying to generate the summary.

name: Publish Summary of all test to Checks section

on:
  workflow_call:

jobs:
  publish-test-summaries:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout repository
      uses: actions/checkout@v4

    - name: Gather summaries
      uses: actions/download-artifact@v4
      with:
        path: summary
        pattern: ci-summary-*
        merge-multiple: true

    - name: Test Report
      uses: phoenix-actions/test-reporting@v15
      if: success() || failure()
      with:
        name: Firmware Test Summary
        path: summary/*.xml
        reporter: java-junit
        output-to: step-summary

Finally, the job responsible for publishing the test summary runs. Each different pytest job uploaded its JUnit XML summary files with the ci-summary- prefix, which is used now to download and merge all files into a summary directory. The test-reporting action is then called with a path pattern to locate those files. The summary is automatically added to the bottom of the GitHub actions summary page.

One note on visibility with this GitHub action: we added the output-to: step-summary after first implementing this system. Ideally, these summaries should be available as their own line item on the “checks” tab of a GitHub pull request. But in practice we found they got lumped in as a step on a random job often unrelated to the HIL tests. Outputting to the step-summary ensures we always know how to find them. Go to the HIL Action page and scrolling down past the job/step graph.

Making continuous integration more useful

There’s a lot that can go wrong in these types of integration tests. We rely on the build, the programmer, the device, the network connection (cellular, WiFi, and Ethernet), the device-to-cloud connection, and the test-runner to cloud API to all work flawlessly for these tests to pass. When they don’t, we need to know where to look for trouble and how to analyze the root cause.

Adding this summary has made it much easier to glean useful knowledge from hundreds of tests. The next installment of our hardware testing series will cover using Allure Reports to add more context to why and how often tests are failing. Until then, check our our guide on using pytest for hardware testing and our series on setting up your own Hardware-in-the-Loop (HIL) tests.

There is no substitute for testing firmware on real target hardware, but manual testing is both error prone and time consuming. Today, I’ll walk you through the process of using pytest to automate your hardware testing. There are three main building blocks to this approach

1. Compile the firmware you want to test

2. Use a pytest fixture to program the firmware onto the device and monitor the output

3. Run the tests to verify the device output

Pytest is a framework for — you guessed it — testing! When working with embedded hardware you need to spend some time setting up fixtures that connect it to pytest. Once that’s in place, you’ll be surprised at how fast you can write tests for your firmware projects. Let’s dive in.

This approach works for any firmware platform

In today’s example, the only platform-specific portion is a program() function that needs to know the commands used to flash firmware on your device. But this is easy to adapt for any platform. At Golioth we take advantage of this since the Golioth Firmware SDK tests hardware on two different RTOSes, using silicon from a handful of different vendors.

For this post I’m targeting a Nordic nRF52840 using Zephyr. However, it depends purely on pytest and is not using Twister (Zephyr’s dedicated test running application). We do use Twister on our Zephyr sample testing, but those are only a portion of our hardware-in-the-loop tests. We’ll publish a separate post detailing that process in the future.

Today’s demo involves adding a subfolder to your firmware project named pytest, and creating two file inside:

  • pytest/conftest.py
  • pytest/test_hello_world.py

A code walk-through follows, with the full source available as a gist.

Step 1: Compile your firmware

We’ll be testing firmware by monitoring output from a serial connection. Let’s start with a “Hello World” application. The hello_world sample in the Zephyr tree simply prints out “Hello World! <name of your board>” which is perfect for this demonstration.

west build -b nrf52840dk_nrf52840 . -p

The above build command generates a binary at build/zephyr/zephyr.hex. We do not need to flash it to the device, we’ll use a pytest function for this.

Step 2: Write a pytest fixture for this board

In this section we’ll populate a conftest.py file that can be reused by multiple tests.

Install dependencies

Make sure you have Python installed and then use pip to install the libraries needed for this project. We’ll be using the AnyIO and Trio libraries for asynchronous support, and Python will need access to the serial port. We also want a Python library for programming the target chip:

pip install pytest anyio trio pyserial

# This one is used to program Nordic devices
pip install pynrfjprog

Now add the necessary imports to the top of your conftest.py file:

import pytest
import re
import serial
from time import time

# Used to flash binary to nrf52840dk
from pynrfjprog import LowLevel

import pytest
import re
import serial
from time import time

# Used to flash binary to nrf52840dk
from pynrfjprog import LowLevel

@pytest.fixture(scope='session')
def anyio_backend():
    return 'trio'

The final block is a special pytest fixture that tells AnyIO to use the Trio backend.

Setting up command line arguments

While not strictly necessary for a one-off test, adding command line arguments makes your test easier to run using continuous integration (CI) tools like GitHub Actions.

def pytest_addoption(parser):
    parser.addoption("--port",
            help="The port to which the device is attached (eg: /dev/ttyACM0)")
    parser.addoption("--baud", type=int, default=115200,
            help="Serial port baud rate (default: 115200)")
    parser.addoption("--fw-image", type=str,
            help="Firmware binary to program to device")
    parser.addoption("--serial-number", type=str,
            help="Serial number to identify on-board debugger")

@pytest.fixture(scope="session")
def port(request):
    return request.config.getoption("--port")

@pytest.fixture(scope="session")
def baud(request):
    return request.config.getoption("--baud")

@pytest.fixture(scope="session")
def fw_image(request):
    return request.config.getoption("--fw-image")

@pytest.fixture(scope="session")
def serial_number(request):
    return request.config.getoption("--serial-number")

Above we’ve used a special pytest_addoption() function to add command line flags for port, baud, firmware filename, and programmer serial number. A fixture is added to return each of these so they are available to other fixtures (and in the tests themselves).

Create a class for your board

We want to create a class to represent the device under test. Golioth has an entire directory full of board class definitions for different vendors which we use in our automated testing. For this example we really just need a way to program the board and monitor its serial output.

class Board():
    def __init__(self, port, baud, fw_image, serial_number):
        self.port = port
        self.baud = baud
        self.fw_image = fw_image
        self.serial_number = serial_number

        #program firmware
        self.program(fw_image)

        self.serial_device = serial.Serial(port, self.baud, timeout=1, write_timeout=1)

    def program(self, fw_image):
        with LowLevel.API() as api:
            api.connect_to_emu_with_snr(int(self.serial_number))
            api.erase_all()
            api.program_file(self.fw_image)
            api.sys_reset()
            api.go()
            api.close()

    def wait_for_regex_in_line(self, regex, timeout_s=20, log=True):
        start_time = time()
        while True:
            self.serial_device.timeout=timeout_s
            line = self.serial_device.read_until().decode('utf-8', errors='replace').replace("\r\n", "")
            if line != "" and log:
                print(line)
            if time() - start_time > timeout_s:
                raise RuntimeError('Timeout')
            regex_search = re.search(regex, line)
            if regex_search:
                return regex_search

The Board class implements a program() function specific to flashing firmware onto this device. You will want to replace this function for your own target hardware. Note that when this is instantiated, the init() function will call the program() function, flashing the firmware onto the board at the start of the test suite.

The Board class also implements a wait_for_regex_in_line() function that is a fancy way to match lines printed in the serial terminal. This should be transferable to any board that prints to serial (in our case, via a USB connection). This function includes a timeout feature, which means your test will not wait forever when a device is misbehaving.

@pytest.fixture(scope="session")
def board(port, baud, fw_image, serial_number):
    return Board(port, baud, fw_image, serial_number)

The final piece of the puzzle is a fixture that makes the Board available to your test. The session scope ensures that your board will only be instantiated once per test-run.

Step 3: Write your tests

The hard work is behind us, this step is simple by comparison. Create a file prefixed with the word “test” to host your tests. We’ll call this test_hello_world.py.

import pytest

pytestmark = pytest.mark.anyio

async def test_hello(board):
    board.wait_for_regex_in_line("Hello World");

We begin by importing pytest, then using the special pytestmark directive to indicate we want to use AnyIO for asynchronous functions. Each function that is declared with the test_ prefix in the function name will be automatically run by pytest and individually reported with a pass/fail.

Notice in this case that we are using a regex to match “Hello World” even though the full message received will be “Hello World! nrf52840dk_nrf52840”.

Running the test

Let’s run this test, remembering to supply the necessary arguments for the board programming and serial output to work:

➜ pytest pytest/test_hello_world.py --port /dev/ttyACM0 --baud 115200 --fw-image build/zephyr/zephyr.hex --serial-number 1050266122

Pytest includes colorful output with a summary of the tests:

Black terminal screen showing the green success messages from a pytest run.

You don’t get a lot of output for a successful test. But if you change the matched test to something that is not expected, we should get a timeout error. You can see that warnings and errors cause more information to be printed. Note the angle bracket that indicates the assert that caused the failure. The error message is printed further down in red, followed by the actual output received from the board:

Black terminal screen shows an error output several lines long indicating a timeout occurred during a test.

While this example includes just a single test that watches for output, you can grow this to many tests that interact with the board. For instance, the Golioth RPC service tests implement about a dozen tests that prompt the target board to react by sending remote procedure calls from the Golioth cloud via our REST API, verifying the output from each.

Automating the tests

We have already automated this hello_world test. But we’re still running it manually on the command line. You can use GitHub self-hosted runners to connect your own devices to GitHub Actions for true automation.

Golioth has built extensive hardware-in-the-loop automation that does just this. You can check out our workflows to get a better picture of how this works. The step that calls pytest should include syntax you recognize, pointing to the test file and passing the configuration in as command line arguments:

- name: Run test
  shell: bash
  env:
    hil_board: ${{ inputs.hil_board }}
  run: |
    source /opt/credentials/runner_env.sh
    PORT_VAR=CI_${hil_board^^}_PORT
    SNR_VAR=CI_${hil_board^^}_SNR
    for test in `ls tests/hil/tests`
    do
      pytest --rootdir . tests/hil/tests/$test                            \
        --board ${hil_board}                                              \
        --port ${!PORT_VAR}                                               \
        --fw-image ${test}-${{ inputs.binary_name }}                      \
        --serial-number ${!SNR_VAR}                                       \
        --api-url ${{ inputs.api-url }}                                   \
        --api-key ${{ secrets[inputs.api-key-id] }}                       \
        --wifi-ssid ${{ secrets[format('{0}_WIFI_SSID', runner.name)] }}  \
        --wifi-psk ${{ secrets[format('{0}_WIFI_PSK', runner.name)] }}    \
        --mask-secrets                                                    \
        --timeout=600
    done

Resources

Golioth is an IoT company that supports as much custom hardware as possible: a multitude of microcontrollers and many different connection types. This presents a challenge when testing on real hardware.

We developed tooling that tests the Golioth Firmware SDK on actual boards. Known as Hardware-in-the-Loop (HIL) testing, it’s an important part of our CI strategy. Our latest development is automatically detecting boards connected to our CI machines; these can recognize what type of hardware is attached to each self-hosted test runner. In a nutshell, you plug in the board to USB and run a workflow to figure out what you just plugged it.

Let’s take a look at how we did that, and why it’s necessary.

Goals and Challenges of Hardware Testing

So the fundamental goals here are:

  1. Run tests on actual hardware
  2. Make the number of tests scalable
  3. Make the number of hardware types scalable

If we were running all of our firmware tests manually, we’d be fine on goal #1. But when scaling and repeatability comes into play, we start to look at automation. In particular, we look at making it easy to create more of the self-hosted runners (the computers running the automated test programs).

Adding a new board to one of those runners is no small task. If you build a self-hosted runner today, then next week add two new hardware variations that need testing, how will the runner know how to interface with those boards?

Frequent readers of the Golioth blog will remember that we already set up self-hosted runners, so go back and look the first post and second post in this series. Those articles detail how we are using GitHub’s self-hosted runner infrastructure to use workflows to perform Hardware-in-the-Loop tests. I also did a Zephyr tech talk on self-hosted runners.

Github workflow output from board recognition processTo make things more scalable, we developed a workflow that queries all of the hardware connected to a runner. It runs some tests to determine the type of hardware, the port, and the programmer serial number (or other unique id) of that hardware. This information is written to a yaml file for that runner and made available to all other workflows that need to run on real hardware.

Recognizing Microcontroller Boards Connected Via USB

Self-hosted runners are nothing more than a single-board computer running Linux and connected to the internet (and in our case connected to GitHub). These runners are in the home offices of my colleagues around the world, but it doesn’t matter where they are. Our goal is that we don’t need to touch the runners after initial set up. If we add new hardware, just plug in a USB cable and the hardware part of the setup is done.

The first step in recognizing what new hardware has been plugged in is listing all the serial devices. Linux provides a neat little way of doing this by the device ID:

golioth@orangepi3-lts:~$ ls /dev/serial/by-id -1a
.
..
usb-SEGGER_J-Link_000729255509-if00
usb-SEGGER_J-Link_001050266122-if00
usb-SEGGER_J-Link_001050266122-if02
usb-Silicon_Labs_CP2102N_USB_to_UART_Bridge_Controller_4ea3e6704b5fec1187ca2c5f25bfaa52-if00-port0

This represents three devices: Nordic, NXP, and Espressif. For now we assume the SiLabs USB-to-UART is an Espressif device because we don’t have any other devices that use that chip. However, there are multiple SEGGER entries, so we need to sort those out. We also need to know which type of ESP32 board is connected.

We use regular expressions to pull out the unique serial numbers from each of these and perform the board identification in the next step.

Using J-Link to Identify Board Type

Boards that use J-Link include a serial number in their by-id listing that can be used to gather more information. This works great for Nordic development boards. It works less great for NXP boards.

from pynrfjprog import LowLevel
from board_classes import Board

chip2board = {
    # Identifying characteristic from chip: Board name used by Golioth
    LowLevel.DeviceName.NRF9160: "nrf9160dk",
    LowLevel.DeviceName.NRF52840: "nrf52840dk",
    LowLevel.DeviceName.NRF5340: "nrf7002dk"
    }

def get_nrf_device_info(snr):
    with LowLevel.API() as api:
        api.connect_to_emu_with_snr(snr)
        api.connect_to_device()
        api.halt()
        return api.read_device_info()

def nrf_get_device_name(board: Board):
    with LowLevel.API() as api:
        snr_list = api.enum_emu_snr()

    if board.snr not in snr_list:
        return None

    try:
        device_info = get_nrf_device_info(board.snr)
        if not device_info:
            autorecognized = False
        else:
            for i in device_info:
                if i in chip2board:
                    board.name = chip2board[i]
                    autorecognized = True
    except Exception as e:
        print("Exception in find_nrf_device():", e)
        autorecognized = False

    return autorecognized

Nordic provides a Python package for their nrfjprog tool. Its low-level API can read the target device type from the J-Link programmer. Since we already have the serial number included in the port listing, we can use the script above to determine which type of chip is on each board. For now, we are only HIL testing three Nordic boards and they all have different processors which makes it easy to automatically recognize these boards.

An NXP mimxrt1024_evk board is included in our USB output from above. This will be recognized by the script but the low-level API will not be able to determine what target chip is connected to the debugger. For now, we take the port and serial number information and manually add the board name to this device. We will continue to refine this method to increase automation.

Using esptool.py to Guess Board Type

Espressif boards are slightly trickier to quantify. We can easily assume which devices are ESP32 based on the by-id path. And esptool.pyEspressif’s software tool for working with these chips – can be used in a roundabout way to guess which board is connected.

import esptool
from board_classes import Board

chip2board = {
    # Identifying characteristic from chip: Board name used by Golioth
    'ESP32': 'esp32_devkitc_wroom',
    'ESP32-C3': 'esp32c3_devkitm',
    'ESP32-D0WD-V3': 'esp32_devkitc_wrover',
    'ESP32-S2': 'esp32s2_saola',
    'ESP32-S3': 'esp32s3_devkitc'
    }

## caputure output: https://stackoverflow.com/a/16571630/922013
from io import StringIO 
import sys

class Capturing(list):
    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._stringio = StringIO()
        return self
    def __exit__(self, *args):
        self.extend(self._stringio.getvalue().splitlines())
        del self._stringio    # free up some memory
        sys.stdout = self._stdout
## end snippet

def detect_esp(port):
    try:
        with Capturing() as output:
            esptool.main(["--port", port, "read_mac"])

        for line in output:
            if line.startswith('Chip is'):
                chip = line.split('Chip is ')[1].split(' ')[0]
                if chip in chip2board:
                    return chip2board[chip]
                else:
                    return chip
    except Exception as e:
        print("Error", e)
        return None

def esp_get_device_name(board: Board):
    board.name = detect_esp(board.port)

One of the challenges here is that esptool.py doesn’t include an API to simply return the chip designation. However, part of the standard output for any command includes that information. So we listen to the output when running the read_mac command and use simple string operations to get the chip designation.

golioth@orangepi3-lts:~$ esptool.py read_mac
esptool.py v4.6.2
Found 4 serial ports
Serial port /dev/ttyUSB0
Connecting....
Detecting chip type... Unsupported detection protocol, switching and trying again...
Connecting.....
Detecting chip type... ESP32
Chip is ESP32-D0WD-V3 (revision v3.0)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: 90:38:0c:eb:0a:28
Uploading stub...
Running stub...
Stub running...
MAC: 90:38:0c:eb:0a:28
Hard resetting via RTS pin...

While this doesn’t tell us what board is connected, a lookup table of the ESP32 dev boards Golioth commonly tests makes it easy to identify the board. I feel like there should be a way to simply use the mac address to determine the chip type (after all, esptool.py is doing this somehow) but I haven’t yet found documentation about that.

Using Board Configuration

Now we know the board name, serial port, and serial number for each connected board. What should be done with that information? This is the topic of a future post. Generally speaking, we write it to a yaml file in the repository that is running the recognition workflow.

Yaml file showing board name, serial port, and serial number

In a future workflow, that information is written back to the self-hosted runner where it can be used by other workflows. And as part of that process, we assign labels such as has_nrf52840dk so GitHub knows which runners to use for any given test.

Take Golioth for a Test Drive

We do all this testing so that you know the Golioth platform is a rock solid way to manage your IoT fleet. Don’t reinvent the wheel, give Golioth’s free Dev Tier a try and get your fleet up and running fast.

This is the second part of How Golioth uses Hardware-in-the-Loop (HIL) Testing. In the first part, I covered what HIL testing is, why we use it at Golioth, and provided a high-level view of the system.

In this post, I’ll talk about each step we took to set up our HIL. It’s my belief that HIL testing should be a part of any firmware testing strategy, and my goal with this post is to show you that it’s not so difficult to get a minimum viable HIL set up within one week.

The Big Pieces

To recap, this is the system block diagram presented at the end of part 1:

The Raspberry Pi acts as a GitHub Actions Self-Hosted Runner which communicates with microcontroller devices via USB-to-serial adapters.

The major software pieces required are:

  • Firmware with Built-In-Test Capabilities – Runs tests on the device (e.g. ESP32-S3-DevKitC), and reports pass/fail for each test over serial.
  • Python Test Runner Script – Initiates tests and checks serial output from device. Runs on the self-hosted runner (e.g. Raspberry Pi).
  • GitHub Actions Workflow – Initiates the python test runner and reports results to GitHub. Runs on the self-hosted runner.

Firmware Built-In-Test

To test our SDK, we use a special build of firmware which exposes commands over a shell/CLI interface. There is a command named built_in_test which runs a suite of tests in firmware:

esp32> built_in_test
...
12 Tests 0 Failures 0 Ignored 
../main/app_main.c:371:test_connects_to_wifi:PASS
../main/app_main.c:378:test_golioth_client_create:PASS
../main/app_main.c:379:test_connects_to_golioth:PASS
../main/app_main.c:381:test_lightdb_set_get_sync:PASS
../main/app_main.c:382:test_lightdb_set_get_async:PASS
../main/app_main.c:383:test_lightdb_observation:PASS
../main/app_main.c:384:test_golioth_client_heap_usage:PASS
../main/app_main.c:385:test_request_dropped_if_client_not_running:PASS
../main/app_main.c:386:test_lightdb_error_if_path_not_found:PASS
../main/app_main.c:387:test_request_timeout_if_packets_dropped:PASS
../main/app_main.c:388:test_client_task_stack_min_remaining:PASS
../main/app_main.c:389:test_client_destroy_and_no_memory_leaks:PASS

These tests cover major functionality of the SDK, including connecting to Golioth servers, and sending/receiving CoAP messages. There are also a couple of tests to verify that stack and heap usage are within allowed limits, and that there are no detected memory leaks.

Here’s an example of one of the tests that runs on the device, test_connects_to_golioth:

#include "unity.h"
#include "golioth.h"

static golioth_client_t _client;
static SemaphoreHandle_t _connected_sem;

static void on_client_event(golioth_client_t client, 
                            golioth_client_event_t event, 
                            void* arg) {
    if (event == GOLIOTH_CLIENT_EVENT_CONNECTED) {
        xSemaphoreGive(_connected_sem);
    }
}

static void test_connects_to_golioth(void) {
    TEST_ASSERT_NOT_NULL(_client);
    TEST_ASSERT_EQUAL(GOLIOTH_OK, golioth_client_start(_client));
    TEST_ASSERT_EQUAL(pdTRUE, xSemaphoreTake(_connected_sem, 10000 / portTICK_PERIOD_MS));
}

This test attempts to connect to Golioth servers. If it connects, a semaphore is given which the test waits on for up to 10 seconds (or else the test fails). For such a simple test, it covers a large portion of our SDK, including the network stack (WiFi/IP/UDP), security (DTLS), and CoAP Rx/Tx.

We use the Unity test framework for basic test execution and assertion macros.

If you’re curious about the details, you can see the full test code here: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/test/main/app_main.c

Python Test Runner

Another key piece of software is a Python script that runs on the self-hosted runner and communicates with the device over serial. We call this script verify.py. In this script, we provision the device with credentials, issue the built_in_test command, and verify the serial output to make sure all tests pass.

Here’s the main function of the script:

def main():
    port = sys.argv[1]

    # Connect to the device over serial and use the shell CLI to interact and run tests
    ser = serial.Serial(port, 115200, timeout=1)

    # Set WiFi and Golioth credentials over device shell CLI
    set_credentials(ser)
    reset(ser)

    # Run built in tests on the device and check output
    num_test_failures = run_built_in_tests(ser)
    reset(ser)

    if num_test_failures == 0:
        run_ota_test(ser)

    sys.exit(num_test_failures)

This opens the serial port and calls various helper functions to set credentials, reset the board, run built-in-tests, and run an OTA test. The exit code is the number of failed tests, so 0 means all passed.

The script spends most of its time in this helper function which looks for a string in a line of serial text:

def wait_for_str_in_line(ser, str, timeout_s=20, log=True):
    start_time = time()
    while True:
        line = ser.readline().decode('utf-8', errors='replace').replace("\r\n", "")
        if line != "" and log:
            print(line)
        if "CPU halted" in line:
            raise RuntimeError(line)
        if time() - start_time > timeout_s:
            raise RuntimeError('Timeout')
        if str in line:
            return line

The script will continually read lines from serial for up to timeout_s seconds until it finds what it’s looking for. And if “CPU halted” ever appears in the output, that means the board crashed, so a runtime error is raised.

This verify.py script is what gets invoked by the GitHub Actions Workflow file.

GitHub Actions Workflow

The workflow file is a YAML file that defines what will be executed by GitHub Actions. If one of the steps in the workflow returns a non-zero exit code, then the workflow will fail, which results in a red ❌ in CI.

Our workflow is divided into two jobs:

  • Build the firmware and upload build artifacts. This is executed on cloud servers, since building the firmware on a Raspberry Pi would take a prohibitively long time (adding precious minutes to CI time).
  • Download firmware build, flash the device, and run verify.py. This is executed on the self-hosted runner (my Raspberry Pi).

The workflow file is created in .github/workflows/test_esp32s3.yml. Here are the important parts of the file:

name: Test on Hardware

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  build_for_hw_test:
    runs-on: ubuntu-latest
    steps:
    ...
    - name: Build test project
      uses: espressif/esp-idf-ci-action@v1
      with:
        esp_idf_version: v4.4.1
        target: esp32s3
        path: 'examples/test'
    ...

  hw_flash_and_test:
    needs: build_for_hw_test
    runs-on: [self-hosted, has_esp32s3]
    ...
    - name: Flash and Verify Serial Output
      run: |
        cd examples/test
        python flash.py $CI_ESP32_PORT && python verify.py $CI_ESP32_PORT

The first job runs in a normal GitHub runner based on Ubuntu. For building the test firmware, Espressif provides a handy CI Action to do the heavy lifting.

The second job has a needs: build_for_hw_test requirement, meaning it must wait until the first job completes (i.e. firmware is built) before it can run.

The second job has two further requirements – it must run on self-hosted and there must be a label has_esp32s3. GitHub will dispatch the job to the first available self-hosted runner that meets the requirements.

If the workflow completes all jobs, and all steps return non-zero exit code, then everything has passed, and you get the green CI checkmark ✅.

Self-Hosted Runner Setup

With all of the major software pieces in place (FW test framework, Python runner script, and Actions workflow), the next step is to set up a machine as a self-hosted runner. Any machine capable of running one of the big 3 operating systems will do (Linux, Windows, MacOS).

It helps if the machine is always on and connected to the Internet, though that is not a requirement. GitHub will simply not dispatch jobs to any self-hosted runners that are not online.

Install System Dependencies

There are a few system dependencies that need to be installed first. On my Raspberry Pi, I had to install these:

sudo apt install \
    python3 python3-pip python3-dev python-dev \
    libffi-dev cmake direnv wget gcovr clang-format \
    libssl-dev git
pip3 install pyserial

Next, log into GitHub and create a new self-hosted runner for your repo (Settings -> Actions -> Runners -> New self-hosted runner):

When you add a new self-hosted runner, GitHub will provide some very simple instructions to install the Actions service on your machine:

When I ran the config.sh line, I added --name and --unattended arguments:

./config.sh --url <https://github.com/golioth/golioth-firmware-sdk> --token XXXXXXXXXXXXXXXXXXXXXXXXXXXXX --name nicks-raspberry-pi --unattended

Next, install the service, which ensures GitHub can communicate with the self-hosted runner even if it reboots:

cd actions-runner
sudo ./svc.sh install

If there are any special environment variables needed when executing workflows, these can be added to runsvc.sh. For instance, I defined a variable for the serial port of the ESP32:

# insert anything to setup env when running as a service
export CI_ESP32_PORT=/dev/ttyUSB0

Finally, you can start the service:

sudo ./svc.sh start

And now you should see your self-hosted runner as “Idle” in GitHub Actions, meaning it is ready to accept jobs:

I added two custom labels to my runner: has_esp32s3 and has_nrf52840dk, to indicate that I have those two dev boards attached to my self-hosted runner. The workflow file uses these labels so that jobs get dispatched to self-hosted runners only if they have the required board.

And that’s it! With those steps, a basic HIL is up and running, integrated into CI, and covering a large amount of firmware code on real hardware.

Security

There’s an important note from GitHub regarding security of self-hosted runners:

Warning: We recommend that you only use self-hosted runners with private repositories. This is because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

I also stand by that recommendation. But it’s not always feasible if you’re maintaining an open-source project. Our SDKs are public, so this warning certainly applies to us.

If you’re not careful, you could give everyone on the Internet root access to your machine! It wouldn’t be very hard for someone to submit a pull request that modifies the workflow file to run whatever arbitrary Linux commands they want.

We mitigate most of the security risks with three actions.

1. Require approval to run workflows for all outside contributors.

This means someone from the Golioth organization must click a button to run the workflow. This allows the repo maintainer time to review any workflow changes before allowing the workflow to run.

This option can be enabled in Settings -> Actions -> General:

Once enabled, pull requests will have a new “Approve and Run” button:

This is not ideal if you’re running a large open-source project with lots of outside contributors, but it is a reasonable compromise to make during early stages of a project.

2. Avoid logging sensitive information

The log output of any GitHub Actions runner is visible publicly. If there is any sensitive information printed out in the workflow steps or from the serial output of the device, then this information could easily fall into the wrong hands.

For our purposes, we disabled logging WiFi and Golioth credentials.

3. Utilize GitHub Actions Secrets

You can use GitHub Actions Secrets to store sensitive information that is required by the runner. In our repo, we define a secret named `GOLIOTH_API_KEY` that allows us access to the Golioth API, which is used to deploy OTA images of built firmware during the workflow run.

If you want to know more about self-hosted runner security, these resources were helpful to me:

Tips and Tricks

Here are a few extra ideas to utilize the full power of GitHub Actions Self-Hosted Runners (SHRs):

  • Use the same self-hosted runner in multiple repos. You can define SHRs at the organization level and allow their use in more than one repo. We do this with our SHRs so they can be used to run tests in the Golioth Zephyr SDK repo and the Golioth ESP-IDF SDK repo.
  • Use labels to control workflow dispatch. Custom labels can be applied to a runner to control when and how a workflow gets dispatched. As mentioned previously, we use these to enforce that the SHR has the dev board required by the test. We have a label for each dev board, and if the admin of the SHR machine has those dev boards attached via USB, then they’d add a label for each one. This means that machine will only receive workflows that are compatible with that machine. If a board is acting up or requires maintenance, the admin can simply disconnect the board and remove the label to prevent further workflows from executing.
  • Distributed self-hosted runners to load balance and minimize CI bottlenecks. If there is only one self-hosted runner, it can become a bottleneck if there are lots of jobs to run. Also, if that runner loses Internet, then CI is blocked indefinitely. To balance the load and create more robust HIL infrastructure, it’s recommended to utilize multiple self-hosted runners. Convince your co-workers to take their unused Intel NUC off the shelf and follow the steps in the “Self-Hosted Runner Setup” section to add another runner to the pool. This allows for HIL tests to run in a distributed way, no matter where in the world you are.

Here’s a scenario that might sound familiar.

You’ve added a new feature to the firmware, about 800 lines of code, along with some simple unit tests. All unit tests are passing. The code coverage report is higher than ever. You pat yourself on the back for a job well done.

But then – for a brief moment, conviction wavers as a voice in your head whispers: “The mocks aren’t real”; “Does this code really work?”; “You forgot to test OTA.”

But the moment passes, and your faith is restored as you admire the elegant green CI check mark ✅.

Do you trust your unit tests enough to deploy the firmware right now to thousands of real devices via Over-the-Air (OTA) firmware update?

If the answer is “well, no, not really, we still need to test a few things with the real hardware”, then automated Hardware-in-the-Loop (HIL) testing might give you the confidence to upgrade that answer to “heck yeah, let’s ship it!”. 🚀

In this post, I’ll explain what HIL testing is, and why we use it at Golioth to continuously verify the firmware for our Zephyr and ESP-IDF SDKs.

HIL Testing: A Definition

If you search online, you’ll find a lot of definitions of what HIL testing is, but I’ll try to formulate it in my own simple terms.

There are three major types of tests:

  • Unit – the smallest kind of test, often covering just one function in isolation
  • Integration – testing multiple functions or multiple subsystems together
  • System – testing the entire system, fully integrated

A firmware test can run in three places:

  • On your host machine – Running a binary compiled for your system
  • On emulated target hardware – Something like QEMU or Renode
  • On real target hardware – Running cross-compiled code for the target, like an ARM or RISC-V processor.

A HIL test is a system test that runs on real target hardware, with real peripherals, and real connections. Typically, a HIL setup will consist of:

  • A board running your firmware (the code under test)
  • A host test machine, executing the tests
  • (optional) Other peripherals and test equipment to simulate real-world conditions, controlled by the host test machine

Coverage and Confidence

Why bother with HIL tests? Aren’t unit tests good enough?

To address this, I’ll introduce two terms:

  • Coverage: How much of the firmware is covered by tests?
  • Confidence: How confident are you that the firmware behaves as intended?

Coverage is easily measured by utilizing open-source tools like gcov. It’s an essential part of any test strategy. Confidence is not as easily measured. It’s often a gut feeling.

It’s possible to have 100% coverage and 0% confidence. You can write tests that technically execute every line of code and branch combination without testing any intended functionality. Similarly, you can have 0% coverage and 100% confidence, but if you encounter someone like this, they are, to put it bluntly, delusional.

Unit tests are great at increasing coverage. The code under test is isolated from the rest of the system, and you can get into all the dark corners. “Mocks” are used to represent adjacent parts of the system, usually in a simplistic way. This might be a small piece of code that simulates the response over a UART. But this means is you can’t be 100% confident the code will work the same way in the context of the full system.

HIL system tests are great at increasing confidence, as they demonstrate that the code is functional in the real world. However, testing things at this level makes it difficult to cover the dark corners of code. The test is limited to the external interfaces of the system (ie. the UART you had a mock for in the above example).

A test strategy involving both unit tests and HIL tests will give you high coverage and high confidence.

Continuously Verifying Golioth SDK Firmware

Why do we use HIL testing at Golioth?

We have the concept of Continuously Verified Boards, which are boards that get first-class support from us. We test these boards on every SDK release across all Golioth services.

Previously, these boards were manually tested each release, but there were several problems we encountered:

  • It was time-consuming
  • It was error-prone
  • It did not scale well as we added boards
  • OTA testing was difficult, with lots of clicking and manual steps involved
  • The SDK release process was painful
  • We were unsure whether there had been regressions between releases

We’ve alleviated all of these problems by introducing HIL tests that automatically run in pull requests and merges to the main branch. These are integrated into GitHub Actions (Continuous Integration, CI), and the HIL tests must pass for new code to merge to main.

Here’s the output of the HIL test running in CI on a commit that merged to main recently in the ESP-IDF SDK:

HIL Test Running in CI

This is just the first part of the test where the ESP32 firmware is being flashed.

Further down in the test, you can see we’re running unit tests on hardware:

Automated test output

And we even have an automated OTA test:

Automated OTA Test Output

As a developer of this firmware, I can tell you that I sleep much better at night knowing that these tests pass on the real hardware. In IoT devices, OTA firmware update is one of the most important features to get right (if not the most important), so testing it automatically has been a big win for us.

HIL on a Budget

Designing a prototype HIL system does not need to be a huge up-front investment.

At Golioth, one engineer was able to build a prototype HIL within one week, due primarily to our simple hardware configuration (each tested board just needs power, serial, and WiFi connection) and GitHub Actions Self-Hosted Runners, which completely solves the infrastructure and DevOps part of the HIL system.

The prototype design looks something like this:

 

I’m using my personal Raspberry Pi as the self-hosted runner. GitHub makes it easy to register any kind of computer (Linux, MacOS, or Windows) as a self-hosted runner.

And here’s a pic of my HIL setup, tucked away in a corner and rarely touched or thought about:

My DIY HIL Setup

Next Time

That’s it for part 1! We’ve covered the “why” of HIL testing at Golioth and a little bit of the “how”.

In the next part, we’ll dive deeper into the steps we took to create the prototype HIL system, the software involved, and some of the lessons learned along the way.

 

Thanks to Michael for the photo of Tiger and Turtle