Golioth continuously checks real hardware targets to ensure our new features and code fixes work across a wide swath of devices. The last time I checked, we’re running 541 hardware-in-the-loop tests on each pull request to our SDK. This includes sample and integration tests for the Golioth Firmware SDK on 8 hardware devices (3 ESP-IDF and 5 Zephyr), two Zephyr native simulator devices, and on Linux. Understanding which tests failed, and finding out why, became almost impossible. So we added an open source tool to visualize each test and deliver historic insight as well. Today I’ll walk through how we use Allure Report to understand continuous integration tests.

Allure Report is a test reporting tool

Allure Report summary page for 541 test cases

Allure Report gathers files generated during each CI test, using them to generate an HTML site that visualizes the outcomes of every test. Better yet, error output and logs are available at the click of a button and the history of a test can be viewed to see if something is frequently failing.

Since the Golioth Firmware SDK is open source, you can apply our approach to automatic Allure Report generation in your own CI.

Why are our tests failing frequently enough to need a reporting tool?

Great question. Hardware in the Loop (HIL) testing for Internet of Things (IoT) devices is a gauntlet of technologies. We’re testing Cellular, Ethernet, and WiFi, any of which can have connectivity issues. The range of hardware we’re testing relies on different programmers and flashing tools, and network latency is an ever-changing variable.

But the real reason to have a tool is to help clarify if a test failed due to one of the reasons above, or because of a legitimate bug that only rarely occurs. This could be an issue with the Firmware SDK, or something that changed on Golioth’s cloud platform. We want to know right away, and we want to be able to track how often we’re seeing the same types of failures.

Add Allure Report file generation to a test

All Golioth CI testing uses pytest, which has built-in support for Allure Report generation. Many other test frameworks are supported as well, so if you’re running different test automation, chances are you can still use Allure Report.

pip install allure-pytest

With the pytest plugin installed, add a flag to your pytest command that specifies an output directory for the generated report files.

pytest path/to/your/test --alluredir=your-report-output-directory

That’s it. When you run this test, the Allure Report files will be generated.

Manually generate the Allure Report Website

Viewing test output locally is quite simple. Follow the Allure Report install instructions for your OS and then just run the serve command to show the test output:

allure serve /path/to/your-report-output-directory

A live server will open up in your web browser to visualize the report files you provided. This is a great preview to start poking around, but it’s missing one of the best features which is the ability to view report history. For that, you should add report history to CI so everything is handled automatically.
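If you’d rather produce a static copy of the site than run a live server, the generate command writes the HTML to a directory of your choosing:

allure generate /path/to/your-report-output-directory -o allure-report-site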

Add Allure Report to CI

Just as before, there are two parts to using Allure Report: generating the files and collecting those files into a website.

Generating report files in CI

As mentioned above, generating the report files is as simple as adding a flag to your pytest (or other test framework) command in your GitHub workflow. However, we run our tests in parallel, using at least 5 self-hosted runners and numerous GitHub-hosted runners. All of these generate report files, and we want them all in one place. To do that, we take the following approach:

  1. Individual tests all upload report files as artifacts with a common prefix in the name
  2. A separate job collects reports once all tests are complete

steps:
  - name: Run tests
    run: |
      pytest path/to/your/test --alluredir=allure-reports
  - name: Safe upload Allure reports
    if: success() || failure()
    uses: actions/upload-artifact@v4
    with:
      name: allure-reports-samples-zephyr-${{ matrix.platform }}
      path: allure-reports

To capture files from each test job, use the upload artifact action. Here the files are stored in a directory named allure-reports; the if: success() || failure() condition ensures reports are uploaded even when a test fails.

merge_reports:
  runs-on: ubuntu-latest
  needs:
    - hil_sample_esp-idf
    - hil_sample_zephyr
    - hil_sample_zephyr_nsim
    - hil_test_esp-idf
    - hil_test_linux
    - hil_test_zephyr
  if: always()
  steps:
  - name: Gather reports
    uses: actions/download-artifact@v4
    with:
      path: reports/allure-reports
      pattern: allure-reports-*
      merge-multiple: true
  - name: Upload reports
    uses: actions/upload-artifact@v4
    with:
      name: allure-reports-alltest
      path: reports/allure-reports

The collection job should add each job that generates reports to the needs list. This causes the job to wait for the others to complete. It uses the download artifact action to download all reports and merge them into a single reports/allure-reports directory.

That collection is then uploaded using allure-reports-alltest as the artifact name. The upload step stores these artifacts so you can optionally download them from the GitHub Actions summary page and view them locally. However, we also want to automatically generate and host an Allure Report site.

Publishing the Allure Report site using CI

There are two GitHub Actions useful for publishing your Allure Report site: simple-elf/allure-report-action generates the HTML site (with history), and peaceiris/actions-gh-pages deploys it.

Publishing to GitHub Pages is easy, but remember that anything shown in your test logs will be public on that page. We scrub our logs of all secrets before uploading the artifacts (that’s a story for a different day).

steps:
  - name: Get Allure history
    uses: actions/checkout@v4
    if: always()
    continue-on-error: true
    with:
      ref: gh-pages
      path: gh-pages

  - name: Generate new Allure Report site
    uses: simple-elf/allure-report-action@v1
    if: always()
    id: allure-report
    with:
      allure_results: reports/allure-reports
      gh_pages: gh-pages
      allure_report: allure-report
      allure_history: allure-history

  - name: Deploy report to Github Pages
    if: always()
    uses: peaceiris/actions-gh-pages@v4
    env:
      PERSONAL_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      PUBLISH_BRANCH: gh-pages
      PUBLISH_DIR: allure-history

The steps above have the following effect:

  1. Check out the last time you published the Allure Report site (from the gh-pages branch) so that the history is available for the newly generated page.
  2. Generate the report using the downloaded/merged artifacts in the reports/allure-reports directory from the previous section (combine all of these steps together). The history that we checked out in this branch will be used in this build step.
  3. Publish the changes to GitHub Pages.

You can now go to your-org-name.github.io/your-repo-name to see the report. For instance, the last twenty Golioth merges and scheduled tests on the main branch are found at https://golioth.github.io/allure-reports.

Customizations: Suite names, tags, parameters

We depend heavily on Allure Report, so we’ve made a number of customizations to better organize the generated reports. The most important is assigning a dynamic parameter for the board name and platform so that we can differentiate the tests:

import allure
import pytest

@pytest.hookimpl(wrapper=True)
def pytest_runtest_setup(item):
    board_name = item.config.getoption("--allure-board") or item.config.getoption("--board")
    platform_name = item.config.getoption("--platform")
    suitename = item.config.getoption("--custom-suitename") or "hil"
    runner_name = item.config.getoption("--runner-name")

    allure.dynamic.tag(board_name)
    allure.dynamic.tag(platform_name)
    allure.dynamic.parameter("board_name", board_name)
    allure.dynamic.parameter("platform_name", platform_name)
    allure.dynamic.parent_suite(f"{suitename}.{platform_name}.{board_name}")

    if runner_name is not None:
        allure.dynamic.tag(runner_name)

    yield

We are also publishing our GitHub Pages site to a different repo. The report we generate separates the main branch from all other branches so that we don’t confuse PR test failures with merge and scheduled tests on main.
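Publishing to a separate repository is mostly a matter of pointing the deploy action at it. With peaceiris/actions-gh-pages that looks roughly like the sketch below, where the token secret and repository name are placeholders for your own:

- name: Deploy report to a dedicated reports repo
  uses: peaceiris/actions-gh-pages@v4
  with:
    personal_token: ${{ secrets.REPORTS_REPO_TOKEN }}
    external_repository: your-org/allure-reports
    publish_branch: gh-pages
    publish_dir: allure-history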

To see the current state of our Allure Report generation, browse the workflow files in the Golioth Firmware SDK repository.

We care a lot about software and firmware quality here at Golioth, and we want our users to feel confident that they are using an SDK that is reliable and tightly coupled to the cloud services that enable the underlying IoT devices. If you decide to follow in our footsteps, we hope this delivers the same clear understanding of your CI runs that it has for Golioth!

Hardware developers have missed out on the benefits of Docker and similar localized container solutions for one big reason: USB. Today we’re seeing how devices can start to reliably connect from the host system to the container.

I have a career-long passion (borderline obsession) with developer tools. I want them to be great and search for ways to make them better for every developer. All steps in the program / test / debug loop require access to devices via USB. For some time, Docker has supported USB device access via --device on Linux, but left Windows and macOS users in the lurch. As of Docker Desktop 4.35.0, it is now possible to access USB devices on Windows and macOS as well! It’s early days and caveats abound, but we can genuinely say Docker is finally starting to become a useful tool for hardware developers.

Why USB in Docker Matters

Working with USB devices inside containers adds a bit of complexity compared to native tools. Containers add a variable to the process and are yet another tool to manage. But there’s real value in using them for development: namely reproducibility and environment isolation. It’s also worth noting that USB device access isn’t only relevant to hardware devs; anyone using a USB-based peripheral like a MIDI controller knows the struggle.

On Linux, Docker can take advantage of native OS features to allow containerized processes to directly interact with USB devices. However, on Windows and macOS, Docker runs within a virtual machine (VM), meaning it doesn’t have direct access to the host’s hardware in the same way Linux does.

USB Limitations for macOS and Windows Users

For years, Windows and macOS users have faced a clunky workaround:

  • Running Docker inside a Linux VM
  • Passing USB devices through to that VM
  • Passing them into Docker containers.

While this approach works, it is resource-intensive, slow, and far from seamless. Many developers voiced frustration over this in GitHub issues, such as this long-running thread that discusses the challenges and demands for native USB passthrough support.

In reality, most hardware developers don’t end up leveraging Docker for local development because of the lack of USB access, relegating containers to CI/CD and test automation.

A Solution: USB/IP

USB/IP is an open-source protocol, included in the Linux kernel, that allows USB devices to be shared over a network. By using a USB/IP server on the host system, USB devices can be exposed to clients (such as remote servers or containers) over the network. The client, running inside a Docker container, can access and interact with the device as though it were directly connected to the container.

USB/IP architecture diagram

Via https://usbip.sourceforge.net/

In theory, enabling USB/IP for Docker would allow USB devices to be accessed from within containers, bypassing the need for complex VM setups. This would simplify the experience, offering access to USB devices without the overhead of virtualization.

Making It Happen: Collaborating with Docker

While USB/IP provided the perfect solution for Docker’s USB problem, Docker hadn’t yet implemented USB/IP as part of its official features.

In February of this year, I reached out to the Docker team to see if it would be possible to integrate USB/IP into Docker Desktop. I got connected to Piotr Stankiewicz, a senior software engineer at Docker, to discuss the possibility of adding support for USB/IP. Piotr was immediately interested, and after a few discussions, we began collaborating on requirements.

Finally, after several months of work by Piotr and testing by me, Docker Desktop 4.35.0 was released, featuring support for USB/IP.

Key Concepts Before We Begin

Before diving into the steps for enabling USB device access, it’s important to understand the basic configuration. The key idea is that USB devices are passed through from the host system into a shared context within Docker. This means that every container on your machine could theoretically access the USB device. While this is great for development, be cautious on untrusted machines, as it could expose your USB devices to potentially malicious containers.

Setting Up a Persistent Device Manager Container

One of the best ways to manage this is by setting up a lightweight “device manager” container, which I’ll refer to as devmgr. This container will act as a persistent context for accessing USB devices, so you don’t have to repeat the setup every time you want to develop code in a new container. By keeping devmgr running, you can ensure that the USB device is always available to other containers as needed.

I’ve created a simple image (source) and pushed it to Docker Hub for convenience, though you can use your own image as a device manager.

Serving up USB

Next we need to figure out how to expose USB devices from the host machine, aka “the server” in USB/IP terms. Windows already has a mature USB/IP server you may not have known about: it’s called usbipd-win and is what enables USB access in WSL (which I previously wrote about). Unfortunately, macOS doesn’t have a complete server, but I was able to use this Python library to program a development board (I also tried this Rust library, but it’s missing support for USB-to-serial devices). Developing a USB/IP server for macOS is certainly one area where the community can contribute!

With the pre-reqs out of the way, we can get going.

Getting Started with Windows

1. Install prerequisites

Make sure you install Docker, usbipd-win and any device drivers your USB devices might need.

2. Select a USB device to share

Open an admin PowerShell and list the available USB devices. Note the Bus ID, as we’ll use it to identify the device next.

usbipd list

3. Share a USB device via USB/IP

Binding a device to the USB/IP server makes it available to Docker and other clients. You can confirm the device is shared by calling list again.

usbipd bind -b <BUSID>

Here’s an example of the output from my machine:

PS C:\Users\jberi> usbipd list
Connected:
BUSID   VID:PID   DEVICE                                                      STATE
2-1     10c4:ee60 CP2102 USB to UART Bridge Controller                        Not shared
2-3     06cb:00bd Synpatics UWP WBDI                                          Not shared
2-4     13d3:5406 Integrated Camera, Integrated IR Camera, Camera DFU Device  Not shared
2-10    8087:0032 Intel(R) Wireless Bluetooth(R)                              Not shared

Persisted:
GUID                                DEVICE

PS C:\Users\jberi> usbipd bind -b 2-1
PS C:\Users\jberi> usbipd list
Connected:
BUSID   VID:PID   DEVICE                                                      STATE
2-1     10c4:ee60 CP2102 USB to UART Bridge Controller                        Shared
2-3     06cb:00bd Synpatics UWP WBDI                                          Not shared
2-4     13d3:5406 Integrated Camera, Integrated IR Camera, Camera DFU Device  Not shared
2-10    8087:0032 Intel(R) Wireless Bluetooth(R)                              Not shared

Persisted:
GUID                                DEVICE

4. Create a container to centrally manage devices

Start a devmgr container with the appropriate flags (--privileged --pid=host are key.)

docker run --rm -it --privileged --pid=host jonathanberi/devmgr

5. See which devices are available to Docker

In the devmgr container, we can list all the devices that the USB/IP client can grab (USB/IP calls this attaching). One important note: we need to use host.docker.internal as the remote address to make the Docker magic happen.

usbip list -r host.docker.internal

6. Attach a device to Docker

Now we want to attach the device shared from the host. Make sure to use the <BUSID> from the previous step.

usbip attach -r host.docker.internal -b <BUSID>

7. Verify that the device was attached

USB/IP can confirm the operation but since you now have a real-but-virtual device, ls /dev/tty* or lsusb should also work. You’ll need the /dev name for the next step anyway.

usbip port

Here’s another example of the output from my machine:

b6b86f127561:/# usbip list -r host.docker.internal
usbip: error: failed to open /usr/share/hwdata//usb.ids
Exportable USB devices
======================
 - host.docker.internal
        2-1: unknown vendor : unknown prodcut (239a:0029)
           : USB\VID_239A&PID_))29\4445FEEF71F2274A
           : unknown class / unknown subclass / unknown protocol (ef/02/01)
           :  0 - unknown class / unknown subclass / unknown protocol (02/02/00)
           :  1 - unknown class / unknown subclass / unknown protocol (0a/00/00)
           :  2 - unknown class / unknown subclass / unknown protocol (08/06/50)

b6b86f127561:/# usbip attach -r host.docker.internal -b 2-1
b6b86f127561:/# usbip port
usbip: error: failed to open /usr/share/hwdata//usb.ids
Imported USB devices
======================
Port 00: <Port in Use> at Full Speed(12Mbps)
       unknown vendor : unknown prodcut (239a:0029)
       1-1 -> usbip://host.docker.internal:3240/2-1
           -> remote bus/dev 002/001
b6b86f127561:/# lsusb
Bus 001 Device 001: ID 1d6b:0002
Bus 001 Device 009: ID 239a:0029
Bus 002 Device 001: ID 1d6b:0003
b6b86f127561:/# ls /dev/ttyA*
/dev/ttyACM0

8. Use your newly-attached USB device

The moment of truth! We can finally use our USB device in a development container. From the Docker perspective it is a real USB device so you can use it to flash, debug, and twiddle bits. You can, of course, use it for non-development things like hooking up a MIDI* controller – but hardware is more fun, right?

There are unlimited configurations here, so I’ll just give an example of how I’d flash a Zephyr application using an image from github.com/embeddedcontainers. Note that we’re passing the /dev path with --device, like we would on a Linux host.

docker run --rm -it -v ${PWD}:/workdir --device "/dev/ttyACM0" ghcr.io/embeddedcontainers/zephyr:arm-0.17.0SDK

Watching west flash work for the first time felt like ✨magic✨.

* One caveat with USB/IP is that Docker may need to enable device drivers for your particular hardware. Many popular USB to Serial chips are already enabled. File an issue on github.com/docker/for-win to request additional drivers or for any other related issues.

Getting Started with macOS

The steps for macOS are nearly identical to Windows except we need to use a different USB/IP server. I’ll use one that I know works with a CP2102. It’s still flaky and only partially implemented, so we need the community to pitch in!

git clone https://github.com/tumayt/pyusbip
cd pyusbip
python3 -m venv .venv
source .venv/bin/activate
pip install libusb1
python pyusbip

Watch as we flash and monitor an ESP32 running Zephyr in Docker!

As before, Docker may need to enable device drivers for your hardware, so file any issue at github.com/docker/for-mac.

Limitations to the USB/IP approach

Obviously the glaring issue here is the UX. The manual setup is clunky, and each time a device reboots it may need to be re-attached (auto-attach works pretty well in usbipd-win, though!). Also, we need a proper USB/IP server implementation for macOS. Lastly, device support may be limited based on driver availability.

However, I’m optimistic that these challenges can be overcome and that we will see solutions in short order (Open Source, ftw 💪.)

Conclusion

The ability to access USB devices within Docker on both Windows and macOS is a big leap forward for developers, hardware enthusiasts, and anyone working with USB peripherals. While there are still some setup hurdles and device limitations to overcome, this feature has the potential to streamline development and testing processes.

Give it a try and see how it works for you. Just keep in mind that it’s still early days for this feature, so you may run into some bumps along the way. Share your feedback on our forum, and let’s help improve the developer experience of hardware developers everywhere!

Jonathan Beri is the founder and CEO of Golioth, an IoT SaaS platform making it easier for embedded devices to connect to the internet and do all the things that large scale computing enables. He can be found most nights and weekends tinkering with containers and dev boards.

There is no substitute for testing firmware on real target hardware, but manual testing is both error-prone and time consuming. Today, I’ll walk you through the process of using pytest to automate your hardware testing. There are three main building blocks to this approach:

1. Compile the firmware you want to test

2. Use a pytest fixture to program the firmware onto the device and monitor the output

3. Run the tests to verify the device output

Pytest is a framework for — you guessed it — testing! When working with embedded hardware you need to spend some time setting up fixtures that connect it to pytest. Once that’s in place, you’ll be surprised at how fast you can write tests for your firmware projects. Let’s dive in.

This approach works for any firmware platform

In today’s example, the only platform-specific portion is a program() function that needs to know the commands used to flash firmware on your device. But this is easy to adapt for any platform. At Golioth we take advantage of this since the Golioth Firmware SDK tests hardware on two different RTOSes, using silicon from a handful of different vendors.

For this post I’m targeting a Nordic nRF52840 using Zephyr. However, the approach depends purely on pytest and does not use Twister (Zephyr’s dedicated test runner). We do use Twister for our Zephyr sample testing, but those are only a portion of our hardware-in-the-loop tests. We’ll publish a separate post detailing that process in the future.

Today’s demo involves adding a subfolder to your firmware project named pytest, and creating two files inside:

  • pytest/conftest.py
  • pytest/test_hello_world.py

A code walk-through follows, with the full source available as a gist.

Step 1: Compile your firmware

We’ll be testing firmware by monitoring output from a serial connection. Let’s start with a “Hello World” application. The hello_world sample in the Zephyr tree simply prints out “Hello World! <name of your board>” which is perfect for this demonstration.

west build -b nrf52840dk_nrf52840 . -p

The above build command generates a binary at build/zephyr/zephyr.hex. We do not need to flash it to the device; we’ll handle that from pytest.

Step 2: Write a pytest fixture for this board

In this section we’ll populate a conftest.py file that can be reused by multiple tests.

Install dependencies

Make sure you have Python installed and then use pip to install the libraries needed for this project. We’ll be using the AnyIO and Trio libraries for asynchronous support, and Python will need access to the serial port. We also want a Python library for programming the target chip:

pip install pytest anyio trio pyserial

# This one is used to program Nordic devices
pip install pynrfjprog

Now add the necessary imports to the top of your conftest.py file:

import pytest
import re
import serial
from time import time

# Used to flash binary to nrf52840dk
from pynrfjprog import LowLevel

@pytest.fixture(scope='session')
def anyio_backend():
    return 'trio'

The final block is a special pytest fixture that tells AnyIO to use the Trio backend.

Setting up command line arguments

While not strictly necessary for a one-off test, adding command line arguments makes your test easier to run using continuous integration (CI) tools like GitHub Actions.

def pytest_addoption(parser):
    parser.addoption("--port",
            help="The port to which the device is attached (eg: /dev/ttyACM0)")
    parser.addoption("--baud", type=int, default=115200,
            help="Serial port baud rate (default: 115200)")
    parser.addoption("--fw-image", type=str,
            help="Firmware binary to program to device")
    parser.addoption("--serial-number", type=str,
            help="Serial number to identify on-board debugger")

@pytest.fixture(scope="session")
def port(request):
    return request.config.getoption("--port")

@pytest.fixture(scope="session")
def baud(request):
    return request.config.getoption("--baud")

@pytest.fixture(scope="session")
def fw_image(request):
    return request.config.getoption("--fw-image")

@pytest.fixture(scope="session")
def serial_number(request):
    return request.config.getoption("--serial-number")

Above we’ve used a special pytest_addoption() function to add command line flags for port, baud, firmware filename, and programmer serial number. A fixture is added to return each of these so they are available to other fixtures (and in the tests themselves).

Create a class for your board

We want to create a class to represent the device under test. Golioth has an entire directory full of board class definitions for different vendors which we use in our automated testing. For this example we really just need a way to program the board and monitor its serial output.

class Board():
    def __init__(self, port, baud, fw_image, serial_number):
        self.port = port
        self.baud = baud
        self.fw_image = fw_image
        self.serial_number = serial_number

        #program firmware
        self.program(fw_image)

        self.serial_device = serial.Serial(port, self.baud, timeout=1, write_timeout=1)

    def program(self, fw_image):
        with LowLevel.API() as api:
            api.connect_to_emu_with_snr(int(self.serial_number))
            api.erase_all()
            api.program_file(self.fw_image)
            api.sys_reset()
            api.go()
            api.close()

    def wait_for_regex_in_line(self, regex, timeout_s=20, log=True):
        start_time = time()
        while True:
            self.serial_device.timeout=timeout_s
            line = self.serial_device.read_until().decode('utf-8', errors='replace').replace("\r\n", "")
            if line != "" and log:
                print(line)
            if time() - start_time > timeout_s:
                raise RuntimeError('Timeout')
            regex_search = re.search(regex, line)
            if regex_search:
                return regex_search

The Board class implements a program() function specific to flashing firmware onto this device. You will want to replace this function for your own target hardware. Note that when the class is instantiated, __init__() calls program(), flashing the firmware onto the board at the start of the test suite.
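For instance, to adapt this class to an ESP32 target, program() could shell out to esptool instead of using pynrfjprog. Here’s a minimal sketch, assuming esptool is installed and that 0x0 is the right flash offset for your image:

import subprocess

class Esp32Board(Board):
    def program(self, fw_image):
        # Flash over the same serial port we later monitor; adjust the baud
        # rate and offset for your chip and image layout.
        subprocess.run(
            ["esptool.py", "--port", self.port, "--baud", "460800",
             "write_flash", "0x0", fw_image],
            check=True,
        )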

The Board class also implements a wait_for_regex_in_line() function that is a fancy way to match lines printed in the serial terminal. This should be transferable to any board that prints to serial (in our case, via a USB connection). This function includes a timeout feature, which means your test will not wait forever when a device is misbehaving.

@pytest.fixture(scope="session")
def board(port, baud, fw_image, serial_number):
    return Board(port, baud, fw_image, serial_number)

The final piece of the puzzle is a fixture that makes the Board available to your test. The session scope ensures that your board will only be instantiated once per test-run.

Step 3: Write your tests

The hard work is behind us; this step is simple by comparison. Create a file prefixed with the word “test” to host your tests. We’ll call this test_hello_world.py.

import pytest

pytestmark = pytest.mark.anyio

async def test_hello(board):
    board.wait_for_regex_in_line("Hello World")

We begin by importing pytest, then using the special pytestmark directive to indicate we want to use AnyIO for asynchronous functions. Each function that is declared with the test_ prefix in the function name will be automatically run by pytest and individually reported with a pass/fail.

Notice in this case that we are using a regex to match “Hello World” even though the full message received will be “Hello World! nrf52840dk_nrf52840”.
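Since wait_for_regex_in_line() returns the re.Match object, a test can also capture part of the output for further assertions. Here’s a quick sketch (a hypothetical extra test in the same file):

async def test_hello_board_name(board):
    # Capture the board name that follows "Hello World!" and verify it
    result = board.wait_for_regex_in_line(r"Hello World! (\S+)")
    assert result.group(1) == "nrf52840dk_nrf52840"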

Running the test

Let’s run this test, remembering to supply the necessary arguments for the board programming and serial output to work:

➜ pytest pytest/test_hello_world.py --port /dev/ttyACM0 --baud 115200 --fw-image build/zephyr/zephyr.hex --serial-number 1050266122

Pytest includes colorful output with a summary of the tests:

Black terminal screen showing the green success messages from a pytest run.

You don’t get a lot of output for a successful test. But if you change the matched string to something unexpected, you should get a timeout error. You can see that warnings and errors cause more information to be printed. Note the angle bracket that indicates the assert that caused the failure. The error message is printed further down in red, followed by the actual output received from the board:

Black terminal screen shows an error output several lines long indicating a timeout occurred during a test.

While this example includes just a single test that watches for output, you can grow this to many tests that interact with the board. For instance, the Golioth RPC service tests implement about a dozen tests that prompt the target board to react by sending remote procedure calls from the Golioth cloud via our REST API, verifying the output from each.

Automating the tests

We have already automated this hello_world test. But we’re still running it manually on the command line. You can use GitHub self-hosted runners to connect your own devices to GitHub Actions for true automation.

Golioth has built extensive hardware-in-the-loop automation that does just this. You can check out our workflows to get a better picture of how this works. The step that calls pytest should include syntax you recognize, pointing to the test file and passing the configuration in as command line arguments:

- name: Run test
  shell: bash
  env:
    hil_board: ${{ inputs.hil_board }}
  run: |
    source /opt/credentials/runner_env.sh
    PORT_VAR=CI_${hil_board^^}_PORT
    SNR_VAR=CI_${hil_board^^}_SNR
    for test in `ls tests/hil/tests`
    do
      pytest --rootdir . tests/hil/tests/$test                            \
        --board ${hil_board}                                              \
        --port ${!PORT_VAR}                                               \
        --fw-image ${test}-${{ inputs.binary_name }}                      \
        --serial-number ${!SNR_VAR}                                       \
        --api-url ${{ inputs.api-url }}                                   \
        --api-key ${{ secrets[inputs.api-key-id] }}                       \
        --wifi-ssid ${{ secrets[format('{0}_WIFI_SSID', runner.name)] }}  \
        --wifi-psk ${{ secrets[format('{0}_WIFI_PSK', runner.name)] }}    \
        --mask-secrets                                                    \
        --timeout=600
    done


Almost two years ago I wrote a guide on how to interface with USB devices from WSL2, because many of our users were developing on Windows but wanted to use Linux-native tools for projects like Zephyr. I’ve heard from countless devs thanking me for the guide, but with one wish: a Graphical User Interface (GUI) to simplify the process of managing USB devices. The most common alternative for running Linux tools on Windows is a full-blown VM like VirtualBox, which has a nice UI; usbipd-win is certainly no GUI.

USB Settings in VirtualBox

I recently had to set up a new Windows machine and decided to see if there have been any enhancements to the USB support in WSL2. I was pleasantly surprised to find not one but two GUIs! The rest of this post will walk you through the two options, but assumes you have read the original blog.

WSL USB Manager

The first tool I found is a nice Python-based desktop GUI hosted over on GitLab. Installation is a breeze thanks to the MSI installer included in official releases. It delivers on the fundamentals of being a USB manager – list, view state, attach/detach – but it also adds some niceties.

screenshot of wsl-usb-gui

Source: https://gitlab.com/alelec/wsl-usb-gui

For example, usbipd-win has added an auto-attach feature since my original article, which can be very handy when your embedded device reboots and needs to be re-attached. However, it can be tricky to set up from the command line. WSL USB Manager lets you define a profile right from the UI that handles auto-attach more smoothly. Another neat thing it does is automatically create the correct udev rule for you, so you have one less permission issue to deal with.
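For reference, the command-line equivalent looks something like this in recent usbipd-win releases (check usbipd attach --help for your version, since the syntax has changed over time):

usbipd attach --wsl --busid 2-1 --auto-attach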

USBIP Connect for VS Code

Given the popularity of Visual Studio Code, it’s no surprise someone created USBIP Connect. Setup is simple, but there are a few important details to get it working correctly.

First, you need to install the official WSL extension and connect to WSL. Install USBIP Connect from the marketplace only after you’re connected, otherwise things won’t work. You’ll know you’ve done the setup correctly if you see USBIP Connect listed under the “WSL: UBUNTU” list of extensions, not “LOCAL.”

USBIP Connect for VS Code installed in WSL

The extension adds a simple “Attach” and “Detach” button to the status bar and corresponding commands to the Command Palette. Simplicity is the name of the game, which is a plus for a VS Code extension.

The only issue is that permissions are not automated like in WSL USB Manager, so you’ll get an error the first time you try to Attach a device.

USBIP Connect permission error

You’ll either need to use a terminal with Administrator privileges or WSL USB Manager, which is what I did while writing this post.

USB all the things, again!

So there you have it: two different GUIs for managing USB devices with WSL2. Hopefully they help the remaining Windows devs who prefer something graphical over the command line get work done with a little less frustration. But if you run into any issues, please post in our forum.

Golioth is an IoT company that supports as much custom hardware as possible: a multitude of microcontrollers and many different connection types. This presents a challenge when testing on real hardware.

We developed tooling that tests the Golioth Firmware SDK on actual boards. Known as Hardware-in-the-Loop (HIL) testing, it’s an important part of our CI strategy. Our latest development is automatically detecting boards connected to our CI machines, so each self-hosted test runner can recognize what type of hardware is attached to it. In a nutshell, you plug the board into USB and run a workflow to figure out what you just plugged in.

Let’s take a look at how we did that, and why it’s necessary.

Goals and Challenges of Hardware Testing

So the fundamental goals here are:

  1. Run tests on actual hardware
  2. Make the number of tests scalable
  3. Make the number of hardware types scalable

If we were running all of our firmware tests manually, we’d be fine on goal #1. But when scaling and repeatability come into play, we start to look at automation. In particular, we look at making it easy to create more of the self-hosted runners (the computers running the automated test programs).

Adding a new board to one of those runners is no small task. If you build a self-hosted runner today, then next week add two new hardware variations that need testing, how will the runner know how to interface with those boards?

Frequent readers of the Golioth blog will remember that we already set up self-hosted runners, so go back and look at the first post and second post in this series. Those articles detail how we use GitHub’s self-hosted runner infrastructure and workflows to perform Hardware-in-the-Loop tests. I also did a Zephyr tech talk on self-hosted runners.

GitHub workflow output from the board recognition process

To make things more scalable, we developed a workflow that queries all of the hardware connected to a runner. It runs some tests to determine the type of hardware, the port, and the programmer serial number (or other unique ID) of that hardware. This information is written to a yaml file for that runner and made available to all other workflows that need to run on real hardware.

Recognizing Microcontroller Boards Connected Via USB

Self-hosted runners are nothing more than single-board computers running Linux and connected to the internet (and in our case, to GitHub). These runners are in the home offices of my colleagues around the world, but it doesn’t matter where they are. Our goal is that we don’t need to touch the runners after initial setup. If we add new hardware, we just plug in a USB cable and the hardware part of the setup is done.

The first step in recognizing what new hardware has been plugged in is listing all the serial devices. Linux provides a neat little way of doing this by the device ID:

golioth@orangepi3-lts:~$ ls /dev/serial/by-id -1a
.
..
usb-SEGGER_J-Link_000729255509-if00
usb-SEGGER_J-Link_001050266122-if00
usb-SEGGER_J-Link_001050266122-if02
usb-Silicon_Labs_CP2102N_USB_to_UART_Bridge_Controller_4ea3e6704b5fec1187ca2c5f25bfaa52-if00-port0

This listing represents three devices: Nordic, NXP, and Espressif (the two 001050266122 entries are two interfaces of a single J-Link). For now we assume the SiLabs USB-to-UART is an Espressif device because we don’t have any other devices that use that chip. However, there are multiple SEGGER entries, so we need to sort those out. We also need to know which type of ESP32 board is connected.

We use regular expressions to pull out the unique serial numbers from each of these and perform the board identification in the next step.
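Our actual tooling is more involved, but a minimal sketch of that extraction step might look like the following, where the patterns and return values are simplified assumptions:

import re

# J-Link programmers: capture the numeric serial number (first interface only)
JLINK_ID = re.compile(r"usb-SEGGER_J-Link_0*(\d+)-if00$")
# SiLabs CP2102N (assumed Espressif): capture the unique hex identifier
CP210X_ID = re.compile(r"usb-Silicon_Labs_CP2102N.*_([0-9a-f]+)-if00-port0$")

def parse_by_id(name):
    if m := JLINK_ID.search(name):
        return ("jlink", m.group(1))
    if m := CP210X_ID.search(name):
        return ("esp32", m.group(1))
    return None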

Using J-Link to Identify Board Type

Boards that use J-Link include a serial number in their by-id listing that can be used to gather more information. This works great for Nordic development boards. It works less great for NXP boards.

from pynrfjprog import LowLevel
from board_classes import Board

chip2board = {
    # Identifying characteristic from chip: Board name used by Golioth
    LowLevel.DeviceName.NRF9160: "nrf9160dk",
    LowLevel.DeviceName.NRF52840: "nrf52840dk",
    LowLevel.DeviceName.NRF5340: "nrf7002dk"
    }

def get_nrf_device_info(snr):
    with LowLevel.API() as api:
        api.connect_to_emu_with_snr(snr)
        api.connect_to_device()
        api.halt()
        return api.read_device_info()

def nrf_get_device_name(board: Board):
    with LowLevel.API() as api:
        snr_list = api.enum_emu_snr()

    if board.snr not in snr_list:
        return None

    # Initialize up front so an empty device_info or an exception can't
    # leave this variable unbound
    autorecognized = False
    try:
        device_info = get_nrf_device_info(board.snr)
        if device_info:
            for i in device_info:
                if i in chip2board:
                    board.name = chip2board[i]
                    autorecognized = True
    except Exception as e:
        print("Exception in nrf_get_device_name():", e)

    return autorecognized

Nordic provides a Python package for their nrfjprog tool. Its low-level API can read the target device type from the J-Link programmer. Since we already have the serial number included in the port listing, we can use the script above to determine which type of chip is on each board. For now, we are only HIL testing three Nordic boards and they all have different processors which makes it easy to automatically recognize these boards.

An NXP mimxrt1024_evk board is included in our USB output from above. This will be recognized by the script but the low-level API will not be able to determine what target chip is connected to the debugger. For now, we take the port and serial number information and manually add the board name to this device. We will continue to refine this method to increase automation.

Using esptool.py to Guess Board Type

Espressif boards are slightly trickier to identify. We can easily assume which devices are ESP32-based from the by-id path. And esptool.py – Espressif’s software tool for working with these chips – can be used in a roundabout way to guess which board is connected.

import esptool
from board_classes import Board

chip2board = {
    # Identifying characteristic from chip: Board name used by Golioth
    'ESP32': 'esp32_devkitc_wroom',
    'ESP32-C3': 'esp32c3_devkitm',
    'ESP32-D0WD-V3': 'esp32_devkitc_wrover',
    'ESP32-S2': 'esp32s2_saola',
    'ESP32-S3': 'esp32s3_devkitc'
    }

## capture output: https://stackoverflow.com/a/16571630/922013
from io import StringIO 
import sys

class Capturing(list):
    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._stringio = StringIO()
        return self
    def __exit__(self, *args):
        self.extend(self._stringio.getvalue().splitlines())
        del self._stringio    # free up some memory
        sys.stdout = self._stdout
## end snippet

def detect_esp(port):
    try:
        with Capturing() as output:
            esptool.main(["--port", port, "read_mac"])

        for line in output:
            if line.startswith('Chip is'):
                chip = line.split('Chip is ')[1].split(' ')[0]
                if chip in chip2board:
                    return chip2board[chip]
                else:
                    return chip
    except Exception as e:
        print("Error", e)
        return None

def esp_get_device_name(board: Board):
    board.name = detect_esp(board.port)

One of the challenges here is that esptool.py doesn’t include an API to simply return the chip designation. However, the standard output of any command includes that information. So we capture the output of the read_mac command and use simple string operations to extract the chip designation.

golioth@orangepi3-lts:~$ esptool.py read_mac
esptool.py v4.6.2
Found 4 serial ports
Serial port /dev/ttyUSB0
Connecting....
Detecting chip type... Unsupported detection protocol, switching and trying again...
Connecting.....
Detecting chip type... ESP32
Chip is ESP32-D0WD-V3 (revision v3.0)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: 90:38:0c:eb:0a:28
Uploading stub...
Running stub...
Stub running...
MAC: 90:38:0c:eb:0a:28
Hard resetting via RTS pin...

While this doesn’t tell us which board is connected, a lookup table of the ESP32 dev boards Golioth commonly tests makes it easy to identify. I feel like there should be a way to simply use the MAC address to determine the chip type (after all, esptool.py is doing this somehow), but I haven’t yet found documentation for that.

Using Board Configuration

Now we know the board name, serial port, and serial number for each connected board. What should be done with that information? That’s the topic of a future post. Generally speaking, we write it to a yaml file in the repository that runs the recognition workflow.

Yaml file showing board name, serial port, and serial number
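Something along these lines, with hypothetical field names and values:

- name: nrf52840dk
  port: /dev/serial/by-id/usb-SEGGER_J-Link_001050266122-if00
  snr: "1050266122"
- name: esp32_devkitc_wrover
  port: /dev/ttyUSB0
  snr: 4ea3e6704b5fec1187ca2c5f25bfaa52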

In a future workflow, that information is written back to the self-hosted runner where it can be used by other workflows. And as part of that process, we assign labels such as has_nrf52840dk so GitHub knows which runners to use for any given test.

Take Golioth for a Test Drive

We do all this testing so that you know the Golioth platform is a rock solid way to manage your IoT fleet. Don’t reinvent the wheel, give Golioth’s free Dev Tier a try and get your fleet up and running fast.

IoT devices are usually battery-operated and, more often than not, need to run on a single battery charge for multiple years. Before we know it, MCU power consumption becomes a huge deal when developing a product. Measuring power consumption of an MCU can be challenging since it does not depend on just one thing. It depends on multiple factors like clock frequency, what is connected to the outputs, which peripherals are enabled, and also the different MCU-specific power modes in use.

With known average current consumption and battery voltage, we can easily calculate the lifetime of a battery charge. Notice the word average; we are interested in the average current consumption over some fixed time period. To maximize battery life, developers must minimize power consumption over the life of the product.

For this blog post, we are going to measure current consumption with the Power Profiler Kit II. We’ll utilize a well-known friend as the target, the nRF9160 DK.

Device Operating Modes

Most battery-powered devices spend much of their time asleep, waking up to perform their functions, and then going back to sleep. For these applications, battery life depends on at least three major aspects of the microcontroller:

  • Run/Active mode: device is using the most power (fetching sensor data, communication with the Cloud, running algorithms, etc.)
  • Standby/Idle mode: the “do-nothing” state (an idle thread, sleeping in a while loop, etc.)
  • Sleep Mode: deepest internal power saving mode the system can enter (System OFF mode in the case of nRF9160 DK)

The “do-nothing” state of the microcontroller often takes many different forms. Other factors matter as well: the start-up time (how long it takes the microcontroller to go from the do-nothing state to running the application) and the application runtime. Current consumption also varies with operating temperature and the schedule of tasks.

Let’s consider an example where a device needs to measure ambient temperature from a BME280 sensor, send the reading over cellular to the cloud, and wait for a fixed time period. This is the gist of what we do with our Cold Chain Asset Tracker. Since ambient temperature is not changing rapidly, we can safely say that the fixed time period (the task schedule) can be 5 minutes, and that taking the measurement and sending data to the cloud takes 5 seconds. This means the device spends 60 times longer in Idle mode than in the Active state (the application duty cycle).

Current, Power and Energy Consumption

It is common for developers to start their processor power analysis by considering active processing power. Though it may seem counterintuitive, the power the microcontroller consumes when not operating is often more important than active processing power. Referring back to the remote sensing application, the system typically wakes up from standby mode once every 5 minutes, so it remains in standby mode more than 98% of the time.
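Written out, the average current over one cycle is:

I_avg = (I_active • t_active + I_standby • t_standby) / (t_active + t_standby)

Plugging in illustrative numbers (not measurements): 50 mA for the 5 second active window and 4 mA for the 300 second standby window gives I_avg = (50 • 5 + 4 • 300) / 305 ≈ 4.8 mA. A 2 Ah battery would then last roughly 2000 / 4.8 ≈ 420 hours, a little over two weeks, which is why shaving standby current matters far more than trimming the active window.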

Power is defined as:

P = I • V [W]

 

where I is the current drawn from the battery, and V is the battery voltage.

Energy is defined as:

Energy = I • V • Time = P • Time [Wh]

So energy is nothing more than power used over a period of time. This can be expressed as [V • Ah], where Ah is the charge stored in the battery (check the back of your power bank 🙂 ).

Lithium-ion batteries are usually 3.7 V, with varying amounts of stored charge (ranging from 100 mAh to 10 Ah or more).

Let’s consider an example. A battery has a rating of 2 Ah at 1.5V (typical AA/AAA battery); the energy stored in the battery is:

E = 2 • 1.5 = 3 [Wh]

If we connect a 1 Ω resistor to the battery (neglecting the internal battery resistance) and use Ohm’s law, it will draw 1.5 A of current. That’s a load of 2.25 W, meaning we’ll run out of battery charge after about 1.3 hours.

Now that we know our device has to take advantage of its low-power modes and other techniques to lower power consumption, how do we measure the current draw? That’s where the Power Profiler Kit II from Nordic comes in.

Power Profiler Kit II

The Power Profiler Kit II (PPK2) is a standalone unit, which can measure and optionally supply currents all the way from sub-μA and as high as 1 A on all Nordic DKs, in addition to external hardware. The PPK2 is powered via a standard 5 V USB cable, which can supply up to 500 mA of current. In order to supply up to 1 A of current, two USB cables are required.

An ampere meter-only mode, as well as a source mode (shown as AMP and source measure unit (SMU) respectively on the PCB) are supported. For the ampere meter mode, an external power supply must source VCC levels between 0.8 and 5 V to the device under test (DUT).

For the source mode, the PPK2 supplies VCC levels between 0.8 and 5 V and the on-board regulator supplies up to 1 A of current to external applications. It is possible to measure low sleep currents, higher active currents, as well as short current peaks on all Nordic DKs, in addition to external hardware.

Connecting the PPK2 to the nRF9160 DK is straightforward and explained on Nordic’s website.

Power Consumption of Golioth’s hello example

For the first example, we are going to measure the current consumption of Golioth’s hello sample without modifications and use the PPK2 in ampere meter mode.

We focus on three areas:

  1. Device connects to the cellular tower, Golioth Cloud, and sends a hello message
  2. Device sends a hello message to Golioth Cloud
  3. Device is in an Idle mode

From the image, the most power-intensive is the first area, where the nRF9160 DK is connecting to the cellular tower, then to Golioth Cloud, and finally, it sends a hello message. Once connected to Golioth, the device is provisioned and does not need to go through the process again (unless the IP address of the device changes or it’s using Connection ID). That’s the reason the power consumption in the second area is lower compared to the first area. Current spikes of ~100 mA come from the modem using its RF circuitry to transmit and receive data from the cellular tower. The third area is when the application is in Idle mode (there is a k_sleep call in the while loop in the sample), and the modem is not in use, with a sustained current consumption of ~35 mA.

For the second example, the hello sample is altered a bit: after five hello messages are sent with a five-second delay between them, the Golioth system client is stopped and the lte_lc_offline API call is made, which sets the device to flight mode, disabling both transmit and receive RF circuits and deactivating LTE and GNSS services.
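A sketch of that altered loop, assuming the nRF Connect SDK link-control API (send_hello() is a placeholder for the sample’s existing hello-message code, and the client-stop call is left as a comment because its name depends on the SDK version):

#include <zephyr/kernel.h>
#include <modem/lte_lc.h>

void send_hello(void); /* placeholder: the sample's existing hello-message code */

void hello_then_sleep(void)
{
    for (int i = 0; i < 5; i++) {
        send_hello();
        k_sleep(K_SECONDS(5));
    }

    /* Stop the Golioth system client here (exact API depends on SDK
     * version), then enter flight mode: disables RF TX/RX and
     * deactivates LTE and GNSS. */
    lte_lc_offline();
}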

There are still current spikes when the modem is in use, but the current drawn in Idle mode (marked with 4) is 10 times smaller (~4 mA), which will prolong the battery life substantially.

Please note: the 4 mA figure above was measured with no optimizations in place and does not represent the low-current capabilities of the nRF9160. We will be doing a follow-up article where we showcase just how low things can go!

 

Conclusion

While we were successful in lowering current consumption by a factor of 10 by disabling the modem, there are still gains to be found! In the next blog post, we’ll write more about Zephyr’s power management subsystem and how to utilize it to reduce current consumption even more!

As embedded developers, we’re consistently seeking ways to make our processes more efficient and our teams more collaborative. The magic ingredient? DevOps. Its origins stem from the need to break down silos between development (Dev) and operations (Ops) teams. DevOps fosters greater collaboration and introduces innovative processes and tools to deliver high-quality software and products more efficiently.

In this talk, we explore how we can bring the benefits of DevOps into the world of IoT. We focus on using GitHub Actions for continuous integration and delivery (CI/CD), while also touching on how physical device operations like shipping and logistics can be streamlined using a DevOps approach.

Understanding the importance of DevOps in IoT is crucial to unlocking efficiencies and streamlining processes across any organization that manages connected devices. This talk, originally given at the 2023 Embedded Online Conference (EOC), serves as one of the many specialized talks freely accessible on the EOC site.

GitHub Actions for IoT

To illustrate how to put these concepts into practice, we’re going to look at a demo using an ESP32 with a feather board and Grove sensors for air quality monitoring. It’s important to note that while we utilize GitHub Actions in this instance, other CI tools like Jenkins or CircleCI can also be effectively used in similar contexts based on your team’s needs and preferences.

For this example we use GitHub Actions to automate the build and deployment process.

The two main components of our GitHub Actions workflow are ‘build’ and ‘deploy’ jobs. The ‘build’ job uses the pre-built GitHub Action for ESP-IDF to compile our code, and is triggered when a new tag is pushed or when a pull request is made. The ‘deploy’ job installs the Golioth CLI, authenticates with Golioth, uploads our firmware artifact, and releases that artifact to a set of devices over-the-air (OTA).

Imagine an organization that manages a fleet of remote air quality monitors across multiple cities. This GitHub Actions workflow triggers the build and deployment process automatically when the development team integrates new features or bug fixes into the main branch and tags the version. The updated firmware is then released and deployed to all connected air quality monitors, regardless of their location, with no additional logistics or manual intervention required. This continuous integration and deployment allows the organization to respond rapidly to changes and ensures that the monitors always operate with the latest updates.

Let’s delve into the GitHub Actions workflow and walk through each stage:

  1. Trigger: The workflow is activated when a new tag is pushed or a pull request is created.
    on:
      push:
        # Publish semver tags as releases.
        tags: [ 'v*.*.*' ]
      pull_request:
        branches: [ main ]
  2. Build: The workflow checks out the repository, builds the firmware using the ESP-IDF GitHub Action, and stores the built firmware artifact.
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
        - name: Checkout repo
          uses: actions/checkout@v3
          with:
            submodules: 'recursive'
        - name: esp-idf build
          uses: espressif/esp-idf-ci-action@v1
          with:
            esp_idf_version: v4.4.4
            target: esp32
            path: './'
          env:
            WIFI_SSID: ${{ secrets.WIFI_SSID }}
            WIFI_PASS: ${{ secrets.WIFI_PASS }}
            PSK_ID: ${{ secrets.PSK_ID }}
            PSK: ${{ secrets.PSK }}
        - name: store built artifact
          uses: actions/upload-artifact@v3
          with:
            name: firmware.bin
            path: build/esp-air-quality-monitor.bin
  3. Deploy: The workflow installs the Golioth CLI, authenticates with Golioth, downloads the built firmware artifact, and uploads it to Golioth for OTA updates (a sketch of this job follows below).
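A minimal sketch of such a deploy job follows. The goliothctl invocations are approximations and the GOLIOTH_API_KEY secret name is a placeholder; check the Golioth CLI documentation for the exact install steps and flags:

    jobs:
      deploy:
        needs: build
        runs-on: ubuntu-latest
        # Only create a release for tagged builds
        if: startsWith(github.ref, 'refs/tags/v')
        steps:
        - name: Download built artifact
          uses: actions/download-artifact@v3
          with:
            name: firmware.bin
        - name: Upload artifact and create release on Golioth
          run: |
            # Approximate goliothctl usage; flags may differ in current releases
            goliothctl login --apiKey ${{ secrets.GOLIOTH_API_KEY }}
            goliothctl dfu artifact create esp-air-quality-monitor.bin --version ${GITHUB_REF_NAME#v}
            goliothctl dfu release create --release-tags ${GITHUB_REF_NAME} --components main@${GITHUB_REF_NAME#v}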

For those eager to dive in and start implementing DevOps in their own IoT development process, we’ve provided an example GitHub Actions workflow file on GitHub. Feel free to fork this repository and use it as a starting point for streamlining your own IoT firmware development process. Remember, the best way to learn is by doing. So get your hands dirty, experiment, iterate, and innovate. If you ever need help or want to share your experiences, please reach out in our community forum.

Golioth collects data from your entire IoT sensor fleet and makes it easy to access from the cloud. Data visualization is a common use case and we love using Grafana to make dashboards for our fleets. It can start to get tricky if your devices are sending back mountains of different data. We’ll demonstrate how you can use Golioth REST API Queries to capture specific data endpoints from devices. By doing so, you can focus on the most important data points and utilize them to create powerful visual dashboards in Grafana.

Don’t Mix Your Endpoints

By far the easiest way to query streaming sensor data is just to ask for all of the events. But this presents an interesting challenge in Grafana. If your query response includes records that don’t have your desired endpoint, Grafana will see those as null values, and most often this will break your graph:

Grafana error screenshot

Grafana showing the “Fields have different lengths” error

This is because the graph is targeting the sensor endpoint, but if we look at Golioth we can see that the device is sending data to the battery endpoint in between sensor readings. The sensor endpoint doesn’t exist for those readings.

Battery and sensor data being received on Golioth's LightDB Stream service

Let’s diagnose and fix this issue using the Query Builder button in the upper right of the LightDB Stream page of the Golioth Console.

Using Query Builder to Filter Data

As it stands now, the query is asking for all data from all events by using the * wildcard operator:

Query Builder getting all endpoints
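
In JSON form, that catch-all query looks something like this (a minimal sketch; the full body object format appears at the end of this section):

{
  "fields": [
    {
      "path": "*"
    }
  ]
}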

What we really want is just one endpoint that we can graph (along with the time and deviceId). So let’s change the query builder to target the current measurement on channel 0 of the incoming sensor data. I’ll also use an alias to make it easy to access the returned value.

Targeting the sensor.cur.ch0 endpoint
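
The fields portion of the query body now reads as follows (this same structure appears in the full body object at the end of this section):

{
  "fields": [
    {
      "path": "time"
    },
    {
      "path": "deviceId"
    },
    {
      "path": "sensor.cur.ch0",
      "alias": "cur0"
    }
  ]
}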

This is a good start. But if you look at the data, we still haven’t solved the problem:

Golioth LightDB endpoints sometimes null

The dashes are the null values returned when we query for sensor data on events that only contain the battery endpoint. The final piece of the puzzle is to add a filter that checks that our desired endpoint is not null.

Filtering out null values on an endpoint

Now Golioth will only send data that includes the value we want to graph:

Correctly filtered sensor value

Now that we are receiving exactly the data we need, we can copy the query from the Query Builder by clicking the JSON button and then the Copy button.

Using the Query in Grafana

We previously set up a REST API data source for Golioth in our Grafana Dashboard. Use the Body tab to enter your query using the new values we created with the Query Builder.

Grafana JSON query

In the Fields tab we’re using the alias that we specified. You could use the full endpoint but as your panels become more intricate you’ll thank yourself for using aliases!

Fields for graphing values in Grafana

For completeness, we’ll leave you with the verbose body object from this Grafana panel. You can see the parts we crafted in the Query Builder make up the query field.

{
  "start": "${__from:date}",
  "end": "${__to:date}",
  "perPage": 99999,
  "query": {
    "fields": [
      {
        "path": "time"
      },
      {
        "path": "deviceId"
      },
      {
        "path": "sensor.cur.ch0",
        "alias": "cur0"
      }
    ],
    "filters": [
      {
        "path": "sensor.cur.ch0",
        "op": "<>",
        "value": "null"
      }
    ]
  }
}

Give Golioth a Try Today!

The proof of concept for your IoT fleet can be up and running in days, not weeks, if you use Golioth as your instant IoT cloud. Combined with visualization tools like Grafana, you’ll have no problem getting to “yes” with your customers (or your boss). Your first 50 devices are free with our Dev Tier, so give Golioth a try today!

When I started at Golioth, I wanted to understand the service offerings from different companies in the space. What resulted was a chart (below), a slideshow (below), and a presentation I have now given a couple of different times (video, above). All of these things showcase how I started to investigate and understand the inter-operating pieces of any modern Internet of Things (IoT) system.

Only 12 Layers?

Let’s start with the obvious: I could have chosen just about any number of layers to explain IoT systems.

I chose to highlight the pieces of IoT systems as someone working on building Reference Designs, which have a Device component and a Cloud component. I framed many of these capabilities from the perspective of my time at Golioth and the aspects of the system that I found to be crucial to a deployment. I wanted to understand the competitive landscape as a starting point (finding other companies providing IoT platforms) and expanded from there.

I stopped at 12 because that is roughly the number I thought I could describe without audience members falling asleep. Twelve layers also capture a good amount of functionality. However, they miss many other pieces of IoT implementation; this very easily could have been the 100 layers of IoT.

The Chart

The true beginning of this talk was a diagram I created. It shows how I understood offerings from different IoT providers, with varying levels of “vertical integration” (how much of the solution they provide to their users). Some providers go all the way from the hardware up to the Cloud. Some provide key infrastructure pieces (like Golioth) and point you at external providers in order to give you more flexibility. Some are hyper specialized in one area.

The key idea was to point out that these providers exist and that there are different reasons to choose one over another. That is to say: each has a valid business proposition and fits a customer profile. One provider might serve an operations group that is looking to add connectivity to an existing business and simply spit data out onto the internet—many businesses want that. Other times, a technology group inside a larger product development company might be looking to supplement their product design capabilities rather than do everything in-house. It’s important to understand the landscape as a design engineer evaluating what to buy and what to DIY. Each layer in the presentation has the pros and cons of “Buy vs DIY”.

Beyond IoT Platform Providers

Unstated and unshown on the chart is the “DIY” of IoT subsystems. As mentioned in the video and slides, it’s possible that companies want to develop everything from the ground up and maintain all of their own IP. The downside is the high cost in terms of people and time, so the “full DIY” method of development is usually reserved for the largest companies looking to build an IoT product.

Others are utilizing building blocks from hyper-scalers like AWS and Azure. Sometimes this takes the form of using hyper-scaler IoT platforms (AWS IoT, Azure IoT), and other times they use even more basic computing elements in the cloud (EC2 instances, S3 buckets, Lambdas). In all these cases, there is a significant amount of “Cloud Software” that is written and maintained by the company looking to develop an IoT product.

How do the 12 Layers of IoT impact you?

These resources exist to help engineers and business people new to the IoT industry understand how to create successful IoT deployments. Most importantly, this talk sought to remedy the problem of “you don’t know what you don’t know” (reference). If you don’t know about potential pitfalls in deploying an IoT solution and what you might need 2 or 3 years down the line, you won’t be able to take steps up front to prevent those problems.

A tangible example is in “layer 10”, which is listed as “deployment flexibility”. If you want to hire an IoT platform to get your deployment off the ground, but later will want to run your own cloud infrastructure, you need to choose different options when creating your system. Platforms like Golioth allow you to run your own cloud infrastructure as part of our Enterprise plan. Companies that don’t choose this path at the beginning of their deployment find themselves re-architecting their entire solution (all the way down to the hardware) at a later time in order to implement a bespoke cloud solution that fits their needs.

The Presentation

Below is a refined version of the talk that I gave at All Things Open in Raleigh NC in November 2022. Unfortunately that version of the talk wasn’t recorded, but the slides below are the most up-to-date version I have.

Did I miss a layer?

I am continually refining my concept of what comprises IoT deployments and the required pieces. It’s possible I missed out on something important. Maybe there are critical pieces of infrastructure that you think I glossed over. We’d love to hear your thoughts on our forum or on social media (Twitter, LinkedIn, Facebook).

The Golioth Zephyr SDK is now 100% easier to use. The new v0.4.0 release was tagged on Wednesday and delivers all of the Golioth features with less coding work needed to use them in your application. Highlights include:

  • Asynchronous function/callback declaration is greatly simplified
    • User-defined data can now be passed to callbacks
  • Synchronous versions of each function were added
  • API is now CoAP agnostic (reply handling happens transparently)
  • User code no longer needs to register an on_message() callback
  • Verified with the latest Zephyr v3.2.0 and NCS v2.1.0

The release also brings many other improvements that make your applications better, even if you never notice them. For the curious, check out the changelog for full details.

Update your code for the new API (or not)

The new API includes some breaking changes; to deal with them you have two main options:

  1. Update legacy code to use the new API
  2. Lock your legacy code to a previous Golioth version

1. Updating legacy code

The Golioth Docs have been updated for the new API, and reading through the Firmware section will give you a great handle on how everything works. The source of truth continues to be the Golioth Zephyr SDK reference (Doxygen).

Updating to the new API is not difficult. I’ve just finished working on that for a number of our hardware demos, including the magtag-demo repository we use for our Developer Training sessions. The structure of your program will remain largely the same, with Golioth function names and parameters being the most noticeable change.

Consider the following code that uses the new API to perform an asynchronous get operation for LightDB State data:

/* The callback function */
static int counter_handler(struct golioth_req_rsp *rsp)
{
    if (rsp->err) {
        LOG_ERR("Failed to receive counter value: %d", rsp->err);
        return rsp->err;
    }

    LOG_HEXDUMP_INF(rsp->data, rsp->len, "Counter (async)");

    return 0;
}

/* Register the LightDB get callback from somewhere in your code */
static int my_function(void)
{
    int err;

    err = golioth_lightdb_get_cb(client, "counter",
                                 GOLIOTH_CONTENT_FORMAT_APP_JSON,
                                 counter_handler, NULL);
    if (err) {
        LOG_WRN("Failed to request counter value: %d", err);
    }

    return err;
}

Previously, the application code would have needed to allocate a coap_reply, pass it as a parameter in the get function call, use the on_message callback to process the reply, then unpack the payload in the reply callback before acting on it. All of that busy work is gone now!

With the move to the v0.4.0 API, we don’t need to worry about anything other than:

  • Registering the callback function
  • Working with the data (or an error message) when we hear back from Golioth.

You can see the response struct makes the data itself, the length of the data, and the error message available in a very straightforward way.

A keen eye will have already noticed the NULL as the final parameter. This is a void * that lets you pass your own user-defined data to the callback. Any value that’s 4 bytes or less can be passed directly, or you can pass a pointer to a struct packed with information. Just be mindful of the lifetime of whatever memory you pass.
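
As a sketch of how you might use it, here’s a hypothetical context struct handed to the same get call (my_counter_ctx and its field are illustrative, not part of the SDK; the pointer comes back to the callback via the response struct’s user_data member):

/* Hypothetical user data to pass through to the callback */
struct my_counter_ctx {
    uint8_t channel;
};

/* Static storage ensures the pointer is still valid when the
 * asynchronous callback eventually fires */
static struct my_counter_ctx ctx = { .channel = 0 };

err = golioth_lightdb_get_cb(client, "counter",
                             GOLIOTH_CONTENT_FORMAT_APP_JSON,
                             counter_handler, &ctx);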

All of the asynchronous API function calls follow this same pattern for callbacks and requests. The synchronous calls are equally simple to understand. I found the Golioth sample applications to be a great blueprint for updating the API calls in my application code. The changelog also mentions each API-altering commit which you may find useful for additional migration info.
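
For comparison, here’s a sketch of what a synchronous get of the same endpoint might look like (the Doxygen reference linked above remains the source of truth for exact signatures):

uint8_t value[64];
size_t len = sizeof(value);

/* Blocks until the data arrives or the request fails */
err = golioth_lightdb_get(client, "counter",
                          GOLIOTH_CONTENT_FORMAT_APP_JSON,
                          value, &len);
if (err) {
    LOG_ERR("Failed to get counter value: %d", err);
} else {
    LOG_HEXDUMP_INF(value, len, "Counter (sync)");
}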

The Golioth Forum is a great place to ask questions and share your tips and tricks when getting to know the new syntax.

2. Locking older projects to an earlier Golioth

While we recommend updating your applications, you do have the option to continue using an older version of Golioth instead. For that, we recommend using a west manifest to lock your project to a specific version of Golioth.

Manifest files specify the repository URL and the tag/hash/branch that should be checked out. That version is used when running west update, which then imports the version of Zephyr and all supporting modules specified in the Golioth SDK manifest, so that everything builds the project in peace and harmony.

By adding a manifest to your project that references the Golioth Zephyr SDK v0.3.1 (the latest stable version before the API change) you can ensure your application will build in the future without the need to change your code. Please see our forum thread on using a west manifest to set up a “standalone” repository for more information.
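
As a rough sketch, the pinned project entry in a standalone west.yml could look something like this (the path and import values are assumptions; adjust them to match your project layout):

manifest:
  remotes:
    - name: golioth
      url-base: https://github.com/golioth
  projects:
    - name: golioth-zephyr-sdk
      remote: golioth
      revision: v0.3.1
      path: modules/lib/golioth
      import: west-zephyr.yml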

A friendlier interface improves your Zephyr experience

Version 0.4.0 of the Golioth Zephyr SDK greatly improves ease of use when adding device management features to your IoT applications. Device credentials and a few easy-to-use APIs are all it takes to build data handling, command and control, and Over-the-Air updates into your device firmware. With Golioth’s Dev Tier your first 50 devices are free so you can start today, and as always, get in touch with us if you have any questions along the way!