Controlling 10 devices is easy; controlling 10,000 is a different story. The trick is to plan for scale, which is what we specialize in here at Golioth.

A few weeks ago we announced the Golioth Device Settings Service, which enables you to change settings for your entire fleet at the click of a button. Of course, you can drill down to groups of devices and single devices too. While the announcement post showed the Settings Service using our ESP-IDF SDK, today we’ll look at the settings service from the device point of view with Zephyr RTOS.

Device-side building blocks

The good news is that the Golioth Zephyr SDK has done all the hard work for you. We just need to do three things to start using the settings service in any Zephyr firmware project:

  1. Enable the settings service in KConfig
  2. Register a callback function when the device connects to Golioth
  3. Validate the data and do something useful when a new setting arrives

Code walk-through

Golioth’s settings sample code is a great starting point. It watches for a settings key named LOOP_DELAY_S to remotely adjust the delay between sending messages back to the server. Let’s walk through the important parts of that code for a better understanding of what is involved.

1. Enable settings in KConfig

CONFIG_GOLIOTH_SETTINGS=y

To turn the settings service on, the CONFIG_GOLIOTH_SETTINGS symbol needs to be selected. Add the code above to your prj.conf file.

2. Register a callback for settings updates

static void golioth_on_connect(struct golioth_client *client)
{
    if (IS_ENABLED(CONFIG_GOLIOTH_SETTINGS)) {
        int err = golioth_settings_register_callback(client, on_setting);

        if (err) {
            LOG_ERR("Failed to register settings callback: %d", err);
        }
    }
}

To do anything useful with the settings service, we need to be able to execute a function when new settings are received. Register the callback function each time the device connects to Golioth. Your app may already have a golioth_on_connect() function declared; look for the following code at the beginning of main and add it if it’s not already there.

	client->on_connect = golioth_on_connect;
	client->on_message = golioth_on_message;
	golioth_system_client_start();

Each time the Golioth Client detects a new connection, the golioth_on_connect() function will run, which in turn registers your on_setting() function to run.

3. Validate and process the received settings

enum golioth_settings_status on_setting(
        const char *key,
        const struct golioth_settings_value *value)
{
    LOG_DBG("Received setting: key = %s, type = %d", key, value->type);
    if (strcmp(key, "LOOP_DELAY_S") == 0) {
        /* This setting is expected to be numeric, return an error if it's not */
        if (value->type != GOLIOTH_SETTINGS_VALUE_TYPE_INT64) {
            return GOLIOTH_SETTINGS_VALUE_FORMAT_NOT_VALID;
        }

        /* This setting must be in range [1, 100], return an error if it's not */
        if (value->i64 < 1 || value->i64 > 100) {
            return GOLIOTH_SETTINGS_VALUE_OUTSIDE_RANGE;
        }

        /* Setting has passed all checks, so apply it to the loop delay */
        _loop_delay_s = (int32_t)value->i64;
        LOG_INF("Set loop delay to %d seconds", _loop_delay_s);

        return GOLIOTH_SETTINGS_SUCCESS;
    }

    /* If the setting is not recognized, we should return an error */
    return GOLIOTH_SETTINGS_KEY_NOT_RECOGNIZED;
}

The callback function will provide a key (the name of the setting) as a string, and a golioth_settings_value that includes the value as well as an indication of the value type (bool, float, string, or int64). The callback function should validate that the expected key was received, and test that the received value is the proper type and within your expected bounds before acting upon the data.

The final piece of the device settings puzzle is to return an enum value indicating the device status after receiving a new setting. This is used by the Golioth Cloud to indicate sync status. Our example demonstrates the GOLIOTH_SETTINGS_SUCCESS and GOLIOTH_SETTINGS_KEY_NOT_RECOGNIZED status values, but it’s worth checking out the doxygen reference for golioth_settings_status to see how different invalid key and value messages can be sent to Golioth for display in the web console.

Coordinating with the Golioth Cloud

The device settings key format is tightly specified by Golioth, allowing only “A-Z (upper case), 0-9 and underscore (_) characters”. We recommend that you add a key to the Golioth web console while writing your firmware to ensure the device knows what to expect.
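
The rule is simple enough to sanity-check before you create keys in the console. Here is a trivial sketch of the format as a regular expression (the example key names are just for illustration):

import re

# Settings keys may only contain A-Z (upper case), 0-9, and underscore
KEY_RE = re.compile(r'^[A-Z0-9_]+$')

assert KEY_RE.match('LOOP_DELAY_S')      # valid key name
assert not KEY_RE.match('loop-delay-s')  # lower case and hyphens are rejected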

Each device page includes a status indicator to show if the settings are in sync with the Golioth cloud.

Additional information is available when a device is out of sync. Here you can see that hovering over the information icon indicates the key is not valid. For this example I created a VERY_IMPORTANT_BOOL_7 key but didn’t add support for it to the firmware. This is the result of our callback function returning GOLIOTH_SETTINGS_KEY_NOT_RECOGNIZED. In cases like this, Golioth’s Over-the-Air (OTA) firmware update service can help you push new firmware to your devices in the field that recognizes the newly created settings variable.

IoT Fleet control on day one

The pain of IoT often comes when you need to scale. Golioth’s mission is to make sure you avoid that pain and are ready to grow starting from day one.

Our device settings service gives you the power to remotely change settings on your entire fleet, on a group of devices, and all the way down to a single device. You are able to verify your devices are in sync, and receive helpful information when they are not. Take Golioth for a test drive today; our dev tier is free for the first 50 devices.

We use Grafana at Golioth to help us showcase data for internal and external demos. As in any IoT deployment, the hardware does not exist just in the physical realm; it also sends data back to the internet and we need some way to show that data. Grafana is an open source solution that can run locally on your computer, on your own cloud, or on the Grafana Cloud. We have written about our past use of Grafana, including our custom WebSockets plugin that makes it easier to see changes to your data as it happens.

When you’re ready to highlight changes from a device, graphics can make it easier to understand what’s happening. Let’s take a look at how to add dynamic graphics to your Grafana Dashboard.

Why you should use graphics

These graphics convey immediate understanding for magnet sensor state

Humans are a visual species. Our eyes naturally track movement in nature, but that also translates to computer screens. Adding some dynamic graphics will not only grab attention, but allow users to quickly assess if there are problems. A quick scan over a graphical interface allows you to take fast action on critical changes to your deployment.

In the “Red Demo” video below, you can see the device detecting a change in proximity sensor state, which could indicate something like a break-in at a home or business. A dedicated graphic on the dashboard clearly indicates this change to the user.

Another reason to look at simplified graphics is the changing nature of your IoT fleet deployment. As you continue to add people to a project, knowledge of the technical details often diffuses; the people that designed and deployed the firmware and hardware aren’t always the ones maintaining the deployment. You want to design a dashboard that anyone can walk up to and understand the current state of the device, regardless of their technical background. Most importantly, you want to indicate whether any action needs to be taken. This could include adding callouts to your graphics or other indicators that the device is nominal or in an error state. A full treatment is beyond this article, but there are books on proper design of Human Machine Interfaces and case studies of how they have gone wrong.

Finally, let’s compare graphics to the common alternative in dashboards: graphs. While it’s useful to see data changing over time, a new user won’t have context for the changes. If they walk up to a dashboard and glance at a moving chart of temperature, there’s no way to know why it’s being shown or what action is required (“the chart just went up, was that good or bad?”). Even if you start with state-based indicators like dials and readouts, these can be enhanced by adding contextual information directly to the graphics.

How to add dynamic graphics

First, you will need a Grafana instance, configured to access the datasource of your Golioth deployment. We have found it makes the most sense to highlight the current state of the device by using a WebSockets source, instead of polling the REST API on a regular interval. Using WebSockets will ensure you are immediately notified of the changes as they propagate through your device deployment.

Plugin

To start, you need to install the “Dynamic Image Panel” plugin from the Grafana plugin marketplace. This is a signed plugin, and should be safe for all installs. In the example, they showcase a weather application talking to an API that fetches different weather-condition graphics. We will be creating a slightly different implementation, based on images hosted on a cloud server without an API. This is generally less robust, but much simpler to implement.

Installing the plugin from the Grafana marketplace (available locally or on the cloud)

Hosted Graphics

Next you need to put the associated graphics somewhere that the Grafana instance will be able to access them. In our case, we used Firebase hosting to create a static file location. This can be any service that allows public access to the images, but we do caution you to test that you can access the images when you’re not logged in, or your dashboard may not be able to pull in those images you created.

Assigning graphics to device states

We created the “Red Demo” graphics to correspond to proximity sensors based on sensing a magnetic field. The device reports the state of that mag sensor over LightDB Stream, along with temperature and accelerometer data, on a regular basis (or on-demand when the button is pushed, as shown in the demo video inset above). The logic is reversed for this sensor (‘0’ means there is a magnet present, ‘1’ means there is not), which we maintained throughout the system. We then created graphics and set up the panel to append each reading to a “base” address to access each individual image. When there is no reading available, it defaults to the NULL case, which means nothing is appended to the “base” address, and we show that the sensor is in an unknown state (far left image).

On the plugin, you can customize all of this and make it match your needs. Instead of ‘1’ and ‘0’, a device might be sending linear values from 0-10, or it might be sending “true” or “false” (boolean), or even sending back general strings. The main thing to consider when designing your graphics is that you want your device and the Golioth Cloud to categorize physical readings into “buckets” that are easy to display.
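
To make the mapping concrete, here is a minimal sketch of the URL scheme described above. The base URL and file names are hypothetical; the Dynamic Image Panel performs this concatenation for you based on the queried value.

BASE_URL = "https://example-hosting.web.app/red-demo/"  # hypothetical static file host

def image_url(mag_reading):
    """Map a LightDB Stream reading to a hosted image URL.

    mag_reading: '0' (magnet present), '1' (no magnet), or None (no data yet).
    """
    if mag_reading is None:
        # NULL case: nothing is appended, so the base address serves the "unknown" image
        return BASE_URL
    return BASE_URL + mag_reading + ".png"  # assumes images are named 0.png and 1.png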

Creating “movement” in graphics

Sometimes you’ll want to make changes in state a bit more subtle, especially if your device is reporting more than a binary state (‘on’/’off’). In Golioth’s recently released “Green Demo”, you’ll see there are 6 states for a lamp used in our Greenhouse example. The graphics change along with the intensity of the light coming out of the lamp, while also overlaying a numeric value.

Enhance your next dashboard

Using graphics in your next IoT dashboard can help you add interesting informational content, alert your users, or just spice up what is normally a set of boring charts. If you want help getting your next dashboard started, reach out on our forum, our Discord, or send us a note.

This is the second part of How Golioth uses Hardware-in-the-Loop (HIL) Testing. In the first part, I covered what HIL testing is, why we use it at Golioth, and provided a high-level view of the system.

In this post, I’ll talk about each step we took to set up our HIL. It’s my belief that HIL testing should be a part of any firmware testing strategy, and my goal with this post is to show you that it’s not so difficult to get a minimum viable HIL set up within one week.

The Big Pieces

To recap, this is the system block diagram presented at the end of part 1:

The Raspberry Pi acts as a GitHub Actions Self-Hosted Runner which communicates with microcontroller devices via USB-to-serial adapters.

The major software pieces required are:

  • Firmware with Built-In-Test Capabilities – Runs tests on the device (e.g. ESP32-S3-DevKitC), and reports pass/fail for each test over serial.
  • Python Test Runner Script – Initiates tests and checks serial output from device. Runs on the self-hosted runner (e.g. Raspberry Pi).
  • GitHub Actions Workflow – Initiates the python test runner and reports results to GitHub. Runs on the self-hosted runner.

Firmware Built-In-Test

To test our SDK, we use a special build of firmware which exposes commands over a shell/CLI interface. There is a command named built_in_test which runs a suite of tests in firmware:

esp32> built_in_test
...
12 Tests 0 Failures 0 Ignored 
../main/app_main.c:371:test_connects_to_wifi:PASS
../main/app_main.c:378:test_golioth_client_create:PASS
../main/app_main.c:379:test_connects_to_golioth:PASS
../main/app_main.c:381:test_lightdb_set_get_sync:PASS
../main/app_main.c:382:test_lightdb_set_get_async:PASS
../main/app_main.c:383:test_lightdb_observation:PASS
../main/app_main.c:384:test_golioth_client_heap_usage:PASS
../main/app_main.c:385:test_request_dropped_if_client_not_running:PASS
../main/app_main.c:386:test_lightdb_error_if_path_not_found:PASS
../main/app_main.c:387:test_request_timeout_if_packets_dropped:PASS
../main/app_main.c:388:test_client_task_stack_min_remaining:PASS
../main/app_main.c:389:test_client_destroy_and_no_memory_leaks:PASS

These tests cover major functionality of the SDK, including connecting to Golioth servers, and sending/receiving CoAP messages. There are also a couple of tests to verify that stack and heap usage are within allowed limits, and that there are no detected memory leaks.

Here’s an example of one of the tests that runs on the device, test_connects_to_golioth:

#include "unity.h"
#include "golioth.h"
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

static golioth_client_t _client;
static SemaphoreHandle_t _connected_sem;

static void on_client_event(golioth_client_t client, 
                            golioth_client_event_t event, 
                            void* arg) {
    if (event == GOLIOTH_CLIENT_EVENT_CONNECTED) {
        xSemaphoreGive(_connected_sem);
    }
}

static void test_connects_to_golioth(void) {
    TEST_ASSERT_NOT_NULL(_client);
    TEST_ASSERT_EQUAL(GOLIOTH_OK, golioth_client_start(_client));
    TEST_ASSERT_EQUAL(pdTRUE, xSemaphoreTake(_connected_sem, 10000 / portTICK_PERIOD_MS));
}

This test attempts to connect to Golioth servers. If it connects, a semaphore is given which the test waits on for up to 10 seconds (or else the test fails). For such a simple test, it covers a large portion of our SDK, including the network stack (WiFi/IP/UDP), security (DTLS), and CoAP Rx/Tx.

We use the Unity test framework for basic test execution and assertion macros.

If you’re curious about the details, you can see the full test code here: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/test/main/app_main.c

Python Test Runner

Another key piece of software is a Python script that runs on the self-hosted runner and communicates with the device over serial. We call this script verify.py. In this script, we provision the device with credentials, issue the built_in_test command, and verify the serial output to make sure all tests pass.

Here’s the main function of the script:

import sys
import serial

def main():
    port = sys.argv[1]

    # Connect to the device over serial and use the shell CLI to interact and run tests
    ser = serial.Serial(port, 115200, timeout=1)

    # Set WiFi and Golioth credentials over device shell CLI
    set_credentials(ser)
    reset(ser)

    # Run built in tests on the device and check output
    num_test_failures = run_built_in_tests(ser)
    reset(ser)

    if num_test_failures == 0:
        run_ota_test(ser)

    sys.exit(num_test_failures)

This opens the serial port and calls various helper functions to set credentials, reset the board, run built-in-tests, and run an OTA test. The exit code is the number of failed tests, so 0 means all passed.
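
The set_credentials() and reset() helpers are thin wrappers around the device shell. As a rough sketch (the shell commands and credential values below are hypothetical, for illustration only):

def set_credentials(ser):
    # Hypothetical shell commands; the real script drives the device's
    # settings CLI to store WiFi and Golioth credentials
    ser.write(b'settings set wifi/ssid "my-network"\n')
    ser.write(b'settings set wifi/psk "my-password"\n')
    ser.write(b'settings set golioth/psk-id "my-device@my-project"\n')
    ser.write(b'settings set golioth/psk "supersecret"\n')

def reset(ser):
    # Reboot the board so the new credentials take effect
    ser.write(b'reset\n')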

The script spends most of its time in this helper function which looks for a string in a line of serial text:

from time import time

def wait_for_str_in_line(ser, str, timeout_s=20, log=True):
    start_time = time()
    while True:
        line = ser.readline().decode('utf-8', errors='replace').replace("\r\n", "")
        if line != "" and log:
            print(line)
        if "CPU halted" in line:
            raise RuntimeError(line)
        if time() - start_time > timeout_s:
            raise RuntimeError('Timeout')
        if str in line:
            return line

The script will continually read lines from serial for up to timeout_s seconds until it finds what it’s looking for. And if “CPU halted” ever appears in the output, that means the board crashed, so a runtime error is raised.
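
For example, run_built_in_tests() can be built directly on top of this helper. A sketch, assuming the Unity summary format shown earlier (“12 Tests 0 Failures 0 Ignored”):

def run_built_in_tests(ser):
    # Kick off the test suite via the device shell
    ser.write(b'built_in_test\n')

    # Block until the Unity summary line scrolls by; the timeout is generous
    # because the device must connect to WiFi and Golioth first
    summary = wait_for_str_in_line(ser, 'Tests', timeout_s=120)

    # "12 Tests 0 Failures 0 Ignored" -> the failure count is the third field
    return int(summary.split()[2])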

This verify.py script is what gets invoked by the GitHub Actions Workflow file.

GitHub Actions Workflow

The workflow file is a YAML file that defines what will be executed by GitHub Actions. If one of the steps in the workflow returns a non-zero exit code, then the workflow will fail, which results in a red ❌ in CI.

Our workflow is divided into two jobs:

  • Build the firmware and upload build artifacts. This is executed on cloud servers, since building the firmware on a Raspberry Pi would take a prohibitively long time (adding precious minutes to CI time).
  • Download firmware build, flash the device, and run verify.py. This is executed on the self-hosted runner (my Raspberry Pi).

The workflow file is created in .github/workflows/test_esp32s3.yml. Here are the important parts of the file:

name: Test on Hardware

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  build_for_hw_test:
    runs-on: ubuntu-latest
    steps:
    ...
    - name: Build test project
      uses: espressif/esp-idf-ci-action@v1
      with:
        esp_idf_version: v4.4.1
        target: esp32s3
        path: 'examples/test'
    ...

  hw_flash_and_test:
    needs: build_for_hw_test
    runs-on: [self-hosted, has_esp32s3]
    ...
    - name: Flash and Verify Serial Output
      run: |
        cd examples/test
        python flash.py $CI_ESP32_PORT && python verify.py $CI_ESP32_PORT

The first job runs in a normal GitHub runner based on Ubuntu. For building the test firmware, Espressif provides a handy CI Action to do the heavy lifting.

The second job has a needs: build_for_hw_test requirement, meaning it must wait until the first job completes (i.e. firmware is built) before it can run.

The second job has two further requirements: it must run on a self-hosted runner, and that runner must have the label has_esp32s3. GitHub will dispatch the job to the first available self-hosted runner that meets the requirements.

If the workflow completes all jobs, and all steps return a zero exit code, then everything has passed, and you get the green CI checkmark ✅.

Self-Hosted Runner Setup

With all of the major software pieces in place (FW test framework, Python runner script, and Actions workflow), the next step is to set up a machine as a self-hosted runner. Any machine capable of running one of the big 3 operating systems will do (Linux, Windows, MacOS).

It helps if the machine is always on and connected to the Internet, though that is not a requirement. GitHub will simply not dispatch jobs to any self-hosted runners that are not online.

Install System Dependencies

There are a few system dependencies that need to be installed first. On my Raspberry Pi, I had to install these:

sudo apt install \
    python3 python3-pip python3-dev python-dev \
    libffi-dev cmake direnv wget gcovr clang-format \
    libssl-dev git
pip3 install pyserial

Next, log into GitHub and create a new self-hosted runner for your repo (Settings -> Actions -> Runners -> New self-hosted runner):

When you add a new self-hosted runner, GitHub will provide some very simple instructions to install the Actions service on your machine:

When I ran the config.sh line, I added --name and --unattended arguments:

./config.sh --url https://github.com/golioth/golioth-firmware-sdk --token XXXXXXXXXXXXXXXXXXXXXXXXXXXXX --name nicks-raspberry-pi --unattended

Next, install the service, which ensures GitHub can communicate with the self-hosted runner even if it reboots:

cd actions-runner
sudo ./svc.sh install

If there are any special environment variables needed when executing workflows, these can be added to runsvc.sh. For instance, I defined a variable for the serial port of the ESP32:

# insert anything to setup env when running as a service
export CI_ESP32_PORT=/dev/ttyUSB0

Finally, you can start the service:

sudo ./svc.sh start

And now you should see your self-hosted runner as “Idle” in GitHub Actions, meaning it is ready to accept jobs:

I added two custom labels to my runner: has_esp32s3 and has_nrf52840dk, to indicate that I have those two dev boards attached to my self-hosted runner. The workflow file uses these labels so that jobs get dispatched to self-hosted runners only if they have the required board.

And that’s it! With those steps, a basic HIL is up and running, integrated into CI, and covering a large amount of firmware code on real hardware.

Security

There’s an important note from GitHub regarding security of self-hosted runners:

Warning: We recommend that you only use self-hosted runners with private repositories. This is because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

I also stand by that recommendation. But it’s not always feasible if you’re maintaining an open-source project. Our SDKs are public, so this warning certainly applies to us.

If you’re not careful, you could give everyone on the Internet root access to your machine! It wouldn’t be very hard for someone to submit a pull request that modifies the workflow file to run whatever arbitrary Linux commands they want.

We mitigate most of the security risks with three actions.

1. Require approval to run workflows for all outside contributors.

This means someone from the Golioth organization must click a button to run the workflow. This allows the repo maintainer time to review any workflow changes before allowing the workflow to run.

This option can be enabled in Settings -> Actions -> General:

Once enabled, pull requests will have a new “Approve and Run” button:

This is not ideal if you’re running a large open-source project with lots of outside contributors, but it is a reasonable compromise to make during early stages of a project.

2. Avoid logging sensitive information

The log output of any GitHub Actions runner is visible publicly. If there is any sensitive information printed out in the workflow steps or from the serial output of the device, then this information could easily fall into the wrong hands.

For our purposes, we disabled logging WiFi and Golioth credentials.

3. Utilize GitHub Actions Secrets

You can use GitHub Actions Secrets to store sensitive information that is required by the runner. In our repo, we define a secret named `GOLIOTH_API_KEY` that allows us access to the Golioth API, which is used to deploy OTA images of built firmware during the workflow run.

If you want to know more about self-hosted runner security, GitHub’s own documentation on self-hosted runners is a good place to start.

Tips and Tricks

Here are a few extra ideas to utilize the full power of GitHub Actions Self-Hosted Runners (SHRs):

  • Use the same self-hosted runner in multiple repos. You can define SHRs at the organization level and allow their use in more than one repo. We do this with our SHRs so they can be used to run tests in the Golioth Zephyr SDK repo and the Golioth ESP-IDF SDK repo.
  • Use labels to control workflow dispatch. Custom labels can be applied to a runner to control when and how a workflow gets dispatched. As mentioned previously, we use these to enforce that the SHR has the dev board required by the test. We have a label for each dev board, and if the admin of the SHR machine has those dev boards attached via USB, then they’d add a label for each one. This means that machine will only receive workflows that are compatible with that machine. If a board is acting up or requires maintenance, the admin can simply disconnect the board and remove the label to prevent further workflows from executing.
  • Distribute self-hosted runners to load balance and minimize CI bottlenecks. If there is only one self-hosted runner, it can become a bottleneck when there are lots of jobs to run. Also, if that runner loses Internet access, then CI is blocked indefinitely. To balance the load and create more robust HIL infrastructure, it’s recommended to utilize multiple self-hosted runners. Convince your co-workers to take their unused Intel NUC off the shelf and follow the steps in the “Self-Hosted Runner Setup” section to add another runner to the pool. This allows HIL tests to run in a distributed way, no matter where in the world you are.

Here’s a scenario that might sound familiar.

You’ve added a new feature to the firmware, about 800 lines of code, along with some simple unit tests. All unit tests are passing. The code coverage report is higher than ever. You pat yourself on the back for a job well done.

But then – for a brief moment, conviction wavers as a voice in your head whispers: “The mocks aren’t real”; “Does this code really work?”; “You forgot to test OTA.”

But the moment passes, and your faith is restored as you admire the elegant green CI check mark ✅.

Do you trust your unit tests enough to deploy the firmware right now to thousands of real devices via Over-the-Air (OTA) firmware update?

If the answer is “well, no, not really, we still need to test a few things with the real hardware”, then automated Hardware-in-the-Loop (HIL) testing might give you the confidence to upgrade that answer to “heck yeah, let’s ship it!”. 🚀

In this post, I’ll explain what HIL testing is, and why we use it at Golioth to continuously verify the firmware for our Zephyr and ESP-IDF SDKs.

HIL Testing: A Definition

If you search online, you’ll find a lot of definitions of what HIL testing is, but I’ll try to formulate it in my own simple terms.

There are three major types of tests:

  • Unit – the smallest kind of test, often covering just one function in isolation
  • Integration – testing multiple functions or multiple subsystems together
  • System – testing the entire system, fully integrated

A firmware test can run in three places:

  • On your host machine – Running a binary compiled for your system
  • On emulated target hardware – Something like QEMU or Renode
  • On real target hardware – Running cross-compiled code for the target, like an ARM or RISC-V processor.

A HIL test is a system test that runs on real target hardware, with real peripherals, and real connections. Typically, a HIL setup will consist of:

  • A board running your firmware (the code under test)
  • A host test machine, executing the tests
  • (optional) Other peripherals and test equipment to simulate real-world conditions, controlled by the host test machine

Coverage and Confidence

Why bother with HIL tests? Aren’t unit tests good enough?

To address this, I’ll introduce two terms:

  • Coverage: How much of the firmware is covered by tests?
  • Confidence: How confident are you that the firmware behaves as intended?

Coverage is easily measured by utilizing open-source tools like gcov. It’s an essential part of any test strategy. Confidence is not as easily measured. It’s often a gut feeling.

It’s possible to have 100% coverage and 0% confidence. You can write tests that technically execute every line of code and branch combination without testing any intended functionality. Similarly, you can have 0% coverage and 100% confidence, but if you encounter someone like this, they are, to put it bluntly, delusional.

Unit tests are great at increasing coverage. The code under test is isolated from the rest of the system, and you can get into all the dark corners. “Mocks” are used to represent adjacent parts of the system, usually in a simplistic way. This might be a small piece of code that simulates the response over a UART. But this means you can’t be 100% confident the code will work the same way in the context of the full system.
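
To make that concrete, here is a toy mock (names hypothetical): the unit test passes by construction, because the “UART” always answers the way the mock author expected.

class FakeUart:
    """Stands in for a real serial port during a unit test."""
    def write(self, data):
        pass  # discard; a real UART would transmit these bytes

    def readline(self):
        return b"OK\r\n"  # canned response the real peripheral *should* send

def modem_is_alive(uart):
    # Code under test: send a probe and check for an OK response
    uart.write(b"AT\r\n")
    return uart.readline().strip() == b"OK"

assert modem_is_alive(FakeUart())  # passes without any hardware in sight

The test is fast and deterministic, but it only proves the code handles the response the mock author imagined; that gap is exactly what HIL tests close.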

HIL system tests are great at increasing confidence, as they demonstrate that the code is functional in the real world. However, testing things at this level makes it difficult to cover the dark corners of code. The test is limited to the external interfaces of the system (i.e. the UART you had a mock for in the example above).

A test strategy involving both unit tests and HIL tests will give you high coverage and high confidence.

Continuously Verifying Golioth SDK Firmware

Why do we use HIL testing at Golioth?

We have the concept of Continuously Verified Boards, which are boards that get first-class support from us. We test these boards on every SDK release across all Golioth services.

Previously, these boards were manually tested each release, but there were several problems we encountered:

  • It was time-consuming
  • It was error-prone
  • It did not scale well as we added boards
  • OTA testing was difficult, with lots of clicking and manual steps involved
  • The SDK release process was painful
  • We were unsure whether there had been regressions between releases

We’ve alleviated all of these problems by introducing HIL tests that automatically run in pull requests and merges to the main branch. These are integrated into GitHub Actions (Continuous Integration, CI), and the HIL tests must pass for new code to merge to main.

Here’s the output of the HIL test running in CI on a commit that merged to main recently in the ESP-IDF SDK:

HIL Test Running in CI

This is just the first part of the test where the ESP32 firmware is being flashed.

Further down in the test, you can see we’re running unit tests on hardware:

Automated test output

And we even have an automated OTA test:

Automated OTA Test Output

As a developer of this firmware, I can tell you that I sleep much better at night knowing that these tests pass on the real hardware. In IoT devices, OTA firmware update is one of the most important features to get right (if not the most important), so testing it automatically has been a big win for us.

HIL on a Budget

Designing a prototype HIL system does not need to be a huge up-front investment.

At Golioth, one engineer was able to build a prototype HIL within one week, due primarily to our simple hardware configuration (each tested board just needs power, serial, and WiFi connection) and GitHub Actions Self-Hosted Runners, which completely solves the infrastructure and DevOps part of the HIL system.

The prototype design looks something like this:


I’m using my personal Raspberry Pi as the self-hosted runner. GitHub makes it easy to register any kind of computer (Linux, MacOS, or Windows) as a self-hosted runner.

And here’s a pic of my HIL setup, tucked away in a corner and rarely touched or thought about:

My DIY HIL Setup

Next Time

That’s it for part 1! We’ve covered the “why” of HIL testing at Golioth and a little bit of the “how”.

In the next part, we’ll dive deeper into the steps we took to create the prototype HIL system, the software involved, and some of the lessons learned along the way.


Thanks to Michael for the photo of Tiger and Turtle

In this article, we showcase how to use Wireshark, a free and open source network analysis tool, to troubleshoot wireless mesh networks set up using OpenThread, Zephyr, and Golioth. The tooling shown here can also be used for other Thread-based devices, assuming you understand the layers of the network. We used these tools internally to help us get Thread devices to connect with Golioth and take advantage of all of the features we have to offer IoT device makers.

Building Thread Networks

Golioth started building out Thread networks when several users approached us about their interest in creating Golioth-managed Thread devices. We created example projects to show our users how to create mesh networks of custom low-power sensors and connect them back to the internet. We benefit from the fact that Thread network devices are IPv6 devices (thanks to 6LoWPAN), and that they talk over the CoAP protocol, all of which aligns very well with Golioth capabilities. We showed this in our most recent blog post about custom Thread nodes connecting through an OpenThread Border Router (OTBR) back to Golioth and transmitting information that can be displayed anywhere on the web.

Hardware and firmware engineers can utilize the Golioth Zephyr SDK to implement a wide range of features on Thread nodes and interact with those nodes like any other internet-connected device. In the Golioth Red Demo showcased at a number of recent live events, we had nodes that could report back sensor data and react to stimulus from the cloud; future versions could also get firmware updates directly from the cloud.

As in any hardware and firmware development process, things didn’t always go according to plan. When we were troubleshooting our early Proof of Concept, we needed to check which part of the chain was not passing packets along. We broke out Wireshark to start sniffing packets and figured out that there was a mismatch in the number of bytes being sent (since fixed). We think this kind of pinpoint accuracy in troubleshooting is a tool that all our users should have in their toolbox.

Good Security is meant to slow you down

Golioth is secure by default, which means all packets going to our Cloud must be encrypted. Normally, this is a feature! You don’t want anyone with a packet sniffer to be able to see plain-text data. However, when you do want to see what’s inside a packet during troubleshooting, you need to make sure you have the keys to unlock everything. You also need to make sure you have the tools properly configured for the various layers involved in Thread networks. These will be the steps we review below and in the video.

Setting up a sniffer

In order to use Wireshark to troubleshoot a Thread network, you’ll need a computer running Wireshark and an nRF52840 dongle to act as the 802.15.4 packet sniffer.

Pretty simple!

The first step is getting the tools onto the dongle. Much like when the dongle was used as a Radio Co-Processor on the OpenThread Border Router, we’ll be loading a dedicated set of firmware, this time to sniff radio packets and hand them over USB to the computer. This firmware is specific to 802.15.4, which is the Physical and MAC layer used by Thread. Download the firmware from Nordic Semiconductor, load it onto your dongle, and you’re ready to go!

Next you need to be able to interact with the output of the dongle. This means installing a Python script from the Nordic Semiconductor repository as a Wireshark plugin. Around the 2:15 mark in the video, Mike shows where and how to install this in Wireshark.

Configuration Settings

Other important parts of the video include things like:

  • 3:45 – Choosing the correct 802.15.4 channel (channel 15 for most Golioth examples)
  • 5:00 – Network settings for Wireshark to capture Thread network traffic
  • 7:15 – Adding a Pre-Shared Key (PSK) to decrypt DTLS packets

Once those configuration steps are done, you can view wireless traffic coming from your Thread node, through each of the layers, up to the Golioth servers, onto specified endpoints like /logs. In the decoded payload area (bottom window), you can also see the messages that are actually being sent, in this example a log message saying “starting connect”.

Tools for when you need them

Wireshark and plugins developed by the community make for a powerful set of tools for troubleshooting network problems. We hope that our examples and tutorials allow you to quickly deploy a Thread network and build out custom Thread nodes; but when you need a bit more insight or are looking to try something new, Wireshark can help.

We’re here to help too! You can reach us on our Discord or Forum, and can always reach us at [email protected].

TL;DR: we’ve enabled people to compile Zephyr programs from a computer with no toolchain installed, almost instantly.

Part of our charter at Golioth is to help people prototype and scale IoT devices faster. That’s why we offer an open source SDK built on top of Zephyr. We think this represents a “fast forward” or “cheat code” for quickly standing up an IoT device prototype. On the cloud side, our servers represent hundreds of hours of customization and testing; you can instantly connect and get access to resources that allow hardware and firmware developers to scale to thousands or millions of devices. But sometimes it can be scary to get started in a new ecosystem or Real Time Operating System (RTOS) like Zephyr, even if it will speed things up later. As such, we do public and private training for companies and individuals.

As part of the resources we offer, we maintain a Training site that walks people through how to get started using Zephyr, normally targeting remote training. You can follow along right now; you’ll need to purchase an Adafruit MagTag board and sign up for a free Dev Tier account, but everything else is covered on the training site. At the end of the training, you should understand how to interact with hardware in Zephyr and send data to and from the Golioth cloud over WiFi. It’s a short jump from there to re-target other hardware, including your custom designs.

The tripping points for the training often revolve around the installation process. This is multi-pronged:

  • The size of a Zephyr install is relatively large, even when you are only targeting a specific platform. Having multiple people in a room, even with good WiFi or network connectivity, means that the shared bandwidth will be a limiting factor. More trainees means slower downloads.
  • Everyone comes to training with a computer in a different state. They might have tried to install Zephyr tools in the past, or they might have a particularly rare Linux distro, or many other possible variations. It would be best if everyone showed up with a fresh OS install…but that is very unrealistic.
  • There are different expectations around how installations should go. Many embedded engineers are “Windows first” and expect a complete IDE for any new platform. Some silicon vendors help to support this in Zephyr, such as Nordic Semiconductor. But Zephyr was originally targeting Linux-based machines, and we have found the smoothest flow for installing tools for all of the platforms that Zephyr can target means you are Linux-first.

In this article, we’re going to talk about our attempt to normalize setups and have pre-installed tools using Kasm and Docker. These are not the only tools in this space; we have previously written about GitPod and are investigating GitHub Codespaces, but this is a look at one of the latest experiments we’re running at Golioth.

Kasm thin client

The concept of a browser-based client or “thin client” is nothing new. They were all the rage back in the days of time-share servers (really, those were “dumb terminals”) and then again in the 90s as computing became more ubiquitous throughout the office (with a centralized set of servers). The difference is that now things are much more graphical and run completely inside the browser.

Kasm was started in 2017 and includes an open source project run by Kasm Technologies. The company behind Kasm has a per seat licensing model or they will run the servers directly for you (once you’re past 5 trial seats). They specialize in visualizations around containers. Once you log into a Kasm server, you are able to launch a range of containers, normally a desktop view or a single app that will load up in your browser. You can try this for yourself on the Kasm demo page.

The server that we’re running on is a pre-configured image that I pulled from the Digital Ocean marketplace. I was able to install all of the required software on a provisioned server running in some unknown datacenter. All I did was log in the first time to get my credentials for a user and an admin, and the rest of my interaction was on the web interface that the Kasm server presents to me as an admin.

Docker

As a hardware engineer, Docker is one of those things I heard about for a long time and never really “got it”. I’m still not sure I do. But following the tutorial for customizing a Kasm container, I started to understand a bit more. In that set of tutorials I started from a base Operating System image (Ubuntu Focal) that allowed visualization through the browser. Then I was able to start customizing, adding things like custom files on the desktop, custom icons to launch programs I installed, or adding background images pulled in from the web. It was in this customization section that I could add all of the commands from the Golioth Docs for installing Zephyr tools.

My layman explanation of Docker would be: “Creating a virtual computer where I can automatically install a bunch of software using shell scripts. Once I have built that virtual computer, I am able to use it over and over again, including different instances of that virtual computer (for this Kasm scenario).” The analogy would be if I bought a bunch of laptops, had an install CD (remember those?) with all of the required software on it, and then mailed a freshly installed laptop to everyone taking our training. Sound crazy? That’s one of the best solutions we have seen, where a trainer brings a Pelican case with 24 freshly imaged laptops to on-site training. Their training works flawlessly every time!

I don’t have much else to mention about Docker aside from the idea that it’s possible to script a bunch of install commands that match the install instructions we have on our Zephyr getting started guide. In fact, I used those very directions to build the container shown in the video above. So all I’m doing in this case is automating the install process, doing it once, and then deploying the container (with all of the software and dependencies installed) over and over again for different users.

Challenges

We don’t think this is the ultimate solution for our training, so much as an experiment that showcases what we can do with containerized solutions. There are some remaining challenges, and we would love to have some help from our community.

Loading firmware onto the device

Currently our plan (as shown in the video) is to have our users/trainees pull the final built binary to their local computer to run it on the device like the MagTag. This echoes the way the mbed online compiler worked.

If there is a bootloader and a USB-to-serial connection, it’s possible to load firmware directly onto the embedded device. In the case of some Espressif boards, this would be something like having esptool.py installed locally on your machine. There are an increasing number of tools that make this process easier, such as an ESP tool that allows you to load firmware using WebUSB. Certain specialized bootloaders, like the one that ships by default on the MagTag, load UF2 files. When the MagTag is plugged in over USB and a sequence of buttons is pressed, the device shows up as a mass storage drive. You drop a UF2-formatted binary (just an alternative compiled format) onto the drive, and the device reboots and starts running the code.

If it’s a board without a bootloader, the user would need to have a debugger and local tools to communicate with that debugger, such as a JLink device and JFlash software. This means they would still need some OS specific loader tools to get the binary into the embedded device. The user would not be able to take advantage of the built-in tools in west that allow direct loading onto the device.

You steppin’?

If you would like to use a real debugger instead of “printf/printk” debugging, you simply need to download a different file from the container. If you download the zephyr.elf file instead of the zephyr.bin file, you can load it into a 3rd party debugger like SEGGER Ozone (made by the same company as the J-Link). We have done some experiments with this in the past, including analyzing where the device is spending its time using SystemView. This would once again require installing local programs that can talk over the USB port to something like a J-Link.

Experimental port forwarding and WebUSB

Some GDB debuggers/servers host control of the debugger on a port on the machine’s localhost. We are running experiments where we forward this port to the container, so we can drive the debugger directly from software running inside the container.

We have also heard some whispers of a WebUSB implementation that can tunnel to the container. We could plug in a board on the host machine (i.e. my laptop), connect to it over WebUSB, and then forward all information along to the container machine (i.e. the browser-based desktop running on the Kasm server).

We would love to hear about other projects that are trying this.

Shared resources

The final challenge we are dealing with is the fact that we’re basically “renting” a computer to do exactly what we could be doing with the host machine sitting right in front of us. Most developers have access to very powerful machines and we are instead using the resources on a remote machine (the Kasm server). The cost of standardization is the cost of renting server time for each person in the workshop. It might be worth it, but it is a constraint and a challenge.

Containers are another tool

Anyone reading this with a web background is likely thinking, “Yeah, containers, cool, 2010 called and wants their headline back”. But we are excited about it because these tools are finally making their way into the historically sluggish embedded industry. While our use case of containers is mostly around zero-install-time training, others are using containers to automate their testing and implementing best software engineering practices for the range of devices they have on their desk or in the field.

We’d love to hear how you think we can improve our training and make it easier for you to learn more about Golioth, Zephyr, and building code instantly. Check out our forums, our Discord, ping us on Twitter, or send us an email at [email protected]

We are getting ready for trade-show season here at Golioth. We will be at the Zephyr Developer Summit next week and Embedded World later in the month. We’re excited to be able to show off some of our capabilities live and in person!

Showcasing data in a cogent way is a key component to IoT deployments. As a hardware developer, the “atoms” (hardware implementation) are the fun part to work on, but the “bits” (data implementation) are how you get the message across. This isn’t just for trade shows. Your clients, co-workers, and customers are all going to want to see what kind of useful data is being generated by your IoT system.

Today, let’s dig into how you might want to showcase “real-time” versus “historic” data. We are using Grafana for our data dashboard, and there are some interesting differences in how you configure and show the data to users.

Configuration Differences

We have written about and showcased Grafana before. It’s an “observability platform” that helps chart data. Normally it targets monitoring infrastructure like servers and cloud implementations, but it’s also possible to use Grafana for IoT systems. We like it because there is a hosted cloud version (which we use for this demo), and it can also be deployed locally in a container.

We are interacting with Golioth data in two different ways: using WebSockets for Real-time data and using our REST API for Historical data. For the former, we actually wrote and published a WebSockets plugin for Grafana. This allows us to monitor when there are changes to variables, both on LightDB State and LightDB Stream. For more information on getting this set up, Mike has written an in-depth tutorial on how to hook up a sensor and get it transmitting data using WebSockets.

In my case, I’m monitoring temperature data on an OpenThread demo that we are showcasing. There are 3 different devices transmitting their sensor data back to Golioth. Once that data is in the Golioth cloud, I have two different “data sources” set up in Grafana to extract that data for later viewing.

Real-time data via WebSockets plugin:

Historical data via JSON API plugin:

Why the data source difference?

Now that we have covered some of “how”, let’s talk about the “why” of showing real-time vs historical data. Can’t we just use the same data source for both?

In short, no. Or you could, but it will make things difficult.

The key thing to think about is how much data we’re trying to show for each. In the image below, we’re showing the most recent value and when it was updated in the two left panes. This is the “real-time” component. All you need to gather is the latest data point.

For the “historic” chart, we actually want to go backwards in time and fetch a range of values. The line chart’s legend also shows “last” value for each of the 3 sensors, but note that they don’t match the left side dials. That’s because the line chart shows the “last” reading from when the chart was fetched, which happens every 15 minutes. The dials themselves display the most recent reading, live streamed to the dashboard.

In the data source configuration screenshots, note that the two endpoints on Golioth are different. In the case of the JSON API plugin talking to the REST API, we are actually calling out the project name. When we do a POST to the LightDB Stream endpoint (/stream), we tell the API which data range we want to go fetch. This allows us to gather all data points within a pre-defined range. This is the basis of the “historical” measurement.
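
As a rough sketch of that historical query (the endpoint path, field names, and credentials here are assumptions for illustration; check the Golioth REST API docs for the authoritative format):

import requests

API_KEY = "your-api-key"      # hypothetical credential
PROJECT = "your-project-id"   # hypothetical project name

# POST to the LightDB Stream endpoint with an explicit time range
resp = requests.post(
    f"https://api.golioth.io/v1/projects/{PROJECT}/stream",
    headers={"x-api-key": API_KEY},
    json={
        "start": "2022-06-01T00:00:00Z",  # beginning of the historical window
        "end": "2022-06-08T00:00:00Z",    # end of the historical window
        "query": {
            "fields": [
                {"path": "time"},
                {"path": "deviceId"},
                {"path": "temp"},  # hypothetical sensor field
            ],
        },
    },
)
print(resp.json())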

If we were trying to gather the same data with the WebSocket (charting data on a time-series graph), we would only be able to see data on the graph for as long as we were capturing output from the WebSocket. For instance, if we reload the page, the chart no longer has any data points on it until we start capturing newly generated data points. There is no capability to pull historic data from the WebSocket endpoint.

The JSON API data source has an additional element that is helpful for charting time-series data on a graph: it allows grouping data by one of the columns, in our case the device name. Below I have also added a label to Sensor 1 and Sensor 2, with the last sensor not yet labeled.

This is something that could be implemented in the WebSocket data source in the future, but it doesn’t do that currently. Instead, on the “dial” and “stat” panels shown above, we ingest all data coming from the WebSocket (representing all LightDB Stream data from our project, from all devices) and then filter that data to only show the specific device we want represented in the visualization:

Setting up your next project

There are a wide variety of ways you can set up your projects to showcase the data being generated by devices in the field. It’s good to start by asking yourself: what do you want out of the data?

Do you want to be alerted to the most recent data point? You might want something like the dial shown above, including mapping colors around expected values. You can also set up alerts in Grafana to trigger as you move past certain thresholds.

Are you looking to see how things are changing over time? That might make more sense to target a “historical” view, like a line chart pulling from the Golioth REST API. This allows you to understand what your device has experienced in the past and potentially take action based on one-time events, such as a spike in data.

Or maybe you care more about how things are trending? You can always gather multiple data points and then apply statistical methods to them. This might help smooth out spikes in data and only show the median or mean of data over a defined time period. Applying statistical methods could work for either the Real-time or Historical data, but would have the same caveat of only capturing the data you are actively observing via the WebSocket endpoints.
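
For example, here is a minimal sketch of that kind of smoothing, applied to readings as they arrive:

from statistics import median

WINDOW = 15   # number of recent samples to smooth over
recent = []

def smoothed(reading):
    """Return the median of the last WINDOW readings."""
    recent.append(reading)
    if len(recent) > WINDOW:
        recent.pop(0)  # drop the oldest sample
    return median(recent)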

However you decide to slice and dice your data, it’s important to understand the capabilities of the outputs from the Golioth platform and your visualization platform. We will continue to publish about our own dashboards and other ways we find to help our users visualize their data.

Photo by Brett Sayles from Pexels

One small step for debug

If you are getting started in Zephyr, you can get a lot done using the serial/terminal output. I normally refer to this as “printf debugging”. With Zephyr it would be “printk debugging” because of the difference in commands to print to the serial output (or to a remote logging service like Golioth). Honestly, this method works great for example code, including many of our tutorials.

In addition to its ad hoc role as a package management platform for embedded software, Zephyr is also a Real Time Operating System (RTOS). We use Zephyr’s package management as a starting point: we want users to be able to bootstrap a solution by downloading toolchains and vendor libraries. We don’t dig into the operating system very often on this blog or in our tutorials. Much like printk debugging, the details aren’t really needed when getting started with Zephyr and Golioth. But when you begin to dig deeper, you will be kicking off your own threads and workers, and utilizing other features of the RTOS like semaphores and queues. Once you’re doing that, I can all but guarantee you’ll want more visibility into what’s happening in your system.

This article showcases SEGGER Ozone and SystemView tools, which will help you peek inside. It also adds a few pieces to getting started with these platforms that I found lacking when searching for answers on the broader internet.

Saving battery budgets

My motivator to dig deeper on these systems is getting ready for the upcoming conference season. We will be at the Zephyr Developer Summit and Embedded World representing Golioth. We want to showcase our technology, including our capabilities as a Device Management solution for Thread-based device networks. Our demo of Zephyr, OpenThread, and Golioth runs on a battery-based device, which isn’t something we normally do. Most of our demos expect you’ll be powering your platform using a USB cable. When you start to care about power draw, you start to care about where your program is spending its time. Understanding whether a device is in sleep mode and how long it spends processing a piece of data is critical to optimizing for battery life. Since I don’t want to lug along an entire suitcase of batteries with me to the conference, I started wondering where we’re hanging out in the various threads of Zephyr. This is where a debugger and a real-time process recorder come in.

Tooling up for debugging

So I know I need a debugger.

My experience as a hardware engineer is that silicon vendors normally have a dedicated path for their code examples, Real Time Operating Systems (RTOSes), and Integrated Development Environments (IDEs). If I’m being honest, that was the path I took in the past: it was a low friction way to get something blinking or talking back to the network. Going outside of that path to use Zephyr means the tooling is more DIY. Even some vendors that provide support for Zephyr as their primary or secondary solution don’t have a “one way” of doing things. The fact that Zephyr is flexible is both a blessing and a curse. I can implement anything I’d like! But I need to go figure it out.

SEGGER Ozone

SEGGER are the makers of the popular J-Link programmer, in addition to a wide suite of software tools for embedded developers. I was interested in Ozone because of the open nature of their debugger, completely decoupled from any IDE or vendor toolchain.

I was excited to see a webinar from our friends at NXP talking about using SEGGER Ozone with Zephyr (registration required). The webinar showcases a Zephyr sample called “Synchronization”, which is available at <zephyr_install_directory>/zephyr/samples/synchronization.

The basic idea of the sample is that you are sharing a semaphore between two threads. It’s like passing a ball back and forth: once the loop for one thread runs, it releases the semaphore and the other thread can pick it up and use it. Each thread is effectively running the same code; it just only runs when the thread and semaphore line up properly.

The main() function of main.c in the synchronization sample (with a small modification)
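
The screenshot doesn’t reproduce here, but the pattern looks roughly like the following. This is a minimal sketch based on Zephyr’s synchronization sample; the names are illustrative and the include paths vary between Zephyr versions:

#include <zephyr.h>
#include <sys/printk.h>

/* Both threads run this same loop. Each one blocks on its own semaphore,
 * prints, then gives the other thread's semaphore, so execution ping-pongs
 * between the two.
 */
void hello_loop(const char *my_name, struct k_sem *my_sem, struct k_sem *other_sem)
{
    while (1) {
        /* Block until the other thread hands us the "ball" */
        k_sem_take(my_sem, K_FOREVER);

        printk("%s: Hello World from %s!\n", my_name, CONFIG_BOARD);

        /* Wait a bit, then let the other thread take a turn */
        k_msleep(500);
        k_sem_give(other_sem);
    }
}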

The net result is that you can see the threads ping-pong-ing back and forth on a debugger. See the NXP link above for a video example of this in action.

Using Ozone with Zephyr

One thing that wasn’t clear to me from the NXP webinar was how to get everything set up. That was the genesis of this article: I wanted to put the required steps in one place.

Requirements:

  • J-Link programmer
  • Development board with SWD or JTAG access
  • Compatible chipset/board in the Zephyr ecosystem (we will be showing the mimxrt1060_evkb below)
  • SEGGER Ozone installed on your machine. You can download and install the program from this page.

Step 1: Compile the program

The first step is to compile the project at <zephyr_install_directory>/zephyr/samples/synchronization with some added settings in the prj.conf file.

You will need to have the Zephyr toolchains installed. For our example, I will be compiling for the NXP RT1060 EVKB board, which means I need to include the NXP Hardware Abstraction Layer (HAL). If you’re a regular Golioth user, this is not installed by default (but will be soon). Instead, I recommend you install Zephyr directly from the tip of main or start from a “vanilla” Zephyr install already on your machine. Start a virtual environment if you have one (or prefer one) and then run the following:

mkdir ~/RTOS_test
cd ~/RTOS_test
west init
west update

This is a full default Zephyr install and will take a bit to download. We’re showing this for the RT1060 board, but it should work on almost any board in the main Zephyr tree, including virtual devices.

cd ~/RTOS_test/zephyr/
nano samples/synchronize/prj.conf

Add the following to the sample’s prj.conf file, if it’s not already there. This will allow Ozone to understand some of the threads in the program.

# enable to use thread names
CONFIG_THREAD_NAME=y
CONFIG_SCHED_CPU_MASK=y
CONFIG_THREAD_ANALYZER=y

Finally, build the code:

west build -b mimxrt1060_evk samples/synchronization/ -p    #you can swap this out for another board
west flash

This loads the binary file (zephyr.bin) onto your board.

Step 2: Load the ELF into SEGGER Ozone

Normally it’s the “binary” version of your program that is loaded onto the board. To use a debugger, we want something called an ELF file instead, which stands for “Executable and Linkable Format”. I think of it as an annotated version of your binary, because it includes the source files and all of the references as you step through your program.

Start a new project using the New Project Wizard, walking through the various dialogs:

If it’s not already selected, choose your processor (in the case of the mimxrt1060_evkb, the part is actually the rt1062)

Choose your J-Link (required for SEGGER Ozone). On my board it uses Single Wire Debug (SWD) but some boards might use JTAG.

Load the ELF file from your build directory. Using the instructions above, it will be located at <zephyr_install_directory>/zephyr/build/zephyr/zephyr.elf.
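
If you built from the workspace created above, you can confirm the file exists before pointing Ozone at it (path assumes the defaults used earlier):

ls ~/RTOS_test/zephyr/build/zephyr/zephyr.elf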

The most critical piece!

I wanted to call this out because it took me so long to find how to enable the “thread aware” debugging part of Ozone. You need to run the following command in the Console:

Project.SetOSPlugin ("ZephyrPlugin")

This tells SEGGER to run a built-in script and enable a new window in the “View” menu. Select the newly available “Zephyr” option or hit Alt + Shift + O to enable it. You should now see a new window pop up on your screen.

This window shows the two threads that are available in the “Synchronization” program.
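
If you save your Ozone project, you can make this setting persistent instead of retyping it every session. Adding the call to the OnProjectLoad() function of the generated .jdebug project file worked for me; treat this as a sketch and check the Ozone manual for the exact project-file syntax:

void OnProjectLoad (void)
{
  /* ...device, interface, and ELF settings generated by the wizard... */
  Project.SetOSPlugin ("ZephyrPlugin");
}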

I set a breakpoint on the printk command that is writing to the terminal (click the gray button next to the line where you want to set a breakpoint). Then I start debugging from the menu Debug -> Start Debug Session -> Attach to Running Program. This should start the debugger and then halt where you set a breakpoint:

Click the Resume button or hit F5 and you will see the Zephyr window switching between Thread A and Thread B.

SEGGER SystemView

SystemView is something I first became aware of in Brian Amos’s book “Hands-On RTOS with Microcontrollers”. I was reading it to learn more about the pieces of Real Time Operating Systems, and he uses SystemView to help analyze where an RTOS is spending the majority of its time. This is critical because operating systems rely on the concept of a “scheduler”, which means you relinquish precise control over what is happening when in your program.

SystemView is a separate piece of software from Ozone and is licensed differently. It is free to use as a trial, but commercial users will need to purchase a license for extended usage. You can download the software from SEGGER for trial usage.

Using SystemView with Zephyr

There are some additional steps required to get a program working with SystemView on Zephyr.

The most critical piece(s)!

There are two critical pieces to get a Zephyr program running with SystemView:

  1. You must be doing your logging using RTT.
    • Using only UART logging of messages will not work. SystemView requires an “RTT Control Block” in your code and if it’s not there, SystemView will timeout while trying to capture events.
    • The message I kept receiving was “Could not find SystemView Buffer”.
  2. You must log traces to RAM instead of UART (the default)
    • This allows the debugger to extract trace information from memory. Some other OSes can pull in UART trace messages, but this is not enabled in Zephyr yet.

You can enable RTT and other required settings in the prj.conf file (these can also be set through Zephyr’s menuconfig):

CONFIG_THREAD_NAME=y
CONFIG_SEGGER_SYSTEMVIEW=y
CONFIG_USE_SEGGER_RTT=y  #see point 1 above
CONFIG_TRACING=y
CONFIG_TRACING_BACKEND_RAM=y  # see point 2 above

Recompile the program and flash to your board. You should now be able to open SystemView and get started. Upon opening the program, you’ll need to configure for your J-Link:

And your board settings:

Finally when you hit the “Play” button (green arrow) or hit F5, it should start to capture events on your device.

As you can see below in the “Timeline” window, control is bouncing back and forth between “Thread A” and “Thread B”.

Using Ozone and SystemView together

These are two different tools that use the same J-Link interface, and the cool thing is that you can use them together at the same time. This is especially useful because SystemView captures all events, which can quickly become overwhelming when you only want to see a small subset. You can set a breakpoint in Ozone, start recording in SystemView, and get a targeted look at program execution right where the breakpoint hits. Targeting smaller subsections of your code makes it much easier to pinpoint and optimize your functions.

Giant Leaps in Debugging

These are just some of the tools that will help to give you more insight into your Zephyr programs as you dig deeper into the ecosystem and the Golioth Zephyr SDK. Once you start adding more capabilities, you will be able to visualize the finest details of what is happening and develop better software for your customers.

If you need help getting started with the tools described here, you can always join us on the Golioth Discord or check out the Golioth Forums for assistance. Happy debugging!

Devices that connect to the Golioth Cloud communicate securely thanks to a pre-shared key (PSK) that encrypts all messages. But how do you get a unique set of credentials onto every device? The options for provisioning your devices just got a whole lot more interesting thanks to some new Golioth features. There are now two ways to set the credentials using either the device shell, or our command line tool that also fetches those credentials automatically!

Zephyr settings using the device shell

Zephyr has a shell option that runs on the device itself. This is great for things like network or I2C debugging (there are specialized commands for both of those and much more). You send your credentials over a serial connection and Golioth leverages the Zephyr settings subsystem to store the device credentials (PSK-ID and PSK) in flash memory.

As part of the getting started guide, you already set up a device in the Golioth Console. Use the Devices sidebar option to find that device again (or create a new one) and click on the Credentials tab to access your PSK-ID and PSK:

Golioth Device Credentials

Now we need some code to run on the device. The Golioth settings example already has this feature built in. For this example I’m using a Nordic nRF9160dk. You can build and flash the example right away. Normally we’d put credentials into the prj.conf file first, but this time we’ll just assign those from the shell!

cd ~/golioth-ncs-workspace/modules/lib/golioth/
west build -b nrf9160dk_nrf9160_ns samples/settings/
west flash

Once programming has completed, load up your serial terminal tool of choice. I like to use minicom -D /dev/ttyACM0 but you should have the same success with screen /dev/ttyACM0 115200 or any other similar tools.

  • You’ll be greeted by the uart:~$ command prompt
  • The syntax that we need is settings set <key> <value>
    • PSK-ID needs to be assigned to golioth/psk-id
    • PSK needs to be assigned to golioth/psk
  • Reboot the device after changing the settings

Here’s what that looks like in action (important lines highlighted):

uart:~$ *** Booting Zephyr OS build v2.6.99-ncs1-1 ***
[00:00:00.209,716] <inf> golioth_system: Initializing
[00:00:00.216,278] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.216,278] <inf> fs_nvs: alloc wra: 0, fa8
[00:00:00.216,278] <inf> fs_nvs: data wra: 0, 90
uart:~$ settings set golioth/psk-id nrf91-settings-demo-id@blog-demo
Setting golioth/psk-id to nrf91-settings-demo-id@blog-demo
Setting golioth/psk-id saved as nrf91-settings-demo-id@blog-demo
uart:~$ settings set golioth/psk my_complex_password
Setting golioth/psk to my_complex_password
Setting golioth/psk saved as my_complex_password
uart:~$ kernel reboot cold

After rebooting, the board connects to a cell tower and the connection to Golioth is successfully established!

uart:~$ *** Booting Zephyr OS build v2.6.99-ncs1-1 ***
[00:00:00.215,850] <inf> golioth_system: Initializing
[00:00:00.222,381] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.222,412] <inf> fs_nvs: alloc wra: 0, fa8
[00:00:00.222,412] <inf> fs_nvs: data wra: 0, 90
[00:01:06.672,241] <dbg> golioth_hello.main: Start Hello sample
[00:01:06.672,485] <inf> golioth_hello: Sending hello! 0
[00:01:06.673,004] <wrn> golioth_hello: Failed to send hello!
[00:01:06.673,095] <inf> golioth_system: Starting connect
[00:01:06.967,102] <inf> golioth_system: Client connected!
[00:01:11.673,065] <inf> golioth_hello: Sending hello! 1
[00:01:16.674,316] <inf> golioth_hello: Sending hello! 2

Golioth Credentials automatically set from the command line

What if I told you that a one-line command could look up your device credentials from the Golioth Cloud and automatically send them to the device? This is literally the feature we’ve implemented. Now, I’m excited about the shell settings above, but this new command line feature is absolutely legendary!

Golioth device name

  1. Look up your device name from the Golioth Console
  2. Issue the command, using your device name and the correct port:
    1. goliothctl device config --name <device-name> --port <serial-port>

Here’s what it looks like in action:

$ goliothctl device config --name nrf91-settings-demo --port /dev/ttyACM0
failed to get golioth/psk-id from device: setting not found
device success setting golioth/psk-id saved as nrf91-settings-demo-id@blog-demo
failed to get golioth/psk from device: setting not found
device success setting golioth/psk saved as my_complex_password
closing serial read

And check this out: it’s a quick way to make sure you have the device credentials correct. Since you’re not copy/pasting or typing the credentials, you know they’re right as long as you get the device name right. Running the command a second time confirms those settings are correct:

$ goliothctl device config --name nrf91-settings-demo --port /dev/ttyACM0
golioth/psk-id in the device is already set to nrf91-settings-demo-id@blog-demo
golioth/psk in the device is already set to my_complex_password
closing serial read

Visions of End Users and Bulk Provisioning

End users and manufacturing are two places where these new features can be put to use right away. Imagine sending devices with “stock” firmware out to customers and having them add their own credentials (we have a snazzy web-based demo in the works, so stay tuned). The other is bulk provisioning, where a script registers the new device on Golioth, generates credentials, and sends them to the device all in the same step, as sketched below.
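
Here’s a hypothetical sketch of what that bulk-provisioning script could look like. Only the device config subcommand is demonstrated in this post; the device creation step and its flags are assumptions, so check goliothctl --help for the real syntax:

#!/bin/sh
# Hypothetical bulk-provisioning sketch -- subcommands/flags are assumptions
DEVICE_NAME="sensor-$(date +%s)"

# 1. Register the device (and generate credentials) on the Golioth Cloud
goliothctl device create --name "$DEVICE_NAME"

# 2. Push the credentials to the hardware over serial
goliothctl device config --name "$DEVICE_NAME" --port /dev/ttyACM0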

We’d love to hear about your experiences with these new tools. Catch up with us on the Golioth Discord so we can have a chat!

USB is one of the most important computer interfaces for an embedded developer. It’s how we flash, test, and debug our hardware. It is supported on all three major desktop operating systems, allowing developers to use the OS they are most productive in. Yet, despite having “universal” in its name, USB support isn’t the same everywhere. A device programmer may only have a Windows binary, while a compiler may only work in Linux.

When Microsoft released Windows Subsystem for Linux (WSL), Windows developers were excited about the potential of using both Windows and Linux tools from one machine. At launch, it mostly delivered on that promise with one big caveat: no USB support.

Of course there were hacks with copy/pasting and custom network drivers, but none of them worked reliably. Late last year, however, Microsoft announced that they’ve added USB support to WSL! I was certainly keen to try it out, and over the past few months I’ve been testing it as part of my Windows development workflow. It’s now in a state where I can recommend it to other developers, though there are a few gotchas to make note of (we’ll discuss those later). Today I will walk you through the basic steps of programming an embedded device over USB using WSL.

New to Golioth? Sign up for our newsletter to keep learning more about IoT development or create your free Golioth account to start building now.

What you need to get started

Windows 10 or 11

Obviously to do development on Windows…you need a copy of Windows. What’s less obvious is that WSL & USB are both supported by Windows 10. You just need Windows 10 version 2004 or the latest build of Windows 11.

WSL

There are a few ways to install WSL and you may have WSL already installed on your machine. But before we can get to using USB we need to check which Linux kernel is currently installed. It needs to be a kernel version of 5.10.60.1 or later. You can check by running uname -a.
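
On my machine the output looks something like this (your exact build string will differ):

$ uname -a
Linux DESKTOP 5.10.60.1-microsoft-standard-WSL2 #1 SMP x86_64 GNU/Linux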

If you installed WSL using the command line (see docs), then you may have a recent-enough kernel via Windows Update. However, I recommend installing Windows Subsystem for Linux Preview from the Microsoft Store. It’ll keep your WSL kernel up to date without having to wait for a service pack.

You also need a Linux distribution and you can use any distribution you like from the Microsoft Store. I prefer to keep things easy and use Ubuntu.

usbipd-win

This is what makes all the magic happen. The instructions are pretty straightforward. It’s mostly a matter of installing the usbipd-win program on the Windows side and a little bit of configuration on the Linux side.

(If you’re curious how this works, make sure to check out the underlying USB/IP protocol. It’s been part of the Linux kernel for years!)

Windows Terminal (optional)

While not strictly needed to work with WSL or USB, Windows Terminal is a much more modern terminal than what comes stock with Windows or PowerShell. It also offers a tabbed interface, which is especially useful when switching between Windows and WSL while managing your USB devices.

Using USB from WSL

With the dependencies out of the way, we can start interacting with a USB device.

Attaching a device to WSL from Windows

You’ll need to get comfortable with usbipd as it’s how we’ll enable USB to be used with WSL. The basic steps are:

  1. Launch Windows Terminal as an Administrator
  2. Run usbipd wsl list to list all the devices on your machine, noting the BUS-ID for the device you want to use
  3. Run usbipd wsl attach --busid={BUS-ID} to use the device from WSL
  4. Run usbipd wsl list again to verify that the device is now being used by WSL
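
Here’s roughly what that session looks like; the device list below is illustrative and your BUS-IDs will differ:

PS C:\> usbipd wsl list
BUSID  VID:PID    DEVICE         STATE
1-4    1366:0105  SEGGER J-Link  Not attached

PS C:\> usbipd wsl attach --busid=1-4

PS C:\> usbipd wsl list
BUSID  VID:PID    DEVICE         STATE
1-4    1366:0105  SEGGER J-Link  Attached - Ubuntu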

Accessing a USB device from WSL

At this point you have a real USB device in Linux. To quickly verify that everything works:

  1. Open Ubuntu (doesn’t need to be as an Administrator)
  2. Run lsusb and note the USB device
  3. Start interacting with the device like you normally would
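
With a J-Link attached, for example, step 2 shows something like this (IDs are illustrative):

$ lsusb
Bus 001 Device 002: ID 1366:0105 SEGGER J-Link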

I followed the Golioth Getting Started Guide for the ESP32 using the Linux instructions and it worked a treat. I’ve also tested mainline Zephyr, the Arduino CLI, and an STM32 with dfu-util.

Running the west flash command in WSL2

What to keep in mind

If you’re new to Linux, you’ll need to get a little familiar with tools like udev and how to find your device paths, but there are a ton of resources online. Some of the devices I tried already had udev rules defined in my release of OpenOCD, but one didn’t, and it took me a while to figure out the ancient incantation to make udev happy. If you’re using an open source tool, it usually provides pretty good documentation for Linux usage, so check the manual.
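
For reference, here’s a minimal udev rule of the kind I ended up writing. The SEGGER vendor ID (1366) is real, but treat the rule itself as a sketch and prefer the rules file shipped with your tool if one exists:

# /etc/udev/rules.d/99-jlink.rules
SUBSYSTEM=="usb", ATTRS{idVendor}=="1366", MODE="0666"

Reload the rules so they take effect without a reboot:

sudo udevadm control --reload-rules && sudo udevadm trigger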

One challenge with this approach is that usbipd tries to re-attach a device if it reboots, which can happen when you flash firmware. It doesn’t always recover, especially if the Bus ID changes, so you may need to re-attach the device manually. It’s a little annoying if you forget, so keep that admin console open.

Lastly, while drivers for many USB, USB/serial, and HID devices are included with the current WSL kernel, some may not be enabled by default. Leave a comment on this issue if you want to request additional drivers.

USB all the things!

I’m grateful that Microsoft has added support for USB in WSL. It’s not only helpful for Golioth developers but anyone else doing embedded development on Windows. I didn’t go into every bit of detail in this post so if you have any questions or run into issues, feel free to reach out to me on Twitter or join our Discord.