This is the second part of How Golioth uses Hardware-in-the-Loop (HIL) Testing. In the first part, I covered what HIL testing is, why we use it at Golioth, and provided a high-level view of the system.

In this post, I’ll talk about each step we took to set up our HIL. It’s my belief that HIL testing should be a part of any firmware testing strategy, and my goal with this post is to show you that it’s not so difficult to get a minimum viable HIL set up within one week.

The Big Pieces

To recap, this is the system block diagram presented at the end of part 1:

The Raspberry Pi acts as a GitHub Actions Self-Hosted Runner which communicates with microcontroller devices via USB-to-serial adapters.

The major software pieces required are:

  • Firmware with Built-In-Test Capabilities – Runs tests on the device (e.g. ESP32-S3-DevKitC), and reports pass/fail for each test over serial.
  • Python Test Runner Script – Initiates tests and checks serial output from device. Runs on the self-hosted runner (e.g. Raspberry Pi).
  • GitHub Actions Workflow – Initiates the python test runner and reports results to GitHub. Runs on the self-hosted runner.

Firmware Built-In-Test

To test our SDK, we use a special build of firmware which exposes commands over a shell/CLI interface. There is a command named built_in_test which runs a suite of tests in firmware:

esp32> built_in_test
...
12 Tests 0 Failures 0 Ignored 
../main/app_main.c:371:test_connects_to_wifi:PASS
../main/app_main.c:378:test_golioth_client_create:PASS
../main/app_main.c:379:test_connects_to_golioth:PASS
../main/app_main.c:381:test_lightdb_set_get_sync:PASS
../main/app_main.c:382:test_lightdb_set_get_async:PASS
../main/app_main.c:383:test_lightdb_observation:PASS
../main/app_main.c:384:test_golioth_client_heap_usage:PASS
../main/app_main.c:385:test_request_dropped_if_client_not_running:PASS
../main/app_main.c:386:test_lightdb_error_if_path_not_found:PASS
../main/app_main.c:387:test_request_timeout_if_packets_dropped:PASS
../main/app_main.c:388:test_client_task_stack_min_remaining:PASS
../main/app_main.c:389:test_client_destroy_and_no_memory_leaks:PASS

These tests cover major functionality of the SDK, including connecting to Golioth servers, and sending/receiving CoAP messages. There are also a couple of tests to verify that stack and heap usage are within allowed limits, and that there are no detected memory leaks.

Here’s an example of one of the tests that runs on the device, test_connects_to_golioth:

#include "unity.h"
#include "golioth.h"

static golioth_client_t _client;
static SemaphoreHandle_t _connected_sem;

static void on_client_event(golioth_client_t client, 
                            golioth_client_event_t event, 
                            void* arg) {
    if (event == GOLIOTH_CLIENT_EVENT_CONNECTED) {
        xSemaphoreGive(_connected_sem);
    }
}

static void test_connects_to_golioth(void) {
    TEST_ASSERT_NOT_NULL(_client);
    TEST_ASSERT_EQUAL(GOLIOTH_OK, golioth_client_start(_client));
    TEST_ASSERT_EQUAL(pdTRUE, xSemaphoreTake(_connected_sem, 10000 / portTICK_PERIOD_MS));
}

This test attempts to connect to Golioth servers. If it connects, a semaphore is given which the test waits on for up to 10 seconds (or else the test fails). For such a simple test, it covers a large portion of our SDK, including the network stack (WiFi/IP/UDP), security (DTLS), and CoAP Rx/Tx.

We use the Unity test framework for basic test execution and assertion macros.

If you’re curious about the details, you can see the full test code here: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/test/main/app_main.c

Python Test Runner

Another key piece of software is a Python script that runs on the self-hosted runner and communicates with the device over serial. We call this script verify.py. In this script, we provision the device with credentials, issue the built_in_test command, and verify the serial output to make sure all tests pass.

Here’s the main function of the script:

def main():
    port = sys.argv[1]

    # Connect to the device over serial and use the shell CLI to interact and run tests
    ser = serial.Serial(port, 115200, timeout=1)

    # Set WiFi and Golioth credentials over device shell CLI
    set_credentials(ser)
    reset(ser)

    # Run built in tests on the device and check output
    num_test_failures = run_built_in_tests(ser)
    reset(ser)

    if num_test_failures == 0:
        run_ota_test(ser)

    sys.exit(num_test_failures)

This opens the serial port and calls various helper functions to set credentials, reset the board, run built-in-tests, and run an OTA test. The exit code is the number of failed tests, so 0 means all passed.

The script spends most of its time in this helper function which looks for a string in a line of serial text:

def wait_for_str_in_line(ser, str, timeout_s=20, log=True):
    start_time = time()
    while True:
        line = ser.readline().decode('utf-8', errors='replace').replace("\r\n", "")
        if line != "" and log:
            print(line)
        if "CPU halted" in line:
            raise RuntimeError(line)
        if time() - start_time > timeout_s:
            raise RuntimeError('Timeout')
        if str in line:
            return line

The script will continually read lines from serial for up to timeout_s seconds until it finds what it’s looking for. And if “CPU halted” ever appears in the output, that means the board crashed, so a runtime error is raised.

This verify.py script is what gets invoked by the GitHub Actions Workflow file.

GitHub Actions Workflow

The workflow file is a YAML file that defines what will be executed by GitHub Actions. If one of the steps in the workflow returns a non-zero exit code, then the workflow will fail, which results in a red ❌ in CI.

Our workflow is divided into two jobs:

  • Build the firmware and upload build artifacts. This is executed on cloud servers, since building the firmware on a Raspberry Pi would take a prohibitively long time (adding precious minutes to CI time).
  • Download firmware build, flash the device, and run verify.py. This is executed on the self-hosted runner (my Raspberry Pi).

The workflow file is created in .github/workflows/test_esp32s3.yml. Here are the important parts of the file:

name: Test on Hardware

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  build_for_hw_test:
    runs-on: ubuntu-latest
    steps:
    ...
    - name: Build test project
      uses: espressif/esp-idf-ci-action@v1
      with:
        esp_idf_version: v4.4.1
        target: esp32s3
        path: 'examples/test'
    ...

  hw_flash_and_test:
    needs: build_for_hw_test
    runs-on: [self-hosted, has_esp32s3]
    ...
    - name: Flash and Verify Serial Output
      run: |
        cd examples/test
        python flash.py $CI_ESP32_PORT && python verify.py $CI_ESP32_PORT

The first job runs in a normal GitHub runner based on Ubuntu. For building the test firmware, Espressif provides a handy CI Action to do the heavy lifting.

The second job has a needs: build_for_hw_test requirement, meaning it must wait until the first job completes (i.e. firmware is built) before it can run.

The second job has two further requirements – it must run on self-hosted and there must be a label has_esp32s3. GitHub will dispatch the job to the first available self-hosted runner that meets the requirements.

If the workflow completes all jobs, and all steps return non-zero exit code, then everything has passed, and you get the green CI checkmark ✅.

Self-Hosted Runner Setup

With all of the major software pieces in place (FW test framework, Python runner script, and Actions workflow), the next step is to set up a machine as a self-hosted runner. Any machine capable of running one of the big 3 operating systems will do (Linux, Windows, MacOS).

It helps if the machine is always on and connected to the Internet, though that is not a requirement. GitHub will simply not dispatch jobs to any self-hosted runners that are not online.

Install System Dependencies

There are a few system dependencies that need to be installed first. On my Raspberry Pi, I had to install these:

sudo apt install \
    python3 python3-pip python3-dev python-dev \
    libffi-dev cmake direnv wget gcovr clang-format \
    libssl-dev git
pip3 install pyserial

Next, log into GitHub and create a new self-hosted runner for your repo (Settings -> Actions -> Runners -> New self-hosted runner):

When you add a new self-hosted runner, GitHub will provide some very simple instructions to install the Actions service on your machine:

When I ran the config.sh line, I added --name and --unattended arguments:

./config.sh --url <https://github.com/golioth/golioth-firmware-sdk> --token XXXXXXXXXXXXXXXXXXXXXXXXXXXXX --name nicks-raspberry-pi --unattended

Next, install the service, which ensures GitHub can communicate with the self-hosted runner even if it reboots:

cd actions-runner
sudo ./svc.sh install

If there are any special environment variables needed when executing workflows, these can be added to runsvc.sh. For instance, I defined a variable for the serial port of the ESP32:

# insert anything to setup env when running as a service
export CI_ESP32_PORT=/dev/ttyUSB0

Finally, you can start the service:

sudo ./svc.sh start

And now you should see your self-hosted runner as “Idle” in GitHub Actions, meaning it is ready to accept jobs:

I added two custom labels to my runner: has_esp32s3 and has_nrf52840dk, to indicate that I have those two dev boards attached to my self-hosted runner. The workflow file uses these labels so that jobs get dispatched to self-hosted runners only if they have the required board.

And that’s it! With those steps, a basic HIL is up and running, integrated into CI, and covering a large amount of firmware code on real hardware.

Security

There’s an important note from GitHub regarding security of self-hosted runners:

Warning: We recommend that you only use self-hosted runners with private repositories. This is because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

I also stand by that recommendation. But it’s not always feasible if you’re maintaining an open-source project. Our SDKs are public, so this warning certainly applies to us.

If you’re not careful, you could give everyone on the Internet root access to your machine! It wouldn’t be very hard for someone to submit a pull request that modifies the workflow file to run whatever arbitrary Linux commands they want.

We mitigate most of the security risks with three actions.

1. Require approval to run workflows for all outside contributors.

This means someone from the Golioth organization must click a button to run the workflow. This allows the repo maintainer time to review any workflow changes before allowing the workflow to run.

This option can be enabled in Settings -> Actions -> General:

Once enabled, pull requests will have a new “Approve and Run” button:

This is not ideal if you’re running a large open-source project with lots of outside contributors, but it is a reasonable compromise to make during early stages of a project.

2. Avoid logging sensitive information

The log output of any GitHub Actions runner is visible publicly. If there is any sensitive information printed out in the workflow steps or from the serial output of the device, then this information could easily fall into the wrong hands.

For our purposes, we disabled logging WiFi and Golioth credentials.

3. Utilize GitHub Actions Secrets

You can use GitHub Actions Secrets to store sensitive information that is required by the runner. In our repo, we define a secret named `GOLIOTH_API_KEY` that allows us access to the Golioth API, which is used to deploy OTA images of built firmware during the workflow run.

If you want to know more about self-hosted runner security, these resources were helpful to me:

Tips and Tricks

Here are a few extra ideas to utilize the full power of GitHub Actions Self-Hosted Runners (SHRs):

  • Use the same self-hosted runner in multiple repos. You can define SHRs at the organization level and allow their use in more than one repo. We do this with our SHRs so they can be used to run tests in the Golioth Zephyr SDK repo and the Golioth ESP-IDF SDK repo.
  • Use labels to control workflow dispatch. Custom labels can be applied to a runner to control when and how a workflow gets dispatched. As mentioned previously, we use these to enforce that the SHR has the dev board required by the test. We have a label for each dev board, and if the admin of the SHR machine has those dev boards attached via USB, then they’d add a label for each one. This means that machine will only receive workflows that are compatible with that machine. If a board is acting up or requires maintenance, the admin can simply disconnect the board and remove the label to prevent further workflows from executing.
  • Distributed self-hosted runners to load balance and minimize CI bottlenecks. If there is only one self-hosted runner, it can become a bottleneck if there are lots of jobs to run. Also, if that runner loses Internet, then CI is blocked indefinitely. To balance the load and create more robust HIL infrastructure, it’s recommended to utilize multiple self-hosted runners. Convince your co-workers to take their unused Intel NUC off the shelf and follow the steps in the “Self-Hosted Runner Setup” section to add another runner to the pool. This allows for HIL tests to run in a distributed way, no matter where in the world you are.

Even people who have been living under a rock for the past couple of years know that there’s a global chip shortage. The correct response from the engineering community should be changes to our design philosophy and that’s the topic of the talk that Chris Gammell and I gave at the 2022 Zephyr Developer Summit back in June. With the right hardware and firmware approach, it’s possible to design truly modular hardware to respond to supply chain woes.

Standardization enabled the industrial revolution, and the electronics field is a direct descendant of those principles. You can rest easy in knowing that myriad parts exist to match your chosen operating voltage and communication scheme. But generally speaking it’s still quite painful to change microcontrollers and other key-components mid-way through product design.

Today’s hardware landscape has uprooted the old ways of doing things. You can’t wait out a 52-week lead time, so plan for the worst and hope for the best. Our talk covers the design philosophies we’ve adopted to make hardware “swapability” possible, while using Zephyr RTOS to manage the firmware side of things.

Meet the Golioth Aludel

Golioth exists to make IoT development straightforward and scalable–that’s not possible if you’re locked into specific hardware, especially these days. So we designed and built a modular platform to showcase this flexibility to our customers.

Golioth Aludel prototyping platform internal view

Called Aludel, Chris Gammell centered the design around a readily available industrial enclosure. The core concept is a PCB base that accepts any microcontroller board that uses the Feather standard. This way, every pin-location and function is standardized. Alongside the Feather you’ll find two headers that use the Click boards standard (PDF), a footprint for expansions boards facilitating i2c, SPI, UART, and more. The base includes Qwiic and Stemma headers for additional sensor connectivity, as well as terminal blocks, room for a battery, and provisions for external power.

The key customization element for Aludel is the faceplate. It’s a circuit board that can deliver a beautiful custom design to match any customer request. But on the inside, each faceplate has the same interface using a flat cable connection to the base board. Current versions of the faceplate feature a a screen, an EEPROM to identify the board, and a port expander that drives buttons, LEDs, and whatever else you need for the hardware UI.

The beauty of this is that electrically, it all just works. Need an nRF52, nRF9160, ESP32, STM32, or RP2040? They all already exist in Feather form factor. Need to change the type of temperature sensor you’re using? Want to add RS485, CANbus, MODBUS, 4-20 ma communication, or some other connectivity protocol? The hardware is ready for it–at most you might need to spin your own adapter board.

Now skeptics may look at the size of this with one abnormally high-arched eyebrow. But remember, this is a proof-of-concept platform. You can set it up for your prototyping, then take the block elements and boil them down into your final design. Prematurely optimizing for space means you will have many many more board runs as the parts change.

When you do a production run and your chip is no longer available, the same concepts will quickly allow you to test replacements and respin your production PCB to accommodate the changes.

Zephyr Alleviates your Firmware Headaches

“But wait!” you cry, “won’t someone think of the firmware?”. Indeed, someone has thought of the firmware. The Linux Foundation shepherds an amazing Real-Time Operating System called Zephyr RTOS that focuses on interoperability across a vast array of processor architectures and families. Even better, it handles the networking stack and has a standardized device model that makes it possible to switch peripherals (ie: sensors) without your C code even noticing.

We use Zephyr to maintain one firmware repository for the Aludel hardware that can be compiled for STM32F40, nRF52840, nRF9160, and ESP32 using your choice of Ethernet, WiFi, or Cellular connections. How is that even possible?

&i2c0 {
	bme280@76 {
		compatible = "bosch,bme280";
		reg = <0x76>;
		label = "BME280_I2C";
	};
};

&spi1 {
	compatible = "nordic,nrf-spi";
	status = "okay";
	cs-gpios = <&gpio0 3 GPIO_ACTIVE_LOW>,
	           <&gpio0 27 GPIO_ACTIVE_LOW>,
	           <&gpio1 8 GPIO_ACTIVE_LOW>;
	test_spi_w5500: w5500@0 {
		compatible = "wiznet,w5500";
		label = "w5500";
		reg = <0x0>;
		spi-max-frequency = <10000000>;
		int-gpios = <&gpio0 2 GPIO_ACTIVE_LOW>;
		reset-gpios = <&gpio0 30 GPIO_ACTIVE_LOW>;
	};
};

/ {
    aliases {
        aludeli2c = &i2c0;
        pca9539int = &interrupt_pin0;
    };
    gpio_keys {
        compatible = "gpio-keys";
        interrupt_pin0: int_pin_0 {
			gpios = < &gpio0 26 GPIO_ACTIVE_LOW >;
			label = "Port Expander Interrupt Pin";
		};
	};
};

Zephyr uses the DeviceTree standard to map how all of the hardware is connected. A vast amount of work has already been done for you by the manufacturers who maintain drivers for their chips and supply .dts (DeviceTree Syntax) files that specify pin functions and mappings. When writing firmware for your projects you supply DeviceTree overlay files that indicate sensors, buttons, LEDs, and anything else connected to your design. The C code looks up each device to receive all relevant information like device address, GPIO port/pin assignment and function.

const struct device *weather_dev = DEVICE_DT_GET_ANY(bosch_bme280);
sensor_sample_fetch(weather_dev);
sensor_channel_get(weather_dev, SENSOR_CHAN_AMBIENT_TEMP, &temp);
sensor_channel_get(weather_dev, SENSOR_CHAN_PRESS, &press);
sensor_channel_get(weather_dev, SENSOR_CHAN_HUMIDITY, &humidity);

The hardware abstraction doesn’t stop there. Zephyr also specifies abstraction for sensors. For instance, an IMU will have an X, Y, and Z output — Zephyr makes accessing these the same even if you have to move to an IMU from a different manufacturer due to parts shortages. You update the overlay files for the new build, and use a configuration file to turn on any specific libraries for the change, but the code you’re maintaining can continue on as if nothing happened.

Of course there are caveats which we cover in the talk. There is no one-size-fits-all in embedded engineering so you will eventually need to define your own .dts files for custom boards, and add peripherals that are unsupported in Zephyr. This is not a deal breaker, there is a template example for adding boards (and I highly recommend watching Jared Wolff’s video on the subject). And since Zephyr is open source, your customizations can be contributed upstream so others may also benefit.

Hardware will never again be the same

Electronics manufacturing will eventually emerge from the current shortages. But there are no signs of this happening soon, and it’s been ongoing for two years. We shouldn’t be trying to return to “normal”, but planning a better future. That’s what Golioth is doing for the Internet of Things. Choosing an IoT management platform shouldn’t require you to commit to one type of hardware. Your fleets should evolve, and our talk outlines how they can evolve. Golioth helps to manage your diverse and growing hardware fleet. Golioth is designed to make it easy to keep track of and control all of that hardware, whether that’s 100 devices or 100,000. Take Golioth for a test drive today.

Today we’re announcing a new feature on the Golioth Console and on our Device SDKs that enables Remote Procedure Calls (RPCs) for all users on the platform. From the cloud, you can initiate a function on your constrained device in the field, ensure the device received and executed the command, and receive a response from the device back to the Cloud.

What is a Remote Procedure Call (RPC)?

A Remote Procedure Call allows you to call a function on a remote computing device and optionally receive a result. An easy way to think about it is you’re calling a function, like you would in any other program…you’re just doing it from another computer. In this case, you’re triggering actions from the Golioth Cloud.

RPC from the Golioth Cloud (Console)

Each device in your project has a page where you can view details about things like LightDB State, LightDB Stream, Settings, and now RPC. Our Console includes an interface to directly send RPCs to the remote device. The URL will look something like:

https://console.golioth.io/devices/<YOUR_DEVICE_ID>/management/rpc

In all of our examples, we are sending an RPC to single devices. However, they can also be triggered from the REST API. As a reminder, any function you see on the Golioth Console is available on the REST API.

One critical function of RPCs is a confirmation that the remote function has actually run. The device firmware needs to send back a success message that the function has completed, and optionally a returned value. When there is a problem connecting with your device and the RPC does not complete successfully, you will see a screen that looks like this:

An RPC sent to a device that was disconnected from Wi-Fi

Also note that round trip time is measured for all RPCs, including successful ones. Transit times will depend upon your connectivity medium, in addition to the processing time of the function on the remote device.

When an RPC successfully completes, you can click the button with 3 dots to receive the returned value. In the example and in the video, we were using a method called double that takes an integer input, multiplies it by two, and then returns the value to the Cloud. Below, you can see the result when we sent “double” method with a parameter of “37”.

RPC from the Device SDK perspective

Any new feature on Golioth has Device SDK support, in addition to the new APIs and UIs on our Web Console. Earlier this week, Nick wrote about how we test hardware and firmware at Golioth, especially when a new feature is released across the platform. Now that we support 3 SDKs (Zephyr, NCS, ESP-IDF), the testing area has increased.

In the video, the focus is on ESP-IDF, which has a simple way to set up and respond to new RPCs. First, we register the new method, so we’ll recognize the command coming from the Golioth Cloud:

The function that we tie to that newly registered RPC needs to return the RPC_OK variable for the Cloud to be alerted that the function has processed properly.

If you have satisfied these requirements in the ESP-IDF SDK and copied the format, you can customize logic to do whatever task you’d like on the remote device. Let’s look at some examples.

Use cases for an RPC

The double() example is a simple showcase of the minimum requirements to create an RPC in the ESP-IDF SDK. We send a command and a value, we return a modified value.

The remote_reset command we created and showcased in the video is more like a critical function you would want to add to your project. When you want to trigger a remote function like a reset, you want to ensure that the command was properly received, that the function executed, and then that there was output data that validated the reset has happened. In the final point, that includes inferring that the device has restarted from the log messages also being sent back to the Golioth Console. Put all together, it’s a reliable way to tell the device has been reset.

Other use cases could be as simple as sending arbitrary text to a display. You would still want to know the text has been received and properly sent to the physical display. Or perhaps you have a valve and you want to be able to send an arbitrary value to the valve, but you also want to take a reading on an encoder that measures the distance the valve has moved.

Many of these functions could also be achieved with LightDB State (which the RPC service is built upon), but the context for creating an RPC is more targeted at situations like the examples above.

What will you build?

RPCs are another way for you to communicate with your constrained IoT devices from the Golioth Cloud and to get useful information back from your devices. You can start testing out this feature today.

For more questions or assistance, check out our Forums, our Discord, or drop us a note at [email protected].

 

Here’s a scenario that might sound familiar.

You’ve added a new feature to the firmware, about 800 lines of code, along with some simple unit tests. All unit tests are passing. The code coverage report is higher than ever. You pat yourself on the back for a job well done.

But then – for a brief moment, conviction wavers as a voice in your head whispers: “The mocks aren’t real”; “Does this code really work?”; “You forgot to test OTA.”

But the moment passes, and your faith is restored as you admire the elegant green CI check mark ✅.

Do you trust your unit tests enough to deploy the firmware right now to thousands of real devices via Over-the-Air (OTA) firmware update?

If the answer is “well, no, not really, we still need to test a few things with the real hardware”, then automated Hardware-in-the-Loop (HIL) testing might give you the confidence to upgrade that answer to “heck yeah, let’s ship it!”. 🚀

In this post, I’ll explain what HIL testing is, and why we use it at Golioth to continuously verify the firmware for our Zephyr and ESP-IDF SDKs.

HIL Testing: A Definition

If you search online, you’ll find a lot of definitions of what HIL testing is, but I’ll try to formulate it in my own simple terms.

There are three major types of tests:

  • Unit – the smallest kind of test, often covering just one function in isolation
  • Integration – testing multiple functions or multiple subsystems together
  • System – testing the entire system, fully integrated

A firmware test can run in three places:

  • On your host machine – Running a binary compiled for your system
  • On emulated target hardware – Something like QEMU or Renode
  • On real target hardware – Running cross-compiled code for the target, like an ARM or RISC-V processor.

A HIL test is a system test that runs on real target hardware, with real peripherals, and real connections. Typically, a HIL setup will consist of:

  • A board running your firmware (the code under test)
  • A host test machine, executing the tests
  • (optional) Other peripherals and test equipment to simulate real-world conditions, controlled by the host test machine

Coverage and Confidence

Why bother with HIL tests? Aren’t unit tests good enough?

To address this, I’ll introduce two terms:

  • Coverage: How much of the firmware is covered by tests?
  • Confidence: How confident are you that the firmware behaves as intended?

Coverage is easily measured by utilizing open-source tools like gcov. It’s an essential part of any test strategy. Confidence is not as easily measured. It’s often a gut feeling.

It’s possible to have 100% coverage and 0% confidence. You can write tests that technically execute every line of code and branch combination without testing any intended functionality. Similarly, you can have 0% coverage and 100% confidence, but if you encounter someone like this, they are, to put it bluntly, delusional.

Unit tests are great at increasing coverage. The code under test is isolated from the rest of the system, and you can get into all the dark corners. “Mocks” are used to represent adjacent parts of the system, usually in a simplistic way. This might be a small piece of code that simulates the response over a UART. But this means is you can’t be 100% confident the code will work the same way in the context of the full system.

HIL system tests are great at increasing confidence, as they demonstrate that the code is functional in the real world. However, testing things at this level makes it difficult to cover the dark corners of code. The test is limited to the external interfaces of the system (ie. the UART you had a mock for in the above example).

A test strategy involving both unit tests and HIL tests will give you high coverage and high confidence.

Continuously Verifying Golioth SDK Firmware

Why do we use HIL testing at Golioth?

We have the concept of Continuously Verified Boards, which are boards that get first-class support from us. We test these boards on every SDK release across all Golioth services.

Previously, these boards were manually tested each release, but there were several problems we encountered:

  • It was time-consuming
  • It was error-prone
  • It did not scale well as we added boards
  • OTA testing was difficult, with lots of clicking and manual steps involved
  • The SDK release process was painful
  • We were unsure whether there had been regressions between releases

We’ve alleviated all of these problems by introducing HIL tests that automatically run in pull requests and merges to the main branch. These are integrated into GitHub Actions (Continuous Integration, CI), and the HIL tests must pass for new code to merge to main.

Here’s the output of the HIL test running in CI on a commit that merged to main recently in the ESP-IDF SDK:

HIL Test Running in CI

This is just the first part of the test where the ESP32 firmware is being flashed.

Further down in the test, you can see we’re running unit tests on hardware:

Automated test output

And we even have an automated OTA test:

Automated OTA Test Output

As a developer of this firmware, I can tell you that I sleep much better at night knowing that these tests pass on the real hardware. In IoT devices, OTA firmware update is one of the most important features to get right (if not the most important), so testing it automatically has been a big win for us.

HIL on a Budget

Designing a prototype HIL system does not need to be a huge up-front investment.

At Golioth, one engineer was able to build a prototype HIL within one week, due primarily to our simple hardware configuration (each tested board just needs power, serial, and WiFi connection) and GitHub Actions Self-Hosted Runners, which completely solves the infrastructure and DevOps part of the HIL system.

The prototype design looks something like this:

 

I’m using my personal Raspberry Pi as the self-hosted runner. GitHub makes it easy to register any kind of computer (Linux, MacOS, or Windows) as a self-hosted runner.

And here’s a pic of my HIL setup, tucked away in a corner and rarely touched or thought about:

My DIY HIL Setup

Next Time

That’s it for part 1! We’ve covered the “why” of HIL testing at Golioth and a little bit of the “how”.

In the next part, we’ll dive deeper into the steps we took to create the prototype HIL system, the software involved, and some of the lessons learned along the way.

 

Thanks to Michael for the photo of Tiger and Turtle

Golioth just rolled out a new settings service that lets you control your growing fleet of IoT devices. You can specify settings for your entire fleet, and override those global settings by individual device or for multiple devices that share the same blueprint.

Every IoT project needs some type of settings feature, from adjusting log levels and configuring the delay between sensor readings, to adjusting how frequently a cellular connection is used in order to conserve power. With the new settings service, the work is already done for you. A single settings change on the Golioth web console is applied to all devices listening for changes!

As you grow from dozens of devices to hundreds (and beyond), the Golioth settings service makes sure you can change device settings and confirm that those changes were received.

Demonstrating the settings service

Golioth settings service

The settings service is ready for you use right now. We have code samples for the Golioth Zephyr SDK and the Golioth ESP-IDF SDK. Let’s take it for a spin using the Zephyr samples.

I’ve compiled and flashed the Golioth Settings sample for an ESP32 device. It observes a LOOP_DELAY_S endpoint and uses that value to decide how long to wait before sending another “Hello” log message.

Project-wide settings

On the Golioth web console, I use the Device Settings option on the left sidebar to create the key/value pair for this setting. This is available to all devices in the project with firmware that is set up to observe the LOOP_DELAY_S settings endpoint.

Golioth device settings dialog

When viewing device logs, we can see the setting is observed as soon as the device connects to Golioth. The result is that the Hello messages are now issued ten seconds apart.

[00:00:19.930,000] <inf> golioth_system: Client connected!
[00:00:20.340,000] <inf> golioth: Payload
                                  a2 67 76 65 72 73 69 6f  6e 1a 62 e9 99 c4 68 73 |.gversio n.b...hs
                                  65 74 74 69 6e 67 73 a1  6c 4c 4f 4f 50 5f 44 45 |ettings. lLOOP_DE
                                  4c 41 59 5f 53 fb 40 24  00 00 00 00 00 00       |LAY_S.@$ ......  
[00:00:20.341,000] <dbg> golioth_hello: on_setting: Received setting: key = LOOP_DELAY_S, type = 2
[00:00:20.341,000] <inf> golioth_hello: Set loop delay to 10 seconds
[00:00:20.390,000] <inf> golioth_hello: Sending hello! 2
[00:00:30.391,000] <inf> golioth_hello: Sending hello! 3
[00:00:40.393,000] <inf> golioth_hello: Sending hello! 4

Settings by device or by blueprint

Of course, you don’t always want to have the same settings for all devices. Consider debugging a single device. It doesn’t make much sense to turn up the logging level or frequency of sensor reads for all devices. So with Golioth it’s easy to change the setting on just a single device.

settings change for a single device on the Golioth console

In the device view of the Golioth web console there is a settings tab. Here you can see the key, the value, and the level of the value. I have already changed the device-specific value in this screen so the level is being reported as “Device”.

[00:07:30.466,000] <inf> golioth_hello: Sending hello! 45
[00:07:40.468,000] <inf> golioth_hello: Sending hello! 46
[00:07:43.728,000] <inf> golioth: Payload
                                  a2 67 76 65 72 73 69 6f  6e 1a 62 e9 9c 17 68 73 |.gversio n.b...hs
                                  65 74 74 69 6e 67 73 a1  6c 4c 4f 4f 50 5f 44 45 |ettings. lLOOP_DE
                                  4c 41 59 5f 53 fb 40 00  00 00 00 00 00 00       |LAY_S.@. ......  
[00:07:43.729,000] <dbg> golioth_hello: on_setting: Received setting: key = LOOP_DELAY_S, type = 2
[00:07:43.729,000] <inf> golioth_hello: Set loop delay to 2 seconds
[00:07:50.469,000] <inf> golioth_hello: Sending hello! 47
[00:07:52.471,000] <inf> golioth_hello: Sending hello! 48

When I made the change. the device was immediately notified and you can see from the timestamps that it began logging at a two-second cadence as expected.

Golioth settings applied at the blueprint level

It is also possible to change settings for a group of devices that share a common blueprint. Here you will find this setting by selecting Blueprint from the left sidebar and choosing your desired blueprint.

Settings are applied based on specificity. The device-level is the most specific and will be applied first, followed by blueprint-level, and finally project-level. Blueprints may be created and applied at any time, so if you later realize you need a more specific group you can change the blueprint for those devices.

Implementation: The two parts that make up the settings service

Fundamentally, there are two parts that make our device settings system work: the Golioth cloud services running on our servers and your firmware that is running on the devices. The Golioth device SDKs allow you to register a callback function that receives settings values each time a change is made to the settings on the cloud. You choose how the device should react to these settings, like updating a delay value, enabling/disabling features, changing log output levels, really anything you want to do.

Don’t worry if you already have devices in the field. You can add the settings service or make changes to how your device handles those settings, then use the Golioth OTA firmware update system to push out the new behavior.

Take control of your fleet

Scale is the hope for most IoT companies, but it’s also where the pain of IoT happens. You need to know you can control your devices, securely communicate with them, and perform updates as necessary. Golioth has you covered in all of these areas. The new settings service ensures that your ability to change how your fleet is performing doesn’t become outpaced by your growth.

NXP RT1060-EVKB Ethernet development board

I get to work with a lot of different hardware here at Golioth, but recently I’ve found my self gravitating to the NXP i.MX RT1060-EVKB evaluation board as my go-to development hardware. This brand new board is a variation on the RT1060-EVK, which (in addition to the extra ‘B’ in the new part number) adds an m.2 slot. But what catches my eye is the Ethernet. While I’ve previously written about using SPI-connected Ethernet modules, this board builds the connectivity right in. It’s stable, it’s fast, and has wonderful support in Zephyr.

The microcontroller that controls everything is the RT1062 which I first heard about three years back when the Teensy 4.0 was released. A 600 MHz powerhouse, it boasts 1024 kB of SRAM and is packed with peripherals. It’s an embarrassment of riches for my day-to-day firmware work, but makes sure I’ll never hit my head on the proverbial “development ceiling”.

Of course it’s not just the system clock that I’m happy about. This board has Ethernet built-in, so I’m not waiting for a WiFi connection (or really waiting for a cell connection) at boot. And it’s programmed via J-Link for a fast flash process and the RTT and debugging features that this programmer brings to the party.

Let’s take a look at how I get this board talking to Golioth, and what we might see in the near future.

An i.MX RT1060-EVKB demo

NXP RT1060-evkb webinar demo

Demonstrating Golioth LightDB Stream for temperature/pressure/humidity sensor data

A couple of weeks ago I did a demo of the i.MX RT1060-EVKB board as part of NXP’s Zephyr webinar series. It gives you a great overview of the board and the ecosystem. I was even able to do a live demo of all the Golioth features like data management, command/control, remote logging, and OTA firmware updates. (Requires a free NXP registration to see the video.)

This board already has great support with the Golioth platform. My hope is to go further than that and add this to our list of Continuously Verified Boards. This would mean that we test and verify all official Golioth code samples with this board during every Golioth release. That’s a tall hill to climb, but it would be a snap if we used automated testing through a process called Hardware in the Loop; CI/CD that automatically runs compiled code on the hardware itself. This is already the case for our recently announced Golioth ESP-IDF SDK. Nick Miller is working on a post about how he pulled that off, so keep your eyes on the Golioth Blog over the next couple of weeks!

Get your tools ready

To build for this board you need a few tools, like the HAL library for NXP, LVGL, and the board files themselves.

Adding the NXP HAL and the LVGL library

Zephyr needs a hardware abstraction layer for NXP builds, and this board also depends on having the LVGL library installed. Add these to your west manifest:

  • use the west manifest --path to help you locate which file to edit (modules/lib/golioth/west-zephyr.yml in my case)
  • Add hal_nxp and lvgl to the name-allow list
  • Run west update

With the edits in place, here’s what the relevant section of my west manifest looks like:

manifest:
  projects:
    - name: zephyr
      revision: v3.1.0
      url: https://github.com/zephyrproject-rtos/zephyr
      west-commands: scripts/west-commands.yml
      import:
        name-allowlist:
          - cmsis
          - hal_espressif
          - hal_nordic
          - mbedtls
          - net-tools
          - segger
          - tinycrypt
          - hal_nxp
          - lvgl

Cherry Pick the EVKB files (if needed)

This is a new board and the devicetree files for it were just added to Zephyr on Jun 5th. My Zephyr is still on v3.1.0 so it’s older than the commit that added support for the “evkb” board. I used git’s cherry-pick feature to add those files to my workspace. From the golioth-zephyr-workspace/zephyr directory run:

git pull
git cherry-pick 9a9aeae00b184cac308c1622676bb91c07dd465b

This will pull in just the commit that adds support for the new board.

Thar ‘b dragons

Here’s the thing about making manual changes to the Zephyr repo in your workspace: the west manifest doesn’t know you’ve done this. That means the next time you run west update, this cherry-picked commit will be lost.

Prep the hardware

Last, but not least, I used a J-Link programmer to flash firmware to this board. There’s just a bit of jumper setup necessary to make this all work. NXP has a guide to setting up an external J-Link programmer. I followed that page, using a USB cable on J1 for power (and a serial terminal connection when running apps).

Saying Hello

With that preamble behind me, I’m excited to get to the “Hello” part of things using the standard Golioth Hello sample. Running Golioth samples is fairly easy with this board, with just one puzzle piece that needs to be added for DHCP to acquire an IP address.

Use DHCP to get an IP address

I added a boards/mimxrt1060_evkb.conf file to create a configuration specific to this board. In this case, it selects the Ethernet and DHCP libraries.

CONFIG_NET_L2_ETHERNET=y
CONFIG_NET_DHCPV4=y

At the top of main.c I include the network interface library. Then in the main() function, we need to instantiate an interface and pass it to the DHCP function. This code requests an IP address from the network’s DHCP server.

#include <net/net_if.h>
struct net_if *iface;
LOG_INF("Run dhcpv4 client");
iface = net_if_get_default();
net_dhcpv4_start(iface);

client->on_message = golioth_on_message;
golioth_system_client_start();

Add credentials, then build and flash

The final step is to add your Golioth credentials before building and flashing the app. Add the credentials to the prj.conf file as outlined in the hello sample README. You can get your credentials by creating a new device on the Golioth Console.

CONFIG_GOLIOTH_SYSTEM_CLIENT_PSK_ID="my-psk-id"
CONFIG_GOLIOTH_SYSTEM_CLIENT_PSK="my-psk"

The build calls out the board name to specifically target our board and the newly added configuration. The flash command will automatically find the J-Link programmer and use it.

west build -b mimxrt1060_evkb samples/hello
west flash

In my testing it only took about three seconds to get an IP address. From there, I was up and connected to Golioth after just another two seconds!

Golioth Hello serial terminal output

Golioth Hello example output

Off to the races

There really isn’t much to getting this up and running. Once I worked through this process I was able to test out all of the Golioth samples, including using the terminal shell to provision the board (ie: adding the Golioth credentials) and performing Over-the-Air (OTA) firmware updates.

As I mentioned earlier, the speed and reliability of this dev board keeps drawing me back. It’s not just the initial connection, but when testing out OTA, the 1024 byte block downloads seem to fly right by. I’m unsure if this is the Ethernet or the chip’s ability to quickly write to flash, but if you’re working with a 250 kB firmware file the difference over, say an ESP32, is obvious.

Give it a try using Golioth’s Dev Tier which includes up to 50 devices for your test fleet. The Golioth forum is a great place for questions on the process, and we’d love to hear about what you’re building over on the Golioth Discord server.

Keep your eye on this space. When Zephyr issues its next release we’ll roll forward the Golioth SDK to include the board files for the RT1060-EVKB, and my hope is that we’ll be able to include the DHCP call into our common samples code for a truly out-of-the-box build experience. We might even see a bit of that Hardware-in-the-Loop build automation I hinted at earlier. See you next time!

On Tuesday we announced the Golioth ESP-IDF SDK that delivers all of Golioth’s excellent features to ESP32 projects built on Espressif’s FreeRTOS-based ESP-IDF ecosystem. The APIs included in our SDK make it dead simple to set up an encrypted connection with Golioth and begin sending and receiving data, controlling the device remotely, sending your logging messages up to the cloud, and of course performing Over-the-Air (OTA) updates on remote devices.

Today we dive into the code as Nick Miller, Golioth’s lead firmware engineer, takes us on a guided tour.

What does Golioth ESP-IDF SDK deliver?

All of the best features of Golioth’s device management cloud are available in our ESP-IDF SDK. The set of APIs are quite clever and take all of the heavy lift out of your hands. This includes:

  • Set, get, and observe data endpoints on the cloud
  • Write log data back to the cloud
  • Handle Over-the-Air (OTA) firmware updates
  • API calls–in both synchronous and asynchronous options–to suit your needs

Setup: Install ESP-IDF and clone the Golioth repo

To get started you need have the ESP-IDF installed and clone the Golioth ESP-IDF SDK. Instructions are available in the readme of our git repository, and there is also a quickstart on our docs site.

Our new SDK is a component for FreeRTOS, the real-time operating system used by the ESP-IDF. It’s the same operating system and build tools you’re used to, with the Golioth SDK sitting on top so that your devices can interact with the Golioth servers.

Stepping through the Golioth-Basics example

The best way to test-drive is with the Golioth-Basics example that is included in the SDK. It demonstrates assigning Golioth credentials to your device, sending/receiving data, observing data, sending log messages, and performing over-the-air (OTA) firmware updates. The golioth_basics.c file is thoroughly commented to explain each API call in detail.

The example begins by initializing non-volatile storage, configuring a serial shell, checking for credentials, and connecting to WiFi. At that point we can start using the Golioth APIs.

Creating the Golioth system client

// Now we are ready to connect to the Golioth cloud.
//
// To start, we need to create a client. The function golioth_client_create will
// dynamically create a client and return a handle to it.
//
// The client itself runs in a separate task, so once this function returns,
// there will be a new task running in the background.
//
// As soon as the task starts, it will try to connect to Golioth using the
// CoAP protocol over DTLS, with the PSK ID and PSK for authentication.
golioth_client_t client =
        golioth_client_create(nvs_read_golioth_psk_id(), nvs_read_golioth_psk());

Everything starts of by instantiating a client to handle the connection for you. This client will be passed to all of the API calls so that the SDK knows where to send them.

Sending log messages

// We can also log messages "synchronously", meaning the function will block
// until one of 3 things happen (whichever comes first):
//
// 1. We receive a response to the request from the server
// 2. The user-provided timeout expires
// 3. The default client task timeout expires (GOLIOTH_COAP_RESPONSE_TIMEOUT_S)
//
// In this case, we will block for up to 2 seconds waiting for the server response.
// We'll check the return code to know whether a timeout happened.
//
// Any function provided by this SDK ending in _sync will have the same meaning.
golioth_status_t status = golioth_log_warn_sync(client, "app_main", "Sync log", 5);

Here you can see a log being written to Golioth. Notice that the client created in the previous code block is used as the first parameter. This logging call is synchronous, and will wait to ensure the log was received by the Golioth servers. There is also an asynchronous version available that provides the option to run a callback function when the log is received by Golioth.

Setting up OTA firmware updates

// For OTA, we will spawn a background task that will listen for firmware
// updates from Golioth and automatically update firmware on the device using
// Espressif's OTA library.
//
// This is optional, but most real applications will probably want to use this.
golioth_fw_update_init(client, _current_version);

OTA firmware updates are handled for you by the SDK. The line of code shown here is all it takes to register for updates. The app will then observe the firmware version available on the server. It will automatically begin the update process whenever you roll out a new firmware release on the Golioth Cloud.

Sending and receiving data

// There are a number of different functions you can call to get and set values in
// LightDB state, based on the type of value (e.g. int, bool, float, string, JSON).
golioth_lightdb_set_int_async(client, "my_int", 42, NULL, NULL);
// To asynchronously get a value from LightDB, a callback function must be provided
golioth_lightdb_get_async(client, "my_int", on_get_my_int, NULL);

The bread and butter of the IoT industry is the ability to send and received data. This code demonstrates asynchronous set and get functions. Notice that the get API call registers on_get_my_int as a callback function that will be executed to handle the data that arrives back from the Golioth servers.

A get command runs just once to fetch the requested data from Golioth. Another extremely useful approach is to observe the data using the golioth_lightdb_observe_async(). It works the same way as an asynchronous get call, but it will execute your callback every time the data on the server changes.

Putting it all together

In the second half of the video, Nick takes us through the process running the demo. He starts with setting up the ESP-IDF environment and compiling to code, and continues all the way through to viewing the device data on the web console.

You’re going to love working with the Golioth ESP-IDF SDK. It’s designed to deal with all the complexity of securely connecting and controlling your IoT devices. The API calls are easy to understand, and they make it painless to add Golioth to existing and future ESP-IDF based projects. Give it a try today using our free Dev Tier.

We’d love to hear what you’re planning to build. You can connect with us on the Golioth Discord server, ask questions over on the Golioth Forums, and share your demos by tagging the Golioth account on Twitter.

One of the most useful services in the Golioth Zephyr SDK is the ability to observe data changes on the cloud. A device can register any LightDB endpoint and the Golioth servers will notify it whenever changes happen. If your device is a door lock, an example endpoint might be “lock status”, which you would want to know about a server-side state change immediately.

This is slightly more complex to set up than something like a LightDB ‘Set’ API call. ‘Observe’ requires a callback function to handle the asynchronous reply from the Golioth servers. Today we’ll walk through how to add Golioth LightDB Observe functionality to any Zephyr application by:

  1. Adding a callback that is called every time observed data changes
  2. Registering the callback with a data endpoint
  3. Ensuring thatgolioth_on_connect is registered with the Golioth client

These techniques are all found in our LightDB Observe sample code which acts as the roadmap for this article.

Prerequisites

Today’s post assumes that you already have a device running Zephyr and you have already tested out an app that uses the Golioth Zephyr SDK. If you’re not there yet, don’t worry. You can sign up for our free Dev Tier that includes up to 50 devices, and follow the Golioth Quickstart Guide.

Your Zephyr workspace should already have Golioth installed as a module and your app (probably in main.c) is already instantiating a Golioth system client. Basically, you should see a block like this one somewhere in your code:

#include <zephyr/net/coap.h>
#include <net/golioth/system_client.h>
static struct golioth_client *client = GOLIOTH_SYSTEM_CLIENT_GET();

If you don’t, checkout out our How to add Golioth to an existing Zephyr project blog post to get up to speed before moving on.

1. Add a callback function for observed data changes

The goal of this whole exercise is to enable your device to perform a task whenever data changes at your desired endpoint. Remember: Golioth LightDB endpoints are configurable by you! Whatever data you’d like to monitor, you can customize it to your needs.

The first thing we’ll do is create a callback function that will perform the task.

static int counter_handler(struct golioth_req_rsp *rsp)
{
    if (rsp->err) {
        LOG_ERR("Failed to receive counter value: %d", rsp->err);
        return rsp->err;
    }

    LOG_INF("Received: %.*s  Length: %d", rsp->len, rsp->data, rsp->len);

    return 0;
}

The callback receives an object (rsp) from Golioth containing the data, data length, and any error codes. The first portion of this callback checks the error code. Line 8 prints a log message that displays the data received, and it’s length.

If your endpoint contains more than just one value, it may be useful to parse the JSON object and store the values. Also keep in mind that this callback will execute on the golioth system client thread, which is a different thread than the “main” thread running your application. This means:

  • The callback function should return quickly (under 10 ms). If that’s not enough time, you can use a Zephyr Workqueue to schedule the work on another thread.
  • If access to global data is required, access to the data must be protected by a mutex to avoid data races between threads.

2. Registering the callback with a data endpoint

static void golioth_on_connect(struct golioth_client *client)
{
    int err = golioth_lightdb_observe_cb(client, "counter",
                     GOLIOTH_CONTENT_FORMAT_APP_JSON,
                     counter_handler, NULL);

    if (err) {
        LOG_WRN("failed to observe lightdb path: %d", err);
    }
}

Now we register the observation using the golioth_lightdb_observe_cb() API call. The parameters passed to this function include:

  1. The Golioth client object
  2. The endpoint to observe, “counter” in this case.
  3. The format, in this case we’ve chosen JSON. For CBOR, see the LightDB LED sample which demonstrates using CBOR serialization.
  4. The name of the callback function we created in the previous section
  5. An optional user_data value. This can be used to pass any 4-byte value which could be a discrete value, a pointer to some data structure, or NULL if you don’t need it.

Notice that we’re registering the observe callback inside of a golioth_on_connect() function. This is recommended as the observation will be re-registered any time the the Golioth client connects. Without this, your device may miss observed changes if its internet connection becomes unstable. Observed data is sent to the device at the time a callback is registered, and each time the data changes on the Golioth cloud.

3. Add the processing function to on_message

This step is small but important, and seems to be the one I frequently forget and then scratch my head when my callback isn’t working.

Whenever a message is received from Golioth, the Golioth system client executes a callback that we usually call on_message. For our observed callbacks to work, we need to tell on_message about our coap_replies array.

static void golioth_on_message(struct golioth_client *client,
                   struct coap_packet *rx)
{
    /*
     * In order for the observe callback to be called,
     * we need to call this function.
     */
    coap_response_received(rx, NULL, coap_replies,
                   ARRAY_SIZE(coap_replies));
}

 

By calling Zephyr’s coap_response_received(), the CoAP packet will be parsed and the appropriate callback will be selected from the coap_replies struct (if one exists).

4. Ensuring that golioth_on_connect is registered with the Golioth client

The final step is to make sure that the oberseve callback is registered each time the Golioth system client connects.

client->on_connect = golioth_on_connect;
golioth_system_client_start();

This should be done in main() before the loop begins. The golioth_client struct should have already been instantiated in your code, in this example it was called client. The code above associates our callback function and starts the client running.

Observed data in action

Now that we’ve tied it all together, let’s test it out. Here’s the terminal output of my Zephyr app:

*** Booting Zephyr OS build zephyr-v3.2.0  ***


[00:00:00.878,000] <inf> golioth_system: Initializing
[00:00:00.878,000] <dbg> golioth_lightdb: main: Start LightDB observe sample
[00:00:00.878,000] <inf> golioth_samples: Waiting for interface to be up
[00:00:00.878,000] <inf> golioth_samples: Connecting to WiFi
uart:~$ Connected
[00:00:11.191,000] <inf> net_dhcpv4: Received: 192.168.1.159
[00:00:11.191,000] <inf> golioth_wifi: Connected with status: 0
[00:00:11.191,000] <inf> golioth_wifi: Successfully connected to WiFi
[00:00:11.191,000] <inf> golioth_system: Starting connect
[00:00:13.042,000] <inf> golioth_system: Client connected!
[00:00:13.857,000] <inf> golioth_lightdb: Received: null Length: 4
[00:00:29.526,000] <inf> golioth_lightdb: Received: 42 Length: 2

You can see that at boot time, the observed data will be reported, which is great for setting defaults when your device first connects to observed data. In the above example, the endpoint did existon the Golioth cloud when teh device registered so a payload of null (with length 4) was returned. About 16 seconds later a payload of 42 is received. That’s when I added the endpoint and value in the Golioth Console.

On the cloud, this is an integer, but the device receives payloads as strings. You’ll need to validate received data on the device side to ensure expected behavior in your callback functions (beyond simply printing out the payload as I’m doing here). Give it a try for yourself using our LightDB Observe sample code.

Observing LightDB data gives your devices the ability to react to any changes without the need to poll like you would if you were using the golioth_lightdb_get() function. In addition to being notified each time the data changes, you’ll also get the current state when the observation is first registered (ie: at power-up). Single endpoints, or entire JSON objects can be observed, making it possible to group different types of state data to suit any need.

If you still have questions, or want to talk about how LightDB Observe works under the hood, head over to the Golioth Forum or ping us on the Golioth Discord.

Golioth is a device management cloud that is easy to add to any Zephyr project. Just list the Golioth Zephyr SDK as a module in your manifest file and the west tool will do the rest. Well, almost. Today I’ll walk through how to add Golioth to an existing Zephyr project. As an example, I’ll be using our hello sample. We chose this because it already has networking, which makes explaining things a bit easier. Generally any board that has led0 defined in the device tree and you can enable networking should be a good fit. What we want to do here is showcase the elements that allow you to add Golioth, so let’s dive in!

0. Understanding the west manifest

A Zephyr workspace is made up of a number of different code repositories all stored in the same tree. Zephyr uses a west manifest file to manage all of these repositories, including information like the origin URL of the repo, the commit hash to use, and where each repository should be placed in your local tree.

Think of the manifest as a code repository shopping list that the west update command uses to fill up your local Zephyr tree. There may be more than one manifest file, but today we’ll just focus on the main manifest.

1. Add Golioth to the west manifest

Start by locating your manifest file and opening it with a code editor.

~/zephyrproject $ west manifest --path
/home/mike/zephyrproject/zephyr/west.yml

Under projects: add an entry for the Golioth Zephyr SDK (highlighted in the abbreviated manifest sample below).

manifest:
  defaults:
    remote: upstream
 
  remotes:
    - name: upstream
      url-base: https://github.com/zephyrproject-rtos
 
  #
  # Please add items below based on alphabetical order
  projects:
      # Golioth repository.
    - name: golioth
      path: modules/lib/golioth
      revision: v0.2.0
      url: https://github.com/golioth/golioth-zephyr-sdk.git
      import:
        west-external.yml
    - name: canopennode
      revision: 53d3415c14d60f8f4bfca54bfbc5d5a667d7e724
...

Note that I have called out the v0.2.0release tag. You can set this to any of our release tags, to main, or to a specific commit.

Now run an update to incorporate the manifest changes:

west update

2. Select libraries to add to the build

We use KConfig to build in the libraries that Golioth needs. These changes are made in the prj.conf file of your application.

The first set of symbols deals with Zephyr’s networking stack. Of course every Internet of Things thing needs a network connection so you likely already have these symbols selected.

# Generic networking options
CONFIG_NETWORKING=y
CONFIG_NET_IPV4=y
CONFIG_NET_IPV6=n

Golioth is secure by default so we want to select the mbedtls libraries to handle the encryption layer.

# TLS configuration
CONFIG_MBEDTLS_ENABLE_HEAP=y
CONFIG_MBEDTLS_HEAP_SIZE=10240
CONFIG_MBEDTLS_SSL_MAX_CONTENT_LEN=2048

Now let’s turn on the Golioth libraries and give them a bit of memory they can dynamically allocate from.

# Golioth Zephyr SDK
CONFIG_GOLIOTH=y
CONFIG_GOLIOTH_SYSTEM_CLIENT=y
 
CONFIG_MINIMAL_LIBC_MALLOC_ARENA_SIZE=256

Every device needs its own credentials because all connections to Golioth require security. There are a few ways to do this, but perhaps the simplest is to add them to the prj.conf file.

# Golioth credentials
CONFIG_GOLIOTH_SYSTEM_CLIENT_PSK_ID="220220711145215-blinky1060@golioth-settings-demo"
CONFIG_GOLIOTH_SYSTEM_CLIENT_PSK="cab43a035d4fe4dca327edfff6aa7935"

And finally, I’m going to enable back-end logging. This is not required for connecting to Golioth, but sending the Zephyr logs to the cloud is a really handy feature for remote devices.

# Optional for sending Zephyr logs to the Golioth cloud
CONFIG_NET_LOG=y
CONFIG_LOG_BACKEND_GOLIOTH=y
CONFIG_LOG_PROCESS_THREAD_STACK_SIZE=2048

I separated each step above for the sake of explanation. But the combination of these into one block is the boiler-plate configuration that I use on all of my new projects. See this all as one file in the prj.conf from our hello sample.

3. Instantiate the Golioth system client and say hello

So far we added Golioth as a Zephyr module and enabled the libraries using KConfig. Now it’s time to use the Golioth APIs in the main.c file.

The first step is to include the header files and instantiate a golioth_client object. The client is used to manage the connection with the Golioth servers. Including the coap header file is not strictly required to establish a connection, but it is necessary for processing the responses from Golioth, so I always include it.

#include <net/coap.h>
#include <net/golioth/system_client.h>
static struct golioth_client *client = GOLIOTH_SYSTEM_CLIENT_GET();

I also add a callback function to handle messages coming back from the Golioth cloud. Technically this is optional, but if you want two-way communication with the cloud you need it!

/* Callback for messages received from the Golioth servers */
static void golioth_on_message(struct golioth_client *client,
                   struct coap_packet *rx)
{
    uint16_t payload_len;
    const uint8_t *payload;
    uint8_t type;
 
    type = coap_header_get_type(rx);
    payload = coap_packet_get_payload(rx, &payload_len);
 
    printk("%s\n", payload);
}

In the main function, I register my callback, start the Golioth client, and call the golioth_send_hello() API.

/* Register callback and start Golioth system client */
client->on_message = golioth_on_message;
golioth_system_client_start();
 
while (1) {
    /* Say hello to Golioth */
    int ret = golioth_send_hello(client);
    if (ret) {
        printk("Failed to send hello! %d\n", ret);
    }
    else printk("Hello sent!\n");
}

When successful, the golioth_send_hello() call will prompt for a message back from the server which includes the name of the sending device. This is printed out by the golioth_on_message() callback.

Hello sent!
Hello blinky1060
Hello sent!
Hello blinky1060
Hello sent!
Hello blinky1060

Extra Credit: Back-end Logging

Observant readers have noticed that I enabled back-end logging using KConfig symbols but I didn’t use it in the C code. Here’s the really awesome part: just use Zephyr logging as you normally would and it will automatically be sent to your Golioth console.

At the top of main.c, include the log header file and register a logging module.

#include <logging/log.h>
LOG_MODULE_REGISTER(any_name_you_want, LOG_LEVEL_DBG);

In our C functions, call logs as you normal would.

LOG_ERR("An error message goes here!");
LOG_WRN("Careful, this is a warning!");
LOG_INF("Did you now that this is an info log?");
LOG_DBG("Oh no, why isn't my app working? Maybe this debug log will help.");

Now your log messages will appear on the Golioth console!

Further Reading

This covers the basics of adding Golioth to an existing Zephyr project. You can see the steps all in one place by looking at the Golioth code samples. Here are the examples that you’ll find immediately useful:

  • Hello – basics of connecting to Golioth and sending log messages (what was covered in this post)
  • Lightdb Set – send data to Golioth
  • Lightdb Observe – device will be notified whenever an endpoint on the Golioth cloud is updated

To go even deeper, you can see how we use the Golioth Over-the-Air (OTA) firmware update feature, and how to use the Zephyr settings subsystem for persistent credential storage. And remember, the Dev Tier of Golioth includes the first 50 devices, so you can try all of this out for free.

Image from Todd Lappin on flickr