How the Golioth Developer Training paved the way

This is a guest post from Shrouk El-Attar, discussing her journey from the hardware space into the firmware space and how Golioth training has helped her understand building out IoT systems using Zephyr.

The Journey Started with Hardware

To me, hardware just makes sense! You have specific requirements, find a part that can fulfill them, read its datasheet, and then execute. Voila! Successful design achieved.

The worst thing about the process? I don’t know…maybe the Googling? Getting a calculation out by a factor of ten? The tedious BOM stock checking process? These things can be long and tiring, but still, the concepts make sense. Designing a PCB is easy to understand; you follow simple rules and let the copper tracks do the rest. Even things in the “RF Voodoo” territory make sense with well-developed RF design tools these days.

Not too long ago, I finally took the leap into the world of consulting as a self-proclaimed “Hardware Queen”. With that, I saw the terrifying rise in demand for the 2-in-1 engineering consultants: a hardware engineer who is also a firmware engineer.

The Firmware Roadblock

Firmware feels to me like hardware’s anarchist sister. I know her a little; we have some history. It might surprise you to know that she hasn’t always been this way with me.

Once upon a time, there was Arduino (I know, I know). I probably picked up my first Arduino in 2012. I’d heard about them long before then, but as an asylum seeker, I couldn’t afford to own one. That is until the UK officially recognised me as a refugee, and I could finally access higher education and buy my own shiny Arduino.

The first time I got to Blinky on my Arduino, it was like something clicked. I understood it. I understood every single bit of code I was writing on it. “I have a gift,” I thought. Perhaps learning Visual Basic (yup) back in 2007 laid down those basics for me to get to Blinky on Arduino and be a total boss at it.

From then on, I felt invincible. Whatever I could think of, I was able to make it. Self-watering plant? Check. NFC easter egg hunt? Check. Twitter-powered bubble machine? Check. Arduino felt like hardware and software meeting in perfect harmony in a world where anything I could think of was possible.

Shrouk’s Arduino Connected Plant: Light data is sent to cloud platform (left), phone notifications are sent when light data is critical (middle), plant “tweets” when light data reaches below a certain level (right).

My initial success was a sign I was going to be a firmware engineer. But soon after, my firmware dreams were shattered.

Stuck in the Firmware Valley

My first full-time engineering role was at Intel back in 2015, where I specialized in the then-brand-new field called IoT. “Can you imagine that by 2020 there will be 50 billion connected devices?” I thought to myself. I was so excited that I would be one of the people developing those devices. It was the future.

One of the first things I couldn’t wait to get my hands on was the all-shiny Intel Realsense 3D Camera. But when I downloaded my SDK, I didn’t know where to start. After days of struggling on my own, I could get some things working with the help of my senior colleagues. But none of it clicked. I had no idea what I was doing. I was not gifted. Frankly, I sucked at firmware.

I started specializing in hardware. As the years passed, some of my all-time favorite chips became the nRF5 MCUs by Nordic Semiconductor. What a hardware engineer’s dream. Can’t get to a specific pin during routing? Don’t worry; choose literally any other pin you can reach more easily, and re-route there! The nRF5 had an excellent reputation amongst my firmware colleagues too.

In 2019, I decided to give firmware another go on my favorite chip. I wasn’t too far down the toolchain setup, and I’d already started regretting it. What the heck were Make files?! It was an excruciating process. But–no exaggeration–one week later, I’d managed to set up the toolchain. Great, let’s get to Blinky! Save, build, flash. Error.

I spent hours, then finally figure out the error. Save, build, flash—another error. Repeat a zillion times. It turns out I had never set up the toolchain correctly in the first place. Months later, I finally found a training that would work, or so I thought. This turned out to be my biggest disappointment. Nothing in the training was up to date. Nothing worked the way it was supposed to. This was indeed the final time I’d try to touch firmware again. I was stuck. I was done.

The Golioth Way

Back in present day, as I was realizing the demand for the 2-in-1 hardware and firmware engineering consultant, a well timed post from Chris Gammell appeared on my feed. The post was about the Golioth Developer Training, targeted explicitly at hardware engineers. The 2-in-1 engineer could be me. Was I going to try firmware again? Absolutely.

The training took place entirely remotely with hardware engineers from all over the world. This is going to be it, for sure! But, from the very beginning, I was already struggling. “Here we go again”, I thought. I shouldn’t have thought I could do it. I struggled with minor details at first, like appending -d instead of -D to a build command, which apparently gives wildly different results. I copied and pasted the correct information but in the wrong fields. And each exercise took me way longer than the “estimated time” suggested for each section.

The training happened in a large group, but we did the exercises independently. Chris and Mike were checking in on each of us throughout, so they were able to help me fix anything I missed very quickly on a one-to-one basis. As the training went on, I started to notice that I needed their help less and less. What is this I feel? Some cautious optimism, perhaps?

The training method was active. I wasn’t sitting at my desk blindly following instructions. No, I was pushed to figure out the correct way to solve a problem at every stage. It was presented in bite-size information with many collapsed sections. The training also included links referencing concepts, e.g. Zephyr Pin Control, throughout, should I want to learn more. For the most part, I didn’t click those because I wanted to stay on track. My head was already full of too much firmware to learn. But I enjoyed that they were there so I could refer back to them in the future.

Then the happiest of accidents happened. I picked up a later stage of the training on my own, then realized that I had to redo the earlier parts to get to where I wanted to pick up from. Not going to lie; I was slightly annoyed. But while re-doing them, I saw that it wasn’t taking me very long at all. This time, I was well within the time estimates suggested for each exercise, if not much quicker. Was this working for me? Did I finally find a method to learn commercially viable firmware that works?

A light bulb went off when I realized that it took me over a week to get to Blinky on my previous firmware training, but within a couple of hours, I was already speaking to my Wi-Fi device through the Golioth Console. Not only that, but it was all through the Zephyr RTOS with Golioth. The same Zephyr RTOS I struggled endlessly with on the nRF5 hardware that I love. It was a miracle to think about being enabled for firmware on my favorite hardware platform. Coupled with the fact that Golioth is entirely scalable, i.e. the idea that I can start from an idea to production on the same platform, made Golioth a dream.

Golioth also helped me develop a community, some of whom I became close Discord friends with. We all need a Discord buddy to complain to about daily life as an electronics engineering consultant (Hi Seth!). And a decade after I picked up my first Arduino, Golioth Developer Training helped me finally break through that commercially viable firmware ceiling.

Next, I’ll work on reference designs and put the skills I learnt from the training to the test. Watch this space.

Editor note: If you’re interested in taking part as an individual or as a company, sign up for future training here.

Golioth now supports Infineon parts via the ModusToolbox™. We added Golioth device management to the ecosystem a few weeks ago and the example code is available right now to run on Infineon’s line of microcontrollers talking to their Wi-Fi parts.

ModusToolbox™ (MTB) is a software support tool from Infineon Technologies. It includes partner SDKs alongside the company’s officially supported IDEs, drivers, and examples. You can pull in the Golioth example and all dependencies using the Eclipse IDE that is included in MTB, or via the command line tool.

Infineon’s PSoC™ 6 chips are feature rich 32-bit Arm microcontrollers. Paired with the Infineon 4343W, it’s a perfect platform for IoT device builders, and exactly the kind of constrained device that Golioth was built for.

Take advantage of Over-the-Air (OTA) firmware updates, time-series databases, state data management, remote logging, plus the command/control features like remote procedure call (RPC) and the device settings service. The API calls for each of these are demonstrated and well-commented in the golioth_main.c file.

Try the Golioth example using ModusToolbox™

PSoC 6 Wi-Fi BT Prototyping Kit (CY8CPROTO-062-4343W)

We run the Golioth example on the PSoC™ 6Wi-Fi BT Prototyping Kit (CY8CPROTO-062-4343W). Here’s how to try it for yourself:

1. Install Infineon’s ModusToolbox™

Begin by downloading and installing ModusToolbox™ for your system. Then run modustoolbox-eclipsewhich is located in the ide_3.0/eclipse/ subfolder.

2. Create a Golioth Example project

With the Eclipse IDE open, click on File→New→ModsToolbox Application. This will launch the project creator window.

Select the CY8CPROTO-062-4343W from the list of PSoC™ 6 boards and then click next.

Choose the Golioth Example from the Wi-Fi list and click on the Create button. This will take a couple of minutes to clone the Golioth code and all dependencies.

3. Compile and Install MCUboot

Golioth uses MCUboot as the secure bootloader for our Over-the-Air updates. Before flashing the app to the board, we need to compile and install MCUboot. I did this using the IDE’s built-in terminal.

First, we need to install the MCUboot dependencies.

cd ~/mtw/mtb_shared/mcuboot/v1.8.1-cypress/scripts/
python -m pip install -r requirements.txt

Now we can compile and flash MCUboot. Remember to plug a USB cable into the KITPROG3 connector on your PSoC™ 6 devboard before running the program command:

cd ~/mtw/Golioth_Example/bootloader_cm0p/
make build_proj -j8
make program_proj

4. Compile and flash the Golioth App

Before compiling the Golioth App we need to give it credentials to connect to Wi-Fi and also to authenticate with the Golioth server. These are set in the ~/mtw/Golioth_Example/golioth_app/source/golioth_main.hfile.

Use the Wi-Fi credentials for your local access point. Get device credentials from the Golioth Console. If you’ve haven’t yet set up an account, check out our Quickstart. (With Golioth’s Dev Tier your first 50 devices are free.)

Once you’ve saved your changes to the golioth_main.h file, use the terminal to compile and flash the app to your PSoC™ 6 board:

cd ~/mtw/Golioth_Example/golioth_app
make build_proj -j8
make program_proj

Taking the Golioth App for a test drive

[INF] MCUBoot Bootloader Started
[INF] External Memory initialized w/ SFDP.
[INF] boot_swap_type_multi: Primary image: magic=unset, swap_type=0x1, copy_don3
[INF] boot_swap_type_multi: Secondary image: magic=unset, swap_type=0x1, copy_d3
[INF] Swap type: none
[INF] User Application validated successfully
[INF] Starting User Application (wait)...
[INF] Start slot Address: 0x10018400
[INF] MCUBoot Bootloader finished
[INF] Deinitializing hardware...
External Memory initialized w/ SFDP.
=========================================================
[GoliothApp] Version: 1.0.0, CPU: CM4

=========================================================
[GoliothApp] Watchdog timer started by the bootloader is now turned off to mark.
[GoliothApp] User LED toggles at 1000 msec interval

WLAN MAC Address : 74:7A:90:D4:5F:04
WLAN Firmware    : wl0: Jul 18 2021 19:15:39 version 7.45.98.120 (56df937 CY) FWID 01-69db62cf
WLAN CLM         : API: 12.2 Data: 9.10.39 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2021-07-18 19:03:20 
WHD VERSION      : v2.5.0 : v2.5.0 : GCC 10.3 : 2022-09-23 13:14:02 +0800
Wi-Fi Connection Manager initialized.
Successfully connected to Wi-Fi network 'MyWiFiAP'.
IP Address Assigned: 192.168.1.153
Secure Sockets initialized
I (5515) golioth_main: Waiting to Golioth to connect...
I (5638) golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
I (5643) libcoap: Setting PSK key

I (5649) golioth_coap_client: Entering CoAP I/O loop
I (5886) golioth_main: Golioth client connected
I (5996) golioth_fw_update: Current firmware version: 1.0.0
I (6060) golioth_fw_update: Waiting to receive OTA manifest
I (6258) golioth_fw_update: Received OTA manifest
I (6258) golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
I (6260) golioth_fw_update: Waiting to receive OTA manifest
I (6322) golioth_main: Synchronously got my_int = 42
I (6323) golioth_main: Entering endless loop
I (9006) golioth_main: Callback got my_int = 42
I (9286) golioth_main: Setting loop delay to 5 s

By monitoring the serial output from the device (that’s /dev/ttyACM0 on my system) we can see all the parts of the app at work. The board powers up and reports the firmware version before connecting to Wi-Fi. Once a Golioth connection is established it checks for firmware updates before it starts writing data to the cloud.

You can view device logs remotely through the Golioth console. Here we see “Sending hello!” messages arriving along with a counter.

Viewing state data for this device, the counter variable is update at the same rate as the hello messages. In the Device Settings on the left sidebar of the console you can add LOOP_DELAY_S and remotely control how how many seconds the device pauses for between sending back these message. It’s a perfect way to control sensor reading frequency across your entire fleet.

Adding Golioth to existing Infineon PSoC™ 6 projects

The Golioth example that is part of ModusToolbox™ is a great blueprint for adding device management to your PSoC™ 6 projects. We’d love to hear what you’re build, and if you need some help we’re here to lend a hand. Show off your successes and post questions on the Golioth forum. Feel free to reach out to the Golioth Developer Relations team to set up a demo or discuss the needs of your IoT fleet.

This is the second part of How Golioth uses Hardware-in-the-Loop (HIL) Testing. In the first part, I covered what HIL testing is, why we use it at Golioth, and provided a high-level view of the system.

In this post, I’ll talk about each step we took to set up our HIL. It’s my belief that HIL testing should be a part of any firmware testing strategy, and my goal with this post is to show you that it’s not so difficult to get a minimum viable HIL set up within one week.

The Big Pieces

To recap, this is the system block diagram presented at the end of part 1:

The Raspberry Pi acts as a GitHub Actions Self-Hosted Runner which communicates with microcontroller devices via USB-to-serial adapters.

The major software pieces required are:

  • Firmware with Built-In-Test Capabilities – Runs tests on the device (e.g. ESP32-S3-DevKitC), and reports pass/fail for each test over serial.
  • Python Test Runner Script – Initiates tests and checks serial output from device. Runs on the self-hosted runner (e.g. Raspberry Pi).
  • GitHub Actions Workflow – Initiates the python test runner and reports results to GitHub. Runs on the self-hosted runner.

Firmware Built-In-Test

To test our SDK, we use a special build of firmware which exposes commands over a shell/CLI interface. There is a command named built_in_test which runs a suite of tests in firmware:

esp32> built_in_test
...
12 Tests 0 Failures 0 Ignored 
../main/app_main.c:371:test_connects_to_wifi:PASS
../main/app_main.c:378:test_golioth_client_create:PASS
../main/app_main.c:379:test_connects_to_golioth:PASS
../main/app_main.c:381:test_lightdb_set_get_sync:PASS
../main/app_main.c:382:test_lightdb_set_get_async:PASS
../main/app_main.c:383:test_lightdb_observation:PASS
../main/app_main.c:384:test_golioth_client_heap_usage:PASS
../main/app_main.c:385:test_request_dropped_if_client_not_running:PASS
../main/app_main.c:386:test_lightdb_error_if_path_not_found:PASS
../main/app_main.c:387:test_request_timeout_if_packets_dropped:PASS
../main/app_main.c:388:test_client_task_stack_min_remaining:PASS
../main/app_main.c:389:test_client_destroy_and_no_memory_leaks:PASS

These tests cover major functionality of the SDK, including connecting to Golioth servers, and sending/receiving CoAP messages. There are also a couple of tests to verify that stack and heap usage are within allowed limits, and that there are no detected memory leaks.

Here’s an example of one of the tests that runs on the device, test_connects_to_golioth:

#include "unity.h"
#include "golioth.h"

static golioth_client_t _client;
static SemaphoreHandle_t _connected_sem;

static void on_client_event(golioth_client_t client, 
                            golioth_client_event_t event, 
                            void* arg) {
    if (event == GOLIOTH_CLIENT_EVENT_CONNECTED) {
        xSemaphoreGive(_connected_sem);
    }
}

static void test_connects_to_golioth(void) {
    TEST_ASSERT_NOT_NULL(_client);
    TEST_ASSERT_EQUAL(GOLIOTH_OK, golioth_client_start(_client));
    TEST_ASSERT_EQUAL(pdTRUE, xSemaphoreTake(_connected_sem, 10000 / portTICK_PERIOD_MS));
}

This test attempts to connect to Golioth servers. If it connects, a semaphore is given which the test waits on for up to 10 seconds (or else the test fails). For such a simple test, it covers a large portion of our SDK, including the network stack (WiFi/IP/UDP), security (DTLS), and CoAP Rx/Tx.

We use the Unity test framework for basic test execution and assertion macros.

If you’re curious about the details, you can see the full test code here: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/test/main/app_main.c

Python Test Runner

Another key piece of software is a Python script that runs on the self-hosted runner and communicates with the device over serial. We call this script verify.py. In this script, we provision the device with credentials, issue the built_in_test command, and verify the serial output to make sure all tests pass.

Here’s the main function of the script:

def main():
    port = sys.argv[1]

    # Connect to the device over serial and use the shell CLI to interact and run tests
    ser = serial.Serial(port, 115200, timeout=1)

    # Set WiFi and Golioth credentials over device shell CLI
    set_credentials(ser)
    reset(ser)

    # Run built in tests on the device and check output
    num_test_failures = run_built_in_tests(ser)
    reset(ser)

    if num_test_failures == 0:
        run_ota_test(ser)

    sys.exit(num_test_failures)

This opens the serial port and calls various helper functions to set credentials, reset the board, run built-in-tests, and run an OTA test. The exit code is the number of failed tests, so 0 means all passed.

The script spends most of its time in this helper function which looks for a string in a line of serial text:

def wait_for_str_in_line(ser, str, timeout_s=20, log=True):
    start_time = time()
    while True:
        line = ser.readline().decode('utf-8', errors='replace').replace("\r\n", "")
        if line != "" and log:
            print(line)
        if "CPU halted" in line:
            raise RuntimeError(line)
        if time() - start_time > timeout_s:
            raise RuntimeError('Timeout')
        if str in line:
            return line

The script will continually read lines from serial for up to timeout_s seconds until it finds what it’s looking for. And if “CPU halted” ever appears in the output, that means the board crashed, so a runtime error is raised.

This verify.py script is what gets invoked by the GitHub Actions Workflow file.

GitHub Actions Workflow

The workflow file is a YAML file that defines what will be executed by GitHub Actions. If one of the steps in the workflow returns a non-zero exit code, then the workflow will fail, which results in a red ❌ in CI.

Our workflow is divided into two jobs:

  • Build the firmware and upload build artifacts. This is executed on cloud servers, since building the firmware on a Raspberry Pi would take a prohibitively long time (adding precious minutes to CI time).
  • Download firmware build, flash the device, and run verify.py. This is executed on the self-hosted runner (my Raspberry Pi).

The workflow file is created in .github/workflows/test_esp32s3.yml. Here are the important parts of the file:

name: Test on Hardware

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  build_for_hw_test:
    runs-on: ubuntu-latest
    steps:
    ...
    - name: Build test project
      uses: espressif/esp-idf-ci-action@v1
      with:
        esp_idf_version: v4.4.1
        target: esp32s3
        path: 'examples/test'
    ...

  hw_flash_and_test:
    needs: build_for_hw_test
    runs-on: [self-hosted, has_esp32s3]
    ...
    - name: Flash and Verify Serial Output
      run: |
        cd examples/test
        python flash.py $CI_ESP32_PORT && python verify.py $CI_ESP32_PORT

The first job runs in a normal GitHub runner based on Ubuntu. For building the test firmware, Espressif provides a handy CI Action to do the heavy lifting.

The second job has a needs: build_for_hw_test requirement, meaning it must wait until the first job completes (i.e. firmware is built) before it can run.

The second job has two further requirements – it must run on self-hosted and there must be a label has_esp32s3. GitHub will dispatch the job to the first available self-hosted runner that meets the requirements.

If the workflow completes all jobs, and all steps return non-zero exit code, then everything has passed, and you get the green CI checkmark ✅.

Self-Hosted Runner Setup

With all of the major software pieces in place (FW test framework, Python runner script, and Actions workflow), the next step is to set up a machine as a self-hosted runner. Any machine capable of running one of the big 3 operating systems will do (Linux, Windows, MacOS).

It helps if the machine is always on and connected to the Internet, though that is not a requirement. GitHub will simply not dispatch jobs to any self-hosted runners that are not online.

Install System Dependencies

There are a few system dependencies that need to be installed first. On my Raspberry Pi, I had to install these:

sudo apt install \
    python3 python3-pip python3-dev python-dev \
    libffi-dev cmake direnv wget gcovr clang-format \
    libssl-dev git
pip3 install pyserial

Next, log into GitHub and create a new self-hosted runner for your repo (Settings -> Actions -> Runners -> New self-hosted runner):

When you add a new self-hosted runner, GitHub will provide some very simple instructions to install the Actions service on your machine:

When I ran the config.sh line, I added --name and --unattended arguments:

./config.sh --url <https://github.com/golioth/golioth-firmware-sdk> --token XXXXXXXXXXXXXXXXXXXXXXXXXXXXX --name nicks-raspberry-pi --unattended

Next, install the service, which ensures GitHub can communicate with the self-hosted runner even if it reboots:

cd actions-runner
sudo ./svc.sh install

If there are any special environment variables needed when executing workflows, these can be added to runsvc.sh. For instance, I defined a variable for the serial port of the ESP32:

# insert anything to setup env when running as a service
export CI_ESP32_PORT=/dev/ttyUSB0

Finally, you can start the service:

sudo ./svc.sh start

And now you should see your self-hosted runner as “Idle” in GitHub Actions, meaning it is ready to accept jobs:

I added two custom labels to my runner: has_esp32s3 and has_nrf52840dk, to indicate that I have those two dev boards attached to my self-hosted runner. The workflow file uses these labels so that jobs get dispatched to self-hosted runners only if they have the required board.

And that’s it! With those steps, a basic HIL is up and running, integrated into CI, and covering a large amount of firmware code on real hardware.

Security

There’s an important note from GitHub regarding security of self-hosted runners:

Warning: We recommend that you only use self-hosted runners with private repositories. This is because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

I also stand by that recommendation. But it’s not always feasible if you’re maintaining an open-source project. Our SDKs are public, so this warning certainly applies to us.

If you’re not careful, you could give everyone on the Internet root access to your machine! It wouldn’t be very hard for someone to submit a pull request that modifies the workflow file to run whatever arbitrary Linux commands they want.

We mitigate most of the security risks with three actions.

1. Require approval to run workflows for all outside contributors.

This means someone from the Golioth organization must click a button to run the workflow. This allows the repo maintainer time to review any workflow changes before allowing the workflow to run.

This option can be enabled in Settings -> Actions -> General:

Once enabled, pull requests will have a new “Approve and Run” button:

This is not ideal if you’re running a large open-source project with lots of outside contributors, but it is a reasonable compromise to make during early stages of a project.

2. Avoid logging sensitive information

The log output of any GitHub Actions runner is visible publicly. If there is any sensitive information printed out in the workflow steps or from the serial output of the device, then this information could easily fall into the wrong hands.

For our purposes, we disabled logging WiFi and Golioth credentials.

3. Utilize GitHub Actions Secrets

You can use GitHub Actions Secrets to store sensitive information that is required by the runner. In our repo, we define a secret named `GOLIOTH_API_KEY` that allows us access to the Golioth API, which is used to deploy OTA images of built firmware during the workflow run.

If you want to know more about self-hosted runner security, these resources were helpful to me:

Tips and Tricks

Here are a few extra ideas to utilize the full power of GitHub Actions Self-Hosted Runners (SHRs):

  • Use the same self-hosted runner in multiple repos. You can define SHRs at the organization level and allow their use in more than one repo. We do this with our SHRs so they can be used to run tests in the Golioth Zephyr SDK repo and the Golioth ESP-IDF SDK repo.
  • Use labels to control workflow dispatch. Custom labels can be applied to a runner to control when and how a workflow gets dispatched. As mentioned previously, we use these to enforce that the SHR has the dev board required by the test. We have a label for each dev board, and if the admin of the SHR machine has those dev boards attached via USB, then they’d add a label for each one. This means that machine will only receive workflows that are compatible with that machine. If a board is acting up or requires maintenance, the admin can simply disconnect the board and remove the label to prevent further workflows from executing.
  • Distributed self-hosted runners to load balance and minimize CI bottlenecks. If there is only one self-hosted runner, it can become a bottleneck if there are lots of jobs to run. Also, if that runner loses Internet, then CI is blocked indefinitely. To balance the load and create more robust HIL infrastructure, it’s recommended to utilize multiple self-hosted runners. Convince your co-workers to take their unused Intel NUC off the shelf and follow the steps in the “Self-Hosted Runner Setup” section to add another runner to the pool. This allows for HIL tests to run in a distributed way, no matter where in the world you are.

Even people who have been living under a rock for the past couple of years know that there’s a global chip shortage. The correct response from the engineering community should be changes to our design philosophy and that’s the topic of the talk that Chris Gammell and I gave at the 2022 Zephyr Developer Summit back in June. With the right hardware and firmware approach, it’s possible to design truly modular hardware to respond to supply chain woes.

Standardization enabled the industrial revolution, and the electronics field is a direct descendant of those principles. You can rest easy in knowing that myriad parts exist to match your chosen operating voltage and communication scheme. But generally speaking it’s still quite painful to change microcontrollers and other key-components mid-way through product design.

Today’s hardware landscape has uprooted the old ways of doing things. You can’t wait out a 52-week lead time, so plan for the worst and hope for the best. Our talk covers the design philosophies we’ve adopted to make hardware “swapability” possible, while using Zephyr RTOS to manage the firmware side of things.

Meet the Golioth Aludel

Golioth exists to make IoT development straightforward and scalable–that’s not possible if you’re locked into specific hardware, especially these days. So we designed and built a modular platform to showcase this flexibility to our customers.

Golioth Aludel prototyping platform internal view

Called Aludel, Chris Gammell centered the design around a readily available industrial enclosure. The core concept is a PCB base that accepts any microcontroller board that uses the Feather standard. This way, every pin-location and function is standardized. Alongside the Feather you’ll find two headers that use the Click boards standard (PDF), a footprint for expansions boards facilitating i2c, SPI, UART, and more. The base includes Qwiic and Stemma headers for additional sensor connectivity, as well as terminal blocks, room for a battery, and provisions for external power.

The key customization element for Aludel is the faceplate. It’s a circuit board that can deliver a beautiful custom design to match any customer request. But on the inside, each faceplate has the same interface using a flat cable connection to the base board. Current versions of the faceplate feature a a screen, an EEPROM to identify the board, and a port expander that drives buttons, LEDs, and whatever else you need for the hardware UI.

The beauty of this is that electrically, it all just works. Need an nRF52, nRF9160, ESP32, STM32, or RP2040? They all already exist in Feather form factor. Need to change the type of temperature sensor you’re using? Want to add RS485, CANbus, MODBUS, 4-20 ma communication, or some other connectivity protocol? The hardware is ready for it–at most you might need to spin your own adapter board.

Now skeptics may look at the size of this with one abnormally high-arched eyebrow. But remember, this is a proof-of-concept platform. You can set it up for your prototyping, then take the block elements and boil them down into your final design. Prematurely optimizing for space means you will have many many more board runs as the parts change.

When you do a production run and your chip is no longer available, the same concepts will quickly allow you to test replacements and respin your production PCB to accommodate the changes.

Zephyr Alleviates your Firmware Headaches

“But wait!” you cry, “won’t someone think of the firmware?”. Indeed, someone has thought of the firmware. The Linux Foundation shepherds an amazing Real-Time Operating System called Zephyr RTOS that focuses on interoperability across a vast array of processor architectures and families. Even better, it handles the networking stack and has a standardized device model that makes it possible to switch peripherals (ie: sensors) without your C code even noticing.

We use Zephyr to maintain one firmware repository for the Aludel hardware that can be compiled for STM32F40, nRF52840, nRF9160, and ESP32 using your choice of Ethernet, WiFi, or Cellular connections. How is that even possible?

&i2c0 {
	bme280@76 {
		compatible = "bosch,bme280";
		reg = <0x76>;
		label = "BME280_I2C";
	};
};

&spi1 {
	compatible = "nordic,nrf-spi";
	status = "okay";
	cs-gpios = <&gpio0 3 GPIO_ACTIVE_LOW>,
	           <&gpio0 27 GPIO_ACTIVE_LOW>,
	           <&gpio1 8 GPIO_ACTIVE_LOW>;
	test_spi_w5500: w5500@0 {
		compatible = "wiznet,w5500";
		label = "w5500";
		reg = <0x0>;
		spi-max-frequency = <10000000>;
		int-gpios = <&gpio0 2 GPIO_ACTIVE_LOW>;
		reset-gpios = <&gpio0 30 GPIO_ACTIVE_LOW>;
	};
};

/ {
    aliases {
        aludeli2c = &i2c0;
        pca9539int = &interrupt_pin0;
    };
    gpio_keys {
        compatible = "gpio-keys";
        interrupt_pin0: int_pin_0 {
			gpios = < &gpio0 26 GPIO_ACTIVE_LOW >;
			label = "Port Expander Interrupt Pin";
		};
	};
};

Zephyr uses the DeviceTree standard to map how all of the hardware is connected. A vast amount of work has already been done for you by the manufacturers who maintain drivers for their chips and supply .dts (DeviceTree Syntax) files that specify pin functions and mappings. When writing firmware for your projects you supply DeviceTree overlay files that indicate sensors, buttons, LEDs, and anything else connected to your design. The C code looks up each device to receive all relevant information like device address, GPIO port/pin assignment and function.

const struct device *weather_dev = DEVICE_DT_GET_ANY(bosch_bme280);
sensor_sample_fetch(weather_dev);
sensor_channel_get(weather_dev, SENSOR_CHAN_AMBIENT_TEMP, &temp);
sensor_channel_get(weather_dev, SENSOR_CHAN_PRESS, &press);
sensor_channel_get(weather_dev, SENSOR_CHAN_HUMIDITY, &humidity);

The hardware abstraction doesn’t stop there. Zephyr also specifies abstraction for sensors. For instance, an IMU will have an X, Y, and Z output — Zephyr makes accessing these the same even if you have to move to an IMU from a different manufacturer due to parts shortages. You update the overlay files for the new build, and use a configuration file to turn on any specific libraries for the change, but the code you’re maintaining can continue on as if nothing happened.

Of course there are caveats which we cover in the talk. There is no one-size-fits-all in embedded engineering so you will eventually need to define your own .dts files for custom boards, and add peripherals that are unsupported in Zephyr. This is not a deal breaker, there is a template example for adding boards (and I highly recommend watching Jared Wolff’s video on the subject). And since Zephyr is open source, your customizations can be contributed upstream so others may also benefit.

Hardware will never again be the same

Electronics manufacturing will eventually emerge from the current shortages. But there are no signs of this happening soon, and it’s been ongoing for two years. We shouldn’t be trying to return to “normal”, but planning a better future. That’s what Golioth is doing for the Internet of Things. Choosing an IoT management platform shouldn’t require you to commit to one type of hardware. Your fleets should evolve, and our talk outlines how they can evolve. Golioth helps to manage your diverse and growing hardware fleet. Golioth is designed to make it easy to keep track of and control all of that hardware, whether that’s 100 devices or 100,000. Take Golioth for a test drive today.

Today we’re announcing a new feature on the Golioth Console and on our Device SDKs that enables Remote Procedure Calls (RPCs) for all users on the platform. From the cloud, you can initiate a function on your constrained device in the field, ensure the device received and executed the command, and receive a response from the device back to the Cloud.

What is a Remote Procedure Call (RPC)?

A Remote Procedure Call allows you to call a function on a remote computing device and optionally receive a result. An easy way to think about it is you’re calling a function, like you would in any other program…you’re just doing it from another computer. In this case, you’re triggering actions from the Golioth Cloud.

RPC from the Golioth Cloud (Console)

Each device in your project has a page where you can view details about things like LightDB State, LightDB Stream, Settings, and now RPC. Our Console includes an interface to directly send RPCs to the remote device. The URL will look something like:

https://console.golioth.io/devices/<YOUR_DEVICE_ID>/management/rpc

In all of our examples, we are sending an RPC to single devices. However, they can also be triggered from the REST API. As a reminder, any function you see on the Golioth Console is available on the REST API.

One critical function of RPCs is a confirmation that the remote function has actually run. The device firmware needs to send back a success message that the function has completed, and optionally a returned value. When there is a problem connecting with your device and the RPC does not complete successfully, you will see a screen that looks like this:

An RPC sent to a device that was disconnected from Wi-Fi

Also note that round trip time is measured for all RPCs, including successful ones. Transit times will depend upon your connectivity medium, in addition to the processing time of the function on the remote device.

When an RPC successfully completes, you can click the button with 3 dots to receive the returned value. In the example and in the video, we were using a method called double that takes an integer input, multiplies it by two, and then returns the value to the Cloud. Below, you can see the result when we sent “double” method with a parameter of “37”.

RPC from the Device SDK perspective

Any new feature on Golioth has Device SDK support, in addition to the new APIs and UIs on our Web Console. Earlier this week, Nick wrote about how we test hardware and firmware at Golioth, especially when a new feature is released across the platform. Now that we support 3 SDKs (Zephyr, NCS, ESP-IDF), the testing area has increased.

In the video, the focus is on ESP-IDF, which has a simple way to set up and respond to new RPCs. First, we register the new method, so we’ll recognize the command coming from the Golioth Cloud:

The function that we tie to that newly registered RPC needs to return the RPC_OK variable for the Cloud to be alerted that the function has processed properly.

If you have satisfied these requirements in the ESP-IDF SDK and copied the format, you can customize logic to do whatever task you’d like on the remote device. Let’s look at some examples.

Use cases for an RPC

The double() example is a simple showcase of the minimum requirements to create an RPC in the ESP-IDF SDK. We send a command and a value, we return a modified value.

The remote_reset command we created and showcased in the video is more like a critical function you would want to add to your project. When you want to trigger a remote function like a reset, you want to ensure that the command was properly received, that the function executed, and then that there was output data that validated the reset has happened. In the final point, that includes inferring that the device has restarted from the log messages also being sent back to the Golioth Console. Put all together, it’s a reliable way to tell the device has been reset.

Other use cases could be as simple as sending arbitrary text to a display. You would still want to know the text has been received and properly sent to the physical display. Or perhaps you have a valve and you want to be able to send an arbitrary value to the valve, but you also want to take a reading on an encoder that measures the distance the valve has moved.

Many of these functions could also be achieved with LightDB State (which the RPC service is built upon), but the context for creating an RPC is more targeted at situations like the examples above.

What will you build?

RPCs are another way for you to communicate with your constrained IoT devices from the Golioth Cloud and to get useful information back from your devices. You can start testing out this feature today.

For more questions or assistance, check out our Forums, our Discord, or drop us a note at [email protected].

 

Here’s a scenario that might sound familiar.

You’ve added a new feature to the firmware, about 800 lines of code, along with some simple unit tests. All unit tests are passing. The code coverage report is higher than ever. You pat yourself on the back for a job well done.

But then – for a brief moment, conviction wavers as a voice in your head whispers: “The mocks aren’t real”; “Does this code really work?”; “You forgot to test OTA.”

But the moment passes, and your faith is restored as you admire the elegant green CI check mark ✅.

Do you trust your unit tests enough to deploy the firmware right now to thousands of real devices via Over-the-Air (OTA) firmware update?

If the answer is “well, no, not really, we still need to test a few things with the real hardware”, then automated Hardware-in-the-Loop (HIL) testing might give you the confidence to upgrade that answer to “heck yeah, let’s ship it!”. 🚀

In this post, I’ll explain what HIL testing is, and why we use it at Golioth to continuously verify the firmware for our Zephyr and ESP-IDF SDKs.

HIL Testing: A Definition

If you search online, you’ll find a lot of definitions of what HIL testing is, but I’ll try to formulate it in my own simple terms.

There are three major types of tests:

  • Unit – the smallest kind of test, often covering just one function in isolation
  • Integration – testing multiple functions or multiple subsystems together
  • System – testing the entire system, fully integrated

A firmware test can run in three places:

  • On your host machine – Running a binary compiled for your system
  • On emulated target hardware – Something like QEMU or Renode
  • On real target hardware – Running cross-compiled code for the target, like an ARM or RISC-V processor.

A HIL test is a system test that runs on real target hardware, with real peripherals, and real connections. Typically, a HIL setup will consist of:

  • A board running your firmware (the code under test)
  • A host test machine, executing the tests
  • (optional) Other peripherals and test equipment to simulate real-world conditions, controlled by the host test machine

Coverage and Confidence

Why bother with HIL tests? Aren’t unit tests good enough?

To address this, I’ll introduce two terms:

  • Coverage: How much of the firmware is covered by tests?
  • Confidence: How confident are you that the firmware behaves as intended?

Coverage is easily measured by utilizing open-source tools like gcov. It’s an essential part of any test strategy. Confidence is not as easily measured. It’s often a gut feeling.

It’s possible to have 100% coverage and 0% confidence. You can write tests that technically execute every line of code and branch combination without testing any intended functionality. Similarly, you can have 0% coverage and 100% confidence, but if you encounter someone like this, they are, to put it bluntly, delusional.

Unit tests are great at increasing coverage. The code under test is isolated from the rest of the system, and you can get into all the dark corners. “Mocks” are used to represent adjacent parts of the system, usually in a simplistic way. This might be a small piece of code that simulates the response over a UART. But this means is you can’t be 100% confident the code will work the same way in the context of the full system.

HIL system tests are great at increasing confidence, as they demonstrate that the code is functional in the real world. However, testing things at this level makes it difficult to cover the dark corners of code. The test is limited to the external interfaces of the system (ie. the UART you had a mock for in the above example).

A test strategy involving both unit tests and HIL tests will give you high coverage and high confidence.

Continuously Verifying Golioth SDK Firmware

Why do we use HIL testing at Golioth?

We have the concept of Continuously Verified Boards, which are boards that get first-class support from us. We test these boards on every SDK release across all Golioth services.

Previously, these boards were manually tested each release, but there were several problems we encountered:

  • It was time-consuming
  • It was error-prone
  • It did not scale well as we added boards
  • OTA testing was difficult, with lots of clicking and manual steps involved
  • The SDK release process was painful
  • We were unsure whether there had been regressions between releases

We’ve alleviated all of these problems by introducing HIL tests that automatically run in pull requests and merges to the main branch. These are integrated into GitHub Actions (Continuous Integration, CI), and the HIL tests must pass for new code to merge to main.

Here’s the output of the HIL test running in CI on a commit that merged to main recently in the ESP-IDF SDK:

HIL Test Running in CI

This is just the first part of the test where the ESP32 firmware is being flashed.

Further down in the test, you can see we’re running unit tests on hardware:

Automated test output

And we even have an automated OTA test:

Automated OTA Test Output

As a developer of this firmware, I can tell you that I sleep much better at night knowing that these tests pass on the real hardware. In IoT devices, OTA firmware update is one of the most important features to get right (if not the most important), so testing it automatically has been a big win for us.

HIL on a Budget

Designing a prototype HIL system does not need to be a huge up-front investment.

At Golioth, one engineer was able to build a prototype HIL within one week, due primarily to our simple hardware configuration (each tested board just needs power, serial, and WiFi connection) and GitHub Actions Self-Hosted Runners, which completely solves the infrastructure and DevOps part of the HIL system.

The prototype design looks something like this:

 

I’m using my personal Raspberry Pi as the self-hosted runner. GitHub makes it easy to register any kind of computer (Linux, MacOS, or Windows) as a self-hosted runner.

And here’s a pic of my HIL setup, tucked away in a corner and rarely touched or thought about:

My DIY HIL Setup

Next Time

That’s it for part 1! We’ve covered the “why” of HIL testing at Golioth and a little bit of the “how”.

In the next part, we’ll dive deeper into the steps we took to create the prototype HIL system, the software involved, and some of the lessons learned along the way.

 

Thanks to Michael for the photo of Tiger and Turtle

Golioth just rolled out a new settings service that lets you control your growing fleet of IoT devices. You can specify settings for your entire fleet, and override those global settings by individual device or for multiple devices that share the same blueprint.

Every IoT project needs some type of settings feature, from adjusting log levels and configuring the delay between sensor readings, to adjusting how frequently a cellular connection is used in order to conserve power. With the new settings service, the work is already done for you. A single settings change on the Golioth web console is applied to all devices listening for changes!

As you grow from dozens of devices to hundreds (and beyond), the Golioth settings service makes sure you can change device settings and confirm that those changes were received.

Demonstrating the settings service

Golioth settings service

The settings service is ready for you use right now. We have code samples for the Golioth Zephyr SDK and the Golioth ESP-IDF SDK. Let’s take it for a spin using the Zephyr samples.

I’ve compiled and flashed the Golioth Settings sample for an ESP32 device. It observes a LOOP_DELAY_S endpoint and uses that value to decide how long to wait before sending another “Hello” log message.

Project-wide settings

On the Golioth web console, I use the Device Settings option on the left sidebar to create the key/value pair for this setting. This is available to all devices in the project with firmware that is set up to observe the LOOP_DELAY_S settings endpoint.

Golioth device settings dialog

When viewing device logs, we can see the setting is observed as soon as the device connects to Golioth. The result is that the Hello messages are now issued ten seconds apart.

[00:00:19.930,000] <inf> golioth_system: Client connected!
[00:00:20.340,000] <inf> golioth: Payload
                                  a2 67 76 65 72 73 69 6f  6e 1a 62 e9 99 c4 68 73 |.gversio n.b...hs
                                  65 74 74 69 6e 67 73 a1  6c 4c 4f 4f 50 5f 44 45 |ettings. lLOOP_DE
                                  4c 41 59 5f 53 fb 40 24  00 00 00 00 00 00       |LAY_S.@$ ......  
[00:00:20.341,000] <dbg> golioth_hello: on_setting: Received setting: key = LOOP_DELAY_S, type = 2
[00:00:20.341,000] <inf> golioth_hello: Set loop delay to 10 seconds
[00:00:20.390,000] <inf> golioth_hello: Sending hello! 2
[00:00:30.391,000] <inf> golioth_hello: Sending hello! 3
[00:00:40.393,000] <inf> golioth_hello: Sending hello! 4

Settings by device or by blueprint

Of course, you don’t always want to have the same settings for all devices. Consider debugging a single device. It doesn’t make much sense to turn up the logging level or frequency of sensor reads for all devices. So with Golioth it’s easy to change the setting on just a single device.

settings change for a single device on the Golioth console

In the device view of the Golioth web console there is a settings tab. Here you can see the key, the value, and the level of the value. I have already changed the device-specific value in this screen so the level is being reported as “Device”.

[00:07:30.466,000] <inf> golioth_hello: Sending hello! 45
[00:07:40.468,000] <inf> golioth_hello: Sending hello! 46
[00:07:43.728,000] <inf> golioth: Payload
                                  a2 67 76 65 72 73 69 6f  6e 1a 62 e9 9c 17 68 73 |.gversio n.b...hs
                                  65 74 74 69 6e 67 73 a1  6c 4c 4f 4f 50 5f 44 45 |ettings. lLOOP_DE
                                  4c 41 59 5f 53 fb 40 00  00 00 00 00 00 00       |LAY_S.@. ......  
[00:07:43.729,000] <dbg> golioth_hello: on_setting: Received setting: key = LOOP_DELAY_S, type = 2
[00:07:43.729,000] <inf> golioth_hello: Set loop delay to 2 seconds
[00:07:50.469,000] <inf> golioth_hello: Sending hello! 47
[00:07:52.471,000] <inf> golioth_hello: Sending hello! 48

When I made the change. the device was immediately notified and you can see from the timestamps that it began logging at a two-second cadence as expected.

Golioth settings applied at the blueprint level

It is also possible to change settings for a group of devices that share a common blueprint. Here you will find this setting by selecting Blueprint from the left sidebar and choosing your desired blueprint.

Settings are applied based on specificity. The device-level is the most specific and will be applied first, followed by blueprint-level, and finally project-level. Blueprints may be created and applied at any time, so if you later realize you need a more specific group you can change the blueprint for those devices.

Implementation: The two parts that make up the settings service

Fundamentally, there are two parts that make our device settings system work: the Golioth cloud services running on our servers and your firmware that is running on the devices. The Golioth device SDKs allow you to register a callback function that receives settings values each time a change is made to the settings on the cloud. You choose how the device should react to these settings, like updating a delay value, enabling/disabling features, changing log output levels, really anything you want to do.

Don’t worry if you already have devices in the field. You can add the settings service or make changes to how your device handles those settings, then use the Golioth OTA firmware update system to push out the new behavior.

Take control of your fleet

Scale is the hope for most IoT companies, but it’s also where the pain of IoT happens. You need to know you can control your devices, securely communicate with them, and perform updates as necessary. Golioth has you covered in all of these areas. The new settings service ensures that your ability to change how your fleet is performing doesn’t become outpaced by your growth.

NXP RT1060-EVKB Ethernet development board

I get to work with a lot of different hardware here at Golioth, but recently I’ve found my self gravitating to the NXP i.MX RT1060-EVKB evaluation board as my go-to development hardware. This brand new board is a variation on the RT1060-EVK, which (in addition to the extra ‘B’ in the new part number) adds an m.2 slot. But what catches my eye is the Ethernet. While I’ve previously written about using SPI-connected Ethernet modules, this board builds the connectivity right in. It’s stable, it’s fast, and has wonderful support in Zephyr.

The microcontroller that controls everything is the RT1062 which I first heard about three years back when the Teensy 4.0 was released. A 600 MHz powerhouse, it boasts 1024 kB of SRAM and is packed with peripherals. It’s an embarrassment of riches for my day-to-day firmware work, but makes sure I’ll never hit my head on the proverbial “development ceiling”.

Of course it’s not just the system clock that I’m happy about. This board has Ethernet built-in, so I’m not waiting for a WiFi connection (or really waiting for a cell connection) at boot. And it’s programmed via J-Link for a fast flash process and the RTT and debugging features that this programmer brings to the party.

Let’s take a look at how I get this board talking to Golioth, and what we might see in the near future.

An i.MX RT1060-EVKB demo

NXP RT1060-evkb webinar demo

Demonstrating Golioth LightDB Stream for temperature/pressure/humidity sensor data

A couple of weeks ago I did a demo of the i.MX RT1060-EVKB board as part of NXP’s Zephyr webinar series. It gives you a great overview of the board and the ecosystem. I was even able to do a live demo of all the Golioth features like data management, command/control, remote logging, and OTA firmware updates. (Requires a free NXP registration to see the video.)

This board already has great support with the Golioth platform. My hope is to go further than that and add this to our list of Continuously Verified Boards. This would mean that we test and verify all official Golioth code samples with this board during every Golioth release. That’s a tall hill to climb, but it would be a snap if we used automated testing through a process called Hardware in the Loop; CI/CD that automatically runs compiled code on the hardware itself. This is already the case for our recently announced Golioth ESP-IDF SDK. Nick Miller is working on a post about how he pulled that off, so keep your eyes on the Golioth Blog over the next couple of weeks!

Get your tools ready

To build for this board you need a few tools, like the HAL library for NXP, LVGL, and the board files themselves.

Adding the NXP HAL and the LVGL library

Zephyr needs a hardware abstraction layer for NXP builds, and this board also depends on having the LVGL library installed. Add these to your west manifest:

  • use the west manifest --path to help you locate which file to edit (modules/lib/golioth/west-zephyr.yml in my case)
  • Add hal_nxp and lvgl to the name-allow list
  • Run west update

With the edits in place, here’s what the relevant section of my west manifest looks like:

manifest:
  projects:
    - name: zephyr
      revision: v3.1.0
      url: https://github.com/zephyrproject-rtos/zephyr
      west-commands: scripts/west-commands.yml
      import:
        name-allowlist:
          - cmsis
          - hal_espressif
          - hal_nordic
          - mbedtls
          - net-tools
          - segger
          - tinycrypt
          - hal_nxp
          - lvgl

Cherry Pick the EVKB files (if needed)

This is a new board and the devicetree files for it were just added to Zephyr on Jun 5th. My Zephyr is still on v3.1.0 so it’s older than the commit that added support for the “evkb” board. I used git’s cherry-pick feature to add those files to my workspace. From the golioth-zephyr-workspace/zephyr directory run:

git pull
git cherry-pick 9a9aeae00b184cac308c1622676bb91c07dd465b

This will pull in just the commit that adds support for the new board.

Thar ‘b dragons

Here’s the thing about making manual changes to the Zephyr repo in your workspace: the west manifest doesn’t know you’ve done this. That means the next time you run west update, this cherry-picked commit will be lost.

Prep the hardware

Last, but not least, I used a J-Link programmer to flash firmware to this board. There’s just a bit of jumper setup necessary to make this all work. NXP has a guide to setting up an external J-Link programmer. I followed that page, using a USB cable on J1 for power (and a serial terminal connection when running apps).

Saying Hello

With that preamble behind me, I’m excited to get to the “Hello” part of things using the standard Golioth Hello sample. Running Golioth samples is fairly easy with this board, with just one puzzle piece that needs to be added for DHCP to acquire an IP address.

Use DHCP to get an IP address

I added a boards/mimxrt1060_evkb.conf file to create a configuration specific to this board. In this case, it selects the Ethernet and DHCP libraries.

CONFIG_NET_L2_ETHERNET=y
CONFIG_NET_DHCPV4=y

At the top of main.c I include the network interface library. Then in the main() function, we need to instantiate an interface and pass it to the DHCP function. This code requests an IP address from the network’s DHCP server.

#include <net/net_if.h>
struct net_if *iface;
LOG_INF("Run dhcpv4 client");
iface = net_if_get_default();
net_dhcpv4_start(iface);

client->on_message = golioth_on_message;
golioth_system_client_start();

Add credentials, then build and flash

The final step is to add your Golioth credentials before building and flashing the app. Add the credentials to the prj.conf file as outlined in the hello sample README. You can get your credentials by creating a new device on the Golioth Console.

CONFIG_GOLIOTH_SYSTEM_CLIENT_PSK_ID="my-psk-id"
CONFIG_GOLIOTH_SYSTEM_CLIENT_PSK="my-psk"

The build calls out the board name to specifically target our board and the newly added configuration. The flash command will automatically find the J-Link programmer and use it.

west build -b mimxrt1060_evkb samples/hello
west flash

In my testing it only took about three seconds to get an IP address. From there, I was up and connected to Golioth after just another two seconds!

Golioth Hello serial terminal output

Golioth Hello example output

Off to the races

There really isn’t much to getting this up and running. Once I worked through this process I was able to test out all of the Golioth samples, including using the terminal shell to provision the board (ie: adding the Golioth credentials) and performing Over-the-Air (OTA) firmware updates.

As I mentioned earlier, the speed and reliability of this dev board keeps drawing me back. It’s not just the initial connection, but when testing out OTA, the 1024 byte block downloads seem to fly right by. I’m unsure if this is the Ethernet or the chip’s ability to quickly write to flash, but if you’re working with a 250 kB firmware file the difference over, say an ESP32, is obvious.

Give it a try using Golioth’s Dev Tier which includes up to 50 devices for your test fleet. The Golioth forum is a great place for questions on the process, and we’d love to hear about what you’re building over on the Golioth Discord server.

Keep your eye on this space. When Zephyr issues its next release we’ll roll forward the Golioth SDK to include the board files for the RT1060-EVKB, and my hope is that we’ll be able to include the DHCP call into our common samples code for a truly out-of-the-box build experience. We might even see a bit of that Hardware-in-the-Loop build automation I hinted at earlier. See you next time!

On Tuesday we announced the Golioth ESP-IDF SDK that delivers all of Golioth’s excellent features to ESP32 projects built on Espressif’s FreeRTOS-based ESP-IDF ecosystem. The APIs included in our SDK make it dead simple to set up an encrypted connection with Golioth and begin sending and receiving data, controlling the device remotely, sending your logging messages up to the cloud, and of course performing Over-the-Air (OTA) updates on remote devices.

Today we dive into the code as Nick Miller, Golioth’s lead firmware engineer, takes us on a guided tour.

What does Golioth ESP-IDF SDK deliver?

All of the best features of Golioth’s device management cloud are available in our ESP-IDF SDK. The set of APIs are quite clever and take all of the heavy lift out of your hands. This includes:

  • Set, get, and observe data endpoints on the cloud
  • Write log data back to the cloud
  • Handle Over-the-Air (OTA) firmware updates
  • API calls–in both synchronous and asynchronous options–to suit your needs

Setup: Install ESP-IDF and clone the Golioth repo

To get started you need have the ESP-IDF installed and clone the Golioth ESP-IDF SDK. Instructions are available in the readme of our git repository, and there is also a quickstart on our docs site.

Our new SDK is a component for FreeRTOS, the real-time operating system used by the ESP-IDF. It’s the same operating system and build tools you’re used to, with the Golioth SDK sitting on top so that your devices can interact with the Golioth servers.

Stepping through the Golioth-Basics example

The best way to test-drive is with the Golioth-Basics example that is included in the SDK. It demonstrates assigning Golioth credentials to your device, sending/receiving data, observing data, sending log messages, and performing over-the-air (OTA) firmware updates. The golioth_basics.c file is thoroughly commented to explain each API call in detail.

The example begins by initializing non-volatile storage, configuring a serial shell, checking for credentials, and connecting to WiFi. At that point we can start using the Golioth APIs.

Creating the Golioth system client

// Now we are ready to connect to the Golioth cloud.
//
// To start, we need to create a client. The function golioth_client_create will
// dynamically create a client and return a handle to it.
//
// The client itself runs in a separate task, so once this function returns,
// there will be a new task running in the background.
//
// As soon as the task starts, it will try to connect to Golioth using the
// CoAP protocol over DTLS, with the PSK ID and PSK for authentication.
golioth_client_t client =
        golioth_client_create(nvs_read_golioth_psk_id(), nvs_read_golioth_psk());

Everything starts of by instantiating a client to handle the connection for you. This client will be passed to all of the API calls so that the SDK knows where to send them.

Sending log messages

// We can also log messages "synchronously", meaning the function will block
// until one of 3 things happen (whichever comes first):
//
// 1. We receive a response to the request from the server
// 2. The user-provided timeout expires
// 3. The default client task timeout expires (GOLIOTH_COAP_RESPONSE_TIMEOUT_S)
//
// In this case, we will block for up to 2 seconds waiting for the server response.
// We'll check the return code to know whether a timeout happened.
//
// Any function provided by this SDK ending in _sync will have the same meaning.
golioth_status_t status = golioth_log_warn_sync(client, "app_main", "Sync log", 5);

Here you can see a log being written to Golioth. Notice that the client created in the previous code block is used as the first parameter. This logging call is synchronous, and will wait to ensure the log was received by the Golioth servers. There is also an asynchronous version available that provides the option to run a callback function when the log is received by Golioth.

Setting up OTA firmware updates

// For OTA, we will spawn a background task that will listen for firmware
// updates from Golioth and automatically update firmware on the device using
// Espressif's OTA library.
//
// This is optional, but most real applications will probably want to use this.
golioth_fw_update_init(client, _current_version);

OTA firmware updates are handled for you by the SDK. The line of code shown here is all it takes to register for updates. The app will then observe the firmware version available on the server. It will automatically begin the update process whenever you roll out a new firmware release on the Golioth Cloud.

Sending and receiving data

// There are a number of different functions you can call to get and set values in
// LightDB state, based on the type of value (e.g. int, bool, float, string, JSON).
golioth_lightdb_set_int_async(client, "my_int", 42, NULL, NULL);
// To asynchronously get a value from LightDB, a callback function must be provided
golioth_lightdb_get_async(client, "my_int", on_get_my_int, NULL);

The bread and butter of the IoT industry is the ability to send and received data. This code demonstrates asynchronous set and get functions. Notice that the get API call registers on_get_my_int as a callback function that will be executed to handle the data that arrives back from the Golioth servers.

A get command runs just once to fetch the requested data from Golioth. Another extremely useful approach is to observe the data using the golioth_lightdb_observe_async(). It works the same way as an asynchronous get call, but it will execute your callback every time the data on the server changes.

Putting it all together

In the second half of the video, Nick takes us through the process running the demo. He starts with setting up the ESP-IDF environment and compiling to code, and continues all the way through to viewing the device data on the web console.

You’re going to love working with the Golioth ESP-IDF SDK. It’s designed to deal with all the complexity of securely connecting and controlling your IoT devices. The API calls are easy to understand, and they make it painless to add Golioth to existing and future ESP-IDF based projects. Give it a try today using our free Dev Tier.

We’d love to hear what you’re planning to build. You can connect with us on the Golioth Discord server, ask questions over on the Golioth Forums, and share your demos by tagging the Golioth account on Twitter.

One of the most useful services in the Golioth Zephyr SDK is the ability to observe data changes on the cloud. A device can register any LightDB endpoint and the Golioth servers will notify it whenever changes happen. If your device is a door lock, an example endpoint might be “lock status”, which you would want to know about a server-side state change immediately.

This is slightly more complex to set up than something like a LightDB ‘Set’ API call. ‘Observe’ requires a callback function to handle the asynchronous reply from the Golioth servers. Today we’ll walk through how to add Golioth LightDB Observe functionality to any Zephyr application by:

  1. Adding a callback that is called every time observed data changes
  2. Registering the callback with a data endpoint
  3. Ensuring thatgolioth_on_connect is registered with the Golioth client

These techniques are all found in our LightDB Observe sample code which acts as the roadmap for this article.

Prerequisites

Today’s post assumes that you already have a device running Zephyr and you have already tested out an app that uses the Golioth Zephyr SDK. If you’re not there yet, don’t worry. You can sign up for our free Dev Tier that includes up to 50 devices, and follow the Golioth Quickstart Guide.

Your Zephyr workspace should already have Golioth installed as a module and your app (probably in main.c) is already instantiating a Golioth system client. Basically, you should see a block like this one somewhere in your code:

#include <zephyr/net/coap.h>
#include <net/golioth/system_client.h>
static struct golioth_client *client = GOLIOTH_SYSTEM_CLIENT_GET();

If you don’t, checkout out our How to add Golioth to an existing Zephyr project blog post to get up to speed before moving on.

1. Add a callback function for observed data changes

The goal of this whole exercise is to enable your device to perform a task whenever data changes at your desired endpoint. Remember: Golioth LightDB endpoints are configurable by you! Whatever data you’d like to monitor, you can customize it to your needs.

The first thing we’ll do is create a callback function that will perform the task.

static int counter_handler(struct golioth_req_rsp *rsp)
{
    if (rsp->err) {
        LOG_ERR("Failed to receive counter value: %d", rsp->err);
        return rsp->err;
    }

    LOG_INF("Received: %.*s  Length: %d", rsp->len, rsp->data, rsp->len);

    return 0;
}

The callback receives an object (rsp) from Golioth containing the data, data length, and any error codes. The first portion of this callback checks the error code. Line 8 prints a log message that displays the data received, and it’s length.

If your endpoint contains more than just one value, it may be useful to parse the JSON object and store the values. Also keep in mind that this callback will execute on the golioth system client thread, which is a different thread than the “main” thread running your application. This means:

  • The callback function should return quickly (under 10 ms). If that’s not enough time, you can use a Zephyr Workqueue to schedule the work on another thread.
  • If access to global data is required, access to the data must be protected by a mutex to avoid data races between threads.

2. Registering the callback with a data endpoint

static void golioth_on_connect(struct golioth_client *client)
{
    int err = golioth_lightdb_observe_cb(client, "counter",
                     GOLIOTH_CONTENT_FORMAT_APP_JSON,
                     counter_handler, NULL);

    if (err) {
        LOG_WRN("failed to observe lightdb path: %d", err);
    }
}

Now we register the observation using the golioth_lightdb_observe_cb() API call. The parameters passed to this function include:

  1. The Golioth client object
  2. The endpoint to observe, “counter” in this case.
  3. The format, in this case we’ve chosen JSON. For CBOR, see the LightDB LED sample which demonstrates using CBOR serialization.
  4. The name of the callback function we created in the previous section
  5. An optional user_data value. This can be used to pass any 4-byte value which could be a discrete value, a pointer to some data structure, or NULL if you don’t need it.

Notice that we’re registering the observe callback inside of a golioth_on_connect() function. This is recommended as the observation will be re-registered any time the the Golioth client connects. Without this, your device may miss observed changes if its internet connection becomes unstable. Observed data is sent to the device at the time a callback is registered, and each time the data changes on the Golioth cloud.

3. Add the processing function to on_message

This step is small but important, and seems to be the one I frequently forget and then scratch my head when my callback isn’t working.

Whenever a message is received from Golioth, the Golioth system client executes a callback that we usually call on_message. For our observed callbacks to work, we need to tell on_message about our coap_replies array.

static void golioth_on_message(struct golioth_client *client,
                   struct coap_packet *rx)
{
    /*
     * In order for the observe callback to be called,
     * we need to call this function.
     */
    coap_response_received(rx, NULL, coap_replies,
                   ARRAY_SIZE(coap_replies));
}

 

By calling Zephyr’s coap_response_received(), the CoAP packet will be parsed and the appropriate callback will be selected from the coap_replies struct (if one exists).

4. Ensuring that golioth_on_connect is registered with the Golioth client

The final step is to make sure that the oberseve callback is registered each time the Golioth system client connects.

client->on_connect = golioth_on_connect;
golioth_system_client_start();

This should be done in main() before the loop begins. The golioth_client struct should have already been instantiated in your code, in this example it was called client. The code above associates our callback function and starts the client running.

Observed data in action

Now that we’ve tied it all together, let’s test it out. Here’s the terminal output of my Zephyr app:

*** Booting Zephyr OS build zephyr-v3.2.0  ***


[00:00:00.878,000] <inf> golioth_system: Initializing
[00:00:00.878,000] <dbg> golioth_lightdb: main: Start LightDB observe sample
[00:00:00.878,000] <inf> golioth_samples: Waiting for interface to be up
[00:00:00.878,000] <inf> golioth_samples: Connecting to WiFi
uart:~$ Connected
[00:00:11.191,000] <inf> net_dhcpv4: Received: 192.168.1.159
[00:00:11.191,000] <inf> golioth_wifi: Connected with status: 0
[00:00:11.191,000] <inf> golioth_wifi: Successfully connected to WiFi
[00:00:11.191,000] <inf> golioth_system: Starting connect
[00:00:13.042,000] <inf> golioth_system: Client connected!
[00:00:13.857,000] <inf> golioth_lightdb: Received: null Length: 4
[00:00:29.526,000] <inf> golioth_lightdb: Received: 42 Length: 2

You can see that at boot time, the observed data will be reported, which is great for setting defaults when your device first connects to observed data. In the above example, the endpoint did existon the Golioth cloud when teh device registered so a payload of null (with length 4) was returned. About 16 seconds later a payload of 42 is received. That’s when I added the endpoint and value in the Golioth Console.

On the cloud, this is an integer, but the device receives payloads as strings. You’ll need to validate received data on the device side to ensure expected behavior in your callback functions (beyond simply printing out the payload as I’m doing here). Give it a try for yourself using our LightDB Observe sample code.

Observing LightDB data gives your devices the ability to react to any changes without the need to poll like you would if you were using the golioth_lightdb_get() function. In addition to being notified each time the data changes, you’ll also get the current state when the observation is first registered (ie: at power-up). Single endpoints, or entire JSON objects can be observed, making it possible to group different types of state data to suit any need.

If you still have questions, or want to talk about how LightDB Observe works under the hood, head over to the Golioth Forum or ping us on the Golioth Discord.