As members and contributors to the Zephyr Project, we keep an eye on new developments. A recently published feature of particular interest because it represents a new way to structure programs and messaging between different parts of your program.

ZBus (Zephyr Bus) is a recently merged feature on the Zephyr Project, which brings a standardized version of event driven architecture in the form of a publish and subscribe model inside of your program. The lead author Rodrigo Peixoto spoke with Golioth about the details of this new feature and how it might help Golioth users to make more responsive, flexible programs. In the associated video, Rodrigo walks through the history and capabilities of ZBus.

Why should you consider an event-driven bus architecture?

The decision to take on a new system architecture is not something to do willy nilly. It’s important to understand where an event driven architecture is a good fit.

The first that I think about is scalability. When you have a bus architecture, adding an additional “listener” can be done with much less work.

Consider the alternative to event-driven architecture. When you want to add a new action (say some code that initiates a sensor reading), you need to call that new code from your trigger event (say a timer being finished). That means you need to know where the trigger is located in the code and make changes there to add the call to your new task. Once you add in the required testing, each additional feature can become burdensom. This scales poorly.

With an event-driven system, the trigger is already set up to publish an event. New tasks can be added that look for the event. You don’t need to change any trigger code, you don’t even need to know where that code lives. This performs well as the amount of data increases, which will ultimately depend on how large you think your system will be.

Flexibility is another consideration. An event-driven bus allows the system to be easily adapted to handle new events or changes in the environment. This means that the system can be easily updated and modified without having to completely rewrite large swaths of code. Another type of “flexibility” is how and where you can re-use your code. This makes it easier to develop and test your system, as well as to troubleshoot and fix any problems that may arise.

Finally, if your device needs to meet critical timing, an event-driven system will not only deal with higher levels of complexity, but also respond quickly to new inputs to the system, such as external events. For example, an embedded system might be designed to control a robot, and it would use event-driven architecture to respond to sensor data from the robot’s environment and control its movements accordingly.

How ZBus implements an event-driven architecture

In ZBus, there are “producers” that generate messages and “consumers” that act upon them. There are also “Filters” help to process raw data (such as a sensor output). Each of these are organized into different “channels” to allow listening on a particular lane of data being produced.

Source: Rodrigo’s ZBus presentation (click for link)

Source: Rodrigo’s ZBus presentation (click for link)

Source: Rodrigo’s ZBus presentation (click for link)

Each of these are built into normal scenarios such as listening for timers and taking a reading from a sensor and then alerting other parts of the program that the data is now available. A common scenario is show below:

How will you use ZBus?

Microcontrollers are used in increasingly complex scenarios and are being asked to do more and more. Connecting a low power device to the internet often requires higher levels of complexity that Zephyr helps with. We expect to see more devices using Ecosystems and RTOSes like Zephyr in the future, and implementing ZBus on high complexity devices.

Are you looking at using an event-driven architecture in your system? Let us know on our forum and tell us how we can help!

The most sought-after Golioth feature is OTA, also known as Over-the-Air firmware updates. When you put an IoT device into the field it’s crucial that you be able to push firmware updates to it without human intervention. Golioth makes simplifies the process for your ESP-IDF projects.

Today we’re walking through the OTA process:

  • Build and flash the initial firmware to the device
  • Provision the device with credentials that will be persistent across firmware updates
  • Build a new revision of the firmware
  • Upload the firmware to Golioth and roll it out as a release
  • Observe the device detecting, downloading, and running the new firmware

Prerequisites:

Please ensure that you have installed a copy of the the ESP-IDF v4.4.2 to your computer. Today’s article will use an ESP32 but this will work with other variants like the ESP32s2, ESP32c3, etc.

Clone a copy of the Golioth Firmware SDK (which includes ESP-IDF support). To do, please follow the “Cloning this repo section” in the README.

Commands in this guide are based on a Linux operating system with the ESP-IDF and Golioth Firmware SDK installed in the home directory (~/). However, these are cross-platform tools and are easy to adapt to your system and your preferred install directories.

Build and flash the initial firmware

In a classic Chicken-or-Egg scenario, to perform a Golioth OTA update your device needs to be running firmware built for Golioth OTA. We can use the golioth_basics example which is ready to run without changes.

First, let’s make sure our ESP-IDF is set to the correct version and enabled for this session:

cd ~/esp-idf
git fetch
git checkout v4.4.2
git submodule update --init --recursive ./install.sh all
source export.sh

Now move to the ESP-IDF section of the Golioth Firmware SDK, specifically the golioth_basics example. We’ll build, flash, and run this code on the ESP32:

cd ~/golioth-firmware-sdk/examples/esp_idf/golioth_basics
idf.py build
idf.py flash
idf.py monitor

On some systems you will need to hold the boot button on the ESP32 in order to flash the code. I find to get the monitor command to work I need to first hold the boot button, then press reset when the screen says “waiting for download”. One last tip: CTRL-] is used to exit from the idf.py monitor screen.

Assign device credentials and connect

There are a number of ways to assign credentials to your device (including Bluetooth via your browser!) but perhaps the easiest is to type them into the shell. Head over to the Golioth Console and select your device’s credentials tab. (If you don’t have an account, sign up for the Dev Tier now, your first 50 devices are free.)

Use your the shell window to set the credentials. Here you can see the process, with the four commands for Golioth and WiFi credentials highlighted:

Type 'help' to get the list of commands.
Use UP/DOWN arrows to navigate through command history.
Press TAB when typing command name to auto-complete.
esp32> W (2212) golioth_example: WiFi and golioth credentials are not set
W (2212) golioth_example: Use the shell settings commands to set them, then restart
esp32> 
esp32> settings set golioth/psk-id 20221122201152-esp32@developer-training
Setting golioth/psk-id saved
esp32> settings set golioth/psk 5bfb64ad29dce4e3dd30ab10c5b95a6a
Setting golioth/psk saved
esp32> settings set wifi/ssid MyWifiAp
Setting wifi/ssid saved
esp32> settings set wifi/psk MyWifiPassword
Setting wifi/psk saved
esp32> reset

The final command resets the device so that it will use the new credentials. You should see this output:

I (4235) esp_netif_handlers: sta ip: 192.168.1.159, mask: 255.255.255.0, gw: 192.168.1.1
I (4235) example_wifi: WiFi Connected. Got IP:192.168.1.159
I (4235) example_wifi: Connected to AP SSID: MyWifiAp
I (4255) golioth_mbox: Mbox created, bufsize: 2184, num_items: 20, item_size: 104
I (4255) golioth_basics: Waiting for connection to Golioth...
W (4295) wifi:<ba-add>idx:0 (ifx:0, c6:ff:d4:a8:fa:10), tid:0, ssn:1, winSize:64
I (4395) golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
I (4405) libcoap: Setting PSK key

I (4415) golioth_coap_client: Entering CoAP I/O loop
I (4805) golioth_basics: Golioth client connected
I (4805) golioth_basics: Hello, Golioth!
I (4815) golioth_coap_client: Golioth CoAP client connected
I (4815) golioth_fw_update: Current firmware version: 1.2.5
I (5735) golioth_fw_update: Waiting to receive OTA manifest
I (5835) golioth_basics: Synchronously got my_int = 42
I (5845) golioth_basics: Entering endless loop
I (5845) golioth_basics: Sending hello! 0
I (5935) golioth_fw_update: Received OTA manifest
I (5935) golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
I (5945) golioth_fw_update: Waiting to receive OTA manifest
I (6545) golioth_basics: Callback got my_int = 42
W (9805) golioth_coap_client: CoAP message retransmitted
W (10335) golioth_coap_client: 4.00 (req type: 3, path: .c/status), len 59
I (15845) golioth_basics: Sending hello! 1
I (21965) wifi:bcn_timout,ap_probe_send_start
I (25855) golioth_basics: Sending hello! 2
I (35005) wifi:bcn_timout,ap_probe_send_start
I (35855) golioth_basics: Sending hello! 3

First the ESP32 connects to WiFi, then Golioth. After checking (and not finding) a firmware update available, this example begins sending hello messages every few seconds.

Now let’s do an OTA firmware update

We connected to Golioth with the device, now let’s build and upload a new firmware package to test the OTA capabilities. We’ll use the same code, updating the Current Version number which the device uses to identify when an update is needed. We’ll also change the string used in the log messages so it’s easy to recognize that our new firmware is running.

Change the source code version and rebuild

The file we need to update is a common file used by multiple Golioth SDKs. Edit the ~/golioth-firmware-sdk/examples/common/golioth_basics.c file:

#define TAG "golioth_basics_new"

// Current firmware version
static const char* _current_version = "1.2.6";

You can see I’ve appended “_new” to the tag name and incremented the version number to 1.2.6. Now we’re ready to rebuild… but remember, don’t flash this to your ESP32. We’re going to upload it to Golioth and perform a remote firmware update!

cd ~/golioth-firmware-sdk/examples/esp_idf/golioth_basics
idf.py build

The newly built binary is located in the build folder.

Upload firmware to Golioth and roll out a release

After much preamble we’ve arrived at the important moment.

To set the scene, the ESP32 that’s running on your desk is a remote IoT device taking sensor readings in a brick-and-mortar retail establishment in Waldorf, Maryland. We’ll push an update to it using a three-step process:

  1. Upload the binary as an “artifact”
  2. Create a “release” using the artifact
  3. Click the “rollout” button to make the release live

Go to the Golioth Console and select Firmware Update→Artifacts from the left sidebar. Click the “Create” button.

Golioth OTA create artifact

The only thing we’re going to change on this window is the “Artifact Version”. Type in the same version number you entered in the sourcecode for this firmware (probably 1.2.6 if you’re following along). Click the upload icon in the middle of the window and choose the “golioth_basics.bin” file from the build directory where you ran the idf.py build command. Finally, click the Upload Artifact button.

You have the option here to use a Blueprint. I’m not detailing that today for brevity, but it’s a good practice to use Blueprints to organize your production devices.

Now let’s create a release based on the artifact. Click Firmware Update→Release from the left sidebar and click the Create button.

Golioth OTA Release

All we’re going to do here is to chose the artifact we previously created in the Artifacts box and press Create Release.

You have some options here, most notably you can choose to start the rollout as soon as the release is created. I prefer to wait and roll it out as a separate confirmation step in case I made some mistake along the way.

Note that you have the option here of selecting device Blueprints and Tags to make this a more targeted release. This window is telling me the release will apply to 51 devices (!). That’s okay here because these are all test devices on a test project that we use for training.

Finally, let’s roll out the release to our devices:

Golioth OTA rollout

The Rollout button is all that stand between you and automatic updates. Click it and you will (almost) immediately see your device begin to download the new binary.

Here’s an awesome feature to keep in mind. When you have more than one release, you can use this button to rollback to previous versions. This means if you realize you released an update that has a bug, you can just toggle this button and all of your devices will automatically download the next-newest release that has rollout selected.

Watch your ESP32 update

If you read the source code for the golioth_basics example you will notice that it calls golioth_fw_update_init(client, _current_version);. That means the device has registered with the Golioth servers to receive updates when new firmware is available. Look in the terminal output and you will see the result:

I (175827) golioth_basics: Sending hello! 17
I (185007) golioth_fw_update: Received OTA manifest
I (185007) golioth_fw_update: Current version = 1.2.5, Target version = 1.2.6
I (185017) golioth_fw_update: State = Downloading
I (185317) golioth_fw_update: Image size = 1211744
I (185327) golioth_fw_update: Getting block index 0 (1/1184)
I (185827) golioth_basics: Sending hello! 18
W (187867) golioth_coap_client: CoAP message retransmitted
I (187947) fw_update_esp_idf: Writing to partition subtype 17 at offset 0x1a0000
I (187947) fw_update_esp_idf: Erasing flash
I (191627) golioth_fw_update: Getting block index 1 (2/1184)
I (191837) golioth_fw_update: Getting block index 2 (3/1184)
I (192037) golioth_fw_update: Getting block index 3 (4/1184)
I (192187) golioth_fw_update: Getting block index 4 (5/1184)
I (192447) golioth_fw_update: Getting block index 5 (6/1184)
I (192597) golioth_fw_update: Getting block index 6 (7/1184)

... snip ...

I (279837) golioth_fw_update: Getting block index 1181 (1182/1184) 
I (280107) golioth_fw_update: Getting block index 1182 (1183/1184) 
I (280317) golioth_fw_update: Getting block index 1183 (1184/1184) 
I (280457) golioth_fw_update: Total bytes written: 1211760 
I (280467) esp_image: segment 0: paddr=001a0020 vaddr=3f400020 size=29df0h (171504) map 
I (280527) esp_image: segment 1: paddr=001c9e18 vaddr=3ffbdb60 size=05868h ( 22632) 
I (280537) esp_image: segment 2: paddr=001cf688 vaddr=40080000 size=00990h ( 2448) 
I (280547) esp_image: segment 3: paddr=001d0020 vaddr=400d0020 size=d9a64h (891492) map 
I (280847) esp_image: segment 4: paddr=002a9a8c vaddr=40080990 size=1e2a0h (123552) 
I (280887) esp_image: segment 5: paddr=002c7d34 vaddr=50000000 size=00010h ( 16) 
I (280887) golioth_fw_update: State = Downloaded 
I (281127) golioth_fw_update: State = Updating 
I (281327) fw_update_esp_idf: Setting boot partition 
I (281337) esp_image: segment 0: paddr=001a0020 vaddr=3f400020 size=29df0h (171504) map 
I (281397) esp_image: segment 1: paddr=001c9e18 vaddr=3ffbdb60 size=05868h ( 22632) 
I (281417) esp_image: segment 2: paddr=001cf688 vaddr=40080000 size=00990h ( 2448) 
I (281417) esp_image: segment 3: paddr=001d0020 vaddr=400d0020 size=d9a64h (891492) map 
I (281717) esp_image: segment 4: paddr=002a9a8c vaddr=40080990 size=1e2a0h (123552) 
I (281757) esp_image: segment 5: paddr=002c7d34 vaddr=50000000 size=00010h ( 16) 
I (281827) golioth_fw_update: Rebooting into new image in 5 seconds 
I (282827) golioth_fw_update: Rebooting into new image in 4 seconds 
I (283827) golioth_fw_update: Rebooting into new image in 3 seconds 
I (284827) golioth_fw_update: Rebooting into new image in 2 seconds 
I (285827) golioth_fw_update: Rebooting into new image in 1 seconds

... snip ...

I (4267) esp_netif_handlers: sta ip: 192.168.1.159, mask: 255.255.255.0, gw: 192.168.1.1
I (4267) example_wifi: WiFi Connected. Got IP:192.168.1.159
I (4277) example_wifi: Connected to AP SSID: TheNewPeachRepublic
I (4287) golioth_mbox: Mbox created, bufsize: 2184, num_items: 20, item_size: 104
I (4287) golioth_basics_new: Waiting for connection to Golioth...
W (4297) wifi:<ba-add>idx:0 (ifx:0, c6:ff:d4:a8:fa:10), tid:0, ssn:1, winSize:64
I (4307) golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
I (4307) libcoap: Setting PSK key

I (4317) golioth_coap_client: Entering CoAP I/O loop
I (4637) golioth_basics_new: Golioth client connected
I (4647) golioth_coap_client: Golioth CoAP client connected
I (4657) golioth_basics_new: Hello, Golioth!
I (4657) golioth_fw_update: Current firmware version: 1.2.6
I (4657) golioth_fw_update: Waiting for golioth client to connect before cancelling rollback
I (4677) golioth_fw_update: Firmware updated successfully!
I (4727) golioth_fw_update: State = Idle
I (5937) golioth_basics_new: Synchronously got my_int = 42
I (5937) golioth_basics_new: Entering endless loop
I (5937) golioth_basics_new: Sending hello! 0

First, the device compares version numbers and then begins to download blocks of the new firmware. Once downloaded it will reboot, connect to Golioth, and verify that it is running the newest version. The log labels near the end of the output now show golioth_basics_new, confirming one of the changes we made to our source code.

Golioth OTA report firmware version

On the Golioth Console, the summary view for this device confirms the currently running firmware version is 1.2.6!

With Golioth, OTA is built into the SDK

Golioth has done the heavy lifting so that you don’t need to. Our SDK uses just the single API call to register your devices for firmware updates. Use the fleet management tools on the Golioth Cloud to provision your devices in groups and by hardware variants. These make it possible to target your test devices for the first round of updates, or push new features just to the devices on the fourth floor of your Des Moines plant.

These robust tools are crucial for successful, long-lasting IoT deployments, and and they’re ready for you to start using right now. If you have any questions, we’d love to talk! Reach out to us on the Golioth Forum or get in touch with the DevRel team for demo.

How the Golioth Developer Training paved the way

This is a guest post from Shrouk El-Attar, discussing her journey from the hardware space into the firmware space and how Golioth training has helped her understand building out IoT systems using Zephyr.

The Journey Started with Hardware

To me, hardware just makes sense! You have specific requirements, find a part that can fulfill them, read its datasheet, and then execute. Voila! Successful design achieved.

The worst thing about the process? I don’t know…maybe the Googling? Getting a calculation out by a factor of ten? The tedious BOM stock checking process? These things can be long and tiring, but still, the concepts make sense. Designing a PCB is easy to understand; you follow simple rules and let the copper tracks do the rest. Even things in the “RF Voodoo” territory make sense with well-developed RF design tools these days.

Not too long ago, I finally took the leap into the world of consulting as a self-proclaimed “Hardware Queen”. With that, I saw the terrifying rise in demand for the 2-in-1 engineering consultants: a hardware engineer who is also a firmware engineer.

The Firmware Roadblock

Firmware feels to me like hardware’s anarchist sister. I know her a little; we have some history. It might surprise you to know that she hasn’t always been this way with me.

Once upon a time, there was Arduino (I know, I know). I probably picked up my first Arduino in 2012. I’d heard about them long before then, but as an asylum seeker, I couldn’t afford to own one. That is until the UK officially recognised me as a refugee, and I could finally access higher education and buy my own shiny Arduino.

The first time I got to Blinky on my Arduino, it was like something clicked. I understood it. I understood every single bit of code I was writing on it. “I have a gift,” I thought. Perhaps learning Visual Basic (yup) back in 2007 laid down those basics for me to get to Blinky on Arduino and be a total boss at it.

From then on, I felt invincible. Whatever I could think of, I was able to make it. Self-watering plant? Check. NFC easter egg hunt? Check. Twitter-powered bubble machine? Check. Arduino felt like hardware and software meeting in perfect harmony in a world where anything I could think of was possible.

Shrouk’s Arduino Connected Plant: Light data is sent to cloud platform (left), phone notifications are sent when light data is critical (middle), plant “tweets” when light data reaches below a certain level (right).

My initial success was a sign I was going to be a firmware engineer. But soon after, my firmware dreams were shattered.

Stuck in the Firmware Valley

My first full-time engineering role was at Intel back in 2015, where I specialized in the then-brand-new field called IoT. “Can you imagine that by 2020 there will be 50 billion connected devices?” I thought to myself. I was so excited that I would be one of the people developing those devices. It was the future.

One of the first things I couldn’t wait to get my hands on was the all-shiny Intel Realsense 3D Camera. But when I downloaded my SDK, I didn’t know where to start. After days of struggling on my own, I could get some things working with the help of my senior colleagues. But none of it clicked. I had no idea what I was doing. I was not gifted. Frankly, I sucked at firmware.

I started specializing in hardware. As the years passed, some of my all-time favorite chips became the nRF5 MCUs by Nordic Semiconductor. What a hardware engineer’s dream. Can’t get to a specific pin during routing? Don’t worry; choose literally any other pin you can reach more easily, and re-route there! The nRF5 had an excellent reputation amongst my firmware colleagues too.

In 2019, I decided to give firmware another go on my favorite chip. I wasn’t too far down the toolchain setup, and I’d already started regretting it. What the heck were Make files?! It was an excruciating process. But–no exaggeration–one week later, I’d managed to set up the toolchain. Great, let’s get to Blinky! Save, build, flash. Error.

I spent hours, then finally figure out the error. Save, build, flash—another error. Repeat a zillion times. It turns out I had never set up the toolchain correctly in the first place. Months later, I finally found a training that would work, or so I thought. This turned out to be my biggest disappointment. Nothing in the training was up to date. Nothing worked the way it was supposed to. This was indeed the final time I’d try to touch firmware again. I was stuck. I was done.

The Golioth Way

Back in present day, as I was realizing the demand for the 2-in-1 hardware and firmware engineering consultant, a well timed post from Chris Gammell appeared on my feed. The post was about the Golioth Developer Training, targeted explicitly at hardware engineers. The 2-in-1 engineer could be me. Was I going to try firmware again? Absolutely.

The training took place entirely remotely with hardware engineers from all over the world. This is going to be it, for sure! But, from the very beginning, I was already struggling. “Here we go again”, I thought. I shouldn’t have thought I could do it. I struggled with minor details at first, like appending -d instead of -D to a build command, which apparently gives wildly different results. I copied and pasted the correct information but in the wrong fields. And each exercise took me way longer than the “estimated time” suggested for each section.

The training happened in a large group, but we did the exercises independently. Chris and Mike were checking in on each of us throughout, so they were able to help me fix anything I missed very quickly on a one-to-one basis. As the training went on, I started to notice that I needed their help less and less. What is this I feel? Some cautious optimism, perhaps?

The training method was active. I wasn’t sitting at my desk blindly following instructions. No, I was pushed to figure out the correct way to solve a problem at every stage. It was presented in bite-size information with many collapsed sections. The training also included links referencing concepts, e.g. Zephyr Pin Control, throughout, should I want to learn more. For the most part, I didn’t click those because I wanted to stay on track. My head was already full of too much firmware to learn. But I enjoyed that they were there so I could refer back to them in the future.

Then the happiest of accidents happened. I picked up a later stage of the training on my own, then realized that I had to redo the earlier parts to get to where I wanted to pick up from. Not going to lie; I was slightly annoyed. But while re-doing them, I saw that it wasn’t taking me very long at all. This time, I was well within the time estimates suggested for each exercise, if not much quicker. Was this working for me? Did I finally find a method to learn commercially viable firmware that works?

A light bulb went off when I realized that it took me over a week to get to Blinky on my previous firmware training, but within a couple of hours, I was already speaking to my Wi-Fi device through the Golioth Console. Not only that, but it was all through the Zephyr RTOS with Golioth. The same Zephyr RTOS I struggled endlessly with on the nRF5 hardware that I love. It was a miracle to think about being enabled for firmware on my favorite hardware platform. Coupled with the fact that Golioth is entirely scalable, i.e. the idea that I can start from an idea to production on the same platform, made Golioth a dream.

Golioth also helped me develop a community, some of whom I became close Discord friends with. We all need a Discord buddy to complain to about daily life as an electronics engineering consultant (Hi Seth!). And a decade after I picked up my first Arduino, Golioth Developer Training helped me finally break through that commercially viable firmware ceiling.

Next, I’ll work on reference designs and put the skills I learnt from the training to the test. Watch this space.

Editor note: If you’re interested in taking part as an individual or as a company, sign up for future training here.

Golioth now supports Infineon parts via the ModusToolbox™. We added Golioth device management to the ecosystem a few weeks ago and the example code is available right now to run on Infineon’s line of microcontrollers talking to their Wi-Fi parts.

ModusToolbox™ (MTB) is a software support tool from Infineon Technologies. It includes partner SDKs alongside the company’s officially supported IDEs, drivers, and examples. You can pull in the Golioth example and all dependencies using the Eclipse IDE that is included in MTB, or via the command line tool.

Infineon’s PSoC™ 6 chips are feature rich 32-bit Arm microcontrollers. Paired with the Infineon 4343W, it’s a perfect platform for IoT device builders, and exactly the kind of constrained device that Golioth was built for.

Take advantage of Over-the-Air (OTA) firmware updates, time-series databases, state data management, remote logging, plus the command/control features like remote procedure call (RPC) and the device settings service. The API calls for each of these are demonstrated and well-commented in the golioth_main.c file.

Try the Golioth example using ModusToolbox™

PSoC 6 Wi-Fi BT Prototyping Kit (CY8CPROTO-062-4343W)

We run the Golioth example on the PSoC™ 6Wi-Fi BT Prototyping Kit (CY8CPROTO-062-4343W). Here’s how to try it for yourself:

1. Install Infineon’s ModusToolbox™

Begin by downloading and installing ModusToolbox™ for your system. Then run modustoolbox-eclipsewhich is located in the ide_3.0/eclipse/ subfolder.

2. Create a Golioth Example project

With the Eclipse IDE open, click on File→New→ModsToolbox Application. This will launch the project creator window.

Select the CY8CPROTO-062-4343W from the list of PSoC™ 6 boards and then click next.

Choose the Golioth Example from the Wi-Fi list and click on the Create button. This will take a couple of minutes to clone the Golioth code and all dependencies.

3. Compile and Install MCUboot

Golioth uses MCUboot as the secure bootloader for our Over-the-Air updates. Before flashing the app to the board, we need to compile and install MCUboot. I did this using the IDE’s built-in terminal.

First, we need to install the MCUboot dependencies.

cd ~/mtw/mtb_shared/mcuboot/v1.8.1-cypress/scripts/
python -m pip install -r requirements.txt

Now we can compile and flash MCUboot. Remember to plug a USB cable into the KITPROG3 connector on your PSoC™ 6 devboard before running the program command:

cd ~/mtw/Golioth_Example/bootloader_cm0p/
make build_proj -j8
make program_proj

4. Compile and flash the Golioth App

Before compiling the Golioth App we need to give it credentials to connect to Wi-Fi and also to authenticate with the Golioth server. These are set in the ~/mtw/Golioth_Example/golioth_app/source/golioth_main.hfile.

Use the Wi-Fi credentials for your local access point. Get device credentials from the Golioth Console. If you’ve haven’t yet set up an account, check out our Quickstart. (With Golioth’s Dev Tier your first 50 devices are free.)

Once you’ve saved your changes to the golioth_main.h file, use the terminal to compile and flash the app to your PSoC™ 6 board:

cd ~/mtw/Golioth_Example/golioth_app
make build_proj -j8
make program_proj

Taking the Golioth App for a test drive

[INF] MCUBoot Bootloader Started
[INF] External Memory initialized w/ SFDP.
[INF] boot_swap_type_multi: Primary image: magic=unset, swap_type=0x1, copy_don3
[INF] boot_swap_type_multi: Secondary image: magic=unset, swap_type=0x1, copy_d3
[INF] Swap type: none
[INF] User Application validated successfully
[INF] Starting User Application (wait)...
[INF] Start slot Address: 0x10018400
[INF] MCUBoot Bootloader finished
[INF] Deinitializing hardware...
External Memory initialized w/ SFDP.
=========================================================
[GoliothApp] Version: 1.0.0, CPU: CM4

=========================================================
[GoliothApp] Watchdog timer started by the bootloader is now turned off to mark.
[GoliothApp] User LED toggles at 1000 msec interval

WLAN MAC Address : 74:7A:90:D4:5F:04
WLAN Firmware    : wl0: Jul 18 2021 19:15:39 version 7.45.98.120 (56df937 CY) FWID 01-69db62cf
WLAN CLM         : API: 12.2 Data: 9.10.39 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2021-07-18 19:03:20 
WHD VERSION      : v2.5.0 : v2.5.0 : GCC 10.3 : 2022-09-23 13:14:02 +0800
Wi-Fi Connection Manager initialized.
Successfully connected to Wi-Fi network 'MyWiFiAP'.
IP Address Assigned: 192.168.1.153
Secure Sockets initialized
I (5515) golioth_main: Waiting to Golioth to connect...
I (5638) golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
I (5643) libcoap: Setting PSK key

I (5649) golioth_coap_client: Entering CoAP I/O loop
I (5886) golioth_main: Golioth client connected
I (5996) golioth_fw_update: Current firmware version: 1.0.0
I (6060) golioth_fw_update: Waiting to receive OTA manifest
I (6258) golioth_fw_update: Received OTA manifest
I (6258) golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
I (6260) golioth_fw_update: Waiting to receive OTA manifest
I (6322) golioth_main: Synchronously got my_int = 42
I (6323) golioth_main: Entering endless loop
I (9006) golioth_main: Callback got my_int = 42
I (9286) golioth_main: Setting loop delay to 5 s

By monitoring the serial output from the device (that’s /dev/ttyACM0 on my system) we can see all the parts of the app at work. The board powers up and reports the firmware version before connecting to Wi-Fi. Once a Golioth connection is established it checks for firmware updates before it starts writing data to the cloud.

You can view device logs remotely through the Golioth console. Here we see “Sending hello!” messages arriving along with a counter.

Viewing state data for this device, the counter variable is update at the same rate as the hello messages. In the Device Settings on the left sidebar of the console you can add LOOP_DELAY_S and remotely control how how many seconds the device pauses for between sending back these message. It’s a perfect way to control sensor reading frequency across your entire fleet.

Adding Golioth to existing Infineon PSoC™ 6 projects

The Golioth example that is part of ModusToolbox™ is a great blueprint for adding device management to your PSoC™ 6 projects. We’d love to hear what you’re build, and if you need some help we’re here to lend a hand. Show off your successes and post questions on the Golioth forum. Feel free to reach out to the Golioth Developer Relations team to set up a demo or discuss the needs of your IoT fleet.

This is the second part of How Golioth uses Hardware-in-the-Loop (HIL) Testing. In the first part, I covered what HIL testing is, why we use it at Golioth, and provided a high-level view of the system.

In this post, I’ll talk about each step we took to set up our HIL. It’s my belief that HIL testing should be a part of any firmware testing strategy, and my goal with this post is to show you that it’s not so difficult to get a minimum viable HIL set up within one week.

The Big Pieces

To recap, this is the system block diagram presented at the end of part 1:

The Raspberry Pi acts as a GitHub Actions Self-Hosted Runner which communicates with microcontroller devices via USB-to-serial adapters.

The major software pieces required are:

  • Firmware with Built-In-Test Capabilities – Runs tests on the device (e.g. ESP32-S3-DevKitC), and reports pass/fail for each test over serial.
  • Python Test Runner Script – Initiates tests and checks serial output from device. Runs on the self-hosted runner (e.g. Raspberry Pi).
  • GitHub Actions Workflow – Initiates the python test runner and reports results to GitHub. Runs on the self-hosted runner.

Firmware Built-In-Test

To test our SDK, we use a special build of firmware which exposes commands over a shell/CLI interface. There is a command named built_in_test which runs a suite of tests in firmware:

esp32> built_in_test
...
12 Tests 0 Failures 0 Ignored 
../main/app_main.c:371:test_connects_to_wifi:PASS
../main/app_main.c:378:test_golioth_client_create:PASS
../main/app_main.c:379:test_connects_to_golioth:PASS
../main/app_main.c:381:test_lightdb_set_get_sync:PASS
../main/app_main.c:382:test_lightdb_set_get_async:PASS
../main/app_main.c:383:test_lightdb_observation:PASS
../main/app_main.c:384:test_golioth_client_heap_usage:PASS
../main/app_main.c:385:test_request_dropped_if_client_not_running:PASS
../main/app_main.c:386:test_lightdb_error_if_path_not_found:PASS
../main/app_main.c:387:test_request_timeout_if_packets_dropped:PASS
../main/app_main.c:388:test_client_task_stack_min_remaining:PASS
../main/app_main.c:389:test_client_destroy_and_no_memory_leaks:PASS

These tests cover major functionality of the SDK, including connecting to Golioth servers, and sending/receiving CoAP messages. There are also a couple of tests to verify that stack and heap usage are within allowed limits, and that there are no detected memory leaks.

Here’s an example of one of the tests that runs on the device, test_connects_to_golioth:

#include "unity.h"
#include "golioth.h"

static golioth_client_t _client;
static SemaphoreHandle_t _connected_sem;

static void on_client_event(golioth_client_t client, 
                            golioth_client_event_t event, 
                            void* arg) {
    if (event == GOLIOTH_CLIENT_EVENT_CONNECTED) {
        xSemaphoreGive(_connected_sem);
    }
}

static void test_connects_to_golioth(void) {
    TEST_ASSERT_NOT_NULL(_client);
    TEST_ASSERT_EQUAL(GOLIOTH_OK, golioth_client_start(_client));
    TEST_ASSERT_EQUAL(pdTRUE, xSemaphoreTake(_connected_sem, 10000 / portTICK_PERIOD_MS));
}

This test attempts to connect to Golioth servers. If it connects, a semaphore is given which the test waits on for up to 10 seconds (or else the test fails). For such a simple test, it covers a large portion of our SDK, including the network stack (WiFi/IP/UDP), security (DTLS), and CoAP Rx/Tx.

We use the Unity test framework for basic test execution and assertion macros.

If you’re curious about the details, you can see the full test code here: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/test/main/app_main.c

Python Test Runner

Another key piece of software is a Python script that runs on the self-hosted runner and communicates with the device over serial. We call this script verify.py. In this script, we provision the device with credentials, issue the built_in_test command, and verify the serial output to make sure all tests pass.

Here’s the main function of the script:

def main():
    port = sys.argv[1]

    # Connect to the device over serial and use the shell CLI to interact and run tests
    ser = serial.Serial(port, 115200, timeout=1)

    # Set WiFi and Golioth credentials over device shell CLI
    set_credentials(ser)
    reset(ser)

    # Run built in tests on the device and check output
    num_test_failures = run_built_in_tests(ser)
    reset(ser)

    if num_test_failures == 0:
        run_ota_test(ser)

    sys.exit(num_test_failures)

This opens the serial port and calls various helper functions to set credentials, reset the board, run built-in-tests, and run an OTA test. The exit code is the number of failed tests, so 0 means all passed.

The script spends most of its time in this helper function which looks for a string in a line of serial text:

def wait_for_str_in_line(ser, str, timeout_s=20, log=True):
    start_time = time()
    while True:
        line = ser.readline().decode('utf-8', errors='replace').replace("\r\n", "")
        if line != "" and log:
            print(line)
        if "CPU halted" in line:
            raise RuntimeError(line)
        if time() - start_time > timeout_s:
            raise RuntimeError('Timeout')
        if str in line:
            return line

The script will continually read lines from serial for up to timeout_s seconds until it finds what it’s looking for. And if “CPU halted” ever appears in the output, that means the board crashed, so a runtime error is raised.

This verify.py script is what gets invoked by the GitHub Actions Workflow file.

GitHub Actions Workflow

The workflow file is a YAML file that defines what will be executed by GitHub Actions. If one of the steps in the workflow returns a non-zero exit code, then the workflow will fail, which results in a red ❌ in CI.

Our workflow is divided into two jobs:

  • Build the firmware and upload build artifacts. This is executed on cloud servers, since building the firmware on a Raspberry Pi would take a prohibitively long time (adding precious minutes to CI time).
  • Download firmware build, flash the device, and run verify.py. This is executed on the self-hosted runner (my Raspberry Pi).

The workflow file is created in .github/workflows/test_esp32s3.yml. Here are the important parts of the file:

name: Test on Hardware

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  build_for_hw_test:
    runs-on: ubuntu-latest
    steps:
    ...
    - name: Build test project
      uses: espressif/esp-idf-ci-action@v1
      with:
        esp_idf_version: v4.4.1
        target: esp32s3
        path: 'examples/test'
    ...

  hw_flash_and_test:
    needs: build_for_hw_test
    runs-on: [self-hosted, has_esp32s3]
    ...
    - name: Flash and Verify Serial Output
      run: |
        cd examples/test
        python flash.py $CI_ESP32_PORT && python verify.py $CI_ESP32_PORT

The first job runs in a normal GitHub runner based on Ubuntu. For building the test firmware, Espressif provides a handy CI Action to do the heavy lifting.

The second job has a needs: build_for_hw_test requirement, meaning it must wait until the first job completes (i.e. firmware is built) before it can run.

The second job has two further requirements – it must run on self-hosted and there must be a label has_esp32s3. GitHub will dispatch the job to the first available self-hosted runner that meets the requirements.

If the workflow completes all jobs, and all steps return non-zero exit code, then everything has passed, and you get the green CI checkmark ✅.

Self-Hosted Runner Setup

With all of the major software pieces in place (FW test framework, Python runner script, and Actions workflow), the next step is to set up a machine as a self-hosted runner. Any machine capable of running one of the big 3 operating systems will do (Linux, Windows, MacOS).

It helps if the machine is always on and connected to the Internet, though that is not a requirement. GitHub will simply not dispatch jobs to any self-hosted runners that are not online.

Install System Dependencies

There are a few system dependencies that need to be installed first. On my Raspberry Pi, I had to install these:

sudo apt install \
    python3 python3-pip python3-dev python-dev \
    libffi-dev cmake direnv wget gcovr clang-format \
    libssl-dev git
pip3 install pyserial

Next, log into GitHub and create a new self-hosted runner for your repo (Settings -> Actions -> Runners -> New self-hosted runner):

When you add a new self-hosted runner, GitHub will provide some very simple instructions to install the Actions service on your machine:

When I ran the config.sh line, I added --name and --unattended arguments:

./config.sh --url <https://github.com/golioth/golioth-firmware-sdk> --token XXXXXXXXXXXXXXXXXXXXXXXXXXXXX --name nicks-raspberry-pi --unattended

Next, install the service, which ensures GitHub can communicate with the self-hosted runner even if it reboots:

cd actions-runner
sudo ./svc.sh install

If there are any special environment variables needed when executing workflows, these can be added to runsvc.sh. For instance, I defined a variable for the serial port of the ESP32:

# insert anything to setup env when running as a service
export CI_ESP32_PORT=/dev/ttyUSB0

Finally, you can start the service:

sudo ./svc.sh start

And now you should see your self-hosted runner as “Idle” in GitHub Actions, meaning it is ready to accept jobs:

I added two custom labels to my runner: has_esp32s3 and has_nrf52840dk, to indicate that I have those two dev boards attached to my self-hosted runner. The workflow file uses these labels so that jobs get dispatched to self-hosted runners only if they have the required board.

And that’s it! With those steps, a basic HIL is up and running, integrated into CI, and covering a large amount of firmware code on real hardware.

Security

There’s an important note from GitHub regarding security of self-hosted runners:

Warning: We recommend that you only use self-hosted runners with private repositories. This is because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

I also stand by that recommendation. But it’s not always feasible if you’re maintaining an open-source project. Our SDKs are public, so this warning certainly applies to us.

If you’re not careful, you could give everyone on the Internet root access to your machine! It wouldn’t be very hard for someone to submit a pull request that modifies the workflow file to run whatever arbitrary Linux commands they want.

We mitigate most of the security risks with three actions.

1. Require approval to run workflows for all outside contributors.

This means someone from the Golioth organization must click a button to run the workflow. This allows the repo maintainer time to review any workflow changes before allowing the workflow to run.

This option can be enabled in Settings -> Actions -> General:

Once enabled, pull requests will have a new “Approve and Run” button:

This is not ideal if you’re running a large open-source project with lots of outside contributors, but it is a reasonable compromise to make during early stages of a project.

2. Avoid logging sensitive information

The log output of any GitHub Actions runner is visible publicly. If there is any sensitive information printed out in the workflow steps or from the serial output of the device, then this information could easily fall into the wrong hands.

For our purposes, we disabled logging WiFi and Golioth credentials.

3. Utilize GitHub Actions Secrets

You can use GitHub Actions Secrets to store sensitive information that is required by the runner. In our repo, we define a secret named `GOLIOTH_API_KEY` that allows us access to the Golioth API, which is used to deploy OTA images of built firmware during the workflow run.

If you want to know more about self-hosted runner security, these resources were helpful to me:

Tips and Tricks

Here are a few extra ideas to utilize the full power of GitHub Actions Self-Hosted Runners (SHRs):

  • Use the same self-hosted runner in multiple repos. You can define SHRs at the organization level and allow their use in more than one repo. We do this with our SHRs so they can be used to run tests in the Golioth Zephyr SDK repo and the Golioth ESP-IDF SDK repo.
  • Use labels to control workflow dispatch. Custom labels can be applied to a runner to control when and how a workflow gets dispatched. As mentioned previously, we use these to enforce that the SHR has the dev board required by the test. We have a label for each dev board, and if the admin of the SHR machine has those dev boards attached via USB, then they’d add a label for each one. This means that machine will only receive workflows that are compatible with that machine. If a board is acting up or requires maintenance, the admin can simply disconnect the board and remove the label to prevent further workflows from executing.
  • Distributed self-hosted runners to load balance and minimize CI bottlenecks. If there is only one self-hosted runner, it can become a bottleneck if there are lots of jobs to run. Also, if that runner loses Internet, then CI is blocked indefinitely. To balance the load and create more robust HIL infrastructure, it’s recommended to utilize multiple self-hosted runners. Convince your co-workers to take their unused Intel NUC off the shelf and follow the steps in the “Self-Hosted Runner Setup” section to add another runner to the pool. This allows for HIL tests to run in a distributed way, no matter where in the world you are.

Even people who have been living under a rock for the past couple of years know that there’s a global chip shortage. The correct response from the engineering community should be changes to our design philosophy and that’s the topic of the talk that Chris Gammell and I gave at the 2022 Zephyr Developer Summit back in June. With the right hardware and firmware approach, it’s possible to design truly modular hardware to respond to supply chain woes.

Standardization enabled the industrial revolution, and the electronics field is a direct descendant of those principles. You can rest easy in knowing that myriad parts exist to match your chosen operating voltage and communication scheme. But generally speaking it’s still quite painful to change microcontrollers and other key-components mid-way through product design.

Today’s hardware landscape has uprooted the old ways of doing things. You can’t wait out a 52-week lead time, so plan for the worst and hope for the best. Our talk covers the design philosophies we’ve adopted to make hardware “swapability” possible, while using Zephyr RTOS to manage the firmware side of things.

Meet the Golioth Aludel

Golioth exists to make IoT development straightforward and scalable–that’s not possible if you’re locked into specific hardware, especially these days. So we designed and built a modular platform to showcase this flexibility to our customers.

Golioth Aludel prototyping platform internal view

Called Aludel, Chris Gammell centered the design around a readily available industrial enclosure. The core concept is a PCB base that accepts any microcontroller board that uses the Feather standard. This way, every pin-location and function is standardized. Alongside the Feather you’ll find two headers that use the Click boards standard (PDF), a footprint for expansions boards facilitating i2c, SPI, UART, and more. The base includes Qwiic and Stemma headers for additional sensor connectivity, as well as terminal blocks, room for a battery, and provisions for external power.

The key customization element for Aludel is the faceplate. It’s a circuit board that can deliver a beautiful custom design to match any customer request. But on the inside, each faceplate has the same interface using a flat cable connection to the base board. Current versions of the faceplate feature a a screen, an EEPROM to identify the board, and a port expander that drives buttons, LEDs, and whatever else you need for the hardware UI.

The beauty of this is that electrically, it all just works. Need an nRF52, nRF9160, ESP32, STM32, or RP2040? They all already exist in Feather form factor. Need to change the type of temperature sensor you’re using? Want to add RS485, CANbus, MODBUS, 4-20 ma communication, or some other connectivity protocol? The hardware is ready for it–at most you might need to spin your own adapter board.

Now skeptics may look at the size of this with one abnormally high-arched eyebrow. But remember, this is a proof-of-concept platform. You can set it up for your prototyping, then take the block elements and boil them down into your final design. Prematurely optimizing for space means you will have many many more board runs as the parts change.

When you do a production run and your chip is no longer available, the same concepts will quickly allow you to test replacements and respin your production PCB to accommodate the changes.

Zephyr Alleviates your Firmware Headaches

“But wait!” you cry, “won’t someone think of the firmware?”. Indeed, someone has thought of the firmware. The Linux Foundation shepherds an amazing Real-Time Operating System called Zephyr RTOS that focuses on interoperability across a vast array of processor architectures and families. Even better, it handles the networking stack and has a standardized device model that makes it possible to switch peripherals (ie: sensors) without your C code even noticing.

We use Zephyr to maintain one firmware repository for the Aludel hardware that can be compiled for STM32F40, nRF52840, nRF9160, and ESP32 using your choice of Ethernet, WiFi, or Cellular connections. How is that even possible?

&i2c0 {
	bme280@76 {
		compatible = "bosch,bme280";
		reg = <0x76>;
		label = "BME280_I2C";
	};
};

&spi1 {
	compatible = "nordic,nrf-spi";
	status = "okay";
	cs-gpios = <&gpio0 3 GPIO_ACTIVE_LOW>,
	           <&gpio0 27 GPIO_ACTIVE_LOW>,
	           <&gpio1 8 GPIO_ACTIVE_LOW>;
	test_spi_w5500: w5500@0 {
		compatible = "wiznet,w5500";
		label = "w5500";
		reg = <0x0>;
		spi-max-frequency = <10000000>;
		int-gpios = <&gpio0 2 GPIO_ACTIVE_LOW>;
		reset-gpios = <&gpio0 30 GPIO_ACTIVE_LOW>;
	};
};

/ {
    aliases {
        aludeli2c = &i2c0;
        pca9539int = &interrupt_pin0;
    };
    gpio_keys {
        compatible = "gpio-keys";
        interrupt_pin0: int_pin_0 {
			gpios = < &gpio0 26 GPIO_ACTIVE_LOW >;
			label = "Port Expander Interrupt Pin";
		};
	};
};

Zephyr uses the DeviceTree standard to map how all of the hardware is connected. A vast amount of work has already been done for you by the manufacturers who maintain drivers for their chips and supply .dts (DeviceTree Syntax) files that specify pin functions and mappings. When writing firmware for your projects you supply DeviceTree overlay files that indicate sensors, buttons, LEDs, and anything else connected to your design. The C code looks up each device to receive all relevant information like device address, GPIO port/pin assignment and function.

const struct device *weather_dev = DEVICE_DT_GET_ANY(bosch_bme280);
sensor_sample_fetch(weather_dev);
sensor_channel_get(weather_dev, SENSOR_CHAN_AMBIENT_TEMP, &temp);
sensor_channel_get(weather_dev, SENSOR_CHAN_PRESS, &press);
sensor_channel_get(weather_dev, SENSOR_CHAN_HUMIDITY, &humidity);

The hardware abstraction doesn’t stop there. Zephyr also specifies abstraction for sensors. For instance, an IMU will have an X, Y, and Z output — Zephyr makes accessing these the same even if you have to move to an IMU from a different manufacturer due to parts shortages. You update the overlay files for the new build, and use a configuration file to turn on any specific libraries for the change, but the code you’re maintaining can continue on as if nothing happened.

Of course there are caveats which we cover in the talk. There is no one-size-fits-all in embedded engineering so you will eventually need to define your own .dts files for custom boards, and add peripherals that are unsupported in Zephyr. This is not a deal breaker, there is a template example for adding boards (and I highly recommend watching Jared Wolff’s video on the subject). And since Zephyr is open source, your customizations can be contributed upstream so others may also benefit.

Hardware will never again be the same

Electronics manufacturing will eventually emerge from the current shortages. But there are no signs of this happening soon, and it’s been ongoing for two years. We shouldn’t be trying to return to “normal”, but planning a better future. That’s what Golioth is doing for the Internet of Things. Choosing an IoT management platform shouldn’t require you to commit to one type of hardware. Your fleets should evolve, and our talk outlines how they can evolve. Golioth helps to manage your diverse and growing hardware fleet. Golioth is designed to make it easy to keep track of and control all of that hardware, whether that’s 100 devices or 100,000. Take Golioth for a test drive today.

Today we’re announcing a new feature on the Golioth Console and on our Device SDKs that enables Remote Procedure Calls (RPCs) for all users on the platform. From the cloud, you can initiate a function on your constrained device in the field, ensure the device received and executed the command, and receive a response from the device back to the Cloud.

What is a Remote Procedure Call (RPC)?

A Remote Procedure Call allows you to call a function on a remote computing device and optionally receive a result. An easy way to think about it is you’re calling a function, like you would in any other program…you’re just doing it from another computer. In this case, you’re triggering actions from the Golioth Cloud.

RPC from the Golioth Cloud (Console)

Each device in your project has a page where you can view details about things like LightDB State, LightDB Stream, Settings, and now RPC. Our Console includes an interface to directly send RPCs to the remote device. The URL will look something like:

https://console.golioth.io/devices/<YOUR_DEVICE_ID>/management/rpc

In all of our examples, we are sending an RPC to single devices. However, they can also be triggered from the REST API. As a reminder, any function you see on the Golioth Console is available on the REST API.

One critical function of RPCs is a confirmation that the remote function has actually run. The device firmware needs to send back a success message that the function has completed, and optionally a returned value. When there is a problem connecting with your device and the RPC does not complete successfully, you will see a screen that looks like this:

An RPC sent to a device that was disconnected from Wi-Fi

Also note that round trip time is measured for all RPCs, including successful ones. Transit times will depend upon your connectivity medium, in addition to the processing time of the function on the remote device.

When an RPC successfully completes, you can click the button with 3 dots to receive the returned value. In the example and in the video, we were using a method called double that takes an integer input, multiplies it by two, and then returns the value to the Cloud. Below, you can see the result when we sent “double” method with a parameter of “37”.

RPC from the Device SDK perspective

Any new feature on Golioth has Device SDK support, in addition to the new APIs and UIs on our Web Console. Earlier this week, Nick wrote about how we test hardware and firmware at Golioth, especially when a new feature is released across the platform. Now that we support 3 SDKs (Zephyr, NCS, ESP-IDF), the testing area has increased.

In the video, the focus is on ESP-IDF, which has a simple way to set up and respond to new RPCs. First, we register the new method, so we’ll recognize the command coming from the Golioth Cloud:

The function that we tie to that newly registered RPC needs to return the RPC_OK variable for the Cloud to be alerted that the function has processed properly.

If you have satisfied these requirements in the ESP-IDF SDK and copied the format, you can customize logic to do whatever task you’d like on the remote device. Let’s look at some examples.

Use cases for an RPC

The double() example is a simple showcase of the minimum requirements to create an RPC in the ESP-IDF SDK. We send a command and a value, we return a modified value.

The remote_reset command we created and showcased in the video is more like a critical function you would want to add to your project. When you want to trigger a remote function like a reset, you want to ensure that the command was properly received, that the function executed, and then that there was output data that validated the reset has happened. In the final point, that includes inferring that the device has restarted from the log messages also being sent back to the Golioth Console. Put all together, it’s a reliable way to tell the device has been reset.

Other use cases could be as simple as sending arbitrary text to a display. You would still want to know the text has been received and properly sent to the physical display. Or perhaps you have a valve and you want to be able to send an arbitrary value to the valve, but you also want to take a reading on an encoder that measures the distance the valve has moved.

Many of these functions could also be achieved with LightDB State (which the RPC service is built upon), but the context for creating an RPC is more targeted at situations like the examples above.

What will you build?

RPCs are another way for you to communicate with your constrained IoT devices from the Golioth Cloud and to get useful information back from your devices. You can start testing out this feature today.

For more questions or assistance, check out our Forums, our Discord, or drop us a note at [email protected].

 

Here’s a scenario that might sound familiar.

You’ve added a new feature to the firmware, about 800 lines of code, along with some simple unit tests. All unit tests are passing. The code coverage report is higher than ever. You pat yourself on the back for a job well done.

But then – for a brief moment, conviction wavers as a voice in your head whispers: “The mocks aren’t real”; “Does this code really work?”; “You forgot to test OTA.”

But the moment passes, and your faith is restored as you admire the elegant green CI check mark ✅.

Do you trust your unit tests enough to deploy the firmware right now to thousands of real devices via Over-the-Air (OTA) firmware update?

If the answer is “well, no, not really, we still need to test a few things with the real hardware”, then automated Hardware-in-the-Loop (HIL) testing might give you the confidence to upgrade that answer to “heck yeah, let’s ship it!”. 🚀

In this post, I’ll explain what HIL testing is, and why we use it at Golioth to continuously verify the firmware for our Zephyr and ESP-IDF SDKs.

HIL Testing: A Definition

If you search online, you’ll find a lot of definitions of what HIL testing is, but I’ll try to formulate it in my own simple terms.

There are three major types of tests:

  • Unit – the smallest kind of test, often covering just one function in isolation
  • Integration – testing multiple functions or multiple subsystems together
  • System – testing the entire system, fully integrated

A firmware test can run in three places:

  • On your host machine – Running a binary compiled for your system
  • On emulated target hardware – Something like QEMU or Renode
  • On real target hardware – Running cross-compiled code for the target, like an ARM or RISC-V processor.

A HIL test is a system test that runs on real target hardware, with real peripherals, and real connections. Typically, a HIL setup will consist of:

  • A board running your firmware (the code under test)
  • A host test machine, executing the tests
  • (optional) Other peripherals and test equipment to simulate real-world conditions, controlled by the host test machine

Coverage and Confidence

Why bother with HIL tests? Aren’t unit tests good enough?

To address this, I’ll introduce two terms:

  • Coverage: How much of the firmware is covered by tests?
  • Confidence: How confident are you that the firmware behaves as intended?

Coverage is easily measured by utilizing open-source tools like gcov. It’s an essential part of any test strategy. Confidence is not as easily measured. It’s often a gut feeling.

It’s possible to have 100% coverage and 0% confidence. You can write tests that technically execute every line of code and branch combination without testing any intended functionality. Similarly, you can have 0% coverage and 100% confidence, but if you encounter someone like this, they are, to put it bluntly, delusional.

Unit tests are great at increasing coverage. The code under test is isolated from the rest of the system, and you can get into all the dark corners. “Mocks” are used to represent adjacent parts of the system, usually in a simplistic way. This might be a small piece of code that simulates the response over a UART. But this means is you can’t be 100% confident the code will work the same way in the context of the full system.

HIL system tests are great at increasing confidence, as they demonstrate that the code is functional in the real world. However, testing things at this level makes it difficult to cover the dark corners of code. The test is limited to the external interfaces of the system (ie. the UART you had a mock for in the above example).

A test strategy involving both unit tests and HIL tests will give you high coverage and high confidence.

Continuously Verifying Golioth SDK Firmware

Why do we use HIL testing at Golioth?

We have the concept of Continuously Verified Boards, which are boards that get first-class support from us. We test these boards on every SDK release across all Golioth services.

Previously, these boards were manually tested each release, but there were several problems we encountered:

  • It was time-consuming
  • It was error-prone
  • It did not scale well as we added boards
  • OTA testing was difficult, with lots of clicking and manual steps involved
  • The SDK release process was painful
  • We were unsure whether there had been regressions between releases

We’ve alleviated all of these problems by introducing HIL tests that automatically run in pull requests and merges to the main branch. These are integrated into GitHub Actions (Continuous Integration, CI), and the HIL tests must pass for new code to merge to main.

Here’s the output of the HIL test running in CI on a commit that merged to main recently in the ESP-IDF SDK:

HIL Test Running in CI

This is just the first part of the test where the ESP32 firmware is being flashed.

Further down in the test, you can see we’re running unit tests on hardware:

Automated test output

And we even have an automated OTA test:

Automated OTA Test Output

As a developer of this firmware, I can tell you that I sleep much better at night knowing that these tests pass on the real hardware. In IoT devices, OTA firmware update is one of the most important features to get right (if not the most important), so testing it automatically has been a big win for us.

HIL on a Budget

Designing a prototype HIL system does not need to be a huge up-front investment.

At Golioth, one engineer was able to build a prototype HIL within one week, due primarily to our simple hardware configuration (each tested board just needs power, serial, and WiFi connection) and GitHub Actions Self-Hosted Runners, which completely solves the infrastructure and DevOps part of the HIL system.

The prototype design looks something like this:

 

I’m using my personal Raspberry Pi as the self-hosted runner. GitHub makes it easy to register any kind of computer (Linux, MacOS, or Windows) as a self-hosted runner.

And here’s a pic of my HIL setup, tucked away in a corner and rarely touched or thought about:

My DIY HIL Setup

Next Time

That’s it for part 1! We’ve covered the “why” of HIL testing at Golioth and a little bit of the “how”.

In the next part, we’ll dive deeper into the steps we took to create the prototype HIL system, the software involved, and some of the lessons learned along the way.

 

Thanks to Michael for the photo of Tiger and Turtle

Golioth just rolled out a new settings service that lets you control your growing fleet of IoT devices. You can specify settings for your entire fleet, and override those global settings by individual device or for multiple devices that share the same blueprint.

Every IoT project needs some type of settings feature, from adjusting log levels and configuring the delay between sensor readings, to adjusting how frequently a cellular connection is used in order to conserve power. With the new settings service, the work is already done for you. A single settings change on the Golioth web console is applied to all devices listening for changes!

As you grow from dozens of devices to hundreds (and beyond), the Golioth settings service makes sure you can change device settings and confirm that those changes were received.

Demonstrating the settings service

Golioth settings service

The settings service is ready for you use right now. We have code samples for the Golioth Zephyr SDK and the Golioth ESP-IDF SDK. Let’s take it for a spin using the Zephyr samples.

I’ve compiled and flashed the Golioth Settings sample for an ESP32 device. It observes a LOOP_DELAY_S endpoint and uses that value to decide how long to wait before sending another “Hello” log message.

Project-wide settings

On the Golioth web console, I use the Device Settings option on the left sidebar to create the key/value pair for this setting. This is available to all devices in the project with firmware that is set up to observe the LOOP_DELAY_S settings endpoint.

Golioth device settings dialog

When viewing device logs, we can see the setting is observed as soon as the device connects to Golioth. The result is that the Hello messages are now issued ten seconds apart.

[00:00:19.930,000] <inf> golioth_system: Client connected!
[00:00:20.340,000] <inf> golioth: Payload
                                  a2 67 76 65 72 73 69 6f  6e 1a 62 e9 99 c4 68 73 |.gversio n.b...hs
                                  65 74 74 69 6e 67 73 a1  6c 4c 4f 4f 50 5f 44 45 |ettings. lLOOP_DE
                                  4c 41 59 5f 53 fb 40 24  00 00 00 00 00 00       |LAY_S.@$ ......  
[00:00:20.341,000] <dbg> golioth_hello: on_setting: Received setting: key = LOOP_DELAY_S, type = 2
[00:00:20.341,000] <inf> golioth_hello: Set loop delay to 10 seconds
[00:00:20.390,000] <inf> golioth_hello: Sending hello! 2
[00:00:30.391,000] <inf> golioth_hello: Sending hello! 3
[00:00:40.393,000] <inf> golioth_hello: Sending hello! 4

Settings by device or by blueprint

Of course, you don’t always want to have the same settings for all devices. Consider debugging a single device. It doesn’t make much sense to turn up the logging level or frequency of sensor reads for all devices. So with Golioth it’s easy to change the setting on just a single device.

settings change for a single device on the Golioth console

In the device view of the Golioth web console there is a settings tab. Here you can see the key, the value, and the level of the value. I have already changed the device-specific value in this screen so the level is being reported as “Device”.

[00:07:30.466,000] <inf> golioth_hello: Sending hello! 45
[00:07:40.468,000] <inf> golioth_hello: Sending hello! 46
[00:07:43.728,000] <inf> golioth: Payload
                                  a2 67 76 65 72 73 69 6f  6e 1a 62 e9 9c 17 68 73 |.gversio n.b...hs
                                  65 74 74 69 6e 67 73 a1  6c 4c 4f 4f 50 5f 44 45 |ettings. lLOOP_DE
                                  4c 41 59 5f 53 fb 40 00  00 00 00 00 00 00       |LAY_S.@. ......  
[00:07:43.729,000] <dbg> golioth_hello: on_setting: Received setting: key = LOOP_DELAY_S, type = 2
[00:07:43.729,000] <inf> golioth_hello: Set loop delay to 2 seconds
[00:07:50.469,000] <inf> golioth_hello: Sending hello! 47
[00:07:52.471,000] <inf> golioth_hello: Sending hello! 48

When I made the change. the device was immediately notified and you can see from the timestamps that it began logging at a two-second cadence as expected.

Golioth settings applied at the blueprint level

It is also possible to change settings for a group of devices that share a common blueprint. Here you will find this setting by selecting Blueprint from the left sidebar and choosing your desired blueprint.

Settings are applied based on specificity. The device-level is the most specific and will be applied first, followed by blueprint-level, and finally project-level. Blueprints may be created and applied at any time, so if you later realize you need a more specific group you can change the blueprint for those devices.

Implementation: The two parts that make up the settings service

Fundamentally, there are two parts that make our device settings system work: the Golioth cloud services running on our servers and your firmware that is running on the devices. The Golioth device SDKs allow you to register a callback function that receives settings values each time a change is made to the settings on the cloud. You choose how the device should react to these settings, like updating a delay value, enabling/disabling features, changing log output levels, really anything you want to do.

Don’t worry if you already have devices in the field. You can add the settings service or make changes to how your device handles those settings, then use the Golioth OTA firmware update system to push out the new behavior.

Take control of your fleet

Scale is the hope for most IoT companies, but it’s also where the pain of IoT happens. You need to know you can control your devices, securely communicate with them, and perform updates as necessary. Golioth has you covered in all of these areas. The new settings service ensures that your ability to change how your fleet is performing doesn’t become outpaced by your growth.

NXP RT1060-EVKB Ethernet development board

I get to work with a lot of different hardware here at Golioth, but recently I’ve found my self gravitating to the NXP i.MX RT1060-EVKB evaluation board as my go-to development hardware. This brand new board is a variation on the RT1060-EVK, which (in addition to the extra ‘B’ in the new part number) adds an m.2 slot. But what catches my eye is the Ethernet. While I’ve previously written about using SPI-connected Ethernet modules, this board builds the connectivity right in. It’s stable, it’s fast, and has wonderful support in Zephyr.

The microcontroller that controls everything is the RT1062 which I first heard about three years back when the Teensy 4.0 was released. A 600 MHz powerhouse, it boasts 1024 kB of SRAM and is packed with peripherals. It’s an embarrassment of riches for my day-to-day firmware work, but makes sure I’ll never hit my head on the proverbial “development ceiling”.

Of course it’s not just the system clock that I’m happy about. This board has Ethernet built-in, so I’m not waiting for a WiFi connection (or really waiting for a cell connection) at boot. And it’s programmed via J-Link for a fast flash process and the RTT and debugging features that this programmer brings to the party.

Let’s take a look at how I get this board talking to Golioth, and what we might see in the near future.

An i.MX RT1060-EVKB demo

NXP RT1060-evkb webinar demo

Demonstrating Golioth LightDB Stream for temperature/pressure/humidity sensor data

A couple of weeks ago I did a demo of the i.MX RT1060-EVKB board as part of NXP’s Zephyr webinar series. It gives you a great overview of the board and the ecosystem. I was even able to do a live demo of all the Golioth features like data management, command/control, remote logging, and OTA firmware updates. (Requires a free NXP registration to see the video.)

This board already has great support with the Golioth platform. My hope is to go further than that and add this to our list of Continuously Verified Boards. This would mean that we test and verify all official Golioth code samples with this board during every Golioth release. That’s a tall hill to climb, but it would be a snap if we used automated testing through a process called Hardware in the Loop; CI/CD that automatically runs compiled code on the hardware itself. This is already the case for our recently announced Golioth ESP-IDF SDK. Nick Miller is working on a post about how he pulled that off, so keep your eyes on the Golioth Blog over the next couple of weeks!

Get your tools ready

To build for this board you need a few tools, like the HAL library for NXP, LVGL, and the board files themselves.

Adding the NXP HAL and the LVGL library

Zephyr needs a hardware abstraction layer for NXP builds, and this board also depends on having the LVGL library installed. Add these to your west manifest:

  • use the west manifest --path to help you locate which file to edit (modules/lib/golioth/west-zephyr.yml in my case)
  • Add hal_nxp and lvgl to the name-allow list
  • Run west update

With the edits in place, here’s what the relevant section of my west manifest looks like:

manifest:
  projects:
    - name: zephyr
      revision: v3.1.0
      url: https://github.com/zephyrproject-rtos/zephyr
      west-commands: scripts/west-commands.yml
      import:
        name-allowlist:
          - cmsis
          - hal_espressif
          - hal_nordic
          - mbedtls
          - net-tools
          - segger
          - tinycrypt
          - hal_nxp
          - lvgl

Cherry Pick the EVKB files (if needed)

This is a new board and the devicetree files for it were just added to Zephyr on Jun 5th. My Zephyr is still on v3.1.0 so it’s older than the commit that added support for the “evkb” board. I used git’s cherry-pick feature to add those files to my workspace. From the golioth-zephyr-workspace/zephyr directory run:

git pull
git cherry-pick 9a9aeae00b184cac308c1622676bb91c07dd465b

This will pull in just the commit that adds support for the new board.

Thar ‘b dragons

Here’s the thing about making manual changes to the Zephyr repo in your workspace: the west manifest doesn’t know you’ve done this. That means the next time you run west update, this cherry-picked commit will be lost.

Prep the hardware

Last, but not least, I used a J-Link programmer to flash firmware to this board. There’s just a bit of jumper setup necessary to make this all work. NXP has a guide to setting up an external J-Link programmer. I followed that page, using a USB cable on J1 for power (and a serial terminal connection when running apps).

Saying Hello

With that preamble behind me, I’m excited to get to the “Hello” part of things using the standard Golioth Hello sample. Running Golioth samples is fairly easy with this board, with just one puzzle piece that needs to be added for DHCP to acquire an IP address.

Use DHCP to get an IP address

I added a boards/mimxrt1060_evkb.conf file to create a configuration specific to this board. In this case, it selects the Ethernet and DHCP libraries.

CONFIG_NET_L2_ETHERNET=y
CONFIG_NET_DHCPV4=y

At the top of main.c I include the network interface library. Then in the main() function, we need to instantiate an interface and pass it to the DHCP function. This code requests an IP address from the network’s DHCP server.

#include <net/net_if.h>
struct net_if *iface;
LOG_INF("Run dhcpv4 client");
iface = net_if_get_default();
net_dhcpv4_start(iface);

client->on_message = golioth_on_message;
golioth_system_client_start();

Add credentials, then build and flash

The final step is to add your Golioth credentials before building and flashing the app. Add the credentials to the prj.conf file as outlined in the hello sample README. You can get your credentials by creating a new device on the Golioth Console.

CONFIG_GOLIOTH_SYSTEM_CLIENT_PSK_ID="my-psk-id"
CONFIG_GOLIOTH_SYSTEM_CLIENT_PSK="my-psk"

The build calls out the board name to specifically target our board and the newly added configuration. The flash command will automatically find the J-Link programmer and use it.

west build -b mimxrt1060_evkb samples/hello
west flash

In my testing it only took about three seconds to get an IP address. From there, I was up and connected to Golioth after just another two seconds!

Golioth Hello serial terminal output

Golioth Hello example output

Off to the races

There really isn’t much to getting this up and running. Once I worked through this process I was able to test out all of the Golioth samples, including using the terminal shell to provision the board (ie: adding the Golioth credentials) and performing Over-the-Air (OTA) firmware updates.

As I mentioned earlier, the speed and reliability of this dev board keeps drawing me back. It’s not just the initial connection, but when testing out OTA, the 1024 byte block downloads seem to fly right by. I’m unsure if this is the Ethernet or the chip’s ability to quickly write to flash, but if you’re working with a 250 kB firmware file the difference over, say an ESP32, is obvious.

Give it a try using Golioth’s Dev Tier which includes up to 50 devices for your test fleet. The Golioth forum is a great place for questions on the process, and we’d love to hear about what you’re building over on the Golioth Discord server.

Keep your eye on this space. When Zephyr issues its next release we’ll roll forward the Golioth SDK to include the board files for the RT1060-EVKB, and my hope is that we’ll be able to include the DHCP call into our common samples code for a truly out-of-the-box build experience. We might even see a bit of that Hardware-in-the-Loop build automation I hinted at earlier. See you next time!