Luis Ubieda is the Lead Firmware Engineer at Croxel. He has a background in Electrical Engineering and is passionate about Electronics, Embedded Systems, and IoT Technology.

Embedded systems are riddled with complexity, mainly because they sit at the intersection of several areas of expertise. Problems are often a mixture of software, electrical, and mechanical issues. This is the case even for seemingly simple tasks, such as reliably detecting a button press.

“Wait — a button-press??”

Let’s think for a second about how a button press is detected:

  1. Initial State: a button typically consists of a “normally-open” switch which, through a pull-up or pull-down resistor, is normally “high” or normally “low”: this is the initial state.
  2. Press Event: when the button is pressed, the switch closes and the signal transitions to the opposite state (e.g., low for a normally “high” line), where it stays for as long as the user holds the button.
  3. Release Event: Finally, when the button is released, it goes back to its initial state.

Image: Signal during a button press/release sequence, with the transitions outlined; the scoped signal shows the noise. Source: GeeksforGeeks.

In an ideal world, we could just watch the signal edges to track the transitions, treating a falling edge as the “press event” and a rising edge as the “release event”. In our world, these transitions are affected by electrical transients caused by the mechanical properties of the button actuator: the contacts bounce briefly before settling. Suppressing this noise is commonly called “debouncing”.
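To make the problem concrete, the sketch below counts falling edges in a hypothetical sampled trace of a single press-and-release with contact bounce on both transitions. The trace values are made up for illustration and assume an active-low button with a pull-up; `count_falling_edges` and `bouncy_trace` are illustrative names. A naive edge-based detector reports four “presses” where the user made one.

```c
#include <stddef.h>

/* Count falling edges (1 -> 0 transitions) in a sampled signal.
 * With an active-low button, a naive detector would report every
 * falling edge as a separate "press event". */
size_t count_falling_edges(const int *samples, size_t n)
{
    size_t edges = 0;
    for (size_t i = 1; i < n; i++) {
        if (samples[i - 1] == 1 && samples[i] == 0) {
            edges++;
        }
    }
    return edges;
}

/* Hypothetical trace of ONE physical press-and-release, sampled fast
 * enough to capture contact bounce on both transitions. */
const int bouncy_trace[] = {1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0,
                            1, 0, 1, 1, 1, 1};
const size_t bouncy_trace_len = sizeof(bouncy_trace) / sizeof(bouncy_trace[0]);
```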

“Ok, I get it. How can we `debounce` button-presses?”

There are two main ways we can approach this: the hardware-way and the software-way. Today I’ll touch on both, and detail the technique I prefer to use when debouncing button input with the Zephyr RTOS.

The Hardware Way

The hardware-way focuses on the root cause: the electrical noise. It keeps that noise out of the digital signal during transitions, typically by using low-pass filters (most commonly RC filters). There are some pretty cool articles that detail this approach (see references at the end of the article).
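As a rough sizing aid, here is the delay an RC filter adds before the input registers a press. This is a sketch under stated assumptions: a single RC stage whose capacitor discharges from V_CC through R when the button closes (real debounce circuits often add a second resistor so the charge and discharge paths differ), and a logic-low input threshold V_IL of 0.3 V_CC, a common CMOS figure that you should check against your MCU datasheet. The values R = 10 kΩ and C = 1 µF are illustrative only.

```latex
\begin{aligned}
V_C(t) &= V_{CC}\, e^{-t/RC} \\
t_{\mathrm{low}} &= RC \,\ln\!\left(\frac{V_{CC}}{V_{IL}}\right)
  = RC \,\ln\!\left(\frac{1}{0.3}\right) \approx 1.20\,RC \\
RC &= (10\,\mathrm{k\Omega})(1\,\mu\mathrm{F}) = 10\,\mathrm{ms}
  \;\Rightarrow\; t_{\mathrm{low}} \approx 12\,\mathrm{ms}
\end{aligned}
```

The idea is to pick RC large enough that the bounces cannot swing the filtered voltage across the input threshold, while keeping the added latency acceptable for your application.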

The Software Way

On the other hand, the software-way is about “ignoring” the false positives during signal transitions, to determine which transitions are the real events we’re looking for and which ones aren’t. Even though there are many ways to implement software debouncing, there are two main approaches, depending on whether the variable of interest is the signal state or its transitions: periodic sampling and tracking edge interrupts.

A. Periodic Button State Sampling

Button sampling works by periodically acquiring the signal state and buffering it in a continuously rolling sample set. A change event (pressed or released) is detected by looking for consecutive matching states. The rule is simple: if X consecutive samples show a changed state, we assume the transition really happened. This periodic sampling is often on the order of 10 to 25 ms and is commonly paced by a hardware timer, to guarantee fixed intervals and to reduce CPU usage.

Image: a series of 1 and 0 samples used to detect a button press.
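A minimal sketch of the consecutive-sample rule in plain C, assuming the caller invokes `debouncer_sample()` once per sampling period (for example from a timer) with the raw pin level. The struct and function names are hypothetical. Instead of a literal rolling buffer, a counter tracks how many consecutive samples disagree with the last accepted state, which is equivalent for this rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debouncer state for the periodic-sampling approach. */
struct sample_debouncer {
    bool stable;        /* last accepted (debounced) level */
    uint8_t count;      /* consecutive samples that differ from 'stable' */
    uint8_t threshold;  /* X: consecutive changed samples needed to accept */
};

void debouncer_init(struct sample_debouncer *d, bool initial, uint8_t threshold)
{
    d->stable = initial;
    d->count = 0;
    d->threshold = threshold;
}

/* Call once per sampling period with the raw pin level.
 * Returns true when a debounced transition is accepted; the new
 * level is then available in d->stable. */
bool debouncer_sample(struct sample_debouncer *d, bool raw)
{
    if (raw == d->stable) {
        d->count = 0;           /* glitch ended: restart the count */
        return false;
    }
    if (++d->count >= d->threshold) {
        d->stable = raw;        /* X consecutive changed samples: accept */
        d->count = 0;
        return true;
    }
    return false;
}
```

The effective debounce window is roughly X times the sampling period, so both values together set how long a change must persist before it is reported.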

B. Tracking Edge-Interrupts with Minimum Cooldown

This approach combines edge detection with the spacing between edges: the signal must remain stable for a minimum duration before a transition is considered legitimate. The cooldown phase is commonly implemented with a timer that is started by the edge interrupts: the timer is restarted on each transition, and only when it expires (after 10 to 25 ms without edge changes) does the firmware handle the transition as an event.

Image: the ignored edges of a button-press signal.
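The cooldown logic can be sketched without any RTOS, assuming millisecond timestamps from some monotonic clock and a hypothetical `cooldown_poll()` called periodically. Each edge pushes the deadline out, which mirrors what `k_work_reschedule()` does in the Zephyr example later in this article. All names here are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

struct cooldown_debouncer {
    uint32_t cooldown_ms;   /* minimum quiet time after the last edge */
    uint32_t deadline_ms;   /* when the (re)started timer would expire */
    bool armed;             /* a cooldown is pending */
};

/* Edge interrupt: (re)start the cooldown timer. */
void cooldown_on_edge(struct cooldown_debouncer *c, uint32_t now_ms)
{
    c->deadline_ms = now_ms + c->cooldown_ms;
    c->armed = true;
}

/* Returns true exactly once when the cooldown expires with no further
 * edges; the caller then reads the pin level and reports "pressed" or
 * "released". The signed difference keeps the comparison correct when
 * the 32-bit millisecond counter wraps around. */
bool cooldown_poll(struct cooldown_debouncer *c, uint32_t now_ms)
{
    if (c->armed && (int32_t)(now_ms - c->deadline_ms) >= 0) {
        c->armed = false;
        return true;
    }
    return false;
}
```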

Software Approach B: Tracking Edge-Interrupts with Minimum Cooldown

In multi-threaded systems, we can leverage RTOS primitives to make the code more modular while simplifying the logic needed to achieve the same purpose: for example, a thread plus semaphores, or a work queue as in the code below, can control the state transitions and decide when to notify the module’s user that an event occurred.

Example Code – Zephyr RTOS

The following code shows one way to implement software approach B on Zephyr, with the following observations:

  • We’re using Zephyr GPIO interrupt APIs to keep track of the edge-changes.
  • We’re using a delayable work item on the system workqueue as the cooldown mechanism for false positives.
  • Our button-detection module abstracts both of these details and only notifies the user of the relevant events: pressed or released.
  • Note: the user callback runs in the context of the workqueue handler, so keep actions in this context brief to allow other parts of the system to function properly.
/* button.h */
#ifndef _BUTTON_H_
#define _BUTTON_H_
 
enum button_evt {
   BUTTON_EVT_PRESSED,
   BUTTON_EVT_RELEASED
};
 
typedef void (*button_event_handler_t)(enum button_evt evt);
 
int button_init(button_event_handler_t handler);
 
#endif /* _BUTTON_H_ */

/* button.c */
#include <zephyr/kernel.h>
#include <zephyr/drivers/gpio.h>
#include "button.h"
 
#define SW0_NODE    DT_ALIAS(sw0)
 
static const struct gpio_dt_spec button = GPIO_DT_SPEC_GET(SW0_NODE, gpios);
static struct gpio_callback button_cb_data;
 
static button_event_handler_t user_cb;
 
static void cooldown_expired(struct k_work *work)
{
   ARG_UNUSED(work);
 
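   /* Logical level: 1 = active (pressed), honoring GPIO_ACTIVE_LOW in devicetree */
   int val = gpio_pin_get_dt(&button);
   if (val < 0) {
       return; /* read error: don't report a bogus event */
   }
   enum button_evt evt = val ? BUTTON_EVT_PRESSED : BUTTON_EVT_RELEASED;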
   if (user_cb) {
       user_cb(evt);
   }
}
 
static K_WORK_DELAYABLE_DEFINE(cooldown_work, cooldown_expired);
 
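/* GPIO callback: runs in interrupt context on every edge (press and release).
 * Each edge restarts the 15 ms cooldown, so cooldown_expired() only runs
 * once the signal has been quiet for that long. */
static void button_pressed(const struct device *dev, struct gpio_callback *cb,
           uint32_t pins)
{
   ARG_UNUSED(dev);
   ARG_UNUSED(cb);
   ARG_UNUSED(pins);
 
   k_work_reschedule(&cooldown_work, K_MSEC(15));
}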
 
int button_init(button_event_handler_t handler)
{
   int err = -1;
 
   if (!handler) {
       return -EINVAL;
   }
 
   user_cb = handler;
 
   if (!device_is_ready(button.port)) {
       return -EIO;
   }
 
   err = gpio_pin_configure_dt(&button, GPIO_INPUT);
   if (err) {
       return err;
   }
 
   err = gpio_pin_interrupt_configure_dt(&button, GPIO_INT_EDGE_BOTH);
   if (err) {
       return err;
   }
 
   gpio_init_callback(&button_cb_data, button_pressed, BIT(button.pin));
   err = gpio_add_callback(button.port, &button_cb_data);
   if (err) {
       return err;
   }
 
   return 0;
}

Check out the working sample code at https://github.com/ubieda/zephyr_button_debouncing

Conclusion

The most important part of debouncing inputs is understanding how far you should go and which approach best suits you. Cost sensitivity tends to favor software debouncing, whereas minimizing CPU usage favors offloading the work to hardware. Like any engineering problem, there are 1000 ways to solve it: always favor the simplest (yet effective) solution that works for you.

References

The Internet of Things (IoT) can make existing infrastructure more useful and easier to operate, with the added benefit that you don’t need to be on-site to make changes. This is the case with Golioth’s latest reference design: a greenhouse controller that adjusts ventilation and grow lighting based on sensor readings. It also provides manual control from the cloud.

Whether it’s too hot or too cold, tightly monitoring and regulating greenhouse temperature has a huge effect on crop yield and growing time. The same can be said for lighting conditions. At this time of year (winter), consider the poinsettia: it requires intense light during the day and at least 12 hours of total darkness overnight in order to turn a vibrant shade of red. Sounds like a great job for an automated controller.

But think beyond one type of plant and one time of year. The agriculture industry uses automated control to implement different growing conditions based on the cultivar. A cloud-connected controller makes it much easier to update (and keep track of) the growth profiles.

The IoT Greenhouse Controller

An IoT Greenhouse Controller continues to show that simply connecting sensors to the internet is impactful. From one online dashboard you can see how light, temperature, pressure, and humidity are trending across all of your planthouses. For this reference design we added two mains-rated relays to bring control into the equation.

The cellular modem sends sensor data back to Golioth and monitors the cloud for updates to the target temperature and light intensity. A threshold setting for light level automatically controls when the grow lights are turned on or off. The same is true for a temperature threshold that is monitored for control of the ventilation system. Of course, both of these relays can also be controlled manually.

Let’s take a look at the hardware involved in this reference design.

Hardware

We’re favoring off-the-shelf hardware for ease of implementation. Most of the modules that we use are simple breakout boards that aren’t much more than a sensor or two, some power handling, and interconnects like cabling. The idea is that someone could take this setup and choose which sensors they want to put onto their custom hardware design that will go out in the field.


Golioth Greenhouse Controller reference design internals.
Left to right: light sensor, weather sensor, relays

The full parts list is on our Golioth Projects page, but the key components are the Nordic Semiconductor nRF9160 cellular system-in-package (SiP), a BME280 weather sensor, an APDS9960 light intensity sensor, and a set of relays.

The nRF9160 was chosen because it is one of our best supported parts on Golioth. A cellular modem may seem like an odd choice for infrastructure-based controllers, but in combination with the lithium battery you will still be able to monitor greenhouse conditions during a power outage. There is no better peace of mind than being able to answer the question: when the power was out, how cold did my plants get and for how long?

As with many Golioth designs, our wide-ranging SDK support means you can retarget the same control code to different hardware in the future. If you want to take the reference design and target a Wi-Fi, Ethernet, or Thread solution, it’s a matter of configuring a couple of files differently, and you have similar functionality over a whole new connectivity medium.

Greenhouse Controller block diagram

Firmware

The firmware for the Greenhouse Controller reference design uses the Golioth Settings Service. This is ideal because it facilitates control of a large fleet of these devices, allowing settings to be adjusted for all of them at once, in groups, and of course down to individual units.

Golioth Settings Service for the Greenhouse Controller reference design

Here you can see the loop delay, which indicates how long the device should sleep between sensor readings (in seconds). The light and temperature thresholds control the on/off points of the relays, and the auto settings indicate whether the relays should be switched automatically based on those thresholds. The controller monitors Golioth’s LightDB State for manual control commands, which do not interfere when the automated option is enabled.
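The per-loop relay decision can be sketched as a small pure function. This is a hypothetical illustration, not the reference design’s firmware or the Golioth settings API: the struct and field names are made up, and it assumes grow lights switch on below the light threshold, ventilation switches on above the temperature threshold, and manual commands are ignored while the matching auto setting is enabled (one reading of the paragraph above).

```c
#include <stdbool.h>

/* Hypothetical mirror of the settings described above. */
struct greenhouse_settings {
    int loop_delay_s;        /* sleep between sensor readings */
    float light_threshold;   /* grow lights on below this light level */
    float temp_threshold_c;  /* ventilation on above this temperature */
    bool light_auto;         /* lights follow the threshold */
    bool temp_auto;          /* ventilation follows the threshold */
};

struct relay_state {
    bool grow_lights;
    bool ventilation;
};

/* Decide relay outputs for one loop iteration: in auto mode the
 * thresholds decide; otherwise the manual command is applied. */
struct relay_state greenhouse_decide(const struct greenhouse_settings *s,
                                     float light, float temp_c,
                                     struct relay_state manual)
{
    struct relay_state out;
    out.grow_lights = s->light_auto ? (light < s->light_threshold)
                                    : manual.grow_lights;
    out.ventilation = s->temp_auto ? (temp_c > s->temp_threshold_c)
                                   : manual.ventilation;
    return out;
}
```

A production controller might also add hysteresis around each threshold so a reading hovering near the set point does not toggle the relays rapidly; the text above doesn’t say whether the reference design does this.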

All of the Golioth reference designs include Over-the-Air (OTA) firmware updates so changes to how the firmware works don’t require an on-site visit. While the current firmware doesn’t implement a schedule-based system, the concept is easy to add and install on the device using OTA.

Cloud Software / Dashboard

The Golioth Zephyr SDK takes care of the cloud connection for all of your devices. When writing firmware, just use the API to set/get/observe your data and Golioth handles the rest:

  • Sensor readings are stored as time-series data on LightDB Stream
  • Device settings are monitored in real-time, with the device reacting to your changes as soon as you make them.
  • The Golioth Console tracks the latest device state, including device health
  • Current firmware version is monitored, with the ability to roll out new OTA updates, and one-click rollback if you need it
  • Golioth’s convenient REST API delivers easy access to the data for visualization or export to any of your favorite cloud server platforms.

We love using Grafana dashboards to visualize IoT data. The dashboard talks to the Golioth REST API to monitor the IoT sensors and the state of the lighting/ventilation. Of course, you could use WebSockets to get live updates as the data arrives at the Golioth servers. For this application, sensor readings are likely recorded every few minutes, so a dashboard that reloads on its own works well.

Golioth Greenhouse Controller Grafana dashboard

More Golioth Reference Designs

Our reference designs are meant to get you through the initial steps of proving out your IoT-based business. You can buy the readily-available parts used for this Greenhouse controller and with our reference design resources you’ll be on your way to a proof of concept in days instead of weeks. This means you’re fleshing out features and heading toward a hardware prototype with actual performance data. Golioth is designed to scale, so the same connections and features that you use for your first prototype remain in place, with a platform that can handle a number of devices beyond your wildest imagination.

We are busy building out more reference material for you to take and customize for your business needs. We recently launched an Industries section of the Golioth website, which lays out some of the other areas we are targeting and Reference Designs we are building. If any of them interest you, click the “Schedule Demo” button for the one that best matches your needs. You can also drop a note on our Forum or on our Discord if you have ideas of other IoT prototypes you need help with or would like to see us build.

 

As members of and contributors to the Zephyr Project, we keep an eye on new developments. One recently published feature is of particular interest because it represents a new way to structure programs and the messaging between different parts of your program.

ZBus (Zephyr Bus) is a recently merged feature of the Zephyr Project that brings a standardized version of event-driven architecture, in the form of a publish-and-subscribe model, inside your program. The lead author, Rodrigo Peixoto, spoke with Golioth about the details of this new feature and how it might help Golioth users build more responsive, flexible programs. In the associated video, Rodrigo walks through the history and capabilities of ZBus.

Why should you consider an event-driven bus architecture?

The decision to take on a new system architecture is not something to do willy-nilly. It’s important to understand where an event-driven architecture is a good fit.

The first consideration that comes to mind is scalability. With a bus architecture, adding an additional “listener” takes much less work.

Consider the alternative to event-driven architecture. When you want to add a new action (say, some code that initiates a sensor reading), you need to call that new code from your trigger event (say, a timer expiring). That means you need to know where the trigger is located in the code and make changes there to add the call to your new task. Once you add in the required testing, each additional feature can become burdensome. This scales poorly.

With an event-driven system, the trigger is already set up to publish an event, and new tasks can be added that listen for it. You don’t need to change any trigger code; you don’t even need to know where that code lives. This approach holds up well as the amount of data and the number of tasks grow; how much that matters ultimately depends on how large you expect your system to become.

Flexibility is another consideration. An event-driven bus allows the system to be easily adapted to handle new events or changes in the environment. This means the system can be updated and modified without rewriting large swaths of code. Another type of “flexibility” is how and where you can reuse your code. Together, these make it easier to develop and test your system, as well as to troubleshoot and fix any problems that may arise.

Finally, if your device has critical timing requirements, an event-driven system can not only handle higher levels of complexity but also respond quickly to new inputs, such as external events. For example, an embedded system controlling a robot could use an event-driven architecture to respond to sensor data from the robot’s environment and control its movements accordingly.

How ZBus implements an event-driven architecture

In ZBus, there are “producers” that generate messages and “consumers” that act upon them. There are also “filters” that help process raw data (such as a sensor output). These are organized into different “channels”, so each part of the program can listen to the particular lane of data it cares about.

Source: Rodrigo’s ZBus presentation
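To make the producer, filter, consumer, and channel roles concrete, here is a minimal publish/subscribe sketch in plain C. It is deliberately NOT the ZBus API (see the Zephyr documentation for that); every name is hypothetical, and the filter assumes a made-up sensor that reports 10 mV per degree Celsius. The point is that adding a consumer means one more `channel_subscribe()` call, while the producer’s `channel_publish()` code stays untouched.

```c
#include <stddef.h>

#define MAX_LISTENERS 4

typedef int (*filter_fn)(int raw);       /* turns raw data into a value */
typedef void (*listener_fn)(int value);  /* consumer callback */

/* A minimal channel: an optional filter plus registered consumers. */
struct channel {
    filter_fn filter;                    /* may be NULL */
    listener_fn listeners[MAX_LISTENERS];
    size_t n_listeners;
};

/* Consumers register themselves; the producer never needs to change. */
int channel_subscribe(struct channel *ch, listener_fn fn)
{
    if (ch->n_listeners >= MAX_LISTENERS) {
        return -1;
    }
    ch->listeners[ch->n_listeners++] = fn;
    return 0;
}

/* Producer side: publish once, and every registered consumer is notified. */
void channel_publish(const struct channel *ch, int raw)
{
    int value = ch->filter ? ch->filter(raw) : raw;
    for (size_t i = 0; i < ch->n_listeners; i++) {
        ch->listeners[i](value);
    }
}

/* Hypothetical filter and consumers used for illustration. */
static int last_logged, last_displayed;
int millivolts_to_celsius(int mv) { return mv / 10; }
void log_consumer(int value) { last_logged = value; }
void display_consumer(int value) { last_displayed = value; }
```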

These pieces map onto common scenarios, such as waiting for a timer, taking a reading from a sensor, and then alerting other parts of the program that the data is now available. A common scenario is shown below:

How will you use ZBus?

Microcontrollers are used in increasingly complex scenarios and are being asked to do more and more. Connecting a low-power device to the internet often requires a level of complexity that Zephyr helps manage. We expect to see more devices using ecosystems and RTOSes like Zephyr in the future, and more high-complexity devices implementing ZBus.

Are you looking at using an event-driven architecture in your system? Let us know on our forum and tell us how we can help!

The most sought-after Golioth feature is OTA, also known as Over-the-Air firmware updates. When you put an IoT device into the field, it’s crucial that you be able to push firmware updates to it without human intervention. Golioth simplifies the process for your ESP-IDF projects.

Today we’re walking through the OTA process:

  • Build and flash the initial firmware to the device
  • Provision the device with credentials that will be persistent across firmware updates
  • Build a new revision of the firmware
  • Upload the firmware to Golioth and roll it out as a release
  • Observe the device detecting, downloading, and running the new firmware

Prerequisites:

Please ensure that you have installed ESP-IDF v4.4.2 on your computer. Today’s article uses an ESP32, but this will also work with other variants like the ESP32-S2, ESP32-C3, etc.

Clone a copy of the Golioth Firmware SDK (which includes ESP-IDF support). To do so, follow the “Cloning this repo” section in the README.

Commands in this guide are based on a Linux operating system with the ESP-IDF and Golioth Firmware SDK installed in the home directory (~/). However, these are cross-platform tools and are easy to adapt to your system and your preferred install directories.

Build and flash the initial firmware

In a classic Chicken-or-Egg scenario, to perform a Golioth OTA update your device needs to be running firmware built for Golioth OTA. We can use the golioth_basics example which is ready to run without changes.

First, let’s make sure our ESP-IDF is set to the correct version and enabled for this session:

cd ~/esp-idf
git fetch
git checkout v4.4.2
git submodule update --init --recursive
./install.sh all
source export.sh

Now move to the ESP-IDF section of the Golioth Firmware SDK, specifically the golioth_basics example. We’ll build, flash, and run this code on the ESP32:

cd ~/golioth-firmware-sdk/examples/esp_idf/golioth_basics
idf.py build
idf.py flash
idf.py monitor

On some systems you will need to hold the boot button on the ESP32 in order to flash the code. I find that to get the monitor command to work, I need to first hold the boot button, then press reset when the screen says “waiting for download”. One last tip: CTRL-] exits the idf.py monitor screen.

Assign device credentials and connect

There are a number of ways to assign credentials to your device (including Bluetooth via your browser!), but perhaps the easiest is to type them into the shell. Head over to the Golioth Console and select your device’s credentials tab. (If you don’t have an account, sign up for the Dev Tier now; your first 50 devices are free.)

Use the shell window to set the credentials. Here you can see the process, with the four commands for Golioth and WiFi credentials highlighted:

Type 'help' to get the list of commands.
Use UP/DOWN arrows to navigate through command history.
Press TAB when typing command name to auto-complete.
esp32> W (2212) golioth_example: WiFi and golioth credentials are not set
W (2212) golioth_example: Use the shell settings commands to set them, then restart
esp32> 
esp32> settings set golioth/psk-id 20221122201152-esp32@developer-training
Setting golioth/psk-id saved
esp32> settings set golioth/psk 5bfb64ad29dce4e3dd30ab10c5b95a6a
Setting golioth/psk saved
esp32> settings set wifi/ssid MyWifiAp
Setting wifi/ssid saved
esp32> settings set wifi/psk MyWifiPassword
Setting wifi/psk saved
esp32> reset

The final command resets the device so that it will use the new credentials. You should see this output:

I (4235) esp_netif_handlers: sta ip: 192.168.1.159, mask: 255.255.255.0, gw: 192.168.1.1
I (4235) example_wifi: WiFi Connected. Got IP:192.168.1.159
I (4235) example_wifi: Connected to AP SSID: MyWifiAp
I (4255) golioth_mbox: Mbox created, bufsize: 2184, num_items: 20, item_size: 104
I (4255) golioth_basics: Waiting for connection to Golioth...
W (4295) wifi:<ba-add>idx:0 (ifx:0, c6:ff:d4:a8:fa:10), tid:0, ssn:1, winSize:64
I (4395) golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
I (4405) libcoap: Setting PSK key

I (4415) golioth_coap_client: Entering CoAP I/O loop
I (4805) golioth_basics: Golioth client connected
I (4805) golioth_basics: Hello, Golioth!
I (4815) golioth_coap_client: Golioth CoAP client connected
I (4815) golioth_fw_update: Current firmware version: 1.2.5
I (5735) golioth_fw_update: Waiting to receive OTA manifest
I (5835) golioth_basics: Synchronously got my_int = 42
I (5845) golioth_basics: Entering endless loop
I (5845) golioth_basics: Sending hello! 0
I (5935) golioth_fw_update: Received OTA manifest
I (5935) golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
I (5945) golioth_fw_update: Waiting to receive OTA manifest
I (6545) golioth_basics: Callback got my_int = 42
W (9805) golioth_coap_client: CoAP message retransmitted
W (10335) golioth_coap_client: 4.00 (req type: 3, path: .c/status), len 59
I (15845) golioth_basics: Sending hello! 1
I (21965) wifi:bcn_timout,ap_probe_send_start
I (25855) golioth_basics: Sending hello! 2
I (35005) wifi:bcn_timout,ap_probe_send_start
I (35855) golioth_basics: Sending hello! 3

First the ESP32 connects to WiFi, then to Golioth. After checking for (and not finding) an available firmware update, this example begins sending hello messages every 10 seconds.

Now let’s do an OTA firmware update

Now that the device is connected to Golioth, let’s build and upload a new firmware package to test the OTA capabilities. We’ll use the same code, but update the current version number, which the device uses to identify when an update is needed. We’ll also change the string used in the log messages so it’s easy to recognize that our new firmware is running.
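On the device, the decision boils down to comparing the current version string with the one in the OTA manifest. Here is a hypothetical sketch in C, not the Golioth SDK’s implementation, assuming plain MAJOR.MINOR.PATCH strings. The log line “Manifest does not contain different firmware version” suggests a difference (not strictly a newer version) is what triggers an update, which is also what makes rolling back to an older release possible; `version_cmp()` additionally reports the direction of the change.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse "MAJOR.MINOR.PATCH" into v[0..2]; returns false on malformed input. */
bool parse_version(const char *s, int v[3])
{
    return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) == 3;
}

/* Returns <0 if a is older than b, 0 if equal, >0 if a is newer. */
int version_cmp(const int a[3], const int b[3])
{
    for (int i = 0; i < 3; i++) {
        if (a[i] != b[i]) {
            return a[i] < b[i] ? -1 : 1;
        }
    }
    return 0;
}
```

With these helpers, an update would be needed whenever `version_cmp(target, current) != 0`.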

Change the source code version and rebuild

The file we need to update is a common file used by multiple Golioth SDKs. Edit the ~/golioth-firmware-sdk/examples/common/golioth_basics.c file:

#define TAG "golioth_basics_new"

// Current firmware version
static const char* _current_version = "1.2.6";

You can see I’ve appended “_new” to the tag name and incremented the version number to 1.2.6. Now we’re ready to rebuild… but remember, don’t flash this to your ESP32. We’re going to upload it to Golioth and perform a remote firmware update!

cd ~/golioth-firmware-sdk/examples/esp_idf/golioth_basics
idf.py build

The newly built binary is located in the build folder.

Upload firmware to Golioth and roll out a release

After much preamble we’ve arrived at the important moment.

To set the scene, the ESP32 that’s running on your desk is a remote IoT device taking sensor readings in a brick-and-mortar retail establishment in Waldorf, Maryland. We’ll push an update to it using a three-step process:

  1. Upload the binary as an “artifact”
  2. Create a “release” using the artifact
  3. Click the “rollout” button to make the release live

Go to the Golioth Console and select Firmware Update→Artifacts from the left sidebar. Click the “Create” button.

Golioth OTA create artifact

The only thing we’re going to change on this window is the “Artifact Version”. Type in the same version number you entered in the source code for this firmware (probably 1.2.6 if you’re following along). Click the upload icon in the middle of the window and choose the “golioth_basics.bin” file from the build directory where you ran the idf.py build command. Finally, click the Upload Artifact button.

You have the option here to use a Blueprint. I’m not detailing that today for brevity, but it’s a good practice to use Blueprints to organize your production devices.

Now let’s create a release based on the artifact. Click Firmware Update→Release from the left sidebar and click the Create button.

Golioth OTA Release

All we’re going to do here is choose the artifact we previously created in the Artifacts box and press Create Release.

You have some options here, most notably you can choose to start the rollout as soon as the release is created. I prefer to wait and roll it out as a separate confirmation step in case I made some mistake along the way.

Note that you have the option here of selecting device Blueprints and Tags to make this a more targeted release. This window is telling me the release will apply to 51 devices (!). That’s okay here because these are all test devices on a test project that we use for training.

Finally, let’s roll out the release to our devices:

Golioth OTA rollout

The Rollout button is all that stands between you and automatic updates. Click it and you will (almost) immediately see your device begin to download the new binary.

Here’s an awesome feature to keep in mind: when you have more than one release, you can use this button to roll back to previous versions. If you realize you released an update that has a bug, just toggle this button and all of your devices will automatically download the next-newest release that has rollout selected.

Watch your ESP32 update

If you read the source code for the golioth_basics example you will notice that it calls golioth_fw_update_init(client, _current_version);. That means the device has registered with the Golioth servers to receive updates when new firmware is available. Look in the terminal output and you will see the result:

I (175827) golioth_basics: Sending hello! 17
I (185007) golioth_fw_update: Received OTA manifest
I (185007) golioth_fw_update: Current version = 1.2.5, Target version = 1.2.6
I (185017) golioth_fw_update: State = Downloading
I (185317) golioth_fw_update: Image size = 1211744
I (185327) golioth_fw_update: Getting block index 0 (1/1184)
I (185827) golioth_basics: Sending hello! 18
W (187867) golioth_coap_client: CoAP message retransmitted
I (187947) fw_update_esp_idf: Writing to partition subtype 17 at offset 0x1a0000
I (187947) fw_update_esp_idf: Erasing flash
I (191627) golioth_fw_update: Getting block index 1 (2/1184)
I (191837) golioth_fw_update: Getting block index 2 (3/1184)
I (192037) golioth_fw_update: Getting block index 3 (4/1184)
I (192187) golioth_fw_update: Getting block index 4 (5/1184)
I (192447) golioth_fw_update: Getting block index 5 (6/1184)
I (192597) golioth_fw_update: Getting block index 6 (7/1184)

... snip ...

I (279837) golioth_fw_update: Getting block index 1181 (1182/1184) 
I (280107) golioth_fw_update: Getting block index 1182 (1183/1184) 
I (280317) golioth_fw_update: Getting block index 1183 (1184/1184) 
I (280457) golioth_fw_update: Total bytes written: 1211760 
I (280467) esp_image: segment 0: paddr=001a0020 vaddr=3f400020 size=29df0h (171504) map 
I (280527) esp_image: segment 1: paddr=001c9e18 vaddr=3ffbdb60 size=05868h ( 22632) 
I (280537) esp_image: segment 2: paddr=001cf688 vaddr=40080000 size=00990h ( 2448) 
I (280547) esp_image: segment 3: paddr=001d0020 vaddr=400d0020 size=d9a64h (891492) map 
I (280847) esp_image: segment 4: paddr=002a9a8c vaddr=40080990 size=1e2a0h (123552) 
I (280887) esp_image: segment 5: paddr=002c7d34 vaddr=50000000 size=00010h ( 16) 
I (280887) golioth_fw_update: State = Downloaded 
I (281127) golioth_fw_update: State = Updating 
I (281327) fw_update_esp_idf: Setting boot partition 
I (281337) esp_image: segment 0: paddr=001a0020 vaddr=3f400020 size=29df0h (171504) map 
I (281397) esp_image: segment 1: paddr=001c9e18 vaddr=3ffbdb60 size=05868h ( 22632) 
I (281417) esp_image: segment 2: paddr=001cf688 vaddr=40080000 size=00990h ( 2448) 
I (281417) esp_image: segment 3: paddr=001d0020 vaddr=400d0020 size=d9a64h (891492) map 
I (281717) esp_image: segment 4: paddr=002a9a8c vaddr=40080990 size=1e2a0h (123552) 
I (281757) esp_image: segment 5: paddr=002c7d34 vaddr=50000000 size=00010h ( 16) 
I (281827) golioth_fw_update: Rebooting into new image in 5 seconds 
I (282827) golioth_fw_update: Rebooting into new image in 4 seconds 
I (283827) golioth_fw_update: Rebooting into new image in 3 seconds 
I (284827) golioth_fw_update: Rebooting into new image in 2 seconds 
I (285827) golioth_fw_update: Rebooting into new image in 1 seconds

... snip ...

I (4267) esp_netif_handlers: sta ip: 192.168.1.159, mask: 255.255.255.0, gw: 192.168.1.1
I (4267) example_wifi: WiFi Connected. Got IP:192.168.1.159
I (4277) example_wifi: Connected to AP SSID: TheNewPeachRepublic
I (4287) golioth_mbox: Mbox created, bufsize: 2184, num_items: 20, item_size: 104
I (4287) golioth_basics_new: Waiting for connection to Golioth...
W (4297) wifi:<ba-add>idx:0 (ifx:0, c6:ff:d4:a8:fa:10), tid:0, ssn:1, winSize:64
I (4307) golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
I (4307) libcoap: Setting PSK key

I (4317) golioth_coap_client: Entering CoAP I/O loop
I (4637) golioth_basics_new: Golioth client connected
I (4647) golioth_coap_client: Golioth CoAP client connected
I (4657) golioth_basics_new: Hello, Golioth!
I (4657) golioth_fw_update: Current firmware version: 1.2.6
I (4657) golioth_fw_update: Waiting for golioth client to connect before cancelling rollback
I (4677) golioth_fw_update: Firmware updated successfully!
I (4727) golioth_fw_update: State = Idle
I (5937) golioth_basics_new: Synchronously got my_int = 42
I (5937) golioth_basics_new: Entering endless loop
I (5937) golioth_basics_new: Sending hello! 0

First, the device compares version numbers and then begins to download blocks of the new firmware. Once downloaded it will reboot, connect to Golioth, and verify that it is running the newest version. The log labels near the end of the output now show golioth_basics_new, confirming one of the changes we made to our source code.
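The block counter in the log can be checked with a little arithmetic. The 1024-byte block size below is an inference, not something the log states: for an image of 1211744 bytes to need exactly 1184 blocks, the block size must lie between 1211744/1184 ≈ 1023.4 and 1211744/1183 ≈ 1024.3 bytes, and 1024 is the only integer in that range. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed block size, inferred from the download log above. */
#define OTA_BLOCK_SIZE 1024u

/* Number of blocks needed for an image (ceiling division). */
uint32_t ota_block_count(uint32_t image_size)
{
    return (image_size + OTA_BLOCK_SIZE - 1) / OTA_BLOCK_SIZE;
}

/* Size of the final, possibly partial, block. */
uint32_t ota_last_block_size(uint32_t image_size)
{
    uint32_t rem = image_size % OTA_BLOCK_SIZE;
    return rem ? rem : OTA_BLOCK_SIZE;
}
```

For this image, that gives 1183 full blocks plus a final block of 352 bytes, matching the “(1184/1184)” counter in the log.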

Golioth OTA report firmware version

On the Golioth Console, the summary view for this device confirms the currently running firmware version is 1.2.6!

With Golioth, OTA is built into the SDK

Golioth has done the heavy lifting so that you don’t need to. Our SDK uses just a single API call to register your devices for firmware updates. Use the fleet management tools on the Golioth Cloud to provision your devices in groups and by hardware variants. These make it possible to target your test devices for the first round of updates, or push new features just to the devices on the fourth floor of your Des Moines plant.

These robust tools are crucial for successful, long-lasting IoT deployments, and they’re ready for you to start using right now. If you have any questions, we’d love to talk! Reach out to us on the Golioth Forum or get in touch with the DevRel team for a demo.

How the Golioth Developer Training paved the way

This is a guest post from Shrouk El-Attar, discussing her journey from the hardware space into the firmware space and how Golioth training has helped her understand building out IoT systems using Zephyr.

The Journey Started with Hardware

To me, hardware just makes sense! You have specific requirements, find a part that can fulfill them, read its datasheet, and then execute. Voila! Successful design achieved.

The worst thing about the process? I don’t know…maybe the Googling? Getting a calculation out by a factor of ten? The tedious BOM stock checking process? These things can be long and tiring, but still, the concepts make sense. Designing a PCB is easy to understand; you follow simple rules and let the copper tracks do the rest. Even things in the “RF Voodoo” territory make sense with well-developed RF design tools these days.

Not too long ago, I finally took the leap into the world of consulting as a self-proclaimed “Hardware Queen”. With that, I saw the terrifying rise in demand for the 2-in-1 engineering consultants: a hardware engineer who is also a firmware engineer.

The Firmware Roadblock

Firmware feels to me like hardware’s anarchist sister. I know her a little; we have some history. It might surprise you to know that she hasn’t always been this way with me.

Once upon a time, there was Arduino (I know, I know). I probably picked up my first Arduino in 2012. I’d heard about them long before then, but as an asylum seeker, I couldn’t afford to own one. That is until the UK officially recognised me as a refugee, and I could finally access higher education and buy my own shiny Arduino.

The first time I got to Blinky on my Arduino, it was like something clicked. I understood it. I understood every single bit of code I was writing on it. “I have a gift,” I thought. Perhaps learning Visual Basic (yup) back in 2007 laid down those basics for me to get to Blinky on Arduino and be a total boss at it.

From then on, I felt invincible. Whatever I could think of, I was able to make it. Self-watering plant? Check. NFC easter egg hunt? Check. Twitter-powered bubble machine? Check. Arduino felt like hardware and software meeting in perfect harmony in a world where anything I could think of was possible.

Shrouk’s Arduino Connected Plant: Light data is sent to a cloud platform (left), phone notifications are sent when light data is critical (middle), and the plant “tweets” when light data falls below a certain level (right).

My initial success was a sign I was going to be a firmware engineer. But soon after, my firmware dreams were shattered.

Stuck in the Firmware Valley

My first full-time engineering role was at Intel back in 2015, where I specialized in the then-brand-new field called IoT. “Can you imagine that by 2020 there will be 50 billion connected devices?” I thought to myself. I was so excited that I would be one of the people developing those devices. It was the future.

One of the first things I couldn’t wait to get my hands on was the all-shiny Intel Realsense 3D Camera. But when I downloaded my SDK, I didn’t know where to start. After days of struggling on my own, I could get some things working with the help of my senior colleagues. But none of it clicked. I had no idea what I was doing. I was not gifted. Frankly, I sucked at firmware.

I started specializing in hardware. As the years passed, some of my all-time favorite chips became the nRF5 MCUs by Nordic Semiconductor. What a hardware engineer’s dream. Can’t get to a specific pin during routing? Don’t worry; choose literally any other pin you can reach more easily, and re-route there! The nRF5 had an excellent reputation amongst my firmware colleagues too.

In 2019, I decided to give firmware another go on my favorite chip. I wasn’t too far into the toolchain setup, and I’d already started regretting it. What the heck were Makefiles?! It was an excruciating process. But, no exaggeration, one week later I’d managed to set up the toolchain. Great, let’s get to Blinky! Save, build, flash. Error.

I spent hours, then finally figured out the error. Save, build, flash: another error. Repeat a zillion times. It turns out I had never set up the toolchain correctly in the first place. Months later, I finally found a training that would work, or so I thought. This turned out to be my biggest disappointment. Nothing in the training was up to date. Nothing worked the way it was supposed to. This was indeed the final time I’d try to touch firmware again. I was stuck. I was done.

The Golioth Way

Back in the present day, just as I was realizing the demand for 2-in-1 hardware and firmware engineering consultants, a well-timed post from Chris Gammell appeared in my feed. The post was about the Golioth Developer Training, targeted explicitly at hardware engineers. The 2-in-1 engineer could be me. Was I going to try firmware again? Absolutely.

The training took place entirely remotely with hardware engineers from all over the world. This is going to be it, for sure! But, from the very beginning, I was already struggling. “Here we go again”, I thought. I shouldn’t have thought I could do it. I struggled with minor details at first, like appending -d instead of -D to a build command, which apparently gives wildly different results. I copied and pasted the correct information but in the wrong fields. And each exercise took me way longer than the “estimated time” suggested for each section.

The training happened in a large group, but we did the exercises independently. Chris and Mike were checking in on each of us throughout, so they were able to help me fix anything I missed very quickly on a one-to-one basis. As the training went on, I started to notice that I needed their help less and less. What is this I feel? Some cautious optimism, perhaps?

The training method was active. I wasn’t sitting at my desk blindly following instructions. No, I was pushed to figure out the correct way to solve a problem at every stage. It was presented in bite-size information with many collapsed sections. The training also included links referencing concepts, e.g. Zephyr Pin Control, throughout, should I want to learn more. For the most part, I didn’t click those because I wanted to stay on track. My head was already full of too much firmware to learn. But I enjoyed that they were there so I could refer back to them in the future.

Then the happiest of accidents happened. I picked up a later stage of the training on my own, then realized that I had to redo the earlier parts to get to where I wanted to pick up from. Not going to lie; I was slightly annoyed. But while re-doing them, I saw that it wasn’t taking me very long at all. This time, I was well within the time estimates suggested for each exercise, if not much quicker. Was this working for me? Did I finally find a method to learn commercially viable firmware that works?

A light bulb went off when I realized that it had taken me over a week to get to Blinky in my previous firmware training, but within a couple of hours I was already speaking to my Wi-Fi device through the Golioth Console. Not only that, but it was all through the Zephyr RTOS with Golioth. The same Zephyr RTOS I struggled endlessly with on the nRF5 hardware that I love. It was a miracle to think about being enabled for firmware on my favorite hardware platform. Add the fact that Golioth is entirely scalable, meaning I can go from idea to production on the same platform, and Golioth was a dream.

Golioth also helped me develop a community, some of whom I became close Discord friends with. We all need a Discord buddy to complain to about daily life as an electronics engineering consultant (Hi Seth!). And a decade after I picked up my first Arduino, Golioth Developer Training helped me finally break through that commercially viable firmware ceiling.

Next, I’ll work on reference designs and put the skills I learnt from the training to the test. Watch this space.

Editor note: If you’re interested in taking part as an individual or as a company, sign up for future training here.

Golioth now supports Infineon parts via the ModusToolbox™. We added Golioth device management to the ecosystem a few weeks ago and the example code is available right now to run on Infineon’s line of microcontrollers talking to their Wi-Fi parts.

ModusToolbox™ (MTB) is a software support tool from Infineon Technologies. It includes partner SDKs alongside the company’s officially supported IDEs, drivers, and examples. You can pull in the Golioth example and all dependencies using the Eclipse IDE that is included in MTB, or via the command line tool.

Infineon’s PSoC™ 6 chips are feature-rich 32-bit Arm microcontrollers. Paired with the Infineon 4343W, they make a perfect platform for IoT device builders, and exactly the kind of constrained device that Golioth was built for.

Take advantage of Over-the-Air (OTA) firmware updates, time-series databases, state data management, remote logging, plus the command/control features like remote procedure call (RPC) and the device settings service. The API calls for each of these are demonstrated and well-commented in the golioth_main.c file.

Try the Golioth example using ModusToolbox™

PSoC 6 Wi-Fi BT Prototyping Kit (CY8CPROTO-062-4343W)

We run the Golioth example on the PSoC™ 6 Wi-Fi BT Prototyping Kit (CY8CPROTO-062-4343W). Here’s how to try it for yourself:

1. Install Infineon’s ModusToolbox™

Begin by downloading and installing ModusToolbox™ for your system. Then run modustoolbox-eclipse, which is located in the ide_3.0/eclipse/ subfolder.

2. Create a Golioth Example project

With the Eclipse IDE open, click on File→New→ModusToolbox Application. This will launch the project creator window.

Select the CY8CPROTO-062-4343W from the list of PSoC™ 6 boards and then click next.

Choose the Golioth Example from the Wi-Fi list and click on the Create button. This will take a couple of minutes to clone the Golioth code and all dependencies.

3. Compile and Install MCUboot

Golioth uses MCUboot as the secure bootloader for our Over-the-Air updates. Before flashing the app to the board, we need to compile and install MCUboot. I did this using the IDE’s built-in terminal.

First, we need to install the MCUboot dependencies.

cd ~/mtw/mtb_shared/mcuboot/v1.8.1-cypress/scripts/
python -m pip install -r requirements.txt

Now we can compile and flash MCUboot. Remember to plug a USB cable into the KITPROG3 connector on your PSoC™ 6 devboard before running the program command:

cd ~/mtw/Golioth_Example/bootloader_cm0p/
make build_proj -j8
make program_proj

4. Compile and flash the Golioth App

Before compiling the Golioth App we need to give it credentials to connect to Wi-Fi and also to authenticate with the Golioth server. These are set in the ~/mtw/Golioth_Example/golioth_app/source/golioth_main.h file.

Use the Wi-Fi credentials for your local access point. Get device credentials from the Golioth Console. If you haven’t yet set up an account, check out our Quickstart. (With Golioth’s Dev Tier your first 50 devices are free.)

Once you’ve saved your changes to the golioth_main.h file, use the terminal to compile and flash the app to your PSoC™ 6 board:

cd ~/mtw/Golioth_Example/golioth_app
make build_proj -j8
make program_proj

Taking the Golioth App for a test drive

[INF] MCUBoot Bootloader Started
[INF] External Memory initialized w/ SFDP.
[INF] boot_swap_type_multi: Primary image: magic=unset, swap_type=0x1, copy_don3
[INF] boot_swap_type_multi: Secondary image: magic=unset, swap_type=0x1, copy_d3
[INF] Swap type: none
[INF] User Application validated successfully
[INF] Starting User Application (wait)...
[INF] Start slot Address: 0x10018400
[INF] MCUBoot Bootloader finished
[INF] Deinitializing hardware...
External Memory initialized w/ SFDP.
=========================================================
[GoliothApp] Version: 1.0.0, CPU: CM4

=========================================================
[GoliothApp] Watchdog timer started by the bootloader is now turned off to mark.
[GoliothApp] User LED toggles at 1000 msec interval

WLAN MAC Address : 74:7A:90:D4:5F:04
WLAN Firmware    : wl0: Jul 18 2021 19:15:39 version 7.45.98.120 (56df937 CY) FWID 01-69db62cf
WLAN CLM         : API: 12.2 Data: 9.10.39 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2021-07-18 19:03:20 
WHD VERSION      : v2.5.0 : v2.5.0 : GCC 10.3 : 2022-09-23 13:14:02 +0800
Wi-Fi Connection Manager initialized.
Successfully connected to Wi-Fi network 'MyWiFiAP'.
IP Address Assigned: 192.168.1.153
Secure Sockets initialized
I (5515) golioth_main: Waiting to Golioth to connect...
I (5638) golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
I (5643) libcoap: Setting PSK key

I (5649) golioth_coap_client: Entering CoAP I/O loop
I (5886) golioth_main: Golioth client connected
I (5996) golioth_fw_update: Current firmware version: 1.0.0
I (6060) golioth_fw_update: Waiting to receive OTA manifest
I (6258) golioth_fw_update: Received OTA manifest
I (6258) golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
I (6260) golioth_fw_update: Waiting to receive OTA manifest
I (6322) golioth_main: Synchronously got my_int = 42
I (6323) golioth_main: Entering endless loop
I (9006) golioth_main: Callback got my_int = 42
I (9286) golioth_main: Setting loop delay to 5 s

By monitoring the serial output from the device (that’s /dev/ttyACM0 on my system) we can see all the parts of the app at work. The board powers up and reports the firmware version before connecting to Wi-Fi. Once a Golioth connection is established it checks for firmware updates before it starts writing data to the cloud.
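
When you watch that port from a script rather than a terminal, it helps to split each line into its level, timestamp, tag and message. Below is a minimal sketch. It assumes lines follow the `I (5515) golioth_main: ...` shape shown above: a level letter, a number in parentheses, a tag, then the message. `parse_log_line` and `LogLine` are hypothetical names. Lines in any other format, such as MCUboot's `[INF]` banner, are skipped.

```python
import re
from typing import NamedTuple, Optional

# Level letter, number in parentheses, tag, colon, message.
_LOG_RE = re.compile(r"^([EWIDV]) \((\d+)\) ([^:]+):\s?(.*)$")


class LogLine(NamedTuple):
    level: str      # E, W, I, D or V
    timestamp: int  # the number in parentheses
    tag: str        # e.g. golioth_main
    message: str


def parse_log_line(line: str) -> Optional[LogLine]:
    """Split one log line; return None for other formats such as MCUboot's [INF] lines."""
    match = _LOG_RE.match(line.strip())
    if match is None:
        return None
    level, timestamp, tag, message = match.groups()
    return LogLine(level, int(timestamp), tag, message)


if __name__ == "__main__":
    print(parse_log_line("I (6322) golioth_main: Synchronously got my_int = 42"))
```

To feed it, you can read and decode lines with pyserial, for example `serial.Serial("/dev/ttyACM0", 115200, timeout=1)`. The 115200 baud rate is an assumption about this board's console. Filtering on `tag == "golioth_fw_update"` then isolates the OTA messages.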

You can view device logs remotely through the Golioth console. Here we see “Sending hello!” messages arriving along with a counter.

In this device’s state data, the counter variable updates at the same rate as the hello messages. In Device Settings on the left sidebar of the console, you can add LOOP_DELAY_S to remotely control how many seconds the device pauses between sending these messages. It’s a perfect way to control sensor reading frequency across your entire fleet.
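
On the device, the handler behind that setting has to decide what to do with whatever value arrives from the console. Here is a sketch of that decision, written in Python for illustration only. `apply_loop_delay` and the 1 to 3600 second bounds are assumptions, not the example's actual C code. The log line "Setting loop delay to 5 s" in the output above shows what a successful update looks like.

```python
# Hypothetical bounds; the example's real limits are not stated in this post.
MIN_DELAY_S = 1
MAX_DELAY_S = 3600


def apply_loop_delay(value, current_delay_s: int) -> int:
    """Return the new loop delay, or keep the current one if value is unusable."""
    # bool is a subclass of int in Python; a true/false setting is not a delay.
    if isinstance(value, bool) or not isinstance(value, int):
        return current_delay_s
    if not MIN_DELAY_S <= value <= MAX_DELAY_S:
        return current_delay_s
    return value
```

Rejecting bad values rather than clamping them keeps a single mistyped fleet-wide setting from silently changing every device's reporting rate.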

Adding Golioth to existing Infineon PSoC™ 6 projects

The Golioth example that is part of ModusToolbox™ is a great blueprint for adding device management to your PSoC™ 6 projects. We’d love to hear what you’re building, and if you need some help we’re here to lend a hand. Show off your successes and post questions on the Golioth forum. Feel free to reach out to the Golioth Developer Relations team to set up a demo or discuss the needs of your IoT fleet.

This is the second part of How Golioth uses Hardware-in-the-Loop (HIL) Testing. In the first part, I covered what HIL testing is, why we use it at Golioth, and provided a high-level view of the system.

In this post, I’ll talk about each step we took to set up our HIL. It’s my belief that HIL testing should be a part of any firmware testing strategy, and my goal with this post is to show you that it’s not so difficult to get a minimum viable HIL set up within one week.

The Big Pieces

To recap, this is the system block diagram presented at the end of part 1:

The Raspberry Pi acts as a GitHub Actions Self-Hosted Runner which communicates with microcontroller devices via USB-to-serial adapters.

The major software pieces required are:

  • Firmware with Built-In-Test Capabilities – Runs tests on the device (e.g. ESP32-S3-DevKitC), and reports pass/fail for each test over serial.
  • Python Test Runner Script – Initiates tests and checks serial output from device. Runs on the self-hosted runner (e.g. Raspberry Pi).
  • GitHub Actions Workflow – Initiates the python test runner and reports results to GitHub. Runs on the self-hosted runner.

Firmware Built-In-Test

To test our SDK, we use a special build of firmware which exposes commands over a shell/CLI interface. There is a command named built_in_test which runs a suite of tests in firmware:

esp32> built_in_test
...
12 Tests 0 Failures 0 Ignored 
../main/app_main.c:371:test_connects_to_wifi:PASS
../main/app_main.c:378:test_golioth_client_create:PASS
../main/app_main.c:379:test_connects_to_golioth:PASS
../main/app_main.c:381:test_lightdb_set_get_sync:PASS
../main/app_main.c:382:test_lightdb_set_get_async:PASS
../main/app_main.c:383:test_lightdb_observation:PASS
../main/app_main.c:384:test_golioth_client_heap_usage:PASS
../main/app_main.c:385:test_request_dropped_if_client_not_running:PASS
../main/app_main.c:386:test_lightdb_error_if_path_not_found:PASS
../main/app_main.c:387:test_request_timeout_if_packets_dropped:PASS
../main/app_main.c:388:test_client_task_stack_min_remaining:PASS
../main/app_main.c:389:test_client_destroy_and_no_memory_leaks:PASS

These tests cover major functionality of the SDK, including connecting to Golioth servers, and sending/receiving CoAP messages. There are also a couple of tests to verify that stack and heap usage are within allowed limits, and that there are no detected memory leaks.
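
Each result line in that output has the shape file:line:test_name:STATUS, and a summary line reports the totals. A host-side script can pull both apart with two regular expressions. Below is a minimal sketch. It assumes failures follow Unity's usual file:line:name:FAIL: message form. `parse_unity_output` is a hypothetical helper, not the parsing that verify.py actually does.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# "../main/app_main.c:371:test_connects_to_wifi:PASS" (FAIL/IGNORE may add ": message")
_RESULT_RE = re.compile(r"^(.+?):(\d+):(\w+):(PASS|FAIL|IGNORE)(?::\s*(.*))?$")
# "12 Tests 0 Failures 0 Ignored"
_SUMMARY_RE = re.compile(r"(\d+) Tests (\d+) Failures (\d+) Ignored")


def parse_unity_output(
        lines: Iterable[str]) -> Tuple[Dict[str, str], Optional[Dict[str, int]]]:
    """Return ({test_name: status}, summary counts or None if no summary was seen)."""
    results: Dict[str, str] = {}
    summary: Optional[Dict[str, int]] = None
    for raw in lines:
        line = raw.strip()
        result = _RESULT_RE.match(line)
        if result:
            _file, _line_no, name, status, _message = result.groups()
            results[name] = status
            continue
        totals = _SUMMARY_RE.search(line)
        if totals:
            tests, failures, ignored = (int(n) for n in totals.groups())
            summary = {"tests": tests, "failures": failures, "ignored": ignored}
    return results, summary
```

Comparing the summary count with the number of result lines is a cheap way to catch output that was cut off by a crash or a serial glitch.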

Here’s an example of one of the tests that runs on the device, test_connects_to_golioth:

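#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"
#include "unity.h"
#include "golioth.h"

// Created during test setup (not shown); the callback below is registered with the client there.
static golioth_client_t _client;
static SemaphoreHandle_t _connected_sem;

// Client event callback: signal the waiting test once the client connects.
static void on_client_event(golioth_client_t client,
                            golioth_client_event_t event,
                            void* arg) {
    if (event == GOLIOTH_CLIENT_EVENT_CONNECTED) {
        xSemaphoreGive(_connected_sem);
    }
}

static void test_connects_to_golioth(void) {
    TEST_ASSERT_NOT_NULL(_client);
    TEST_ASSERT_EQUAL(GOLIOTH_OK, golioth_client_start(_client));
    // Block for up to 10 seconds until on_client_event gives the semaphore.
    TEST_ASSERT_EQUAL(pdTRUE, xSemaphoreTake(_connected_sem, 10000 / portTICK_PERIOD_MS));
}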

This test attempts to connect to Golioth servers. If it connects, a semaphore is given which the test waits on for up to 10 seconds (or else the test fails). For such a simple test, it covers a large portion of our SDK, including the network stack (WiFi/IP/UDP), security (DTLS), and CoAP Rx/Tx.
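
The same "start, then wait on a signal with a timeout" pattern also works on the host side, which can be handy when prototyping test logic before moving it into firmware. The sketch below uses a Python `threading.Event` in place of the FreeRTOS semaphore. `wait_for_connection` and its `start` callable are hypothetical names and are not part of the Golioth SDK.

```python
import threading
from typing import Callable


def wait_for_connection(start: Callable[[Callable[[], None]], None],
                        timeout_s: float = 10.0) -> bool:
    """Start a client and wait until its connected callback fires or time runs out."""
    connected = threading.Event()     # plays the role of _connected_sem
    start(connected.set)              # the callback "gives" the semaphore on connect
    return connected.wait(timeout_s)  # True if set in time, False on timeout


if __name__ == "__main__":
    # Fake client that reports "connected" 50 ms after starting.
    def fake_start(on_connected):
        threading.Timer(0.05, on_connected).start()

    print(wait_for_connection(fake_start, timeout_s=1.0))  # True
```

As in the C test, the callback only raises a flag. All pass/fail decisions stay in the test body, where the timeout is enforced.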

We use the Unity test framework for basic test execution and assertion macros.

If you’re curious about the details, you can see the full test code here: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/test/main/app_main.c

Python Test Runner

Another key piece of software is a Python script that runs on the self-hosted runner and communicates with the device over serial. We call this script verify.py. In this script, we provision the device with credentials, issue the built_in_test command, and verify the serial output to make sure all tests pass.

Here’s the main function of the script:

import sys

import serial  # pyserial

# set_credentials, reset, run_built_in_tests and run_ota_test are helpers defined elsewhere in verify.py


def main():
    port = sys.argv[1]

    # Connect to the device over serial and use the shell CLI to interact and run tests
    ser = serial.Serial(port, 115200, timeout=1)

    # Set WiFi and Golioth credentials over device shell CLI
    set_credentials(ser)
    reset(ser)

    # Run built in tests on the device and check output
    num_test_failures = run_built_in_tests(ser)
    reset(ser)

    # Only attempt the OTA test when every built-in test passed
    if num_test_failures == 0:
        run_ota_test(ser)

    sys.exit(num_test_failures)

This opens the serial port and calls various helper functions to set credentials, reset the board, run built-in-tests, and run an OTA test. The exit code is the number of failed tests, so 0 means all passed.
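
There is one caveat with passing a count straight to sys.exit(). On Linux and other POSIX systems, only the low 8 bits of the exit status reach the parent process, so exactly 256 failures would be reported as 0, which looks like a pass. That is unlikely with a dozen tests, but clamping the value costs nothing. `exit_code_for` is a hypothetical helper and is not part of verify.py.

```python
# POSIX passes only the low 8 bits of an exit status to the parent,
# so sys.exit(256) would look like success to the workflow.
MAX_EXIT_CODE = 255


def exit_code_for(num_test_failures: int) -> int:
    """Map a failure count to an exit status that cannot wrap around to 0."""
    if num_test_failures <= 0:
        return 0
    return min(num_test_failures, MAX_EXIT_CODE)
```

In main(), the last line would become `sys.exit(exit_code_for(num_test_failures))`.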

The script spends most of its time in this helper function which looks for a string in a line of serial text:

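from time import time


def wait_for_str_in_line(ser, expected, timeout_s=20, log=True):
    """Read serial lines until one contains `expected`; return that line."""
    start_time = time()
    while True:
        # readline() returns b'' when the port's read timeout expires with no data
        line = ser.readline().decode('utf-8', errors='replace').replace("\r\n", "")
        if line != "" and log:
            print(line)
        # "CPU halted" in the output means the board crashed
        if "CPU halted" in line:
            raise RuntimeError(line)
        # Check for a match before the deadline, so a matching line that arrives late still counts
        if expected in line:
            return line
        if time() - start_time > timeout_s:
            raise RuntimeError('Timeout')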

The script will continually read lines from serial for up to timeout_s seconds until it finds what it’s looking for. And if “CPU halted” ever appears in the output, that means the board crashed, so a runtime error is raised.

This verify.py script is what gets invoked by the GitHub Actions Workflow file.

GitHub Actions Workflow

The workflow file is a YAML file that defines what will be executed by GitHub Actions. If one of the steps in the workflow returns a non-zero exit code, then the workflow will fail, which results in a red ❌ in CI.

Our workflow is divided into two jobs:

  • Build the firmware and upload build artifacts. This is executed on cloud servers, since building the firmware on a Raspberry Pi would take a prohibitively long time (adding precious minutes to CI time).
  • Download firmware build, flash the device, and run verify.py. This is executed on the self-hosted runner (my Raspberry Pi).

The workflow file is created in .github/workflows/test_esp32s3.yml. Here are the important parts of the file:

name: Test on Hardware

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  build_for_hw_test:
    runs-on: ubuntu-latest
    steps:
    ...
    - name: Build test project
      uses: espressif/esp-idf-ci-action@v1
      with:
        esp_idf_version: v4.4.1
        target: esp32s3
        path: 'examples/test'
    ...

  hw_flash_and_test:
    needs: build_for_hw_test
    runs-on: [self-hosted, has_esp32s3]
    ...
    - name: Flash and Verify Serial Output
      run: |
        cd examples/test
        python flash.py $CI_ESP32_PORT && python verify.py $CI_ESP32_PORT

The first job runs in a normal GitHub runner based on Ubuntu. For building the test firmware, Espressif provides a handy CI Action to do the heavy lifting.

The second job has a needs: build_for_hw_test requirement, meaning it must wait until the first job completes (i.e. firmware is built) before it can run.

The second job has two further requirements: it must run on a self-hosted runner, and that runner must carry the label has_esp32s3. GitHub will dispatch the job to the first available self-hosted runner that meets both requirements.
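
That dispatch rule boils down to a subset check: a runner qualifies if it carries every label listed in runs-on. Below is a simplified model in Python. It ignores GitHub's real queueing, runner groups and fairness, and `pick_runner` and the runner pool are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional


def pick_runner(required_labels: Iterable[str],
                runners: List[Mapping]) -> Optional[str]:
    """Name of the first idle runner whose labels include every required label."""
    needed = set(required_labels)
    for runner in runners:
        if runner["idle"] and needed <= set(runner["labels"]):
            return runner["name"]
    return None  # no match: the job stays queued


# Hypothetical pool resembling the setup described in this post.
RUNNERS = [
    {"name": "nrf-only-box", "labels": ["self-hosted", "has_nrf52840dk"], "idle": True},
    {"name": "nicks-raspberry-pi",
     "labels": ["self-hosted", "has_esp32s3", "has_nrf52840dk"], "idle": True},
]

if __name__ == "__main__":
    print(pick_runner(["self-hosted", "has_esp32s3"], RUNNERS))  # nicks-raspberry-pi
```

The subset check is why removing a label from a runner is enough to keep jobs away from a board that needs maintenance.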

If the workflow completes all jobs, and all steps return a zero exit code, then everything has passed, and you get the green CI checkmark ✅.

Self-Hosted Runner Setup

With all of the major software pieces in place (FW test framework, Python runner script, and Actions workflow), the next step is to set up a machine as a self-hosted runner. Any machine running one of the big three operating systems (Linux, Windows, macOS) will do.

It helps if the machine is always on and connected to the Internet, though that is not a requirement. GitHub will simply not dispatch jobs to any self-hosted runners that are not online.

Install System Dependencies

There are a few system dependencies that need to be installed first. On my Raspberry Pi, I had to install these:

sudo apt install \
    python3 python3-pip python3-dev python-dev \
    libffi-dev cmake direnv wget gcovr clang-format \
    libssl-dev git
pip3 install pyserial

Next, log into GitHub and create a new self-hosted runner for your repo (Settings -> Actions -> Runners -> New self-hosted runner):

When you add a new self-hosted runner, GitHub will provide some very simple instructions to install the Actions service on your machine:

When I ran the config.sh line, I added --name and --unattended arguments:

./config.sh --url https://github.com/golioth/golioth-firmware-sdk --token XXXXXXXXXXXXXXXXXXXXXXXXXXXXX --name nicks-raspberry-pi --unattended

Next, install the service, which runs the runner in the background so it starts automatically and stays reachable by GitHub even after the machine reboots:

cd actions-runner
sudo ./svc.sh install

If there are any special environment variables needed when executing workflows, these can be added to runsvc.sh. For instance, I defined a variable for the serial port of the ESP32:

# insert anything to setup env when running as a service
export CI_ESP32_PORT=/dev/ttyUSB0
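
verify.py takes the port as its first argument, and the workflow passes $CI_ESP32_PORT. If you run the script by hand on the runner, it can be convenient to fall back to the same environment variable. Below is a sketch of that fallback. `resolve_port` is a hypothetical helper and is not part of the original script.

```python
from typing import Mapping, Sequence


def resolve_port(argv: Sequence[str], env: Mapping[str, str]) -> str:
    """Prefer an explicit argument, then CI_ESP32_PORT, else stop with a usage message."""
    if len(argv) > 1:
        return argv[1]
    port = env.get("CI_ESP32_PORT")
    if not port:
        raise SystemExit("usage: verify.py <serial-port> (or set CI_ESP32_PORT)")
    return port
```

In main(), `port = resolve_port(sys.argv, os.environ)` would replace `port = sys.argv[1]`, which requires an `import os`.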

Finally, you can start the service:

sudo ./svc.sh start

And now you should see your self-hosted runner as “Idle” in GitHub Actions, meaning it is ready to accept jobs:

I added two custom labels to my runner: has_esp32s3 and has_nrf52840dk, to indicate that I have those two dev boards attached to my self-hosted runner. The workflow file uses these labels so that jobs get dispatched to self-hosted runners only if they have the required board.

And that’s it! With those steps, a basic HIL is up and running, integrated into CI, and covering a large amount of firmware code on real hardware.

Security

There’s an important note from GitHub regarding security of self-hosted runners:

Warning: We recommend that you only use self-hosted runners with private repositories. This is because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

I also stand by that recommendation. But it’s not always feasible if you’re maintaining an open-source project. Our SDKs are public, so this warning certainly applies to us.

If you’re not careful, you could give everyone on the Internet root access to your machine! It wouldn’t be very hard for someone to submit a pull request that modifies the workflow file to run whatever arbitrary Linux commands they want.

We mitigate most of the security risks with three actions.

1. Require approval to run workflows for all outside contributors.

This means someone from the Golioth organization must click a button to run the workflow. This allows the repo maintainer time to review any workflow changes before allowing the workflow to run.
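
During that review, remember that the workflow file is not the only thing that runs on the runner. Assuming the job checks out the pull request's code, as a pull_request trigger typically does, the Flash and Verify step runs flash.py and verify.py from that checkout. Edits to those scripts can therefore run arbitrary commands just as easily. Below is a sketch of a reviewer's helper that flags such paths. `paths_needing_review` and both path lists are assumptions to adapt to your repo, and the lists are not exhaustive.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed lists: anything that decides what executes on the self-hosted runner.
SENSITIVE_DIRS = (".github/workflows/", ".github/actions/")
SENSITIVE_SCRIPTS = {"flash.py", "verify.py"}


def paths_needing_review(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that could alter what runs on the runner."""
    flagged = []
    for path in changed_paths:
        in_ci_dir = path.startswith(SENSITIVE_DIRS)
        is_script = PurePosixPath(path).name in SENSITIVE_SCRIPTS
        if in_ci_dir or is_script:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    changed = [".github/workflows/test_esp32s3.yml",
               "examples/test/verify.py",
               "README.md"]
    print(paths_needing_review(changed))
```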

This option can be enabled in Settings -> Actions -> General:

Once enabled, pull requests will have a new “Approve and Run” button:

This is not ideal if you’re running a large open-source project with lots of outside contributors, but it is a reasonable compromise to make during early stages of a project.

2. Avoid logging sensitive information

For a public repository, the log output of every GitHub Actions run is publicly visible. If the workflow steps or the device’s serial output print any sensitive information, it could easily fall into the wrong hands.

For our purposes, we disabled logging WiFi and Golioth credentials.
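
If a host script echoes device output into the Actions log, the way verify.py prints each serial line, it can also mask known secret values as a last line of defense. This complements disabling the logging in firmware rather than replacing it. Below is a sketch. `redact` and the sample values are hypothetical.

```python
from typing import Iterable


def redact(line: str, secrets: Iterable[str], mask: str = "****") -> str:
    """Replace every occurrence of each non-empty secret in line with mask."""
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        line = line.replace(secret, mask)
    return line


if __name__ == "__main__":
    print(redact("Connecting to MyWiFiAP with PSK hunter2", ["MyWiFiAP", "hunter2"]))
    # Connecting to **** with PSK ****
```

In wait_for_str_in_line, `print(redact(line, secrets))` would replace `print(line)`, using the same credential values the script sends to the device.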

3. Utilize GitHub Actions Secrets

You can use GitHub Actions Secrets to store sensitive information that is required by the runner. In our repo, we define a secret named `GOLIOTH_API_KEY` that allows us access to the Golioth API, which is used to deploy OTA images of built firmware during the workflow run.

If you want to know more about self-hosted runner security, these resources were helpful to me:

Tips and Tricks

Here are a few extra ideas to utilize the full power of GitHub Actions Self-Hosted Runners (SHRs):

  • Use the same self-hosted runner in multiple repos. You can define SHRs at the organization level and allow their use in more than one repo. We do this with our SHRs so they can be used to run tests in the Golioth Zephyr SDK repo and the Golioth ESP-IDF SDK repo.
  • Use labels to control workflow dispatch. Custom labels can be applied to a runner to control when and how a workflow gets dispatched. As mentioned previously, we use these to enforce that the SHR has the dev board required by the test. We have a label for each dev board, and if the admin of the SHR machine has those dev boards attached via USB, then they’d add a label for each one. This means that machine will only receive workflows that are compatible with that machine. If a board is acting up or requires maintenance, the admin can simply disconnect the board and remove the label to prevent further workflows from executing.
  • Distribute self-hosted runners to balance load and minimize CI bottlenecks. A single self-hosted runner can become a bottleneck when there are lots of jobs to run. Also, if that runner loses its Internet connection, CI is blocked indefinitely. To balance the load and create more robust HIL infrastructure, use multiple self-hosted runners. Convince your co-workers to take their unused Intel NUC off the shelf and follow the steps in the “Self-Hosted Runner Setup” section to add another runner to the pool. This allows HIL tests to run in a distributed way, no matter where in the world you are.

Even people who have been living under a rock for the past couple of years know that there’s a global chip shortage. The correct response from the engineering community should be changes to our design philosophy and that’s the topic of the talk that Chris Gammell and I gave at the 2022 Zephyr Developer Summit back in June. With the right hardware and firmware approach, it’s possible to design truly modular hardware to respond to supply chain woes.

Standardization enabled the industrial revolution, and the electronics field is a direct descendant of those principles. You can rest easy in knowing that myriad parts exist to match your chosen operating voltage and communication scheme. But generally speaking, it’s still quite painful to change microcontrollers and other key components midway through product design.

Today’s hardware landscape has uprooted the old ways of doing things. You can’t wait out a 52-week lead time, so plan for the worst and hope for the best. Our talk covers the design philosophies we’ve adopted to make hardware “swapability” possible, while using Zephyr RTOS to manage the firmware side of things.

Meet the Golioth Aludel

Golioth exists to make IoT development straightforward and scalable–that’s not possible if you’re locked into specific hardware, especially these days. So we designed and built a modular platform to showcase this flexibility to our customers.

Golioth Aludel prototyping platform internal view

Chris Gammell designed Aludel around a readily available industrial enclosure. The core concept is a PCB base that accepts any microcontroller board that follows the Feather standard, so every pin location and function is standardized. Alongside the Feather you’ll find two headers that follow the Click boards standard (PDF), a footprint for expansion boards that provide I2C, SPI, UART, and more. The base also includes Qwiic and Stemma headers for additional sensor connectivity, as well as terminal blocks, room for a battery, and provisions for external power.

The key customization element for Aludel is the faceplate. It’s a circuit board that can deliver a beautiful custom design to match any customer request. But on the inside, each faceplate has the same interface: a flat cable connection to the base board. Current versions of the faceplate feature a screen, an EEPROM to identify the board, and a port expander that drives buttons, LEDs, and whatever else you need for the hardware UI.

The beauty of this is that electrically, it all just works. Need an nRF52, nRF9160, ESP32, STM32, or RP2040? They all already exist in the Feather form factor. Need to change the type of temperature sensor you’re using? Want to add RS485, CANbus, MODBUS, 4-20 mA communication, or some other connectivity protocol? The hardware is ready for it; at most you might need to spin your own adapter board.

Now skeptics may look at the size of this with one abnormally high-arched eyebrow. But remember, this is a proof-of-concept platform. You can set it up for your prototyping, then take the block elements and boil them down into your final design. Prematurely optimizing for space means you will have many more board runs as the parts change.

When you do a production run and your chip is no longer available, the same concepts will quickly allow you to test replacements and respin your production PCB to accommodate the changes.

Zephyr Alleviates your Firmware Headaches

“But wait!” you cry, “won’t someone think of the firmware?” Indeed, someone has thought of the firmware. The Linux Foundation shepherds an amazing Real-Time Operating System called Zephyr RTOS that focuses on interoperability across a vast array of processor architectures and families. Even better, it handles the networking stack and has a standardized device model that makes it possible to switch peripherals (e.g., sensors) without your C code even noticing.

We use Zephyr to maintain one firmware repository for the Aludel hardware that can be compiled for STM32F40, nRF52840, nRF9160, and ESP32 using your choice of Ethernet, WiFi, or Cellular connections. How is that even possible?

&i2c0 {
	bme280@76 {
		compatible = "bosch,bme280";
		reg = <0x76>;
		label = "BME280_I2C";
	};
};

&spi1 {
	compatible = "nordic,nrf-spi";
	status = "okay";
	cs-gpios = <&gpio0 3 GPIO_ACTIVE_LOW>,
	           <&gpio0 27 GPIO_ACTIVE_LOW>,
	           <&gpio1 8 GPIO_ACTIVE_LOW>;
	test_spi_w5500: w5500@0 {
		compatible = "wiznet,w5500";
		label = "w5500";
		reg = <0x0>;
		spi-max-frequency = <10000000>;
		int-gpios = <&gpio0 2 GPIO_ACTIVE_LOW>;
		reset-gpios = <&gpio0 30 GPIO_ACTIVE_LOW>;
	};
};

/ {
	aliases {
		aludeli2c = &i2c0;
		pca9539int = &interrupt_pin0;
	};
	gpio_keys {
		compatible = "gpio-keys";
		interrupt_pin0: int_pin_0 {
			gpios = <&gpio0 26 GPIO_ACTIVE_LOW>;
			label = "Port Expander Interrupt Pin";
		};
	};
};

Zephyr uses the DeviceTree standard to map how all of the hardware is connected. A vast amount of work has already been done for you by the manufacturers, who maintain drivers for their chips and supply .dts (DeviceTree Source) files that specify pin functions and mappings. When writing firmware for your project, you supply DeviceTree overlay files that describe the sensors, buttons, LEDs, and anything else connected to your design. The C code then looks up each device to get all the relevant information, such as its bus address and its GPIO port/pin assignment and function.

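struct sensor_value temp, press, humidity;

/* Get the first BME280 instance described in the devicetree */
const struct device *weather_dev = DEVICE_DT_GET_ANY(bosch_bme280);

/* Trigger a new measurement, then read each channel into a sensor_value */
sensor_sample_fetch(weather_dev);
sensor_channel_get(weather_dev, SENSOR_CHAN_AMBIENT_TEMP, &temp);
sensor_channel_get(weather_dev, SENSOR_CHAN_PRESS, &press);
sensor_channel_get(weather_dev, SENSOR_CHAN_HUMIDITY, &humidity);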

The hardware abstraction doesn’t stop there. Zephyr also provides a common API for sensors. For instance, an IMU will have an X, Y, and Z output, and Zephyr makes accessing these the same even if parts shortages force you to move to an IMU from a different manufacturer. You update the overlay files for the new build and use a configuration file to turn on any libraries the new part needs, but the code you’re maintaining can carry on as if nothing happened.
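To make the idea concrete, here is a minimal plain-C sketch of how a common interface lets application code stay the same when the sensor part changes. It is not Zephyr's actual sensor API (which is built around `sensor_sample_fetch()` and `sensor_channel_get()`); the `imu_api` struct, the two vendor drivers, and their canned readings are hypothetical stand-ins so the example runs on a host machine.

```c
#include <stddef.h>

/* Hypothetical, simplified sensor interface (NOT Zephyr's real sensor API). */
struct imu_sample {
    int x, y, z; /* acceleration, milli-g */
};

struct imu_api {
    const char *name;
    int (*fetch)(struct imu_sample *out); /* 0 on success, negative on error */
};

/* Two hypothetical drivers for IMUs from different vendors,
 * returning canned values so the sketch runs on a host machine. */
static int vendor_a_fetch(struct imu_sample *out)
{
    out->x = 10;
    out->y = -5;
    out->z = 1000;
    return 0;
}

static int vendor_b_fetch(struct imu_sample *out)
{
    out->x = 12;
    out->y = -3;
    out->z = 998;
    return 0;
}

const struct imu_api vendor_a_imu = { "vendor-a", vendor_a_fetch };
const struct imu_api vendor_b_imu = { "vendor-b", vendor_b_fetch };

/* Application code depends only on the interface, never on the vendor. */
int read_z_axis(const struct imu_api *imu)
{
    struct imu_sample s;

    if (imu == NULL || imu->fetch(&s) != 0) {
        return -1;
    }
    return s.z;
}
```

Swapping IMUs then means passing `&vendor_b_imu` instead of `&vendor_a_imu`; `read_z_axis()` itself never changes. In Zephyr, the equivalent selection happens in the devicetree overlay and configuration rather than in your C code.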

Of course there are caveats, which we cover in the talk. There is no one-size-fits-all in embedded engineering, so you will eventually need to define your own .dts files for custom boards and add peripherals that are unsupported in Zephyr. This is not a deal breaker: there is a template example for adding boards (and I highly recommend watching Jared Wolff’s video on the subject). And since Zephyr is open source, your customizations can be contributed upstream so others may also benefit.

Hardware will never again be the same

Electronics manufacturing will eventually recover from the current shortages, but there are no signs of this happening soon, and the shortage has already lasted two years. We shouldn’t be trying to return to “normal”, but planning a better future. That’s what Golioth is doing for the Internet of Things. Choosing an IoT management platform shouldn’t require you to commit to one type of hardware. Your fleets should evolve, and our talk outlines how they can. Golioth is designed to make it easy to keep track of and control your diverse and growing hardware fleet, whether that’s 100 devices or 100,000. Take Golioth for a test drive today.

Today we’re announcing a new feature on the Golioth Console and on our Device SDKs that enables Remote Procedure Calls (RPCs) for all users on the platform. From the cloud, you can initiate a function on your constrained device in the field, ensure the device received and executed the command, and receive a response from the device back to the Cloud.

What is a Remote Procedure Call (RPC)?

A Remote Procedure Call allows you to call a function on a remote computing device and optionally receive a result. An easy way to think about it is that you’re calling a function just as you would in any other program, except that you’re doing it from another computer. In this case, you’re triggering actions from the Golioth Cloud.

RPC from the Golioth Cloud (Console)

Each device in your project has a page where you can view details about things like LightDB State, LightDB Stream, Settings, and now RPC. Our Console includes an interface to directly send RPCs to the remote device. The URL will look something like:

https://console.golioth.io/devices/<YOUR_DEVICE_ID>/management/rpc

In all of our examples, we send RPCs to a single device from the Console. RPCs can also be triggered from the REST API; as a reminder, any function you see on the Golioth Console is available on the REST API.

One critical function of RPCs is a confirmation that the remote function has actually run. The device firmware needs to send back a success message that the function has completed, and optionally a returned value. When there is a problem connecting with your device and the RPC does not complete successfully, you will see a screen that looks like this:

An RPC sent to a device that was disconnected from Wi-Fi

Also note that round trip time is measured for all RPCs, including successful ones. Transit times will depend upon your connectivity medium, in addition to the processing time of the function on the remote device.

When an RPC successfully completes, you can click the button with three dots to see the returned value. In the example and in the video, we used a method called double that takes an integer input, multiplies it by two, and returns the value to the Cloud. Below, you can see the result when we sent the “double” method with a parameter of “37”.

RPC from the Device SDK perspective

Any new feature on Golioth has Device SDK support, in addition to the new APIs and UIs on our Web Console. Earlier this week, Nick wrote about how we test hardware and firmware at Golioth, especially when a new feature is released across the platform. Now that we support 3 SDKs (Zephyr, NCS, ESP-IDF), the testing area has increased.

In the video, the focus is on ESP-IDF, which has a simple way to set up and respond to new RPCs. First, we register the new method, so we’ll recognize the command coming from the Golioth Cloud:

The function tied to that newly registered RPC needs to return RPC_OK so that the Cloud is notified that the function has completed properly.
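The exact Golioth ESP-IDF calls are not reproduced here, so the following is only a host-runnable sketch of the pattern: a method name is registered against a handler, incoming calls are dispatched by name, and the handler returns an `RPC_OK` status so the caller knows it ran. Apart from `RPC_OK` and the `double` method, every name (`rpc_register`, `rpc_dispatch`, `on_double`, the other status values, the table size) is hypothetical and is not the SDK's API.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical status codes; only the name RPC_OK comes from the article. */
typedef enum {
    RPC_OK = 0,
    RPC_INVALID_ARGUMENT,
    RPC_UNIMPLEMENTED,
} rpc_status_t;

/* A handler receives one integer parameter and writes one integer result. */
typedef rpc_status_t (*rpc_handler_t)(int param, int *result);

#define MAX_RPCS 8

struct rpc_entry {
    const char *method;
    rpc_handler_t handler;
};

static struct rpc_entry rpc_table[MAX_RPCS];
static int rpc_count;

/* Register a method name so incoming calls can be recognized. */
int rpc_register(const char *method, rpc_handler_t handler)
{
    if (rpc_count >= MAX_RPCS) {
        return -1;
    }
    rpc_table[rpc_count].method = method;
    rpc_table[rpc_count].handler = handler;
    rpc_count++;
    return 0;
}

/* Look up the method by name and run it; unknown names are rejected. */
rpc_status_t rpc_dispatch(const char *method, int param, int *result)
{
    for (int i = 0; i < rpc_count; i++) {
        if (strcmp(rpc_table[i].method, method) == 0) {
            return rpc_table[i].handler(param, result);
        }
    }
    return RPC_UNIMPLEMENTED;
}

/* The "double" method: multiply the input by two and report success. */
rpc_status_t on_double(int param, int *result)
{
    /* Reject inputs whose double would overflow an int. */
    if (param > INT_MAX / 2 || param < INT_MIN / 2) {
        return RPC_INVALID_ARGUMENT;
    }
    *result = param * 2;
    return RPC_OK;
}
```

Registering `double` and dispatching it with 37 yields 74 with `RPC_OK`, matching the Console example; a method name that was never registered is reported back as unimplemented rather than silently ignored.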

If you have satisfied these requirements in the ESP-IDF SDK and copied the format, you can customize logic to do whatever task you’d like on the remote device. Let’s look at some examples.

Use cases for an RPC

The double() example is a simple showcase of the minimum requirements to create an RPC in the ESP-IDF SDK. We send a command and a value; the device returns a modified value.

The remote_reset command we created and showcased in the video is closer to the kind of critical function you would want to add to your project. When you trigger a remote function like a reset, you want to ensure that the command was properly received, that the function executed, and that there was output data confirming the reset happened. For that last point, we infer that the device has restarted from the log messages it also sends back to the Golioth Console. Put together, this is a reliable way to tell that the device has been reset.
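One ordering detail matters for a remote reset: if the device reboots inside the handler, the success reply may never leave the device. The hypothetical host-runnable sketch below (none of these names come from the Golioth SDK) acknowledges first and only reboots after the reply has been sent. `send_reply()` and `system_reboot()` are stand-ins that just record events so the ordering can be checked.

```c
#include <stdbool.h>

/* Hypothetical event log so the ordering can be checked on a host. */
enum event { EV_REPLY_SENT = 1, EV_REBOOT = 2 };

static enum event events[4];
static int event_count;
static bool reset_pending;

static void record(enum event ev)
{
    if (event_count < 4) {
        events[event_count++] = ev;
    }
}

/* Stand-ins for the real transport and reboot calls. */
static void send_reply(void)    { record(EV_REPLY_SENT); }
static void system_reboot(void) { record(EV_REBOOT); }

/* RPC handler: do NOT reboot here; just schedule it and report success. */
int on_remote_reset(void)
{
    reset_pending = true;
    return 0; /* stands in for RPC_OK */
}

/* Called by the (hypothetical) RPC layer after the handler returns. */
void rpc_complete(int status)
{
    (void)status; /* a real reply would carry this status to the Cloud */
    send_reply();

    /* Only now is it safe to reboot: the Cloud has its confirmation. */
    if (reset_pending) {
        reset_pending = false;
        system_reboot();
    }
}
```

On real hardware you would likely also give the network stack a moment to flush the reply before rebooting; the boot-time log messages then confirm the restart, as described above.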

Other use cases could be as simple as sending arbitrary text to a display. You would still want to know the text has been received and properly sent to the physical display. Or perhaps you have a valve and you want to be able to send an arbitrary value to the valve, but you also want to take a reading on an encoder that measures the distance the valve has moved.
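The valve case follows the same shape: take an arbitrary setpoint, act on it, and return a measured value rather than just an acknowledgement. In this hypothetical sketch the valve and encoder are simulated; `valve_move_to()`, `encoder_read()`, `on_set_valve()`, the 0 to 100 range, and the one-count mechanical slip are all invented for illustration.

```c
/* Simulated hardware state (hypothetical). */
static int valve_position; /* encoder counts */

static void valve_move_to(int target)
{
    /* Pretend the actuator stops one count short on large moves. */
    int delta = target - valve_position;

    if (delta > 10) {
        delta -= 1;
    } else if (delta < -10) {
        delta += 1;
    }
    valve_position += delta;
}

static int encoder_read(void)
{
    return valve_position;
}

/* RPC-style handler: reject out-of-range setpoints, otherwise move the
 * valve and report how far the encoder says it actually travelled. */
int on_set_valve(int target, int *moved)
{
    if (target < 0 || target > 100) {
        return -1; /* invalid argument */
    }
    int before = encoder_read();
    valve_move_to(target);
    *moved = encoder_read() - before;
    return 0; /* success */
}
```

Because the reply carries the encoder's measurement rather than echoing the request, the Cloud can see that a request for 50 only moved the valve 49 counts.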

Many of these functions could also be achieved with LightDB State (which the RPC service is built upon), but the context for creating an RPC is more targeted at situations like the examples above.

What will you build?

RPCs are another way for you to communicate with your constrained IoT devices from the Golioth Cloud and to get useful information back from your devices. You can start testing out this feature today.

For more questions or assistance, check out our Forums or our Discord, or drop us a note by email.

 

Here’s a scenario that might sound familiar.

You’ve added a new feature to the firmware, about 800 lines of code, along with some simple unit tests. All unit tests are passing. The code coverage report is higher than ever. You pat yourself on the back for a job well done.

But then – for a brief moment, conviction wavers as a voice in your head whispers: “The mocks aren’t real”; “Does this code really work?”; “You forgot to test OTA.”

But the moment passes, and your faith is restored as you admire the elegant green CI check mark ✅.

Do you trust your unit tests enough to deploy the firmware right now to thousands of real devices via Over-the-Air (OTA) firmware update?

If the answer is “well, no, not really, we still need to test a few things with the real hardware”, then automated Hardware-in-the-Loop (HIL) testing might give you the confidence to upgrade that answer to “heck yeah, let’s ship it!”. 🚀

In this post, I’ll explain what HIL testing is, and why we use it at Golioth to continuously verify the firmware for our Zephyr and ESP-IDF SDKs.

HIL Testing: A Definition

If you search online, you’ll find a lot of definitions of what HIL testing is, but I’ll try to formulate it in my own simple terms.

There are three major types of tests:

  • Unit – the smallest kind of test, often covering just one function in isolation
  • Integration – testing multiple functions or multiple subsystems together
  • System – testing the entire system, fully integrated

A firmware test can run in three places:

  • On your host machine – Running a binary compiled for your system
  • On emulated target hardware – Something like QEMU or Renode
  • On real target hardware – Running cross-compiled code for the target, like an ARM or RISC-V processor.

A HIL test is a system test that runs on real target hardware, with real peripherals, and real connections. Typically, a HIL setup will consist of:

  • A board running your firmware (the code under test)
  • A host test machine, executing the tests
  • (optional) Other peripherals and test equipment to simulate real-world conditions, controlled by the host test machine
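On the host side, one common (though not the only) way to judge a run is to watch the board's serial log for known markers. The C sketch below assumes a made-up log format in which each test prints a line ending in `PASS` or `FAIL`; the markers, `struct hil_result`, and `count_results()` are hypothetical, not the format Golioth's HIL actually uses.

```c
#include <string.h>

struct hil_result {
    int passed;
    int failed;
};

/* Returns 1 if `line` ends with `suffix`, else 0. */
static int ends_with(const char *line, const char *suffix)
{
    size_t n = strlen(line);
    size_t m = strlen(suffix);

    return n >= m && strcmp(line + n - m, suffix) == 0;
}

/* Tally PASS/FAIL markers from captured serial output lines. */
struct hil_result count_results(const char *const lines[], size_t count)
{
    struct hil_result r = { 0, 0 };

    for (size_t i = 0; i < count; i++) {
        if (ends_with(lines[i], "PASS")) {
            r.passed++;
        } else if (ends_with(lines[i], "FAIL")) {
            r.failed++;
        }
    }
    return r;
}
```

A CI job would then fail the check if `failed` is nonzero, or if `passed` is lower than expected, which also catches a board that hangs partway through the run.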

Coverage and Confidence

Why bother with HIL tests? Aren’t unit tests good enough?

To address this, I’ll introduce two terms:

  • Coverage: How much of the firmware is covered by tests?
  • Confidence: How confident are you that the firmware behaves as intended?

Coverage is easily measured using open-source tools like gcov, and it’s an essential part of any test strategy. Confidence is not as easily measured; it’s often a gut feeling.

It’s possible to have 100% coverage and 0% confidence. You can write tests that technically execute every line of code and branch combination without testing any intended functionality. Similarly, you can have 0% coverage and 100% confidence, but if you encounter someone like this, they are, to put it bluntly, delusional.
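Here is a small, contrived C illustration of the first case. Every branch of `clamp_percent()` runs in both tests, so a coverage tool such as gcov would report every line as executed either way, yet `test_coverage_only()` asserts nothing and would still pass if the function were wrong. The function and tests are invented for this example.

```c
#include <assert.h>

/* Function under test: clamp a reading to the 0..100 range. */
int clamp_percent(int value)
{
    if (value > 100) {
        return 100;
    }
    if (value < 0) {
        return 0;
    }
    return value;
}

/* 100% line coverage, ~0% confidence: every branch executes,
 * but no result is ever checked. */
void test_coverage_only(void)
{
    (void)clamp_percent(150);
    (void)clamp_percent(-5);
    (void)clamp_percent(42);
}

/* Same coverage, real confidence: each branch's behavior is asserted. */
void test_with_checks(void)
{
    assert(clamp_percent(150) == 100);
    assert(clamp_percent(-5) == 0);
    assert(clamp_percent(42) == 42);
}
```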

Unit tests are great at increasing coverage. The code under test is isolated from the rest of the system, so you can get into all the dark corners. “Mocks” are used to represent adjacent parts of the system, usually in a simplistic way; for example, a small piece of code that simulates the response over a UART. This means you can’t be 100% confident the code will work the same way in the context of the full system.
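A sketch of what such a mock can look like in C, assuming the code under test reads its UART through a function pointer so a test can substitute canned bytes. `uart_read_fn`, `read_firmware_version()`, the `"VER:"` reply format, and `mock_uart_read()` are all hypothetical. The test shows the parser handles the bytes the mock returns, but says nothing about timing, framing errors, or what the real peripheral actually sends.

```c
#include <stddef.h>
#include <string.h>

/* The dependency the code under test needs: read up to `len` bytes,
 * return the number of bytes actually read. */
typedef size_t (*uart_read_fn)(char *buf, size_t len);

/* Code under test: parse a reply like "VER:1.2.3" into `out`.
 * Returns 0 on success, -1 on a malformed or oversized reply. */
int read_firmware_version(uart_read_fn uart_read, char *out, size_t out_len)
{
    char buf[32];
    size_t n = uart_read(buf, sizeof(buf) - 1);

    buf[n] = '\0';
    if (n < 4 || strncmp(buf, "VER:", 4) != 0) {
        return -1;
    }
    if (n - 4 >= out_len) {
        return -1; /* version string does not fit */
    }
    memcpy(out, buf + 4, n - 4 + 1); /* copy version plus terminator */
    return 0;
}

/* Mock UART: returns whatever canned reply the test configured. */
const char *mock_reply = "";

size_t mock_uart_read(char *buf, size_t len)
{
    size_t n = strlen(mock_reply);

    if (n > len) {
        n = len;
    }
    memcpy(buf, mock_reply, n);
    return n;
}
```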

HIL system tests are great at increasing confidence, as they demonstrate that the code is functional in the real world. However, testing at this level makes it difficult to cover the dark corners of the code, because the test is limited to the external interfaces of the system (i.e., the UART you mocked in the example above).

A test strategy involving both unit tests and HIL tests will give you high coverage and high confidence.

Continuously Verifying Golioth SDK Firmware

Why do we use HIL testing at Golioth?

We have the concept of Continuously Verified Boards, which are boards that get first-class support from us. We test these boards on every SDK release across all Golioth services.

Previously, these boards were manually tested each release, but there were several problems we encountered:

  • It was time-consuming
  • It was error-prone
  • It did not scale well as we added boards
  • OTA testing was difficult, with lots of clicking and manual steps involved
  • The SDK release process was painful
  • We were unsure whether there had been regressions between releases

We’ve alleviated all of these problems by introducing HIL tests that automatically run in pull requests and merges to the main branch. These are integrated into GitHub Actions (Continuous Integration, CI), and the HIL tests must pass for new code to merge to main.

Here’s the output of the HIL test running in CI on a commit that merged to main recently in the ESP-IDF SDK:

HIL Test Running in CI

This is just the first part of the test where the ESP32 firmware is being flashed.

Further down in the test, you can see we’re running unit tests on hardware:

Automated test output

And we even have an automated OTA test:

Automated OTA Test Output

As a developer of this firmware, I can tell you that I sleep much better at night knowing that these tests pass on the real hardware. In IoT devices, OTA firmware update is one of the most important features to get right (if not the most important), so testing it automatically has been a big win for us.

HIL on a Budget

Designing a prototype HIL system does not need to be a huge up-front investment.

At Golioth, one engineer was able to build a prototype HIL within one week, due primarily to our simple hardware configuration (each tested board just needs power, serial, and a WiFi connection) and GitHub Actions Self-Hosted Runners, which completely solve the infrastructure and DevOps part of the HIL system.

The prototype design looks something like this:

 

I’m using my personal Raspberry Pi as the self-hosted runner. GitHub makes it easy to register any kind of computer (Linux, macOS, or Windows) as a self-hosted runner.

And here’s a pic of my HIL setup, tucked away in a corner and rarely touched or thought about:

My DIY HIL Setup

Next Time

That’s it for part 1! We’ve covered the “why” of HIL testing at Golioth and a little bit of the “how”.

In the next part, we’ll dive deeper into the steps we took to create the prototype HIL system, the software involved, and some of the lessons learned along the way.

 

Thanks to Michael for the photo of Tiger and Turtle