This is a guest post from Chris Wilson discussing how the Golioth training inspired him to create a custom Zephyr board definition for the Adafruit MagTag board used in the training.

Back in November of 2022, I ran across a post from Chris Gammell announcing a free developer training that Golioth would be offering the following month. At the time, I had no previous experience working with Zephyr or the Golioth IoT platform, but this seemed like a good introduction to both–so I signed up!


The training offered by the Golioth team was really approachable, even for people like me without an extensive background in firmware development or real-time operating systems. The training starts with a basic introduction to building firmware in the underlying Zephyr RTOS and progresses through a series of examples that showcase the features of the Golioth SDK.

However, there was one aspect of the training that initially confused me: the training docs instruct you to build firmware for the ESP32-S2-Saola-1 board, but then run that firmware image on the Adafruit MagTag board.

For example, to build the firmware for the Golioth Demo application, the -b esp32s2_saola board argument is passed to the west build command:

west build -b esp32s2_saola app/golioth-demo

Why are we building firmware for a completely different board? 🤔

It turns out this works because:

  1. The ESP32-S2-Saola-1 board uses the exact same ESP32-S2 system-on-chip (SoC) as the Adafruit MagTag board, so firmware compiled for one board can run on the other.
  2. The Golioth training repo includes some additional Zephyr “overlay” files that modify the base board definition for the ESP32-S2-Saola-1 in Zephyr to work with the additional hardware features on the MagTag board.

This highlights one of the strengths of the underlying Zephyr RTOS: the ability to quickly extend or modify existing board definitions through the use of devicetree overlay files. Overlays make it possible to extend or modify an existing board definition to support new hardware variants, without having to go through the major effort of defining and upstreaming a brand new board definition to the Zephyr project.

This is great for getting something running quickly, but since these are totally different boards, it felt a bit awkward (and potentially confusing) to keep using the esp32s2_saola board name in the training demos. I thought:

Wouldn’t it be nice if we could use the adafruit_magtag board name in the Golioth demo apps without having to add it to the upstream Zephyr repo?

Fortunately, Zephyr’s flexibility provides us with an option: we can bundle a custom MagTag board definition alongside the training demo apps, without having to touch the upstream Zephyr repository!

In this article, I’ll walk through step-by-step how I added a new board definition for the Adafruit MagTag board in the Golioth magtag-demo repository. By the end of the article, we’ll be able to pass the adafruit_magtag board argument to west commands like this:

west build -b adafruit_magtag app/golioth-demo

Understanding “Boards” in Zephyr

Since we want to add support for a new physical board, we need to understand what a “Board” is in the Zephyr ecosystem.

Zephyr has a layered architecture that explicitly defines a “Board” entity that is distinct from other layers in the architecture like a “CPU” or a “SoC”.

Configuration Hierarchy image from https://docs.zephyrproject.org/latest/hardware/porting/board_porting.html

The Zephyr glossary defines a board this way:

A target system with a defined set of devices and capabilities, which can load and execute an application image. It may be an actual hardware system or a simulated system running under QEMU. The Zephyr kernel supports a variety of boards.

Zephyr already has support for the Xtensa CPU (zephyr/arch/xtensa/core/) and the ESP32-S2 SoC (zephyr/soc/xtensa/esp32s2/), so we don’t need to add anything new for these layers. The only thing we need to add is a new definition for the MagTag board itself.

Let’s dig into the Zephyr documentation to see how to add support for a new board.

Adding a new Board in Zephyr

Zephyr has extensive documentation on how to add support for new hardware (see Porting). For this article specifically, I referred to the Board Porting Guide that covers how to add support for a new board in Zephyr.

The board porting guide provides a generic overview of the porting process for a fake board named “plank”, while this article tries to “fill in the gaps” for some of the more specific questions I had while working on the definition for the Adafruit MagTag board. I find it’s helpful to walk through the end-to-end process for a real board, but because this article is tailored specifically for the MagTag board, it may not exhaustively cover every possible aspect of porting Zephyr to a new board.

Zephyr is flexible and it supports pulling in board definitions from multiple possible locations. Before we can dive in and start adding a new MagTag board definition, we need to understand where to locate the files so the Zephyr build system can find them. To do that, we need to take a quick step back to understand how west workspaces and manifest repositories work.

Understanding west workspaces and manifest repositories

Building a Zephyr-based firmware image requires pulling in source code for the bootloader, kernel, libraries, and application logic from multiple Git repositories (the Zephyr term for these individual Git repositories is projects). Managing these individual repos manually would be a nightmare! Thankfully, Zephyr provides a command line tool named west that automatically manages these Git repositories for us.

West manages all these dependencies inside a top-level directory called a workspace. Every west workspace contains exactly one manifest repository, which is a Git repository containing a manifest file. The manifest file (named west.yml by default) defines the Git repositories (projects) to be managed by west in the workspace.

West is flexible and supports multiple topologies for application development within a workspace (you can read about all the supported topologies here). The magtag-demo repo is structured as a variation of the T2: Star topology. This means the magtag-demo repo is the manifest repository inside the magtag-demo-workspace west workspace, and the zephyr repository is included as a dependency in the west manifest file (in our example we keep this in deps/zephyr).
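
For reference, a minimal west.yml for this kind of star topology might look like the sketch below; the revision and import settings are illustrative rather than copied from the actual magtag-demo manifest:

manifest:
  projects:
    - name: zephyr
      url: https://github.com/zephyrproject-rtos/zephyr
      revision: main               # illustrative; pin a specific release in practice
      import:
        path-prefix: deps          # clone zephyr (and the modules it imports) under deps/
  self:
    path: app                      # west clones the manifest repository itself as app/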

The workspace looks something like this (some folders redacted for clarity):

magtag-demo-workspace/                 # west workspace ("topdir")
├── .west/                             # marks the location of the west topdir
│   └── config                         # per-workspace local west configuration file
│
│   # The manifest repository, never modified by west after creation:
├── app/                               # magtag-demo.git repo cloned here as "app" by west
│   ├── golioth-demo/                  # Zephyr app for Golioth demo
│   │   └── boards/
│   │       ├── esp32s2_saola.conf     # app-specific software configuration
│   │       └── esp32s2_saola.overlay  # app-specific hardware configuration
│   └── west.yml                       # west manifest file
│
│   # Directories containing dependencies (git repos) managed by west:
└── deps/
    ├── bootloader/
    ├── modules/
    ├── tools/
    └── zephyr/
        └── boards/
            └── xtensa/
                └── esp32s2_saola/     # board definition for ESP32-S2-Saola-1

When we run the west build -b esp32s2_saola command, the Zephyr build system will look for a board named esp32s2_saola in a subdirectory of the zephyr/boards directory AND in a subdirectory of app/boards (if it exists). As you can see in the hierarchy above, the zephyr repo already includes the board definition for the ESP32-S2-Saola-1 board in the zephyr/boards/xtensa/esp32s2_saola/ directory, so this is the board definition that is pulled in when building the golioth-demo application.

However, if you look in the magtag-demo-workspace/app/golioth-demo/boards/ directory, you’ll notice files like esp32s2_saola.conf and esp32s2_saola.overlay that extend the esp32s2_saola board definition to enable additional software/hardware features on the MagTag board (LEDs, buttons, etc). I’ll cover the details of these files later on in this article, but for now, you just need to know that they allow application-specific modifications to the base esp32s2_saola board definition. The key takeaway here is that your Zephyr application can use and extend any existing board definition from the upstream zephyr repo.

So, to recap, if we want to add a new adafruit_magtag board definition for our app, there are two places where we could add it:

  1. In the upstream zephyr repository as boards/xtensa/adafruit_magtag
  2. In the magtag-demo repository as boards/xtensa/adafruit_magtag

If we add the board definition into the upstream zephyr repository, it would make the board definition available to anybody who uses Zephyr. That’s great! However, it can take a while for the Zephyr developers to review and approve a PR to add a new board definition. It is also required to add documentation for the board as part of the PR, which adds some additional overhead to the submission process.

In this article, we’re just going to add the custom board definition in the magtag-demo repo (as described here) so that we can bundle it alongside the training apps without waiting for it to go through the upstream submission process.

By the end of this article, we’ll end up creating the following new files:

magtag-demo-workspace/
└── app/
    ├── boards/
    │   └── xtensa/
    │       └── adafruit-magtag/
    │           ├── Kconfig.board
    │           ├── Kconfig.defconfig
    │           ├── adafruit_magtag-pinctrl.dtsi
    │           ├── adafruit_magtag.dts
    │           ├── adafruit_magtag_defconfig
    │           └── board.cmake
    ├── dts/
    │   └── bindings/
    │       └── gpios.yaml
    ├── golioth-demo/
    │   └── boards/
    │       ├── adafruit_magtag.conf
    │       └── adafruit_magtag.overlay
    └── zephyr/
        └── module.yml

Let’s take a look at each of these files in detail.

Create the new board directory

The first step is to create a new directory where we can add the files for the adafruit_magtag board definition:

magtag-demo-workspace/app/boards/xtensa/adafruit-magtag/

The directory name doesn’t need to match the board name; however, the board name itself must be unique. You can run west boards to get a list of the existing Zephyr board names.

Define the board hardware using Devicetree

In order to generate customized firmware for each supported board, Zephyr needs to have an understanding of each board’s hardware configuration. Rather than hard coding all the hardware details of each board into the operating system, Zephyr uses the Devicetree Specification to describe the hardware available on supported boards. Using devicetree, many aspects of the hardware can be described in a data structure that is passed to the operating system at boot time. Using this data structure, the firmware can get information about the underlying hardware through the standard devicetree.h API at runtime.
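
As a quick taste of that API (this snippet is mine, not part of the training code), an application can look up hardware described in the devicetree at compile time. The sketch below assumes the board defines an led0 alias, which the MagTag definition will do later in this article:

#include <zephyr/drivers/gpio.h>

/* Resolve the devicetree node behind the "led0" alias at compile time */
static const struct gpio_dt_spec led = GPIO_DT_SPEC_GET(DT_ALIAS(led0), gpios);

void blink_once(void)
{
    /* The port, pin number, and flags all come from the devicetree data */
    gpio_pin_configure_dt(&led, GPIO_OUTPUT_ACTIVE);
    gpio_pin_toggle_dt(&led);
}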

It’s easy to get overwhelmed when you first start trying to understand devicetree. Hang in there! You’ll soon see that the benefits of devicetree are worth the initial learning curve. If you’ve never worked with devicetree before, I would encourage you to spend some time reading the Introduction to devicetree in the Zephyr docs. If you prefer a video introduction, check out Marti Bolivar’s talk A deep dive into the Zephyr 2.5 device model from the 2021 Zephyr Developer’s Summit.

The devicetree data structure is essentially a hierarchy of nodes and properties. In practice, the hierarchy of nodes reflects the real-world hierarchy of the hardware, and the properties describe or configure the hardware each node represents.

There are four Devicetree files we need to provide as part of the board definition:

magtag-demo-workspace/
└── app/
    ├── boards/
    │   └── xtensa/
    │       └── adafruit-magtag/
    │           ├── adafruit_magtag-pinctrl.dtsi
    │           └── adafruit_magtag.dts
    ├── dts/
    │   └── bindings/
    │       └── gpios.yaml
    └── golioth-demo/
        └── boards/
            └── adafruit_magtag.overlay

adafruit_magtag-pinctrl.dtsi

Zephyr uses a system called Pin Control to map peripheral functions (UART, I2C, etc) to a specific set of pins. It’s common to put these pin definitions in a <board_name>-pinctrl.dtsi file and include that file in the main <board_name>.dts device tree source file for the board.

The Golioth magtag-demo uses UART0 for the serial console, I2C1 for the onboard LIS3DH accelerometer, SPIM2 for the WS2812 “neopixel” LEDs, and LEDC0 as the PWM controller for the red LED.

Here’s the pin mapping for these peripherals on the MagTag board:

UART0:

  • TX: GPIO43
  • RX: GPIO44

I2C1:

  • SDA: GPIO33
  • SCL: GPIO34

SPIM2:

  • MOSI: GPIO1
  • MISO: (not used)
  • SCLK: (not used)

To describe the hardware pin mapping, we need to create a devicetree include file:

magtag-demo-workspace/app/boards/xtensa/adafruit-magtag/adafruit_magtag-pinctrl.dtsi

First, we need to include a couple pin control header files for the ESP32-S2. These files contain macros that we’ll use in the pin control definitions:

#include <zephyr/dt-bindings/pinctrl/esp-pinctrl-common.h>
#include <dt-bindings/pinctrl/esp32s2-pinctrl.h>
#include <zephyr/dt-bindings/pinctrl/esp32s2-gpio-sigmap.h>
Although DTS has a /include/ "<filename>" syntax for including other files, the C preprocessor is run on all devicetree files, so includes are generally done with C-style #include <filename> instead.

Espressif also provides an ESP32-S2 devicetree include file (zephyr/dts/xtensa/espressif/esp32s2.dtsi) that contains a devicetree node for the pin controller called pin-controller with a node label named pinctrl:

pinctrl: pin-controller {
    compatible = "espressif,esp32-pinctrl";
    status = "okay";
};

We need to extend this node to include the missing pin configuration for the MagTag board. Zephyr provides a convenient shortcut to refer to existing devicetree nodes via the &node syntax (where node is the node label). In the adafruit_magtag-pinctrl.dtsi file, we’ll refer to this node as &pinctrl and extend it by providing additional properties:

&pinctrl {
    ...
};

Pin control has the concept of states, which can be used to set different pin configurations based on runtime operating conditions. Currently, two standard states are defined in Zephyr: default and sleep. For the Golioth magtag-demo we’re only going to define pin mappings for the default state.

Let’s define the default state mapping for the UART0 pins. We’ll define a node named uart0_default with a matching node label uart0_default. Since the RX pin requires an internal pull-up to be enabled on our board, we’ll define two groups: group1 and group2. Groups allow properties to be applied to multiple pins at once; here we use a separate group so the bias-pull-up property applies only to the RX pin. In each group, pins are declared by assigning one of the macro definitions from esp32s2-pinctrl.h to the pinmux property. For example, the UART0_TX_GPIO43 macro assigns GPIO43 to the UART0 peripheral as TX, and UART0_RX_GPIO44 assigns GPIO44 to the UART0 peripheral as RX:

&pinctrl {

    uart0_default: uart0_default {
        group1 {
            pinmux = <UART0_TX_GPIO43>;
        };
        group2 {
            pinmux = <UART0_RX_GPIO44>;
            bias-pull-up;
        };
    };
    
};

We can follow the same procedure to define additional pin mappings for the I2C1, SPIM2, and LEDC0 peripherals (you can see the complete pin control mapping file here).
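
To illustrate, here is a sketch of what the I2C1 default state could look like, following the same pattern. The macro names I2C1_SDA_GPIO33 and I2C1_SCL_GPIO34 are assumed to follow the same naming convention as the UART macros above; check esp32s2-pinctrl.h (or the complete file linked above) for the exact names and properties:

&pinctrl {
    i2c1_default: i2c1_default {
        group1 {
            /* SDA on GPIO33, SCL on GPIO34; macro names assumed, see note above */
            pinmux = <I2C1_SDA_GPIO33>,
                     <I2C1_SCL_GPIO34>;
            bias-pull-up;
            drive-open-drain;
            output-high;
        };
    };
};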

Now that we’ve got the pin control mappings defined, we can use them in the main adafruit_magtag.dts devicetree source file.

adafruit_magtag.dts

To describe the hardware available on the board, we need to create a devicetree source (DTS) file:

magtag-demo-workspace/app/boards/xtensa/adafruit-magtag/adafruit_magtag.dts

First, we add a line specifying the devicetree syntax version we’re going to be using in the file:

/dts-v1/;

Next, we include the ESP32-S2 SoC devicetree definitions provided by Espressif in zephyr/dts/xtensa/espressif/esp32s2.dtsi:

#include <espressif/esp32s2.dtsi>

This file defines the hardware available on the ESP32-S2 SoC such as the available CPUs, flash memory, WiFi, GPIOs, etc.

Note that many of the peripherals defined in this file are disabled by default (status = "disabled";). We’ll enable all the peripherals used on the MagTag board later.

Since the MagTag board has a PWM-capable LED, we also need to include the PWM device tree bindings header file so that we can use the PWM_HZ(x) macro:

#include <zephyr/dt-bindings/pwm/pwm.h>

Finally we include the Pin Control file we created earlier which defines the pin control mappings for the board:

#include "adafruit_magtag-pinctrl.dtsi"

Now we can define the actual device tree data structure for the MagTag board.

/ defines the root node for the board. The model property defines a human readable name for the board, and the compatible property can be used to match this node to a compatible devicetree binding file (you can think of bindings as a sort of schema for the nodes):

/ {
    model = "adafruit_magtag";
    compatible = "adafruit,magtag";
    
    ...

First, we’ll create a node for the GPIO-controlled LEDs on the MagTag board.

The LEDs on the MagTag board are connected to GPIO pins on the ESP32-S2, so we’ll look in the devicetree bindings index to see if there is already a binding that describes this hardware feature. There’s one called gpio-leds and the description says:

This allows you to define a group of LEDs. Each LED in the group is controlled by a GPIO. Each LED is defined in a child node of the gpio-leds node.

Perfect! That sounds exactly like what we want.

We’ll create a leds node for the MagTag based on the example provided in the binding file. The compatible property says that this node is compatible with the gpio-leds binding. Each individual LED is defined as a child node under leds. For example, led_0 is defined as pin 13 on gpio0, and is assigned the node label red_led. The GPIO_ACTIVE_HIGH flag means the LED is on when the pin is high, and off when the pin is low.

leds {
    compatible = "gpio-leds";
    red_led: led_0 {
        gpios =  <&gpio0 13 GPIO_ACTIVE_HIGH>;
    };
};

Right about now, you might be scratching your head wondering how the heck we knew what to put in the value for the gpios property (i.e. <&gpio0 13 GPIO_ACTIVE_HIGH>;).

Here’s how you figure it out:

The gpio-leds.yaml file defines the gpios property as type: phandle-array, so we know that the value for this property must be of the form <&phandle specifier1 specifier2 etc...>. We also know that the MagTag board has a RED LED connected to pin 13 of the GPIO0 controller, so we need to use the &gpio0 phandle to refer to the controller node. Let’s look up the gpio0 controller in zephyr/dts/xtensa/espressif/esp32s2.dtsi:

gpio0: gpio@3f404000 {
    compatible = "espressif,esp32-gpio";
    gpio-controller;
    #gpio-cells = <2>;
    reg = <0x3f404000 0x800>;
    interrupts = <GPIO_INTR_SOURCE>;
    interrupt-parent = <&intc>;
    ngpios = <32>;   /* 0..31 */
};

The #gpio-cells = <2>; property tells us that there are two specifiers required for the &gpio0 phandle. The compatible = "espressif,esp32-gpio"; property tells us the name of the binding that defines what those specifiers should be. Looking in zephyr/dts/bindings/gpio/espressif,esp32-gpio.yaml, we can see that it defines the specifiers required for gpio-cells:

gpio-cells:
  - pin
  - flags

Putting it all together, we can see that the property must be specified like this:

gpios = <&gpioX pin flags>;

which in this specific example is:

gpios = <&gpio0 13 GPIO_ACTIVE_HIGH>;

We can follow the same procedure to define additional nodes for the PWM LEDs and the buttons on the MagTag (using the pwm-leds and gpio-keys bindings respectively). You can see these nodes in the complete device tree source file here.
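
For reference, a gpio-keys node has the same shape as the leds node above. The sketch below is illustrative only; the pin number and flags are placeholders rather than the actual MagTag button wiring (see the linked file for the real values):

buttons {
    compatible = "gpio-keys";
    button0: button_0 {
        /* pin number and flags are placeholders */
        gpios = <&gpio0 15 (GPIO_PULL_UP | GPIO_ACTIVE_LOW)>;
    };
    /* button1 through button3 follow the same pattern */
};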

The MagTag board has a couple other GPIOs that are used to gate the neopixel power, control the ePaper display, and drive the speaker. Unfortunately, there aren’t any existing Zephyr bindings we can use to expose this hardware to the custom drivers in the magtag-demo repo, so we’ll create a simple gpios.yaml binding file that allows us to define groups of GPIOs:

magtag-demo-workspace/app/dts/bindings/gpios.yaml

The binding defines a single gpios property (similar to gpio-leds and gpio-keys):

description: |
  This allows you to define a group of GPIOs.
  
  Each GPIO is defined in a child node of the gpios node.

  Here is an example which defines three GPIOs in the node /brd-ctrl:

  / {
      brd-ctrl {
          compatible = "gpios";
          ctrl_0 {
              gpios = <&gpio0 1 GPIO_ACTIVE_LOW>;
          };
          ctrl_1 {
              gpios = <&gpio0 2 GPIO_ACTIVE_HIGH>;
          };
          ctrl_2 {
              gpios = <&gpio1 15 (GPIO_PULL_UP | GPIO_ACTIVE_LOW)>;
          };
      };
  };

compatible: "gpios"

child-binding:
    description: GPIO child node
    properties:
        gpios:
            type: phandle-array
            required: true

Now that we have a generic gpios binding, we can add the missing nodes for the remaining GPIOs.

Let’s create a speaker node that contains the GPIOs needed for the speaker on the MagTag board. In the same way we defined the LEDs above, we define two GPIOs, active to enable the speaker and sound to drive the speaker:

speaker {
    compatible = "gpios";
    active: active_pin {
        gpios = <&gpio0 16 GPIO_ACTIVE_HIGH>;
    };
    sound: sound_pin {
        gpios = <&gpio0 17 GPIO_ACTIVE_HIGH>;
    };
};

We can follow the same procedure to define additional nodes for the neopixel power and the e-paper display GPIOs (you can see these nodes in the complete device tree source file here).

Finally, we’ll create the special /aliases and /chosen nodes.

The /chosen node is used to define a set of commonly used Zephyr properties for system-wide settings, like the UART device used by the console driver or the default display controller. These properties refer to other nodes using their phandles (&node, where node is the node label):

chosen {
    zephyr,sram = &sram0;
    zephyr,console = &uart0;
    zephyr,shell-uart = &uart0;
    zephyr,flash = &flash0;
};

The /aliases node maps generic device names used by applications onto specific devices in the board’s devicetree. For example, the Blinky sample application requires an alias named led0 to be defined. We can build and run the Blinky app on any board that defines this alias, including the MagTag board, which defines the alias led0 = &red_led; to map led0 to the red LED:

aliases {
    watchdog0 = &wdt0;
    led0 = &red_led;
    pwm-led0 = &red_pwm_led;
    led-strip = &led_strip;
    sw0 = &button0;
    sw1 = &button1;
    sw2 = &button2;
    sw3 = &button3;
    neopower = &neopower;
    mosi = &mosi;
    sclk = &sclk;
    csel = &csel;
    busy = &busy;
    dc = &dc;
    rst = &rst;
    activate = &active;
    sound = &sound;
};

Now that we’ve finished creating new child nodes under the root node, we can start to customize the existing SoC nodes we included from espressif/esp32s2.dtsi. This is required to provide board-specific customizations, such as configuring the pins used for a SPI peripheral or specifying the devices present on an I2C bus. As I mentioned earlier, Zephyr provides a convenient shortcut to refer to existing nodes via the &node syntax (where node is the node label) so we don’t need to write out the full device tree path.

Let’s start by taking a look at the I2C1 controller node that is defined in zephyr/dts/xtensa/espressif/esp32s2.dtsi:

i2c1: i2c@3f427000 {
    compatible = "espressif,esp32-i2c";
    #address-cells = <1>;
    #size-cells = <0>;
    reg = <0x3f427000 0x1000>;
    interrupts = <I2C_EXT1_INTR_SOURCE>;
    interrupt-parent = <&intc>;
    clocks = <&rtc ESP32_I2C1_MODULE>;
    status = "disabled";
};

We can see that the I2C1 controller is disabled by default (status = "disabled";) and it’s missing some of the properties required by the espressif,esp32-i2c binding (for example, the pinctrl properties). In our adafruit_magtag.dts file, we can refer to the &i2c1 node and define the missing required properties:

&i2c1 {
    ...
};

The pinctrl-* properties assign the i2c1_default pin control state to the controller and give it the name "default". To enable the I2C1 controller, we override the status property by assigning status = "okay";. We also set the I2C clock frequency to I2C_BITRATE_STANDARD (100 kbit/s).

&i2c1 {
    pinctrl-0 = <&i2c1_default>;
    pinctrl-names = "default";
    status = "okay";
    clock-frequency = <I2C_BITRATE_STANDARD>;
};

The MagTag board has an onboard LIS3DH accelerometer on the I2C1 bus, so we also add a subnode lis3dh@19. In devicetree jargon, the @19 is called the unit address and it defines the “subnode’s address in the address space of its parent node” (which in this case is the accelerometer’s I2C address in the address space of possible I2C addresses). The compatible = "st,lis2dh"; property assigns the correct binding for the accelerometer so that the Zephyr sensor drivers can use it, and the reg = <0x19>; property sets the device’s I2C address on the bus.

&i2c1 {
    pinctrl-0 = <&i2c1_default>;
    pinctrl-names = "default";
    status = "okay";
    clock-frequency = <I2C_BITRATE_STANDARD>;

    lis3dh@19 {
        compatible = "st,lis2dh";
        reg = <0x19>;
    };
};

Some nodes, like &gpio0, don’t require any additional configuration, but are disabled by default. These nodes can be enabled simply by overriding the status property:

&gpio0 {
    status = "okay";
};

We can follow the same procedure to configure the remaining nodes for the ESP32-S2 SoC (you can see these nodes in the complete device tree source file here).

adafruit_magtag.overlay

In some cases, an application may need to extend or modify nodes in the board’s devicetree structure. Zephyr provides this flexibility through the use of a devicetree overlay file. The build system will automatically pick up the overlay file if it’s placed in the <app>/boards/ subdirectory and named <board_name>.overlay.

For example, let’s create an overlay for the golioth-demo app in the magtag-demo repo:

magtag-demo-workspace/app/golioth-demo/boards/adafruit_magtag.overlay

The &wifi node for the ESP32-S2 is disabled by default. The golioth-demo app needs Wi-Fi to be enabled so it can connect to the Golioth cloud, so we’ll enable it in the app overlay:

&wifi {
    status = "okay";
};

You can see the complete overlay file here.

Define the board software features using Kconfig

Before we can compile a firmware image for the board, we need to provide some configuration options that will allow us to control which software features are enabled when building for this board. Similar to the Linux kernel, Zephyr uses the Kconfig language to specify these configuration options.

For more details on how to use Kconfig to configure the Zephyr kernel and subsystems, see Configuration System (Kconfig) in the Zephyr docs.

There are four Kconfig files we need to provide as part of the board definition:

magtag-demo-workspace/
└── app/
    ├── boards/
    │   └── xtensa/
    │       └── adafruit-magtag/
    │           ├── Kconfig.board
    │           ├── Kconfig.defconfig
    │           └── adafruit_magtag_defconfig
    └── golioth-demo/
        └── boards/
            └── adafruit_magtag.conf

Kconfig.board

This file is included by boards/Kconfig to include your board in the list of available boards. We need to add a definition for the top-level BOARD_ADAFRUIT_MAGTAG Kconfig option. Note that this option should depend on the SOC_ESP32S2 Kconfig option which is defined in soc/xtensa/esp32s2/Kconfig.soc:

config BOARD_ADAFRUIT_MAGTAG
    bool "Adafruit MagTag board"
    depends on SOC_ESP32S2

Kconfig.defconfig

This file sets board-specific default values.

# Always set CONFIG_BOARD here. This isn't meant to be customized,
# but is set as a "default" due to Kconfig language restrictions.
config BOARD
    default "adafruit_magtag"
    depends on BOARD_ADAFRUIT_MAGTAG

The ENTROPY_GENERATOR Kconfig option enables the entropy drivers for the networking stack:

config ENTROPY_GENERATOR
    default y

adafruit_magtag_defconfig

This file is a Kconfig fragment that is merged as-is into the final .config in the build directory whenever an application is compiled for this board.

The CONFIG_XTENSA_RESET_VECTOR Kconfig option controls whether the initial reset vector code is built. On the ESP32-S2, the reset vector code is located in the mask ROM of the chip and cannot be modified, so this option is disabled:

CONFIG_XTENSA_RESET_VECTOR=n

Whenever we’re building an application for this board specifically, we want to ensure that the top-level Kconfig options for the SoC and the board itself are enabled:

CONFIG_BOARD_ADAFRUIT_MAGTAG=y
CONFIG_SOC_ESP32S2=y

Change the stack size of the main thread to 2048 bytes (the default is 1024):

CONFIG_MAIN_STACK_SIZE=2048

Set the system clock frequency to 240 MHz:

CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC=240000000

Zephyr is flexible and it supports emitting console messages to a wide variety of console “devices” beyond just a serial port. For example, it is possible to emit console messages to a RAM buffer, the semihosting console, the Segger RTT console, etc. As a result, we need to configure Zephyr to:

  1. Enable the console drivers (CONFIG_CONSOLE)
  2. Enable the serial drivers (CONFIG_SERIAL)
  3. Use UART for console (CONFIG_UART_CONSOLE)

CONFIG_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_UART_CONSOLE=y

The ESP32-S2 defines its own __start so we need to disable CONFIG_XTENSA_USE_CORE_CRT1:

CONFIG_XTENSA_USE_CORE_CRT1=n

Enable the GPIO drivers:

CONFIG_GPIO=y

The ESP32 platform uses the gen_isr_tables script to generate its interrupt service routine (ISR) tables. The reset vector code is located in the mask ROM of the ESP32 chip and cannot be modified, so an IRQ vector table does not need to be generated:

CONFIG_GEN_ISR_TABLES=y
CONFIG_GEN_IRQ_VECTOR_TABLE=n

Enable support for the hardware clock controller driver:

CONFIG_CLOCK_CONTROL=y

Configure the ESP-IDF bootloader to be built and flashed with our Zephyr application:

CONFIG_BOOTLOADER_ESP_IDF=y

Enable the SPI drivers for the WS2812 “neopixel” LEDs:

CONFIG_SPI=y
CONFIG_WS2812_STRIP_SPI=y

adafruit_magtag.conf

This file defines the application-specific configuration options.

For example, magtag-demo-workspace/app/golioth-demo/boards/adafruit_magtag.conf enables and configures the WiFi networking stack, including the Golioth utilities for easy WiFi setup:

CONFIG_WIFI=y
CONFIG_HEAP_MEM_POOL_SIZE=37760

CONFIG_NET_L2_ETHERNET=y

CONFIG_NET_DHCPV4=y

CONFIG_NET_CONFIG_LOG_LEVEL_DBG=y
CONFIG_NET_CONFIG_NEED_IPV4=y

CONFIG_MBEDTLS_ENTROPY_ENABLED=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDHE_ECDSA_ENABLED=y
CONFIG_MBEDTLS_ECP_ALL_ENABLED=y

CONFIG_ESP32_WIFI_STA_AUTO_DHCPV4=y

CONFIG_GOLIOTH_SAMPLE_WIFI=y

# when enabling NET_SHELL, the following
# helps to optimize memory footprint
CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM=8
CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM=8
CONFIG_ESP32_WIFI_DYNAMIC_TX_BUFFER_NUM=8
CONFIG_ESP32_WIFI_IRAM_OPT=n
CONFIG_ESP32_WIFI_RX_IRAM_OPT=n

Configure the build system

Before we can actually build and flash the firmware, we need to add a couple additional files for the Zephyr build system.

zephyr/module.yml

First, we need to create a module description file:

magtag-demo-workspace/app/zephyr/module.yml

This tells the build system where to find the new board and device tree source files we added above:

build:
    settings:
        board_root: .
        dts_root: .

board.cmake

In order to flash the firmware image onto the MagTag board, we need to add a CMake board file:

magtag-demo-workspace/app/boards/xtensa/adafruit-magtag/board.cmake

We can just copy the file that Espressif provided for the esp32s2_saola board (since the MagTag uses the same ESP32-S2 module). This file includes the generic CMake files for the ESP32 family and OpenOCD (making sure the correct OpenOCD from the Espressif toolchain is used):

if(NOT "${OPENOCD}" MATCHES "^${ESPRESSIF_TOOLCHAIN_PATH}/.*")
    set(OPENOCD OPENOCD-NOTFOUND)
endif()
find_program(OPENOCD openocd PATHS ${ESPRESSIF_TOOLCHAIN_PATH}/openocd-esp32/bin NO_DEFAULT_PATH)

include(${ZEPHYR_BASE}/boards/common/esp32.board.cmake)
include(${ZEPHYR_BASE}/boards/common/openocd.board.cmake)

Build the firmware

At this point, we should have everything we need for the new MagTag board definition. For example, we should be able to build the firmware for the golioth-demo app using the following command:

west build -b adafruit_magtag app/golioth-demo
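
Assuming the board.cmake runner setup above is picked up as expected, flashing should then work through the same west tooling:

west flash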

Next Steps

Hooray! We’ve successfully added a new board definition! 🎉

If you’d like to try out the Golioth demo apps yourself, you can take the self-paced training online for free at https://training.golioth.io/docs/intro

We can also provide private training for your company or group. Please contact us directly if interested.

Golioth returned to Embedded World in 2023, exhibiting at the Zephyr booth. We brought a range of designs built with Zephyr and connected to the Golioth Cloud. Each of our Reference Designs shows how Golioth technology can target verticals throughout the industry. We are regularly creating new designs and posting about them, both on this blog and on the Golioth Projects site.

Moving To Common Elements With Our 2023 Designs

Our 2023 designs used a more standardized form factor and shared more design elements than our demos at Embedded World 2022. Last year, we wanted to differentiate the functions and features of the Golioth Cloud when showcasing the “color demos”; each of those demonstrated a different part of our platform.

Golioth Embedded World 2022 Color Demos

This year was all about showcasing how similar many IoT designs can be. By extension, we wanted to show how we can swap some hardware and firmware to target entirely different market segments.

We built a new form factor that contains off-the-shelf hardware but still presents it in a somewhat compact manner. This took the form of the Aludel Mini case and PCB design, as well as our Ostentus front panel, both of which we have written about before. The result is a black box (har har) that allows us to target verticals. Our goal in the near future is to create additional firmware resources to make it easier for our users to replicate these designs using 100% off-the-shelf components.

Asset Tracker Port Side of Aludel Mini with Ostentus

Asset Tracker Connector Side of Aludel Mini with Ostentus

The 2022 designs had explanatory information and diagrams on the top PCB. This year we migrated to putting that information on laser-cut backing plates used as mounting surfaces for the actual Reference Designs. These allowed visitors to read more on their own, if they desired, and kept our Reference Designs smaller and more like what might be deployed to the field. See the images below for examples of backing plates.

Reference Design Demos

We brought 5 Reference Designs with us to Embedded World. In fact, we had more demos than we had space in the Zephyr booth to showcase them! Still, we tried our best to highlight each one for the people walking by our booth.

DC Power Monitor

This design was based on our AC Power Monitor Reference Design, which we have written about before. However, when thinking about logistics at a conference, we didn’t feel comfortable monitoring AC power in the booth. Instead, Mike took the design, swapped out the Click headers, and reworked some of the firmware to monitor USB power flowing through the design instead. In the video above, you can see that we monitored the current of a fan and a USB lightbulb and then were able to dynamically chart the power usage on our bespoke Grafana dashboard.

Air Quality Monitor

Our Air Quality Monitor Reference Design was so new for Embedded World that we had only just published it on our projects site. We will be doing a blog post and video about it soon. The main focus was capturing air quality data and displaying it both on the Ostentus display (front panel) and on the associated dashboard.

There are two interesting things that differentiate this design from the others. The first is a Remote Procedure Call (RPC) that directly activates the fan onboard to start a cleaning cycle. This is a great example of how RPCs can be used for one-off activities triggered programmatically on an “as needed” basis.

The second is the use of LightDB State to visualize and trigger warnings from the device. Note the red dotted line in the CO2 concentration chart. This is a level configured in LightDB State on a per-device basis. It could be used to trigger a local alarm (light, sound) or to trigger other notification/alarm activities on the cloud.

Cold Storage Asset Tracker

Last year we brought an asset tracker in the form of the Orange Demo, based on the Nordic Thingy:91. This year we upgraded to a more accurate GPS module that can run simultaneously with the cellular modem.

The unit tracks temperature for “cold storage” applications. This is a common use case for refrigerated trucks and shipping containers, as well as for tracking vaccines in transit between medical facilities. Demonstrating GPS inside a conference hall is a challenge, since the device is locked to one position underneath a bunch of metal girders, but we were able to showcase the underlying hardware and example paths recalled from historic trip data stored on Golioth.

IoT Trashcan Monitor

Our waste management solution is made to help municipalities and parks departments more efficiently route their diesel trash trucks. As in the rest of our demos, this became an exercise in scaling things down to fit on a tabletop in a conference facility. We achieved this by creating a portable (foldable) trashcan that we can set up on conference booth tables.

IoT Trashcan Monitor Demo on the right side

The miniature version of a trashcan helps to illustrate the usefulness of the Golioth Settings Service. The original demo had a trashcan that was roughly 1 meter tall, and all of the “percentage-full” levels were based on that height. Golioth makes it simple to select the individual device we brought to the show and adjust the height setting for a 300 mm tall trashcan. This “calibration” was instantly sent down to the device, and it reported levels in exactly the same way it had for the taller trashcan.

Soil Moisture Monitor

The Soil Moisture Monitor Reference Design measures soil moisture levels and the amount of light reaching the unit. During this conference it barely saw any light, since we ran out of room on the desk! You can find full details in the soil moisture monitor demo video and project page. We will have this and many new designs on display at the Embedded Open Source Summit in Prague in late June. Please be sure to stop by to see what we have been working on!

We’ll Train You To Build Your Own Zephyr Design

One of the things we were sure to point out in each of the example videos above is that we are running new training sessions showing people how to design with Zephyr. If you’d like to learn how to build your next design with the popular Open Source RTOS and Ecosystem, sign up at golioth.io/ew23.

The Nordic nRF9160 is a fantastic solution for cellular connectivity. All of the Golioth sample code runs on the nRF9160-DK without any configuration changes, so you can test out all of our features. But eventually you’re going to start scratching your head about how the LTE connection works. This is especially true because the device will do nothing while first establishing the connection, which can take over a minute depending on your network. The answer is to use the LTE link controller library.

Using Automatic LTE Control

The Nordic nRF Connect SDK (based on Zephyr) makes it really easy to automatically connect to LTE. Simply put, there’s a Kconfig symbol that causes the modem to connect to the cellular network when the device powers up:

CONFIG_LTE_AUTO_INIT_AND_CONNECT=y

While this takes care of the connection, it does so before main() begins running, and it blocks program execution until the network is connected. That can take more than a minute and depends on things like your distance to your closest tower and the strength of the signal in your office. Since you can’t write log messages, toggle LEDs, or write to a screen, it can look to a user like the device is stuck.

For battery-powered devices you want to carefully control when the radio is on and when it is off. In this case, automatic control is usually not an option.

Using Manual LTE Control

Nordic’s docs for the LTE link controller are fantastic; you should spend the time to read through them. We’ll discuss the most basic form of link control: manually establishing a connection.

To use the link controller, first select the library using Kconfig:

CONFIG_LTE_LINK_CONTROL=y

Then include the library in your C file:

#include <modem/lte_lc.h>

We can now start using the link controller functions. For me, the most interesting ones are the async functions. For instance:

int err;

//Initialize the modem and connect to the network (register a callback)
err = lte_lc_init_and_connect_async(lte_handler);

/* Do some things on the network */

//Place the modem in offline mode
err = lte_lc_offline();

/* Do some offline things */

//Modem already initialized, just reconnect again
err = lte_lc_connect_async(lte_handler);

Notice that both lte_lc_init_and_connect_async() and lte_lc_connect_async() register a callback. Nordic has a nice example of what that callback should look like:

/* Semaphore used to block the main thread until the link controller has
 * established an LTE connection.
 */
K_SEM_DEFINE(lte_connected, 0, 1);

static void lte_handler(const struct lte_lc_evt *const evt)
{
     switch (evt->type) {
     case LTE_LC_EVT_NW_REG_STATUS:
             if ((evt->nw_reg_status != LTE_LC_NW_REG_REGISTERED_HOME) &&
             (evt->nw_reg_status != LTE_LC_NW_REG_REGISTERED_ROAMING)) {
                     break;
             }

             printk("Connected to: %s network\n",
             evt->nw_reg_status == LTE_LC_NW_REG_REGISTERED_HOME ? "home" : "roaming");

             k_sem_give(&lte_connected);
             break;
     case LTE_LC_EVT_PSM_UPDATE:
     case LTE_LC_EVT_EDRX_UPDATE:
     case LTE_LC_EVT_RRC_UPDATE:
     case LTE_LC_EVT_CELL_UPDATE:
     case LTE_LC_EVT_LTE_MODE_UPDATE:
     case LTE_LC_EVT_TAU_PRE_WARNING:
     case LTE_LC_EVT_NEIGHBOR_CELL_MEAS:
     case LTE_LC_EVT_MODEM_SLEEP_EXIT_PRE_WARNING:
     case LTE_LC_EVT_MODEM_SLEEP_EXIT:
     case LTE_LC_EVT_MODEM_SLEEP_ENTER:
             /* Callback events carrying LTE link data */
             break;
     default:
             break;
     }
}

Also notice that they recommend using a semaphore. Because this runs asynchronously, checking this semaphore is a good way for the rest of your code to know if an LTE connection has been established.
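
To make that concrete, here’s a minimal sketch of a main thread blocking on the semaphore before doing any network work (the names match the callback example above; everything else is illustrative):

#include <zephyr/kernel.h>
#include <zephyr/sys/printk.h>
#include <modem/lte_lc.h>

void main(void)
{
    /* Kick off the asynchronous connection; lte_handler() will run as events arrive */
    int err = lte_lc_init_and_connect_async(lte_handler);
    if (err) {
        printk("Failed to start LTE connection, error: %d\n", err);
        return;
    }

    /* Block here until lte_handler() gives the semaphore */
    k_sem_take(&lte_connected, K_FOREVER);

    /* Safe to start network-dependent work from this point on */
}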

Considerations when Using Golioth with Manual Control

Golioth depends on a network connection. When you manually control the network, you should also take Golioth into consideration. You need to wait until LTE has been connected to start the Golioth System Client. A good place to do this is in the callback:

case LTE_LC_EVT_NW_REG_STATUS:
    if ((evt->nw_reg_status != LTE_LC_NW_REG_REGISTERED_HOME) &&
     (evt->nw_reg_status != LTE_LC_NW_REG_REGISTERED_ROAMING)) {
        break;
    }

    LOG_INF("Connected to LTE network. Starting Golioth System Client...");

    golioth_system_client_start();

    break;

Before going offline or putting the modem into a sleep mode it is recommended that you stop the Golioth client:

golioth_system_client_stop();

Calls to Golioth services (LightDB State, LightDB Stream, etc.) will cause errors if the client is not connected. You may choose simply to ignore the errors in the serial terminal; however, gating those function calls with a semaphore is another option.

For applications that utilize an intermittent network connection, we like using message queues to cache data. Timestamps may be added to each reading so that the data is properly recorded the next time a connection is available. We have previously discussed using Zephyr message queues for this purpose.
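
As a rough sketch of that pattern (the struct layout and queue depth here are arbitrary, not taken from a Golioth sample):

#include <zephyr/kernel.h>
#include <zephyr/sys/printk.h>

struct cached_reading {
    int64_t timestamp;    /* e.g. from k_uptime_get() */
    int32_t value;
};

/* Queue holding up to 64 readings, 4-byte aligned */
K_MSGQ_DEFINE(reading_q, sizeof(struct cached_reading), 64, 4);

static void cache_reading(int32_t value)
{
    struct cached_reading r = {
        .timestamp = k_uptime_get(),
        .value = value,
    };

    /* Drop the sample rather than blocking the sensor loop if the queue is full */
    if (k_msgq_put(&reading_q, &r, K_NO_WAIT) != 0) {
        printk("Reading cache full, dropping sample\n");
    }
}

When the connection comes back, a sending thread can drain the queue with k_msgq_get() and push each timestamped reading up to Golioth.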

Conclusion

Automatic LTE control is great for trying out demo code. However, we think in most applications you’ll want to decide when and how to use the modem. Luckily, for this particular SIP, Nordic has made the control library really easy to use.

Do you have questions about cellular modem control with your IoT fleets? We’d love to hear from you! Open a new thread on the Golioth Forum, or set up a video call with our Developer Relations crew to discuss your use case.

In a best case scenario, once an IoT fleet is deployed, you never need to (physically) touch those devices again. Golioth gives you tools to work with your devices remotely and make this a reality. Today, we’ll look at dynamically modifying the number of logs being sent back to the Cloud. This allows fleet managers to peek into individual devices without wasting data and battery power by always sending every log message back to Golioth.

Logging to the cloud is already built into Golioth, so it’s really just a matter of tuning how many logs are being sent by your devices. Golioth hooks into the Zephyr RTOS Logging service, which we’ll be showcasing here today.

Background on Remote Logging with Zephyr

The Golioth Zephyr SDK has remote logging built in, and in our sample applications (like the hello sample) it is enabled by default in the prj.conf files:

CONFIG_LOG_BACKEND_GOLIOTH=y
CONFIG_LOG_PROCESS_THREAD_STACK_SIZE=2048

At the top of each C file you need to register for logging. This is also a good place to set the default logging level, which I’ll refer to as the “compiled-in” logging level.

#include <zephyr/logging/log.h>
LOG_MODULE_REGISTER(remote_logging_example, LOG_LEVEL_DBG);

This compiled-in level is an important decision because the preprocessor will not include logging calls if they have a higher value than this parameter. For instance, if you set the default to LOG_LEVEL_ERR, you cannot remotely turn on debugging messages because LOG_LEVEL_DBG is a higher logging level. Here is the hierarchy of logging levels in Zephyr:

#define LOG_LEVEL_NONE 0U
#define LOG_LEVEL_ERR  1U
#define LOG_LEVEL_WRN  2U
#define LOG_LEVEL_INF  3U
#define LOG_LEVEL_DBG  4U

In our case, the solution is to use LOG_LEVEL_DBG when compiling, and programmatically set the value to a lower level at run time. This will make your binary a bit larger, since the strings for every logging message will be included, but it delivers the option to turn on debugging messages after deployment, which in most cases is worth the extra bytes of flash.

Write a Function to Set Log Levels at Run Time

Now we need a function that, when called, will automatically adjust the logging level for every logging source on the device.

#include <zephyr/logging/log_ctrl.h>

void change_logging_level(int log_level) {
    int source_id = 0;
    char *source_name;
    while(1) {
        source_name = (char *)log_source_name_get(0, source_id);
        if (source_name == NULL) {
            break;
        } else {
            LOG_WRN("Settings %s log level to: %d", source_name, log_level);
            log_filter_set(NULL, 0, source_id, log_level);
            ++source_id;
        }
    }
}

This function uses Zephyr’s logging control API to query the name of each logging source available on the system. When the name comes back NULL we know we’ve run out of sources; otherwise we use the source ID to set the new logging level for that source.

If you are putting cellular devices into the field, you probably don’t want to have logging turned on very high (or at all) by default since you’ll be paying for bandwidth. The first thing you can do at run time is call this function:

change_logging_level(LOG_LEVEL_ERR);

Now, only error messages will be logged to the Golioth cloud.

Setting Log Level Remotely

There are two obvious ways you can go about setting log levels remotely: the Golioth Remote Procedure Call (RPC) service, and the Golioth Settings service. Since the most likely use for this is turning on logs for a single device (and not fleetwide all at the same time), using a remote procedure call makes the most sense to me.

// Remote procedure call (RPC) callback for setting the logging levels
static enum golioth_rpc_status on_set_log_level(QCBORDecodeContext *request_params_array,
                       QCBOREncodeContext *response_detail_map,
                       void *callback_arg)
{
    double a;
    uint32_t log_level;
    QCBORError qerr;

    QCBORDecode_GetDouble(request_params_array, &a);
    qerr = QCBORDecode_GetError(request_params_array);
    if (qerr != QCBOR_SUCCESS) {
        LOG_ERR("Failed to decode array item: %d (%s)", qerr, qcbor_err_to_str(qerr));
        return GOLIOTH_RPC_INVALID_ARGUMENT;
    }

    /* Validate before casting: log_level is unsigned, so checking it
     * against 0 after the cast would never catch a negative request.
     */
    if ((a < 0) || (a > LOG_LEVEL_DBG)) {
        LOG_ERR("Requested log level is out of bounds: %d", (int)a);
        return GOLIOTH_RPC_INVALID_ARGUMENT;
    }

    log_level = (uint32_t)a;

    change_logging_level(log_level);
    return GOLIOTH_RPC_OK;
}

// Register RPC to listen for "set_log_level"
// This should be called from your "on_connect()" callback
golioth_rpc_register(rpc_client, "set_log_level", on_set_log_level, NULL);

There are two parts to the code above. The first is a callback that will run when a remote procedure call (RPC) instruction is received from the Golioth servers. It gets the incoming log level as a parameter, validates it, and then runs the function we discussed in the previous section to change the log levels.

The second part of the code is the act of registering the RPC. This tells the Golioth servers that this device wants to be notified whenever an RPC named "set_log_level" is issued from the Golioth web console or via the Golioth REST API.

Golioth RPC for setting logging levels

The Golioth web console can be used to send an RPC, or you may do so using the Golioth REST API.

Here’s an example of using the web console interface to submit this RPC. I sent a request to change to log level 3 (LOG_LEVEL_INF), and upon success/failure we get a notification message back. This RPC was successful, and took 913 milliseconds for the device to receive the message, execute it, and report the results.

[00:00:37.736,663] <inf> app_work: Sending hello! 2
[00:01:07.738,800] <inf> app_work: Sending hello! 3
[00:01:36.947,418] <wrn> app_rpc: Settings golioth_dfu log level to: 3
--- 12 messages dropped ---
[00:01:36.947,448] <wrn> app_rpc: Settings golioth_rd_template log level to: 3
[00:01:36.947,479] <wrn> app_rpc: Settings golioth_samples log level to: 3
[00:01:36.947,479] <wrn> app_rpc: Settings golioth_system log level to: 3
[00:01:36.947,509] <wrn> app_rpc: Settings lightdb log level to: 3
[00:01:36.947,540] <wrn> app_rpc: Settings log log level to: 3
[00:01:36.947,570] <wrn> app_rpc: Settings lte_lc log level to: 3
[00:01:36.947,601] <wrn> app_rpc: Settings lte_lc_helpers log level to: 3
[00:01:36.947,631] <wrn> app_rpc: Settings mcuboot_util log level to: 3
[00:01:36.947,662] <wrn> app_rpc: Settings modem_antenna log level to: 3
[00:01:36.947,692] <wrn> app_rpc: Settings mpu log level to: 3
[00:01:36.947,723] <wrn> app_rpc: Settings net_buf log level to: 3
[00:01:36.947,753] <wrn> app_rpc: Settings net_coap log level to: 3
[00:01:36.947,753] <wrn> app_rpc: Settings net_core log level to: 3
[00:01:36.947,784] <wrn> app_rpc: Settings net_if log level to: 3
[00:01:36.947,814] <wrn> app_rpc: Settings net_shell log level to: 3
[00:01:36.947,845] <wrn> app_rpc: Settings net_sock log level to: 3
[00:01:36.947,875] <wrn> app_rpc: Settings net_sock_addr log level to: 3
[00:01:36.947,906] <wrn> app_rpc: Settings net_sock_tls log level to: 3
[00:01:36.947,937] <wrn> app_rpc: Settings net_sock_wrapper log level to: 3
[00:01:36.947,967] <wrn> app_rpc: Settings net_socket_offload log level to: 3
[00:01:36.947,998] <wrn> app_rpc: Settings net_utils log level to: 3
[00:01:36.948,028] <wrn> app_rpc: Settings nrf_modem log level to: 3
[00:01:36.948,059] <wrn> app_rpc: Settings os log level to: 3
[00:01:36.948,089] <wrn> app_rpc: Settings pm log level to: 3
[00:01:36.948,120] <wrn> app_rpc: Settings settings log level to: 3
[00:01:36.948,150] <wrn> app_rpc: Settings shell.shell_uart log level to: 3
[00:01:36.948,181] <wrn> app_rpc: Settings shell_uart log level to: 3
[00:01:36.948,211] <wrn> app_rpc: Settings soc log level to: 3
[00:01:36.948,242] <wrn> app_rpc: Settings stream log level to: 3
[00:01:36.948,272] <wrn> app_rpc: Settings uart_nrfx_uarte log level to: 3
[00:01:37.741,027] <inf> app_work: Sending hello! 4
[00:02:07.743,041] <inf> app_work: Sending hello! 5
uart:~$ 

On the device side, we can see the output of our RPC on a serial terminal (above). There are a lot of logging sources running on this device and they have all been individually set to level 3.

Remember, if the compiled-in level for any given source has been set lower (to only show errors, or to show no logging), setting a higher number at run time will not return additional messages because higher-level messages were not included in the build.

There and Back Again

What does it take to troubleshoot an IoT device in the field? If designed correctly, it will not take a physical visit to the device, but merely a few remote communications. Ideally, you will turn on debugging, analyze the issues you’re having, and then send a command to adjust accordingly.

But even if you haven’t planned very far ahead, with Golioth you can still enable these features. We recommend that every device you put in the field have Golioth Over-the-Air (OTA) firmware updates enabled. That way, you can send these remote-logging features to your devices, even if they are already in the field.

Do you have questions or suggestions on adjusting logging levels remotely? We’d love to hear from you on the Golioth Forum!

One of my first engineering jobs was in the Test and Measurement space. Tracing everything back to standards and calibration is a key part of the process. It takes a long time and is taken very seriously. I had many learning experiences that reinforced the importance of calibration (despite my tongue-in-cheek article title).

IoT devices often don’t have the luxury of being as accurate: the cost of sensors, the power usage of analog measurements, the “awake” windows for battery-powered devices… all of these things contribute to different priorities when measuring the physical world. However, if you have a precise (repeatable) sensor, you can utilize the trend of the data in a useful way.

If you ARE going to calibrate, it’s normally done in a sensor driver. For certain sensors, Zephyr excels at this: there are in-tree sensor drivers that return standardized values. You can even pull calibrated and normalized readings from a sensor in real time using Zephyr’s sensor shell. I consistently use sensors like the BME280 and the LIS2DH in my reference designs since it is really easy to add the readings to the pile of data I send back to Golioth.
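
For example, a minimal sketch of reading a BME280 through the Zephyr sensor API (assuming the sensor node is enabled in your devicetree) looks like this:

#include <zephyr/device.h>
#include <zephyr/drivers/sensor.h>
#include <zephyr/sys/printk.h>

/* Grab the (single) BME280 instance described in the devicetree */
static const struct device *bme280 = DEVICE_DT_GET_ONE(bosch_bme280);

static void read_temperature(void)
{
    struct sensor_value temp;

    sensor_sample_fetch(bme280);
    sensor_channel_get(bme280, SENSOR_CHAN_AMBIENT_TEMP, &temp);

    /* The driver returns a calibrated value split into integer and fractional parts */
    printk("Temperature: %d.%06d C\n", temp.val1, temp.val2);
}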

What happens when a sensor driver is not in-tree and I don’t want to go about writing my own? I pull the raw readings to the cloud and work with them there!

Prototyping, not production

When you are trying to prove your IoT system can work or test out a business idea, you don’t always start by making production ready designs. But you can still make useful designs by pulling raw data readings.

I did this recently with an I2C sensor that provides raw ADC readings. The “counts” that I gathered from the ADC are tied to a physical phenomenon (in this case, soil moisture), but they are not absolute values. Instead, I gather the raw ADC counts directly and publish them to Golioth’s LightDB Stream service.
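
Here’s a sketch of that kind of raw read; the devicetree node label, register address, and byte order are placeholders for whatever your particular sensor uses:

#include <zephyr/drivers/i2c.h>

/* "moisture_sensor" is a placeholder devicetree node label for the I2C device */
static const struct i2c_dt_spec sensor = I2C_DT_SPEC_GET(DT_NODELABEL(moisture_sensor));

static int read_raw_counts(uint16_t *counts)
{
    uint8_t reg = 0x00;    /* placeholder register that holds the ADC result */
    uint8_t buf[2];
    int err = i2c_write_read_dt(&sensor, &reg, sizeof(reg), buf, sizeof(buf));

    if (err == 0) {
        /* Raw counts, no calibration applied; interpretation happens on the cloud */
        *counts = (buf[0] << 8) | buf[1];
    }
    return err;
}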

Below, I will discuss three ways you can interact with this data using other Golioth services and interactively produce even more useful data. These include tweaking threshold settings, calibrating via remote procedure call (RPC), and using OTA to upload new machine learning sets.

Set “thresholds” with Settings

One of my favorite things to use the Golioth Settings Service for is interacting with data and creating an extra layer of intelligence. For the IoT Trashcan Monitor reference design, I do this to set “thresholds” on each device in the field. There is a time-of-flight sensor that measures the distance from the top of the trashcan to the top of the garbage inside it. What happens if you want to utilize the same sensor on different sized trashcans? What if you want to set the “100%” level differently, say if one part of a national park needs to have a cleaner look than another?

You could keep this intelligence on the Cloud, but then you don’t get the benefit of the device reporting its various levels, so it’s harder to read from a terminal or in the logs. Instead, I push levels I create on the Settings Service down to each device to define different thresholds.

I also utilize these levels coming back from each trashcan to trigger icons on our Grafana dashboards, which makes it even easier to tell which devices require intervention.

The above is just the trashcan example; there are loads of other cases where you might want field-settable values, whether from the installer or a technician. In the case of the soil moisture sensor, I want to be able to calibrate “wet” and “dry” (per the simplistic calibration instructions) and then do some interpolation in between.
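Here’s a hedged sketch of what consuming a threshold setting can look like on-device. The callback signature and enum names follow the Golioth Zephyr SDK settings sample from around this release and may differ in your SDK version; the setting name is made up for illustration:

#include <string.h>
#include <net/golioth/settings.h>

static int32_t full_threshold_mm = 150; /* default until the cloud overrides it */

static enum golioth_settings_status on_setting(const char *key,
                        const struct golioth_settings_value *value)
{
    /* Hypothetical setting name used for this sketch */
    if (strcmp(key, "TRASH_FULL_THRESHOLD_MM") == 0) {
        if (value->type != GOLIOTH_SETTINGS_VALUE_TYPE_INT64) {
            return GOLIOTH_SETTINGS_VALUE_FORMAT_NOT_VALID;
        }
        full_threshold_mm = (int32_t)value->i64;
        return GOLIOTH_SETTINGS_SUCCESS;
    }

    return GOLIOTH_SETTINGS_KEY_NOT_RECOGNIZED;
}

/* Register once the client is connected:
 * golioth_settings_register_callback(client, on_setting);
 * Each distance reading can then be compared against full_threshold_mm
 * before deciding whether to flag the trashcan as full.
 */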

On the fly modifications with Remote Procedure Calls

Most of the time when using raw data from a sensor, it is done in a “device to cloud” context. You take the reading from the sensor and ship it off to be dealt with by larger computers (AKA “the cloud”). However, some applications will include the need for some kind of feedback from the cloud computing element. You could argue this is what we were doing above, since the Settings Service is pushing data down to the device.

Sometimes you want to be able to inject data into your data measurement and management process on the device, which is a perfect use case for Remote Procedure Calls (RPCs). One way to think of RPCs is if you were accessing a function between two different parts of your code. You put the function prototype in your code.h file… except now you can access that function from the cloud. So maybe I don’t want to do a full calibration on the device, but it is beneficial to set an offset for something like an i2c-based thermocouple measurement chip like the MCP96L01T. I could easily pipe raw data from the chip up to the cloud, but I might want to change a setting for the resolution of that data or the cold junction compensation temperature.

Registers on the MCP960X family of parts. You could write a function to change these values with raw i2c writes to the chip and have the function be accessible via a Golioth RPC.

I could have functions on the device, called during startup, that set these values, like set_thermocouple_resolution() and set_cold_junction_temp_c(). Under the covers they would be simple i2c writes to the device to set registers with bit-masked values. However, I could also expose these to the cloud using an RPC. When I call that RPC from the REST API or the Golioth Console, I also pass a value (in this case a new resolution setting or cold junction temp); it gets validated on the device as an acceptable value, and then a success message is sent back to the cloud once it’s executed. Add logging messages to the device-side code, and you should be able to easily see that the device has successfully switched modes and that the raw data being piped back is now different. (The change means a higher resolution or a different cold junction compensation.)
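As a rough sketch of what that device-side helper could look like (the node label, register address, and bit masks below are placeholders, not values from the MCP960X datasheet), an RPC handler would validate the requested value and then call something like this:

#include <zephyr/drivers/i2c.h>

/* Placeholder register address and bit positions -- consult the MCP960X
 * datasheet before using.
 */
#define MCP96L01T_REG_DEVICE_CONFIG 0x06
#define ADC_RES_MASK                0x60
#define ADC_RES_SHIFT               5

/* Hypothetical devicetree node label for the thermocouple chip */
static const struct i2c_dt_spec tc_chip = I2C_DT_SPEC_GET(DT_NODELABEL(mcp96l01t));

int set_thermocouple_resolution(uint8_t resolution_setting)
{
    uint8_t reg;

    /* Read-modify-write so the other configuration bits are left intact */
    int err = i2c_reg_read_byte_dt(&tc_chip, MCP96L01T_REG_DEVICE_CONFIG, &reg);
    if (err) {
        return err;
    }

    reg = (reg & ~ADC_RES_MASK) | ((resolution_setting << ADC_RES_SHIFT) & ADC_RES_MASK);

    return i2c_reg_write_byte_dt(&tc_chip, MCP96L01T_REG_DEVICE_CONFIG, reg);
}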

Machine Learning + data capture + OTA

The ultimate (and most trendy) way to deal with sensor data is to not care at all about the sensor output. Instead, capture and correlate data with desired behaviors; collect data with a “known good” and a “known bad” state of your machine, for instance, to allow the model to discern between the two.

Golioth has the tools to enhance this method of working with data, on a local machine or from afar. First, you can capture general data using LightDB Stream, sending back your raw data readings from your sensor. If desired, you could also link a button, switch, or other input on the device to correlate when you are performing “Action A” that is different from “Action B”. Next, you capture and ingest that data into a machine learning framework like PyTorch or a TinyML toolchain. Finally, when you create or revise your model and build it into your design, you can upload that new firmware using the Golioth Over-The-Air (OTA) service. Over time, you can continue to refine the model and even combine it with the other methods mentioned above.

Start prototyping today!

One thing I hope you get out of reading this article is the prototyping capabilities available to you when you use Golioth. While I benefit from the Zephyr driver ecosystem (as well as driver ecosystems from other SDKs that we support), I don’t want to feel limited when I want to try out a new chip that isn’t in-tree yet. I feel comfortable doing things like raw i2c and SPI read/write functions, and now can make that data available to the Cloud for even faster prototyping. If you need help prototyping your next IoT device, stop by our Forum to discuss or send an email to [email protected].

Cellular IoT products struggle with battery life. Getting and staying connected to a cell tower takes a good amount of energy. Though we’re past the days of GSM drawing amps of current (!), it is still costly to open sessions to the tower. Understanding how your device is communicating with the Cloud is crucial to building robust devices that will last years in the field.

This webinar will include Golioth team members, alongside Jared Wolff of CircuitDojo. Jared is an early adopter of Golioth and a Golioth Ambassador, as well as an advocate for Zephyr devices. Golioth utilizes Jared’s nRF9160 Feather board designs in all of our current cellular-based Reference Designs.

What you can expect from this Webinar

First and foremost, we hope to make this a more dynamic and interactive session than many technical webinars. (No one will be reading PowerPoint slides in a monotone voice here!) The session will cover:

  • Understanding your connection to cellular towers
  • Understanding your power draw when in a passive or sleep mode
  • Measurement challenges for embedded cellular projects
  • Methods for saving data (and power) when connecting to the cloud
  • Building robust device health metrics for your fleet
  • System architecture decisions for lower power circuit boards

Sign up now!

This webinar is at 1 pm EST / 10 am PST on January 18th, 2023. If you can’t make it the day of, you can still sign up to access the on-demand content. Those who attend will have an opportunity to ask questions towards the end of the presentation.

Luis Ubieda is the Lead Firmware Engineer at Croxel. He has a background in Electrical Engineering and is passionate about Electronics, Embedded Systems, and IoT Technology.

Embedded systems are riddled with complexity, mainly because they sit at the intersection of multiple areas of expertise. Problems are often a mixture of software, electrical, and mechanical issues. This is the case even for seemingly simple tasks, such as reliably detecting a button-press.

“Wait — a button-press??”

Let’s think for a second, how a button press is detected:

  1. Initial State: a button-press circuit typically consists of a “normally-open switch” which, through a pull-up/down resistor, is normally “high” or normally “low”: this is the initial state.
  2. Press Event: then, when the button is pressed, the switch is closed and the signal transitions to the opposite state (e.g., low for a normally “high” state), and it is sustained for as long as the button is held by the user.
  3. Release Event: finally, when the button is released, it goes back to its initial state.

Image: Signal during a button-press/release sequence where the transitions are outlined and the scoped signal shows noise. Source: GeeksforGeeks.

In an ideal world, we could just look at the signal edges to keep track of the transitions and assume a falling edge is a “press event” and a rising edge is the “release event”. In our world, these transitions are affected by electrical transients caused by the mechanical properties of the button actuator. The suppression of this noise is commonly called “debouncing”.

“Ok, I get it. How can we `debounce` button-presses?”

There are two main ways we can approach this: the hardware-way and the software-way. Today I’ll touch on both, and detail the technique I prefer to use when debouncing button input with the Zephyr RTOS.

The Hardware Way

The hardware-way focuses on the root cause: the electrical noise. It guarantees the digital signal won’t have such noise during transitions; and it does it through the use of low-pass filters (most probably RC-filters). There are some pretty cool articles that detail this approach (see references at the end of the article).

The Software Way

On the other hand, the software-way is about “ignoring” these false positives on the signal transitions to determine which ones are the real events we’re looking for, and which ones aren’t. Even though there are many ways to implement software debouncing, there are two main approaches, depending on whether the variable of interest is the signal state or the transitions: periodic sampling and tracking edge interrupts.

A. Periodic Button State Sampling

Button sampling works by periodically acquiring the signal state, which is buffered in a continuously rolling sample set. By detecting consecutive matching states, the signal change event is detected (either pressed or released). The rule is simple: if there are X consecutive samples with a changed state, we assume the transition really happened. This periodic sampling is often on the order of 10 to 25 ms and is commonly paced by a hardware timer to guarantee fixed intervals and to reduce CPU usage.

Image: diagram showing a series of 1 and 0 signals sampled to detect a button press
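For comparison, here is a minimal sketch of approach A in Zephyr. It paces the sampling with a self-rescheduling delayable work item rather than a hardware timer, and the sw0 alias, sample count, and 10 ms interval are just example choices:

#include <zephyr/kernel.h>
#include <zephyr/drivers/gpio.h>

#define DEBOUNCE_SAMPLES 4

static const struct gpio_dt_spec sampled_button =
    GPIO_DT_SPEC_GET(DT_ALIAS(sw0), gpios);

static void sample_work_handler(struct k_work *work)
{
    static int consecutive;
    static int stable_state = -1; /* unknown until the first stable reading */
    int val = gpio_pin_get_dt(&sampled_button);

    if (val != stable_state) {
        if (++consecutive >= DEBOUNCE_SAMPLES) {
            stable_state = val;
            consecutive = 0;
            /* Debounced transition: notify the rest of the application here */
        }
    } else {
        consecutive = 0;
    }

    /* Keep sampling every 10 ms */
    k_work_reschedule(k_work_delayable_from_work(work), K_MSEC(10));
}

static K_WORK_DELAYABLE_DEFINE(sample_work, sample_work_handler);

/* After configuring the pin as an input, start sampling with:
 * k_work_schedule(&sample_work, K_MSEC(10));
 */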

B. Tracking Edge-Interrupts with Minimum Cooldown

This works by coordinating the detection of edge changes with the spacing between them: a minimum quiet period must elapse before a signal transition is treated as legitimate. This cooldown phase is commonly implemented through a timer, which kicks off on the edge interrupts: the timer gets restarted on each transition, and only when it expires (after 10 ms to 25 ms with no edge changes) does the firmware handle the transition as an event.

Image: diagram showing the ignored edges of a button press signal

Software Approach B: Tracking Edge-Interrupts with Minimum Cooldown

In multi-threaded systems, we can leverage RTOS primitives to make the code more modular while simplifying the logic: a thread (or workqueue) plus synchronization primitives can manage the state transitions and decide when to notify the user of the module that an event occurred.

Example Code – Zephyr RTOS

The following code presents an example of implementing software approach B on Zephyr, with the following observations:

  • We’re using the Zephyr GPIO interrupt APIs to keep track of the edge changes.
  • We’re using the system workqueue as the cooldown mechanism for false positives.
  • Our button-detection module abstracts both of these details and only notifies the user of the relevant events: pressed or released.
  • Note: the user callback runs in the workqueue handler context, so actions in this context should be kept brief to allow proper functioning of other parts of the system.
/* button.h */
#ifndef _BUTTON_H_
#define _BUTTON_H_
 
enum button_evt {
   BUTTON_EVT_PRESSED,
   BUTTON_EVT_RELEASED
};
 
typedef void (*button_event_handler_t)(enum button_evt evt);
 
int button_init(button_event_handler_t handler);
 
#endif /* _BUTTON_H_ */
/* button.c */
#include <zephyr/kernel.h>
#include <zephyr/drivers/gpio.h>
#include "button.h"
 
#define SW0_NODE    DT_ALIAS(sw0)
 
static const struct gpio_dt_spec button = GPIO_DT_SPEC_GET(SW0_NODE, gpios);
static struct gpio_callback button_cb_data;
 
static button_event_handler_t user_cb;
 
/* Runs in the system workqueue once the cooldown expires: the signal has
 * settled, so read the stable pin level and notify the user of the
 * debounced event.
 */
static void cooldown_expired(struct k_work *work)
{
   ARG_UNUSED(work);

   int val = gpio_pin_get_dt(&button);
   enum button_evt evt = val ? BUTTON_EVT_PRESSED : BUTTON_EVT_RELEASED;
   if (user_cb) {
       user_cb(evt);
   }
}

static K_WORK_DELAYABLE_DEFINE(cooldown_work, cooldown_expired);

/* GPIO interrupt callback: every edge (real or bounce) pushes the cooldown
 * out by another 15 ms, so cooldown_expired() only runs after the signal
 * has been quiet for that long.
 */
void button_pressed(const struct device *dev, struct gpio_callback *cb,
           uint32_t pins)
{
   k_work_reschedule(&cooldown_work, K_MSEC(15));
}
 
int button_init(button_event_handler_t handler)
{
   int err = -1;
 
   if (!handler) {
       return -EINVAL;
   }
 
   user_cb = handler;
 
   if (!device_is_ready(button.port)) {
       return -EIO;
   }
 
   err = gpio_pin_configure_dt(&button, GPIO_INPUT);
   if (err) {
       return err;
   }
 
   err = gpio_pin_interrupt_configure_dt(&button, GPIO_INT_EDGE_BOTH);
   if (err) {
       return err;
   }
 
   gpio_init_callback(&button_cb_data, button_pressed, BIT(button.pin));
   err = gpio_add_callback(button.port, &button_cb_data);
   if (err) {
       return err;
   }
 
   return 0;
}

Check out the working sample code at https://github.com/ubieda/zephyr_button_debouncing

Conclusion

The most important part of debouncing inputs is to understand how far you should go and which approach best suits you. Cost sensitivity tends to favor software debouncing, whereas less CPU usage favors offloading it to the hardware approach. Like any engineering problem, there are 1000 ways to solve it: always favor the simplest (yet effective) solution that works for you.

References

As members of and contributors to the Zephyr Project, we keep an eye on new developments. A recently published feature is of particular interest because it represents a new way to structure programs and the messaging between different parts of your program.

ZBus (Zephyr Bus) is a recently merged feature in the Zephyr Project, which brings a standardized version of event-driven architecture in the form of a publish/subscribe model inside your program. The lead author, Rodrigo Peixoto, spoke with Golioth about the details of this new feature and how it might help Golioth users make more responsive, flexible programs. In the associated video, Rodrigo walks through the history and capabilities of ZBus.

Why should you consider an event-driven bus architecture?

The decision to take on a new system architecture is not something to do willy-nilly. It’s important to understand where an event-driven architecture is a good fit.

The first thing I think about is scalability. When you have a bus architecture, adding an additional “listener” can be done with much less work.

Consider the alternative to event-driven architecture. When you want to add a new action (say, some code that initiates a sensor reading), you need to call that new code from your trigger event (say, a timer expiring). That means you need to know where the trigger is located in the code and make changes there to add the call to your new task. Once you add in the required testing, each additional feature can become burdensome. This scales poorly.

With an event-driven system, the trigger is already set up to publish an event. New tasks can be added that listen for the event. You don’t need to change any trigger code; you don’t even need to know where that code lives. This continues to work well as the number of events and tasks grows, which ultimately depends on how large you expect your system to become.

Flexibility is another consideration. An event-driven bus allows the system to be easily adapted to handle new events or changes in the environment. This means that the system can be easily updated and modified without having to completely rewrite large swaths of code. Another type of “flexibility” is how and where you can re-use your code. This makes it easier to develop and test your system, as well as to troubleshoot and fix any problems that may arise.

Finally, if your device needs to meet critical timing, an event-driven system will not only deal with higher levels of complexity, but also respond quickly to new inputs to the system, such as external events. For example, an embedded system might be designed to control a robot, and it would use event-driven architecture to respond to sensor data from the robot’s environment and control its movements accordingly.

How ZBus implements an event-driven architecture

In ZBus, there are “producers” that generate messages and “consumers” that act upon them. There are also “filters” that help to process raw data (such as a sensor output). Each of these is organized into different “channels” to allow listening on a particular lane of data being produced.

Source: Rodrigo’s ZBus presentation (click for link)

Each of these is used in typical scenarios such as listening for a timer, taking a reading from a sensor, and then alerting other parts of the program that the data is now available. A common scenario is shown below:
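Here’s a hedged sketch of that pattern, modeled on the upstream ZBus samples (the API was brand new at the time, so macro details may differ between Zephyr releases). A producer publishes a sensor message onto a channel and a listener reacts to it, without the producer needing to know who is subscribed:

#include <zephyr/kernel.h>
#include <zephyr/zbus/zbus.h>

struct sensor_msg {
    int32_t temperature_mc; /* millidegrees Celsius */
};

/* Channel definition: name, message type, validator, user data, observers,
 * and initial value (requires CONFIG_ZBUS=y).
 */
ZBUS_CHAN_DEFINE(sensor_chan,
                 struct sensor_msg,
                 NULL,                           /* no validator */
                 NULL,                           /* no user data */
                 ZBUS_OBSERVERS(cloud_listener), /* consumers of this channel */
                 ZBUS_MSG_INIT(.temperature_mc = 0));

/* Listener callback: runs whenever a new message lands on the channel */
static void cloud_listener_cb(const struct zbus_channel *chan)
{
    const struct sensor_msg *msg = zbus_chan_const_msg(chan);

    printk("New reading: %d mC\n", msg->temperature_mc);
}
ZBUS_LISTENER_DEFINE(cloud_listener, cloud_listener_cb);

/* Producer side: publish a new reading without knowing who is listening */
void publish_reading(int32_t temperature_mc)
{
    struct sensor_msg msg = { .temperature_mc = temperature_mc };

    zbus_chan_pub(&sensor_chan, &msg, K_MSEC(250));
}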

How will you use ZBus?

Microcontrollers are used in increasingly complex scenarios and are being asked to do more and more. Connecting a low-power device to the internet often requires higher levels of complexity that Zephyr helps with. We expect to see more devices using ecosystems and RTOSes like Zephyr in the future, with ZBus implemented on highly complex devices.

Are you looking at using an event-driven architecture in your system? Let us know on our forum and tell us how we can help!

A key mission at Golioth is to make it easier for hardware and firmware developers to connect devices to the internet. We do that in two ways:

  1. Providing easy-to-use APIs and SDKs for IoT devices to connect to Golioth Cloud endpoints.
  2. Training developers how to use the device-side code.

We have done many successful training sessions so far, showing individuals and companies how to connect their first devices. Along the way, we have learned that there is a large unfulfilled need in the market for training in the IoT space. So we’re doing it again! We’ll show people how to connect devices and get access to things like:

  • Secure Over-The-Air (OTA) updates to constrained devices
  • Command and control over remote devices
  • Creating and modifying settings for remote devices
  • Implementing data tracking from your device

We will be running our first training open to the public on December 14th, 2022. Read more below if you’d like to take part.

Training challenges

Once again we’ll be training developers from afar. We did this back in October for a select group of hardware engineers looking to learn more about Zephyr:

Click to learn more about our experience back in October of 2022

The upcoming training will be built upon the lessons we learned during that training, and our last in-person training at the Hackaday Superconference. In both cases, we used Kasm to provide fully remote development environments so that users don’t need to install anything on their local machine (there are directions on how to do that after the training is over). We think this is an important piece to ensure people can get started quickly.

How to use Zephyr

We currently offer three SDKs as part of our device support, including an ESP-IDF SDK, a ModusToolbox SDK, and a Zephyr SDK (including the nRF Connect SDK variant). These SDKs cover a wide range of embedded hardware from different vendors.

The training includes some segments that detail how to use Zephyr, a Real Time Operating System (RTOS) that covers a wide range of different hardware platforms. We use it on many of our internal hardware reference designs at Golioth, and it was the first platform we launched. Hardware and firmware engineers who are new to Real Time Operating Systems will continue their learning journey by understanding how the RTOS connects to sensors and low-level GPIO and how to manipulate different elements of the subsystems. Once a trainee understands how to get the data off of an external component (like a sensor), the Golioth Zephyr SDK makes it a simple task to forward that data along to the Golioth Cloud.

Requirements and background

We have referred to “Hardware and Firmware Engineers” in this article, because we expect that intermediate to expert level engineers will get the most out of this training. If you are brand new to understanding C or if you have never tried programming embedded hardware before, this might be a frustrating experience. If you would like some pointers to starter content that might prepare you for the training, please ask on our Forum and we will try to get you a customized list of resources that will help prepare you for future versions of this training.

Logistics

  • We are not charging for this training
  • We will be capping the training at 30 people
  • Attendance will be on a first-come, first-served basis
  • Those who are accepted for this training will receive an email with more details
  • You will be expected to purchase your own hardware
    • Details will be sent with your acceptance to this training
    • Be sure you leave enough time for shipping from your local distributor
  • Signing up to take part and not attending will disqualify you from future training

Sign up here



The Golioth Zephyr SDK is now 100% easier to use. The new v0.4.0 release was tagged on Wednesday and delivers all of the Golioth features with less coding work needed to use them in your application. Highlights include:

  • Asynchronous function/callback declaration is greatly simplified
    • User-defined data can now be passed to callbacks
  • Synchronous versions of each function were added
  • API is now CoAP agnostic (reply handling happens transparently)
  • User code no longer needs to register an on_message() callback
  • Verified with the latest Zephyr v3.2.0 and NCS v2.1.0

The release brings with it many other improvements that make your applications better, even without knowing about them. For the curious, check out the changelog for full details.

Update your code for the new API (or not)

The new API includes some breaking changes. To deal with this, you have two main options:

  1. Update legacy code to use the new API
  2. Lock your legacy code to a previous Golioth version

1. Updating legacy code

The Golioth Docs have been updated for the new API, and reading through the Firmware section will give you a great handle on how everything works. The source of truth continues to be the Golioth Zephyr SDK reference (Doxygen).

Updating to the new API is not difficult. I’ve just finished working on that for a number of our hardware demos, including the magtag-demo repository we use for our Developer Training sessions. The structure of your program will remain largely the same, with Golioth function names and parameters being the most noticeable change.

Consider the following code that uses the new API to perform an asynchronous get operation for LightDB State data:

/* The callback function */
static int counter_handler(struct golioth_req_rsp *rsp)
{
    if (rsp->err) {
        LOG_ERR("Failed to receive counter value: %d", rsp->err);
        return rsp->err;
    }

    LOG_HEXDUMP_INF(rsp->data, rsp->len, "Counter (async)");

    return 0;
}

/* Register the LightDB Get callback from somewhere in your code */
static int my_function(void)
{
    int err;
    err = golioth_lightdb_get_cb(client, "counter",
                                 GOLIOTH_CONTENT_FORMAT_APP_JSON,
                                 counter_handler, NULL);
    return err;
}

Previously, the application code would have needed to allocate a coap_reply, pass it as a parameter in the get function call, use the on_message callback to process the reply, then unpack the payload in the reply callback before acting on it. All of that busy work is gone now!

With the move to the v0.4.0 API, we don’t need to worry about anything other than:

  • Registering the callback function
  • Working with the data (or an error message) when we hear back from Golioth.

You can see the response struct makes the data itself, the length of the data, and the error message available in a very straightforward way.

A keen eye already noticed the NULL as the final parameter. This is a void * type that lets you pass your user-defined data to the callback. Any value that’s 4 bytes or smaller can be passed directly, or you can pass a pointer to a struct packed with information. Just be mindful of the lifetime of the memory you pass.
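For example, here’s a hedged sketch of passing a pointer to a context struct instead of NULL. It assumes the response struct exposes the pointer as rsp->user_data, so double-check the SDK headers for the exact field name:

struct counter_ctx {
    const char *label;
    int base;
};

/* Must outlive the request, hence static storage */
static struct counter_ctx ctx = { .label = "main loop", .base = 100 };

static int counter_handler_with_ctx(struct golioth_req_rsp *rsp)
{
    /* Assumption: the user data pointer comes back as rsp->user_data */
    struct counter_ctx *c = rsp->user_data;

    if (rsp->err) {
        LOG_ERR("Failed to receive counter value: %d", rsp->err);
        return rsp->err;
    }

    LOG_INF("Counter for %s (base %d):", c->label, c->base);
    LOG_HEXDUMP_INF(rsp->data, rsp->len, "Counter (async)");

    return 0;
}

/* Register the callback, passing the context pointer in place of NULL */
err = golioth_lightdb_get_cb(client, "counter",
                             GOLIOTH_CONTENT_FORMAT_APP_JSON,
                             counter_handler_with_ctx, &ctx);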

All of the asynchronous API function calls follow this same pattern for callbacks and requests. The synchronous calls are equally simple to understand. I found the Golioth sample applications to be a great blueprint for updating the API calls in my application code. The changelog also mentions each API-altering commit which you may find useful for additional migration info.

The Golioth Forum is a great place to ask questions and share your tips and tricks when getting to know the new syntax.

2. Locking older projects to an earlier Golioth

While we recommend updating your applications, you do have the option to continue using an older version of the Golioth SDK instead. For that, we recommend using a west manifest to lock your project to a specific version of Golioth.

Manifest files specify the repository URL and the tag/hash/branch that should be checked out. That version is used when running west update, which then pulls in the version of Zephyr and all supporting modules specified in the Golioth SDK manifest so that everything builds the project in peace and harmony.

By adding a manifest to your project that references the Golioth Zephyr SDK v0.3.1 (the latest stable version before the API change) you can ensure your application will build in the future without the need to change your code. Please see our forum thread on using a west manifest to set up a “standalone” repository for more information.

A friendlier interface improves your Zephyr experience

Version 0.4.0 of the Golioth Zephyr SDK greatly improves the ease-of-use when adding device management features to your IoT applications. Device credentials and a few easy-to-use APIs are all it takes to build in data handling, command and control, and Over-the-Air updates into your device firmware. With Golioth’s Dev Tier your first 50 devices are free so you can start today, and as always, get in touch with us if you have any questions along the way!