Tag Archive for: zephyr

This is the second part of How Golioth uses Hardware-in-the-Loop (HIL) Testing. In the first part, I covered what HIL testing is, why we use it at Golioth, and provided a high-level view of the system.

In this post, I’ll talk about each step we took to set up our HIL. It’s my belief that HIL testing should be a part of any firmware testing strategy, and my goal with this post is to show you that it’s not so difficult to get a minimum viable HIL set up within one week.

The Big Pieces

To recap, this is the system block diagram presented at the end of part 1:

The Raspberry Pi acts as a GitHub Actions Self-Hosted Runner which communicates with microcontroller devices via USB-to-serial adapters.

The major software pieces required are:

  • Firmware with Built-In-Test Capabilities – Runs tests on the device (e.g. ESP32-S3-DevKitC), and reports pass/fail for each test over serial.
  • Python Test Runner Script – Initiates tests and checks serial output from device. Runs on the self-hosted runner (e.g. Raspberry Pi).
  • GitHub Actions Workflow – Initiates the python test runner and reports results to GitHub. Runs on the self-hosted runner.

Firmware Built-In-Test

To test our SDK, we use a special build of firmware which exposes commands over a shell/CLI interface. There is a command named built_in_test which runs a suite of tests in firmware:

esp32> built_in_test
...
12 Tests 0 Failures 0 Ignored 
../main/app_main.c:371:test_connects_to_wifi:PASS
../main/app_main.c:378:test_golioth_client_create:PASS
../main/app_main.c:379:test_connects_to_golioth:PASS
../main/app_main.c:381:test_lightdb_set_get_sync:PASS
../main/app_main.c:382:test_lightdb_set_get_async:PASS
../main/app_main.c:383:test_lightdb_observation:PASS
../main/app_main.c:384:test_golioth_client_heap_usage:PASS
../main/app_main.c:385:test_request_dropped_if_client_not_running:PASS
../main/app_main.c:386:test_lightdb_error_if_path_not_found:PASS
../main/app_main.c:387:test_request_timeout_if_packets_dropped:PASS
../main/app_main.c:388:test_client_task_stack_min_remaining:PASS
../main/app_main.c:389:test_client_destroy_and_no_memory_leaks:PASS

These tests cover major functionality of the SDK, including connecting to Golioth servers, and sending/receiving CoAP messages. There are also a couple of tests to verify that stack and heap usage are within allowed limits, and that there are no detected memory leaks.

Here’s an example of one of the tests that runs on the device, test_connects_to_golioth:

#include "unity.h"
#include "golioth.h"

static golioth_client_t _client;
static SemaphoreHandle_t _connected_sem;

static void on_client_event(golioth_client_t client, 
                            golioth_client_event_t event, 
                            void* arg) {
    if (event == GOLIOTH_CLIENT_EVENT_CONNECTED) {
        xSemaphoreGive(_connected_sem);
    }
}

static void test_connects_to_golioth(void) {
    TEST_ASSERT_NOT_NULL(_client);
    TEST_ASSERT_EQUAL(GOLIOTH_OK, golioth_client_start(_client));
    TEST_ASSERT_EQUAL(pdTRUE, xSemaphoreTake(_connected_sem, 10000 / portTICK_PERIOD_MS));
}

This test attempts to connect to Golioth servers. When the client connects, the event callback gives a semaphore; the test waits on that semaphore for up to 10 seconds and fails if it is never given. For such a simple test, it covers a large portion of our SDK, including the network stack (WiFi/IP/UDP), security (DTLS), and CoAP Rx/Tx.

We use the Unity test framework for basic test execution and assertion macros.
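
To make that concrete, here is a rough sketch of how a shell command handler can drive a Unity suite. This is my own illustration of the wiring, not the actual Golioth test code (which is linked below):

#include "unity.h"

// Illustrative handler behind the built_in_test shell command.
static int run_built_in_tests(void) {
    UNITY_BEGIN();                       // start a Unity test run
    RUN_TEST(test_connects_to_golioth);  // records the PASS/FAIL lines shown above
    // ... RUN_TEST() for each of the remaining tests in the output above ...
    return UNITY_END();                  // prints the "12 Tests 0 Failures" summary
}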

If you’re curious about the details, you can see the full test code here: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/test/main/app_main.c

Python Test Runner

Another key piece of software is a Python script that runs on the self-hosted runner and communicates with the device over serial. We call this script verify.py. In this script, we provision the device with credentials, issue the built_in_test command, and verify the serial output to make sure all tests pass.

Here’s the main function of the script:

def main():
    port = sys.argv[1]

    # Connect to the device over serial and use the shell CLI to interact and run tests
    ser = serial.Serial(port, 115200, timeout=1)

    # Set WiFi and Golioth credentials over device shell CLI
    set_credentials(ser)
    reset(ser)

    # Run built in tests on the device and check output
    num_test_failures = run_built_in_tests(ser)
    reset(ser)

    if num_test_failures == 0:
        run_ota_test(ser)

    sys.exit(num_test_failures)

This opens the serial port and calls various helper functions to set credentials, reset the board, run built-in-tests, and run an OTA test. The exit code is the number of failed tests, so 0 means all passed.

The script spends most of its time in this helper function which looks for a string in a line of serial text:

def wait_for_str_in_line(ser, str, timeout_s=20, log=True):
    start_time = time()
    while True:
        line = ser.readline().decode('utf-8', errors='replace').replace("\r\n", "")
        if line != "" and log:
            print(line)
        if "CPU halted" in line:
            raise RuntimeError(line)
        if time() - start_time > timeout_s:
            raise RuntimeError('Timeout')
        if str in line:
            return line

The script will continually read lines from serial for up to timeout_s seconds until it finds what it’s looking for. And if “CPU halted” ever appears in the output, that means the board crashed, so a runtime error is raised.

This verify.py script is what gets invoked by the GitHub Actions Workflow file.

GitHub Actions Workflow

The workflow file is a YAML file that defines what will be executed by GitHub Actions. If one of the steps in the workflow returns a non-zero exit code, then the workflow will fail, which results in a red ❌ in CI.

Our workflow is divided into two jobs:

  • Build the firmware and upload build artifacts. This is executed on cloud servers, since building the firmware on a Raspberry Pi would take a prohibitively long time (adding precious minutes to CI time).
  • Download firmware build, flash the device, and run verify.py. This is executed on the self-hosted runner (my Raspberry Pi).

The workflow file is created in .github/workflows/test_esp32s3.yml. Here are the important parts of the file:

name: Test on Hardware

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  build_for_hw_test:
    runs-on: ubuntu-latest
    steps:
    ...
    - name: Build test project
      uses: espressif/esp-idf-ci-action@v1
      with:
        esp_idf_version: v4.4.1
        target: esp32s3
        path: 'examples/test'
    ...

  hw_flash_and_test:
    needs: build_for_hw_test
    runs-on: [self-hosted, has_esp32s3]
    ...
    - name: Flash and Verify Serial Output
      run: |
        cd examples/test
        python flash.py $CI_ESP32_PORT && python verify.py $CI_ESP32_PORT

The first job runs in a normal GitHub runner based on Ubuntu. For building the test firmware, Espressif provides a handy CI Action to do the heavy lifting.

The second job has a needs: build_for_hw_test requirement, meaning it must wait until the first job completes (i.e. firmware is built) before it can run.

The second job has two further requirements – it must run on self-hosted and there must be a label has_esp32s3. GitHub will dispatch the job to the first available self-hosted runner that meets the requirements.

If the workflow completes all jobs, and every step returns a zero exit code, then everything has passed, and you get the green CI checkmark ✅.

Self-Hosted Runner Setup

With all of the major software pieces in place (FW test framework, Python runner script, and Actions workflow), the next step is to set up a machine as a self-hosted runner. Any machine capable of running one of the big 3 operating systems will do (Linux, Windows, MacOS).

It helps if the machine is always on and connected to the Internet, though that is not a requirement. GitHub will simply not dispatch jobs to any self-hosted runners that are not online.

Install System Dependencies

There are a few system dependencies that need to be installed first. On my Raspberry Pi, I had to install these:

sudo apt install \
    python3 python3-pip python3-dev python-dev \
    libffi-dev cmake direnv wget gcovr clang-format \
    libssl-dev git
pip3 install pyserial

Next, log into GitHub and create a new self-hosted runner for your repo (Settings -> Actions -> Runners -> New self-hosted runner):

When you add a new self-hosted runner, GitHub will provide some very simple instructions to install the Actions service on your machine:

When I ran the config.sh line, I added --name and --unattended arguments:

./config.sh --url https://github.com/golioth/golioth-firmware-sdk --token XXXXXXXXXXXXXXXXXXXXXXXXXXXXX --name nicks-raspberry-pi --unattended

Next, install the service, which ensures GitHub can communicate with the self-hosted runner even if it reboots:

cd actions-runner
sudo ./svc.sh install

If there are any special environment variables needed when executing workflows, these can be added to runsvc.sh. For instance, I defined a variable for the serial port of the ESP32:

# insert anything to setup env when running as a service
export CI_ESP32_PORT=/dev/ttyUSB0

Finally, you can start the service:

sudo ./svc.sh start

And now you should see your self-hosted runner as “Idle” in GitHub Actions, meaning it is ready to accept jobs:

I added two custom labels to my runner: has_esp32s3 and has_nrf52840dk, to indicate that I have those two dev boards attached to my self-hosted runner. The workflow file uses these labels so that jobs get dispatched to self-hosted runners only if they have the required board.

And that’s it! With those steps, a basic HIL is up and running, integrated into CI, and covering a large amount of firmware code on real hardware.

Security

There’s an important note from GitHub regarding security of self-hosted runners:

Warning: We recommend that you only use self-hosted runners with private repositories. This is because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

I also stand by that recommendation. But it’s not always feasible if you’re maintaining an open-source project. Our SDKs are public, so this warning certainly applies to us.

If you’re not careful, you could give everyone on the Internet root access to your machine! It wouldn’t be very hard for someone to submit a pull request that modifies the workflow file to run whatever arbitrary Linux commands they want.

We mitigate most of the security risks with three actions.

1. Require approval to run workflows for all outside contributors.

This means someone from the Golioth organization must click a button to run the workflow. This allows the repo maintainer time to review any workflow changes before allowing the workflow to run.

This option can be enabled in Settings -> Actions -> General:

Once enabled, pull requests will have a new “Approve and Run” button:

This is not ideal if you’re running a large open-source project with lots of outside contributors, but it is a reasonable compromise to make during early stages of a project.

2. Avoid logging sensitive information

The log output of any GitHub Actions run in a public repository is publicly visible. If there is any sensitive information printed out in the workflow steps or from the serial output of the device, then this information could easily fall into the wrong hands.

For our purposes, we disabled logging WiFi and Golioth credentials.

3. Utilize GitHub Actions Secrets

You can use GitHub Actions Secrets to store sensitive information that is required by the runner. In our repo, we define a secret named `GOLIOTH_API_KEY` that allows us access to the Golioth API, which is used to deploy OTA images of built firmware during the workflow run.

If you want to know more about self-hosted runner security, there are several resources online that I found helpful.

Tips and Tricks

Here are a few extra ideas to utilize the full power of GitHub Actions Self-Hosted Runners (SHRs):

  • Use the same self-hosted runner in multiple repos. You can define SHRs at the organization level and allow their use in more than one repo. We do this with our SHRs so they can be used to run tests in the Golioth Zephyr SDK repo and the Golioth ESP-IDF SDK repo.
  • Use labels to control workflow dispatch. Custom labels can be applied to a runner to control when and how a workflow gets dispatched. As mentioned previously, we use these to enforce that the SHR has the dev board required by the test. We have a label for each dev board, and if the admin of the SHR machine has those dev boards attached via USB, they add a label for each one. That machine will then only receive workflows it is able to run. If a board is acting up or requires maintenance, the admin can simply disconnect the board and remove the label to prevent further workflows from executing.
  • Distributed self-hosted runners to load balance and minimize CI bottlenecks. If there is only one self-hosted runner, it can become a bottleneck if there are lots of jobs to run. Also, if that runner loses Internet, then CI is blocked indefinitely. To balance the load and create more robust HIL infrastructure, it’s recommended to utilize multiple self-hosted runners. Convince your co-workers to take their unused Intel NUC off the shelf and follow the steps in the “Self-Hosted Runner Setup” section to add another runner to the pool. This allows for HIL tests to run in a distributed way, no matter where in the world you are.

Here’s a scenario that might sound familiar.

You’ve added a new feature to the firmware, about 800 lines of code, along with some simple unit tests. All unit tests are passing. The code coverage report is higher than ever. You pat yourself on the back for a job well done.

But then – for a brief moment, conviction wavers as a voice in your head whispers: “The mocks aren’t real”; “Does this code really work?”; “You forgot to test OTA.”

But the moment passes, and your faith is restored as you admire the elegant green CI check mark ✅.

Do you trust your unit tests enough to deploy the firmware right now to thousands of real devices via Over-the-Air (OTA) firmware update?

If the answer is “well, no, not really, we still need to test a few things with the real hardware”, then automated Hardware-in-the-Loop (HIL) testing might give you the confidence to upgrade that answer to “heck yeah, let’s ship it!”. 🚀

In this post, I’ll explain what HIL testing is, and why we use it at Golioth to continuously verify the firmware for our Zephyr and ESP-IDF SDKs.

HIL Testing: A Definition

If you search online, you’ll find a lot of definitions of what HIL testing is, but I’ll try to formulate it in my own simple terms.

There are three major types of tests:

  • Unit – the smallest kind of test, often covering just one function in isolation
  • Integration – testing multiple functions or multiple subsystems together
  • System – testing the entire system, fully integrated

A firmware test can run in three places:

  • On your host machine – Running a binary compiled for your system
  • On emulated target hardware – Something like QEMU or Renode
  • On real target hardware – Running cross-compiled code for the target, like an ARM or RISC-V processor.

A HIL test is a system test that runs on real target hardware, with real peripherals, and real connections. Typically, a HIL setup will consist of:

  • A board running your firmware (the code under test)
  • A host test machine, executing the tests
  • (optional) Other peripherals and test equipment to simulate real-world conditions, controlled by the host test machine

Coverage and Confidence

Why bother with HIL tests? Aren’t unit tests good enough?

To address this, I’ll introduce two terms:

  • Coverage: How much of the firmware is covered by tests?
  • Confidence: How confident are you that the firmware behaves as intended?

Coverage is easily measured by utilizing open-source tools like gcov. It’s an essential part of any test strategy. Confidence is not as easily measured. It’s often a gut feeling.

It’s possible to have 100% coverage and 0% confidence. You can write tests that technically execute every line of code and branch combination without testing any intended functionality. Similarly, you can have 0% coverage and 100% confidence, but if you encounter someone like this, they are, to put it bluntly, delusional.

Unit tests are great at increasing coverage. The code under test is isolated from the rest of the system, and you can get into all the dark corners. “Mocks” are used to represent adjacent parts of the system, usually in a simplistic way. This might be a small piece of code that simulates the response over a UART. But this means you can’t be 100% confident the code will work the same way in the context of the full system.
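
To make the mock idea concrete, here is a tiny illustrative sketch (not Golioth code): the unit under test calls a UART read function, and the test build links against a fake that returns canned bytes instead of touching real hardware. The function name is hypothetical.

#include <stdint.h>
#include <stddef.h>

// Fake UART "driver" linked into the unit-test build in place of the real one.
static const uint8_t canned_response[] = {0x4F, 0x4B, 0x0D};  // "OK\r"
static size_t read_index;

uint8_t uart_read_byte(void) {
    uint8_t byte = canned_response[read_index % sizeof(canned_response)];
    read_index++;
    return byte;
}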

HIL system tests are great at increasing confidence, as they demonstrate that the code is functional in the real world. However, testing things at this level makes it difficult to cover the dark corners of code. The test is limited to the external interfaces of the system (ie. the UART you had a mock for in the above example).

A test strategy involving both unit tests and HIL tests will give you high coverage and high confidence.

Continuously Verifying Golioth SDK Firmware

Why do we use HIL testing at Golioth?

We have the concept of Continuously Verified Boards, which are boards that get first-class support from us. We test these boards on every SDK release across all Golioth services.

Previously, these boards were manually tested each release, but there were several problems we encountered:

  • It was time-consuming
  • It was error-prone
  • It did not scale well as we added boards
  • OTA testing was difficult, with lots of clicking and manual steps involved
  • The SDK release process was painful
  • We were unsure whether there had been regressions between releases

We’ve alleviated all of these problems by introducing HIL tests that automatically run in pull requests and merges to the main branch. These are integrated into GitHub Actions (Continuous Integration, CI), and the HIL tests must pass for new code to merge to main.

Here’s the output of the HIL test running in CI on a commit that merged to main recently in the ESP-IDF SDK:

HIL Test Running in CI

This is just the first part of the test where the ESP32 firmware is being flashed.

Further down in the test, you can see we’re running unit tests on hardware:

Automated test output

And we even have an automated OTA test:

Automated OTA Test Output

As a developer of this firmware, I can tell you that I sleep much better at night knowing that these tests pass on the real hardware. In IoT devices, OTA firmware update is one of the most important features to get right (if not the most important), so testing it automatically has been a big win for us.

HIL on a Budget

Designing a prototype HIL system does not need to be a huge up-front investment.

At Golioth, one engineer was able to build a prototype HIL within one week, due primarily to our simple hardware configuration (each tested board just needs power, serial, and a WiFi connection) and GitHub Actions Self-Hosted Runners, which completely solve the infrastructure and DevOps part of the HIL system.

The prototype design looks something like this:

 

I’m using my personal Raspberry Pi as the self-hosted runner. GitHub makes it easy to register any kind of computer (Linux, MacOS, or Windows) as a self-hosted runner.

And here’s a pic of my HIL setup, tucked away in a corner and rarely touched or thought about:

My DIY HIL Setup

Next Time

That’s it for part 1! We’ve covered the “why” of HIL testing at Golioth and a little bit of the “how”.

In the next part, we’ll dive deeper into the steps we took to create the prototype HIL system, the software involved, and some of the lessons learned along the way.

 

Thanks to Michael for the photo of Tiger and Turtle

TL;DR: we’ve enabled people to compile Zephyr programs from a computer with no toolchain installed, almost instantly.

Part of our charter at Golioth is to help people prototype and scale IoT devices faster. That’s why we offer an open source SDK built on top of Zephyr. We think this represents a “fast forward” or “cheat code” for quickly standing up an IoT device prototype. On the cloud side, our servers represent hundreds of hours of customization and testing; you can instantly connect and get access to resources that allow hardware and firmware developers to scale to thousands or millions of devices. But sometimes it can be scary to get started in a new ecosystem or Real Time Operating System (RTOS) like Zephyr, even if it will speed things up later. As such, we do public and private training for companies and individuals.

As part of the resources we offer, we maintain a Training site that walks people through how to get started using Zephyr, normally targeting remote training. You can follow along right now; you’ll need to purchase an Adafruit MagTag board and sign up for a free Dev Tier account, but everything else is covered on the training site. At the end of the training, you should understand how to interact with hardware in Zephyr and send data to and from the Golioth cloud over WiFi. It’s a short jump from there to re-target other hardware, including your custom designs.

The tripping points for the training often revolve around the installation process. This is multi-pronged:

  • The size of a Zephyr install is relatively large, even when you are only targeting a specific platform. Having multiple people in a room, even with good WiFi or network connectivity, means that the shared bandwidth will be a limiting factor. More trainees means slower downloads.
  • Everyone comes to training with a computer in a different state. They might have tried to install Zephyr tools in the past, or they might have a particularly rare Linux distro, or many other possible variations. It would be best if everyone showed up with a fresh OS install…but that is very unrealistic.
  • There are different expectations around how installations should go. Many embedded engineers are “Windows first” and expect a complete IDE for any new platform. Some silicon vendors help to support this in Zephyr, such as Nordic Semiconductor. But Zephyr originally targeted Linux-based machines, and we have found that the smoothest flow for installing tools across all of the platforms Zephyr can target is a Linux-first one.

In this article, we’re going to talk about our attempt to normalize setups and have pre-installed tools using Kasm and Docker. These are not the only tools in this space; we have previously written about GitPod and are investigating GitHub Codespaces, but this is a look at one of the latest experiments we’re running at Golioth.

Kasm thin client

The concept of a browser based client or a “thin client” is nothing new. They were all the rage back in the day of time share servers (really those were “dumb terminals”) and then again in the 90s as computing was more ubiquitous throughout the office (with a centralized set of servers). The difference is that now things are much more graphical and running completely inside the browser.

Kasm was started in 2017 and includes an open source project run by Kasm Technologies. The company behind Kasm has a per seat licensing model or they will run the servers directly for you (once you’re past 5 trial seats). They specialize in visualizations around containers. Once you log into a Kasm server, you are able to launch a range of containers, normally a desktop view or a single app that will load up in your browser. You can try this for yourself on the Kasm demo page.

The server that we’re running on is a pre-configured image that I pulled from the Digital Ocean marketplace. I was able to install all of the required software on a provisioned server running in some unknown datacenter. All I did was log in the first time to get my credentials for a user and an admin, and the rest of my interaction was on the web interface that the Kasm server presents to me as an admin.

Docker

As a hardware engineer, Docker is one of those things I heard about for a long time and never really “got it”. I’m still not sure I do. But following the tutorial for customizing a Kasm container, I started to understand a bit more. In that set of tutorials I started from a base Operating System image (Ubuntu Focal) that allowed visualization through the browser. Then I was able to start customizing, adding things like custom files on the desktop, custom icons to launch programs I installed, or adding background images pulled in from the web. It was in this customization section that I could add all of the commands from the Golioth Docs for installing Zephyr tools.

My layman explanation of Docker would be “Creating a virtual computer where I can automatically install a bunch of software using shell scripts. Once I have built that virtual computer, I am able to use it over and over again, including different instances of that virtual computer (for this Kasm scenario)”. The analogy would be if I bought a bunch of laptops, had an install CD (remember those?) with all of the required software on them, and then I mailed the freshly installed laptop to everyone who is taking our training. Sound crazy? That’s one of the best solutions we have seen, where a trainer will bring a pelican case with 24 laptops freshly imaged to on-site training. Their training works flawlessly every time!

I don’t have much else to mention about Docker aside from the idea that it’s possible to script a bunch of install commands that match the install instructions we have on our Zephyr getting started guide. In fact, I used those very directions to build the container shown in the video above. So all I’m doing in this case is automating the install process, doing it once, and then deploying the container (with all of the software and dependencies installed) over and over again for different users.

Challenges

We don’t think this is the ultimate solution for our training, so much as an experiment that showcases what we can do with containerized solutions. There are some remaining challenges, and we would love to have some help from our community.

Loading firmware onto the device

Currently our plan (as shown in the video) is to have our users/trainees pull the final built binary to their local computer to run it on the device like the MagTag. This echoes the way the mbed online compiler worked.

If there is a bootloader and a USB to serial connection, it’s possible to directly load onto the embedded device. In the case of some Espressif boards, this would be something like having ESPtool.py installed locally on your machine. There are an increasing number of tools that make this process easier, such as an ESP tool that allows you to load firmware using WebUSB. Certain specialized bootloaders, like the one that ships by default on the MagTag, load UF2 files. When the MagTag is plugged in over USB and a sequence of buttons is pressed, the device shows up as a mass storage drive. You drop a UF2-formatted binary (which is just an alternative packaging of the compiled code) onto the drive, and the device reboots and starts running the code.

If it’s a board without a bootloader, the user would need to have a debugger and local tools to communicate with that debugger, such as a JLink device and JFlash software. This means they would still need some OS specific loader tools to get the binary into the embedded device. The user would not be able to take advantage of the built-in tools in west that allow direct loading onto the device.

You steppin’?

If you would like to do debugging instead of “printf/printk” debugging, you simply need to download a different file from the container. If you download the zephyr.elf file instead of the zephyr.bin file, you can load it into a 3rd party debugger like Segger Ozone (made by the same company as the JLink). We have done some experiments with this in the past, including also analyzing where the device is spending its time using SystemView. This would once again require installing local programs that could talk over the USB port to something like a JLink.

Experimental port forwarding and WebUSB

Some GDB debuggers/servers host control of the debugger over a port on the machine’s localhost. We have some experiments in progress where we forward this port to the container so we can drive the hardware debugger directly from software running inside the container.

We have also heard some whispers of a WebUSB implementation that can tunnel to the container. So we could plug in a board on our host machine (ie. my laptop) and connect to it over WebUSB, and then forward all information along to the container machine (ie. the browser based desktop running on the Kasm server).

We would love to hear about other projects that are trying this.

Shared resources

The final challenge we are dealing with is the fact that we’re basically “renting” a computer to do exactly what we could be doing with the host machine sitting right in front of us. Most developers have access to very powerful machines and we are instead using the resources on a remote machine (the Kasm server). The cost of standardization is the cost of renting server time for each person in the workshop. It might be worth it, but it is a constraint and a challenge.

Containers are another tool

Anyone reading this with a web background is likely thinking, “Yeah, containers, cool, 2010 called and wants their headline back”. But we are excited about it because these tools are finally making their way into the historically sluggish embedded industry. While our use case of containers is mostly around zero-install-time training, others are using containers to automate their testing and implementing best software engineering practices for the range of devices they have on their desk or in the field.

We’d love to hear how you think we can improve our training and make it easier for you to learn more about Golioth, Zephyr, and building code instantly. Check out our forums, our Discord, ping us on Twitter, or send us an email at [email protected]

Hello from ZDS!

This week the Golioth team is at Zephyr Developer Summit. Previously we announced that we’ll be there and shared the talks we are presenting. We will post those shortly after the conference takes place. In the meantime, let’s recap how we got here in the first place and share a little bit more about what we’re showcasing.

Why Zephyr?

In short, because it helps our users. We are members of the Technical Steering Committee (TSC) and have been almost since the inception of the company. We built our first Device SDK on top of Zephyr because of the broad hardware support and high level integration of Golioth features into the Real Time Operating System (RTOS).

The assertion that “Zephyr helps our users” might be extra frustrating to beginners: Zephyr—and RTOSes more broadly—represents a tricky new set of skills that might be foreign to firmware or hardware engineers. For beginners coming from the hobby space, it can be an extra rude introduction into the world of command-line compilation and a large ecosystem. However, connecting to the internet is a difficult task, especially for custom hardware: we think that Zephyr represents a great first step towards managing those devices over time. We are committed to pushing for more user-friendly code and methods from the Zephyr foundation, and we will continue to publish best practices on our blog and our YouTube channel to help people get connected.

Showcase

One thing we’re excited about is showcasing how Golioth works to members of the community. We have been developing different “color coded” demos to make them a bit more memorable for folks that stop by our booth. Each of these demos features a hardware (device) component and a dashboard component, in order to visualize the data that is on the Golioth Cloud.

This is the first time we have showcased the “Aludel”, which is our internal platform for prototyping ideas and switching out different development boards and plug-in sensors. We will post more about this in the future, including our talk on the subject.

Red Demo

The Red Demo is our showcase of devices running OpenThread on Zephyr; this is part of our larger interest in Thread, which we see as a very interesting way to connect a large range of sensors to the internet securely. We have been excited to show how we can use low power devices like the Nordic nRF52840 to communicate directly with the Golioth Cloud.

The devices we are using in this case are off-the-shelf multi-sensor nodes from Laird called the BT510. This hardware has additional sensors on the board, which we integrated with LightDB Stream to send time-series data back to Golioth. This was fast work; thanks to Laird’s Zephyr support, it was as simple as calling out the board when we compiled the demo firmware.

We then capture the data from these on the Red Demo Dashboard, showing both historical and live data for the sensors.

 

Green Demo

The Green Demo showcases LightDB State, our real-time database that can be used to control a wide range of devices in a deployment. On the device side, it uses the Aludel platform to measure a light sensor, as would happen in a greenhouse. There is also a secondary Zephyr-based device inside a lamp, representing a grow light that might be inside a grow house. The lamp is set up to “listen” to commands from another node, in this case the Aludel.

LightDB State is used to control elements like “update rate” to regulate the flow of information. It also lets us monitor critical device variables on an ongoing basis and set up logic on the web to take actions as a result. Command and control variables can be set from multiple places, including a custom mobile app, the Golioth Console, a visualization platform, a web page, or (as is the case here) even from another device!

Our Green Demo Dashboard (below) again showcases live and historical information, as well as the current status of the connected lamp.

As an added bonus, we control some of the logic on the back end from a Node-RED instance, including control logic. That takes the light intensity sensor output and calculates how bright the lamp should be. Because this is written in Node-RED, we can include an additional input from a mobile app to control the “target intensity”. In this way, people at the booth can adjust the lamp output if the exhibition space is brighter or darker. Plus…it looks cool!

Blue Demo

The Blue Demo helps to showcase how data migrates into and out of Golioth. Using Output Streams, you can export all cloud events to 3rd party providers like AWS, Azure, and Google Cloud. Buttons on the Blue faceplate switch the output being sent back to the cloud. The sensor readings being exported to all 3 clouds can be turned on or off by changing which variables are exported from the device.

On the device side, we capture a sensor using our Aludel platform. The sensor is a BME280 (an in-tree sensor in Zephyr), attached to a feather form-factor dev board that talks to the network through an external WizNet W5500 Ethernet chip. The Blue Demo Dashboard showcases the live data, and of course the data is being exported simultaneously to the 3 cloud platforms in real-time.

Orange Demo

Golioth is a “middleware” built on top of Zephyr RTOS, which means you can use it to implement new features on top of already-existing hardware. This demo uses the Nordic Semiconductor Thingy91 with custom firmware to send GPS data back over the cellular network to Golioth using LightDB Stream. This demo also has Golioth Logging and Device Firmware Update, which are easy to add to any project as an additional service for troubleshooting or in-field updates.

On the dashboard side, we wanted different ways to showcase this data, including “latest update”. Having access to the raw data is useful for anyone wanting to try asset tracking applications. We’re excited to be able to showcase this data as it dynamically flows into the Golioth Console and back out to the Grafana dashboard.

Future showcases

We’re excited to be showcasing our demos at the Zephyr Developer Summit, but these are moving targets! We will continue to update them and pull in new features for future events. We will be at Embedded World in two weeks (June 20-24th) and will have many of the same demos there.

Custom Kconfig symbols

Last week I programmed 15 consecutive boards with unique firmware images. I needed to build multiple versions of the same Zephyr firmware, supplying unique values to each build at compile time. The Zephyr Kconfig system provides built-in support for passing values during a build. The trick is that you can’t just dynamically declare symbols; you need to tell Zephyr that you are expecting a value to be set for a new symbol. Today I will walk through how to do this, and why you might want to.

Fifteen Devices, Fifteen Names

Golioth device names

In my case I was pre-provisioning devices to use in a training workshop. I supplied each build with credentials so the device would be able to authenticate with the Golioth Cloud platform. There is already built-in Kconfig support for these values. But at the same time, I wanted the devices to have a human-readable name that matches up with the device displayed on our cloud console. During the training, the device prints its name on a screen as an easy reference.

The solution to both of these needs is to set the values at build time using the -D<SYMBOLNAME>=<value> format on the command line. Any Kconfig value that you would normally set in a prj.conf file can also be set this way. So if you wanted to enable the Zephyr Logging subsystem for just one build, you could turn it on by adding -DCONFIG_LOG=y to your west build command.

I used a script that called the goliothctl command line tool to create each device and to make the credentials for each on our cloud platform. The script then called the west tool to build the firmware, supplying the device name and the credentials as arguments.

The gotcha is that if you try to make up your own symbols on the fly (like -DGOLIOTH_IS_AWESOME=y) you will be met with errors. Zephyr needs to know what symbols to expect–anything else is assumed to be a typo.

Add a Custom Symbol to Zephyr Kconfig

The easiest way to add a symbol is to declare it in the Kconfig file in your project directory. The syntax for this is pretty simple:

config SYMBOLNAME
	bool "Description"

According to the Linux docs, the following types are valid: bool, tristate, string, hex, int. The description string is important if you want the value to appear in menuconfig. Consider the following code:

config GOLIOTH_IS_AWESOME
	bool "Confirm that Golioth is awesome"

config MAGTAG_NAME
	string "MagTag Name"
	default "MagTag-One"
	help
		Used during automatic provisioning for workshops

This creates two new symbols, one accepts a boolean value, the other a string value. These can be viewed and set using the menuconfig interface. After building your project, type west build -t menuconfig.

Custom Kconfig symbols shown in menuconfig

Both of the new symbols appear in the menu, and you can see that the strings we used when declaring the type are shown as labels in the menu interface. Of course, you can now set the values in a Kconfig file as you would any other symbol. But for me, the goal was to do so from the command line. Here is an example build command used by my provisioning script:

west build -b esp32s2_saola magtag-demo -d /tmp/magtag-build -DCONFIG_MAGTAG_NAME=\"azure-camellia\" -DCONFIG_GOLIOTH_SYSTEM_CLIENT_PSK_ID=\"azure-camellia-id@developer-training\" -DCONFIG_GOLIOTH_SYSTEM_CLIENT_PSK=\"b00a0fef769d65d9021d747c8d710af5\" -DCONFIG_ESP32_WIFI_SSID=\"Golioth\" -DCONFIG_ESP32_WIFI_PASSWORD=\"training\"

The symbol will now be available to your C code:

LOG_INF("Device name: %s", CONFIG_MAGTAG_NAME);
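
Boolean symbols work the same way: when set to y they become preprocessor defines, so you can compile code in or out. Using the example symbol declared above:

#ifdef CONFIG_GOLIOTH_IS_AWESOME
LOG_INF("Golioth is officially awesome");
#endif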

And of course you can view the state of all Kconfig symbols processed during the build process. Just open up the build/zephyr/.config file that was generated. Below you will see the first baker’s-dozen lines, including my custom symbols along with some specified by the Golioth SDK, and others that are standard to Zephyr:

CONFIG_ESP32_WIFI_SSID="Golioth"
CONFIG_ESP32_WIFI_PASSWORD="training"
CONFIG_DNS_SERVER_IP_ADDRESSES=y
CONFIG_DNS_SERVER1="1.1.1.1"
CONFIG_GOLIOTH_IS_AWESOME=y
CONFIG_MAGTAG_NAME="azure-camellia"
CONFIG_GPIO=y
CONFIG_SPI=y
CONFIG_I2C=y
# CONFIG_KSCAN is not set
CONFIG_LV_Z_POINTER_KSCAN_DEV_NAME="KSCAN"
# CONFIG_WIFI_WINC1500 is not set
CONFIG_WIFI=y

Learn more about Kconfig

To dive deeper into Kconfig options, your first stop should be the Zephyr Docs page on Setting Kconfig values. I also found the Kconfig Tips and Best Practices to be useful, as well as the Linux Kconfig Language reference. Ten minutes of reading the docs, and a little bit of trial and error, and you’ll have a good grasp of what Kconfig is all about.

This guest post is contributed by Asgeir Stavik Hustad, a Golioth community member who is active on the Golioth Discord. Reach him on Twitter at @AsgeirSH.

This tutorial was inspired by and a response to the tutorial about how to build your Zephyr application in a standalone folder. I have done exactly that before, but I also wanted to include all my dependencies in that separate folder.

Background and motivation

I need to maintain different firmware with different Zephyr versions and trees. For example, I maintain the following directories:

  • Nordic’s Zephyr-variant (NCS) for the nRF9160
  • Base Zephyr for Atmel-MCUs
  • Base Zephyr, but locked to a particular version (ie. “2.7.0”)

We also have several custom boards. These are currently maintained in each project, but could be moved to a separate dependency if we want to use the same board overlay files in multiple projects.

Instead of trying to swap a single Zephyr-installation between all of these, I did some research into using west and its manifest file to automatically set up my project folders to include all dependencies. I also wanted to ensure our build server didn’t require any manual work to build different projects. The Zephyr docs present this topic in depth, and are recommended reading if you want to set this up.

Let’s look at how we can set up a project to fit a wider range of needs.

Project structure

Most of my projects are kept in my “Dev” folder, so for this example we’ll be using ~/Dev/app_zephyr as the root directory of the project.

I put my application source in application, which is further split into at least boards and src (you can add any folder you like here). You’ll note this is the same structure as any of the Zephyr or Golioth samples you see; in fact, you can copy a sample as the starting point (such as <Zephyr SDK Install location>/zephyr/samples/basic/blinky). The other folders include deps for dependencies and build for the build output folder.

Inside the root folder, add .west/config. This is a plain text file describing to west where it should look for the manifest file and where Zephyr should be placed.

[manifest]
path = application
[zephyr]
base = deps/zephyr

Drawbacks

  • The initial clone and west update of a project set up like this takes some time.
  • This method uses quite a bit of disk space because each project carries around the Zephyr dependencies, as opposed to having your application live within the Zephyr SDK.
  • Ensuring you get updates to all your projects means you need to update the projects in your manifest file to a new revision manually (not really a drawback in my eyes – I want control!)

Let’s go through the manifest-file itself step by step. It’s found in application/west.yml:

manifest:
  version: 0.7

  defaults:
    remote: zephyrproject
  remotes:
    - name: zephyrproject
      url-base: https://github.com/zephyrproject-rtos
    - name: mcutools
      url-base: https://github.com/mcu-tools
  projects:
    - name: zephyr
      repo-path: zephyr
      revision: v2.7.0
      import:
        path-prefix: deps
      path: zephyr
    - name: mcuboot
      remote: mcutools
      repo-path: mcuboot
      revision: v1.7.2
      path: deps/mcuboot

  self:
    path: application

  • The manifest version being set to 0.7 simply means west (the meta tool) must be at version 0.7 or higher to parse this correctly.
  • Default attributes for the project are not required, but in this case lists the main remote.
  • Remotes lists where west should look for project repos.
  • Projects lists the full range of repositories we’ll pull in as dependencies. This includes the revision, so we have control over upgrades when they are available. I want to prevent breaking changes from entering my project without my knowledge.
  • The self: path: application key further defines where in the project tree the west.yml file lives, relative to the root of the project.

I feel that the projects key is the true turning point of this manifest. By adding to this we can make Zephyr pull any git repository we want, and put it in our dependencies-folder. We specify the project name, a remote (if different from the specified default), a repo-path on the remote (defaults to name), a revision (defaults to master) and a local path (with a slight footnote for this one).

There is one more special key that makes this work: import tells west to also pull in the projects listed in that project’s own manifest file. This means that when running west update on this manifest, west will first clone all projects in this manifest, then run west update on the manifest file in the specified project and clone all of its projects, applying the specified path-prefix to them. I’ve used this for the Zephyr include, but not for the mcutools.

Build

In practice, this means that my project structure for the manifest file above after running west update will look like this:

- app_zephyr/
    - .west/
        - config
    - application/
        - boards/
            - arm/
                - ah_1202a/
        - src/
        - CMakeLists.txt
        - prj.conf
        - west.yml
    - deps/
        - mcuboot/
        - modules/
        - tools/
        - zephyr/
    - .gitignore

Your custom board *.dts files can include all the root overlays from the Zephyr dependency or any other project. (I’ve also set this up so VS Code can do IntelliSense on these DTS files; that’s just a matter of setting the correct includePaths.)

From here, you can run west build and have it use your custom board-files, source and everything. In my case:

cd application
west build -p -b ah_1202a -d ../build

Revision Control

One of the benefits of a method like this is the reduced amount of files going into revision control. You don’t need to index all of the Zephyr directory files in your project repo. This is a bad idea anyway, given the size of the project and the almost certain guarantee they will be out of date the next time you pull your project. Locking the Zephyr version in west.yml will ensure that your project is always pulling from the expected version of an SDK or Zephyr repo. Adding a .gitignore file as shown below to your main repository will reduce your total footprint and only capture the unique elements of your project–your application code.

deps/
build/
.vscode/

Build it your way

The first step to building an optimal workflow for your company or personal development process is understanding how your build system works. The above method is far from the only way of doing things, but helps to give more precise control over what is tracked and what is pulled in from external sources.

Zephyr has a number of tools to aid in debugging during your development process. Today we’re focusing on the most available and widely useful of these: printing messages to a terminal and enabling log messages.

New to Golioth? Sign up for our newsletter to keep learning more about IoT development or create your free Golioth account to start building now.

printk() is the printf() of the Zephyr world

Printing useful messages using printf() is a time-tested practice for getting programs up and running. Some frown upon using this as the main debugging approach, but don’t discount how incredibly useful it is as a first step.

printk("String: %s Length: %zd Pointer: %p\n", my_str, sizeof(my_str), my_str);

Zephyr builds this functionality right in with the `printk()` command so that you can have immediate feedback. These messages print out over a serial connection using the same style of conversion syntax as printf(). This automatically converts data types into the printable representation. In my example I’m debugging a string in C by printing out the string itself, the length, and the pointer address. The Linux docs are handy for those looking to drill down into the specifics of printk formatting.

Tip: Pay attention to Zephyr return codes!

Throughout the Zephyr samples you’ll see that it’s standard practice to test return codes and print them out when they are non-zero numbers. I have found these return codes to be indispensable when troubleshooting subsystems like i2c. The paradigm most often used is:

int ret = gpio_pin_configure(dev, PIN, GPIO_OUTPUT_ACTIVE | FLAGS);
if (ret < 0) {
	printk("Pin config failed: %d", ret);
}

You can look up error codes in the Zephyr docs. I was getting a -88, which is ENOSYS – function not implemented.

Use the Logging Subsystem as a Powerful Debugging Tool

Once you’ve seen the Zephyr logging subsystem, there is no replacement. Log messages automatically include a timestamp and report on what part of the application they come from. Data can be included in a few different ways, and these messages are queued so that they don’t interfere with time-dependent parts of your program.

Perhaps the best part is that you specify the importance of each message, allowing you to choose at compile time which logging messages will be included in the binary. This means you can pepper your program with debug-level messages and choose to leave them out of the production builds.

How to Enable Logging in Zephyr

To turn on logging, we need to do three things: tell our Kconfig that we want to use the subsystem, include the logging header file, and declare the module name.

Step 1: Add CONFIG_LOG=y to your project’s prj.conf file.

Like all subsystems in Zephyr, we need to tell CMake that we want to use it. The easiest way to do this is by adding CONFIG_LOG=y to the prj.conf file in the project directory.

Step 2: Add the header file to the top of your c file: #include <logging/log.h>

This one is straightforward. We want to use a C library so we have to include it in the main.c (and any other C files in the project).

Step 3: Declare the module

We need to tell the logging module where the message is coming from using a macro: LOG_MODULE_REGISTER(logging_blog);

There are a couple of important things to understand here. First, you will use any unique token you want in this macro, but make sure you don’t surround it in quotes. Second, as I just mentioned, this needs to be unique (in this example this is represented by logging_blog but could be any arbitrary phrase). If you have additional C files in your project, you either need to register different tokens, or more commonly just declare the file as part of the original module: LOG_MODULE_DECLARE(logging_blog);

There is an optional second argument and this is where you choose which logging events will be compiled into the application. By default, debug messages will not be shown, so you can declare your module to enable them: LOG_MODULE_REGISTER(logging_blog, LOG_LEVEL_DBG);. Logging levels run from 0 to 4 using the following suffixes: _NONE, _ERR, _WRN, _INF, _DBG.
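
Putting those pieces together, here is a minimal sketch of how two files in the same project might set this up (the file names are just examples):

/* main.c -- owns the log module and sets the compiled-in level */
#include <logging/log.h>
LOG_MODULE_REGISTER(logging_blog, LOG_LEVEL_DBG);

/* sensor.c -- a second file that logs as part of the same module */
#include <logging/log.h>
LOG_MODULE_DECLARE(logging_blog);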

How to use the Logging subsystem in Zephyr

Using the logging subsystem is just as easy as using printf: LOG_INF("Count: %d", count);. The log outputs for this example look like this:

[00:01:52.439,000] <inf> logging_blog: Count: 112
[00:01:53.439,000] <inf> logging_blog: Count: 113
[00:01:54.439,000] <inf> logging_blog: Count: 114
[00:01:55.439,000] <inf> logging_blog: Count: 115
[00:01:56.439,000] <inf> logging_blog: Count: 116
[00:01:57.439,000] <inf> logging_blog: Count: 117
[00:01:58.439,000] <inf> logging_blog: Count: 118
[00:01:59.439,000] <inf> logging_blog: Count: 119

We begin each line with a timestamp down to microseconds, the severity level (inf for INFO), followed by our printf-style message output. Look at the timestamps in this example–they are exactly one second apart. This drives home the power of the queuing system: the messages arrive at the terminal slightly delayed, but they don’t alter the k_msleep() timing that was used for this example.
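
For reference, a loop that produces output like the above is nothing more than a log call and a sleep. This is my own minimal reconstruction, not code from a specific Golioth project:

int count = 0;
while (1) {
    LOG_INF("Count: %d", count++);  // queued, printed slightly later
    k_msleep(1000);                 // the one-second cadence is unaffected
}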

You can use four different built-in severity levels for your logs by choosing a different macro: LOG_ERR(), LOG_WRN(), LOG_INF(), LOG_DBG(). Setting these different levels allows you to choose what gets included at compile time. If you made all your debugging messages using printk(), they will always compile into your code until you remove them from the C file. If you use LOG_DBG(), you can choose not to include that level of logging when you compile the production version of your code.

By default, debug-level messages will not be shown. As mentioned earlier, you have the option of specifying the maximum severity level to show when you register your modules.

Hex dumping via logs

Logging lets you dump data arrays without the need to turn that data into something digestible first.

LOG_HEXDUMP_INF(my_data, sizeof(my_data), "Non-printable:");

I’ve given an array of values, the length of that array, and a string to use as the label for the log message. The logging subsystem will automatically show the hexadecimal representation of that data, as well as a string representation to the right, much as you’d expect from any hexdump program (you’ll need to scroll right in the example below).

[00:00:00.427,000]  logging_blog: Non-printable:
                                       01 fe 12 00 27                                   |....'            

Other debugging tools for next time

Using logging can give you enough feedback to solve the majority of your development issues in Zephyr, but of course there are other tools available. In future posts we’ll discuss using virtual devices via QEMU to speed up debugging sessions because you won’t have to flash newly compiled binaries to hardware. And we plan to dive into on-chip debugging that lets you set break points and step through your code. Stay tuned!

See it in action: Zephyr Debugging demo video

Troubleshooting high complexity systems like Zephyr requires more thorough tools. Menuconfig allows users to see the layers of their system and adjust settings without requiring a complete system recompilation.

The troubleshoot loop

Modify, compile, test.

Modify, compile, test.

Modify, compile, test.

Modify, compile, test.

How do we break out of this loop of trying to change different settings in a program, recompiling the entire thing, and then waiting for a build to finish? Sure, there are some tools to modify things if you’re step debugging, such as changing parameters in memory. But you can’t go and allocate new memory after compiling normally. So what happens when you need to change things? You find the #define in the code, change the parameter, and recompile. What a slow process!

Moving up the complexity stack

We move up the “complexity stack” from a bare-metal device to running a Real Time Operating System (RTOS) in order to get access to higher level functions. Not only does this allow us to abstract things like network interfaces and target different types of hardware, but it also allows us to add layers of software that would be untenable when running bare-metal firmware. The downside, of course, is that it’s more complex.

When you’re trying to figure out what is going wrong in a complex system like Zephyr, it can mean chasing problems through many layers of functions and threads. It’s hard to keep track of where things are and what is “in charge” when it comes time to change things.

Enter Menuconfig

Menuconfig is a tool borrowed from Linux development that works in a similar way: a high complexity system that needs some level of organization. Obviously, in full Linux systems, the complexity often will be even higher than in an RTOS. In the video below, Marcin shows how he uses Menuconfig to turn features on and off during debugging, including with the Golioth “hello” example. As recommended in the video, new Zephyr users can also utilize Menuconfig to explore the system and see which features are enabled and available.

 

 

Every IoT project needs to provision devices that are going to be available in the field. Leveraging open standards, Golioth cuts down on the required time and hassle for IoT development teams.

Provisioning is a critical step for IoT projects when they go to production. Unfortunately, it remains a mystery to many engineers because there is so little information available about the process. At a high level, provisioning is passing configuration and credentials to an IoT device so it can connect securely to the cloud. Once provisioned, the device can send telemetry, receive commands, or be updated (via OTA DFU) while it’s out in the field. How you provision a device depends a lot on the use case.

Example use cases

First, let’s examine a customer-facing product like a smart light bulb. In this scenario, the first step would be for the user to provide WiFi credentials so the bulb can connect to the user’s home network. On the platform side, the device would obtain a new set of credentials to connect to the backend services. These credentials would be specific to that particular user and device. Later, the user might decide to wipe the device in order to sell it, so the ability to remove device configurations and delete a given set of credentials is important. This is a perfect example for using BLE provisioning, as shown in the video below. The user experience is seamless with the existing mobile app already used for controlling the bulb and reporting data back from the end device.

Next, we’ll consider factory-level provisioning. An example device like a cellular asset tracker would be pre-provisioned at the factory before being used by your customer. Later, the user only associates that device with their account; the credentials to talk to the cloud are already set on the device. This can be done as part of the manufacturing process: probing the device via Serial/UART to get the device hardware ID, provisioning it to the cloud, and sending credentials back to the device via the same transport. We can even use dedicated firmware that only handles provisioning in the factory. The device accepts the initial configuration and saves the credentials to flash. Subsequent firmware ships without that provisioning feature enabled, making sure external parties can’t change or reverse engineer the initial configuration.
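As a rough sketch of the device side, Zephyr’s settings subsystem is one way to persist credentials received during provisioning. The key name and function below are hypothetical, and the actual demo firmware may store credentials differently:

#include <zephyr/settings/settings.h>

/* Hypothetical handler, called once the factory tooling has delivered a
 * pre-shared key over UART or BLE. Requires CONFIG_SETTINGS plus a backend
 * such as CONFIG_SETTINGS_NVS. */
int save_provisioned_psk(const char *psk, size_t psk_len)
{
    int err = settings_subsys_init();

    if (err) {
        return err;
    }

    /* Persist the credential to flash under a named key */
    return settings_save_one("provision/psk", psk, psk_len);
}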

There are myriad ways that provisioning can be done. Each instance will depend on the factory environment, the capabilities of the user, and the end application. The video below shows a setup similar to the first example explained above, using a Bluetooth application to read and then program the end device, all while working with the Golioth cloud.

Our demo application

As you can see in the video, we developed an end-to-end sample that shows a practical provisioning scenario: a native mobile app talks to an IoT device over Bluetooth and registers the device and its credentials in the Golioth Cloud. We leverage several tools to do so:

  • MCUmgr as the device management subsystem and protocol.
  • Zephyr as the real-time operating system, which implements MCUmgr.
  • An open-source mobile SDK to integrate MCUmgr into an app.
  • Golioth’s API and its device/credential management capabilities.

The MCUmgr community has developed multiple transports for interacting with devices, a benefit of MCUmgr being an open standard with a vibrant community behind it. One option is to communicate with the device over serial UART using the `mcumgr` CLI, or even integrate that into your own set of provisioning tools. Another option is to use a mobile SDK that implements the MCUmgr protocols over BLE to talk with devices.

We took the Bluetooth approach and forked Nordic’s MCUmgr example application, adding communication with Golioth APIs to manage devices. Once we discover the name of the device, we assign credentials via the REST API and securely send them over Bluetooth to the end device. The device is running one of Golioth’s samples that accepts dynamic configuration for WiFi and DTLS Pre-Shared Keys (PSK) to talk securely with our cloud. The device also uses a Golioth service called LightDB. Using it, we can publish the on/off state of the light bulb, show that data on a UI, and even send commands to change the state on the device.

Source code for the mobile app:

More details on how to use our REST API and how to generate API keys can be found on our docs website.

A possible solution

Let’s pretend you’re in the middle of a global chip shortage.

Surprise! There’s no need to pretend, as we are all currently in the middle of a global chip shortage. Right now it’s very difficult to source certain components.

“Why don’t hardware makers just switch out the components when they can’t source them during the chip shortage? In fact, why don’t people switch chips on a regular basis?”

As a generalization, switching costs for embedded devices are very high. Even if we were able to magically solve all of the switching costs for the hardware, you’d still need to deal with the switching costs of firmware and cloud platforms. These are often even more dire than the hardware switching costs, and significant at both an individual level (rewriting firmware to target different architectures and board setups) and at an institutional level (maintaining different platforms and interoperability).

Operating systems and Real Time Operating Systems (RTOS) help by abstracting away a lot of the individual hardware details. When a new device is added to an RTOS, it needs to fit within the constraints of the system. If your board has an I2C sensor on it, you need to ensure the supporting firmware for that board or chipset is capable of working with the elements of the RTOS. Then you can take advantage of the drivers already written for other boards/chipsets on the platform. Assuming you are willing to work within that system, you can start to supercharge your development. It’s possible to switch out components quickly and confidently, helping to alleviate the woes of the chip shortage currently underway.
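For example, once a sensor driver exists in the Zephyr tree, application code talks to it through the generic sensor API rather than vendor-specific registers. The sketch below assumes a Bosch BME280 in the board’s devicetree purely as an illustration; any supported sensor looks the same from the application’s point of view:

#include <errno.h>
#include <zephyr/device.h>
#include <zephyr/drivers/sensor.h>

int read_temperature(void)
{
    /* Grab whichever devicetree node matches the driver's compatible */
    const struct device *dev = DEVICE_DT_GET_ANY(bosch_bme280);
    struct sensor_value temp;

    if (dev == NULL || !device_is_ready(dev)) {
        return -ENODEV;
    }

    /* The same two calls work no matter which vendor's part sits behind them */
    sensor_sample_fetch(dev);
    sensor_channel_get(dev, SENSOR_CHAN_AMBIENT_TEMP, &temp);

    return temp.val1; /* integer part, degrees Celsius */
}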

Making the switch

Let’s say you have a board using an ESP32 module. Due to sourcing problems, you can no longer source a particular LED on your board, but you don’t want to change your PCB. Instead, you ask a technician to wire an extra LED to a spare pin on your production board that has a larger landing area. You need to build a firmware image that drives a different pin on the microcontroller than you previously were using. Now “LED2” (as it’s called in your program) is not driving pin 18, but is instead driving pin 22. With Zephyr overlays, the switch takes about 5 minutes. As Marcin shows in the video below, the device tree overlay is where we map the signals internal to the firmware to the physical pins being used.
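Here’s a sketch of what that looks like from the application side, with names assumed for illustration. Because the C code asks for the devicetree alias led2, moving that alias’s gpios property from pin 18 to pin 22 in an overlay requires no source changes (on older Zephyr releases the header is <drivers/gpio.h>):

#include <errno.h>
#include <zephyr/drivers/gpio.h>

/* The pin is resolved from the devicetree alias at build time; the overlay
 * decides which GPIO controller and pin number this actually refers to. */
static const struct gpio_dt_spec led2 = GPIO_DT_SPEC_GET(DT_ALIAS(led2), gpios);

int led2_on(void)
{
    if (!device_is_ready(led2.port)) {
        return -ENODEV;
    }

    return gpio_pin_configure_dt(&led2, GPIO_OUTPUT_ACTIVE);
}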

Now let’s say you cannot source the ESP32 at all. You could create a new overlay file for a different target that works with Zephyr, assign the pins to the functions you need on a new PCB containing a different chip, and then build for that device. The time-consuming aspect would be verifying that all of the functions perform the same as on your previous platform. But once you have decided on a new platform, assigning pins and functions to your new device happens through overlay files.

How to use Zephyr Overlays

In the video below, we walk through the location and function of overlays in Zephyr. Marcin explains that customizing a firmware image for a particular hardware target can be as simple as adding a different flag on the command line. In this particular example, we show how to change the pins for the ESP32 demo. Previously, the ESP32 overlay was shown as part of our LightDB Sample code (docs), which targeted an ESP32-DevKitC in that video.

About The Zephyr Project

The Zephyr Project is a popular, open source Real Time Operating System (RTOS) that enables complex features and easy connectivity for embedded devices. The project is focused on vendor participation, long-term support, and in-depth security development life cycles for products.

About Golioth

Golioth is a commercial IoT development platform built for scale that helps users speed up development and increase the chances that pilots will be put into production. We offer standardized interfaces for connecting embedded devices to the cloud and build out software ecosystems that allow your projects to get to market faster. Golioth uses Zephyr as part of the Golioth SDK to bootstrap application examples and show how to utilize the range of networking features Golioth enables via APIs.