Golioth wants to make embedded engineers into better IoT engineers. That’s why we do a training where we not only show people how Zephyr works, but also how they can connect a device to the cloud and immediately get the benefit of streaming data, pipelines, Over-The-Air (OTA) updates, settings, Remote Procedure Calls (RPCs), and more.

After another successful year of Zephyr training, we will be holding our final training of 2024 on December 4th. Sign up today!

Changes over time

As Zephyr goes, so goes the Golioth Firmware SDK. Meaning, we are always updating our helper libraries and Zephyr implementation when new features are ready in the ecosystem. The latest changes from Zephyr 3.7 and NCS 2.7 have been incorporated into the Golioth SDK and are now part of the training as well. Not a lot has changed for our beginner content, but you’ll see things like the new board format while you’re compiling Zephyr programs.

By attending our upcoming training, you’ll gain hands-on experience with the latest version of our SDK and learn best practices for Zephyr development directly from the Golioth team. The training will cover topics such as:

Whether you’re new to Zephyr or an experienced embedded developer, this training will equip you with valuable skills and knowledge to build robust, secure, and scalable IoT solutions.

What’s Next?

As the winter begins in the northern hemisphere, many people find themselves inside with more time on their hands. Taking the Golioth training will not only unlock your Zephyr skills but also put you on the path to becoming a better IoT developer. Use the same boards and underlying tools to build out your next IoT project or even utilize our Follow Along Hardware designs to replicate one of our many Reference Designs. As your data gets more and more complex, you can push that data through Pipelines to our variety of services, including many of our AI destinations for high complexity processing.

The best way to get started is to sign up for training today…we’ll see you on December 4th!

What if you kept the “core” or the “essence” of a board the same throughout your designs and only changed the peripherals? You could build a range of products that had similar processing capabilities, but might be completely different products. By defining an abstraction layer at the edge of the core module you design, it’s possible to achieve efficiencies for your variety of products, including when those products are manufactured over time.

The above video is a talk given at the Zephyr meetup hosted by NXP at Embedded World North America in October of 2024. I focus on modular hardware design for scalability and adaptability. The Drachm (pronounced “dram”) module is something we have been regularly streaming on YouTube. This talk describes some of the motivations, the challenges of unconstrained designs, and how to build modular systems that take advantage of Zephyr RTOS capabilities.

Many SKUs, One Core

The Drachm module was conceived to be an extension of the Aludel Platform. We currently target a range of different verticals and end-applications with Golioth Reference Designs. But these are somewhat bulky: made to accept MikroBus Click boards, the case measures roughly 80 x 80 x 40 mm. What if we want to go smaller, but not re-design the board every time we have a new product idea?

The Drachm module was meant to achieve this on multiple fronts. The castellated edge design would allow us to have a standard offering that covers a range of peripherals we want to hook up to the core module. But the module itself can change as well. The different versions of the Drachm module offer different capabilities of communication, processing power, and peripheral access on the underlying silicon. As in many modular systems, we trade off flexibility for an overall reduced feature set.

The Challenge of Hardware Changes

Modularity at the module connector edge benefits long term manufacturing as well. Building scalable IoT hardware requires constant adaptability. Component shortages, changing device requirements, and new tech trends demand flexible designs. Traditional methods, such as swapping processors or upgrading communications modules, can be tedious and impractical without a modular system.

How it works in Zephyr 

This talk builds on the article we published about abstract hardware interfaces in Zephyr. The key change is how we think of each component of the design. The module is being mounted to a carrier board, so it seems like a component of the carrier board’s BOM and software system. But in fact, we need to think about the carrier board as a “shield” (in Zephyr/Arduino parlance), and instead map pins based on how we define the module edge (castellated pins). See the linked article on how to use predefined node labels and GPIO nexus nodes to achieve those ends.

Another key piece is that each module variant would need to maintain its own board files (Devicetree and Kconfig) to match up the pins on the castellated edge connection to the silicon onboard. The “analog input 1” pin on the drachm-1 variant (nRF9151 + ESP32-C3) would have a different mapping than the drachm-2 variant (nRF52840 + BG77), but to the parts on the shield / carrier board, it’d be all the same, since they only interface via the “analog input 1” nomenclature.

How it fits with Golioth

Golioth works great with customized hardware, which is subject to the harsh realities of sourcing. Engineers who need to redesign their hardware to upgrade or even just keep their production line running don’t want to re-do every aspect of their design, including firmware and cloud software. That’s why engineers are increasingly choosing Zephyr to target many hardware SKUs and why they’re choosing Golioth to help create a seamless experience for that variety of targets.

Using Golioth’s features like Blueprints, Tags, and our new OTA Cohorts capability, an engineer could deliver different main firmware updates to boards using drachm-1 or drachm-n modules, but still have a consistent set of AI models being delivered to all eligible units. Golioth can also help to target different cohorts in your fleet to ensure the right updates are going to the right devices in the field.

Want to create a more resilient fleet? We can help! Our Solutions Engineering team can help to introduce more modularity and have a robust OTA strategy for your fleet. If you’re interested in doing it yourself but you’d like to talk through the options with Golioth team members, head over to our forum to start a conversation there.

Slides

Golioth-Module-Abstraction-Layers-Zephyr-Meetup-Oct-9th-2024
How to publish GitHub Actions summaries

Automated testing is core to maintaining a stable codebase. But running the tests isn’t enough. You also need to get useful information from the outcome. The Golioth Firmware SDK is now running 530 tests for each PR/merge, and recently we added a summary to the GitHub action that helps to quickly review and discover the cause of failures. This uses a combination of pytest (with or without Zephyr’s twister test runner), some Linux command line magic, and an off-the-shelf GitHub summary Action.

Why summaries are helpful

A table with a different hardware device in each row and columns for passed, failed, skipped, and time

Test summary shown on GitHub Actions page

From this summary you can pretty quickly figure out if failures are grouped around a particular type of hardware, or across the board. Clicking the name of a dev board jumps to another table summarizing the outcomes.

A table with test names in each row and columns for passed, failed, skipped, and time

JUnit XML board summary

Reviewing the summary for the mimxrt1024_evk board, we can see that six test suites were run, with one test in the “ota” suite failing. Clicking that test suite name jumps to the block under the table that shows the error. In this case, we test all possible OTA reason and state codes the device can send to the server. This test failed because there is a latency issue between the device sending the code and the test reading it from the server.

While this doesn’t show everything you might want to know, it’s enough to decide if you need to dig further or if this is just a fleeting issue.

It’s worth noting that we also use Allure Reports (here is the latest for our main branch) to track our automated tests. This provides quite a bit more information, like historic pass/fail reporting. I’ll write a future post to cover how we added those reports to our CI.

Generating test summaries

Today we’re discussing test summaries using JUnit XML reports. Pytest already includes report generation for JUnit XML which makes it a snap to add to your existing tests. Browsing through pytest --help we see a simple flag will generate the report.

--junit-xml=path      Create junit-xml style report file at given path
--junit-prefix=str    Prepend prefix to classnames in junit-xml output

For our testing, we simply added a name for the summary file. Here’s an example using pytest directly:

pytest --rootdir . tests/hil/tests/$test      \
  --some-other-flags-here                     \
  --junitxml=summary/hil-linux-${test}.xml

If you use Zephyr, running twister with pytest already automatically produces JUnit XML formatted test reports at twister-out/twister_suite_report.xml.
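To get a feel for what ends up in one of these reports, here is a minimal Python sketch (not part of our CI) that tallies the counters stored on each testsuite element. The sample XML, the suite names, and the helper name tally_junit are made up for illustration; the attribute names (tests, failures, errors, skipped) are the ones pytest writes to its JUnit XML output.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape pytest writes; real reports carry more attributes.
SAMPLE = """<testsuites>
  <testsuite name="ota" tests="3" failures="1" errors="0" skipped="0" time="12.5"/>
  <testsuite name="settings" tests="2" failures="0" errors="0" skipped="1" time="4.0"/>
</testsuites>"""


def tally_junit(xml_text):
    """Sum the per-suite counters found in a JUnit XML report."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # iter() also matches the root element, so a bare <testsuite> document works too
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    # Anything that did not fail, error out, or get skipped counts as passed
    totals["passed"] = (totals["tests"] - totals["failures"]
                        - totals["errors"] - totals["skipped"])
    return totals


print(tally_junit(SAMPLE))
```

This is roughly the information the summary table surfaces per board: passed, failed, and skipped counts.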

Gathering and post-processing reports

Gathering up Twister-generated reports is pretty easy since Twister already batches reports into suites that are uniquely named by test. Since we’re running matrix tests, we give the files a unique name and store them in a summary/ folder in order to use the upload-artifact action later.

- name: Prepare CI report summary
  if: always()
  run: |
    rm -rf summary
    mkdir summary
    cp twister-out/twister_suite_report.xml summary/samples-zephyr-${{ inputs.hil_board }}.xml

On the other hand, the integration tests we run using pytest directly require a bit more post-processing. Since those generate individual suites, a bit of xml hacking is necessary to group them by the board used in the test run.

- name: Prepare summary
  if: always()
  shell: bash
  run: |
    sudo apt install -y xml-twig-tools
    xml_grep \
      --pretty_print indented \
      --wrap testsuites \
      --descr '' \
      --cond "testsuite" \
      summary/*.xml \
      > combined.xml
    mv combined.xml summary/hil-zephyr-${{ inputs.hil_board }}.xml

This step installs the xml-twig-tools package so that we have access to xml_grep. This magical Linux tool is able to match xml summary output filename patterns, unwrapping the testsuites item in each and rewrapping them all into one testsuites entry in a new file. This way we get a summary entry for each board instead of individual entries for every suite that a board runs.
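If you would rather not install an extra package, the same regrouping can be sketched with Python’s standard library. This is an illustrative alternative under my own assumptions, not what our workflow runs: the function name combine_junit and the file names in the demo are placeholders, and it only copies the testsuite elements, without the pretty-printing that xml_grep does.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET


def combine_junit(paths, out_path):
    """Collect every <testsuite> from the input files under one <testsuites> root."""
    combined = ET.Element("testsuites")
    for path in sorted(paths):
        root = ET.parse(path).getroot()
        # Accept both <testsuites><testsuite/></testsuites> and a bare <testsuite> root
        suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
        combined.extend(suites)
    ET.ElementTree(combined).write(out_path, encoding="utf-8", xml_declaration=True)
    return len(combined)  # number of suites written


if __name__ == "__main__":
    # Demo with two throwaway reports; the file names are placeholders
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("ota", "settings"):
            with open(os.path.join(tmp, f"hil-{name}.xml"), "w") as f:
                f.write(f'<testsuites><testsuite name="{name}" tests="1"/></testsuites>')
        out = os.path.join(tmp, "combined.xml")
        count = combine_junit(glob.glob(os.path.join(tmp, "hil-*.xml")), out)
        print(count, "suites combined")
```

Either way, the goal is the same: one file per board that holds all of that board’s suites.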

As noted before, we use the upload-artifact action to upload all of these XML files for summarization at the end of the CI run.

Publish the comprehensive summary

The final step is to download all the XML files we’ve collected and publish a summary. I found the test-reporting action is excellent for this. We use a trigger at the end of our test run to handle the summary.

publish_summary:
  needs:
    - hil_sample_esp-idf
    - hil_sample_zephyr
    - hil_sample_zephyr_nsim_summary
    - hil_test_esp-idf
    - hil_test_linux
    - hil_test_zephyr
    - hil_test_zephyr_nsim_summary
  if: always()
  uses: ./.github/workflows/reports-summary-publish.yml

Note that this uses the needs property to ensure all of the tests running in parallel finish before trying to generate the summary.

name: Publish Summary of all test to Checks section

on:
  workflow_call:

jobs:
  publish-test-summaries:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout repository
      uses: actions/checkout@v4

    - name: Gather summaries
      uses: actions/download-artifact@v4
      with:
        path: summary
        pattern: ci-summary-*
        merge-multiple: true

    - name: Test Report
      uses: phoenix-actions/test-reporting@v15
      if: success() || failure()
      with:
        name: Firmware Test Summary
        path: summary/*.xml
        reporter: java-junit
        output-to: step-summary

Finally, the job responsible for publishing the test summary runs. Each different pytest job uploaded its JUnit XML summary files with the ci-summary- prefix, which is used now to download and merge all files into a summary directory. The test-reporting action is then called with a path pattern to locate those files. The summary is automatically added to the bottom of the GitHub Actions summary page.

One note on visibility with this GitHub action: we added the output-to: step-summary after first implementing this system. Ideally, these summaries should be available as their own line item on the “checks” tab of a GitHub pull request. But in practice we found they got lumped in as a step on a random job often unrelated to the HIL tests. Outputting to the step-summary ensures we always know how to find them. Go to the HIL Action page and scroll down past the job/step graph.

Making continuous integration more useful

There’s a lot that can go wrong in these types of integration tests. We rely on the build, the programmer, the device, the network connection (cellular, WiFi, and Ethernet), the device-to-cloud connection, and the test-runner to cloud API to all work flawlessly for these tests to pass. When they don’t, we need to know where to look for trouble and how to analyze the root cause.

Adding this summary has made it much easier to glean useful knowledge from hundreds of tests. The next installment of our hardware testing series will cover using Allure Reports to add more context to why and how often tests are failing. Until then, check out our guide on using pytest for hardware testing and our series on setting up your own Hardware-in-the-Loop (HIL) tests.

As we previously wrote about, we attended the first Embedded World North America, held in Austin, Texas on October 8-10th, 2024. Part of our time there was to showcase Golioth’s Cloud + low power capabilities at the Joulescope booth. In this post and video, we’ll explain how to trigger different low power modes and how we measured the output.

Goals for the demo

When we were invited to showcase alongside Joulescope, I knew I wanted to be able to turn off elements of the PCB that I believed consumed power. This meant two things:

  1. Reviewing how Golioth can trigger actions from the cloud
  2. Calling the appropriate APIs from Zephyr

The first part I have done many times before. I targeted using Remote Procedure Calls (RPCs) and Settings to push information to the device. However, I could have also used LightDB State. Each has its place in different IoT setups, but as I’ll explain later, Settings seemed to work best for low power contexts. I wrote a Guide on how to add a new setting to a Golioth project a couple days ago.

All of the code mentioned here is targeted at our Aludel Elixir board. The basis for the code is our Reference Design Template, which also targets the nRF9160-DK, but that is off-the-shelf hardware that already has the kinks figured out (thanks Nordic!), so there was less interesting stuff I wanted to do. I will try to target that hardware in a future post as well.

Calling APIs to trigger lower power behavior

As you can see in the Golioth Joulescope demo repo on GitHub, the APIs we call are captive in app_rpc.c and app_settings.c, since that’s where we triggered these actions. I created two new RPCs to turn on rails and see what happened. The most extreme was the 3V3 rail, which controls the sensor + WiFi section of the board using a “downstream” power switch. You can see in the video that I trigger this and the power draw goes up nearly 5x. I also was able to use the same logic for the 5V rail. Both of these utilized the “Regulator” subsystem in Zephyr. I needed to modify the device tree using an overlay file to make sure these two rails were not turned on by default (an article for another day, basically the opposite behavior of this article). I triggered the behavior using these API calls:

I also wanted to trigger less frequent communication with the cloud, simulating a “sleepy device”. For this, I leaned on eDRX, or Extended Discontinuous Reception. I had been calling the API using an RPC, but it wasn’t super optimized for low power modes. For instance, if I have eDRX mode enabled, I won’t be checking into the tower all that often; if I call the RPC to disable eDRX, there’s a chance the RPC will time out in our 10 second window. Interestingly, we figured out that the first call will get cached at the tower (after a 10 second timeout) and the second will often return as a failure. Either way, it’s not a super reliable way to both send a command to a device and ensure it’s properly been received (nor when it has been received).

Instead, I moved the eDRX control to the Golioth Settings Subsystem. This worked great because the change in setting is transmitted to the device, and the device reports back to the server once it has been received. This is purpose-built for asynchronous operation in low power states. The Console shows the device as “Synchronized” or “Out of sync” depending on whether the server has received the GOLIOTH_SETTINGS_SUCCESS enum back from the device. Now whenever the device in eDRX mode checks back in with the tower, it will see that an update is available and will do the synchronization.

Other modes we enabled

In addition to triggering different APIs remotely, I also set up the board to be in a lower power state to start with. This included turning off a good portion of the peripherals that would draw power. I followed Marko’s article for optimizing power on nRF9160 boards.

I had moved the shell output from UART0 to UART1 on the Elixir board (utilizing the MikroBus headers and plugging in a USB to Serial converter) because I was not using the USB to serial chip. I didn’t want to have the 5V from the USB interfering with the measurement, nor did I want to have that chip (CP2102) siphoning power from the rest of the board unknowingly. Even after all the work I did to move over to UART1, it was a pretty large power hog (<500 uA – 1 mA) because it’s always listening for commands. Instead, I relied on the Golioth Logging Subsystem, so I could see what was happening on the cloud. Since we were sending on an infrequent basis, adding in a couple of logs didn’t add much to the overall power when the device woke up from “sleepy mode”. If they did, I could always turn down the logging level using our standard set_log_level RPC in the Reference Design Template. See the demo video above for more detail.

How the measurement works

The Joulescope is a great tool for measuring low current applications. It has a 1 nA resolution, but is accurate to around 30 nA out of the box (way more accuracy than I need, sad to say); it also has super fast range switching, so spikes during RF transmissions are captured as well. You can see different modes activated in the demo that we shot with Matt Liberty at Embedded World North America.

In this setup, we have an off-the-shelf power supply that is simulating the Lithium-Ion battery we normally have plugged in. Then we use the Joulescope to measure the voltage and current as we flow current through the 2 mm JST battery connector on the Elixir board. We’re able to capture peaks and sleep currents of the design in different modes. See the video above for live views.

Power monitoring as a troubleshooting tool

One interesting behavior is the NAT timeout that Dan mentioned last week. I had been playing around with different Kconfig settings, trying longer and longer keep alives and timeouts. With Dan’s help, we narrowed in on the 2 minute mark as a timeout, as explained in “Configuring the Golioth Firmware SDK for Sleepy Devices”. Seeing the device using outsized amounts of current despite us having Connection ID enabled helped us narrow in on different parts of the system. In that case it was the NAT for our MVNO, and the fact that Connection ID wasn’t configured correctly on my project. Once we pushed a change for the Connection ID code, we were able to push the sleepiness of the device even further out. Connection ID also obviates the need to worry about NAT timeout for any particular MVNO, which means the Golioth SDK is helping to standardize offerings from different carriers.

Future Improvements

The cool thing about this kind of activity/demo is that it allows us to isolate and measure the power impact of each action. That means we can assign a “cost” to things like:

  • Doing an OTA update (including pushing out various artifacts using Cohorts)
  • Sending single log messages
  • Sending single Stream messages to push to different services using Pipelines
  • Connecting to the tower and the value/cost of Connection ID

As I mentioned above, I’d also love to target some of this behavior directly at the nRF9160-DK and other development boards. It can be useful to be able to remotely trigger lower power modes, but it’s often tied directly to the capabilities of boards.

We really enjoyed optimizing for lower and lower power and will be writing more about this topic in the future. If you have suggestions, please let us know on the forum. If you want help getting your IoT device to lower and lower current levels, please get in touch with Golioth Solutions.

Zephyr has all of the bells and whistles. Your project only needs a handful of them. But which handful? To be fair, you can build with every possible module in your local tree and only the necessary bits will be pulled in. But wouldn’t it be nice to know exactly which modules need to be added to a manifest allow list? Answer that question and your users won’t be stuck cloning tons of unnecessary files. That could save time on each build, which really adds up over the course of a project’s life.

The west meta-tool used by Zephyr includes a package management system based on manifest files, often called west.yml. Part of the power of this system is that manifest files may inherit other manifest files. The downside to this is that you may be cloning a large number of packages your project will never use. Limit this by using an allow-list in your manifest. But what packages do you need to add to your allow list?

There is no answer to this question

Let’s be up-front about this: there is no definitive answer to this question.

Your project needs to allow all of the modules it uses. Sometimes that means modules that are enabled for some builds and disabled for others. For instance, the Golioth Firmware SDK includes example apps that will build for Espressif, Nordic, and NXP processors. Each has its own HAL but only one of them is used in any given build. You can’t really programmatically generate a modules list in a case like this; you just need to know these packages are needed, even if currently not in the build.

Even without an automated tool, I’ve had to answer this question for myself and I have some pointers on how to approach the problem.

The low-hanging fruit: check your build directory

The first thing you need to do is make sure your project builds without an allow list. That way, all files inherited from Zephyr or from NCS (Nordic’s Zephyr-based nRF Connect SDK) will be included in the build.

manifest:
  projects:
    - name: zephyr
      revision: v3.7.0
      url: https://github.com/zephyrproject-rtos/zephyr  
      west-commands: scripts/west-commands.yml
      import: true

This manifest will include dozens of modules available from the upstream Zephyr repository. There isn’t actually anything wrong with that. You clone the modules once and they live on your hard drive. But, it does take a long time to clone all of them and it will occupy several gigabytes of space. And it’s a good practice to know exactly which packages are actually in use. So let’s try to limit what is cloned in the future.

Directory listing with a few dozen Zephyr modules names shown

The build/modules directory from a Zephyr app

Above is a listing of the build/modules directory from a Zephyr application. All of these modules were scanned during the build process, but almost none of them have any object files that will be used in the build.

├── hal_rpi_pico 
│   ├── CMakeFiles 
│   └── cmake_install.cmake 
├── hal_silabs 
│   ├── CMakeFiles 
│   └── cmake_install.cmake 
├── hal_st 
│   ├── CMakeFiles 
│   └── cmake_install.cmake 
├── hal_telink 
│   ├── CMakeFiles 
│   └── cmake_install.cmake

In fact, we can use this to help us find the modules that are actually at work in a project. Here’s a one-liner you can run from the build/modules directory to get a list of modules we know are needed for this build:

➜ find . -type f -not -name "cmake_install.cmake" | cut -d/ -f2 | uniq
mbedtls
golioth-firmware-sdk
zcbor
hal_nxp

Let’s add these modules to an allow-list and move to the next step.

manifest:
  projects:
    - name: zephyr
      revision: v3.7.0
      url: https://github.com/zephyrproject-rtos/zephyr
      west-commands: scripts/west-commands.yml
      import:
        name-allowlist:
          - mbedtls
          - zcbor
          - hal_nxp

The trial-and-error step

Okay, the easy part is behind us. Now it’s time to figure things out the hard way. Begin by removing your module sources. These are usually in a modules directory that is a sibling of the zephyr directory where the Zephyr tree is stored. Check carefully that you do not have any uncommitted changes in these modules before removing them from your local storage. (I’ve learned this the hard way.)

Next, add an allow-list with the modules we found in the previous section. Run west update to clone the modules. This should happen rather quickly as we’ve greatly narrowed down what will be checked out. Try to build your application. If it fails, we need to divine which module was missing from the build and add that to the allow-list.

warning: HAS_CMSIS_CORE (defined at modules/cmsis/Kconfig:7) has direct dependencies 0 with value n, but is currently being y-selected by the following symbols:
 - CPU_CORTEX_M (defined at arch/arm/core/Kconfig:6), with value y, direct dependencies ARM (value: y), and select condition ARM (value: y)

Part of the build error points to modules/cmsis. If you look in the west.yml from the Zephyr tree you’ll see there is indeed a module named cmsis. We can add it to our allow list, run west update, and then rebuild.
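When the build log is long, a small script can pull out these hints for you. The sketch below is only an illustration: the helper name modules_mentioned is hypothetical, the regular expression keys on the “defined at modules/…” wording seen in the warning above, and a directory name under modules/ is only a clue that still needs to be checked against the project names in west.yml.

```python
import re

# The warning text from the failed build shown above
WARNING = (
    "warning: HAS_CMSIS_CORE (defined at modules/cmsis/Kconfig:7) has direct "
    "dependencies 0 with value n, but is currently being y-selected by the "
    "following symbols:\n"
    " - CPU_CORTEX_M (defined at arch/arm/core/Kconfig:6), with value y, direct "
    "dependencies ARM (value: y), and select condition ARM (value: y)"
)

# Capture the path component that follows "defined at modules/"
MODULE_RE = re.compile(r"defined at modules/([^/\s]+)/")


def modules_mentioned(build_output):
    """Return module directory names referenced in Kconfig 'defined at' hints."""
    found = []
    for name in MODULE_RE.findall(build_output):
        if name not in found:  # keep first-seen order, drop repeats
            found.append(name)
    return found


print(modules_mentioned(WARNING))
```

Symbols defined elsewhere in the tree, such as the arch/arm entry in the same warning, are ignored because they do not live under modules/.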

Guess what? That was it… the project now builds! Here’s what my entire manifest looks like:

manifest:
  projects:
    - name: zephyr
      revision: v3.7.0
      url: https://github.com/zephyrproject-rtos/zephyr
      west-commands: scripts/west-commands.yml
      import:
        name-allowlist:
          - mbedtls
          - zcbor
          - hal_nxp
          - cmsis

  self:
    path: modules/lib/golioth-firmware-sdk
    west-commands: scripts/west-commands.yml
    userdata:
      patches_dirs:
        - patches/west-zephyr

Note that the golioth-firmware-sdk was one of the modules our search of the build directory turned up. But since that module is being added explicitly in this manifest file, it doesn’t need to be on the allow-list for the inherited Zephyr manifest.
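That filtering step is easy to script if you repeat this exercise across several projects. Here is a small sketch, assuming you already have the list of module names from the earlier steps. The function name allowlist_yaml and its parameters are hypothetical, and it only formats text for pasting under name-allowlist; it does not read or write west.yml.

```python
def allowlist_yaml(found, provided_elsewhere=(), indent=10):
    """Format module names as YAML list items, skipping ones the manifest already provides."""
    skip = set(provided_elsewhere)
    pad = " " * indent  # 10 spaces matches the nesting used in the manifest above
    return "\n".join(f"{pad}- {name}" for name in found if name not in skip)


# Modules found in the build directory plus the one found by trial and error
found = ["mbedtls", "golioth-firmware-sdk", "zcbor", "hal_nxp", "cmsis"]
print(allowlist_yaml(found, provided_elsewhere=["golioth-firmware-sdk"]))
```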

Take control of your manifest with allow lists

Knowing exactly what libraries are being used in your build is part of good project management. Since manifest files let you target libraries and modules with version tags or commit hashes, this locks your project to a known-working state. I’m a huge advocate of this and gave an entire talk about Zephyr manifest files at the Embedded Open Source Summit.

Limiting your manifest files to libraries you are explicitly using helps you understand when upstream dependencies change. It may be a bit of a hassle to go through this process the first time, but doing so is a basic form of vetting your build and your product will be better for it.

We love getting out into the real world and meeting Golioth users (and future users!). We’ll be hitting the road once again October 8-10th for the inaugural Embedded World North America! The popular conference, regularly held in Nuremberg in the spring, is rolling into Texas next week for an autumn, US-based version of the conference. Members of our Operations Team and Developer Relations Team will be on-site to discuss Golioth and check out the latest and greatest work from our partner companies.

Low power demo

I (Chris) will be showcasing a demo and helping out at the Joulescope booth, utilizing the JS220 to measure low power on our open source Aludel Elixir board. We’ll show different power modes, enabled over a cellular connection triggered from the Golioth Console. We’re able to turn different things on and off and see how it impacts current draw live. In a true meta capability, we can even see how much power it takes to send the command to use less power!

Joulescope is a favorite of our customers for measuring low power modes with high dynamic range. Stop by booth #1723 in the Austin convention center to see the Joulescope and the Golioth demos live.

The Joulescope JS220 that will be featured in the Golioth demo

Other events

As with any conference, there are night time events we like to take part in as well. We’d love to see you at some of these gatherings!

  • IoT Stars – This is a popular after-hours event at Embedded World DE, as well as at Mobile World Congress. Laurens and Marc put together a technical program and have a networking/social hour.
  • Zephyr meetup at NXP – I (Chris) will be speaking at this meetup about abstraction at the module layer of a design and how we have been creating the Drachm (“dram”) module on hardware livestreams.

See you there!

Just like any in-person event, we love the opportunity to catch up with partners, clients, and new contacts. If you have a project you need help with, we’re happy to discuss how Golioth Solutions can help you get off the ground, or how you can take Golioth for a spin for your next IoT project. Please get in touch if you’d like to meet during the conference.

We’re huge fans of Zephyr around here. We have been targeting the popular RTOS and Ecosystem since the start of Golioth, including our latest Firmware SDK release (0.15.0). And while I’ve been here a good chunk of that time, my background is in hardware and I still struggle with some simple things in Zephyr. So I thought I’d write about one of them.

Let’s start with a gripe: There is no “GPIO” example in Zephyr.

What’s that, you say? There’s a blinky example? That’s correct! The blinky example is the canonical starting point for all Zephyr dev boards and is arguably one of the simplest samples. And this does indeed utilize GPIO to turn LEDs on and off! So let’s go look at the blinky example before we get back to my gripe.

The Blinky Example

The Zephyr docs page for the blinky sample tells you to build for something like the Reel Board using this command:

west build -b reel_board samples/basic/blinky
west flash

A bit further down the page, you see that you need to have an overlay file (or a native node) for your device tree to contain the following:

/ {
     aliases {
             led0 = &myled0;
     };

     leds {
             compatible = "gpio-leds";
             myled0: led_0 {
                     gpios = <&gpio0 13 GPIO_ACTIVE_LOW>;
             };
     };
};

Inner hardware engineer dialog: “What the heck am I looking at right now?”

Overlay Files and DeviceTree

I’ll pause here and quickly point you at some resources. When I was getting started, I found this to be one of the most confusing parts of Zephyr. Ultimately I know that I am going to solder a part down to the board and that there’s a physical pin that the signals will squirt out of. Depending on the IDE or Ecosystem I’m using, there might be a configurator tool, or a config file, or a set of registers I need to decode in order to get a pin to do what I want. P0.13 on the nRF52840 of the Reel Board (as shown in the example above) is going to be blinking an LED on and off, so I want to set that…but how?

Zephyr has a “DeviceTree” that represents the nodes of a particular chip or board. When we’re trying to blink an LED on or off, we will be writing code that searches through that tree to find compatible elements, such as gpio-leds. When we write code that will control the LEDs, it will know to apply the data structures we create to the LED elements. If you’re new to DeviceTree, I suggest you don’t travel too far down the rabbit hole at the beginning of your journey (but you’ll get there).

Now, this is a far cry from the experience of many hardware engineers. My earliest control of microcontrollers involved OR’ing logic that basically directly sets an LED by toggling a bit on a memory mapped register. But there are benefits to these new ways! We can have similar code target a wider range of hardware. In fact, the overlay file shown above is just that: we’re only assigning one of the pins to change from its default behavior to instead have different behavior and be mapped as gpio-leds. Another benefit is that we’re building on top of many (many!) default configurations. So in the example above, we’re actually using the reel_board board configuration that is “in-tree” in Zephyr. That is actually built on top of the nrf52840 SOC configuration, also “in-tree”. The overlay is only telling the build system what is going to change. If you’d like to see more about Overlays, one of our most popular posts on the blog is about that very topic.
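
For contrast, here is roughly what that register-level style looks like. This is only a sketch: fake_gpio_out is an ordinary variable standing in for a memory-mapped output register, and the helper names are made up for illustration. On real hardware the register address would come from the chip's datasheet.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped GPIO output register (hypothetical).
 * On real hardware this would be a volatile pointer to a fixed address. */
static volatile uint32_t fake_gpio_out;

/* Pin number borrowed from the Reel Board example above (P0.13). */
#define LED_PIN 13u

/* Drive the pin high by OR'ing in its bit. */
static void led_pin_high(void)
{
    fake_gpio_out |= (1u << LED_PIN);
}

/* Drive the pin low by masking its bit out. */
static void led_pin_low(void)
{
    fake_gpio_out &= ~(1u << LED_PIN);
}

/* Flip the pin by XOR'ing its bit. */
static void led_pin_toggle(void)
{
    fake_gpio_out ^= (1u << LED_PIN);
}
```

Nothing in this code knows which board it is running on, which is exactly the information the devicetree supplies in the Zephyr approach.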

One important topic is understanding how different files are pieced together when you’re building your project. There are multiple layers of inheritance that include things like SOC definition, board definition, project configurations, and more. This will be the basis of many of your errors and build problems, especially if you start building custom hardware and firmware. My co-worker Mike always calls out that you should look at the generated devicetree files during builds (even failed ones) in order to suss out what’s wrong. This is great advice.

All the devicetree files (including your overlay files) get combined into one build/zephyr/zephyr.dts file at build time.

But what about a GPIO example?

OK, back to my gripe: There is not a GPIO example in Zephyr.

This results in many projects using gpio-leds as their representation of GPIO. Someone is triggering the pin to enable a chip on their board and it’s still…an LED? Why is that?

Other times you’ll see someone looking to interrupt a chip with another chip and they use a callback and the button compatible type. That’s not a button!

Oh I’m guilty of it too. On my post where I was driving input of a BJT with a PWM signal to trigger a buzzer I’m also using pwm-leds? What gives?

Bindings

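All of the above compatible elements are called “bindings”. This is how the system knows that when I call out pin P0.13 as an LED GPIO, I can assign a struct that matches that type of GPIO and then trigger it accordingly. In the blinky example, the declaration just above main() showcases how that struct is assigned:

#include <zephyr/kernel.h>
#include <zephyr/drivers/gpio.h>

/* 1000 msec = 1 sec */
#define SLEEP_TIME_MS 1000

/* The devicetree node identifier for the "led0" alias */
#define LED0_NODE DT_ALIAS(led0)

static const struct gpio_dt_spec led = GPIO_DT_SPEC_GET(LED0_NODE, gpios);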

int main(void)
{
    int ret;

    if (!gpio_is_ready_dt(&led)) {
        return 0;
    }

    ret = gpio_pin_configure_dt(&led, GPIO_OUTPUT_ACTIVE);
    if (ret < 0) {
        return 0;
    }

    while (1) {
        ret = gpio_pin_toggle_dt(&led);
        if (ret < 0) {
            return 0;
        }
        k_msleep(SLEEP_TIME_MS);
    }
    return 0;
}

In this we’re searching the device tree to find LED0_NODE and then pulling that information through to associate the GPIO with the code we’re going to write in the main loop. Note the functions used in the blinky code are not the only way to configure and trigger a GPIO.

In this case we’re using the gpio-leds binding, which as I said, is often co-opted for hardware that isn’t controlling LEDs. This is not the only type of GPIO that is available though. Zephyr expects that you will make a custom GPIO type for your project, which is why there isn’t a generic GPIO example.
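
To make the role of the spec struct more concrete, here is a toy model in plain C. It is not Zephyr's implementation: every toy_ name is hypothetical, and the port is just a 32-bit variable. The idea it illustrates is that the devicetree flags (such as GPIO_ACTIVE_LOW in the overlay above) travel with the pin, so application code can think in terms of "active" and "inactive" rather than high and low.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag that mirrors the idea of GPIO_ACTIVE_LOW. */
#define TOY_ACTIVE_LOW 0x1u

/* Toy stand-in for a GPIO port: one bit per pin. */
struct toy_port {
    uint32_t out;
};

/* Toy stand-in for struct gpio_dt_spec: what the devicetree says about one pin. */
struct toy_gpio_spec {
    struct toy_port *port;
    uint8_t pin;
    uint32_t dt_flags;
};

/* Set the logical state; the flags decide what that means electrically. */
static void toy_pin_set(const struct toy_gpio_spec *spec, bool active)
{
    assert(spec->pin < 32);

    bool level = (spec->dt_flags & TOY_ACTIVE_LOW) ? !active : active;

    if (level) {
        spec->port->out |= (1u << spec->pin);
    } else {
        spec->port->out &= ~(1u << spec->pin);
    }
}

/* Read back the electrical level currently driven on the pin. */
static bool toy_pin_level(const struct toy_gpio_spec *spec)
{
    return (spec->port->out >> spec->pin) & 1u;
}
```

With an active-low LED, asking for "active" drives the pin low, which is why the same application code can work on boards that wire their LEDs differently.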

The custom_dts_binding Sample

I’ll be honest, I’m really not a fan that one of the “basic” samples in Zephyr is to start making custom types of GPIO. But I kind of get it. The Zephyr project has all types of people and projects walking through the door on a daily basis. So, we’re going to make a custom devicetree binding. Ultimately it isn’t that hard, but it leans on inheritance again. The custom_dts_binding sample contains, among other files, an overlay in its boards directory and a binding in dts/bindings.

In this case nucleo_l073rz.overlay is going to call out this new binding and power-switch.yaml is the actual binding itself. The code inside that yaml is…simple!

description: GPIO pin to switch a power output on or off

compatible: "power-switch"

properties:
  gpios:
    type: phandle-array
    required: true
    description: |
      The GPIO connected to the gate driver for the MOSFET.

All it does is create a single GPIO! But now it’s a custom GPIO type. If we look at the overlay file, it calls out that compatible to match.

/ {
    load_switch: load_switch {
        compatible = "power-switch";
        /* using built-in LED pin for demonstration */
        gpios = <&gpioa 5 GPIO_ACTIVE_HIGH>;
    };
};

And now GPIO A5 is assigned to that one type of new pin. The GPIO types were hiding in plain sight.

OK, but I actually had to learn this for my own uses. I was trying to assign two GPIOs to control the pins on a programmable gain amplifier. If I tried to use the above binding (really mis-using it because it wouldn’t be for a power-switch), I wouldn’t be able to assign two pins because it’s only made for one. The binding applies to the node itself (a ‘parent’ node), so each node describes a single pin.

Instead, I went and copied the devicetree binding for our old friend gpio-leds and renamed that:

description: GPIO pins to control gain on the INA225

compatible: "ina225-gain"

child-binding:
  description: The GPIOs controlling the pins for gain
  properties:
    gpios:
      type: phandle-array
      required: true
    label:
      type: string
      description: |
        Human readable string describing the LED. It can be used by an
        application to identify this LED or to retrieve its number/index
        (i.e. child node number) on the parent device.

As I’m copying this over to the blog post, I even notice I forgot to change the ‘description’ to match. Yeah, I should (and will) change that.

This allowed me to add the following to my overlay file (among many other nodes):

gain-pins {
    compatible = "ina225-gain";
    gs0: gs0 {
        gpios = <&gpio0 31 0>;
        label = "GS0 pin for INA225";
    };
    gs1: gs1 {
        gpios = <&gpio0 30 0>;
        label = "GS1 pin for INA225";
    };
};

aliases {
    gs0 = &gs0;
    gs1 = &gs1;
};

Notice that like the gpio-leds, I have multiple child elements (thanks to the child-binding) that I can name and then search later. I could have also set up two custom bindings for gs0 and gs1, but that seems like unnecessary duplication of effort.

I added ‘aliases’ so I had an easy way to search the device tree from the code in my application:

static const struct gpio_dt_spec ina225_gs0 = GPIO_DT_SPEC_GET(DT_ALIAS(gs0), gpios);
static const struct gpio_dt_spec ina225_gs1 = GPIO_DT_SPEC_GET(DT_ALIAS(gs1), gpios);

And then further down in main.c:

err = gpio_pin_configure_dt(&ina225_gs0, GPIO_OUTPUT_HIGH);
if (err) {
    LOG_ERR("Unable to set ina225_gs0 high");
}

err = gpio_pin_configure_dt(&ina225_gs1, GPIO_OUTPUT_HIGH);
if (err) {
    LOG_ERR("Unable to set ina225_gs1 high");
}

Since I only needed these pins to be high all the time, I didn’t need to do anything else. I simply configured them at boot and left them on for the duration of my program.
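
If the gain ever needed to change at runtime, the two pins would behave like a 2-bit selector. The sketch below only splits a selector value into two pin levels. The names are hypothetical, and which selector value corresponds to which gain is defined by the INA225 datasheet, so it is deliberately not encoded here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Logical levels for the two gain-select pins. */
struct gain_pins {
    bool gs0;
    bool gs1;
};

/* Split a 2-bit selector (0..3) into levels for GS0 (bit 0) and GS1 (bit 1).
 * Returns 0 on success, -1 if the selector is out of range. */
static int gain_select_to_pins(uint8_t select, struct gain_pins *pins)
{
    if (select > 3u) {
        return -1;
    }
    pins->gs0 = (select & 0x1u) != 0;
    pins->gs1 = (select & 0x2u) != 0;
    return 0;
}
```

In a Zephyr application each level could then be handed to gpio_pin_set_dt() for ina225_gs0 and ina225_gs1. Selector 3 corresponds to the both-pins-high configuration used above.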

Notice that the files live in the application I’m building, in the same layout as the custom_dts_binding sample.

When the build system is crawling this application, it sees that there are files in dts/bindings that are custom and which will be added to the other bindings that are already in-tree in Zephyr. Same goes for the boards directory, which contains the overlays. CMakeLists.txt doesn’t need to change unless we’re adding additional files to compile in the src folder.

Aspiring Firmware Engineer

I’ve been saying for the past few years that I’m an “aspiring firmware engineer” because it’s hard to be a hardware engineer without the code to control all this silicon. I think it’s worthwhile to climb up the learning curve and take on new challenges in Zephyr. GPIO is one of many topics that will be part of that curve. As you build your skills, you’ll start to recognize more patterns and reduce your confusion during firmware build. Then you get to take advantage of all the great open source firmware that has been written by the community and you’ll accelerate your next IoT project.

If you’re having problems with this or other Zephyr topics, be sure to stop by our forum!

Zephyr has extensive built-in support for a multiverse of microcontrollers, development boards, and sensors. This is possible because of an abstraction layer that allows anyone to hook their own devices into the system. However, there are a few bits of core knowledge you need to get everything working just right. Let’s work our way through those and discuss how to write a Zephyr device driver!

Overview of Zephyr Device Drivers

You will want to implement most of these pieces to get your device driver up and running:

  • Devicetree binding to define a “compatible” for your device
  • Kconfig symbol to include or exclude the driver from the build
  • Power management support to move between power modes (on, standby, sleep, etc.)
  • A data structure for per-instance data storage
  • An API so user applications may access the driver

None of this is particularly complex, but as a whole, it can be daunting to figure out where to start and how to troubleshoot when something isn’t working correctly.

In preparation for this post, I converted the Golioth Ostentus library (libostentus) into a proper Zephyr device driver. Ostentus is an open source hardware faceplate that adds a user interface to an embedded project using i2c. While it certainly worked before this change, we get a few nice bonuses by making it a driver:

  • The device is now added as a devicetree node
  • The library is automatically selected when a devicetree node is present
  • The driver will automatically initialize the device before the application begins running
  • Multiple instances of the device may now be included in a single build

Let’s dig in!

Optional Prerequisite: How to Write a Zephyr Module

Zephyr device drivers may be included in your application directory. But in this case, we want to use the Ostentus in numerous Zephyr projects. To accomplish this, we need to make the driver a Zephyr Module. This means it will live in its own git repository and be included in the west manifest of projects that use it.

I’ve previously written about this process. If you need a refresher, check out our post on How to Turn Helper Code into a Zephyr Module.

Tree Overview

.
├── CMakeLists.txt
├── dts
│   └── bindings
│       ├── golioth,ostentus.yaml
│       └── vendor-prefixes.txt
├── include
│   └── libostentus.h
├── Kconfig
├── libostentus.c
└── zephyr
    └── module.yml

We’ll be jumping back and forth through files during this post. For your reference, this tree contains all the files we’ll touch along the way.

1. Create the Binding and add it to the Zephyr Module

Ostentus is an i2c device with no other special considerations. That makes the binding really simple because we just need to include the default i2c binding.

golioth	Golioth

description: "Golioth Ostentus Faceplate"

compatible: "golioth,ostentus"

include: [i2c-device.yaml]

The directory structure includes two files in dts/bindings. Since Golioth isn’t in Zephyr’s existing list of hardware vendors, it’s added to the vendor-prefixes.txt file. (Note that the syntax for this file requires a tab character between the vendor prefix and the vendor name.)

The binding itself uses the <vendor>,<device>.yaml naming convention. That file defines the golioth,ostentus compatible and (as already mentioned) includes the existing i2c device binding.

One place I struggled was in getting Zephyr to properly ingest this binding. Because this is a module, we need to specify a dts_root in zephyr/module.yml so that it will look for our dts directory:

build:
  cmake: .
  kconfig: Kconfig
  settings:
    dts_root: .

2. Set Kconfig to Automatically Enable the Driver

You’ll know your changes made in step 1 are working because a project built with a golioth,ostentus compatible in the devicetree will result in the following Kconfig symbol in build/zephyr/.config:

CONFIG_DT_HAS_ARM_V8M_NVIC_ENABLED=y
CONFIG_DT_HAS_FIXED_PARTITIONS_ENABLED=y
CONFIG_DT_HAS_GOLIOTH_OSTENTUS_ENABLED=y
CONFIG_DT_HAS_GPIO_KEYS_ENABLED=y
CONFIG_DT_HAS_GPIO_LEDS_ENABLED=y

Neat, right? The symbol appears automatically, based on the CONFIG_DT_HAS_<VENDOR>_<DEVICENAME>_ENABLED syntax from the compatible that was defined. This is useful because we can depend upon it to add the library to the build.

menuconfig LIB_OSTENTUS
    bool "Enable the driver library for the Golioth Ostentus faceplate"
    default y
    depends on DT_HAS_GOLIOTH_OSTENTUS_ENABLED
    select I2C
    help
      Helper functions for controlling the Golioth Ostentus faceplate.
      Features include controlling LEDs, adding slides and slide data,
      enabling slideshows, etc.

if LIB_OSTENTUS

config OSTENTUS_INIT_PRIORITY
    int "Ostentus init priority"
    default 90
    help
      Ostentus initialization priority.

config OSTENTUS_LOG_LEVEL
    int "Default log level for libostentus"
    default 4
    help
        The default log level, which is used to filter log messages.

        0: None
        1: Error
        2: Warn
        3: Info
        4: Debug
        5: Verbose

endif #LIB_OSTENTUS

This Kconfig file adds the LIB_OSTENTUS symbol, but only if DT_HAS_GOLIOTH_OSTENTUS_ENABLED is present. In this case, the library symbol was added as a menu with two additional symbols used to set the log level and the initialization priority.
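
The naming rule behind DT_HAS_GOLIOTH_OSTENTUS_ENABLED can be sketched in a few lines of plain C. This is an approximation written for illustration, not Zephyr's actual generator (which runs as part of the build system): it upper-cases letters and replaces every other non-alphanumeric character with an underscore. The function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Build "DT_HAS_<COMPAT>_ENABLED" from a compatible string by upper-casing
 * letters and turning every non-alphanumeric character into '_'.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int compat_to_kconfig(const char *compat, char *out, size_t out_len)
{
    static const char prefix[] = "DT_HAS_";
    static const char suffix[] = "_ENABLED";
    size_t n = strlen(compat);
    size_t needed = (sizeof(prefix) - 1) + n + (sizeof(suffix) - 1) + 1;

    if (needed > out_len) {
        return -1;
    }

    char *p = out;

    memcpy(p, prefix, sizeof(prefix) - 1);
    p += sizeof(prefix) - 1;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)compat[i];

        *p++ = isalnum(c) ? (char)toupper(c) : '_';
    }
    memcpy(p, suffix, sizeof(suffix)); /* copies the terminating NUL too */
    return 0;
}
```

Feeding it "golioth,ostentus" produces the name used in the depends on line above. The CONFIG_ prefix seen in build/zephyr/.config is added separately and is left out here.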

3. Define Typedefs and Custom Device API

Now the real work begins.

This section is all about writing a custom device API. If all you’re after is adding your own sensor to Zephyr, you can pretty much skip this section because all the work has been done for you in include/zephyr/drivers/sensor.h. That’s just one in-tree API you can choose from, so if any of them fit your needs please use one of those.

The Ostentus doesn’t fit into any of the existing APIs so we need to create our own. This happens in a header file named for your driver and placed in the include directory of your driver repository. First, make a typedef that reflects the parameter fingerprint of all the functions you want to call as part of your driver.

typedef int (*ostentus_cmd_t)(const struct device *dev);
typedef int (*ostentus_setval_8_t)(const struct device *dev, uint8_t val);

Now use those typedefs to declare your API.

__subsystem struct ostentus_driver_api {
    ostentus_cmd_t ostentus_clear_memory;
    ostentus_setval_8_t ostentus_led_power_set;
};

This prepares a driver API for use when we define the device instances. In reality there are a couple dozen functions in our actual API that use less than a dozen typedefs. Here’s the relevant code if you’re interested in seeing everything.

4. Define Syscalls and Inline Functions

Now that we have an API, we need inline functions that will call the functions associated with that API.

You have a choice to make these regular functions, or syscall functions. Zephyr offers a User Mode which sandboxes the application. If you want your driver to work for User Mode applications, you need to implement them as syscalls.

__syscall int ostentus_clear_memory(const struct device *dev);

static inline int z_impl_ostentus_clear_memory(const struct device *dev)
{
    const struct ostentus_driver_api *api = (const struct ostentus_driver_api *)dev->api;
    if (api->ostentus_clear_memory == NULL) {
        return -ENOSYS;
    }
    return api->ostentus_clear_memory(dev);
}

__syscall int ostentus_led_power_set(const struct device *dev, uint8_t state);

static inline int z_impl_ostentus_led_power_set(const struct device *dev, uint8_t state)
{
    const struct ostentus_driver_api *api = (const struct ostentus_driver_api *)dev->api;
    if (api->ostentus_led_power_set == NULL) {
        return -ENOSYS;
    }
    return api->ostentus_led_power_set(dev, state);
}

The __syscall directive is used in the function prototype; then, when defining the function, z_impl_ is used to prefix the name of the API call. Note the purpose of this inline function is to check that a function was assigned to this API call (we’ll do that in step 5 below) before passing the parameters to that function.
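
The check-then-dispatch pattern is easier to see with the Zephyr machinery stripped away. The following is a toy model in plain C, with hypothetical toy_ and partial_ names throughout: a device carries a pointer to a table of function pointers, and each wrapper returns -ENOSYS when the driver left its slot empty.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for Zephyr's struct device: only the API pointer is modeled. */
struct toy_device {
    const void *api;
};

typedef int (*toy_cmd_t)(const struct toy_device *dev);
typedef int (*toy_setval_8_t)(const struct toy_device *dev, uint8_t val);

struct toy_driver_api {
    toy_cmd_t clear_memory;
    toy_setval_8_t led_power_set;
};

/* Wrappers: check that the driver filled in the slot, then dispatch. */
static inline int toy_clear_memory(const struct toy_device *dev)
{
    const struct toy_driver_api *api = dev->api;

    if (api->clear_memory == NULL) {
        return -ENOSYS;
    }
    return api->clear_memory(dev);
}

static inline int toy_led_power_set(const struct toy_device *dev, uint8_t val)
{
    const struct toy_driver_api *api = dev->api;

    if (api->led_power_set == NULL) {
        return -ENOSYS;
    }
    return api->led_power_set(dev, val);
}

/* A partial driver that implements only one of the two calls. */
static uint8_t last_led_state;

static int partial_led_power_set(const struct toy_device *dev, uint8_t val)
{
    (void)dev;
    last_led_state = val;
    return 0;
}

static const struct toy_driver_api partial_api = {
    .led_power_set = partial_led_power_set, /* clear_memory stays NULL */
};

static const struct toy_device partial_dev = { .api = &partial_api };
```

Calling toy_led_power_set(&partial_dev, 1) reaches the driver function, while toy_clear_memory(&partial_dev) reports -ENOSYS instead of jumping through a NULL pointer.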

To finish setting up the syscalls we need to add a special include to the end of this file. That include uses the #include <syscalls/[NameOfThisHeaderFile]> format. We also need to tell CMake that this file uses syscalls.

#include <syscalls/libostentus.h>
zephyr_syscall_header(${ZEPHYR_LIBOSTENTUS_MODULE_DIR}/include/libostentus.h)

Once again, there are far more functions defined in the actual driver, which you can see for yourself by viewing the actual header file.

5. Implement the Driver Functions and Assign to the API

Technically, the header file we created in steps 3 and 4 is a generic API that may be reused by any number of different devices. Now we can implement one such device. We’ll use the libostentus.c file to write the device-specific functions, then assign them to our API calls.

#include <libostentus.h>
#include <libostentus_regmap.h>
#include <zephyr/drivers/i2c.h>

static int ostentus_i2c_write2(const struct device *dev, uint8_t reg, uint8_t *data1,
                   uint8_t data1_len, uint8_t *data2, uint8_t data2_len)
{
    const struct ostentus_config *config = dev->config;

    struct i2c_msg msgs[] = {
        {
            .buf = &reg,
            .len = 1,
            .flags = I2C_MSG_WRITE,
        },
        {
            .buf = data1,
            .len = data1_len,
            .flags = I2C_MSG_WRITE,
        },
        {
            .buf = data2,
            .len = data2_len,
            .flags = I2C_MSG_WRITE | I2C_MSG_STOP,
        },
    };
    uint8_t num_msgs = ARRAY_SIZE(msgs);

    /* Detect how many i2c messages there are and which is the last one */
    for (int i = 1; i < ARRAY_SIZE(msgs); i++) {
        if (!msgs[i].len) {
            msgs[i - 1].flags |= I2C_MSG_STOP;
            num_msgs = i;
            break; /* stop at the first empty message */
        }
    }

    return i2c_transfer_dt(&config->i2c, msgs, num_msgs);
}

static int ostentus_i2c_write1(const struct device *dev, uint8_t reg, uint8_t *data,
                   uint8_t data_len)
{
    return ostentus_i2c_write2(dev, reg, data, data_len, NULL, 0);
}

static int ostentus_i2c_write0(const struct device *dev, uint8_t reg)
{
    return ostentus_i2c_write2(dev, reg, NULL, 0, NULL, 0);
}

static int clear_memory(const struct device *dev)
{
    return ostentus_i2c_write0(dev, OSTENTUS_CLEAR_MEM);
}

static int led_power_set(const struct device *dev, uint8_t state)
{
    uint8_t byte = state ? 1 : 0;
    return ostentus_i2c_write1(dev, OSTENTUS_LED_POW, &byte, 1);
}

This file begins with three functions that handle writing to the device using i2c that aren’t defined in the API. The two functions at the bottom of the file receive device structs (and all other parameters) in a way that matches the typedefs created in step 3. These two functions pass the device struct to the i2c functions to communicate with the device.
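
The intent of the message-trimming step in ostentus_i2c_write2 can be exercised on its own with a stand-in message struct. This is a sketch under assumptions: the toy_ types and flag values are hypothetical and do not match Zephyr's definitions, and the first message (the register byte) is assumed to always be sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; Zephyr defines its own. */
#define TOY_MSG_WRITE 0x0u
#define TOY_MSG_STOP  0x2u

/* Toy stand-in for Zephyr's struct i2c_msg. */
struct toy_i2c_msg {
    const uint8_t *buf;
    uint32_t len;
    uint8_t flags;
};

/* Return how many leading messages carry data, and mark the last of them
 * with STOP. msgs[0] is always kept, so count must be at least 1. */
static size_t toy_trim_msgs(struct toy_i2c_msg *msgs, size_t count)
{
    assert(count >= 1);

    size_t used = count;

    for (size_t i = 1; i < count; i++) {
        if (msgs[i].len == 0) {
            used = i;
            break; /* the first empty message ends the transfer */
        }
    }
    msgs[used - 1].flags |= TOY_MSG_STOP;
    return used;
}
```

A register-only write ends up as one message, a register plus one payload as two, and in both cases the final message is the one carrying the stop condition.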

Now it’s time to associate these functions with the API.

static const struct ostentus_driver_api ostentus_api = {
    .ostentus_clear_memory = &clear_memory,
    .ostentus_led_power_set = &led_power_set,
};

Once again, this is a greatly simplified version of the actual API definition.

6. Define the Device Instances

There’s a lot happening in this step but we’re almost done! To tie everything together we must declare a driver compatible and handle the data, configuration, power management, and initialization. All of these parts are tied together with a bit of “macrobatics”.

Declare a Driver Compat

This is incredibly important and easy to miss. Declare a driver compatible that matches your devicetree binding in your C file:

#define DT_DRV_COMPAT golioth_ostentus

Device Data

This device has no need for persistent data. We could do something like store the firmware version the Ostentus faceplate reports, but that can just be read and printed during initialization with no need for storage.

To learn more about handling data, check out any of the sensor drivers in the Zephyr tree for data struct and data initialization.

Power Management

We have not yet implemented power management for this device. Future work might include sending a command that puts the Ostentus in sleep mode, and another to wake it up again.

Examples of power management are available in the Zephyr tree sensor drivers.

Configuration

Configuration info is basically a context for each device instance. This is where the driver will store the i2c bus and address info for Ostentus. The struct is defined in the driver header file.

struct ostentus_config {
    struct i2c_dt_spec i2c;
};

Initialization

The driver will automatically initialize the device, but we must supply the initialization function.

static int ostentus_init(const struct device *dev)
{
    const struct ostentus_config *config = dev->config;

    if (!device_is_ready(config->i2c.bus)) {
        LOG_ERR("I2C bus device not ready");
        return -ENODEV;
    }

    char buf[32];
    int err = version_get(dev, buf, 32);
    if (err) {
        LOG_ERR("Unable to communicate with Ostentus over i2c: %d", err);
        return err;
    } else {
        LOG_INF("Ostentus firmware version: %s", buf);
    }

    return 0;
}

This function gets the i2c bus from the associated config struct and tests to make sure everything is kosher. It then reads and logs the firmware version from the device.

Macros for Device Instances

Now use macros to tie everything together at the bottom of the C file.

#define OSTENTUS_DEFINE(inst)                                      \
    static const struct ostentus_config ostentus_config_##inst = { \
        .i2c = I2C_DT_SPEC_INST_GET(inst),                         \
    };                                                             \
                                                                   \
    DEVICE_DT_INST_DEFINE(inst,                                    \
                  ostentus_init,                                   \
                  NULL,                                            \
                  NULL,                                            \
                  &ostentus_config_##inst,                         \
                  POST_KERNEL,                                     \
                  CONFIG_OSTENTUS_INIT_PRIORITY,                   \
                  &ostentus_api);

DT_INST_FOREACH_STATUS_OKAY(OSTENTUS_DEFINE)

We define a macro that populates the member of the config struct using devicetree information. (This would also be where you would populate data and power management if you have them.)

The DEVICE_DT_INST_DEFINE macro takes the instance number, init function, power management (NULL), data struct (NULL), config struct, initialization level, initialization priority, and the address of the API struct. The final macro calls our mega-macro once for each enabled instance of the device encountered in the devicetree.
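
If the token pasting feels opaque, the same trick can be tried without Zephyr. In this sketch every TOY_ macro is a hypothetical stand-in: TOY_FOREACH_INST plays the role of DT_INST_FOREACH_STATUS_OKAY, and the TOY_INST_ADDR_n defines stand in for values that would normally come from the devicetree.

```c
#include <stdint.h>

/* Toy per-instance configuration (hypothetical field). */
struct toy_config {
    uint8_t i2c_addr;
};

/* Stand-in for devicetree data: the address of each enabled instance. */
#define TOY_INST_ADDR_0 0x12
#define TOY_INST_ADDR_1 0x13

/* Stand-in for DT_INST_FOREACH_STATUS_OKAY(): expand fn once per instance. */
#define TOY_FOREACH_INST(fn) fn(0) fn(1)

/* The ## operator pastes the instance number into each identifier. */
#define TOY_DEFINE(inst)                                    \
    static const struct toy_config toy_config_##inst = {    \
        .i2c_addr = TOY_INST_ADDR_##inst,                   \
    };

/* Expands to definitions of toy_config_0 and toy_config_1. */
TOY_FOREACH_INST(TOY_DEFINE)
```

Each instance gets its own uniquely named config struct, which is what makes multiple instances of one driver possible in a single build.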

Using Your Device Driver

So, how do you use this whole thing? It’s very similar to using a sensor in Zephyr. In our case we need to first include the module in west.yml.

manifest:
  projects: 
    - name: libostentus
      path: deps/modules/lib/libostentus
      revision: v2.0.0
      url: https://github.com/golioth/libostentus

Add an instance of Ostentus to the devicetree.

&i2c2 {
    /* Needed for I2C writes used by libostentus */
    zephyr,concat-buf-size = <48>;

    ostentus@12 {
        status = "okay";
        compatible = "golioth,ostentus";
        reg = <0x12>;
    };
};

And then interact with the device in your application:

#include <libostentus.h>

static const struct device *ostentus = DEVICE_DT_GET_ANY(golioth_ostentus);

static int some_function(void)
{
    int err = ostentus_clear_memory(ostentus);
    if (err) {
        return err;
    }
    return ostentus_led_power_set(ostentus, 1);
}

Going Deeper

There’s a lot here to digest. While this is a nice walkthrough, the full code is worth your review. All Golioth hardware is open source and that includes the libostentus driver library used as the example in this post.

In 2022 I attended a fantastic talk on custom drivers presented by Gerard Marull Paretas at the Zephyr Developer’s Summit. You can watch the talk recording and also peruse the sample code from that talk. I’d like to extend a personal thank you to Gerard for such an excellent presentation!

What are you building? We’d love to hear about the devices for which you’re creating drivers. Start a thread in the Golioth Forum to share the progress of your work!

Golioth’s own Dan Mangum presented a talk at this year’s Embedded Open Source Summit detailing how to use WebAssembly with Zephyr RTOS. For those unfamiliar with WebAssembly, it was conceived as a way to run portable, high-performance code in web browsers alongside JavaScript. So what is it doing in microcontrollers? Dan takes on that question, and covers how to validate whether Wasm on Zephyr is a viable solution for you.

What is WebAssembly?

WebAssembly–aka Wasm–is a portable binary format that can be executed on myriad different systems and architectures. Platforms that support Wasm have a runtime that makes execution possible and this is the case for Zephyr.

The WebAssembly Micro Runtime (wamr for those in the know) already has a Zephyr port that you can try out right now. Wamr delivers a runtime optimized for embedded systems that sandboxes the Wasm code it is running.

Just build the runtime into your firmware, then supply a new Wasm binary whenever you want to change how that part of the application works. You now have a way to update programs in a safe way without a full firmware update and even without rebooting the hardware.

Why Use Wasm with Zephyr (or any embedded system)?

Dan spends the first half of his talk discussing the criteria used to evaluate tradeoffs in play with WebAssembly. You’re always going to use more resources and take a speed hit compared to native code, that’s no surprise. But especially in cases where dynamic code execution is needed, Wasm checks a lot of boxes like portability and security.

The demonstration implements a temperature threshold mechanism that triggers an alert when readings rise above a certain level. This is basically a hello-world example that shows how native firmware can pass a primitive value into the runtime, and the Wasm code can call native functions (a high-temperature alert log message).

But the secret sauce is the Wasm binary itself. You could implement complex algorithmic processing and change that algorithm without a full OTA firmware update. In fact, Dan’s just passing the Wasm binary as a base64-encoded string and restarting the runtime without rebooting the microcontroller. This is done using the Golioth Settings service so it’s available to the entire fleet, but targetable by device or groups of devices.
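
The post doesn't show how the demo decodes that string, so the following is only a sketch of the idea: decode standard padded base64 into bytes, then sanity-check that the result starts like a Wasm module before handing it to a runtime. The function names are hypothetical, and a real project would more likely lean on an existing base64 library than hand-roll one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not valid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode padded, standard-alphabet base64. Returns the number of bytes
 * written to out, or -1 on malformed input or if out is too small. */
static int b64_decode(const char *in, uint8_t *out, size_t out_len)
{
    size_t in_len = strlen(in);
    size_t written = 0;

    if (in_len % 4 != 0) {
        return -1;
    }

    for (size_t i = 0; i < in_len; i += 4) {
        uint32_t chunk = 0;
        int pad = 0;

        for (size_t j = 0; j < 4; j++) {
            char c = in[i + j];
            int v = 0;

            if (c == '=') {
                /* Padding may only fill the last two slots of the final group. */
                if (i + 4 != in_len || j < 2) {
                    return -1;
                }
                pad++;
            } else {
                if (pad > 0) {
                    return -1; /* data after padding */
                }
                v = b64_value(c);
                if (v < 0) {
                    return -1;
                }
            }
            chunk = (chunk << 6) | (uint32_t)v;
        }

        size_t nbytes = 3 - (size_t)pad;

        if (written + nbytes > out_len) {
            return -1;
        }
        for (size_t k = 0; k < nbytes; k++) {
            out[written++] = (uint8_t)(chunk >> (16 - 8 * k));
        }
    }
    return (int)written;
}

/* A Wasm module starts with the 4-byte magic "\0asm" followed by a 4-byte version. */
static int looks_like_wasm(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x00, 0x61, 0x73, 0x6d };

    return len >= 8 && memcmp(buf, magic, sizeof(magic)) == 0;
}
```

For example, the string "AGFzbQEAAAA=" decodes to the eight header bytes of an empty module, which passes the looks_like_wasm() check.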

However, this real time update ability is not the only trick Wasm can pull off.

The Portability of WebAssembly

Sure, it’s very cool to be able to perform a bit of brain surgery on your firmware by loading a new Wasm binary. What boggles the mind is the ability to run that binary on just about any platform imaginable.

A typical IoT installation that uses Golioth has embedded devices in the field, a server with which those devices interact (authentication, data routing, control, etc), and a cloud component to use the data and issue directives to the fleet. Your Wasm binary can be moved and executed on a different part of this system depending on need. While the demo is first run on a Nordic nRF52840 microcontroller, the same binary is shown running on the cloud, and inside of a browser.

Whether during initial development, or to meet changing device constraints or customer needs, sliding the compute from one place to another without major engineering work is a pretty interesting tool to have in your arsenal.

A Wasm Deep Dive

The proof of concept is already there for you to build your own Zephyr-based Wasm experiments. We hope you’ll give Golioth a try for deploying the binary updates to your devices.

For those who want to dive deeper into the world of WebAssembly, Dan’s been busy in that area. Check out his post on Understanding Every Byte in a WASM Module.

Golioth is expanding its Reference Design portfolio by adding an OpenThread Demo, a Reference Design based on our known and well-tested Reference Design Template. The purpose of the OpenThread Demo is to add Thread networking capability to the RD Template so anyone using Thread and Golioth can start development immediately, use it as a basis for their project, and take full advantage of Golioth’s Device Management, Data Routing, and Application Service capabilities.

Thread Recap

Thread is an IPv6-based networking protocol designed for low-power Internet of Things devices. It uses the IEEE 802.15.4 mesh network as the foundation for providing reliable message transmission between individual Thread Devices at the link level. The 6LoWPAN network layer sits on top of 802.15.4, created to apply Internet Protocol (IP) to smaller devices. In almost all cases, it’s used to transmit IPv6 Packets.

If you need a network of devices that can communicate with each other and connect to the Internet securely, Thread might be the solution you’re looking for.

Build it yourself

The follow-along guide shows how to build your own OpenThread Demo using widely available off-the-shelf components from our partners. We call this Follow-Along Hardware, and we think it’s one of the quickest and easiest ways to start building an IoT proof-of-concept with Golioth.

Hardware

Every mesh network needs some hardware, and for the OpenThread Demo, you will need a Thread Border Router and a Thread node. This demo doesn’t need additional sensors or an actuator, as there are generated values created by the code in the Reference Design Template (i.e., simulated values). Later you can modify our other Reference Designs and their hardware to get to a prototype or production device that is more specific to a vertical like Air Quality Monitoring or DC Power Monitoring.

Border Router

A Thread Border Router connects a Thread network to other IP-based networks, such as Wi-Fi or Ethernet, and it configures a Thread network for external connectivity. It also forwards information between a Thread network and a non-Thread network (from Thread nodes to the Internet). The Border Router should be completely invisible to Thread Devices, much like a Wi-Fi router is in a home or corporate network.

In this demo, we use a commercially available GL-S200 Thread Border Router designed for users to host and manage low-power and reliable IoT mesh networks.

GL-S200 provides a simple Admin Panel UI to configure the Border Router and a Topology Graph to see all the end node devices and their relationship. As a bonus, it also does NAT64 translation between IPv6 and IPv4, making it a real plug-and-play solution.

Thread Node

Now that the centerpiece of our Thread network is sorted, the next part is a Thread node. In the follow-along guide, we built a Thread node based on the nRF52840 DK. The node is built using Zephyr, and the OpenThread stack will be compiled into it. The GitHub repository used in the guide is open source, so you can build the application yourself, or you can use the pre-built images for the nRF52840 DK or Adafruit Feather nRF52840.

Firmware

Thread node firmware is based on the Reference Design Template, a starting point for all our Reference Designs. With all Golioth features implemented in their basic form, you can now use Device Management, Data Routing, and Application Services with Thread network connectivity.

OTA Updates

Adding Thread support to a device is not cheap, memory-wise. The firmware image is larger than 500 kB, and the on-chip flash of the nRF52840 DK has a size of 1 MB, so two image slots of that size would not fit on-chip alongside the bootloader. Luckily, both the nRF52840 DK and the Adafruit Feather have an external flash chip, making OTA updates possible. Any custom hardware you create in the future should also follow this model of having external flash mapped to the nRF52840.

To create a secondary partition for MCUboot in external flash, we must first enable it in the nrf52840dk_nrf52840.overlay file:

/ { 
    chosen { 
        nordic,pm-ext-flash = &mx25r64; 
    };
};

The CONFIG_PM_EXTERNAL_FLASH_MCUBOOT_SECONDARY Kconfig option is set by default to place the secondary partition of MCUboot in the external flash instead of the internal flash (this option should only be enabled in the parent image).

To pass the image-specific variables (device-tree overlay file and Kconfig symbols) to the MCUboot child image, we need to create a child_image folder in which we update the CONFIG_BOOT_MAX_IMG_SECTORS Kconfig option. This option defines the maximum number of image sectors MCUboot can handle, and it matters here because MCUboot typically increases slot sizes when external flash is enabled. If the option is not set, it defaults to the value used for internal flash, and the application may not boot if that value is too low. In our case, we updated it to 256 in the child_image/mcuboot/boards/nrf52840dk_nrf52840.conf file.

CONFIG_BOOT_MAX_IMG_SECTORS=256
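
To get a feel for how a sector limit relates to slot size, here is a minimal sketch in C. It assumes a 4 KiB erase sector and a hypothetical slot size; neither number comes from the demo's actual partition layout, and the helper name img_sectors_needed is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed erase-sector size in bytes (illustrative, not read from hardware). */
#define ASSUMED_SECTOR_SIZE 4096u

/* Hypothetical helper: number of sectors needed to cover slot_size bytes.
 * Uses ceiling division so a partially used sector still counts. */
uint32_t img_sectors_needed(uint32_t slot_size, uint32_t sector_size)
{
    assert(sector_size > 0);
    return slot_size / sector_size + (slot_size % sector_size != 0 ? 1u : 0u);
}
```

Under these assumptions, a 960 KiB slot would span 240 sectors, so a limit of 256 (which covers up to 1 MiB) leaves some headroom. Check the sector size and slot sizes of your own partition layout before picking a value.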

Connecting to Golioth Cloud

Thread nodes use IPv6 addresses, which raises the question of how they communicate with IPv4 hosts such as Golioth Cloud.

Golioth Cloud has an IPv4 address, and the Thread node needs to synthesize the server’s IPv6 address in order to connect to it. OpenThread doesn’t use the NAT64 well-known prefix 64:ff9b::/96; instead, Thread Border Routers publish their dynamically generated NAT64 prefix used by the NAT64 translator in the Thread Network Data. Thread nodes must obtain this NAT64 prefix and synthesize the IPv6 addresses.
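
The synthesis step itself is simple when the NAT64 prefix is 96 bits long. The sketch below is a standalone illustration of the RFC 6052 layout for a /96 prefix, not the code used in the Golioth Firmware SDK or OpenThread; the function name nat64_synthesize is hypothetical, and other prefix lengths are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a 16-byte IPv6 address from a /96 NAT64 prefix
 * and a 4-byte IPv4 address (both in network byte order).
 * Layout for a /96 prefix: bytes 0..11 = prefix, bytes 12..15 = IPv4 address. */
void nat64_synthesize(const uint8_t prefix96[12],
                      const uint8_t ipv4[4],
                      uint8_t ipv6_out[16])
{
    assert(prefix96 != NULL && ipv4 != NULL && ipv6_out != NULL);
    memcpy(ipv6_out, prefix96, 12);   /* copy the 96-bit NAT64 prefix */
    memcpy(ipv6_out + 12, ipv4, 4);   /* append the IPv4 address */
}
```

For example, with a made-up prefix of fd12:3456:789a:2::/96 and the IPv4 address 8.8.8.8, the result would be fd12:3456:789a:2::808:808. In a real application, the prefix would come from the Thread Network Data rather than being hard-coded.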

While the process of synthesizing IPv6 addresses is automatically handled in the OpenThread CLI when using the Zephyr shell and pinging an IPv4 address (e.g. ot ping 8.8.8.8), it’s important to note that this process needs to be specifically implemented in applications.

As part of the Firmware SDK, the Golioth IPv6 address is automatically synthesized from the CONFIG_GOLIOTH_COAP_HOST_URI Kconfig symbol using the advertised NAT64 prefix by leveraging the OpenThread DNS. Even if the Golioth host URI changes within the SDK, you won’t need to change your application.

Learn more

For detailed information about the OpenThread Demo, check out the project page! Additionally, you can drop us a note on our Forum if you have questions about this design. If you would like a demo of this reference design, contact [email protected].