Golioth is expanding its Reference Design portfolio by adding an OpenThread Demo, a Reference Design based on our known and well-tested Reference Design Template. The purpose of the OpenThread Demo is to add Thread networking capability to the RD Template so anyone using Thread and Golioth can start development immediately, use it as a basis for their project, and take full advantage of Golioth’s Device Management, Data Routing, and Application Service capabilities.

Thread Recap

Thread is an IPv6-based networking protocol designed for low-power Internet of Things devices. It uses the IEEE 802.15.4 mesh network as the foundation for providing reliable message transmission between individual Thread Devices at the link level. The 6LoWPAN network layer sits on top of 802.15.4, created to apply Internet Protocol (IP) to smaller devices. In almost all cases, it’s used to transmit IPv6 Packets.

If you need a network of devices that can communicate with each other and connect to the Internet securely, Thread might be the solution you’re looking for.

Build it yourself

The follow-along guide shows how to build your own OpenThread Demo using widely available off-the-shelf components from our partners. We call this Follow-Along Hardware, and we think it’s one of the quickest and easiest ways to start building an IoT proof-of-concept with Golioth.

Hardware

Every mesh network needs some hardware, and for the OpenThread Demo, you will need a Thread Border Router and a Thread node. This demo doesn’t need additional sensors or an actuator, because the code in the Reference Design Template generates simulated values. Later you can modify our other Reference Designs and their hardware to get to a prototype or production device that is more specific to a vertical like Air Quality Monitoring or DC Power Monitoring.

Border Router

A Thread Border Router connects a Thread network to other IP-based networks, such as Wi-Fi or Ethernet, and it configures a Thread network for external connectivity. It also forwards information between a Thread network and a non-Thread network (from Thread nodes to the Internet). The Border Router should be completely invisible to Thread Devices, much like a Wi-Fi router is in a home or corporate network.

In this demo, we use a commercially available GL-S200 Thread Border Router designed for users to host and manage low-power and reliable IoT mesh networks.

GL-S200 provides a simple Admin Panel UI to configure the Border Router and a Topology Graph to see all the end node devices and their relationship. As a bonus, it also does NAT64 translation between IPv6 and IPv4, making it a real plug-and-play solution.

 

Thread Node

Now that the centerpiece of our Thread network is sorted, the next part is a Thread node. In the follow-along guide, we built a Thread node based on the nRF52840 DK. The node is built using Zephyr, and the OpenThread stack will be compiled into it. The GitHub repository used in the guide is open source, so you can build the application yourself, or you can use the pre-built images for the nRF52840 DK or Adafruit Feather nRF52840.

Firmware

Thread node firmware is based on the Reference Design Template, a starting point for all our Reference Designs. With all Golioth features implemented in their basic form, you can now use Device Management, Data Routing, and Application Services with Thread network connectivity.

OTA Updates

Adding Thread support to a device is not cheap, memory-wise. The firmware image is larger than 500 kB, and the on-chip flash of the nRF52840 DK has a size of 1 MB, so a second image slot for updates does not fit on-chip. Luckily, both the nRF52840 DK and the Adafruit Feather have an external flash chip, making OTA updates possible. Any custom hardware you create in the future should also follow this model of having external flash mapped to the nRF52840.

To create a secondary partition for MCUboot in external flash, we must first enable it in the nrf52840dk_nrf52840.overlay file:

/ { 
    chosen { 
        nordic,pm-ext-flash = &mx25r64; 
    };
};

The CONFIG_PM_EXTERNAL_FLASH_MCUBOOT_SECONDARY Kconfig option is set by default to place the secondary partition of MCUboot in the external flash instead of the internal flash (this option should only be enabled in the parent image).

To pass the image-specific variables (device-tree overlay file and Kconfig symbols) to the MCUboot child image, we need to create a child_image folder in which we update the CONFIG_BOOT_MAX_IMG_SECTORS Kconfig option. This option defines the maximum number of image sectors MCUboot can handle, as MCUboot typically increases slot sizes when external flash is enabled. Otherwise, it defaults to the value used for internal flash, and the application may not boot if the value is set too low. In our case, we updated it to 256 in the child_image/mcuboot/boards/nrf52840dk_nrf52840.conf file.

CONFIG_BOOT_MAX_IMG_SECTORS=256
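As a rough sanity check on that value, the sketch below computes how many sectors an image slot occupies and whether a given limit covers it. The 4096-byte sector size and the 1 MiB slot size are assumptions chosen for illustration, not values taken from this project's partition layout, so substitute the numbers from your own build.

```python
def sectors_needed(slot_size_bytes: int, sector_size_bytes: int) -> int:
    """Number of flash sectors needed to cover one image slot (rounded up)."""
    return -(-slot_size_bytes // sector_size_bytes)


def limit_is_sufficient(slot_size_bytes: int, sector_size_bytes: int,
                        max_img_sectors: int) -> bool:
    """True when the configured sector limit covers the whole slot."""
    return sectors_needed(slot_size_bytes, sector_size_bytes) <= max_img_sectors


# Assumed values for illustration only
SECTOR_SIZE = 4096        # bytes per erase sector (assumption)
SLOT_SIZE = 1024 * 1024   # bytes per image slot (assumption)

print("sectors needed:", sectors_needed(SLOT_SIZE, SECTOR_SIZE))
print("limit of 256 is enough:", limit_is_sufficient(SLOT_SIZE, SECTOR_SIZE, 256))
print("limit of 128 is enough:", limit_is_sufficient(SLOT_SIZE, SECTOR_SIZE, 128))
```

Under these assumptions a slot needs exactly 256 sectors, so a smaller limit would not cover it.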

Connecting to Golioth Cloud

Thread nodes utilize IPv6 address space, and the question is how to communicate with IPv4 hosts, such as Golioth Cloud.

Golioth Cloud has an IPv4 address, and the Thread node needs to synthesize the server’s IPv6 address in order to connect to it. OpenThread doesn’t use the NAT64 well-known prefix 64:ff9b::/96; instead, Thread Border Routers publish their dynamically generated NAT64 prefix used by the NAT64 translator in the Thread Network Data. Thread nodes must obtain this NAT64 prefix and synthesize the IPv6 addresses.

While the process of synthesizing IPv6 addresses is automatically handled in the OpenThread CLI when using the Zephyr shell and pinging an IPv4 address (e.g. ot ping 8.8.8.8), it’s important to note that this process needs to be specifically implemented in applications.
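To make the synthesis step concrete, here is a minimal host-side sketch in Python (not firmware code, and not how the Golioth SDK implements it). It assumes a /96 NAT64 prefix, in which case the IPv4 address simply occupies the last 32 bits of the IPv6 address. The prefix `fd00:64::/96` is a made-up stand-in for whatever prefix your Border Router publishes.

```python
import ipaddress


def synthesize_nat64(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 96:
        raise ValueError("this sketch only handles /96 NAT64 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(network.network_address) | int(v4))


# Hypothetical prefix standing in for one learned from Thread Network Data
print(synthesize_nat64("fd00:64::/96", "8.8.8.8"))
```

A Thread node would perform the equivalent operation with the prefix it reads from the Thread Network Data, then open its connection to the resulting IPv6 address.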

As part of the Firmware SDK, the Golioth IPv6 address is automatically synthesized from the CONFIG_GOLIOTH_COAP_HOST_URI Kconfig symbol using the advertised NAT64 prefix by leveraging the OpenThread DNS. Even if the Golioth host URI changes within the SDK, you won’t need to change your application.

Learn more

For detailed information about the OpenThread Demo, check out the project page! Additionally, you can drop us a note on our Forum if you have questions about this design. If you would like a demo of this reference design, contact [email protected].

 

Embedded systems, like any software system, benefit from modularizing software components, especially as they approach production. In this talk at the Embedded Open Source Summit 2024, Golioth Firmware Lead Sam Friedman talks about how to create “microservices” for microcontrollers. He maps a popular web concept onto existing software characteristics in Zephyr and shows how a real-world example can benefit from truly modular software.

Mapping a web concept to the microcontroller realm

As Sam points out early in this talk, it’s not really about microservices, because that’s a web concept. A microservice on the web is a piece of software, normally deployed onto cloud infrastructure, that can stand alone. It has defined inputs and outputs (APIs) and can operate independently of any other microservice. This helps with scalability and testing, and reflects a general trend in how web software is built and deployed.

Microcontrollers are smaller and traditionally operate more like a “monolith” (another web term) because everything is interconnected. But there are concepts like Inter-Process Communication (IPC), which allows constrained devices to have similar ideas. IPC is a computer science idea that helps to optimize communication inside of operating systems. As it so happens, Zephyr is a (real time) operating system. Let’s look at what these are in practice.

How firmware developers can benefit

Sam describes how the concepts of Tasks, IPC, and Event Tasks are defined and might be used. But it is the Zephyr analogs that highlight familiar features, like the relatively new ZBus methodology. If a user adds a listener on the ZBus, they can listen (subscribe) for a particular value (topic) on the bus and take action based on it. This helps to make the overall system more modular, because the addition or removal of a feature is not deeply integrated between elements of the system. Instead, the new piece of code is reacting to data put on the bus, which reduces interdependency and makes each piece easier to test.

Real-World Example

Sam drives home his point by talking about a Golioth Reference Design like the Cold Chain Asset Tracker and how we can add capabilities like an onboard alarm when we hit a temperature threshold. Previously, this would have required refactoring to also send data from the sensor process to a new module that contains the alarm code. But with something like ZBus, the alarm can simply listen for a topic on ZBus and when the temperature module publishes to that topic, all relevant parties are updated.

This works in the opposite direction as well. Code written with this in mind would not break any future builds if a hardware cost-down removed an element like a front panel display. Instead, the user chooses not to build in that portion of the code (memory savings, yay!) and other parts of the code are not negatively impacted.
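The publish/subscribe idea can be sketched on a host machine in a few lines of Python. This is an illustration of the pattern only: the `Bus` class, the topic name, and the threshold are all invented for this example and are not the Zephyr ZBus API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Bus:
    """Tiny in-process message bus: listeners react to values published on a topic."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[[float], None]]] = defaultdict(list)

    def subscribe(self, topic: str, listener: Callable[[float], None]) -> None:
        self._listeners[topic].append(listener)

    def publish(self, topic: str, value: float) -> None:
        # The publisher does not know who is listening (or whether anyone is).
        for listener in self._listeners.get(topic, []):
            listener(value)


ALARM_THRESHOLD_C = 8.0  # hypothetical cold-chain limit


def make_uploader(log: List[str]) -> Callable[[float], None]:
    return lambda value: log.append(f"upload {value:.1f}")


def make_alarm(log: List[str]) -> Callable[[float], None]:
    def on_temperature(value: float) -> None:
        if value > ALARM_THRESHOLD_C:
            log.append(f"ALARM {value:.1f}")
    return on_temperature


log: List[str] = []
bus = Bus()
bus.subscribe("temperature", make_uploader(log))
bus.subscribe("temperature", make_alarm(log))  # new feature: one added line

for reading in (5.0, 9.5):
    bus.publish("temperature", reading)  # the sensor side is unchanged

print(log)
```

Removing the alarm (or a display) means dropping its `subscribe` call; the publishing code does not change either way.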

Bringing together the Cloud and Embedded Developers

Sam’s talk showcases what Golioth does well: match up the capabilities of the Cloud with the capabilities of an embedded system. Often many of the key ideas from computer science are more onerous to implement on a constrained system like a microcontroller, but Zephyr’s growing software toolbox makes it easier than ever to build a modular, testable system. Check out Sam’s talk above and his slides below for more context into how to build such a system.

 

There are 512 supported boards (according to find -name board.yml | wc -l) already in the Zephyr tree. Most of them are real hardware platforms and the remaining ones are virtual. Why would you bother with a virtual platform? Zephyr can probably build for the SoC or development board of your choice, right? In this post, I’m going to talk about the reasons you want to try out Native Simulator.

Spoiler: Your Zephyr application development time will drop through the floor.

Zephyr support for virtual platforms

Zephyr comes with support for various virtual platforms, but two of them are most widely used:

  • QEMU
  • Native Simulator

Both are extensively used in Zephyr Continuous Integration pipelines as well as during development by Zephyr users.

QEMU

QEMU is a generic machine emulator. It emulates CPUs by interpreting architecture-specific instructions as well as some peripherals like UART, flash, and networking adapters. Its main advantage is that the binary (compiled code) running on QEMU is very similar to the binary that runs on real hardware. All the low-level instructions, memory-mapped peripheral access, constrained RAM, thread context switching, thread stack sizes, interrupt handling, step-debugging with GDB, and many other mechanisms behave almost the same as on a real microcontroller.

Networking with QEMU can be achieved by setting up a TUN/TAP interface on a Linux host system. Once set up, you attach to the emulated network adapter that is handled by Zephyr drivers. The application is built with Zephyr and has access to the same network as the host machine (like a Linux laptop). After correctly configuring the TUN/TAP interface, it is possible to access the internet without additional hardware.

Native Simulator

Native Simulator is a POSIX architecture based “board” (Zephyr target) that runs as a standalone Linux executable. It is based on native_simulator and Zephyr POSIX architecture. As opposed to QEMU, it does not need any middle layer that emulates instructions or peripheral access. Instead, Zephyr (under Native Simulator) runs natively on Linux with very little overhead. Most of the time, it’s as fast as any regular Linux application.

However, Native Simulator does not emulate microcontroller peripherals the same way as QEMU does. It has special modules and functions called trampolines. As an example, instead of using memory mapped I/O to handle UART drivers (and logging and shell modules that utilize UART backend) there are trampolines to translate UART access APIs to pseudo-terminal I/Os on the Linux host.
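To show what “pseudo-terminal I/O on the Linux host” looks like from the host side, here is a small Python sketch. It is not Native Simulator code: the `PtyUart` class is a toy stand-in for a UART back end, written only to demonstrate that bytes written to one end of a pseudo-terminal can be read from the other end by an ordinary program such as a terminal emulator. It assumes a Linux (or other POSIX) host.

```python
import os
import pty
import tty


class PtyUart:
    """Toy UART back end: bytes written here show up on a host pseudo-terminal."""

    def __init__(self) -> None:
        # master_fd is the end a terminal program would attach to;
        # slave_fd plays the role of the device end of the UART.
        self.master_fd, self.slave_fd = pty.openpty()
        # Raw mode disables newline translation and echo on the device end.
        tty.setraw(self.slave_fd)

    def write(self, data: bytes) -> None:
        """What a 'driver' would call instead of touching memory-mapped registers."""
        os.write(self.slave_fd, data)

    def host_read(self, max_bytes: int = 1024) -> bytes:
        """What the host side (for example a terminal emulator) would read."""
        return os.read(self.master_fd, max_bytes)

    def close(self) -> None:
        os.close(self.slave_fd)
        os.close(self.master_fd)


uart = PtyUart()
uart.write(b"uart:~$ ")
print(uart.host_read())
uart.close()
```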

Networking with Native Simulator has been possible through a TUN/TAP interface, so the development experience for IoT applications was similar to QEMU.

The need for offloaded sockets

Issues with TUN/TAP

Networking with QEMU and/or Native Simulator requires root privileges on the host computer in order to create the TUN/TAP network interface, which routes the traffic between Zephyr and the internet. This is a bit of an inconvenience for hackers that have the Zephyr SDK installed directly on their Linux workstation. Setting up proper privileges in Docker is possible as well, when such a container is used for development purposes. But what about networking in CI pipelines with GitHub Actions or GitLab CI? The only option to get that working is to use self-hosted runners.

Use of a TUN/TAP interface allows us to test almost the entire Zephyr networking stack, down to the Ethernet layer. However, there is no platform-specific driver that talks to an Ethernet PHY. Instead, there is a driver that sends Ethernet frames to a virtual TUN/TAP interface that requires setup on the host (e.g. Linux) system. This has advantages like higher code coverage when testing IoT applications.

Unfortunately, there are many disadvantages as well. Setting up a TUN/TAP interface requires running as a privileged user on the host system. This might not be an issue on a personal PC or laptop. However, root access inside Docker might not always be possible. This is especially true when using existing infrastructure, like GitHub Codespaces, GitHub-hosted runners in GitHub Actions, or hosted GitLab Runners in GitLab.

Offloaded sockets as an alternative

Zephyr has quite a unique feature called socket offloading. This is a mechanism that allows us to utilize (offload to) an external networking stack. Such a stack can be implemented as a 3rd-party library with proprietary drivers that come with a modem. Alternatively, we could use this with an external modem, commonly used with AT commands. In both cases, the contract between the Zephyr application and the offloaded networking stack is socket-level API. One example platform that uses socket offloading is the Nordic nRF9160.

Native Simulator is just a Linux executable. There are no special permissions required to access the internet when writing regular Linux programs in C.

What if the BSD-compatible sockets API (socket(), connect(), recv(), send(), …) could be exposed to Zephyr when running under Native Simulator? This should be possible with a bunch of trampolines between the Zephyr world and the Linux world.
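The point about permissions is easy to verify from any unprivileged user account. The Python sketch below (a host-side illustration, not Zephyr code) opens two ordinary UDP sockets and passes a datagram between them over the loopback interface, with no TUN/TAP setup and no root access. The function name is made up for this example.

```python
import socket


def udp_loopback_roundtrip(payload: bytes) -> bytes:
    """Send one datagram between two unprivileged UDP sockets on 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # fail fast instead of hanging
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(1024)
        return data


if __name__ == "__main__":
    print(udp_loopback_roundtrip(b"hello from user space"))
```

Socket offloading lets a Zephyr application running under Native Simulator reach these same host-level calls through its usual socket API.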

Native Simulator Offloaded Sockets

Implementation of socket offloading for Native Simulator was part of a recent hackday project I worked on at Golioth. At the end of the day, UDP communication was working, without any setup. This confirmed the idea about networking in Zephyr without root privileges. The next step in the following months was contributing the work to Zephyr with many follow-up improvements, so that the community can use it.

Development speed

Why should Native Simulator be used for IoT firmware development instead of real hardware? Flashing firmware on a device, connecting to the internet, and then executing the application takes a considerable amount of time. This is where Native Simulator with offloaded sockets shines.

Flashing is not part of the testing process when using Native Simulator. Connecting to the internet (e.g. using WiFi or Cellular) is not needed, since the host machine is connected all the time. And lastly, executing application code is much faster on the beefy host machine compared to a very constrained microcontroller.

This is just theory, so let’s look at some timing measurements for those not convinced yet. We’ll use http_get with TLS with minimal modifications required to get connected to a WiFi Access Point. Modified code is available at https://github.com/mniestroj/zephyr/tree/native-sim-http-get-benchmark.

In this example we’ll use nRF52840DK with ESP32 running ESP-AT firmware. This is what the “flash + execute” process looks like:

Zephyr's http_get on native_sim vs nrf52840dk

This is how much time it took for each platform to run http_get sample (once it was already built):

  • Native Simulator: 0.42 s
  • nRF52840-DK: 16.80 s (flash 10.90 s, run 5.90 s)

Wouldn’t you like to go 40 times faster in your development?
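For anyone who wants to check the arithmetic, the short Python snippet below derives the speedup from the measurements listed above. It also computes the ratio for the run phase alone, which is not quoted in the measurements but follows directly from them.

```python
def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Measurements quoted above, in seconds
NATIVE_SIM_S = 0.42
NRF52840_FLASH_S = 10.90
NRF52840_RUN_S = 5.90

hardware_total_s = NRF52840_FLASH_S + NRF52840_RUN_S

print(f"flash + run vs native_sim: {speedup(hardware_total_s, NATIVE_SIM_S):.1f}x")
print(f"run only vs native_sim:    {speedup(NRF52840_RUN_S, NATIVE_SIM_S):.1f}x")
```

Even with flashing excluded, the run phase on hardware takes roughly 14 times longer than the whole Native Simulator run.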

Next steps

Many improvements to Native Simulator Offloaded Sockets were contributed to Zephyr upstream last month. Those will be part of upcoming Zephyr 3.7.0 (planned for release on 2024/07/26). When the Golioth Firmware SDK includes those changes, it will be much faster to develop and test IoT applications.

Recently I was working on upgrading a Zephyr-based project and encountered the worst of debug situations: the device was completely unresponsive after flashing the firmware. Opening a debug session didn’t yield any help, program flow never reached main, and I wasn’t even able to break on the Zephyr kernel initialization functions. What is there to do in this case? If your problems all start before user code, it’s time to check on what the bootloader is doing. Today we’ll take a look at how to debug MCUboot when all else has failed.

Debugging User Code

Debuggers usually help zero-in on bugs pretty quickly. For this project I was targeting a Thingy91 (based on the Nordic nRF9160) using a J-Link programmer, so west attach is all it takes to start the debugger. However, I was unable to get much useful output when starting a debugging session.

Using GDB to debug user code

As you can see, the debugger doesn’t recognize any symbols at the current memory addresses. This matches up with the device being unresponsive: the app hasn’t started running yet. Let’s go deeper and look at the bootloader.

Loading Bootloader Symbols Into the Debugger

The Zephyr build system already built MCUboot as part of the normal compilation process. To debug the bootloader, simply use the file command to load the .elf file from the MCUboot directory.

Loading the MCUboot elf file in GDB

When building a project for the nRF9160 under NCS, the build/mcuboot/zephyr folder contains the bootloader files. By loading the symbols from the .elf file, we have changed from debugging the user app to debugging the bootloader.

Getting a Useful Backtrace

Resetting and running program flow doesn’t lead to a crash, but we can halt after a second and check the backtrace.

MCUboot backtrace shows a panic

From this output it’s much easier to tell why our device is unresponsive: MCUboot is in a panic state. That’s helpful but we really need to know why. The next step is to set a breakpoint and walk through the code.

Stepping through MCUboot with GDB

The backtrace shows that the panic happened in main. Let’s debug by setting a breakpoint there and stepping through to find more info.

MCUboot reports that it is unable to find a bootable image

After setting the breakpoint the device is reset and the continue command starts program flow. The next command is then used to run each successive call and it doesn’t take long to get to a very useful log message.

687             BOOT_LOG_ERR("Unable to find bootable image");

MCUboot needs to validate the images it is about to run, so this message indicates the image in the slot is invalid. Upon closer inspection (not shown here), some bug in the build system has allowed the image to be built too large when it should have caused the build to fail. MCUboot is aware of the partition table, and validates the signature cutting off at the hard stop of that partition size. This of course makes the signature check fail.

On some boards, this error message would have been printed out. However, it seems that the default configuration for the Thingy91 doesn’t enable terminal output for MCUboot, so instead of seeing the message we see nothing. With a little know-how, the debugger revealed the reason why.
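The failure mode described above, where an oversized image is cut off at the partition boundary and then fails validation, can be imitated with a toy model in Python. This is deliberately simplified and is not MCUboot’s real image format or validation code: the “image” here is just a payload followed by its SHA-256 digest, and the slot sizes are arbitrary example numbers.

```python
import hashlib

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def make_image(payload: bytes) -> bytes:
    """Toy image: payload followed by its SHA-256 digest (stand-in for a signature)."""
    return payload + hashlib.sha256(payload).digest()


def store_in_slot(image: bytes, slot_size: int) -> bytes:
    """Only slot_size bytes are visible to the bootloader; the rest is cut off."""
    return image[:slot_size]


def validate(slot_contents: bytes) -> bool:
    """Recompute the digest over the payload and compare with the trailing digest."""
    payload, digest = slot_contents[:-DIGEST_LEN], slot_contents[-DIGEST_LEN:]
    return hashlib.sha256(payload).digest() == digest


image = make_image(b"\xAA" * 100)  # 132 bytes in total

print("fits in slot: ", validate(store_in_slot(image, slot_size=256)))
print("too large:    ", validate(store_in_slot(image, slot_size=120)))
```

When the slot is too small, the trailing bytes used for the check are no longer the real digest, so validation fails even though the build itself completed.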

View the Debugging Process

Sometimes a text overview is a bit hard to follow. You can see the full debugging process in the terminal capture below.

We got an early look at Nordic’s new cellular modem, the nRF9151, and it already works with Golioth!

With any new board, we ask ourselves “can we connect it to Golioth?”. You may remember a similar post when the nRF7002-DK first came out. Of course the answer for these two boards, and pretty much all other network-enabled embedded systems, is: yes, you can use them with Golioth. So today we’ll walk through the experience of connecting the nRF9151 to Golioth for the first time.

What’s new with the nRF9151?

We love the nRF9160 cellular modem and have support for it in all of the Golioth Firmware SDK samples, as well as using it in the Hardware-in-the-Loop (HIL) testing that is connected to our continuous integration infrastructure. So what’s the deal with the new part?

Finger pointing at a small rectangular chip (SOC)

Most obviously, it’s really, really small. The 9151 is about a 20% size reduction from the 9160 (new dimensions are approximately 11×12 mm). Here you can see it’s smaller than the fingernail on my pointer finger. The smaller size also delivers lower peak current consumption. As with the recently announced nRF9161, the nRF9151 supports DECT NR+. And Nordic indicates the new design is fully compatible with the existing nRF91 family of chips.

This is also the first Nordic dev board I’ve seen that uses a USB-C connector. While you can’t get your hands on one of these just yet, Golioth partners with Nordic, and they were kind enough to send us one of these nRF9151 Development Kits to take for a test drive.

Building Golioth examples with the nRF9151

This board is not yet available to order, but support has already been added to Zephyr. To get it working with Golioth, we needed a fix that Nordic merged after their v2.6.1 release of the nRF Connect SDK (NCS). So today I’ll be checking out a commit in between releases. When Nordic releases v2.7.0 everything will work without this extra step.

0. Install the Golioth Firmware SDK

You will need an NCS build environment along with the Golioth SDK. You can follow the Golioth Docs to install an NCS workspace, or add Golioth to your existing NCS workspace.

1. Update NCS version (if needed)

If you are using NCS v2.7.0 (not yet released at the time of writing) or newer, you can skip this step. Otherwise, edit your west manifest and update the NCS version. Below is the west-nrf.yml file from the Golioth SDK with the changed line highlighted.

manifest:
  projects:
    - name: nrf
      revision: 85097eb933d93374fe270ce4c004bea10ee80e97
      url: http://github.com/nrfconnect/sdk-nrf
      import: true

  self:
    path: modules/lib/golioth-firmware-sdk

This happened to be the commit at the tip of main when writing this post. We usually recommend against targeting commits in between releases, so consider this experimental.

2. Add a board Kconfig file for the nRF9151

Add the board-specific configuration to the boards directory. For today’s post, I’m building the Golioth stream sample so I’ve added this nrf9151dk_nrf9151_ns.conf board file to that sample directory.

# General config
CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_NEWLIB_LIBC=y

# Networking
CONFIG_NET_SOCKETS_OFFLOAD=y
CONFIG_NET_IPV6=y
CONFIG_NET_IPV6_NBR_CACHE=n
CONFIG_NET_IPV6_MLD=n

# Raise the priority of the native TLS socket implementation, so that it is
# chosen instead of offloaded nRF91 sockets
CONFIG_NET_SOCKETS_TLS_PRIORITY=35

# Modem library
CONFIG_NRF_MODEM_LIB=y
CONFIG_NRF_MODEM_LIB_ON_FAULT_APPLICATION_SPECIFIC=y

# LTE connectivity with network connection manager
CONFIG_NRF_MODEM_LIB_NET_IF=y
CONFIG_NRF_MODEM_LIB_NET_IF_AUTO_START=y
CONFIG_NRF_MODEM_LIB_NET_IF_AUTO_CONNECT=y
CONFIG_NRF_MODEM_LIB_NET_IF_AUTO_DOWN=y

CONFIG_NET_CONNECTION_MANAGER=y
CONFIG_NET_CONNECTION_MANAGER_MONITOR_STACK_SIZE=1024

# Increased sysworkq size, due to LTE connectivity
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048

# Disable options y-selected by NCS for no good reason
CONFIG_MBEDTLS_KEY_EXCHANGE_DHE_PSK_ENABLED=n
CONFIG_MBEDTLS_KEY_EXCHANGE_DHE_RSA_ENABLED=n

# Generate MCUboot compatible images
CONFIG_BOOTLOADER_MCUBOOT=y

3. Build the Golioth stream sample

Building and running this sample is now quite simple. I have included the option to use runtime credentials in this build so that we can provision the device from the Zephyr shell.

$ cd examples/zephyr/stream
$ west build -b nrf9151dk/nrf9151/ns -- -DEXTRA_CONF_FILE=../common/runtime_settings.conf
$ west flash

4. Provision and run the sample

Golioth is free for individual use so sign up for an account if you have not already done so. After creating a project and device we can provision the PSK-ID/PSK by opening a serial connection to the device.

uart:~$ settings set golioth/psk-id <your-psk-id>
uart:~$ settings set golioth/psk <your-psk>

Here’s the terminal output during my tests:

*** Booting nRF Connect SDK v2.6.99-85097eb933d9 ***
*** Using Zephyr OS v3.6.99-18285a0ea4b9 ***
[00:00:00.538,452] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.538,482] <inf> fs_nvs: alloc wra: 0, fb8
[00:00:00.538,482] <inf> fs_nvs: data wra: 0, 68
[00:00:00.538,879] <dbg> golioth_stream: main: Start Golioth stream sample
[00:00:00.539,001] <inf> golioth_samples: Bringing up network interface
[00:00:00.539,001] <inf> golioth_samples: Waiting to obtain IP address
[00:00:01.691,894] <inf> lte_monitor: Network: Searching
uart:~$ settings set golioth/psk-id 20240603190757-nrf9151dk@nrf9151-demo
Setting golioth/psk-id to 20240603190757-nrf9151dk@nrf9151-demo
Setting golioth/psk-id saved as 20240603190757-nrf9151dk@nrf9151-demo
uart:~$ settings set golioth/psk e487ea809e5fa705c2af4050150f822c
Setting golioth/psk to e487ea809e5fa705c2af4050150f822c
Setting golioth/psk saved as e487ea809e5fa705c2af4050150f822c
[00:01:10.748,168] <inf> lte_monitor: Network: Registered (roaming)
[00:01:10.748,901] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:01:12.994,964] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:01:12.995,025] <inf> golioth_stream: Sending temperature 20.000000 (sync)
[00:01:12.995,269] <inf> golioth_stream: Golioth client connected
[00:01:12.995,269] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:01:13.543,975] <dbg> golioth_stream: temperature_push_cbor: Temperature successfully pushed
[00:01:18.544,067] <inf> golioth_stream: Sending temperature 20.500000 (async)
[00:01:20.953,582] <wrn> golioth_coap_client: Resending request 0x2001e2c0 (reply 0x2001e308) (retries 2)
[00:01:23.544,311] <inf> golioth_stream: Sending temperature 21.000000 (sync)
[00:01:25.544,677] <wrn> golioth_stream: Failed to push temperature: 9
[00:01:25.772,186] <wrn> golioth_coap_client: Resending request 0x2001e2c0 (reply 0x2001e308) (retries 1)
[00:01:25.947,631] <wrn> golioth_coap_client: Resending request 0x2001e440 (reply 0x2001e488) (retries 2)
[00:01:30.544,738] <inf> golioth_stream: Sending temperature 21.500000 (async)
[00:01:30.581,359] <dbg> golioth_stream: temperature_async_push_handler: Temperature successfully pushed
[00:01:30.949,401] <dbg> golioth_stream: temperature_async_push_handler: Temperature successfully pushed
[00:01:35.544,952] <inf> golioth_stream: Sending temperature 22.000000 (sync)
[00:01:36.326,812] <dbg> golioth_stream: temperature_push_cbor: Temperature successfully pushed
[00:01:41.326,873] <inf> golioth_stream: Sending temperature 22.500000 (async)
[00:01:42.582,946] <dbg> golioth_stream: temperature_async_push_handler: Temperature successfully pushed
[00:01:46.327,117] <inf> golioth_stream: Sending temperature 23.000000 (sync)
[00:01:46.947,204] <dbg> golioth_stream: temperature_push_cbor: Temperature successfully pushed
[00:01:51.947,296] <inf> golioth_stream: Sending temperature 23.500000 (async)
[00:01:52.718,261] <dbg> golioth_stream: temperature_async_push_handler: Temperature successfully pushed
[00:01:56.947,540] <inf> golioth_stream: Sending temperature 24.000000 (sync)
[00:01:57.663,665] <dbg> golioth_stream: temperature_push_cbor: Temperature successfully pushed
[00:02:02.663,726] <inf> golioth_stream: Sending temperature 24.500000 (async)
[00:02:03.725,708] <dbg> golioth_stream: temperature_async_push_handler: Temperature successfully pushed
[00:02:07.663,970] <inf> golioth_stream: Sending temperature 25.000000 (sync)
[00:02:08.589,111] <dbg> golioth_stream: temperature_push_cbor: Temperature successfully pushed
[00:02:13.589,172] <inf> golioth_stream: Sending temperature 25.500000 (async)

5. View the data sent from the device

In the Golioth web console I can navigate to the LightDB Stream tab for the device and see the data as it arrives on the cloud. Try out Pipelines to transform and send that data to a destination.

A table of temperature data displayed on the Golioth web console

What will you do with the nRF9151?

We see a lot of IoT deployments using the nRF9160 to provide a cellular connection. They’re versatile parts with plenty of peripherals. The new nRF9151 part number is nice for your board footprint, and your power budget. And of course, every fleet needs management and data handling. Golioth already works with this SoC and so many more!

Golioth will be joining our friends at Digikey on June 13th to talk about “Leveraging Zephyr to enable super-flexible IoT designs”.

Digikey is where we source many of the parts for our custom hardware and where we often order development boards for putting together demos. If you’ve seen our “Follow Along Hardware”, the product SKUs revolve around Digikey stock.

So we thought it would be a great opportunity to showcase just how many different boards, chips, and sensors we can control using the same base Zephyr code, while sending fleet data back to the Golioth cloud.

What we will cover

In the upcoming webinar, we’ll cover:

  • The basics of Zephyr RTOS and how to get started designing quickly
  • How one code base can serve designs from 3 different microcontroller vendors with 3 different types of connectivity and two different sensor vendors!
    • NXP, Espressif, Nordic processors
    • Ethernet, Wi-Fi, Cellular Communications
    • Sensors from Infineon and Bosch
  • How to utilize Cloud services to deliver interesting features to a product with a single SDK install
  • How Golioth’s end-to-end Reference Designs can jumpstart your own IoT designs
  • How the recently announced Pipelines feature will enable even more flexibility in designs

How to register

Register for the event using this link. You will also be able to get access to the recording if you can’t make the live event, though Golioth staff will be available for live Q&A directly after our presentation.

Yesterday I was upgrading a Golioth Reference Design to the newest version of the Golioth Firmware SDK and I encountered a network error I had never seen before. I was able to track it down fairly quickly using the debugging tools built into Zephyr. This process is quite handy, so today I’ll walk through how to debug a network error in Zephyr using GDB to help others hone their embedded debugging skills.

Encountering an Error

There are two errors shown below. The first is expected: the cell modem is not yet connected to the network so sending data will fail. But soon after the connection is established there is a second error highlighted below.

*** Booting nRF Connect SDK v2.5.2 ***
[00:00:00.465,942] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.465,942] <inf> fs_nvs: alloc wra: 0, fb8
[00:00:00.465,972] <inf> fs_nvs: data wra: 0, 68
[00:00:00.466,308] <dbg> golioth_powermonitor: main: Start Power Monitor Reference Design
[00:00:00.466,339] <inf> golioth_powermonitor: Firmware version: 1.2.0
[00:00:00.472,991] <inf> golioth_powermonitor: Modem firmware version: mfw_nrf9160_1.3.1
[00:00:00.474,456] <inf> golioth_powermonitor: Connecting to LTE, this may take some time...
[00:00:00.531,127] <inf> app_sensors: Device: ina260@40, 4.980000 V, 0.335000 A, 1.659999 W
[00:00:00.531,219] <inf> app_sensors: Device: ina260@41, 5.117499 V, 0.000000 A, 0.000000 W
[00:00:00.531,250] <dbg> app_sensors: app_sensors_read_and_stream: Ontime:      (ch0): 1        (ch1): 0
[00:00:00.531,463] <err> app_sensors: Failed to send sensor data to Golioth: 5
[00:00:02.397,918] <inf> lte_monitor: Network: Searching
[00:00:03.772,033] <inf> lte_monitor: Network: Registered (roaming)
[00:00:03.772,521] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:00:03.773,223] <inf> golioth_fw_update: Current firmware version: main - 1.2.0
[00:00:06.010,528] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:06.010,803] <inf> golioth_powermonitor: Golioth client connected
[00:00:06.010,833] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:06.421,752] <dbg> app_state: async_handler: State successfully set
[00:00:06.443,542] <err> net_coap: 16 is > sizeof(coap_option->value)(12)!
[00:00:06.443,572] <dbg> app_state: app_state_desired_handler: desired
                                    66 61 6c 73 65                                   |false
[00:00:06.533,081] <inf> app_sensors: Device: ina260@40, 4.982499 V, 0.331250 A, 1.649999 W
[00:00:06.533,203] <inf> app_sensors: Device: ina260@41, 5.115000 V, 0.001250 A, 0.000000 W
[00:00:06.533,233] <dbg> app_sensors: app_sensors_read_and_stream: Ontime:      (ch0): 6003     (ch1): 1
[00:00:06.536,865] <inf> app_settings: Set loop delay to 10 seconds
[00:00:06.538,330] <inf> app_sensors: Device: ina260@40, 4.980000 V, 0.335000 A, 1.679999 W
[00:00:06.538,482] <inf> app_sensors: Device: ina260@41, 5.113749 V, 0.001250 A, 0.000000 W
[00:00:06.538,513] <dbg> app_sensors: app_sensors_read_and_stream: Ontime:      (ch0): 6008     (ch1): 6
[00:00:06.942,932] <dbg> app_settings: on_loop_delay_setting: Received LOOP_DELAY_S already matches local value.
[00:00:06.946,136] <dbg> app_settings: on_loop_delay_setting: Received LOOP_DELAY_S already matches local value.
[00:00:06.949,096] <dbg> app_settings: on_loop_delay_setting: Received LOOP_DELAY_S already matches local value.
[00:00:06.992,187] <inf> golioth_rpc: RPC observation established
[00:00:06.993,041] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:07.365,142] <dbg> app_sensors: get_cumulative_handler: Decoded: ch0: 1579017, ch1: 790405
[00:00:07.365,722] <dbg> app_state: async_handler: State successfully set
[00:00:07.367,553] <dbg> app_sensors: get_cumulative_handler: Decoded: ch0: 1579017, ch1: 790405
[00:00:07.368,133] <dbg> app_state: async_handler: State successfully set
[00:00:07.765,747] <inf> golioth_fw_update: Received OTA manifest
[00:00:07.765,777] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:00:07.765,808] <inf> golioth_fw_update: Waiting to receive OTA manifest

Hmmm, I wonder where this error came from?

<err> net_coap: 16 is > sizeof(coap_option->value)(12)!

I’ve never seen an error of this type before. Zephyr logging puts a logging tag at the beginning of each message and net_coap isn’t one that comes to mind. I started troubleshooting by “grepping”, or searching all files in a directory tree, for this message.

➜ rg net_coap
zephyr/subsys/net/lib/coap/coap.c
8:LOG_MODULE_REGISTER(net_coap, CONFIG_COAP_LOG_LEVEL);
1903:void net_coap_init(void)

Seven different files were returned by rg (that’s ripgrep, which is just a different flavor of grep), but the first one is obviously what we want. You can see the exact net_coap name registered as the logging module tag.

Looking inside that file, I searched for the error message. I only searched for the > sizeof part of the error message since the rest is likely being added to the log using string substitution.

if (option) {
    /*
     * Make sure the option data will fit into the value field of
     * coap_option.
     * NOTE: To expand the size of the value field set:
     * CONFIG_COAP_EXTENDED_OPTIONS_LEN=y
     * CONFIG_COAP_EXTENDED_OPTIONS_LEN_VALUE=<size>
     */
    if (len > sizeof(option->value)) {
        NET_ERR("%u is > sizeof(coap_option->value)(%zu)!",
            len, sizeof(option->value));
        return -EINVAL;
    }

Now we’re getting somewhere. The Zephyr contributor who worked on this code was even kind enough to leave comments on how to fix the error. However, I want to know what is causing the issue in the first place since I’m unfamiliar with this failure. Let’s use the debugger!

Using GDB to Debug Zephyr

Many boards that work with Zephyr have debugging support built right into the ecosystem. From the same directory where the build was run, I can run west attach to start GDB.

In GDB, first type mon reset to prepare the device to start from the beginning of the program. I know from the excerpt above that the error message is printed out from line 590 in the coap.c file. We can use the command b coap.c:590 to set a breakpoint, and start running using c for continue.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/mike/golioth-compile/reference-design-dc-power-monitor/app/build/zephyr/zephyr.elf...
Remote debugging using :2331
arch_cpu_idle () at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/zephyr/arch/arm/core/aarch32/cpu_idle.S:143
143             cpsie   i
(gdb) mon reset
Resetting target
(gdb) b coap.c:590
Breakpoint 1 at 0x2139e: file /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/zephyr/subsys/net/lib/coap/coap.c, line 590.
(gdb) c
Continuing.

Breakpoint 1, parse_option (data=0x20013bd1 <rx_buffer> "hE\212\361\206)\035渠.>a\002R.d\adesired\r\003reset_cumulative\377false", offset=<optimized out>, pos=pos@entry=0x2001a54c <golioth_thread_stacks+5708>, max_len=<optimized out>, opt_delta=opt_delta@entry=0x2001a54e <golioth_thread_stacks+5710>, opt_len=opt_len@entry=0x2001a54a <golioth_thread_stacks+5706>, option=option@entry=0x2001a578 <golioth_thread_stacks+5752>) at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/zephyr/subsys/net/lib/coap/coap.c:590
590                             NET_ERR("%u is > sizeof(coap_option->value)(%zu)!",
(gdb)

Great, we stopped where the error message is printed. At this point I want to know what my program was doing leading up to this moment. For this we can view the backtrace by typing bt.

(gdb) bt
#0  parse_option (data=0x20013bd1 <rx_buffer> "hE\212\361\206)\035渠.>a\002R.d\adesired\r\003reset_cumulative\377false", offset=<optimized out>,
    pos=pos@entry=0x2001a54c <golioth_thread_stacks+5708>, max_len=<optimized out>, opt_delta=opt_delta@entry=0x2001a54e <golioth_thread_stacks+5710>,
    opt_len=opt_len@entry=0x2001a54a <golioth_thread_stacks+5706>, option=option@entry=0x2001a578 <golioth_thread_stacks+5752>)
    at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/zephyr/subsys/net/lib/coap/coap.c:590
#1  0x0004f8bc in coap_find_options (cpkt=cpkt@entry=0x20020950, code=code@entry=23, options=options@entry=0x2001a578 <golioth_thread_stacks+5752>, veclen=veclen@entry=1)
    at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/zephyr/subsys/net/lib/coap/coap.c:907
#2  0x0004fa30 in coap_get_option_int (cpkt=cpkt@entry=0x20020950, code=code@entry=23)
    at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/zephyr/subsys/net/lib/coap/coap.c:1282
#3  0x00031308 in golioth_coap_req_reply_handler (req=req@entry=0x20021558, response=response@entry=0x20020950)
    at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/modules/lib/golioth-firmware-sdk/src/zephyr_coap_req.c:180
#4  0x00055e38 in golioth_coap_req_process_rx (client=client@entry=0x20020518, rx=rx@entry=0x20020950)
    at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/modules/lib/golioth-firmware-sdk/src/zephyr_coap_req.c:362
#5  0x000326be in golioth_process_rx_data (len=<optimized out>, data=<optimized out>, client=0x20020518)
    at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/modules/lib/golioth-firmware-sdk/src/coap_client_zephyr.c:866
#6  golioth_process_rx (client=0x20020518) at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/modules/lib/golioth-firmware-sdk/src/coap_client_zephyr.c:949
#7  golioth_coap_client_thread (arg=0x20020518) at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/modules/lib/golioth-firmware-sdk/src/coap_client_zephyr.c:1092
#8  0x0004d8a8 in z_thread_entry (entry=0x566ad <golioth_thread_main>, p1=<optimized out>, p2=<optimized out>, p3=<optimized out>)
    at /home/mike/golioth-compile/reference-design-dc-power-monitor/deps/zephyr/lib/os/thread_entry.c:48
#9  0xaaaaaaaa in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

The backtrace places the most recent function call at the top in position #0. Looking down the list I can see that starting at frame #3 the Golioth SDK is calling a Zephyr CoAP function. Walking back through those function calls I established that we received a CoAP packet and are trying to decode the options stored in that packet.

I don’t really need to know how all of that packet handling is done… what is more important to me is to see the packet itself to help illuminate why there’s an option in it that is too big for the configured space. Luckily, GDB lets us look at what’s stored in memory.

Using GDB to Inspect Data in Memory

If we look at the coap.c source code, we find the breakpoint we set is inside the parse_option function.

static int parse_option(uint8_t *data, uint16_t offset, uint16_t *pos,
            uint16_t max_len, uint16_t *opt_delta, uint16_t *opt_len,
            struct coap_option *option)

This has a data array as a parameter that likely has our coap packet in it. We can print this out to see the data. It’s as simple as p data, with data being the name of the variable.

(gdb) p data
$1 = (uint8_t *) 0x20013bd1 <rx_buffer> "hE\212\361\206)\035渠.>a\002R.d\adesired\r\003reset_cumulative\377false"

(Note: yes, that 渠 is what GDB actually outputs. Binary data sometimes has weird consequences, especially when a run of bytes happens to match the encoding of a Unicode character.)
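To see why GDB prints that character, it helps to decode the bytes by hand. The string shows 渠 right after \035 (0x1d), and the three bytes at that position in rx_buffer are 0xe6 0xb8 0xa0. The sketch below is a minimal illustration of 3-byte UTF-8 decoding; the function name utf8_decode3 is my own and is not part of GDB or Zephyr.

```c
#include <stdint.h>

/*
 * Decode one 3-byte UTF-8 sequence of the form
 *   1110xxxx 10xxxxxx 10xxxxxx
 * Returns the code point, or 0 if the bytes do not follow that pattern.
 * (Illustrative only: overlong encodings and surrogates are not rejected.)
 */
uint32_t utf8_decode3(const uint8_t b[3])
{
    if ((b[0] & 0xf0) != 0xe0 ||
        (b[1] & 0xc0) != 0x80 ||
        (b[2] & 0xc0) != 0x80) {
        return 0;
    }

    return ((uint32_t)(b[0] & 0x0f) << 12) |
           ((uint32_t)(b[1] & 0x3f) << 6) |
           (uint32_t)(b[2] & 0x3f);
}
```

Feeding in 0xe6, 0xb8, 0xa0 gives 0x6E20, which is the code point for 渠. The bytes are just part of the CoAP token, but they happen to form a valid UTF-8 sequence.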

We’re getting somewhere, but this is not all that useful since it was printed as ASCII values instead of showing the actual hexadecimal data. Let’s print that out.

(gdb) p/x data@max_len
value has been optimized out

The p/x data@max_len command tells GDB to print hexadecimal data from an array called data and to use the max_len variable to determine how many bytes to print. But it looks like we’re stymied by the optimization of the program.

The max_len of the data array has already been optimized out and is unavailable to us. The next thing to do is to print out an arbitrary number of bytes by guessing at the length of the data array. Since we were already able to print it as a string, I’m guessing it’s about 64 bytes, and then using the ASCII values of the final parts of that string to figure out where the data actually ends:

(gdb) x/64xb data
0x20013bd1 <rx_buffer>: 0x68    0x45    0x8a    0xf1    0x86    0x29    0x1d    0xe6
0x20013bd9 <rx_buffer+8>:       0xb8    0xa0    0x2e    0x3e    0x61    0x02    0x52    0x2e
0x20013be1 <rx_buffer+16>:      0x64    0x07    0x64    0x65    0x73    0x69    0x72    0x65
0x20013be9 <rx_buffer+24>:      0x64    0x0d    0x03    0x72    0x65    0x73    0x65    0x74
0x20013bf1 <rx_buffer+32>:      0x5f    0x63    0x75    0x6d    0x75    0x6c    0x61    0x74
0x20013bf9 <rx_buffer+40>:      0x69    0x76    0x65    0xff    0x66    0x61    0x6c    0x73
0x20013c01 <rx_buffer+48>:      0x65    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x20013c09 <rx_buffer+56>:      0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00

The x/64xb data command prints out exactly what we’re after. In GDB the x command prints out memory contents (I always remember this as “examine”). The slash (/) adds the additional format options to print 64 hexadecimal (x) bytes (b) starting from the pointer address named data.

Decoding the CoAP Packet

After just a bit of cleanup, I have the data I’m after but it’s certainly not human readable. I like to use a site called Koap Online CoAP Decoder to take care of this for me:

When we go back to the original error message, the option that is too long is 16 characters. From the decoding above we see that the third option is a path called reset_cumulative that is 16 characters long. This is too long for the 12 character buffer we have configured in the Zephyr CoAP library!
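If you would rather not paste packets into a website, the option lengths can be checked with a few lines of C. This is only a minimal sketch, assuming the option encoding described in RFC 7252 (a 4-byte header, a token whose length is in the low nibble of the first byte, then delta/length-encoded options up to the 0xff payload marker). It is not the Zephyr parser, and the names decode_ext and coap_longest_option are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The 49 bytes of packet data captured from rx_buffer in the GDB session. */
static const uint8_t packet[] = {
    0x68, 0x45, 0x8a, 0xf1, 0x86, 0x29, 0x1d, 0xe6,
    0xb8, 0xa0, 0x2e, 0x3e, 0x61, 0x02, 0x52, 0x2e,
    0x64, 0x07, 0x64, 0x65, 0x73, 0x69, 0x72, 0x65,
    0x64, 0x0d, 0x03, 0x72, 0x65, 0x73, 0x65, 0x74,
    0x5f, 0x63, 0x75, 0x6d, 0x75, 0x6c, 0x61, 0x74,
    0x69, 0x76, 0x65, 0xff, 0x66, 0x61, 0x6c, 0x73,
    0x65,
};

/*
 * Expand a 4-bit delta or length nibble: 0..12 is literal, 13 means
 * "one extra byte + 13", 14 means "two extra bytes + 269", 15 is reserved.
 * Returns 0 on success, -1 on malformed or truncated input.
 */
static int decode_ext(const uint8_t *buf, size_t len, size_t *pos,
                      uint8_t nibble, unsigned int *out)
{
    if (nibble < 13) {
        *out = nibble;
    } else if (nibble == 13) {
        if (*pos + 1 > len) {
            return -1;
        }
        *out = buf[*pos] + 13u;
        *pos += 1;
    } else if (nibble == 14) {
        if (*pos + 2 > len) {
            return -1;
        }
        *out = (((unsigned int)buf[*pos] << 8) | buf[*pos + 1]) + 269u;
        *pos += 2;
    } else {
        return -1;
    }
    return 0;
}

/*
 * Walk every option in a CoAP message, print its number and length, and
 * return the longest option value length (or -1 on a parse error).
 */
int coap_longest_option(const uint8_t *buf, size_t len)
{
    size_t pos;
    unsigned int number = 0;
    int longest = 0;

    if (len < 4) {
        return -1;
    }
    pos = 4 + (size_t)(buf[0] & 0x0f); /* skip header and token */
    if (pos > len) {
        return -1;
    }

    while (pos < len && buf[pos] != 0xff) {
        uint8_t first = buf[pos++];
        unsigned int delta, opt_len;

        if (decode_ext(buf, len, &pos, first >> 4, &delta) != 0 ||
            decode_ext(buf, len, &pos, first & 0x0f, &opt_len) != 0 ||
            pos + opt_len > len) {
            return -1;
        }
        number += delta; /* option numbers are delta-encoded */
        printf("option %u: %u bytes\n", number, opt_len);
        if ((int)opt_len > longest) {
            longest = (int)opt_len;
        }
        pos += opt_len;
    }
    return longest;
}
```

Calling coap_longest_option(packet, sizeof(packet)) on the bytes from the GDB dump returns 16. The offending option starts with the bytes 0x0d 0x03: a length nibble of 13 means "read one more byte and add 13", and 3 + 13 = 16, the exact number from the error message.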

I did this to myself! The application I’m working on is observing a Golioth LightDB State path and I chose a long name:

I followed the advice from the code comments in the Zephyr file and that fixed things right up.

# Adjust coap setting for a long (16-char) LightDB State sub-path
CONFIG_COAP_EXTENDED_OPTIONS_LEN=y
CONFIG_COAP_EXTENDED_OPTIONS_LEN_VALUE=16
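To choose that value without trial and error, you can measure the longest segment of the paths your application uses. In CoAP each path segment travels in its own Uri-Path option, so the longest segment is what has to fit in the option value buffer. The helper below is a hypothetical sketch of that calculation (longest_path_segment is not a Zephyr or Golioth function), and other option types in your traffic may need more room than your paths do.

```c
#include <stddef.h>

/*
 * Return the length of the longest '/'-separated segment in a path.
 * Empty segments (leading, trailing, or doubled slashes) count as 0.
 */
size_t longest_path_segment(const char *path)
{
    size_t longest = 0;
    size_t current = 0;

    for (; *path != '\0'; path++) {
        if (*path == '/') {
            current = 0;
        } else if (++current > longest) {
            longest = current;
        }
    }
    return longest;
}
```

For the path desired/reset_cumulative this returns 16, matching the value used in the configuration above.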

Make the Debugger Your Go-To

The worst part about using a debugger is usually setting things up. But in many cases, that work has already been done for you in the Zephyr ecosystem. Try out these skills the next time an unfamiliar error pops up in your embedded development work!

Manufacturing is a marathon, not a sprint. Zephyr RTOS includes numerous features to help you at every step along the way, from initial prototype to maintaining your hardware fleet in the field. Golioth’s Developer Relations lead, Chris Gammell, spoke on this topic at the 2024 Embedded Open Source Summit.

Chris’ approach boils down to breaking manufacturing into five distinct phases:

  1. Early prototype
  2. Custom hardware
  3. First device in production
  4. Scaling production
  5. Maintaining a scaled fleet

The challenges of each phase exist whether or not you’re using Zephyr. But this RTOS has good tools you should utilize to smooth out many wrinkles. Let’s walk through each phase to see what is involved. The full set of talk slides is available at the bottom of this post.

Early Prototyping on Dev Boards

Chris always starts his prototyping out with commercially available development boards when possible. This means the hardware is in a known working state. Even if you haven’t finalized all of your hardware choices, Zephyr offers great portability so you can relatively easily change to a different part without the need to scrap your early work.

Zephyr also offers a number of tools for early tinkering. The menuconfig system is excellent to explore the configuration options available for the peripherals you have chosen. And the Zephyr shell is fantastic when validating new parts. For instance, the sensor and i2c shells facilitate live interaction with your sensors before getting down to the business of writing C code. Read about Golioth community member Timon switching over to Zephyr for prototyping.

Custom HW, First Pilot

As you move into your first pilot, this will likely be the first time you stand up custom hardware to ensure the system design works. Take time here to validate all of the parts in the design. Now is when you should be looking to see you have the feature coverage necessary to meet your needs. Confirm that the parts you have on the board are all needed, and ditch the ones that aren’t.

This is also a great time to begin planning for how you will test and provision each device. What kind of test points do you need? Ensure you’ve correctly routed the programming header and test placement for quick work during manufacturing.

Zephyr’s debugging features come into play here. Consider the best setup for Zephyr’s logging system, whether that’s just turning it on and off, or changing up backends like the Golioth logging backend that sends logs to the cloud. Give thread-aware debugging suites like Ozone and SystemView a try before you need them. You’ll get a ton of insight into how your system is performing before a showstopper forces you to!

First Devices in Production

Pick a number, maybe that’s 100, of devices to join your first manufacturing run. This will be the first glimpse you have into some of the issues that will surface when you scale your production.

At this point, Chris likes to reach for the Zephyr board definitions and make use of the support for board revisions. When peculiar behavior happens, the ease of compiling the same code for two different board revisions will help you discover if it’s something that’s always been there, or just arrived at the party.

commands to build firmware for different revisions of a board

Now is the time to set up your hardware-in-the-loop testing. You need to move fast and manual testing is the opposite of that. It’s not too late to adapt hardware for automated testing and you’ll thank yourself later. Once you have a programming and serial interface to the boards, Zephyr will swoop in with Twister and pytest that can be run on every PR and merge to catch problems early and run cycle tests far more frequently than you would otherwise.

Finally, don’t forget to plan for how you will perform firmware updates. Sure, you can plug USB cables into the 100 units you have in front of you, but that’s going to get really old when you do two patch releases in the same week. Don’t wait until you start to scale, set up your OTA updates now so you can begin testing automatic updates. With Golioth, you can do OTA from day one!

Scaling Production

This is it, time to turn the process up to 11 and start churning out boards. Smart decisions now will have a huge impact on your bottom line, so firm up those decisions on whether or not you need the top chip version in the family or can take it down a notch or two.

Consider how your choices affect cost after production. For instance, power budget is often a very large consideration. Zephyr includes a Power Management subsystem that should be used for battery-operated devices.

Network bandwidth is more directly related to monetary cost; optimizing your data usage leads to lower cellular and data usage bills. Even small savings scale! Consider configuring log levels to disable debug and info messages during normal operation. The Golioth settings service or an RPC can be used to remotely configure this. The same is true for what data is being streamed back to the servers and how frequently. We also recommend implementing a reboot RPC as a simple version of the “have you tried turning it off and back on again?” adage.

Chris touches on the topic of developing test stands for use during manufacturing. These interface with your hardware, and may work in conjunction with custom Zephyr shell commands to control the device during tests.

Maintaining a Scaled Fleet

You haven’t really crossed the finish line until your deployed devices reach the end of their usable lifetime in the field. This means maintenance, as myriad different operating conditions are sure to turn up unknown behavior.

If you followed Chris’ guidance in previous steps, your OTA update system is already in place and can be utilized to push out updates to address problems. Be sure to take advantage of simple things like Zephyr’s watchdog subsystem for automatic reboot when all else fails. But ultimately you want to fix the problems in place, so leveraging core dumps, and perhaps pushing fixes outside of full updates using the LLEXT feature in Zephyr, are worth a look.

Slides

Give Chris’ talk a shot. There’s a ton of useful information there, whether this is your first rodeo or you’ve been rolling boards off of the production line since Chris was still in diapers. Manufacturing is defined by change. Embrace that concept and you’ll never be left behind.

Slides are below, the video is embedded at the top of the post.

How fast can a new Zephyr user go from zero to successful code compilation?

Golioth continues to invest in Zephyr RTOS capabilities: we have a super strong device SDK that works right on top of the Zephyr SDK, our Reference Designs are built on top of Zephyr, and we continue to train engineers every month on how to get booted quickly with Zephyr. When a fresh face walks through the door of our virtual training, we want to ensure a good experience for training.

In this talk at the Embedded Open Source Summit 2024, Firmware Engineer Mike Szczys describes how we utilize Dev Containers and GitHub Codespaces to provide a pre-installed Zephyr toolchain so that users have a seamless getting-started experience.

An evolution of training

Golioth has held a number of virtual training events using Kasm, something we have written about in the past (and gave a talk about at last year’s EOSS). We liked this method because it provided a pre-installed toolchain in a virtual environment. But it also included a “virtual desktop” environment and was built on top of generic “always on” cloud compute. Any time we did the training we would spin up new EC2 instances to serve users.

The problem is that the utilization was often low and we didn’t keep these resources available. Moreover, the users couldn’t take the work with them: when the session was over, their work was destroyed.

GitHub Codespaces fixes this up front: it’s a configuration script and setup that can deploy a full development environment onto a user’s own GitHub account. If someone wants to continue their work later in the day/week/month, they boot the Codespace back up and pick up where they left off. It’s also possible to utilize the setup to extend to their own projects, or implement interesting remote testing. And you can take it with you, because it’s built on the open standard of development containers.

Development Containers (or just: Dev Containers)

Utilizing Dev Containers only takes a couple of lines of configuration. However, understanding the pieces can be tricky, especially if you’re new to containers generally. Let’s peek at the main .devcontainer file:

{
  "image": "golioth/golioth-zephyr-base:0.16.3-SDK-v0",
  "workspaceMount": "source=${localWorkspaceFolder},target=/zephyr-training/app,type=bind",
  "workspaceFolder": "/zephyr-training",
  "onCreateCommand": "bash -i /zephyr-training/app/.devcontainer/onCreateCommand.sh",
  "remoteEnv": { "LC_ALL": "C" },
  "customizations": {
    "vscode": {
      "settings": {
        "cmake.configureOnOpen": false,
        "cmake.showOptionsMovedNotification": false,
        "C_Cpp.default.compilerPath": "/opt/toolchains/zephyr-sdk-0.16.3/arm-zephyr-eabi/bin/arm-zephyr-eabi-gcc",
        "C_Cpp.default.compileCommands": "/zephyr-training/app/build/compile_commands.json",
        "git.autofetch": false
      },
      "extensions": [
        "ms-vscode.cpptools-extension-pack",
        "nordic-semiconductor.nrf-devicetree",
        "nordic-semiconductor.nrf-kconfig"
      ]
    }
  }
}

The top 3 lines of the .devcontainer file we use for training include 3 critical elements:

  1. A base level image with Zephyr pre-installed on top of a minimal Linux environment
  2. Where to mount the local workspace folder inside this container
  3. The name of the workspace folder and where to drop the user when they boot a container

A critical element is having the toolchain pre-installed on a Debian image. The nice part is that you can utilize others’ containers, including ours! If you want to customize containers for your own builds, you might need to learn some new items.

Other customizations allow you to set up how you’d like VS Code to look and which extensions you’d like to have available on boot.

Extending with onCreateCommand.sh

You can add other capabilities once the box has been initiated and is building. This includes pulling in repository information and setting up .bashrc elements that you might find useful from the command line.

#!/bin/bash

west init -l app
west update
west zephyr-export
pip install -r deps/zephyr/scripts/requirements.txt
echo "alias ll='ls -lah'" >> $HOME/.bashrc
west completion bash > $HOME/west-completion.bash
echo 'source $HOME/west-completion.bash' >> $HOME/.bashrc
history -c

It’s key that the west environment was set up in the initial container, because this now utilizes those tools just like you would when setting up a local Zephyr environment on your computer using our docs.

Binary Downloads As The Main Output

If there’s one argument against this method, it’s that the output of the build is not directly loaded to your development board. We’re working on some capabilities to extend that (see the video for more details), but right now you need to download the binary (zephyr.hex) and load it to your board using a programming tool like nRF Connect For Desktop. That makes this method less optimal for rapid fire debugging. However, it’s possible to debug your program using GDB or something like Segger Ozone. It’s not the same “click to debug” experience that many hardware and firmware engineers are used to from vendor IDEs. We think the ability to get started in 60 seconds is worth the tradeoff.

Slides

View Mike’s slides below and watch the video for an in-depth look at how to create your own Codespaces installation and start with Zephyr today.

Golioth recently added the RAK5010 as a board that works “out of the box” with all of the Golioth Firmware SDK samples. This dev board offers a Nordic nRF52840 connected to a Quectel BG95 cellular modem–which is based on Qualcomm technology. This is quite a useful combination for a cell-connected sensor platform, GNSS/GPS asset tracker, Bluetooth gateway, and any number of other applications.

Honestly, a colossal number of boards work with Golioth thanks to Zephyr’s cross-vendor support. It’s just a matter of ensuring the board configuration is in place. Let’s dive into that today, to take a look at how we configured the RAK5010 for Zephyr and how to connect a programmer to flash the firmware.

Connecting a programmer to the RAK5010

The RAK5010 is programmed over the ARM Serial Wire Debug (SWD) interface. There are a few things to remember with this configuration:

  • GPIO signals are 1.8V, so you must use a programmer that supports that logic level
  • The reset pin is not broken out to the headers, so software reset must be used

The SEGGER J-Link programmer is perfect for this, and already has runner support built into Zephyr.

J-link programmer connected to RAK5010 using a DIY IDC cable adapter

When I first started working with this board, I used jumper cables to directly connect the pins on the RAK5010 to the pins inside the connector socket of the J-Link. The RAK5010 user guide has a connection diagram for this:

Image source: RAKwireless

I use my J-Link for a lot of different boards, so this quickly became annoying to hook back up each time I returned to it. Instead I ordered an adapter board and soldered my own little dongle that interfaces with a standard 1.27mm IDC cable.

Front and back of a simple adapter that converts SWD 0.1" pins to a 1.27mm 10-pin IDC socket

I have also tested using a 1.8V JTAG programmer with this board. It will work, but I found for larger Zephyr programs I had to load the binary manually in the GDB console. Your mileage may vary.

Zephyr configuration for the RAK5010

Support for the RAKwireless RAK5010 is built into Zephyr. However, I found a few usability issues. These have already been accounted for in the board files for all of Golioth’s Zephyr sample applications. Let’s walk through the details.

Use USB as a serial connection

If you want to use the USB port for serial output, the USB-CDC driver needs to be turned on and configured. This involves mapping the console in devicetree and enabling it in Kconfig:

/ {
  chosen {
    zephyr,console = &cdc_acm_uart0;
    zephyr,shell-uart = &cdc_acm_uart0;
  };

};

&zephyr_udc0 {
  cdc_acm_uart0: cdc_acm_uart0 {
    compatible = "zephyr,cdc-acm-uart";
  };
};

# USB
CONFIG_USB_DEVICE_STACK=y
CONFIG_USB_DEVICE_PRODUCT="RAK5010 Zephyr"
CONFIG_USB_DEVICE_PID=0x0004
CONFIG_USB_DEVICE_INITIALIZE_AT_BOOT=y

CONFIG_UART_CONSOLE=y
CONFIG_UART_LINE_CTRL=y
CONFIG_LOG_BACKEND_RTT=n
CONFIG_USB_CDC_ACM_LOG_LEVEL_OFF=y

Use the Zephyr modem driver

The Quectel BG95 works well with Zephyr’s new(ish) modem driver. The board is supported by the in-tree cellular-modem sample and I cribbed most of the configuration from there.

CONFIG_UART_ASYNC_API=y
CONFIG_MODEM_CELLULAR_APN="internet"

# Networking
CONFIG_NETWORKING=y
CONFIG_NET_NATIVE=y
CONFIG_NET_L2_PPP=y
CONFIG_NET_IPV4=y
CONFIG_NET_UDP=y
CONFIG_NET_SOCKETS=y
CONFIG_NET_CONTEXT_RCVTIMEO=y

# Modem driver
CONFIG_MODEM=y
CONFIG_MODEM_CELLULAR=y

The one quirk that I found, compared to other Zephyr networking, is that this board needed an explicit call to bring up the network interface. That’s now built into the Golioth common library. If you’re in need of help, check out the commit where it was added.

Build and flash for the RAK5010

Building and flashing is nearly the same for this board as any other. The one caveat is that if you don’t account for the lack of a reset pin, you’ll need to manually press the reset button on the board (or power cycle it) for the binary to begin running. The --softreset flag shown below solves this for me. (This is the case for the default nrfjprog runner. I also tested the pyocd runner and found reset worked as expected.)

west build -b rak5010_nrf52840
west flash --softreset

Other considerations

I had the chance to dig into the modem driver in a project that uses the RAK5010. It’s pretty incredible, and it looks like the chat script system will enable us to add any AT-command based modem to Zephyr with relatively little pain.

The chat script steps for the BG95 are in-tree. There is one step that always throws a timeout warning when first connecting. In my testing, this didn’t affect behavior (beyond delaying the connection for the few seconds for the timeout to occur).

As I write this post, v3.6.0 is the most recent Zephyr release. However, in a few weeks v3.7.0 will be released and this will upend board definitions. A new paradigm for defining boards has been adopted and it’s probably worth your time to get familiar with the new hardware model.