Tag Archive for: Troubleshooting

Learning Devicetree is one of the more difficult parts of getting comfortable with Zephyr. I find the error messages can be extremely long and hard to decipher. One tool that has helped me along the way is looking at the header files generated when the Devicetree files are combined at build time. Every build directory contains a build/zephyr/include/generated/devicetree_generated.h file that you can reference against error messages.

Let’s look at an example of a common Devicetree error message and how I might troubleshoot it.

Building zephyr/samples/sensor/bme280

The Bosch BME280 sensor is one of our favorites here at Golioth. Let’s build the Zephyr sample application for that sensor using a Nordic nRF9160 DK. The only change we need to make is to add an overlay file for this board:

/* Warning: we've made an error in this file for the demo */
&i2c3 {
    pinctrl-0 = < &i2c2_default >;
    pinctrl-1 = < &i2c2_sleep >;
    pinctrl-names = "default", "sleep";

    bme280@77 {
        compatible = "bosch,bme280";
        reg = <0x77>;
    };
};

&pinctrl {
    i2c2_default: i2c2_default {
        group1 {
            psels = <NRF_PSEL(TWIM_SDA, 0, 12)>,
                <NRF_PSEL(TWIM_SCL, 0, 13)>;
        };
    };

    i2c2_sleep: i2c2_sleep {
        group1 {
            psels = <NRF_PSEL(TWIM_SDA, 0, 12)>,
                <NRF_PSEL(TWIM_SCL, 0, 13)>;
            low-power-enable;
        };
    };
};

Now if you try to build this application:

$ west build -b nrf9160dk_nrf9160_ns . -p

You will eventually be greeted by dozens of lines of error output. The part I usually look at most closely is the line that actually contains the word "error":

/home/mike/golioth-ncs-workspace/zephyr/include/zephyr/device.h:83:41:
 error: '__device_dts_ord_109' undeclared here (not in a function);
 did you mean '__device_dts_ord_19'?
   83 | #define DEVICE_NAME_GET(dev_id) _CONCAT(__device_, dev_id)
      |                                         ^~~~~~~~~

Now, __device_dts_ord_109 is certainly not part of my code, but I recognize the format as belonging to the Devicetree build process. Let’s see if we can make more sense of that identifier.
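When the log runs to dozens of lines, a short script can pull out the ordinals the compiler actually complained about. The sketch below is a hypothetical helper (not part of Zephyr or west); it assumes GCC’s `'<name>' undeclared` wording shown above and deliberately ignores the “did you mean” suggestion, which is only the compiler’s spelling hint.

```python
import re

# Hypothetical helper: matches GCC's "'__device_dts_ord_<N>' undeclared" wording.
UNDECLARED_ORD = re.compile(r"'__device_dts_ord_(\d+)' undeclared")


def undeclared_ordinals(log_text: str) -> list[int]:
    """Return each undeclared devicetree ordinal once, in order of first appearance."""
    ordinals: list[int] = []
    for match in UNDECLARED_ORD.finditer(log_text):
        ordinal = int(match.group(1))
        if ordinal not in ordinals:
            ordinals.append(ordinal)
    return ordinals


if __name__ == "__main__":
    line = ("error: '__device_dts_ord_109' undeclared here (not in a function); "
            "did you mean '__device_dts_ord_19'?")
    print(undeclared_ordinals(line))  # [109]
```

Saving the build output to a file and passing its contents to `undeclared_ordinals()` gives you the numbers to look up next.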

Troubleshooting Devicetree with generated header files

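Look in the build/zephyr/include/generated/devicetree_generated.h file that was generated during the build process. Near the top, in the comment block at the start of the file, you will find the Node dependency ordering list. Every node in the merged Devicetree gets an ordinal number, and the number at the end of __device_dts_ord_109 is one of those ordinals.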

* Node dependency ordering (ordinal and path):
*   0   /
*   1   /aliases
*   2   /analog-connector
*   3   /chosen
*   4   /connector
*   5   /entropy_bt_hci
*   6   /gpio-interface
*   7   /soc
*   8   /soc/peripheral@40000000
*   9   /soc/peripheral@40000000/gpio@842500
*   10  /gpio-reset
 
... 96 lines removed for blog post brevity ...
 
*   107 /soc/peripheral@40000000/flash-controller@39000/flash@0/partitions/partition@f0000
*   108 /soc/peripheral@40000000/flash-controller@39000/flash@0/partitions/partition@fa000
*   109 /soc/peripheral@40000000/i2c@b000
*   110 /soc/peripheral@40000000/i2c@b000/bme280@77

Okay, now we’re getting somewhere! The list above shows that node 109 (the number that appeared in the error message) is an I2C bus, and node 110 is our BME280 node on that bus. So the compiler is telling us that no device was ever declared for the I2C bus our sensor sits on.
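Scrolling through a hundred-plus entries gets old quickly. Here is a rough sketch (a hypothetical helper, not a Zephyr tool) that reads the generated header’s text and builds an ordinal-to-path map. It assumes the comment layout shown above, with one ` *   <ordinal> <path>` entry per line after the “Node dependency ordering” heading.

```python
import re

# One entry such as " *   109 /soc/peripheral@40000000/i2c@b000".
ENTRY = re.compile(r"^\s*\*\s+(\d+)\s+(/\S*)\s*$")


def ordinal_map(header_text: str) -> dict[int, str]:
    """Map ordinals to node paths from the header's "Node dependency ordering" comment."""
    mapping: dict[int, str] = {}
    in_list = False
    for line in header_text.splitlines():
        if "Node dependency ordering" in line:
            in_list = True
        elif in_list:
            if "*/" in line:
                break  # end of the comment block
            match = ENTRY.match(line)
            if match:
                mapping[int(match.group(1))] = match.group(2)
    return mapping


def children(mapping: dict[int, str], ordinal: int) -> list[str]:
    """Paths nested under the node with the given ordinal (excluding the node itself)."""
    own_path = mapping[ordinal]
    prefix = own_path.rstrip("/") + "/"
    return [p for p in mapping.values() if p.startswith(prefix) and p != own_path]


if __name__ == "__main__":
    sample = (
        "/*\n"
        " * Node dependency ordering (ordinal and path):\n"
        " *   109 /soc/peripheral@40000000/i2c@b000\n"
        " *   110 /soc/peripheral@40000000/i2c@b000/bme280@77\n"
        " */\n"
    )
    table = ordinal_map(sample)
    print(table[109])
    print(children(table, 109))
```

With the real header, `ordinal_map(text)[109]` should return the I2C bus path, and `children()` lists the nodes on that bus, such as our BME280.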

The easiest way to inspect how these nodes are declared is to view build/zephyr/zephyr.dts, the file that combines all of the Devicetree sources (board files plus your overlay) during the build:

i2c3: i2c@b000 {
    compatible = "nordic,nrf-twim";
    #address-cells = < 0x1 >;
    #size-cells = < 0x0 >;
    reg = < 0xb000 0x1000 >;
    clock-frequency = < 0x186a0 >;
    interrupts = < 0xb 0x1 >;
    status = "disabled";
    pinctrl-0 = < &i2c2_default >;
    pinctrl-1 = < &i2c2_sleep >;
    pinctrl-names = "default", "sleep";
    bme280@77 {
        compatible = "bosch,bme280";
        reg = < 0x77 >;
    };
};

I was able to find i2c@b000 in this file. It corresponds to the i2c3 node I want to use for my sensor, and indeed the sensor node is present. So why can’t the build system locate this node? The answer is in the status property: status = "disabled".

Zephyr does not create device objects for disabled nodes. The node still shows up in zephyr.dts (and in the generated header), but no driver instantiates a device for it, so the __device_dts_ord_109 symbol that the BME280 driver refers to is never defined. If we want to use this peripheral, we need to enable it. That is the mistake I made in my overlay file.
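The same check can be scripted. The sketch below is a hypothetical, line-based scanner for the flattened zephyr.dts, not a real Devicetree parser: it tracks node paths by their braces, records each `status`, and treats nodes without one as "okay", which is the Devicetree specification’s default. It then flags nodes that are "okay" themselves but sit under a disabled ancestor, which is exactly the situation here. It assumes one node header or closing `};` per line, matching the zephyr.dts excerpt above.

```python
import re

# A node header line such as "i2c3: i2c@b000 {" (labels optional) or "/ {".
NODE_OPEN = re.compile(r"^\s*(?:[\w-]+:\s*)*([\w,@.+-]+|/)\s*\{\s*$")
# A status property line such as 'status = "disabled";'.
STATUS = re.compile(r'^\s*status\s*=\s*"([^"]*)"\s*;')


def node_statuses(dts_text: str) -> dict[str, str]:
    """Map each node path in a flattened zephyr.dts to its status (default "okay")."""
    stack: list[str] = []
    statuses: dict[str, str] = {}
    for line in dts_text.splitlines():
        opened = NODE_OPEN.match(line)
        if opened:
            name = opened.group(1)
            parent = stack[-1] if stack else ""
            path = "/" if name == "/" else parent.rstrip("/") + "/" + name
            stack.append(path)
            statuses[path] = "okay"
        elif line.strip() == "};" and stack:
            stack.pop()
        else:
            status = STATUS.match(line)
            if status and stack:
                statuses[stack[-1]] = status.group(1)
    return statuses


def okay_under_disabled(statuses: dict[str, str]) -> list[str]:
    """List nodes that are "okay" but have a disabled ancestor."""
    flagged = []
    for path, status in statuses.items():
        if status != "okay":
            continue
        parts = path.split("/")
        ancestors = ["/".join(parts[:i]) for i in range(2, len(parts))]
        if any(statuses.get(a) == "disabled" for a in ancestors):
            flagged.append(path)
    return flagged
```

Running `okay_under_disabled(node_statuses(text))` on the failing build’s zephyr.dts should list the bme280@77 node, pointing straight at its disabled parent bus.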

Correcting the Overlay File

Correcting the overlay file is a simple matter of enabling our target node. If you’re like me, you might assume the opposite of "disabled" is "enabled", but you would be wrong. Zephyr expects enabled nodes to use the status value "okay":

&i2c3 {
    status = "okay";
    pinctrl-0 = < &i2c2_default >;
    pinctrl-1 = < &i2c2_sleep >;
    pinctrl-names = "default", "sleep";
    bme280@77 {
        compatible = "bosch,bme280";
        reg = <0x77>;
    };
};

&spi3 {
    /* The nRF9160 cannot have both
     * i2c3 and spi3 enabled concurrently */

    status = "disabled";
};

&pinctrl {
    i2c2_default: i2c2_default {
        group1 {
            psels = <NRF_PSEL(TWIM_SDA, 0, 12)>,
                <NRF_PSEL(TWIM_SCL, 0, 13)>;
        };
    };

    i2c2_sleep: i2c2_sleep {
        group1 {
            psels = <NRF_PSEL(TWIM_SDA, 0, 12)>,
                <NRF_PSEL(TWIM_SCL, 0, 13)>;
            low-power-enable;
        };
    };
};

When solving this issue, I also got an error after enabling i2c3, because spi3 is enabled by default on this board. On the nRF9160, i2c3 and spi3 share the same underlying peripheral instance, so only one of them can be enabled at a time. That is why the overlay above also includes a &spi3 node that disables the unused SPI bus.
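If you hit this class of conflict often, a quick scan for enabled sibling nodes that share a unit address can catch it before the compiler does. This is a hypothetical heuristic that assumes, as on the nRF9160, that peripherals sharing one hardware instance also share a unit address: i2c3 is i2c@b000 in the zephyr.dts above, and spi3 should likewise appear as spi@b000. The input is a {node path: status} dictionary built however you like. Other SoCs may legitimately reuse addresses, so treat hits as hints rather than errors.

```python
from collections import defaultdict


def shared_address_conflicts(statuses: dict[str, str]) -> dict[tuple[str, str], list[str]]:
    """Find enabled sibling nodes that share a unit address.

    Returns {(parent path, unit address): [node paths]} for every address
    claimed by more than one node whose status is "okay".
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for path, status in statuses.items():
        if status != "okay":
            continue
        parent, _, name = path.rpartition("/")
        if "@" not in name:
            continue  # no unit address to compare
        unit_address = name.split("@", 1)[1]
        groups[(parent, unit_address)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical statuses mirroring the nRF9160 case: i2c3 enabled, spi3 still on.
    example = {
        "/soc/peripheral@40000000/i2c@b000": "okay",
        "/soc/peripheral@40000000/spi@b000": "okay",
        "/soc/peripheral@40000000/uart@b000": "disabled",
    }
    for (parent, address), paths in shared_address_conflicts(example).items():
        print(f"{parent} @ {address}: {paths}")
```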

Conclusion

Understanding Devicetree errors is a bit like playing jazz. There’s a pattern to it, but you need to develop a feel for it, and that begins with developing a sense for what the error output is telling you. I hope this tidbit makes things a little easier.

If you have other Zephyr troubleshooting tricks we should know about, we’d love to hear about them! Please share your experiences on the Golioth Forum!

In this article, we show how to use Wireshark, a free and open source network analysis tool, to troubleshoot wireless mesh networks set up using OpenThread, Zephyr, and Golioth. The tooling shown here can also be used for other Thread-based devices, as long as you understand the layers of the network. We used these tools internally to get Thread devices connected to Golioth and taking advantage of all of the features we offer IoT device makers.

Building Thread Networks

Golioth started building out Thread networks when several users approached us about creating Golioth-managed Thread devices. We created example projects to show our users how to create mesh networks of custom low-power sensors and connect them back to the internet. We benefit from the fact that Thread network devices are IPv6 devices (thanks to 6LoWPAN) and that they talk CoAP, all of which aligns very well with Golioth’s capabilities. We showed this in our most recent blog post about custom Thread nodes connecting through an OpenThread Border Router (OTBR) back to Golioth and transmitting information that can be displayed anywhere on the web.

Hardware and firmware engineers can utilize the Golioth Zephyr SDK to implement a wide range of features on Thread nodes and interact with those nodes like any other internet-connected device. In the Golioth Red Demo showcased at a number of recent live events, we had nodes that could report back sensor data and react to stimulus from the cloud; future versions could also get firmware updates directly from the cloud.

As in any hardware and firmware development process, things didn’t always go according to plan. When we were troubleshooting our early Proof of Concept, we needed to check which part of the chain was not passing packets along. We broke out Wireshark to start sniffing packets and figured out that there was a mismatch in the number of bytes being sent (since fixed). We think this kind of pinpoint accuracy in troubleshooting is a tool that all our users should have in their toolbox.

Good Security is meant to slow you down

Golioth is secure by default, which means all packets going to our Cloud must be encrypted. Normally, this is a feature! You don’t want anyone with a packet sniffer to be able to see plain-text data. However, when you do want to see what’s inside a packet during troubleshooting, you need to make sure you have the keys to unlock everything. You also need to make sure you have the tools properly configured for the various layers involved in Thread networks. These will be the steps we review below and in the video.

Setting up a sniffer

In order to use Wireshark to troubleshoot a Thread network, you’ll need the following:

Pretty simple!

The first step is getting the sniffer firmware onto the dongle. Just as the dongle served as a Radio Co-Processor (RCP) for the OpenThread Border Router, here it runs a different firmware that sniffs radio packets and hands them over USB to the computer. This firmware is specific to 802.15.4, the physical (PHY) and MAC layer used by Thread. Download the firmware from Nordic Semiconductor, load it onto your dongle, and you’re ready to go!

Next, you need to be able to work with the dongle’s output. This includes installing a Python script from the Nordic Semiconductor repository as a Wireshark external capture (extcap) plugin. Around the 2:15 mark in the video, Mike shows where and how to install it.

Configuration Settings

Other important parts of the video include things like:

  • (3:45) Choosing the correct 802.15.4 channel (channel 15 for most Golioth examples)
  • (5:00) Network settings for Wireshark to capture Thread network traffic
  • (7:15) Adding a Pre-Shared Key (PSK) to decrypt DTLS packets
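Two small conversions come up during this setup. The 2.4 GHz 802.15.4 band has channels 11 through 26, spaced 5 MHz apart starting at 2405 MHz, so channel 15 sits at 2425 MHz. And if your Golioth PSK is an ASCII string, Wireshark’s DTLS preferences expect the key as a hex string (as I understand them), so it needs hex-encoding first. The helper names and the PSK value below are hypothetical.

```python
def channel_to_mhz(channel: int) -> int:
    """Center frequency in MHz of a 2.4 GHz IEEE 802.15.4 channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz 802.15.4 channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def psk_to_hex(psk: str) -> str:
    """Hex-encode an ASCII pre-shared key for tools that want the key as hex."""
    return psk.encode("ascii").hex()


if __name__ == "__main__":
    print(channel_to_mhz(15))  # 2425
    print(psk_to_hex("example-psk"))  # hypothetical PSK; prints its hex encoding
```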

Once those configuration steps are done, you can follow wireless traffic from your Thread node through each of the layers up to the Golioth servers, addressed to specific endpoints such as /logs. In the decoded payload area (the bottom pane), you can also see the messages actually being sent; in this example, a log message saying “starting connect”.
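Wireshark’s CoAP dissector already does this decoding for you, but it can help to see what it is reading. Assuming the DTLS layer has been decrypted so you have the plaintext CoAP bytes, this sketch parses the RFC 7252 fixed header, collects Uri-Path options (option number 11) into a path such as /logs, and returns the payload. The function names are hypothetical, and the sample packet in the test is made up rather than captured from Golioth.

```python
def _extended(nibble: int, message: bytes, pos: int) -> tuple[int, int]:
    """Apply CoAP's extended option delta/length encoding (RFC 7252, section 3.1)."""
    if nibble == 13:
        return message[pos] + 13, pos + 1
    if nibble == 14:
        return int.from_bytes(message[pos:pos + 2], "big") + 269, pos + 2
    if nibble == 15:
        raise ValueError("reserved option nibble 15")
    return nibble, pos


def decode_coap(message: bytes) -> dict:
    """Decode the header, Uri-Path options, and payload of a plaintext CoAP message."""
    if len(message) < 4:
        raise ValueError("CoAP messages are at least 4 bytes")
    token_length = message[0] & 0x0F
    code = message[1]
    pos = 4 + token_length
    option_number = 0
    uri_path: list[str] = []
    payload = b""
    while pos < len(message):
        if message[pos] == 0xFF:  # payload marker
            payload = message[pos + 1:]
            break
        delta, length = message[pos] >> 4, message[pos] & 0x0F
        delta, pos = _extended(delta, message, pos + 1)
        length, pos = _extended(length, message, pos)
        option_number += delta  # option numbers are delta-encoded
        value = message[pos:pos + length]
        pos += length
        if option_number == 11:  # Uri-Path
            uri_path.append(value.decode("utf-8"))
    return {
        "version": message[0] >> 6,
        "type": ("CON", "NON", "ACK", "RST")[(message[0] >> 4) & 0x03],
        "code": f"{code >> 5}.{code & 0x1F:02d}",  # e.g. 0.02 is POST
        "message_id": int.from_bytes(message[2:4], "big"),
        "token": message[4:4 + token_length],
        "uri_path": "/" + "/".join(uri_path),
        "payload": payload,
    }
```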

Tools for when you need them

Wireshark and plugins developed by the community make for a powerful set of tools for troubleshooting network problems. We hope that our examples and tutorials allow you to quickly deploy a Thread network and build out custom Thread nodes; but when you need a bit more insight or are looking to try something new, Wireshark can help.

We’re here to help too! You can reach us on our Discord or Forum, and you can always reach us by email.

Troubleshooting high-complexity systems like Zephyr requires more thorough tools. Menuconfig lets users see the layers of their system and adjust settings without requiring a complete system recompilation.

The troubleshoot loop

Modify, compile, test.

Modify, compile, test.

Modify, compile, test.

Modify, compile, test.

How do we break out of this loop of changing settings in a program, recompiling the entire thing, and waiting for the build to finish? Sure, some tools let you modify things while step debugging, such as changing values in memory, but you can’t normally allocate new memory after compiling. So what happens when you need to change things? You find the #define in the code, change the parameter, and recompile. What a slow process!

Moving up the complexity stack

We move up the “complexity stack” from a bare-metal device to running a Real Time Operating System (RTOS) in order to get access to higher level functions. Not only does this allow us to abstract things like network interfaces and target different types of hardware, but it also allows us to add layers of software that would be untenable when running bare-metal firmware. The downside, of course, is that it’s more complex.

When you’re trying to figure out what is going wrong in a complex system like Zephyr, it can mean chasing problems through many layers of functions and threads. It’s hard to keep track of where things are and what is “in charge” when it comes time to change things.

Enter Menuconfig

Menuconfig is borrowed from Linux development, where it serves the same purpose: bringing some organization to a high-complexity system. Full Linux systems are, of course, often even more complex than an RTOS. In the video below, Marcin shows how he uses Menuconfig to turn features on and off during debugging, including with the Golioth “hello” example. As recommended in the video, new Zephyr users can also use Menuconfig to explore the system and see which features are enabled and available.
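One way to see what a Menuconfig session actually changed: in a Zephyr build, Menuconfig (launched with `west build -t menuconfig`) saves its result to build/zephyr/.config, where disabled options appear as `# CONFIG_FOO is not set`. Comparing a copy of that file from before and after the session lists the changed symbols, which you can then carry into prj.conf, since changes made in Menuconfig live only in the build directory and are lost on a pristine build. The helper names below are hypothetical.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse a Kconfig .config file into {symbol: value}; unset symbols map to "n"."""
    symbols: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            symbols[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            symbols[name] = value
    return symbols


def diff_configs(before: str, after: str) -> dict[str, tuple]:
    """Return {symbol: (old value, new value)} for every symbol that changed."""
    old, new = parse_config(before), parse_config(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Hypothetical before/after snapshots of build/zephyr/.config.
    before = "CONFIG_LOG=y\n# CONFIG_SHELL is not set\n"
    after = "CONFIG_LOG=y\nCONFIG_SHELL=y\n"
    for symbol, (was, now) in diff_configs(before, after).items():
        print(f"{symbol}: {was} -> {now}")
```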