In this article, we showcase how to use Wireshark, a free and open source network analysis tool, to troubleshoot wireless mesh networks set up using OpenThread, Zephyr, and Golioth. The tooling shown here can also be used for other Thread-based devices, assuming you understand the layers of the network. We used these tools internally while getting Thread devices to connect with Golioth and take advantage of all of the features we offer IoT device makers.

Building Thread Networks

Golioth started building out Thread networks when several users approached us about their interest in creating Golioth-managed Thread devices. We created example projects to show our users how to create mesh networks of custom low-power sensors and connect them back to the internet. We benefit from the fact that Thread network devices are IPv6 devices (thanks to 6LoWPAN), and that they talk over the CoAP protocol, all of which aligns very well with Golioth capabilities. We showed this in our most recent blog post about custom Thread nodes connecting through an OpenThread Border Router (OTBR) back to Golioth and transmitting information that can be displayed anywhere on the web.

Hardware and firmware engineers can utilize the Golioth Zephyr SDK to implement a wide range of features on Thread nodes and interact with those nodes like any other internet-connected device. In the Golioth Red Demo showcased at a number of recent live events, we had nodes that could report back sensor data and react to stimulus from the cloud; future versions could also get firmware updates directly from the cloud.

As in any hardware and firmware development process, things didn’t always go according to plan. When we were troubleshooting our early Proof of Concept, we needed to check which part of the chain was not passing packets along. We broke out Wireshark to start sniffing packets and figured out that there was a mismatch in the number of bytes being sent (since fixed). We think this kind of pinpoint troubleshooting is something all our users should have in their toolbox.

Good Security is meant to slow you down

Golioth is secure by default, which means all packets going to our Cloud must be encrypted. Normally, this is a feature! You don’t want anyone with a packet sniffer to be able to see plain-text data. However, when you do want to see what’s inside a packet during troubleshooting, you need to make sure you have the keys to unlock everything. You also need to make sure you have the tools properly configured for the various layers involved in Thread networks. These will be the steps we review below and in the video.

Setting up a sniffer

In order to use Wireshark to troubleshoot a Thread network, you’ll need the following:

  • A computer running Wireshark
  • An nRF52840 dongle to act as the 802.15.4 sniffer
  • Nordic Semiconductor’s 802.15.4 sniffer firmware and the matching Wireshark plugin (covered below)

Pretty simple!

The first step is getting the tools onto the dongle. Much like a dongle serves as the Radio Co-Processor on the OpenThread Border Router, here we load a different set of firmware that sniffs radio packets and hands them over USB to the computer. This firmware is specific to 802.15.4, which is the physical and MAC layer used by Thread. Download the firmware from Nordic Semiconductor, load it onto your dongle, and you’re ready to go!
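
If you prefer the command line to Nordic’s nRF Connect Programmer GUI, one way to load the sniffer image is to package the hex file and push it through the dongle’s USB bootloader with nrfutil. The file names below are placeholders for whatever Nordic’s download actually contains:

nrfutil pkg generate --hw-version 52 --sd-req 0x00 --application-version 1 --application nrf_802154_sniffer.hex sniffer_dfu.zip
nrfutil dfu usb-serial -pkg sniffer_dfu.zip -p /dev/ttyACM0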

Next, you need to be able to interact with the output of the dongle. This means installing a Python plugin script from the Nordic Semiconductor repository into Wireshark. Around the 2:15 mark in the video, Mike shows where and how to install this in Wireshark.

Configuration Settings

Other important parts of the video include things like:

  • 3:45 – Choosing the correct 802.15.4 channel (channel 15 for most Golioth examples)
  • 5:00 – Network settings for Wireshark to capture Thread network traffic
  • 7:15 – Adding a Pre-Shared Key (PSK) to decrypt DTLS packets (see the tip below)
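
A quick note on that last step: Wireshark’s DTLS preference (Edit > Preferences > Protocols > DTLS > Pre-Shared Key) expects the key as a hex string rather than the plain text shown in the Golioth Console. On a typical Linux machine the conversion is a one-liner (the key here is a placeholder):

echo -n "my_complex_password" | xxd -p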

Once those configuration steps are done, you can view wireless traffic coming from your Thread node, through each of the layers, up to the Golioth servers, and on to specific endpoints like /logs. In the decoded payload area (bottom window), you can also see the messages that are actually being sent; in this example, a log message saying “starting connect”.

Tools for when you need them

Wireshark and plugins developed by the community make for a powerful set of tools for troubleshooting network problems. We hope that our examples and tutorials allow you to quickly deploy a Thread network and build out custom Thread nodes; but when you need a bit more insight or are looking to try something new, Wireshark can help.

We’re here to help too! You can reach us on our Discord or Forum, or email us anytime at [email protected].

TL;DR: we’ve enabled people to compile Zephyr programs from a computer with no toolchain installed, almost instantly.

Part of our charter at Golioth is to help people prototype and scale IoT devices faster. That’s why we offer an open source SDK built on top of Zephyr. We think this represents a “fast forward” or “cheat code” for quickly standing up an IoT device prototype. On the cloud side, our servers represent hundreds of hours of customization and testing; you can instantly connect and get access to resources that allow hardware and firmware developers to scale to thousands or millions of devices. But sometimes it can be scary to get started in a new ecosystem or Real Time Operating System (RTOS) like Zephyr, even if it will speed things up later. As such, we do public and private training for companies and individuals.

As part of the resources we offer, we maintain a Training site that walks people through how to get started using Zephyr, normally targeting remote training. You can follow along right now; you’ll need to purchase an Adafruit MagTag board and sign up for a free Dev Tier account, but everything else is covered on the training site. At the end of the training, you should understand how to interact with hardware in Zephyr and send data to and from the Golioth cloud over WiFi. It’s a short jump from there to re-target other hardware, including your custom designs.

The tripping points for the training often revolve around the installation process. This is multi-pronged:

  • The size of a Zephyr install is relatively large, even when you are only targeting a specific platform. Having multiple people in a room, even with good WiFi or network connectivity, means that the shared bandwidth will be a limiting factor. More trainees means slower downloads.
  • Everyone comes to training with a computer in a different state. They might have tried to install Zephyr tools in the past, or they might have a particularly rare Linux distro, or many other possible variations. It would be best if everyone showed up with a fresh OS install…but that is very unrealistic.
  • There are different expectations around how installations should go. Many embedded engineers are “Windows first” and expect a complete IDE for any new platform. Some silicon vendors, such as Nordic Semiconductor, help to support this in Zephyr. But Zephyr originally targeted Linux-based machines, and we have found that the smoothest flow for installing tools across all of the platforms Zephyr can target is to be Linux-first.

In this article, we’re going to talk about our attempt to normalize setups and have pre-installed tools using Kasm and Docker. These are not the only tools in this space; we have previously written about GitPod and are investigating GitHub Codespaces, but this is a look at one of the latest experiments we’re running at Golioth.

Kasm thin client

The concept of a browser based client or a “thin client” is nothing new. They were all the rage back in the day of time share servers (really those were “dumb terminals”) and then again in the 90s as computing was more ubiquitous throughout the office (with a centralized set of servers). The difference is that now things are much more graphical and running completely inside the browser.

Kasm was started in 2017 and includes an open source project run by Kasm Technologies. The company behind Kasm has a per-seat licensing model, or they will run the servers directly for you (once you’re past 5 trial seats). They specialize in streaming containerized desktops and apps to the browser. Once you log into a Kasm server, you are able to launch a range of containers, normally a desktop view or a single app that will load up in your browser. You can try this for yourself on the Kasm demo page.

The server that we’re running on is a pre-configured image that I pulled from the Digital Ocean marketplace. I was able to install all of the required software on a provisioned server running in some unknown datacenter. All I did was log in the first time to get my credentials for a user and an admin, and the rest of my interaction was on the web interface that the Kasm server presents to me as an admin.

Docker

As a hardware engineer, Docker is one of those things I heard about for a long time and never really “got it”. I’m still not sure I do. But following the tutorial for customizing a Kasm container, I started to understand a bit more. In that set of tutorials I started from a base operating system image (Ubuntu Focal) that could be viewed through the browser. Then I was able to start customizing: adding custom files on the desktop, custom icons to launch programs I installed, or background images pulled in from the web. It was in this customization section that I could add all of the commands from the Golioth docs for installing Zephyr tools.

My layman’s explanation of Docker would be: “Creating a virtual computer where I can automatically install a bunch of software using shell scripts. Once I have built that virtual computer, I am able to use it over and over again, including spinning up different instances of it (as in this Kasm scenario).” The analogy would be if I bought a bunch of laptops, had an install CD (remember those?) with all of the required software, and then mailed a freshly imaged laptop to everyone taking our training. Sound crazy? That’s one of the best solutions we have seen: a trainer brings a pelican case with 24 freshly imaged laptops to on-site training. Their training works flawlessly every time!

I don’t have much else to mention about Docker aside from the idea that it’s possible to script a bunch of install commands that match the install instructions we have on our Zephyr getting started guide. In fact, I used those very directions to build the container shown in the video above. So all I’m doing in this case is automating the install process, doing it once, and then deploying the container (with all of the software and dependencies installed) over and over again for different users.
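
To give a feel for what that scripting looks like, the container customization boils down to roughly the same commands as our Zephyr getting started guide. Here is an abridged sketch; package lists and paths are trimmed and may lag the current docs:

apt-get update && apt-get install -y --no-install-recommends git cmake ninja-build gperf ccache dfu-util device-tree-compiler wget python3-dev python3-pip xz-utils file make gcc
pip3 install west
west init ~/zephyrproject && cd ~/zephyrproject && west update
west zephyr-export
pip3 install -r ~/zephyrproject/zephyr/scripts/requirements.txt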

Challenges

We don’t think this is the ultimate solution for our training, so much as an experiment that showcases what we can do with containerized solutions. There are some remaining challenges, and we would love to have some help from our community.

Loading firmware onto the device

Currently our plan (as shown in the video) is to have our users/trainees pull the final built binary down to their local computer and then load it onto a device like the MagTag. This echoes the way the mbed online compiler worked.

If there is a bootloader and a USB-to-serial connection, it’s possible to load firmware directly onto the embedded device. In the case of some Espressif boards, this would be something like having esptool.py installed locally on your machine. There are an increasing number of tools that make this process easier, such as an ESP tool that allows you to load firmware using WebUSB. Certain specialized bootloaders, like the one that comes by default on the MagTag, load UF2 files. When the MagTag is plugged in over USB and a sequence of buttons is pressed, the device shows up as a mass storage drive. You drop a UF2-formatted binary (just an alternative packaging of the compiled code) onto the drive, and the device reboots and starts running the code.
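
For those two paths, the local step looks roughly like the following; the chip, serial port, flash offset, and drive name are placeholders that depend on your machine and board setup:

# Espressif serial bootloader (the MagTag is an ESP32-S2)
esptool.py --chip esp32s2 --port /dev/ttyACM0 write_flash 0x0 zephyr.bin
# UF2 bootloader: copy the image onto the mass storage drive that appears
cp zephyr.uf2 /media/$USER/MAGTAGBOOT/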

If it’s a board without a bootloader, the user would need a debugger and the local tools to communicate with that debugger, such as a JLink device and JFlash software. This means they would still need some OS-specific loader tools to get the binary onto the embedded device. The user would not be able to take advantage of the built-in tools in west that allow direct loading onto the device.

You steppin’?

If you would like to do step debugging instead of “printf/printk” debugging, you simply need to download a different file from the container. If you download the zephyr.elf file instead of the zephyr.bin file, you can load it into a 3rd party debugger like Segger Ozone (made by the same company as the JLink). We have done some experiments with this in the past, including analyzing where the device is spending its time using SystemView. This would once again require installing local programs that can talk over the USB port to something like a JLink.

Experimental port forwarding and WebUSB

Some GDB servers host control of the debugger on a port on the machine’s localhost. We have some experiments running where we forward this port to the container, so that a software debugger inside the container can drive the hardware directly.

We have also heard some whispers of a WebUSB implementation that can tunnel to the container. So we could plug in a board on our host machine (i.e. my laptop), connect to it over WebUSB, and then forward all information along to the container machine (i.e. the browser-based desktop running on the Kasm server).

We would love to hear about other projects that are trying this.

Shared resources

The final challenge we are dealing with is the fact that we’re basically “renting” a computer to do exactly what we could be doing with the host machine sitting right in front of us. Most developers have access to very powerful machines and we are instead using the resources on a remote machine (the Kasm server). The cost of standardization is the cost of renting server time for each person in the workshop. It might be worth it, but it is a constraint and a challenge.

Containers are another tool

Anyone reading this with a web background is likely thinking, “Yeah, containers, cool, 2010 called and wants their headline back”. But we are excited about it because these tools are finally making their way into the historically sluggish embedded industry. While our use case of containers is mostly around zero-install-time training, others are using containers to automate their testing and implementing best software engineering practices for the range of devices they have on their desk or in the field.

We’d love to hear how you think we can improve our training and make it easier for you to learn more about Golioth, Zephyr, and building code instantly. Check out our forums, our Discord, ping us on Twitter, or send us an email at [email protected].

Thread (and its common implementation known as OpenThread) is a networking technology that is quickly gaining adoption due to the forthcoming Matter standard. Thread has been around for many years, but as it grows, it becomes more accessible using off-the-shelf firmware and hardware. So it’s accessible, but maybe not all that straightforward. This post will help with that.

As we have more customers talking to us about using Golioth as a management layer for both Matter solutions and for standalone industrial Thread networks, we thought we should expand our tools to better fit the needs of engineers designing new systems.

In this article and associated examples, we will show you how to set up a network at home using common components. We’ll build a custom device that is communicating back over that network utilizing features of Golioth that extend the networking layer: over-the-air firmware updates, time series data tracking, command/control structures, and instant logging.

Parts of this Thread Guide

In the past, we have written about the basics of getting a Thread demo up and running, and shown it working on video. Today we build upon that work and showcase a new set of resources so you can build your own Thread network with custom devices.

  • YouTube video – A walkthrough of the setup steps and troubleshooting steps with a newly provisioned Thread network.
  • A tutorial site – Follow along with a simplified set of directions for replicating what we have built. We think it is the shortest path to getting a working Thread network on your desk or bench.
  • Code repository – Start from working code on the node devices to see how you can customize code and have your sensor data streaming back to Golioth quickly.
  • This blog post

Getting Started

The majority of the step-by-step instructions are contained in the tutorial site for Golioth and OpenThread. If you’re interested in immediately replicating and then extending our setup, head over to the tutorial site to learn more.

Let’s take a higher level look at the OpenThread Border Router (OTBR), OpenThread nodes, and Golioth device management layer that make up this Thread network example.

OpenThread Border Router (OTBR)

This is the key part of a Thread network, as it allows any arbitrary number of nodes that are communicating with one another (meshing) to then reach the outside internet. Each Thread device has an IPv6 address, which is great: that means a node that is meshing with 30 other nodes and connected to the internet (through an OTBR) is directly addressable from the internet. That’s an important piece. We might expect a higher-power WiFi-based device to have an IP address (assigned by a router), but probably not a low-power sensor. The OTBR does much of the routing of information and translation of packets coming from node devices.

We build a DIY version of the OTBR because there aren’t many commercially available (yet). We use a Raspberry Pi and an nRF52840 Dongle to create a pipeline out to the wider internet.
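
As a rough sketch of what that looks like in practice, the OpenThread project publishes a prebuilt Border Router container that takes the dongle (flashed with RCP firmware) as its radio. Something in this shape is typical, though the tutorial site has the exact invocation and the extra IPv6 forwarding settings you will likely need:

docker run -it --privileged --network host --volume /dev/ttyACM0:/dev/ttyACM0 openthread/otbr --radio-url spinel+hdlc+uart:///dev/ttyACM0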

OpenThread Nodes

In our first video/blog about Thread, we had nodes talking to the internet through an OTBR. However, the nodes only blinked and sent back logging messages and we didn’t give detailed instructions on how to build them. In the tutorial site and the video, we show how we can execute arbitrary code to do higher level functions, like data logging. We use the Laird Connectivity BT510, which is a sensor node built with the Nordic Semiconductor nRF52840. It also has a range of sensors built in and is contained in a waterproof case. We think it’s a great platform for building a small, reliable Thread network and we used it in our Red Demo that we showcased at the 2022 Zephyr Developer Summit and Embedded World.

The BT510 is a board already supported in Zephyr, which means we can very easily compile firmware for it and access all of the sensor drivers that are built into Zephyr, no custom out-of-tree code required. We use the OpenThread networking stack that is native to Zephyr, and the Golioth SDK, which allows each node to be pre-configured to talk to the Golioth servers. We then enable things like LightDB Stream to regularly send back sensor data from the device through the Thread network.
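
Since the BT510 is an in-tree board, building and flashing a node image is the standard west workflow. A minimal sketch, with the application path standing in for the node sample in the code repository linked above:

west build -b bt510 path/to/thread-node-app
west flash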

Golioth Device Management Layer

The Golioth Device Management Layer/Platform is already ready for you. If you don’t have an account, you can sign up on the Golioth Console, which will guide you through creating your first device on the platform; you can use the credentials for that digital version of a device to control your first Thread node.

Once your Thread device is connected, you’ll be able to see how often the device is connecting, view the latest data and logs being sent back from the device, and check which firmware versions are on each device. Any data sent to the Golioth platform from a Thread node can be aggregated into an external visualization platform, or wholesale exported to 3rd party services (AWS, Azure, GCP) using Output Streams.

What will you build?

We are sharing the know-how to build a Thread network. Following this guide enables all of the devices on your Thread network to communicate back to the wider internet. As a hardware engineer, I don’t really want to mess about with the network layer; I just want something that works. Instead, I’d rather focus on the end application and on building end devices (nodes) that are useful to customers and users.

With Golioth, Zephyr, OpenThread, and some off-the-shelf hardware, you can get started quickly and you can start to connect custom devices with a powerful interface to the internet. What will you build? Please let us know on our Discord, Forum, or on Twitter.

Devices that connect to the Golioth Cloud communicate securely thanks to a pre-shared key (PSK) that encrypts all messages. But how do you get a unique set of credentials onto every device? The options for that just got a whole lot more interesting thanks to some new Golioth features. There are now two ways to set the credentials: using the device shell, or using our command line tool, which also fetches those credentials automatically!

Zephyr settings using the device shell

Zephyr has a shell option that runs on the device itself. This is great for things like network or I2C debugging (there are specialized commands for both of those and much more). You send your credentials over a serial connection, and Golioth leverages the Zephyr settings subsystem to store the device credentials (PSK-ID and PSK) in flash memory.
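
On the firmware side, the application registers a handler with the settings subsystem so those two values land in RAM when the stored settings are loaded at boot. Below is a minimal sketch of that idea; buffer sizes and error codes are my own choices, and the real implementation ships with the Golioth Zephyr SDK sample:

#include <errno.h>
#include <string.h>
#include <zephyr.h>
#include <settings/settings.h>  /* <zephyr/settings/settings.h> on newer Zephyr */

static char golioth_psk_id[64];
static char golioth_psk[64];

/* Called by the settings subsystem for every key under "golioth/"; the
 * name argument arrives with the "golioth/" prefix already stripped. */
static int golioth_settings_set(const char *name, size_t len,
                                settings_read_cb read_cb, void *cb_arg)
{
    char *dst;
    size_t dst_size;

    if (strcmp(name, "psk-id") == 0) {
        dst = golioth_psk_id;
        dst_size = sizeof(golioth_psk_id);
    } else if (strcmp(name, "psk") == 0) {
        dst = golioth_psk;
        dst_size = sizeof(golioth_psk);
    } else {
        return -ENOENT;
    }

    if (len >= dst_size) {
        return -EINVAL;              /* value too long for this sketch's buffers */
    }
    if (read_cb(cb_arg, dst, len) < 0) {
        return -EIO;
    }
    dst[len] = '\0';                 /* keep the credential NUL-terminated */
    return 0;
}

/* Register for the "golioth" tree; main() still calls settings_subsys_init()
 * and settings_load() so the stored values are delivered here at boot. */
SETTINGS_STATIC_HANDLER_DEFINE(golioth, "golioth", NULL,
                               golioth_settings_set, NULL, NULL);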

As part of the getting started guide, you already set up a device in the Golioth Console. Use the Devices sidebar option to find that device again (or create a new one) and click on the Credentials tab to access your PSK-ID and PSK:

Golioth Device Credentials

Now we need some code to run on the device. The Golioth settings example already has this feature built in. For this example I’m using a Nordic nRF9160dk. You can build and flash the example right away. Normally we’d put credentials into the prj.conf file first, but this time we’ll just assign those from the shell!

cd ~/golioth-ncs-workspace/modules/lib/golioth/
west build -b nrf9160dk_nrf9160_ns samples/settings/
west flash

Once programming has completed, load up your serial terminal tool of choice. I like to use minicom -D /dev/ttyACM0 but you should have the same success with screen /dev/ttyACM0 115200 or any other similar tools.

  • You’ll be greeted by the uart:~$ command prompt
  • The syntax that we need is settings set <key> <value>
    • PSK-ID needs to be assigned to golioth/psk-id
    • PSK needs to be assigned to golioth/psk
  • Reboot the device after changing the settings

Here’s what that looks like in action (important lines highlighted):

uart:~$ *** Booting Zephyr OS build v2.6.99-ncs1-1 ***
[00:00:00.209,716] <inf> golioth_system: Initializing
[00:00:00.216,278] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.216,278] <inf> fs_nvs: alloc wra: 0, fa8
[00:00:00.216,278] <inf> fs_nvs: data wra: 0, 90
uart:~$ settings set golioth/psk-id [email protected]
Setting golioth/psk-id to [email protected]
Setting golioth/psk-id saved as [email protected]
uart:~$ settings set golioth/psk my_complex_password
Setting golioth/psk to my_complex_password
Setting golioth/psk saved as my_complex_password
uart:~$ kernel reboot cold

After rebooting, the board connects to a cell tower and the connection to Golioth is successfully established!!

uart:~$ *** Booting Zephyr OS build v2.6.99-ncs1-1 ***
[00:00:00.215,850] <inf> golioth_system: Initializing
[00:00:00.222,381] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.222,412] <inf> fs_nvs: alloc wra: 0, fa8
[00:00:00.222,412] <inf> fs_nvs: data wra: 0, 90
[00:01:06.672,241] <dbg> golioth_hello.main: Start Hello sample
[00:01:06.672,485] <inf> golioth_hello: Sending hello! 0
[00:01:06.673,004] <wrn> golioth_hello: Failed to send hello!
[00:01:06.673,095] <inf> golioth_system: Starting connect
[00:01:06.967,102] <inf> golioth_system: Client connected!
[00:01:11.673,065] <inf> golioth_hello: Sending hello! 1
[00:01:16.674,316] <inf> golioth_hello: Sending hello! 2

Golioth Credentials automatically set from the command line

What if I told you that a one-line command could look up your device credentials from the Golioth Cloud and automatically send them to the device? This is literally the feature we’ve implemented. Now, I’m excited about the shell settings above, but this new command line feature is absolutely legendary!

Golioth device name

  1. Look up your device name from the Golioth Console
  2. Issue the command, using your device name and the correct port:
    1. goliothctl device config --name <device-name> --port <serial-port>

Here’s what it looks like in action:

$ goliothctl device config --name nrf91-settings-demo --port /dev/ttyACM0
failed to get golioth/psk-id from device: setting not found
device success setting golioth/psk-id saved as [email protected]
failed to get golioth/psk from device: setting not found
device success setting golioth/psk saved as my_complex_password
closing serial read

And check this out, it’s a quick way to make sure you have the device credentials correct. Since you’re not copy/pasting or typing the credentials, you know you have it right as long as you get the name of the device right. Running the command a second time confirms those settings are correct:

$ goliothctl device config --name nrf91-settings-demo --port /dev/ttyACM0
golioth/psk-id in the device is already set to [email protected]
golioth/psk in the device is already set to my_complex_password
closing serial read

Visions of End Users and Bulk Provisioning

Two really easy ways to see the new features put to use involve end users and manufacturing. Imagine sending devices with “stock” firmware out to customers and having them add their own credentials (we have a snazzy web-based demo in the works, so stay tuned). The other thought is toward bulk provisioning, where a script can be used to register the new device on Golioth, generate credentials, and send them to the device all in the same step.

We’d love to hear about your experiences with these new tools. Catch up with us on the Golioth Discord so we can have a chat!

The line between the hardware and software worlds continues to blur. One advantage is the range of software tools that are coming into the hardware and firmware space, such as “Continuous Integration, Continuous Deployment” (CI/CD). This ensures each commit of code to a repository is immediately run against a slate of tests, compiled using a standardized compiler, and deployed to eligible devices for testing. What if your IoT firmware deployments happened automatically just by typing git push?

In this post and the associated video, Lead Engineer Alvaro Viebrantz talks about a sample project that compiles and delivers firmware to eligible devices automatically using GitHub Actions. Follow along in our repository on the Golioth Labs page.

Compiling in the cloud?

Most embedded engineers hear “cloud compilation” and start shaking their heads no. Having an IDE or a local toolchain is the expectation for fast iteration and direct debugging of code using tried and true methods. But being “local only” also means running into dependency mismatches between your machine and your co-worker’s machine. What’s worse, you might be lulled into a false sense of security and never venture past having a single machine that is capable of compiling critical device firmware. (Don’t believe me? Ask a long-time firmware engineer about the steps they had to go through to keep that old computer running to compile code for the legacy project.)

Moving your final (production) builds to the cloud is beneficial because it removes localized dependencies. Distributed teams are increasingly common. Now when an engineer in Brazil, the US, Poland, or any other place your teammates might be, commits code to the repo, it receives the same treatment. If you have GitHub Actions (or equivalent hooks) set up on your repository, the push kicks off a compilation in a container on the cloud. The output firmware is then uploaded to Golioth. As we see in the video, it’s still possible to build on a local machine, but the standardization is very helpful as you move towards scaling your product and making production-grade firmware.

Walking through the demo

Let’s recap the steps taking place in the video and discuss what is relevant about each step.

  • Build firmware locally
    • This showcases that while the toolchain is available on the cloud, it’s not only on the cloud. Developers can still create a local image for testing.
  • Load initial firmware to the device
    • Since the firmware is responsible for reacting to new firmware on the Golioth Cloud, we need to load an initial image to start checking whether an update is available. As Alvaro mentions, the firmware recognizes that the same version loaded on the device initially is also the latest on the cloud, so no action is taken.
  • Creating device credentials using the hardware ID
    • Golioth allows you to set a range of different IDs to help identify devices in the field. This is especially important as your fleet grows. As part of the provisioning process, Alvaro shows how the code on the firmware image can also extract a unique identifier that is programmed into the silicon at the factory, so it is possible to always verify which device is being identified, all the way down to the silicon.
  • Generate device credentials on Golioth
  • Set WiFi/Golioth credentials over the Console
    • Setting device credentials allows the Golioth Cloud to validate a device is able to talk to the network. Alvaro demonstrates using a serial connection with the device to add these credentials, a recent improvement that will enhance device programming in the factory or as users provision a new device.
  • Device connects and begins to send logs and LightDB State data
  • Make a small change and commit to the repo. Push a git tag.
    • Pushing the change to the repository and adding a tag is what alerts the GitHub Actions logic to start compiling a new firmware image (a minimal example of the tag push follows this list).
  • The firmware is built in the cloud as part of GitHub actions.
    • For this build, Alvaro is using our Arduino SDK along with PlatformIO. This is in a container that the GitHub action boots up and uses to standardize the firmware build.
  • The artifact is pushed to the Golioth cloud
    • As part of the setup process, the user will need to generate an API key to place on GitHub in order to allow GitHub to push the new firmware image as an Artifact on Golioth.
  • Create a release using the Console and roll out the firmware to the eligible device.
    • This is a manual process in the video, but a user could also utilize the REST API in order to create a Release and set the release to be eligible for rollout to a set of devices that are tagged on Golioth (different than GitHub tags). Remember: Any action possible on the Console can be scripted using the REST API.
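
Backing up to the trigger in the list above: the “deployment” really is just a commit plus a tag push. The commit message and version number here are only illustrative:

git commit -am "Bump blink interval"
git tag v1.0.1
git push origin main --tags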

Extending the demo

One theme that is obvious throughout this video is the focus on moving your product to production. Programming devices as they come off the line is no small task and something we will continue to make content about here at Golioth. And future videos will describe methods for running real-world tests on hardware, something we are interested in given our support of hundreds of boards.

In the current demo, Alvaro shows loading credentials over serial. Past examples have also shown Bluetooth credentialing using MCUmgr. We’d love to hear more about how you’re creating your devices and how you want to program things as you move towards production. Please join us on our Discord or on our Forums to discuss more.

Looking for direct help getting your devices into production? Reach out at

Golioth is a flexible device management solution that allows you to manage devices that have IP addresses. This includes devices connecting over Ethernet, cellular, and Wi-Fi. Today we’re showcasing a demo of a Thread-based device. Utilizing the Golioth Console and API endpoints, it’s possible to manage a wide array of devices provisioned on the internet through a Thread Border Router. This enables even more low-power devices to reliably push data back to the internet for processing on the cloud.

In this post and the associated video, we show a demo of how developers can try out using Golioth to manage Thread based devices.

What is Thread?

Thread is a network protocol for low power IoT devices. It uses 6LoWPAN for mesh communications, in our case on 2.4GHz. In the examples above, we showcase Thread using OpenThread on an nRF52840 from Nordic Semiconductor. Multiple hardware vendors have Thread-based solutions, using Zephyr or other firmware solutions. Vit Prajzler shows how he is using the OpenThread documentation to build different types of devices that form a Thread mesh network, including a Radio Co-Processor (RCP) and a Full Thread Device (FTD).

Set up a Border Router

A Border Router is an element on a Thread Mesh network that allows the entire network to communicate with the broader internet. Each individual device (Node) has an IPv6 address, which allows Golioth to manage specific devices on a mesh.

While it’s possible to buy a few off-the-shelf Border Routers, we showcase building one using commonly available components. Vit is using a Linux computer (a Raspberry Pi), along with an nRF52840 dongle configured as an RCP to route network traffic from the mesh to the rest of the internet. In the case of his Thread network, Vit needs to include a translation layer (called NAT64) to talk from his IPv4-based router to the IPv6-based Thread network. In the future, when Vit gets an IPv6 address assigned to his home network from his ISP, no conversion will be required.

OpenThread software on a Linux device like a Raspberry Pi also has a web interface for managing connected devices. This is a useful way to visualize your network and the location of the Border Router within that network.

Using the Golioth interface in the Zephyr SDK

Once your Border Router has been set up, you can use the Golioth Zephyr SDK to compile an image for your device using OpenThread. In the video above, Vit is compiling for an nRF52840 (dongle). Once this device is loaded with Zephyr-based firmware and added to the mesh network–in this example, connecting directly to the Border Router–the device acts like any other IP connected device. It has an IPv6 address and can ping the address of the Border Router. It can also connect to various Golioth endpoints.
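
If the OpenThread shell is enabled in the build, you can sanity-check this from the node’s UART console using the standard OpenThread CLI commands; the address below is a placeholder for whatever your Border Router is actually assigned:

uart:~$ ot state
uart:~$ ot ipaddr
uart:~$ ot ping <border-router-ipv6-address>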

Vit demonstrates button presses and boot commands being logged in Zephyr, which are then displayed over the UART. However, Zephyr logs are also tied to the Golioth logging endpoint (over the Thread network connection), so they can be viewed on the Golioth web console. As Vit presses a button on the Zephyr-based device, the log message almost instantaneously shows up on the cloud.

Extending the examples

Towards the end of the video, Chris and Vit discuss how (Open)Thread-based devices built with Zephyr work very similarly to any other IP-based devices connecting to Golioth. This means that the other features Golioth provides–such as control using LightDB State, or easy time series data tracking using LightDB Stream–are already in there. And yes, that means Firmware Updates as well. We’ll be showcasing these more in future videos.

We’re excited to bring Golioth and internet connectivity to even more devices, including the kind of teeny tiny low power ones that Thread enables. Please stop by the Golioth Discord or the Golioth Forums to ask questions and discuss the next device you’re working on!

Interested in Thread in a commercial application? Contact us at

 

Implementing remote firmware updates is one of the most critical steps towards building a resilient IoT deployment. In short: it allows you to change your device firmware in the field without being physically present with the device. This can be critical for security fixes, for feature upgrades, and for extending the lifetime of your IoT implementation. Today we’re going to be looking at Over-The-Air (OTA) Device Firmware Update (DFU) for the Nordic Semiconductor nRF9160 cellular module (SiP). The associated video explanation and walkthrough (below) was published on our YouTube channel a couple of months ago, but we’re reviewing it on our blog today.

Why are IoT firmware updates difficult?

There are many difficult components to OTA DFU, at the system design level, at the device level, and on the hosting service. At the system level, the hard part of a firmware update is knowing which devices should receive the update. From the device perspective, it’s critical to be able to recover from any issues during the update and then verify that the complete update was received; using cellular as the method of connectivity means connectivity may be intermittent. With embedded devices, there are fewer ways to have a device intelligently recover than there would be on a full Linux system. On the cloud side, having a reliable hosting and delivery system for firmware packages is also important. Simply sticking files on an FTP server represents a security risk and puts an outsized burden on the logic of the device in the field.

Firmware updates are easier with Golioth

Golioth provides a data pipe to cellular devices and other devices in the Zephyr ecosystem as a one-stop shop for all data connectivity. This includes state data handling (digital twin), data streams coming back to the cloud, and logging. Once that data pipe is established, we can also push firmware updates down to devices utilizing transport layers like CoAP.

We also benefit from the fact that the Zephyr project utilizes MCUboot as the default bootloader. This open source bootloader (to which we are contributors!) is easy to interact with from within an application: it’s as simple as writing an updated firmware binary into flash and then calling an API to use that code upon reboot. Our recent blog posts around the ESP32 showcased how the newly-published port of MCUboot to the ESP32 platform enabled DFU on those parts. The Nordic Semiconductor nRF9160 has had this capability from the beginning because of its deep integration in the Zephyr ecosystem.
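
To make that concrete, here is a rough sketch of the Zephyr calls involved. The wrapper function names are mine, not a Golioth API (the Golioth SDK drives this same flow for you as it receives OTA chunks over CoAP), and the header paths gain a zephyr/ prefix on newer trees:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <zephyr.h>
#include <dfu/flash_img.h>   /* streams chunks into the MCUboot secondary slot */
#include <dfu/mcuboot.h>     /* boot_request_upgrade(), boot_write_img_confirmed() */
#include <sys/reboot.h>

static struct flash_img_context flash_ctx;

int dfu_start(void)
{
    /* Prepare streaming writes to the secondary image slot */
    return flash_img_init(&flash_ctx);
}

int dfu_write_chunk(const uint8_t *data, size_t len, bool last_chunk)
{
    /* Buffered write; pass true on the final chunk to flush remaining bytes */
    return flash_img_buffered_write(&flash_ctx, data, len, last_chunk);
}

void dfu_apply(void)
{
    /* Mark the new image for a test swap, then let MCUboot do the work on boot */
    if (boot_request_upgrade(BOOT_UPGRADE_TEST) == 0) {
        sys_reboot(SYS_REBOOT_WARM);
    }
}

/* Once the new firmware proves it can reconnect to the cloud, call
 * boot_write_img_confirmed() so MCUboot keeps it instead of reverting. */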

A key point shown in the video below is the signing of images for MCUboot. Having a signed firmware binary means there is a way to check that the entire package arrived. From a security perspective, signing allows the device to validate that the firmware image it received is authentic and unchanged in transit.

Managing firmware rollout

The Golioth Console enables our users to manage a wide variety of devices in their IoT fleet. A deployment with 10,000+ devices would be difficult to manage using scripts alone, and without the ability to visualize which devices have or have not been updated, there is a higher likelihood of costly mistakes.

To aid in partitioning device deployments, users can tag devices to create logical representations of different groupings. You might use a tag to identify some devices as early recipients of a test firmware image. Blueprints can be used to group devices that match a certain hardware profile. In this way, Golioth users can target key portions of their device fleet to ensure that only those specific devices are being updated at any given time.

Artifacts and Releases

Golioth implements two distinct features that allow for more complex firmware packages. The first is an Artifact. This is simply a binary that is uploaded to the Golioth cloud. For your security, Golioth has no knowledge of what is inside that image, only how it has been signed. As such, an artifact might be a firmware image, or a JPEG that will be displayed on a smart screen. It can also be a binary blob being delivered to a device modem, or even firmware that’s being pushed down the line to other microcontrollers in the system. Each of these artifacts can be versioned and assigned a Blueprint, so that they can only be deployed onto the eligible systems.

The second feature is the idea of a Release. Users can take individual Artifacts and bundle them together as a Release. For example, a Release might have an image to display on the screen, the base application firmware for the device we’re targeting (the nRF9160), and a firmware image for an auxiliary processor on board (e.g., an nRF52). Once the user is ready to deploy, they create a Release with matching tags and blueprints and an assigned Release version. Devices in the field that match the criteria set by the user on the console (tags, blueprints) will then be notified that a new device firmware release is available for them. The device in the field will start the firmware update process, downloading chunks of data.

Once the image is verified as complete, the device will utilize the MCUboot API and restart using that new image. In the event the Golioth user wants to roll back changes, they simply slide a switch on the Golioth Console and the device is alerted to download and restart the previous version of firmware. In this way, Golioth users have the ultimate flexibility to deliver firmware to a laser-focused set of devices in their fleet.

Watch the demo

See this all happening in real time, including compiling a new firmware image, uploading it to the Golioth Console and watching the OTA DFU happen on the nRF9160.

Scaling makes things more difficult

As your deployments grow, things start to get messy.

At 100, or maybe even just below 1,000 devices…you can do it however you’d like. You might have a spreadsheet listing the location of devices and all of the relevant information about them. But even then, it is a full-time job to manage those devices. What’s more, as new people come in to take action on particular devices, they need to slice and dice the data about where each device is located, its status, and the capabilities it has. According to Statista, in 2021 there were more than 13.8 billion active IoT devices, and it’s estimated that the number of active IoT devices will surpass 30 billion by the end of 2025.

A good organizational system will reduce mistakes

Planning and organizing will help you get your work done accurately while avoiding costly mistakes. Managing this huge number of devices requires a good organizational system. Golioth provides a simple solution to manage devices: tags.

Tags provide a flexible, flat structure in which you can group multiple devices. We are going to explore the usefulness of tagging in the Golioth platform.

Why use tags?

A tag is a short piece of text, generally one to three words, that provides details about a device. Tags are assigned to individual devices on the Golioth platform. Later, locating similar devices is as simple as clicking on or searching for a shared tag.

You can easily separate your devices with any criteria you choose. For example, a separation between development devices and production devices. This can be useful when you want to change configurations for a specific set of devices. By using tags, you can group these devices and push the needed configuration using something like the OTA DFU on the nRF91.

You can also group different devices depending on your deployments or regions. Maybe you’re building an asset tracker application and you want to separate your devices that are on the east coast of the US vs the west coast. Or maybe you want to only update devices that are in a particular time zone. Or even separate when you push an update based on timezone, preferring to push firmware when no one is actively using the device. With tags, it’s possible to target firmware update deployment where and when you want it.

Flat structure

Right now, Golioth tags are a flat structure: you can assign multiple tags to a device, but there is no concept of hierarchy. Even with this flat structure, it’s possible to do some powerful things like we just described. For example, using tags with LightDB allows us to pull data from or send data to specifically tagged devices. If you want to toggle an LED on a group of devices, you can send the new state to that group by sending the command to the associated tag, as shown in the video below. This is practical for test purposes. You can manipulate a device’s tags graphically on the Console, or you can use the Golioth Command Line Interface (CLI) to remove, add, or modify certain tags. The CLI in particular can be really useful as your deployment grows much bigger (hundreds, thousands, millions of devices).

When you’re ready to retrieve data from groups of devices, you can also query data and even do aggregations using tags. For example, if you want to see the average temperature of devices located on the 3rd floor in a factory, you could filter the logs for “factory”, “3rd floor”, “temperature” and then aggregate the data. Extending this idea, pulling data from tagged devices in a region could even lead to a sort of ad-hoc mapping.

Watch the demo

 

On the scale of “none at all” to “X.509 certificates“, there is a wide range of security implementations possible. In this article, we discuss how a Pre-Shared Key (PSK) scheme is “secure enough” to get your prototype off the ground and why your first prototype on Golioth will be “secure by default”.

This article isn’t about production-level security

Getting devices connected to the internet is non-trivial. Security protocols and methods exist to keep unwanted devices off of networks and to put processes in place that require devices to verify they are allowed onto a network. When bootstrapping a device to connect to the internet, many people choose plaintext connections instead of taking on the time and headache of utilizing good security practices.

At the other end of the spectrum, we have strict security practices and the engineers charged with maintaining networks. Devices that seek to join those networks must implement the layers of encryption required to prove they are allowed onto the network. Security professionals stake their careers on solid methods and highly complex processes.

We are taking a middle ground.

Enter: the password

First, let’s define what a Pre-Shared Key (PSK) system looks like. With PSK, your device and your backend will share the same secret. If you ever used passwords, you are likely familiar with the concept. You can think of PSK as passwords not for human users, but for devices. Imagine you are a user of a service, to which you log in with a password and a username. When you signed up for your account, you told the service (pre-shared) the username and password you want to use for future log-ins. PSK is essentially the same if you substitute a user for a device. Each device is assigned (pre-shared) a key (“password”) and a PSK-ID (“username”) by the backend, which it then can use to authenticate itself to the backend.

What about the security of PSK usage at scale? To draw a security analogy, let’s start by considering how WiFi PSK works, and how much you trust it. In a PSK WiFi network, you configure a “password” on the WiFi access point, and then give the same password to anyone you want to let connect to that network. Most home WiFi networks use PSK authentication, even though you might not call it PSK, but a “WiFi password”. Many corporate visitor networks work the same way. TLS PSK in IoT is more robust. For authorization purposes, giving every device the same password (“PSK key”) wouldn’t help much: it would merely show that a device is authorized to use the service, but it would not prove anything about its identity. That’s why when using TLS PSK in IoT, you shouldn’t give the same password to multiple devices. Every device should have its own “password” (and “username”, a PSK identity).

By assigning a unique PSK to each device, compromising the password of one device will not compromise other devices. In this aspect, TLS PSK is more secure than your average WiFi connection. In many smart home deployments, the same WiFi password is shared by all devices, mostly because of its convenience. However, that also means that if any of those devices get compromised, your WiFi network is no longer secure. Anyone with that WiFi password can connect new devices to it.

PSK is good enough for prototypes

PSK has its use case, and that use case is motivated by convenience. You likely don’t want to be bothered with procuring a Hardware Security Module (HSM) and setting up a secure process to generate and distribute certificates. You likely need to get some level of security as quickly as possible. That’s the “good enough”. Certificates are going to give you great security, but they are going to make everything way more complex. This is where PSK is a great candidate.

Adding a layer: PSK key rotation

To make it more difficult for a potential attacker to compromise your device, it is worth changing (rotating) your device’s keys once in a while. The process doesn’t have to be complicated, but it could be if you don’t design for it.

Start by building your device with two sets of keys. One main key (that is actively being used), and one backup key (only used under special conditions). That way, if you need to update your key, you can keep your existing main key in place and only update the backup key. After the update, if you have experienced a successful authentication for that device, all you need to do is swap the roles of the keys, and you’re good to go. On an unsuccessful start, just keep things as they are and try the key update process again.
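
Here is a minimal sketch of that decision logic in C, under the assumption of made-up helper names: connect_with_key() and save_keys() stand in for your real transport and storage code, and nothing here is a Golioth API:

#include <stdbool.h>
#include <string.h>

#define KEY_LEN 64

struct psk_pair {
    char main_key[KEY_LEN];    /* credential used for normal connections */
    char backup_key[KEY_LEN];  /* slot that receives the new key during rotation */
};

/* Replace these stubs with your real DTLS connect and flash/settings code. */
static bool connect_with_key(const char *psk) { (void)psk; return false; }
static void save_keys(const struct psk_pair *keys) { (void)keys; }

/* Try the main key first; if it fails but the freshly provisioned backup
 * works, swap the roles so the next rotation only rewrites the spare slot. */
static bool connect_and_rotate(struct psk_pair *keys)
{
    if (connect_with_key(keys->main_key)) {
        return true;                 /* nothing to change */
    }

    if (connect_with_key(keys->backup_key)) {
        char tmp[KEY_LEN];

        memcpy(tmp, keys->main_key, KEY_LEN);
        memcpy(keys->main_key, keys->backup_key, KEY_LEN);
        memcpy(keys->backup_key, tmp, KEY_LEN);
        save_keys(keys);
        return true;
    }

    return false;                    /* keep both keys and retry the update later */
}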

When you head towards production

So if PSK is so great, why don’t we just drop certificates and use PSK instead? From the single-device perspective, it might seem like a good idea. But once you start thinking about hundreds, thousands, or millions of devices, things become more complicated and more expensive.

If you are preparing to manufacture millions of devices securely, you are likely considering many aspects of not only technical / hardware / software security, but also physical security of your operations. Who can build authentic devices for me? Who do I trust to build authentic devices?

With PSK, the naive approach could be to just set up a manufacturing station that generates random keys and, upon producing a device with a specific key, also registers that key “in the cloud”. This makes your production depend on internet availability, and on the availability of the servers that register the keys. An outage of either will bring your production to a halt.

You might work around this problem by having a local database of keys on the manufacturing station, but that leaves you with a synchronization challenge, and you’d have to rely on the manufacturing station’s local storage to persist the information. Going down this path, it’s likely you’ll find more new problems than solutions.

Ultimately, PSK is not a great solution for manufacturing at scale. It might work well for small batches in your PoC, but make sure to consider how it will impact your product at scale.

“Secure by default” with Golioth

At Golioth, we want to make sure your devices can seamlessly go through all phases of an IoT project – from a concept, through a prototype, to production at scale. We want to set you and your project up for success from day one.

That’s why we decided to make TLS PSK authentication required for all devices connecting to Golioth. Using PSK during development requires very little overhead, and meets convenience requirements for fast iteration, fast onboarding, and simple debugging. Your prototype will have security that is based on battle-proven Internet standards and meets compliance and certification requirements of most IoT use-cases. You can assign multiple keys to any one device to try out PSK key rotation. Your team and end customers won’t need to worry about the prototype being safe. It will also pave the way for easy migration to more advanced security once you are ready to produce devices at scale.

Check out the video below to see PSK assignment in action and to hear more about PSK use at Golioth.

 

Troubleshooting high-complexity systems like Zephyr requires more capable tools. Menuconfig allows users to see the layers of their system and adjust settings without requiring a complete system recompilation.

The troubleshoot loop

Modify, compile, test.

Modify, compile, test.

Modify, compile, test.

Modify, compile, test.

How do we break out of this loop of trying to change different settings in a program, recompiling the entire thing, and then waiting for a build to finish? Sure, there are some tools to modify things if you’re step debugging, such as changing parameters in memory. But you can’t go and allocate new memory after compiling normally. So what happens when you need to change things? You find the #define in the code, change the parameter, and recompile. What a slow process!

Moving up the complexity stack

We move up the “complexity stack” from a bare-metal device to running a Real Time Operating System (RTOS) in order to get access to higher level functions. Not only does this allow us to abstract things like network interfaces and target different types of hardware, but it also allows us to add layers of software that would be untenable when running bare-metal firmware. The downside, of course, is that it’s more complex.

When you’re trying to figure out what is going wrong in a complex system like Zephyr, it can mean chasing problems through many layers of functions and threads. It’s hard to keep track of where things are and what is “in charge” when it comes time to change things.

Enter Menuconfig

Menuconfig is a tool borrowed from Linux development that works in a similar way: a high complexity system that needs some level of organization. Obviously, in full Linux systems, the complexity often will be even higher than in an RTOS. In the video below, Marcin shows how he uses Menuconfig to turn features on and off during debugging, including with the Golioth “hello” example. As recommended in the video, new Zephyr users can also utilize Menuconfig to explore the system and which characteristics are enabled and available.
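
If you want to explore it yourself, Zephyr exposes menuconfig as a build target, so from an already-configured build directory it is a single command:

west build -t menuconfig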