Scaling makes things more difficult

As your deployments grow, things start to get messy.

At 100, or maybe even just below 1000 devices…you can do it however you’d like. You might have a spreadsheet listing the location of devices and all of the relevant information about them. But even then, it is a full-time job to manage devices. What’s more, as new people come in to take action on particular devices, they need to slice and dice the data about where the device is located, its status, the capabilities that the device has.  According to Satistia in 2021, there are more than 13.8 billion active IoT devices. It’s estimated that the number of active IoT devices will surpass 30 billion by the end of 2025.

A good organizational system will reduce mistakes

Planning and organizing will help you get your work done accurately while avoiding costly mistakes. Managing this huge number of devices requires a good organizational system. Golioth provides a simple solution to manage devices: tags.

Tags provide a flexible, flat hierarchical structure in which you can group multiple devices. We are going to explore the usefulness of tagging in the Golioth platform.

Why use tags?

A tag is a simple text, generally, one to three words, that provides details about a device. The tag is assigned to a unique device on the Golioth network. Later, when searching, it is a simple click or search term in order to locate similar devices with the same tag.

You can easily separate your devices with any criteria you choose. For example, a separation between development devices and production devices. This can be useful when you want to change configurations for a specific set of devices. By using tags, you can group these devices and push the needed configuration using something like the OTA DFU on the nRF91.

You can also group different devices depending on your deployments or regions. Maybe you’re building an asset tracker application and you want to separate your devices that are on the east coast of the US vs the west coast. Or maybe you want to only update devices that are in a particular time zone. Or even separate when you push an update based on timezone, preferring to push firmware when no one is actively using the device. With tags, it’s possible to target firmware update deployment where and when you want it.

Flat structure

Right now, Golioth tags are a flat structure. You can assign multiple tags to a device without the concept of hierarchy. Even with this flat structure, it’s possible to do some powerful things like we just described. For example, using tags with LightDB allows us to pull or send data to specifically tagged devices. If you want to toggle an LED on a group of devices, you can send the new state to that group by sending the command to the associated tag, as shown in the video below. This is practical for test purposes. You can manipulate a device’s tags graphically on the console, or you can use the Golioth Command Lines Interface (CLI) to remove, add, or modify certain tags. The CLI in particular can be really useful as you start to have a deployment that’s much bigger (hundreds, thousands, millions).

When you’re ready to retrieve data from groups of devices, you can also query data and even do aggregations using tags. For example, if you want to see the average temperature of devices located on the 3rd floor in a factory, you could filter the logs for “factory”, “3rd floor”, “temperature” and then aggregate the data. Extending this idea, pulling data from tagged devices in a region could even lead to a sort of ad-hoc mapping.

Watch the demo

 

On the scale of “none at all” to “X.509 certificates“, there is a wide range of security implementations possible. In this article, we discuss how a Pre-Shared Key (PSK) scheme is “secure enough” to get your prototype off the ground and why your first prototype on Golioth will be “secure by default”.

This article isn’t about production-level security

Getting devices connected to the internet is non-trivial. Security protocols and methods exist to keep unwanted devices off of networks, and put processes in place that require devices to verify that they are allowed onto a network. In bootstrapping a device to securely connect to the internet, many people choose plaintext connections instead of taking the time and headache of utilizing good security practices.

At the other end of the spectrum, we have strict security practices and the engineers charged with maintaining networks. Devices that seek to join those networks must implement the layers of encryption required to certify that a device is on a network. Security professionals stake their careers on solid methods and highly complex processes.

We are taking a middle ground.

Enter: the password

First, let’s define what a Pre-Shared Key (PSK) system looks like. With PSK, your device and your backend will share the same secret. If you ever used passwords, you are likely familiar with the concept. You can think of PSK as passwords not for human users, but for devices. Imagine you are a user of a service, to which you log in with a password and a username. When you signed up for your account, you told the service (pre-shared) the username and password you want to use for future log-ins. PSK is essentially the same if you substitute a user for a device. Each device is assigned (pre-shared) a key (“password”) and a PSK-ID (“username”) by the backend, which it then can use to authenticate itself to the backend.

What about the security of PSK usage at scale? To draw a security analogy, let’s start with considering how WiFi PSK works, and how much you trust it. In a PSK WiFi network, you configure a “password” on the WiFi access point, and then give the same password to anyone who you want to let connect to that network. Most home WiFi networks use PSK authentication, even though you might not call it PSK, but a “WiFi password”. Many corporate visitor networks work the same. TLS PSK in IoT is more robust. For authorization purposes, giving the same password (“PSK key”) wouldn’t help much. It would merely tell the device is authorized to use the service, but it would not prove anything about its identity. That’s why when using TLS PSK in IoT, you shouldn’t give the same password to multiple devices. Every device should have its own “password” (and “username” – a PSK identity).

By assigning a unique PSK to each device, compromising the password of one device will not compromise other devices. In this aspect, TLS PSK is more secure than your average WiFi connection. In many smart home deployments, the same WiFi password is shared by all devices, mostly because of its convenience. However, that also means that if any of those devices get compromised, your WiFi network is no longer secure. Anyone with that WiFi password can connect new devices to it.

PSK is good enough for prototypes

PSK has its use case, and it’s convenience motivated. You likely don’t want to be bothered with procuring a Hardware Security Module (HSM) and setting up a secure process to generate and distribute certificates. You likely need to get some level of security as quickly as possible. That’s the “good enough”. Certificates are going to give you great security, but are going to make everything way more complex. This is where PSK is a great candidate.

Adding a layer: PSK key rotation

To make it more difficult for a potential attacker to compromise your device, it is worth changing (rotating) your device’s keys once in a while. The process doesn’t have to be complicated, but it could be if you don’t design for it.

Start by building your device with two sets of keys. One main key (that is actively being used), and one backup key (only used under special conditions). That way, if you need to update your key, you can keep your existing main key in place and only update the backup key. After the update, if you have experienced a successful authentication for that device, all you need to do is swap the roles of the keys, and you’re good to go. On an unsuccessful start, just keep things as they are and try the key update process again.

When you head towards production

So if PSK is so great, why don’t we just drop certificates and use PSK instead? From the single-device perspective, it might seem like a good idea. But once you start thinking about hundreds, thousands, or millions of devices, things become more complicated and more expensive.

If you are preparing to manufacture millions of devices securely, you are likely considering many aspects of not only technical / hardware / software security, but also physical security of your operations. Who can build authentic devices for me? Who do I trust to build authentic devices?

With PSK, the naive approach could be just setting up a manufacturing station that generates random keys and upon producing a device with a specific key, it also registers the key “in the cloud”. This will make your production depend on internet availability, and availability of the servers that register the keys. Outage of any of these will bring your production to halt.

You might work around this problem by having a local database of keys on the manufacturing station, but that leaves you with a synchronization challenge, and you’d have to rely on a local storage of the manufacturing station to persist the information. Going down this path, it’s likely you’ll find more new problems than solutions.

Ultimately, PSK is not a great solution for manufacturing at scale. It might work well for small batches in your PoC, but make sure to consider how it will impact your product at scale.

“Secure by default” with Golioth

At Golioth, we want to make sure your devices can seamlessly go through all phases of an IoT project – from a concept, through a prototype, to production at scale. We want to set you and your project up for success from day one.

That’s why we decided to make TLS PSK authentication required for all devices connecting to Golioth. Using PSK during development requires very little overhead, and meets convenience requirements for fast iteration, fast onboarding, and simple debugging. Your prototype will have security that is based on battle-proven Internet standards and meets compliance and certification requirements of most IoT use-cases. You can assign multiple keys to any one device to try out PSK key rotation. Your team and end customers won’t need to worry about the prototype being safe. It will also pave the way for easy migration to more advanced security once you are ready to produce devices at scale.

Check out the video below to see PSK assignment in action and to hear more about PSK use at Golioth.

 

Troubleshooting high complexity systems like Zephyr requires more thorough tools. Menuconfig allows users to see the layers of their system and adjust settings without requiring a complete system recompilation.

The troubleshoot loop

Modify, compile, test.

Modify, compile, test.

Modify, compile, test.

Modify, compile, test.

How do we break out of this loop of trying to change different settings in a program, recompiling the entire thing, and then waiting for a build to finish? Sure, there are some tools to modify things if you’re step debugging, such as changing parameters in memory. But you can’t go and allocate new memory after compiling normally. So what happens when you need to change things? You find the #define in the code, change the parameter, and recompile. What a slow process!

Moving up the complexity stack

We move up the “complexity stack” from a bare-metal device to running a Real Time Operating System (RTOS) in order to get access to higher level functions. Not only does this allow us to abstract things like network interfaces and target different types of hardware, but it also allows us to add layers of software that would be untenable when running bare-metal firmware. The downside, of course, is that it’s more complex.

When you’re trying to figure out what is going wrong in a complex system like Zephyr, it can mean chasing problems through many layers of functions and threads. It’s hard to keep track of where things are and what is “in charge” when it comes time to change things.

Enter Menuconfig

Menuconfig is a tool borrowed from Linux development that works in a similar way: a high complexity system that needs some level of organization. Obviously, in full Linux systems, the complexity often will be even higher than in an RTOS. In the video below, Marcin shows how he uses Menuconfig to turn features on and off during debugging, including with the Golioth “hello” example. As recommended in the video, new Zephyr users can also utilize Menuconfig to explore the system and which characteristics are enabled and available.

 

 

Every IoT project needs to provision devices that are going to be available in the field. Leveraging open standards, Golioth cuts down on the required time and hassle for IoT development teams.

Provisioning is a critical step in IoT projects when they go to production. Unfortunately, this process remains a mystery for many engineers due to lack of information about the process. At a high level, provisioning is passing configurations and credentials to an IoT device so it can connect securely to the cloud. Once provisioned, the device can send telemetry, receive commands, or be updated (by OTA DFU) when it’s out in the field. How you provision a device depends a lot on the use case. 

(click the image above to see the full diagram)

Example use cases

First, let’s examine a customer-facing product like a smart light bulb. In this scenario, the first step would be for the user to provide WiFi credentials to connect to the user’s home network. On the platform side, the device would obtain a new set of credentials to connect to the backend services. These credentials would be specific to that particular user and device. Later, the user might decide to clean up the device to sell it, so the ability to remove device configurations and deleting a given set of credentials is important. This is a perfect example for using BLE provisioning like shown in the video below.  The user experience is seamless with any existing mobile app used for controlling the bulb and reporting data back from the end device.

Next, we’ll consider factory-level provisioning. An example device like a cellular asset tracker would be pre-provisioned at the factory before being used by your customer. Later the user will only associate that device with their account, but the credentials to talk to the cloud are already set on the device. This can be done as part of the manufacturing process, probing the device via Serial/UART to get the device hardware ID, provisioning it to the cloud, and sending credentials back to the device via the same transport. We can even have different firmware that will only provision in the factory. The device accepts the initial device configuration and saves the credentials to flash. Subsequent firmware that doesn’t have that initial feature enabled, making sure external parties can’t change or reverse engineer the initial configuration.

There are myriad ways that provisioning can be done. Each instance will depend on the factory environment, the capabilities of the user, and on the end application. The video below is a setup similar to the first example explained above, using a Bluetooth application to read and then program the end device, all while working with the Golioth cloud.

Our demo application

As you can see in the video, we developed an end-to-end sample that shows a practical scenario of provisioning IoT devices with a native mobile app, talking with an IoT device over Bluetooth, and provisioning device/credentials in Golioth Cloud. We leverage different tools for doing so:

  • MCUmgr as the device management subsystem and protocol.
  • Zephyr as the real-time operating system, that implements MCUmgr.
  • Open-source mobile SDK to integrate MCUmgr on an app
  • Golioth’s API and the Device/Credentials Management capabilities. 

The MCUmgr community developed multiple types of transports to interact with devices, a benefit of MCUmgr being an open standard and having a vibrant community. One option is to communicate with the device over serial UART using the `mcumgr` cli or even integrate that into your own set of provisioning tools. Another option is to use a mobile SDK that implements MCUmgr protocols over BLE to talk with devices.

We took the Bluetooth approach and forked Nordic’s MCmgr Example application, adding communication with Golioth APIs to manage devices. Once we discover the name of the device, we assign credentials via the REST API and securely send them over Bluetooth to the end device. The device is running one of Golioth’s samples that accepts dynamic configuration for WiFi and DTLS Pre Shared Keys to talk securely with our cloud. The device uses a different Golioth service called LightDB. Using this configuration engine, we can publish the on/off state of the light bulb using LightDB,show that data on a UI, and even send commands to change the state on the device. 

Source code for the mobile app:

More details on how to use our REST API and how to generate API Keys can be checked on our docs website.

References

A possible solution

Let’s pretend you’re in the middle of a global chip shortage.

Surprise! There’s no need to pretend, as we are all currently in the middle of a global chip shortage. Right now it’s very difficult to source certain components.

“Why don’t hardware makers just switch out the components when they can’t source them during the chip shortage? In fact, why don’t people switch chips on a regular basis?”

As a generalization, switching costs for embedded devices are very high. If we were able to magically solve all of the switching costs for the hardware, you’d still need to deal with the switching costs of firmware and cloud platforms. This often is even more dire than than the hardware switching costs. It’s significant at both an individual level (rewriting firmware to target different architectures and board setups) and at an institutional level (maintaining different platforms and interoperability).

Operating systems and Real Time Operating Systems (RTOS) help by abstracting away a lot of the individual hardware details. When a new device is added to an RTOS, it needs to fit within the constraints of the system. If your board has an i2c sensor on it, you need to ensure your supporting firmware for that board or chipset capable of working with the elements of the RTOS. Then you can take advantage of the drivers already written for other boards/chipsets on the platform. Assuming you are willing to work within that system, you can start to supercharge your development. It’s possible to switch out components quickly and confidently, helping to alleviate the woes of the chip shortage currently underway.

Making the switch

Let’s say you have a board using an ESP32 module. Due to sourcing problems, you can no longer source a particular LED on your board, but you don’t want to change your PCB. Instead, you ask a technician wire in an extra LED to a spare pin you have on your production board that has a larger landing area. You need to build a firmware image to drive a different pin on the microcontroller than you previously were using. Now “LED2” (as it’s called in your program) is not driving pin 18, but is instead driving pin 22. With Zephyr overlays, the switch will take 5 minutes. As Marcin shows in the video below, the device tree overlay is where we map the signals internal to the firmware to the physical pins being used.

Now let’s say you cannot source the ESP32 at all, for some reason. You could create a new overlay file for a different target that works with Zephyr, assign the pins to target the functions you need on a new PCB containing a different chip, and then target that device. The time consuming aspect would be checking all of the functions are performing the same as your previous platform. But once you have decided on a new platform, assigning pins and functions to your new device would occur through overlay files.

How to use Zephyr Overlays

In the video below, we  walk through the location and function of overlays in Zephyr. Marcin explains that customization of firmware images for particular hardware targets can be as simple as a different flag on the command line. In this particular example, we are showing how to change the pins for the ESP32 demo. Previously the ESP32 overlay was shown as part of our our LightDB Sample code (docs), which targeted an ESP32-DevKitC in that video.

About The Zephyr Project

The Zephyr Project is a popular and open source Real Time Operating System (RTOS) that enables complex features and easy connectivity to embedded devices.  The project is focused on vendor participation, long-term support, and in-depth security development life cycles for products.

About Golioth

Golioth helps users to speed up development and increase the chances that pilots will be put into production with a commercial IoT development platform built for scale. We offer standardized interfaces for connecting embedded devices to the cloud and build out software ecosystems that allow your projects to get to market faster. Golioth uses Zephyr as part of the Golioth SDK to bootstrap application examples and show how to utilize the range of networking features Golioth enables via APIs.

“I’m sorry boss, I am working as fast as I can here. I reprogrammed about 36 out of the total 50 units, but this is slow going. I only have one programming cable and I need to disassemble the deployed units so I can get to the header on the boards first.”

A bad firmware image on your deployed IoT devices can mean ruined weekends, upset customers, and lost time. Many businesses pursue a network based firmware update so that they can push new versions to their devices. In fact, this is a critical part of the firmware development process, often a very early one. Developing or implementing a bootloader allows engineers to ship new control software to their devices. A straw poll on Twitter showed that some engineers spend a significant amount of time putting this tooling in place.

While the “barely any time” group seems large, it also includes those who aren’t doing a custom bootloader, nor a bootloader that is networked:

In the past, networked firmware updates took a significant amount of planning and coordination between hardware, firmware, software, and web teams. Golioth has collapsed this down to a simple process.

Update all the devices in your fleet with the click of a button

Golioth Device Firmware Update (DFU) is possible because the Golioth SDK is built on top of the Zephyr Project. Part of that implementation includes MCUboot, an open source bootloader. Using open source software up and down the stack, Golioth enables quick, secure deployment of firmware packages to IoT devices throughout the world. The Golioth Console enables easy management of firmware releases, including multi-part binary bundles, enabling updates for devices as diverse as smart speakers, digital signage, machine learning enabled sensor systems, multiple processor embedded devices, and more.

In the video linked below, Lead Engineer Alvaro Viebrantz demonstrates with Chris Gammell how to update the firmware of an nRF52 based device over Ethernet. The video includes code snippets in Zephyr and walking through the build process using the command line tool West. Once the firmware image is built, Alvaro showcases how to push the image to the Golioth cloud, package it for delivery, and then deploying to Golioth enabled devices.

No more fussing with programming cables out in the field, Golioth allows engineers to update their devices with new features, requested fixes, and efficiency improvements. Try it out today!

About Golioth

Golioth is a cloud company dedicated to making it easier to prototype, deploy, and manage IoT systems. Learn more by joining the Golioth Beta program and reading through Golioth Documentation.