Taking Your Hardware to Production with Zephyr

Manufacturing is a marathon, not a sprint. Zephyr RTOS includes numerous features to help you at every step along the way, from the initial prototype to maintaining your hardware fleet in the field. Golioth’s Developer Relations Lead, Chris Gammell, spoke on this topic at the 2024 Embedded Open Source Summit.

Chris’ approach boils down to breaking manufacturing into five distinct phases:


  1. Early prototype
  2. Custom hardware
  3. First device in production
  4. Scaling production
  5. Maintaining a scaled fleet

The challenges of each phase exist whether or not you’re using Zephyr, but Zephyr has tools that smooth out many of the wrinkles. Let’s walk through each phase to see what is involved. The full set of talk slides is available at the bottom of this post.

Early Prototyping on Dev Boards

Whenever possible, Chris starts prototyping on commercially available development boards, so the hardware begins in a known working state. Even if you haven’t finalized all of your hardware choices, Zephyr’s portability makes it relatively easy to switch to a different part without scrapping your early work.

Zephyr also offers a number of tools for early tinkering. The menuconfig interface is excellent for exploring the configuration options available for the peripherals you have chosen, and the Zephyr shell is fantastic for validating new parts. For instance, the sensor and i2c shells let you interact with your sensors live, before you get down to the business of writing C code. Read about how Golioth community member Timon switched over to Zephyr for prototyping.

Custom HW, First Pilot

Your first pilot will likely be the first time you stand up custom hardware to confirm that the system design works. Take time here to validate every part in the design and make sure you have the feature coverage you need. Confirm that each part on the board is actually needed, and ditch the ones that aren’t.

This is also a great time to plan how you will test and provision each device. What test points do you need? Make sure the programming header and test points are routed and placed so they can be reached quickly during manufacturing.

Zephyr’s debugging features come into play here. Consider the best setup for Zephyr’s logging system, whether that’s simply turning it on and off or switching backends, such as the Golioth logging backend that sends logs to the cloud. Try thread-aware debugging tools like Ozone and SystemView before you need them. You’ll gain a ton of insight into how your system is performing before a showstopper forces you to!
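To make the idea of switchable backends concrete, here is a minimal host-side sketch in portable C. It is not Zephyr’s logging API (Zephyr defines its own backend interface); the log_sink struct and the helper names are hypothetical. It shows one message fanning out to every enabled sink, with a cloud-style sink that can be switched on at runtime without touching any call sites.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink: a write function plus a runtime on/off flag. */
struct log_sink {
    const char *name;
    bool enabled;
    void (*write)(struct log_sink *sink, const char *msg);
    char last[64]; /* last message received, kept so the sketch is testable */
};

/* Console-style sink: prints the message and remembers it. */
static void console_write(struct log_sink *sink, const char *msg)
{
    printf("[%s] %s\n", sink->name, msg);
    snprintf(sink->last, sizeof(sink->last), "%s", msg);
}

/* Stand-in for a cloud sink; a real one would queue the message for upload. */
static void cloud_write(struct log_sink *sink, const char *msg)
{
    snprintf(sink->last, sizeof(sink->last), "%s", msg);
}

static struct log_sink sinks[] = {
    { "console", true, console_write, "" },
    { "cloud", false, cloud_write, "" },
};

#define NUM_SINKS (sizeof(sinks) / sizeof(sinks[0]))

/* Send one message to every enabled sink; returns how many received it. */
static size_t log_dispatch(const char *msg)
{
    size_t delivered = 0;
    for (size_t i = 0; i < NUM_SINKS; i++) {
        if (sinks[i].enabled) {
            sinks[i].write(&sinks[i], msg);
            delivered++;
        }
    }
    return delivered;
}

/* Enable or disable a sink by name; returns false if no such sink exists. */
static bool log_sink_set_enabled(const char *name, bool enabled)
{
    for (size_t i = 0; i < NUM_SINKS; i++) {
        if (strcmp(sinks[i].name, name) == 0) {
            sinks[i].enabled = enabled;
            return true;
        }
    }
    return false;
}
```

With only the console sink enabled, log_dispatch("boot") reaches one sink; after log_sink_set_enabled("cloud", true), the next message reaches both. Keeping the call sites ignorant of which sinks are active is what makes swapping backends cheap.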

First Devices in Production

Pick a number of devices for your first manufacturing run, maybe 100. This run gives you your first glimpse of the issues that will surface as you scale production.

At this point, Chris reaches for Zephyr’s board definitions and their support for board revisions. When peculiar behavior shows up, being able to compile the same code for two different board revisions (by appending the revision to the board name at build time, as in board@revision) helps you discover whether the problem has always been there or just arrived at the party.

Figure: commands to build firmware for different revisions of a board

Now is the time to set up hardware-in-the-loop testing. You need to move fast, and manual testing is the opposite of that. It’s not too late to adapt the hardware for automated testing, and you’ll thank yourself later. Once you have a programming and serial interface to the boards, Zephyr’s Twister test runner and its pytest integration can run on every pull request and merge, catching problems early and running cycle tests far more often than you would by hand.

Finally, don’t forget to plan how you will perform firmware updates. Sure, you can plug USB cables into the 100 units in front of you, but that gets old fast when you ship two patch releases in the same week. Don’t wait until you scale; set up OTA updates now so you can start testing automatic updates. With Golioth, you can do OTA from day one!
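One small piece of OTA logic worth exercising on those first units is the check that an offered image is actually newer than the one running. The sketch below is a hypothetical, portable C illustration, not Golioth’s or any bootloader’s API, and your OTA stack may apply its own versioning rules. It compares major.minor.patch numerically, which matters because a plain string comparison would rank "1.9.0" above "1.10.0".

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical semantic version triple (major.minor.patch). */
struct fw_version {
    unsigned major, minor, patch;
};

/* Parse "1.2.3"; returns false on malformed input or trailing characters. */
static bool fw_version_parse(const char *s, struct fw_version *out)
{
    int consumed = 0;
    if (sscanf(s, "%u.%u.%u%n", &out->major, &out->minor, &out->patch,
               &consumed) != 3) {
        return false;
    }
    return s[consumed] == '\0';
}

/* Returns <0 if a is older than b, 0 if equal, >0 if a is newer. */
static int fw_version_cmp(const struct fw_version *a, const struct fw_version *b)
{
    if (a->major != b->major) return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor) return a->minor < b->minor ? -1 : 1;
    if (a->patch != b->patch) return a->patch < b->patch ? -1 : 1;
    return 0;
}

/* Accept an offered image only if both versions parse and it is strictly newer. */
static bool should_apply_update(const char *running, const char *offered)
{
    struct fw_version cur, next;
    if (!fw_version_parse(running, &cur) || !fw_version_parse(offered, &next)) {
        return false;
    }
    return fw_version_cmp(&next, &cur) > 0;
}
```

Refusing equal or older versions keeps a device from reinstalling or downgrading to an image it has already moved past, and rejecting unparseable strings means a typo in a release tag fails safe instead of triggering an update.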

Scaling Production

This is it: time to turn the process up to 11 and start churning out boards. Decisions made now have a huge impact on your bottom line, so settle whether you really need the top chip in the family or can step down a notch or two.

Consider how your choices affect cost after production. Power budget, for instance, is often a major consideration, and Zephyr includes a Power Management subsystem that battery-operated devices should take advantage of.
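As a rough illustration of how a power management policy picks a sleep state, the sketch below chooses the deepest low-power state whose minimum residency plus exit latency fits before the next scheduled wake-up. Zephyr’s power states carry similar min-residency-us and exit-latency-us devicetree properties, but the state names, numbers, and function here are hypothetical, and Zephyr’s real policy weighs more inputs than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-power states, ordered from shallowest to deepest.
 * min_residency_us: time the CPU must stay in the state for it to pay off.
 * exit_latency_us:  time needed to wake back up. */
struct power_state {
    const char *name;
    uint32_t min_residency_us;
    uint32_t exit_latency_us;
};

static const struct power_state states[] = {
    { "idle", 0, 0 },
    { "suspend", 2000, 100 },
    { "deep-sleep", 50000, 3000 },
};

#define NUM_STATES (sizeof(states) / sizeof(states[0]))

/* Pick the deepest state that still lets the device wake in time for the
 * next event expected idle_us microseconds from now. */
static const struct power_state *select_power_state(uint32_t idle_us)
{
    const struct power_state *best = &states[0];
    for (size_t i = 0; i < NUM_STATES; i++) {
        uint64_t needed = (uint64_t)states[i].min_residency_us +
                          states[i].exit_latency_us;
        if (needed <= idle_us) {
            best = &states[i];
        }
    }
    return best;
}
```

With these made-up numbers, a 500 µs gap only allows idle, a 10 ms gap allows suspend, and a 100 ms gap allows deep sleep. Stretching the interval between wake-ups (for example, batching sensor reads) is what lets a device reach the deeper, cheaper states.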

Network bandwidth ties more directly to monetary cost: optimizing your data usage lowers cellular and data bills, and even small savings scale across a fleet. Consider configuring log levels to disable debug and info messages during normal operation; the Golioth settings service or an RPC can change this remotely. The same goes for what data is streamed back to the servers and how often. We also recommend implementing a reboot RPC as a simple remote version of the “have you tried turning it off and on again?” adage.
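Here is a minimal portable C sketch of the remote log-level idea: a hypothetical handler that a settings or RPC callback could call with a level name, plus a gate that drops messages above the current threshold. The level numbers follow Zephyr’s ordering (none, error, warning, info, debug). On a real device you would more likely drive Zephyr’s own runtime filtering (enabled with CONFIG_LOG_RUNTIME_FILTERING) than this hand-rolled gate.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Same ordering as Zephyr's log levels: 0 none, 1 err, 2 wrn, 3 inf, 4 dbg. */
enum { LVL_NONE = 0, LVL_ERR = 1, LVL_WRN = 2, LVL_INF = 3, LVL_DBG = 4 };

static int runtime_level = LVL_DBG; /* bench default: everything on */

/* Hypothetical settings/RPC handler: maps a level name to its number.
 * Unknown names are rejected and leave the current level untouched. */
static bool set_log_level(const char *name)
{
    static const char *const names[] = { "none", "err", "wrn", "inf", "dbg" };
    for (int i = 0; i <= LVL_DBG; i++) {
        if (strcmp(name, names[i]) == 0) {
            runtime_level = i;
            return true;
        }
    }
    return false;
}

/* Emit a message only if its level is at or below the runtime threshold.
 * Returns true if the message was emitted (and would cost bandwidth). */
static bool log_msg(int level, const char *msg)
{
    if (level == LVL_NONE || level > runtime_level) {
        return false;
    }
    printf("<%d> %s\n", level, msg);
    return true;
}
```

In the field you might push "wrn" to the whole fleet and raise a single misbehaving device to "dbg" while you investigate it, then drop it back down once you have what you need.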

Chris also touches on developing test stands for use during manufacturing. These fixtures interface with your hardware and may work in conjunction with custom Zephyr shell commands to control the device during tests.
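A test stand typically sends short text commands over the serial console and checks the replies. In Zephyr you would register such commands with SHELL_CMD_REGISTER(), and the handlers also receive a const struct shell pointer. The portable sketch below leaves that out and uses hypothetical led and selftest commands to show the dispatch itself: tokenize the line, match the first word, and pass argc/argv to the handler.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* argc/argv handler style, as Zephyr shell commands use (minus the shell arg). */
typedef int (*cmd_handler)(size_t argc, char **argv);

static int led_state; /* stand-in for a GPIO the fixture verifies with a probe */

/* "led on" / "led off": lets the fixture drive an output and measure it. */
static int cmd_led(size_t argc, char **argv)
{
    if (argc != 2) return -1;
    if (strcmp(argv[1], "on") == 0) { led_state = 1; return 0; }
    if (strcmp(argv[1], "off") == 0) { led_state = 0; return 0; }
    return -1;
}

/* "selftest": a real device would exercise sensors, radio, flash, etc. */
static int cmd_selftest(size_t argc, char **argv)
{
    (void)argv;
    if (argc != 1) return -1;
    printf("SELFTEST PASS\n"); /* fixed string the fixture can match on */
    return 0;
}

static const struct { const char *name; cmd_handler fn; } cmds[] = {
    { "led", cmd_led },
    { "selftest", cmd_selftest },
};

/* Split a line from the fixture into tokens (modifies line in place) and run
 * the matching command. Returns the handler result, or -2 if unknown/empty. */
static int run_command(char *line)
{
    char *argv[8];
    size_t argc = 0;
    for (char *tok = strtok(line, " \r\n"); tok && argc < 8;
         tok = strtok(NULL, " \r\n")) {
        argv[argc++] = tok;
    }
    if (argc == 0) return -2;
    for (size_t i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++) {
        if (strcmp(argv[0], cmds[i].name) == 0) {
            return cmds[i].fn(argc, argv);
        }
    }
    return -2;
}
```

Printing fixed, easily matched strings such as "SELFTEST PASS" keeps the fixture side simple: it only needs to send a line and look for a known reply.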

Maintaining a Scaled Fleet

You haven’t really crossed the finish line until your deployed devices reach the end of their usable lifetime in the field. That means maintenance, since the myriad operating conditions out there are sure to turn up unexpected behavior.

If you followed Chris’ guidance in the previous phases, your OTA update system is already in place and can push out updates that address problems. Be sure to take advantage of simple things like Zephyr’s watchdog support for an automatic reboot when all else fails. Ultimately, though, you want to fix problems in place, so Zephyr’s core dump support, and perhaps its LLEXT (Linkable Loadable Extensions) feature for pushing fixes outside of full firmware updates, are worth a look.
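A watchdog only helps if feeding it actually means the system is healthy. One common pattern, sketched here in portable C with hypothetical names, is to have each thread check in and to feed the hardware watchdog only once all of them have reported since the last feed, so a single stuck thread still triggers a reset. Zephyr’s task watchdog (CONFIG_TASK_WDT) offers per-thread channels for the same purpose, and its hardware watchdog driver API provides wdt_feed(); this sketch only simulates the feed with a counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threads that must all make progress for the device to be healthy. */
enum { TASK_SENSOR, TASK_NETWORK, TASK_UI, TASK_COUNT };

#define ALL_TASKS ((1u << TASK_COUNT) - 1u)

static uint32_t checkins; /* bit i set once task i reports in this window */
static unsigned feeds;    /* counts simulated feeds of the hardware watchdog */

/* Called by each task from its main loop after it makes progress. */
static void task_checkin(int task)
{
    checkins |= 1u << task;
}

/* Called periodically, well inside the hardware timeout. Feeds only if every
 * task has checked in since the last feed; otherwise the watchdog is starved
 * and will eventually reset the device. */
static bool supervisor_poll(void)
{
    if (checkins != ALL_TASKS) {
        return false; /* at least one task is stuck: withhold the feed */
    }
    checkins = 0;
    feeds++; /* on hardware, this is where the driver's feed call would go */
    return true;
}
```

Pairing this with core dumps is useful: when a starved watchdog does reset a device, the dump captured before the reset tells you which thread stopped checking in.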

Slides

Give Chris’ talk a shot. There’s a ton of useful information in it, whether this is your first rodeo or you’ve been rolling boards off the production line since Chris was in diapers. Manufacturing is defined by change. Embrace that and you’ll never be left behind.

The slides are below, and the video is embedded at the top of the post.

Talk with an Expert

Implementing an IoT project takes a team of people, and we want to help out as part of your team. If you want to troubleshoot a current problem or talk through a new project idea, we're here for you.

Start the discussion at forum.golioth.io