There are 512 supported boards (according to find -name board.yml | wc -l
) already in the Zephyr tree. Most of them are real hardware platforms and the remaining ones are virtual. Why would you bother with a virtual platform? Zephyr can probably build for the SoC or development board of your choice, right? In this post, I’m going to talk about the reasons you want to try out Native Simulator.
Spoiler: Your Zephyr applications development time will drop through the floor.
Zephyr support for virtual platforms
Zephyr comes with support for various virtual platforms, but two of them are most widely used:
- QEMU
- Native Simulator
Both are extensively used in Zephyr Continuous Integration pipelines as well as during development by Zephyr users.
QEMU
QEMU is a generic machine emulator. It emulates CPUs by interpreting architecture-specific instructions as well as some peripherals like UART, flash, and networking adapters. Its main advantage is that binary (compiled code) running on QEMU is very similar to the binary that runs on a real hardware. All the low-level instructions, memory-mapped peripheral access, constrained RAM, thread context switching, thread stack sizes, interrupt handling, step-debugging with GDB, and many others mechanisms behave almost the same as on a real microcontroller.
Networking with QEMU can be achieved by setting up a TUN/TAP interface on a Linux host system. Once set up, you attach to the emulated network adapter that is handled by Zephyr drivers. The application is built with Zephyr and has access to the same network as the host machine (like a Linux laptop). After correctly configuring the TUN/TAP interface it is possible to access internet without additional hardware.
Native Simulator
Native Simulator is a POSIX architecture based “board” (Zephyr target) that runs as a standalone Linux executable. It is based on native_simulator and Zephyr POSIX architecture. As opposed to QEMU, it does not need any middle layer that emulates instructions or peripheral access. Instead, Zephyr (under Native Simulator) runs natively on Linux with very little overhead. Most of the time, it’s as fast as any regular Linux application.
However, Native Simulator does not emulate microcontroller peripherals the same way as QEMU does. It has special modules and functions called trampolines. As an example, instead of using memory mapped I/O to handle UART drivers (and logging and shell modules that utilize UART backend) there are trampolines to translate UART access APIs to pseudo-terminal I/Os on the Linux host.
Networking with Native Simulator was possible with TUN/TAP interface. So development experience in terms of IoT applications was similar to QEMU.
The need for offloaded sockets
Issues with TUN/TAP
Networking with QEMU and/or Native Simulator requires root privileges on the host computer in order to create the TUN/TAP network interface. It routes the traffic between Zephyr and the internet. This is a bit of an inconvenience for hackers that have Zephyr SDK installed directly on their Linux workstation. Setting up proper privileges in Docker is possible as well, when such a container is used for development purposes. But what about networking in CI pipelines with GitHub Actions or GitLab CI? The only option to get that working are self-hosted runners.
Use of TUN/TAP interface allows us to test almost the entire Zephyr networking stack, down to the Ethernet layer. However there is no platform-specific driver that talks to an Ethernet phy. Instead, there is a driver that sends Ethernet frames to a virtual TUN/TAP interface that requires setup on the host (e.g. Linux) system. This has advantages like higher code coverage when testing IoT applications.
Unfortunately, there are many disadvantages as well. Setting up TUN/TAP interface requires running as a privileged user on the host system. This might not be an issue on personal PC or laptop. However, root access inside Docker might not always be possible. This is especially true when using existing infrastructure, like GitHub Codespaces, GitHub-hosted runners in GitHub Actions, or hosted GitLab Runners in GitLab.
Offloaded sockets as an alternative
Zephyr has quite a unique feature called socket offloading. This is a mechanism that allows us to utilize (offload to) an external networking stack. Such a stack can be implemented as a 3rd-party library with proprietary drivers that come with a modem. Alternatively, we could use this with an external modem, commonly used with AT commands. In both cases, the contract between the Zephyr application and the offloaded networking stack is socket-level API. One example platform that uses socket offloading is the Nordic nRF9160.
Native Simulator is just a Linux executable. There are no special permissions required to access internet when writing regular Linux programs in C.
What if BSD the compatible sockets API (socket()
, connect()
, recv()
, send()
, …) could be exposed to Zephyr when running under Native Simulator? This should be possible with a bunch of trampolines between Zephyr world and Linux world.
Native Simulator Offloaded Sockets
Implementation of socket offloading for Native Simulator was part of a recent hackday project I worked on at Golioth. At the end of day, UDP communication was working, without any setup. This confirmed the idea about networking in Zephyr without root privileges. The next step in the following months was contributing the work to Zephyr with many followup improvements, so that the community can use it.
Development speed
Why should Native Simulator be used for IoT firmware development instead of real hardware? Flashing firmware on a device, connecting to the internet, and then executing application takes a considerable amount of time. This is where Native Simulator with offloaded sockets shines.
Flashing is not part of the testing process when using Native Simulator. Connecting to the internet (e.g. using WiFi or Cellular) is not needed, since the host machine is connected all the time. And lastly, executing application code is much faster on the beefy host machine compared to a very constrained microcontroller.
This is just theory, so let’s look at some timing measurements for those not convinced yet. We’ll use http_get
with TLS with minimal modifications required to get connected to a WiFi Access Point. Modified code is available at https://github.com/mniestroj/zephyr/tree/native-sim-http-get-benchmark.
In this example we’ll use nRF52840DK with ESP32 running ESP-AT firmware. This is what the “flash + execute” process looks like:
This is how much time it took for each platform to run http_get
sample (once it was already built):
- Native Simulator: 0.42 s
- nRF52840-DK: 16.80 s (flash 10.90 s, run 5.90 s)
Wouldn’t you like to go 40 times faster in your development?
Next steps
Many improvements to Native Simulator Offloaded Sockets were contributed to Zephyr upstream last month. Those will be part of upcoming Zephyr 3.7.0 (planned for release on 2024/07/26). When the Golioth Firmware SDK includes those changes, it will be much faster to develop and test IoT applications.