Zephyr’s Native Simulator with Offloaded Sockets

There are 512 supported boards (according to find -name board.yml | wc -l) already in the Zephyr tree. Most of them are real hardware platforms and the remaining ones are virtual. Why would you bother with a virtual platform? Zephyr can probably build for the SoC or development board of your choice, right? In this post, I’m going to talk about the reasons you want to try out Native Simulator.

Spoiler: Your Zephyr application development time will drop through the floor.

Zephyr support for virtual platforms

Zephyr comes with support for various virtual platforms, but two of them are most widely used:

  • QEMU
  • Native Simulator

Both are extensively used in Zephyr Continuous Integration pipelines as well as during development by Zephyr users.

QEMU

QEMU is a generic machine emulator. It emulates CPUs by interpreting architecture-specific instructions, and it also emulates some peripherals such as UART, flash, and networking adapters. Its main advantage is that the binary (compiled code) running on QEMU is very similar to the binary that runs on real hardware. All the low-level instructions, memory-mapped peripheral access, constrained RAM, thread context switching, thread stack sizes, interrupt handling, step-debugging with GDB, and many other mechanisms behave almost the same as on a real microcontroller.

Networking with QEMU can be achieved by setting up a TUN/TAP interface on a Linux host system. Once set up, you attach to the emulated network adapter that is handled by Zephyr drivers. The application is built with Zephyr and has access to the same network as the host machine (like a Linux laptop). After correctly configuring the TUN/TAP interface, it is possible to access the internet without additional hardware.

Native Simulator

Native Simulator is a POSIX architecture based “board” (Zephyr target) that runs as a standalone Linux executable. It is based on native_simulator and Zephyr POSIX architecture. As opposed to QEMU, it does not need any middle layer that emulates instructions or peripheral access. Instead, Zephyr (under Native Simulator) runs natively on Linux with very little overhead. Most of the time, it’s as fast as any regular Linux application.

However, Native Simulator does not emulate microcontroller peripherals the same way QEMU does. It has special modules and functions called trampolines. As an example, instead of using memory-mapped I/O to handle UART drivers (and the logging and shell modules that utilize the UART backend), there are trampolines that translate UART access APIs to pseudo-terminal I/O on the Linux host.
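The trampoline idea can be sketched in a few lines of C. This is only an illustration: `struct sim_uart` and `sim_uart_poll_out()` are hypothetical names, not the upstream driver. The point is that "transmit one byte" becomes an ordinary host `write()` instead of a register access. The sketch accepts any host file descriptor so that it is easy to try out; under Native Simulator that descriptor would refer to a pseudo-terminal.

```c
#include <unistd.h>

/* Hypothetical host-side state for one simulated UART. */
struct sim_uart {
	int host_fd; /* host descriptor that receives the UART output */
};

/*
 * Trampoline for "transmit one byte": there is no memory-mapped
 * register to poke, so the byte is handed to the host with write().
 * Returns 0 on success, -1 on failure.
 */
int sim_uart_poll_out(const struct sim_uart *uart, unsigned char c)
{
	return (write(uart->host_fd, &c, 1) == 1) ? 0 : -1;
}
```

Pointing `host_fd` at one end of a `pipe()` and reading from the other end is enough to watch the bytes arrive on the host side.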

Until now, networking with Native Simulator was only possible through a TUN/TAP interface, so the development experience for IoT applications was similar to QEMU.

The need for offloaded sockets

Issues with TUN/TAP

Networking with QEMU and/or Native Simulator requires root privileges on the host computer in order to create the TUN/TAP network interface, which routes the traffic between Zephyr and the internet. This is a bit of an inconvenience for hackers who have the Zephyr SDK installed directly on their Linux workstation. Setting up proper privileges in Docker is possible as well, when such a container is used for development purposes. But what about networking in CI pipelines with GitHub Actions or GitLab CI? The only option to get that working is to use self-hosted runners.

Use of a TUN/TAP interface allows us to test almost the entire Zephyr networking stack, down to the Ethernet layer. However, there is no platform-specific driver that talks to an Ethernet PHY. Instead, there is a driver that sends Ethernet frames to a virtual TUN/TAP interface that requires setup on the host (e.g. Linux) system. This has advantages like higher code coverage when testing IoT applications.

Unfortunately, there are many disadvantages as well. Setting up a TUN/TAP interface requires running as a privileged user on the host system. This might not be an issue on a personal PC or laptop. However, root access inside Docker might not always be possible. This is especially true when using existing infrastructure, like GitHub Codespaces, GitHub-hosted runners in GitHub Actions, or hosted GitLab Runners in GitLab.

Offloaded sockets as an alternative

Zephyr has quite a unique feature called socket offloading. This is a mechanism that allows us to utilize (offload to) an external networking stack. Such a stack can be implemented as a 3rd-party library with proprietary drivers that come with a modem. Alternatively, we could use this with an external modem, commonly controlled with AT commands. In both cases, the contract between the Zephyr application and the offloaded networking stack is a socket-level API. One example platform that uses socket offloading is the Nordic nRF9160.
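A rough way to picture that socket-level contract is a table of function pointers that the application calls without knowing which stack sits behind it. The sketch below is an assumption-laden illustration, not Zephyr's actual offloading interface: `struct offload_ops`, `struct loopback_stack`, and `app_ping()` are all made-up names, and the "external stack" is a trivial in-memory loopback standing in for a modem or vendor library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical socket-level contract between application and stack. */
struct offload_ops {
	int (*send)(void *ctx, const void *data, size_t len);
	int (*recv)(void *ctx, void *data, size_t max_len);
};

/* Stand-in for an external stack: hands sent bytes back on receive. */
struct loopback_stack {
	unsigned char buf[64];
	size_t len;
};

static int loopback_send(void *ctx, const void *data, size_t len)
{
	struct loopback_stack *stack = ctx;

	if (len > sizeof(stack->buf)) {
		return -1;
	}
	memcpy(stack->buf, data, len);
	stack->len = len;
	return (int)len;
}

static int loopback_recv(void *ctx, void *data, size_t max_len)
{
	struct loopback_stack *stack = ctx;
	size_t n = stack->len < max_len ? stack->len : max_len;

	memcpy(data, stack->buf, n);
	stack->len = 0;
	return (int)n;
}

const struct offload_ops loopback_ops = {
	.send = loopback_send,
	.recv = loopback_recv,
};

/* Application code: it only sees the ops table, never the stack itself. */
int app_ping(const struct offload_ops *ops, void *ctx,
	     char *reply, size_t reply_len)
{
	if (ops->send(ctx, "ping", 4) != 4) {
		return -1;
	}
	return ops->recv(ctx, reply, reply_len);
}
```

Swapping `loopback_ops` for a table backed by a modem driver, or by the host operating system, would leave `app_ping()` untouched, which is the property that makes offloading attractive.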

Native Simulator is just a Linux executable. There are no special permissions required to access the internet when writing regular Linux programs in C.

What if the BSD-compatible sockets API (socket(), connect(), recv(), send(), …) could be exposed to Zephyr when running under Native Simulator? This should be possible with a bunch of trampolines between the Zephyr world and the Linux world.
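As a minimal sketch of what such trampolines could look like, the functions below forward socket calls to the host after translating constants. Everything prefixed with `sim_` or `SIM_` is hypothetical and chosen for illustration; the upstream implementation uses its own names and covers far more of the API. The translation step is there because the embedded side and the host are not guaranteed to agree on the numeric values of constants such as the address family.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical embedded-side constants; values need not match the host's. */
#define SIM_AF_INET    1
#define SIM_SOCK_DGRAM 2

/* Translate embedded-side constants to host ones; -1 if unsupported. */
static int sim_family_to_host(int family)
{
	return (family == SIM_AF_INET) ? AF_INET : -1;
}

static int sim_type_to_host(int type)
{
	return (type == SIM_SOCK_DGRAM) ? SOCK_DGRAM : -1;
}

/* Trampoline for socket(): translate arguments, then call the host. */
int sim_socket(int family, int type)
{
	int host_family = sim_family_to_host(family);
	int host_type = sim_type_to_host(type);

	if (host_family < 0 || host_type < 0) {
		return -1;
	}
	return socket(host_family, host_type, 0);
}

/* Trampoline for sendto(); address and port are in network byte order. */
ssize_t sim_sendto(int fd, const void *data, size_t len,
		   uint32_t addr_be, uint16_t port_be)
{
	struct sockaddr_in dst = {
		.sin_family = AF_INET,
		.sin_port = port_be,
		.sin_addr = { .s_addr = addr_be },
	};

	return sendto(fd, data, len, 0, (struct sockaddr *)&dst, sizeof(dst));
}

/* Trampolines for recv() and close(): nothing to translate here. */
ssize_t sim_recv(int fd, void *data, size_t len)
{
	return recv(fd, data, len, 0);
}

int sim_close(int fd)
{
	return close(fd);
}
```

Because these are plain host calls made by an unprivileged process, a UDP datagram sent through `sim_sendto()` leaves the machine like any other application traffic, with no TUN/TAP interface involved.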

Native Simulator Offloaded Sockets

Implementation of socket offloading for Native Simulator was part of a recent hackday project I worked on at Golioth. At the end of the day, UDP communication was working without any setup. This confirmed the idea that networking in Zephyr is possible without root privileges. The next step, over the following months, was contributing the work to Zephyr with many follow-up improvements, so that the community can use it.

Development speed

Why should Native Simulator be used for IoT firmware development instead of real hardware? Flashing firmware on a device, connecting to the internet, and then executing the application takes a considerable amount of time. This is where Native Simulator with offloaded sockets shines.

Flashing is not part of the testing process when using Native Simulator. Connecting to the internet (e.g. using WiFi or Cellular) is not needed, since the host machine is connected all the time. And lastly, executing application code is much faster on the beefy host machine compared to a very constrained microcontroller.

This is just theory, so let’s look at some timing measurements for those not convinced yet. We’ll use the http_get sample with TLS, with the minimal modifications required to connect to a WiFi Access Point. The modified code is available at https://github.com/mniestroj/zephyr/tree/native-sim-http-get-benchmark.
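For readers who have not looked at this kind of code, the sequence being timed is roughly: resolve the host, connect, send a GET request, and read the response. The sketch below shows that sequence with the BSD sockets API. It is not the sample's source: the function names are made up for illustration, and it speaks plain HTTP without the TLS layer that the measured sample uses.

```c
#include <netdb.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a minimal GET request; returns its length, or -1 if buf is too small. */
int build_get_request(char *buf, size_t buf_len,
		      const char *host, const char *path)
{
	int n = snprintf(buf, buf_len,
			 "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);

	return (n < 0 || (size_t)n >= buf_len) ? -1 : n;
}

/*
 * Resolve the host, connect over TCP, send the request, and read the
 * first chunk of the response. Returns the number of bytes read, or -1.
 */
ssize_t http_get(const char *host, const char *port, const char *path,
		 char *resp, size_t resp_len)
{
	struct addrinfo hints = {
		.ai_family = AF_UNSPEC,
		.ai_socktype = SOCK_STREAM,
	};
	struct addrinfo *res;
	char req[256];
	ssize_t received = -1;
	int req_len = build_get_request(req, sizeof(req), host, path);

	if (req_len < 0 || getaddrinfo(host, port, &hints, &res) != 0) {
		return -1;
	}

	/* Only the first resolved address is tried, to keep the sketch short. */
	int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

	if (fd >= 0) {
		if (connect(fd, res->ai_addr, res->ai_addrlen) == 0 &&
		    send(fd, req, (size_t)req_len, 0) == req_len) {
			received = recv(fd, resp, resp_len, 0);
		}
		close(fd);
	}

	freeaddrinfo(res);
	return received;
}
```

On real hardware every one of these calls goes through the WiFi modem, while under Native Simulator with offloaded sockets they end up as the host's own socket calls.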

In this example, we’ll use an nRF52840-DK with an ESP32 running ESP-AT firmware. This is what the “flash + execute” process looks like:

[Figure: Zephyr's http_get on native_sim vs nrf52840dk]

This is how much time it took each platform to run the http_get sample (once it was already built):

  • Native Simulator: 0.42 s
  • nRF52840-DK: 16.80 s (flash 10.90 s, run 5.90 s)

Wouldn’t you like to go 40 times faster in your development?
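The factor of 40 follows directly from the measurements listed above, and the same arithmetic gives a slightly finer breakdown. The two helpers below are only a convenience for redoing that calculation; their names are made up for this post.

```c
/* Ratio between two durations given in seconds. */
double speedup(double slow_s, double fast_s)
{
	return slow_s / fast_s;
}

/* Share of part_s within total_s, in percent. */
double share_percent(double part_s, double total_s)
{
	return 100.0 * part_s / total_s;
}
```

With the numbers from the list, speedup(16.80, 0.42) gives the factor of 40. Leaving flashing out, speedup(5.90, 0.42) is still roughly 14, and share_percent(10.90, 16.80) shows that flashing accounts for about 65% of the time spent on the hardware.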

Next steps

Many improvements to Native Simulator Offloaded Sockets were contributed to Zephyr upstream last month. Those will be part of the upcoming Zephyr 3.7.0 (planned for release on 2024/07/26). When the Golioth Firmware SDK includes those changes, it will be much faster to develop and test IoT applications.

Marcin Niestrój
Marcin is a firmware developer on the Golioth SDK, which is based on the Zephyr SDK. He has worked in the embedded space for 10 years, 4 of those on Zephyr. Past upstream contributions have focused on the networking stack. He has an extensive background combining hardware, firmware, and the cloud.
