Zephyr threads, workers, and message queues

Zephyr Threads, Work Queues, Message Queues and how we use them

Zephyr is a Real Time Operating System (RTOS). That means it’s built to let you do multiple things at the same time using a single processor core (or a limited number of cores). To be fair, you’re not really doing things at the same time; the RTOS shares processing time across all of the tasks, with a priority system for the more important ones.

The trick with an RTOS is to design your application so that it takes advantage of that real-time behavior and never misses reacting to events like receiving data from a network, taking sensor readings, or responding to user input.

In this post, we’ll discuss the difference between Zephyr threads and work queues and show you why you might want to use one versus the other. We’ll also discuss how to use message queues to pass around data between running processes.


Zephyr Threads

A thread is a set of commands to be executed by the CPU. The while(1) loop that runs inside of the main function of your application is a thread. But Zephyr allows you to create multiple threads. Each thread operates independently, with Zephyr’s built-in scheduler deciding when to run each thread.

For example, the Golioth Blue Demo runs animations on the LEDs to indicate what mode the device is in. Instead of trying to make the loop in the main thread update the lights on a tight schedule, we use a separate thread to run the animations. The main thread deals with the overall flow of what’s happening on the device, and it can suspend/resume the animation thread as necessary.

I like to think of threads as tasks that need to keep running, things that don’t return the way a discrete function would. Your thread should have a while(1) loop and can call other functions just as you would in your main loop. It is up to you to yield processor control back to the scheduler so that it can run other threads. This is easy: just call one of the k_sleep() functions or k_yield().

#include <zephyr/kernel.h>

/* Every thread needs its own stack in RAM */
#define ANIMATE_PING_STACK	1024

/* Forward declaration of the thread's entry point */
void animate_ping_thread(void *d0, void *d1, void *d2);

/* Define and initialize the new thread */
K_THREAD_DEFINE(animate_ping_tid, ANIMATE_PING_STACK,
            animate_ping_thread, NULL, NULL, NULL,
            K_LOWEST_APPLICATION_THREAD_PRIO, 0, 0);

/* All animations handled by this thread.                       */
/* led_state, LEDS_LOGO, leds_logo_order[], and refresh_leds()  */
/* come from the demo's LED support code.                       */
void animate_ping_thread(void *d0, void *d1, void *d2) {
    uint8_t spinner_idx = 0;
    while(1) {
        led_state &= ~LEDS_LOGO;
        led_state |= leds_logo_order[spinner_idx];
        refresh_leds(led_state);
        if (++spinner_idx > 7) spinner_idx = 0;
        /* Control is yielded back to the scheduler during */
        /* sleep between animation frames                  */
        k_sleep(K_MSEC(100));
    }
}

int main(void) {
    /* Start animation thread running */
    k_thread_start(animate_ping_tid);

    while(1) {
        /* normal program flow */
    }
}

The example shown here runs an animation while an IoT device is trying to connect to a network. When the thread is running, the LEDs update every 100 milliseconds. Because the loop sleeps in between frames, control is given back to the scheduler so that other tasks can run. Elsewhere in the program, we call k_thread_suspend(animate_ping_tid) and k_thread_resume(animate_ping_tid) to stop and start the animation.
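
As a rough sketch of how that fits together, the flow might look like the snippet below, which builds on the animation example above. The golioth_is_connected() helper is a made-up stand-in for whatever connection check your application actually uses.

/* Sketch only: golioth_is_connected() stands in for your own connection check */
bool golioth_is_connected(void);

void wait_for_connection(void) {
    /* Run the spinner animation while we wait for the network */
    k_thread_resume(animate_ping_tid);

    while (!golioth_is_connected()) {
        k_sleep(K_MSEC(500));
    }

    /* Connected; stop the animation */
    k_thread_suspend(animate_ping_tid);
}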

Just be careful that you don’t have multiple threads trying to access the same resource at the same time (protect those resources with a mutex).
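
As a minimal sketch of that protection, Zephyr's k_mutex API can guard the led_state variable from the animation example above. The mutex and function names here are illustrative, not from the demo code.

/* One mutex guarding the shared LED state */
K_MUTEX_DEFINE(led_state_mutex);

void set_logo_leds(uint32_t new_logo_bits) {
    /* Block until no other thread holds the mutex */
    k_mutex_lock(&led_state_mutex, K_FOREVER);

    led_state &= ~LEDS_LOGO;
    led_state |= new_logo_bits;
    refresh_leds(led_state);

    k_mutex_unlock(&led_state_mutex);
}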

Zephyr Work Queues

A Zephyr work queue is a thread with extra features bolted on, and it requires less setup and management than a bare thread.

The work queue has a built-in buffer to store pending work (functions you want to run). Each item you add will be executed in the order you submitted it, like a “to-do list” for your processor. This is a perfect place for something that you want to run once and return from, but don’t want to run inside an interrupt service routine.

The “work” you submit to a work queue is a function. So you’re telling the work queue “run this function, then run this other function, then run this other one…” you get the idea. Just set it and forget it–the Zephyr scheduler will take care of popping work out of the queue and running it until the queue is empty. Many of the sensor readings we do in our demos are handled through work queues.

The work queue expects a specific type of function, usually referred to as a “work handler”. You don’t pass the handler any parameters; you just tell your work queue to add the handler to its list of pending work.

#include <zephyr/kernel.h>
#include <zephyr/drivers/gpio.h>

/* Work handler function */
void button_action_work_handler(struct k_work *work) {
    do_something_because_a_button_was_pressed();
}

/* Register the work handler */
K_WORK_DEFINE(button_action_work, button_action_work_handler);

/* Add a work item to the system workqueue from a button interrupt function */
void button_pressed(const struct device *dev, struct gpio_callback *cb,
            uint32_t pins)
{
    k_work_submit(&button_action_work);
}

This code snippet demonstrates using the work queue when a button is pressed. You want to spend as little time as possible in the button interrupt function. All the interrupt callback does is queue up a button action work item that the scheduler will run sometime after the interrupt service routine ends.

The example above uses the system workqueue. You also have the option of declaring your own work queues, each of which needs its own stack (remember… it’s a thread with extra features); a sketch of that setup is shown below. If your application is freezing up when using the system workqueue, check whether you’ve run out of stack space and set CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE accordingly.
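
If you do go the dedicated work queue route, the setup looks roughly like this. The stack size, priority, and names are placeholder values, not taken from our demos.

#define BUTTON_WORKQ_STACK_SIZE	1024
#define BUTTON_WORKQ_PRIORITY	5

K_THREAD_STACK_DEFINE(button_workq_stack, BUTTON_WORKQ_STACK_SIZE);
static struct k_work_q button_workq;

void start_button_workq(void) {
    k_work_queue_init(&button_workq);
    k_work_queue_start(&button_workq, button_workq_stack,
            K_THREAD_STACK_SIZEOF(button_workq_stack),
            BUTTON_WORKQ_PRIORITY, NULL);
}

Work items are then submitted with k_work_submit_to_queue(&button_workq, &button_action_work) instead of k_work_submit(&button_action_work).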

The challenge here is how to get data to the worker, since we can’t pass any parameters. How do you know which button was pressed? One approach is to place your work item inside of a struct along with the data you want to pass (button number, etc.). Then in the handler you can use CONTAINER_OF() to access the data in the structure; a rough sketch of that pattern is shown below. For the rest of this post, though, let’s talk about a simpler approach we often take: using a message queue.
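
For the curious, here is what that CONTAINER_OF() pattern can look like; the struct, field, and function names are made up for illustration.

/* Wrap the work item and the data it needs in a single struct */
struct button_work_ctx {
    struct k_work work;
    uint32_t button_id;
};

static struct button_work_ctx button_ctx;

void button_ctx_work_handler(struct k_work *work) {
    /* Recover the enclosing struct from the k_work pointer */
    struct button_work_ctx *ctx =
            CONTAINER_OF(work, struct button_work_ctx, work);

    if (ctx->button_id == 0) {
        do_something_because_a_button_was_pressed();
    }
}

void button_ctx_init(void) {
    k_work_init(&button_ctx.work, button_ctx_work_handler);
}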

Zephyr Message Queues

A Zephyr Message Queue is a buffer of fixed-size data items that is safe to access from multiple threads. You decide what each item is (a variable, a struct, a pointer, etc.) and how many of those items the queue can hold (limited by your available RAM). Zephyr handles all of the logic necessary for adding to and removing from the queue.

#include <zephyr/kernel.h>
#include <zephyr/drivers/gpio.h>

/* Queue holds up to 16 messages of 4 bytes each (one uint32_t of button states) */
#define BUTTON_ACTION_ARRAY_SIZE	16
#define BUTTON_ACTION_LEN		4
K_MSGQ_DEFINE(button_action_msgq,
        BUTTON_ACTION_LEN,
        BUTTON_ACTION_ARRAY_SIZE,
        4);

void button_pressed(const struct device *dev, struct gpio_callback *cb,
            uint32_t pins)
{
    /* add the current button states to the message queue */
    k_msgq_put(&button_action_msgq, &pins, K_NO_WAIT);

    /* Queue the worker (defined in the previous example) to process the buttons */
    k_work_submit(&button_action_work);
}

/* Work handler to process actions from button presses */
void button_action_work_handler(struct k_work *work) {
    /* Process everything in the message queue */
    while (k_msgq_num_used_get(&button_action_msgq)) {
        uint32_t button_states;
        k_msgq_get(&button_action_msgq, &button_states, K_NO_WAIT);

        /* Run some function based on which button was pressed */
        /* (BUTTON_0 is a pin mask defined elsewhere in the app) */
        if (button_states & BUTTON_0) {
            do_something_because_a_button_was_pressed();
        }
        /* Give the scheduler a chance to run other tasks */
        k_yield();
    }
}

Continuing with our button example, the code above uses a message queue to record the button states for processing after the interrupt service routine returns. The crucial pieces of the message queue system are the K_MSGQ_DEFINE(), k_msgq_put(), and k_msgq_get() calls. Notice that the work handler uses a while loop to check whether the message queue is empty; this is a nice pattern for processing all of the queued messages. I’ve added k_yield() between loop iterations to give the scheduler a chance to run other tasks.

Message queues are a spectacular tool for IoT applications. Golioth has used message queues on our Orange Demo to cache GPS readings while the cell modem is turned off. Periodically, the modem turns on, sends all of the readings from the queue to the Golioth servers, then turns off the radio once again to save power.
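
Here’s a rough sketch of that caching pattern. The reading struct, queue depth, and send_reading_to_golioth() function are illustrative placeholders, not taken from the Orange Demo source.

/* Illustrative reading type; the real demo stores different fields */
struct gps_reading {
    double lat;
    double lon;
    int64_t timestamp;
};

/* Hypothetical upload function standing in for the actual Golioth SDK calls */
void send_reading_to_golioth(struct gps_reading *reading);

/* Room to cache 64 readings while the modem is off */
K_MSGQ_DEFINE(gps_msgq, sizeof(struct gps_reading), 64, 4);

/* Producer: called whenever the GPS produces a new fix */
void cache_gps_reading(struct gps_reading *reading) {
    /* K_NO_WAIT drops the new reading if the queue is already full */
    k_msgq_put(&gps_msgq, reading, K_NO_WAIT);
}

/* Consumer: drain the queue after the modem connects */
void send_cached_readings(void) {
    struct gps_reading reading;

    while (k_msgq_get(&gps_msgq, &reading, K_NO_WAIT) == 0) {
        send_reading_to_golioth(&reading);
    }
}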

Threads, Workers, and Queues

We often focus on the “ecosystem” aspect of Zephyr, because we like that so many silicon vendors contribute to the code base. But it’s important to remember that Zephyr is a Real Time Operating System with a fully featured scheduler, and we’ve only scratched the surface of what you can do with it. Hopefully the explanations above help you begin thinking in these patterns: how to divide up the application work and how to pass data between the different parts of your code.

Talk with an Expert

Implementing an IoT project takes a team of people, and we want to help out as part of your team. If you want to troubleshoot a current problem or talk through a new project idea, we're here for you.

Start the discussion at forum.golioth.io