Using the Hugging Face Inference API for Device Audio Analysis

Golioth Pipelines works with Hugging Face, as shown in our recent AI launch. This post will highlight how to use an audio classification model on Hugging Face that accepts data recorded on a microcontroller-based device, sent over a secure network connection to Golioth, and routed through Pipelines.

While most commonly known as the place where models and data sets are uploaded and shared, Hugging Face also provides a compute service in the form of its free serverless inference API and production-ready dedicated inference endpoints. Unlike other platforms that offer only proprietary models, Hugging Face allows access to over 150,000 open source models via its inference APIs. Additionally, private models can be hosted on Hugging Face, which is a common use case for Golioth users that have trained models on data collected from their device fleets.

Audio Analysis with Pipelines

Because the Hugging Face inference APIs use HTTP, they are easy to target with the webhook transformer. The structure of the request body will depend on the model being invoked, but for models that operate on media files, such as audio or video, the payload is typically raw binary data.

In the following pipeline, we target the serverless inference API with an audio sample streamed from a device. In this scenario, we want to perform sentiment analysis of the audio, then pass the results onto Golioth’s timeseries database, LightDB Stream, so that changes in sentiment can be observed over time. An alternative destination, or multiple destinations, could easily be added.

Click here to use this pipeline in your project on Golioth.

filter:
  path: "/audio"
steps:
  - name: emotion-recognition
    transformer:
      type: webhook
      version: v1
      parameters:
        url: https://api-inference.huggingface.co/models/superb/hubert-large-superb-er
        headers:
          Authorization: $HUGGING_FACE_TOKEN
  - name: embed
    transformer:
      type: embed-in-json
      version: v1
      parameters:
        key: text
  - name: send-lightdb-stream
    destination:
      type: lightdb-stream
      version: v1

Note that though Hugging Face’s serverless inference API is free to use, it is rate-limited and subject to high latency and intermittent failures due to cold starts. For production use-cases, dedicated inference endpoints are recommended.

We can pick any supported model on Hugging Face for our audio analysis task. As shown in the URL, the Hubert-Large for Emotion Recognition model is targeted, and the audio content delivered on path /audio is delivered directly to Hugging Face. An example for how to upload audio to Golioth using an ESP32 can be found here.

Results from the emotion recognition inference look as follows.

[
  {
    "score": 0.6310836672782898,
    "label": "neu"
  },
  {
    "score": 0.2573806643486023,
    "label": "sad"
  },
  {
    "score": 0.09393830597400665,
    "label": "hap"
  },
  {
    "score": 0.017597444355487823,
    "label": "ang"
  }
]

Expanding Capabilities

Countless models are uploaded to Hugging Face on a daily basis, and the inference API integration with Golioth Pipelines makes it simple to incorporate the latest new functionality into any connected device product. Let us know what models you are using on the Golioth Forum!

Dan Mangum
Dan Mangum
Dan is an experienced engineering leader, having built products and teams at both large companies and small startups. He has a history of leadership in open source communities, and has worked across many layers of the technical stack, giving him unique insight into the constraints faced by Golioth’s customers and the requirements of a platform that enables their success.

Post Comments

No comments yet! Start the discussion at forum.golioth.io

More from this author

Related posts

spot_img

Latest posts

Free Zephyr Training: February 26th, 2025

Golioth is hosting a free Zephyr training on February 26th, 2025. Join us for this three-hour crash course that will get you up to speed with the basics of building IoT applications using Zephyr RTOS.

Golioth Firmware SDK v0.17.0

Yesterday, we released v0.17.0 of the Golioth Firmware SDK. This release introduces support for the Golioth Location service in the SDK. We've also improved...

Using Zephyr SMP with Multiple MCUs

It's easy to see that Golioth makes firmware updates for internet-connected devices a snap. But combine Golioth Cohorts, our improved OTA event log, and interconnectivity tools like SMP, and you end up with a powerful OTA scheme for all of the controllers in a complex system.

Want to stay up to date with the latest news?

We would love to hear from you! Please fill in your details and we will stay in touch. It's that simple!