Using the Hugging Face Inference API for Device Audio Analysis

Golioth Pipelines works with Hugging Face, as shown in our recent AI launch. This post highlights how to use an audio classification model on Hugging Face with audio that is recorded on a microcontroller-based device, sent over a secure network connection to Golioth, and routed through Pipelines.

While most commonly known as the place where models and datasets are uploaded and shared, Hugging Face also provides a compute service in the form of its free serverless inference API and production-ready dedicated inference endpoints. Unlike platforms that offer only proprietary models, Hugging Face allows access to over 150,000 open source models via its inference APIs. Additionally, private models can be hosted on Hugging Face, which is a common use case for Golioth users who have trained models on data collected from their device fleets.

Audio Analysis with Pipelines

Because the Hugging Face inference APIs use HTTP, they are easy to target with the webhook transformer. The structure of the request body will depend on the model being invoked, but for models that operate on media files, such as audio or video, the payload is typically raw binary data.
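To make the request shape concrete, here is a minimal Python sketch of the kind of HTTP call involved: a POST whose body is the raw audio bytes. The model URL is the one used in the pipeline in this post. The function name, the placeholder token, and the placeholder audio bytes are hypothetical, and the `Bearer` prefix follows Hugging Face's token convention. This only illustrates the request and is not a description of how the webhook transformer is implemented.

```python
import urllib.request

# Model URL taken from the pipeline in this post.
MODEL_URL = "https://api-inference.huggingface.co/models/superb/hubert-large-superb-er"


def build_inference_request(audio: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST whose body is the raw audio bytes."""
    return urllib.request.Request(
        MODEL_URL,
        data=audio,  # raw binary payload, no JSON wrapping
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


# Placeholder bytes and token, for illustration only (assumptions).
request = build_inference_request(b"RIFF....WAVEfmt ", "hf_example_token")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` with a real token and a real audio file would perform the call.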

In the following pipeline, we target the serverless inference API with an audio sample streamed from a device. In this scenario, we want to perform sentiment analysis of the audio, then pass the results on to Golioth’s timeseries database, LightDB Stream, so that changes in sentiment can be observed over time. An alternative destination, or multiple destinations, could easily be added.

Click here to use this pipeline in your project on Golioth.

filter:
  path: "/audio"
steps:
  - name: emotion-recognition
    transformer:
      type: webhook
      version: v1
      parameters:
        url: https://api-inference.huggingface.co/models/superb/hubert-large-superb-er
        headers:
          Authorization: $HUGGING_FACE_TOKEN
  - name: embed
    transformer:
      type: embed-in-json
      version: v1
      parameters:
        key: text
  - name: send-lightdb-stream
    destination:
      type: lightdb-stream
      version: v1

Note that although Hugging Face’s serverless inference API is free to use, it is rate-limited and subject to high latency and intermittent failures due to cold starts. For production use cases, dedicated inference endpoints are recommended.

We can pick any supported model on Hugging Face for our audio analysis task. As shown in the URL, the pipeline targets the Hubert-Large for Emotion Recognition model, and the audio content that arrives on path /audio is forwarded directly to Hugging Face. An example of how to upload audio to Golioth using an ESP32 can be found here.

Results from the emotion recognition inference look as follows.

[
  {
    "score": 0.6310836672782898,
    "label": "neu"
  },
  {
    "score": 0.2573806643486023,
    "label": "sad"
  },
  {
    "score": 0.09393830597400665,
    "label": "hap"
  },
  {
    "score": 0.017597444355487823,
    "label": "ang"
  }
]
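As a rough sketch of how a consumer might work with this result once it is stored, the Python below nests the sample response under the `text` key and picks the highest-scoring label. The helper names are hypothetical, and the exact shape that the embed-in-json step produces (the parsed list nested under the configured key) is an assumption, not something confirmed here.

```python
import json

# Sample response copied from the results shown above.
RESPONSE = """
[
  {"score": 0.6310836672782898, "label": "neu"},
  {"score": 0.2573806643486023, "label": "sad"},
  {"score": 0.09393830597400665, "label": "hap"},
  {"score": 0.017597444355487823, "label": "ang"}
]
"""


def embed_in_json(payload: str, key: str) -> dict:
    """Hypothetical stand-in for the embed step: nest the parsed payload under `key`."""
    return {key: json.loads(payload)}


def top_emotion(record: dict, key: str = "text") -> tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    best = max(record[key], key=lambda item: item["score"])
    return best["label"], best["score"]


record = embed_in_json(RESPONSE, "text")
label, score = top_emotion(record)
print(f"{label}: {score:.2f}")  # prints "neu: 0.63"
```

Tracking the top label per sample is one simple way to observe changes in sentiment over time.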

Expanding Capabilities

New models are uploaded to Hugging Face every day, and the inference API integration with Golioth Pipelines makes it simple to incorporate the latest functionality into any connected device product. Let us know what models you are using on the Golioth Forum!

Dan Mangum
Dan is an experienced engineering leader, having built products and teams at both large companies and small startups. He has a history of leadership in open source communities, and has worked across many layers of the technical stack, giving him unique insight into the constraints faced by Golioth’s customers and the requirements of a platform that enables their success.
