Detecting Text From Images Over A Cellular Link

Golioth enables embedded devices to access the power of the Cloud. There is an ever-increasing number of processing paths for the data generated by IoT devices, including devices that need to run at very low power. Today we’re going to show how an IoT device can take a photo and send it to the Cloud, which then forwards it to an AI OCR service. We’re using a low-power, cellular-connected nRF9160-DK and an Arducam. The Golioth cloud collects the image from the device, then passes it to Microsoft Azure to handle the AI OCR work.

Setting up an embedded camera and cellular modem

The setup involves an nRF9160-DK, a cellular-based development board from Nordic and one you see often on our blog. Wait, this might actually look very familiar…because Mike has made a blog post about this hardware before! I used the example-upload-image repository and connected an Arducam camera, just like Mike did. The Arducam is great because it has onboard RAM and talks over SPI, which makes it well suited for embedded devices. We can trigger an image capture, pull the data down to the nRF9160, and transmit it over a cellular connection. This data goes through a CoAP gateway (you don’t really have to think about that part!) up to the Golioth Cloud.
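To picture what “pull the data down and transmit it” means in practice: an image is far larger than a single cellular packet, so it has to leave the device in pieces. The sketch below is not the firmware from the example-upload-image repository (that runs on the device). It is a small Python illustration of splitting an image buffer into fixed-size blocks, each tagged with an index and a “last block” flag. The function name iter_blocks and the 1024-byte block size are assumptions made for this example.

```python
from typing import Iterator, Tuple

# Assumed block size for illustration; the real value depends on how the
# transport is configured on the device.
BLOCK_SIZE = 1024


def iter_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (block_index, payload, is_last) for each block of the image."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    total = len(image)
    for index, offset in enumerate(range(0, total, block_size)):
        payload = image[offset:offset + block_size]
        # The final block is the one that reaches (or passes) the end of the buffer.
        is_last = offset + block_size >= total
        yield index, payload, is_last


if __name__ == "__main__":
    # Stand-in for a captured frame: 2500 zero bytes, not a real JPEG.
    fake_image = bytes(2500)
    for index, payload, is_last in iter_blocks(fake_image):
        print(index, len(payload), is_last)
```

The receiver can reassemble the image by concatenating payloads in index order and stopping at the block flagged as last.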

Once the image hits the Golioth Cloud, it goes through a pipeline. This pipeline is what lets us send data to Azure. Specifically, the binary image data (octet stream) is sent to Azure Blob Storage. Rick set up a container named “golioth” to receive it, and he takes over from there.

Detecting text with Rick Jen from Microsoft

Rick Jen is an Azure Principal Technical Specialist at Microsoft supporting a wide range of medical use cases for the Azure cloud. He approached us about transmitting interesting data from a real-world device up into the Azure ecosystem. We see interesting applications happening every day on the Golioth cloud, many of which have yet to take advantage of the wide range of AI tools that are built for larger web applications.

After the image passes through the pipeline, it heads out to the Azure cloud. The magic happens when the image finishes uploading to Azure Blob Storage. This generates an event that triggers a serverless Function App. The Function App then grabs the image from blob storage and sends it via HTTP POST to the Azure AI Vision service OCR REST API endpoint. This operation can be synchronous for smaller images (like ours) or asynchronous for larger ones that might take longer to process.
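The HTTP POST step can be sketched in a few lines of Python. This is not Rick’s Function App code, and the blob-trigger wiring is left out entirely. It only shows how the image bytes might be packaged for a synchronous OCR call. The URL path, the api-version value, and the features=read parameter are assumptions based on the Image Analysis 4.0 REST API, since the post does not say which API version was used. The names build_ocr_request and run_ocr are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def build_ocr_request(endpoint: str, key: str, image: bytes,
                      api_version: str = "2023-10-01") -> urllib.request.Request:
    """Build (but do not send) a POST carrying raw image bytes.

    The path and query parameters are assumed from the Image Analysis 4.0
    REST API; adjust them to match the API version you actually deploy.
    """
    query = urllib.parse.urlencode({"api-version": api_version, "features": "read"})
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    headers = {
        # Same octet-stream content type the image had on its way into blob storage.
        "Content-Type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=image, headers=headers, method="POST")


def run_ocr(endpoint: str, key: str, image: bytes) -> dict:
    """Send the request and decode the JSON body (synchronous call)."""
    request = build_ocr_request(endpoint, key, image)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Inside a Function App, the image argument would be the bytes read from the blob that triggered the function, and the endpoint and key would come from application settings rather than being hard-coded.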

The results come back as JSON. For text recognition, the Azure AI Vision service extracts the text from the image. But it doesn’t just give you the text; it also provides the bounding box, which tells you the position of the text in the image, and a confidence score for how sure the AI is about the extraction. The resulting data is then inserted into Cosmos DB for storage.
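To make the text, bounding box, and confidence score concrete, here is a small Python sketch that flattens an OCR result into one record per word, which is roughly the shape you would want before writing documents to a database. The JSON layout (readResult, blocks, lines, words, boundingPolygon) is an assumption based on the Image Analysis 4.0 response format, and the sample values are invented for illustration, not output from our demo. The Cosmos DB write itself is left out.

```python
from typing import Any, Dict, List


def extract_words(result: Dict[str, Any], min_confidence: float = 0.0) -> List[Dict[str, Any]]:
    """Flatten an OCR result into one record per recognized word."""
    records = []
    read_result = result.get("readResult") or {}
    for block in read_result.get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                confidence = word.get("confidence", 0.0)
                if confidence < min_confidence:
                    continue  # drop low-confidence words
                records.append({
                    "text": word.get("text", ""),
                    "confidence": confidence,
                    # Corner points of the word's outline, as (x, y) pixel pairs.
                    "polygon": [(p["x"], p["y"]) for p in word.get("boundingPolygon", [])],
                    "line": line.get("text", ""),
                })
    return records


# Invented sample in the assumed response shape.
SAMPLE = {
    "readResult": {"blocks": [{"lines": [{
        "text": "Hello world",
        "words": [
            {"text": "Hello", "confidence": 0.98,
             "boundingPolygon": [{"x": 10, "y": 12}, {"x": 60, "y": 12},
                                 {"x": 60, "y": 30}, {"x": 10, "y": 30}]},
            {"text": "world", "confidence": 0.41,
             "boundingPolygon": [{"x": 66, "y": 12}, {"x": 120, "y": 12},
                                 {"x": 120, "y": 30}, {"x": 66, "y": 30}]},
        ],
    }]}]}
}

if __name__ == "__main__":
    for record in extract_words(SAMPLE, min_confidence=0.5):
        print(record["text"], record["confidence"], record["polygon"])
```

A confidence threshold like the one used here is one simple way to keep blurry or badly lit words out of downstream processing.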

How it went and what’s next

We demonstrated this by taking pictures of pages from a book. Seeing the extracted text, bounding boxes, and confidence scores appear in the database in near real-time was pretty cool. We noted that even with an unoptimized embedded camera and less-than-perfect lighting, the system worked well, but you could definitely improve results with better lighting, lensing, or image optimization.

Beyond just scanning book pages, this approach has tons of potential. Rick mentioned how Azure AI Vision is used in healthcare, like analyzing MRI images to detect anomalies that might be hard to spot with human vision alone. Another cool use case is using the Document Intelligence service (a sister service to AI Vision) for processing handwritten forms, like prescriptions in clinics that haven’t gone fully digital yet. By defining the form layout, the service can extract specific information from handwritten documents and put it into a database, which cuts down on manual data entry.

On the Golioth side, we’ve seen interesting applications like static asset tracking, where you can use images over time to detect if anything has moved or changed. We’ll continue to push what’s possible for power- and data-efficient devices and discuss how AI Orchestration can allow users to tune the right amount of AI for a given application. If you would like to discuss your application, reach out to [email protected] or drop a post in our forum.
