Building A Smart Speaker Outside The Corporate Cloud
Lewin Day

If you’re not worried about corporate surveillance bots scraping your shopping list and manipulating you through marketing, you can buy any number of off-the-shelf smart speakers for your home. Alternatively, you can roll your own like [arpy8] did, and keep your life a little more private.

The build is based around an ESP32 microcontroller. It connects to the ‘net via its inbuilt Wi-Fi connection, and listens out for your voice with an INMP441 omnidirectional microphone module. The audio data is trucked off to a backend server running a Whisper speech-to-text model. The resulting text is then passed to Google’s Gemini 2.5 Flash large language model. The generated response is passed to the Piper Neural Voice text-to-speech engine, sent back to the ESP32 as audio, and spat out via the device’s DAC output and a speaker attached to an LM386 amplifier. Basically, anything you could ask Gemini, you can ask this device.
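For a feel of how the backend's three stages chain together, here is a minimal Python sketch. It is not [arpy8]'s code: the class name VoicePipeline, the handle method, and the fake_* stand-in functions are all hypothetical, and the real Whisper, Gemini, and Piper calls are deliberately left out so the example runs with no models or API keys.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three backend stages; each stage is injected as a callable."""
    transcribe: Callable[[bytes], str]   # speech-to-text (Whisper in the build)
    generate: Callable[[str], str]       # LLM reply (Gemini 2.5 Flash in the build)
    synthesize: Callable[[str], bytes]   # text-to-speech (Piper in the build)

    def handle(self, audio: bytes) -> bytes:
        """Turn one recorded utterance into audio bytes for the speaker."""
        text = self.transcribe(audio).strip()
        if not text:
            # Nothing intelligible was heard, so skip the LLM call entirely.
            return b""
        reply = self.generate(text)
        return self.synthesize(reply)


# Hypothetical stand-in stages so the sketch runs on its own.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_generate, fake_synthesize)
print(pipeline.handle(b"what time is it"))
```

Keeping each stage as a plain callable means any one of them could be swapped out, say for a locally hosted language model, without touching the rest of the chain.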

By virtue of using a commercial large language model, it’s not perfectly private by any means. Still, it’s at least a little farther removed than using a smart speaker that’s directly logged in to your Amazon/Google/Hulu/Beanstikk account. Files are on GitHub for those eager to dive into the code. We’ve seen some other fun builds along these lines before, too.

This article is written by: Fady Askharoun Samy Askharoun
