This project started when I saw FOLOTOY on Twitter sharing a voice assistant built with a large language model, along with the open-source frameworks it used. I wanted to try out the libraries and code around large language models, so I followed the instructions and set up a demo application.

Following that Twitter post, I used the same frameworks to build a local voice assistant. However, since I don't have a GPU and the smaller-parameter language models performed poorly in my tests, I used OpenAI's API for the LLM part.
Overview
This voice assistant primarily uses the following frameworks and services; a short code sketch of each stage follows the list:
- snowboy: Used for wake-word detection. It listens to the microphone for the trigger word and also supports Voice Activity Detection (VAD).
- faster-whisper: Used for speech-to-text conversion. It is a CTranslate2-based reimplementation of OpenAI's Whisper model that runs significantly faster than the official version.
- SpeechRecognition: Used for recording. After snowboy detects the wake word, this library records the user's subsequent speech and passes it to Whisper for transcription.
- EmotiVoice: Converts text to speech. After the transcribed request is sent to GPT through the OpenAI API, EmotiVoice synthesizes the reply text into audio and plays it back.
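
The sketches below walk through the pipeline in order; they are illustrations under stated assumptions, not the exact code from the original project. First, wake-word detection with snowboy, following the Python demo that ships with the snowboy repository. The `.umdl` model path is a placeholder for whichever wake-word model you use:

```python
import snowboydecoder

def on_wake_word():
    # In the real assistant, this is where recording would start
    print("Wake word detected!")

# Path to a pretrained wake-word model; adjust to your own setup
detector = snowboydecoder.HotwordDetector("resources/snowboy.umdl", sensitivity=0.5)

# Blocks and invokes the callback each time the wake word is heard
detector.start(detected_callback=on_wake_word, sleep_time=0.03)
detector.terminate()
```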
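After the wake word fires, the follow-up utterance can be captured with SpeechRecognition. A minimal sketch; `listen()` returns on its own once the speaker pauses:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
    print("Listening...")
    audio = recognizer.listen(source)  # returns after the speaker pauses

# Hand the raw audio to Whisper as a WAV file
with open("request.wav", "wb") as f:
    f.write(audio.get_wav_data())
```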
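Next, transcription with faster-whisper. The "small" model and int8 quantization are my choices for a CPU-only machine like the one described above:

```python
from faster_whisper import WhisperModel

# int8 on CPU keeps memory use and latency manageable without a GPU
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("request.wav", beam_size=5)
text = "".join(segment.text for segment in segments)
print(f"Detected language: {info.language}")
print(f"Transcript: {text}")
```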
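The transcript then goes to the LLM. A sketch using the official openai Python client (v1 interface); the model name and system prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; use whichever model you prefer
    messages=[
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": text},  # transcript from the previous step
    ],
)
reply = response.choices[0].message.content
```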
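Finally, the reply is synthesized with EmotiVoice. EmotiVoice can be run as a local server exposing an OpenAI-compatible TTS API; the sketch below assumes such a server at http://localhost:8000/v1, and the voice ID and model name are assumptions that depend on your deployment:

```python
from openai import OpenAI

# Point the OpenAI client at the local EmotiVoice server (assumed address)
tts_client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

speech = tts_client.audio.speech.create(
    model="emoti-voice",  # assumed model name; check your server's docs
    voice="8051",         # assumed speaker ID from EmotiVoice's voice list
    input=reply,          # text from the GPT step
)

# Write the synthesized audio to disk; play it back with any audio player
with open("reply.mp3", "wb") as f:
    f.write(speech.content)
```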