This project started when I saw FOLOTOY on Twitter sharing a voice assistant built with a large language model, along with the open-source frameworks it used. I wanted to try out the libraries and code around large language models, so I followed the instructions and set up a demo application.

Following that Twitter post, I used the same frameworks to build a local voice assistant. However, since I don't have a GPU and the smaller-parameter language models performed poorly in my tests, I used OpenAI's API for the LLM part.
Overview
This voice assistant primarily uses the following frameworks and services; a short code sketch of each stage follows the list:
- snowboy: Used for wake-word detection. It listens to the microphone for the trigger word and also supports Voice Activity Detection (VAD).
- faster-whisper: Used for speech-to-text conversion. It is a CTranslate2-based reimplementation of OpenAI's Whisper model that runs significantly faster than the official version.
- SpeechRecognition: Used for recording. After snowboy detects the wake word, this library records the user's subsequent speech and passes it to Whisper for transcription.
- EmotiVoice: Converts text to speech. After the transcribed request is sent to GPT through the OpenAI API, EmotiVoice synthesizes the reply text into audio and plays it back.
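
The sketches below walk through the pipeline in order; they are illustrations under stated assumptions, not the exact code from the original project. First, wake-word detection with snowboy, following the Python demo that ships with the snowboy repository. The `.umdl` model path is a placeholder for whichever wake-word model you use:

```python
import snowboydecoder

def on_wake_word():
    # In the real assistant, this is where recording would start
    print("Wake word detected!")

# Path to a pretrained wake-word model; adjust to your own setup
detector = snowboydecoder.HotwordDetector("resources/snowboy.umdl", sensitivity=0.5)

# Blocks and invokes the callback each time the wake word is heard
detector.start(detected_callback=on_wake_word, sleep_time=0.03)
detector.terminate()
```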
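After the wake word fires, the follow-up utterance can be captured with SpeechRecognition. A minimal sketch; `listen()` returns on its own once the speaker pauses:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
    print("Listening...")
    audio = recognizer.listen(source)  # returns after the speaker pauses

# Hand the raw audio to Whisper as a WAV file
with open("request.wav", "wb") as f:
    f.write(audio.get_wav_data())
```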
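Next, transcription with faster-whisper. The "small" model and int8 quantization are my choices for a CPU-only machine like the one described above:

```python
from faster_whisper import WhisperModel

# int8 on CPU keeps memory use and latency manageable without a GPU
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("request.wav", beam_size=5)
text = "".join(segment.text for segment in segments)
print(f"Detected language: {info.language}")
print(f"Transcript: {text}")
```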
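The transcript then goes to the LLM. A sketch using the official openai Python client (v1 interface); the model name and system prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; use whichever model you prefer
    messages=[
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": text},  # transcript from the previous step
    ],
)
reply = response.choices[0].message.content
```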
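Finally, the reply is synthesized with EmotiVoice. EmotiVoice can be run as a local server exposing an OpenAI-compatible TTS API; the sketch below assumes such a server at http://localhost:8000/v1, and the voice ID and model name are assumptions that depend on your deployment:

```python
from openai import OpenAI

# Point the OpenAI client at the local EmotiVoice server (assumed address)
tts_client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

speech = tts_client.audio.speech.create(
    model="emoti-voice",  # assumed model name; check your server's docs
    voice="8051",         # assumed speaker ID from EmotiVoice's voice list
    input=reply,          # text from the GPT step
)

# Write the synthesized audio to disk; play it back with any audio player
with open("reply.mp3", "wb") as f:
    f.write(speech.content)
```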