Set Up Voice Control
In this section we will configure your Satellite1 so it can intuitively control the smart devices in your home. By the end of this section you should be able to say:
- "Hey Jarvis, are any doors unlocked in the home?"
- "Hey Jarvis, lock those doors please."
- "Hey Jarvis, what is the difference between a black hole and a white hole?"
- "Hey Jarvis, close the garage door and turn on the TV then tell me a joke."
What is a Voice Pipeline?
Think about your interaction with any voice assistant:
- Wake Word - You say a special phrase, like "Hey, Jarvis!"
- Speech-to-Text - Your voice command is recorded and converted to a text transcription.
- Conversation Agent - The text transcription is processed by a rules-based engine (or perhaps an LLM) which executes your command and generates a text response.
- Text-to-Speech - The text response is converted into a synthetic voice response that is played back through the speaker.
That's a voice pipeline. It's the backbone of any voice assistant. Each step in a voice pipeline can be modified and customized to fit your needs. What wake word do you want? What language are you speaking? Do you want a standard voice response or to hear Arnold Schwarzenegger speak back to you? Do you want Home Assistant, Google, or OpenAI to process and execute your command? Follow the steps below to set up a voice pipeline for your Satellite1.
Create a Voice Pipeline:

- Name your pipeline. Select your preferred Conversation Agent, Speech-to-Text, and Text-to-Speech engines.
Standard Conversation Agents
There are two standard voice pipelines we recommend trying out to get your feet wet:

- Home Assistant's Cloud Assist Pipeline (Requires a paid Home Assistant Cloud account; response times are fast!)
- Home Assistant's Local Assist Pipeline (Free and completely private; response times depend on your hardware.)
Local AI Conversation Agents
The FutureProofHomes team is working on a Local AI Base Station that "just works". Therefore, these docs will avoid sending you down a deep local AI rabbit hole. :) Thanks for your patience & stay tuned!
Once you have one of the standard pipelines operational, you can upgrade to a Generative AI conversation agent. These agents allow the voice assistant to respond to more natural, conversational commands like, “It’s dark in here,” to turn on the lights, instead of requiring specific phrases like, “Turn on the living room lights.” Implementing completely local Generative AI LLMs, however, is not for the faint of heart. Your results may vary depending on factors such as the model you choose (e.g., Llama, Qwen, etc.), the number of entities exposed, how those entities and their aliases are named, the GPU’s capabilities (more VRAM is better), and many other variables. If you’re ready to take on this challenge, here are some tutorial videos to help you get started. Good luck!
- Ollama AI Powered Conversation Agent (Free, requires a GPU, and can be hard to set up with semi-reliable function calling.)
Cloud AI Conversation Agents
Warning: You are entrusting control of your home to a cloud-based artificial intelligence that does not protect your privacy. Proceed with caution!
- Google AI Conversation Agent (Free, but will collect your data.)
- OpenAI ChatGPT Conversation Agent (Expensive, will collect your data, and not open at all despite the marketing name.)
NOTE: The following prompt has performed well with both OpenAI's and Google's conversation agents:
Your name is Jarvis and you are a voice assistant for Home Assistant.
Answer questions about the world truthfully.
Answer in plain text without markdown language.
Keep responses simple and to the point.
Always use 12hr time formats.
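For context, chat-style AI agents combine a system prompt like the one above with your transcribed command on every request. The sketch below shows that message structure in Python (the `build_messages` helper is hypothetical and for illustration; it is not part of Home Assistant or any vendor SDK):

```python
# How a system prompt and a spoken command are typically combined for a
# chat-style LLM API. The helper below is a hypothetical illustration.

SYSTEM_PROMPT = (
    "Your name is Jarvis and you are a voice assistant for Home Assistant. "
    "Answer questions about the world truthfully. "
    "Answer in plain text without markdown language. "
    "Keep responses simple and to the point. "
    "Always use 12hr time formats."
)

def build_messages(command: str) -> list[dict]:
    # Chat APIs take the system prompt first, then the user's transcribed command.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": command},
    ]

messages = build_messages(
    "What is the difference between a black hole and a white hole?"
)
```

Because the system prompt rides along with every command, tweaking it (the assistant's name, tone, or formatting rules) immediately changes how all responses sound.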
Combine Conversation Agents
Combine a standard conversation agent with an AI conversation agent (It's like magic!)
Requires Home Assistant version 2024.12.1 or later
A "fall back" pipeline will first use a non-AI conversation agent to process your request, and if that fails it will fall back to your preferred AI conversation agent. Combining these two conversation agents results in an almost magical voice experience and is highly recommended.
Simply toggle on the "Prefer handling commands locally" switch underneath your Generative AI conversation agent:
[Read more about this feature release](https://www.home-assistant.io/blog/2024/12/04/release-202412/#let-your-voice-assistant-fall-back-to-an-llm-based-agent){ .md-button .md-button--primary }
Assign a Voice Pipeline
- Go to Settings -> Devices & Services -> ESPHome and click "1 device" under your Satellite1 device.
- You can also set your preferred wake word. (NOTE: If you want to build your own custom wake word, then read here.)
Congratulations! You've created your own voice pipeline for your Satellite1.
Exposing Entities
Your Home Assistant instance likely has hundreds, if not thousands, of entities. If you want your voice assistant to control them, you need to expose them to it. Here's how: