AI voice startup Vapi hits $500M valuation after winning Amazon Ring over 40 rivals

Learn to build a basic AI voice assistant that can understand spoken questions and respond with intelligent answers using Python and OpenAI's API.

Introduction

In this tutorial, you'll build a basic AI voice assistant in Python that could serve as a starting point for customer support automation. It follows the same broad pipeline that voice-agent platforms such as Vapi build on: transcribe speech, generate a reply with a language model, and speak the reply. The assistant uses the SpeechRecognition library to turn spoken questions into text, the OpenAI API to generate an answer, and pyttsx3 to read that answer aloud.

Prerequisites

Basic Python knowledge
Python 3.8 or higher installed on your computer (recent releases of the openai package no longer support 3.7)
An OpenAI API key (create one at platform.openai.com; API usage is billed to your account)
Microphone and speakers or headphones for testing
Internet connection

Step-by-Step Instructions

1. Set up your Python environment

First, we need to create a virtual environment to keep our project organized and avoid conflicts with other Python packages.

python -m venv ai_voice_assistant
source ai_voice_assistant/bin/activate  # On Windows use: ai_voice_assistant\Scripts\activate

This creates an isolated environment for the project, so the packages you install here do not affect your system Python.
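If you are unsure whether the environment is active, a quick check like the sketch below can help. The helper name in_virtualenv is made up for this example; it relies on the standard library only and compares sys.prefix with sys.base_prefix, which differ inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
```

Run it with the same python command you will use for the assistant. If it prints False, activate the environment again before installing packages.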

2. Install required packages

We'll need several Python libraries for voice processing and API communication:

pip install openai pyaudio speechrecognition pyttsx3

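These packages provide, in order: access to the OpenAI API (openai), microphone input (pyaudio), speech recognition (speechrecognition), and text-to-speech conversion (pyttsx3). If pyaudio fails to install on macOS or Linux, you may need to install the PortAudio system library first.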
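To confirm the installation worked, you could run a small check like the one below. It is a sketch, not part of the assistant itself: the helper missing_modules is a hypothetical name, and the list uses import names, which differ from the pip names for some packages.

```python
from importlib.util import find_spec

# Import names, not pip names: the speechrecognition package is imported
# as speech_recognition.
REQUIRED_MODULES = ["openai", "pyaudio", "speech_recognition", "pyttsx3"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

find_spec only looks the module up without importing it, so the check stays fast and does not touch your microphone or audio drivers.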

3. Create your main Python script

Create a file called voice_assistant.py and start with this basic structure:

import speech_recognition as sr
import openai
import pyttsx3
import os

# Initialize text-to-speech engine
engine = pyttsx3.init()

# Set up OpenAI API key
openai.api_key = os.getenv('OPENAI_API_KEY')

# Initialize speech recognizer
recognizer = sr.Recognizer()

This sets up the core components we'll use: speech recognition for understanding what you say, text-to-speech for responding, and OpenAI API for generating intelligent responses.

4. Create a function to listen for voice input

Add this function to handle voice input from your microphone:

def listen_for_input():
    with sr.Microphone() as source:
        print("Listening...")
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    
    try:
        text = recognizer.recognize_google(audio)
        print(f"You said: {text}")
        return text
    except sr.UnknownValueError:
        print("Sorry, I didn't understand that.")
        return None
    except sr.RequestError:
        print("Could not request results from Google Speech Recognition service.")
        return None

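This function uses Google's speech recognition service to convert your spoken words into text, so it needs an internet connection. The call to adjust_for_ambient_noise samples the microphone for about one second before listening, which helps the recognizer distinguish your voice from background sound. The function returns None when the audio could not be understood or the service could not be reached.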
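Because listen_for_input returns None on failure, a caller may want to give the user a few more tries before giving up. The wrapper below is one possible sketch; listen_with_retries and max_attempts are names invented for this example, and the function accepts any callable so it can be tried without a microphone.

```python
def listen_with_retries(listen_fn, max_attempts=3):
    """Call listen_fn until it returns text, up to max_attempts times.

    listen_fn is any zero-argument callable that returns a string on
    success and None (or an empty string) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        text = listen_fn()
        if text:
            return text
        print(f"Attempt {attempt} of {max_attempts} produced no text.")
    return None
```

In the assistant you would call listen_with_retries(listen_for_input) and treat a None result as a signal to prompt the user again or end the session.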

5. Create a function to get AI responses

Now we'll add the function that sends your questions to OpenAI and gets intelligent responses:

def get_ai_response(user_input):
    try:
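        # openai>=1.0 uses openai.chat.completions.create; the older
        # openai.ChatCompletion.create call was removed in that release.
        response = openai.chat.completions.create(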
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful customer service assistant."},
                {"role": "user", "content": user_input}
            ],
            max_tokens=150,
            temperature=0.7
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        print(f"Error getting AI response: {e}")
        return "Sorry, I'm having trouble connecting to the service."

This function sends your question to OpenAI's gpt-3.5-turbo model and returns its reply. The system prompt tells the model to act as a customer service assistant, and max_tokens=150 caps the reply length, which suits answers that will be spoken aloud.
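The function above sends each question on its own, so the model does not see earlier turns. One way to add short-term memory is to build the messages list from recent history, as in this sketch. The names build_messages and max_turns, and the choice to store history as (user, assistant) pairs, are assumptions made for illustration.

```python
SYSTEM_PROMPT = "You are a helpful customer service assistant."


def build_messages(history, user_input, max_turns=5):
    """Build a chat messages list: system prompt, recent turns, new question.

    history is a list of (user_text, assistant_text) pairs, oldest first.
    """
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    # Keep only the most recent turns so each request stays small.
    recent = history[-max_turns:] if max_turns > 0 else []
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_input})
    return messages
```

You could pass the result as the messages argument in get_ai_response and append each (question, reply) pair to history after a successful call.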

6. Create a function to speak the AI response

Add this function to convert the AI's text response into spoken words:

def speak_response(response):
    print(f"AI says: {response}")
    engine.say(response)
    engine.runAndWait()

This uses the pyttsx3 library to convert text to speech, making our assistant interactive and voice-based.
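Longer replies can be spoken one sentence at a time, which gives the program a chance to stop between sentences. The splitter below is a rough sketch using a regular expression; split_into_sentences is a hypothetical helper, and it will split incorrectly after abbreviations such as "Dr." because it only looks at punctuation.

```python
import re


def split_into_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Possible use inside speak_response (engine is the pyttsx3 engine):
#     for sentence in split_into_sentences(response):
#         engine.say(sentence)
#         engine.runAndWait()
```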

7. Create the main interaction loop

Now we'll put everything together in a main loop:

import time  # better placed with the other imports at the top of the file

def main():
    print("AI Voice Assistant initialized. Say 'quit' to exit.")
    
    while True:
        user_input = listen_for_input()
        
        if user_input:
            if 'quit' in user_input.lower():
                print("Goodbye!")
                break
            
            ai_response = get_ai_response(user_input)
            speak_response(ai_response)
        
        # Small delay to prevent rapid processing
        time.sleep(0.1)

if __name__ == "__main__":
    main()

This loop keeps the assistant running, listening for your input, getting responses from AI, and speaking them aloud until you say 'quit'.
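The substring check in the loop also matches words that merely contain "quit", such as "quite". A whole-word check avoids that. The sketch below is one option; the helper name is_exit_command and the extra exit words are illustrative choices, not part of the original script.

```python
import re

# Words that end the session. "exit" and "goodbye" are illustrative extras.
EXIT_WORDS = {"quit", "exit", "goodbye"}


def is_exit_command(text):
    """Return True if any whole word in text is an exit word."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in EXIT_WORDS for word in words)
```

Note that a sentence like "I want to quit my subscription" would still end the session. If that matters, you could require the exit word to be the entire utterance.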

8. Set up your API key

Create a file called .env in your project directory and add your OpenAI API key:

OPENAI_API_KEY=your_actual_api_key_here

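Then add these two lines at the very top of voice_assistant.py, before the line that reads OPENAI_API_KEY with os.getenv. If they run later, the key will not be loaded yet when the script reads it: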

from dotenv import load_dotenv
load_dotenv()

Don't forget to install python-dotenv: pip install python-dotenv
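If the key is missing, the first API call fails with an error that can be hard to trace back to configuration. A small startup check makes the problem obvious. This is a sketch: require_api_key is a made-up helper, and it takes the environment as a parameter so it can be tried with a plain dictionary.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from env, or raise a clear error if it is unset."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to .env or export it in your shell."
        )
    return key
```

In the script, you could call require_api_key() right after load_dotenv() and assign the result to openai.api_key.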

9. Test your voice assistant

Run your script with: python voice_assistant.py

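When you see "Listening...", speak clearly into your microphone. Try asking questions like "How do I reset my password?" or "What is your return policy?" The assistant prints each reply and reads it aloud. The model has no access to live data such as the current time or your account details, so expect general answers rather than specific ones.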

Summary

In this tutorial, you've built a basic AI voice assistant that listens to your voice, sends your question to OpenAI's language model, and responds with both text and spoken answers. This is a simplified version of the pipeline behind commercial voice agents from companies like Vapi: speech recognition, a language model, and text-to-speech. Production systems add telephony, lower latency, and integration with business data, but the basic flow is the same.

Key concepts covered include: voice input recognition, API integration with OpenAI, text-to-speech conversion, and creating interactive voice applications. This foundation can be expanded with features like better error handling, more sophisticated conversation flow, and integration with actual customer support systems.