DeepL, known for text translation, now wants to translate your voice

Learn to build a basic voice translation application using DeepL's API and Python. This beginner-friendly tutorial shows you how to capture spoken input, translate it with DeepL, and speak the result aloud.

Introduction

In this tutorial, you'll build a basic voice translation application with Python and DeepL's API. It's aimed at beginners who want to see how speech recognition, machine translation, and speech synthesis fit together in one program. The program listens for a spoken phrase, converts it to text, translates that text with DeepL's text translation API, and reads the translation aloud. DeepL handles only the translation step; speech recognition and text-to-speech come from other libraries.

Prerequisites

Before starting this tutorial, you'll need:

A computer with a recent version of Python 3 installed (newer releases of some of the libraries used here no longer support Python 3.6)
An internet connection
A DeepL API key (you can get a free one from DeepL's website)
Basic understanding of how to use a command line interface
A microphone, plus speakers or headphones, connected to your computer

Step-by-Step Instructions

Step 1: Set Up Your Python Environment

First, we need to create a new Python project folder and install the required libraries. Open your command line interface and run these commands:

mkdir deepL_voice_translator
cd deepL_voice_translator
python -m venv venv
source venv/bin/activate   # On Windows use: venv\Scripts\activate

Why this step? Creating a virtual environment keeps our project dependencies isolated from other Python projects on your computer, preventing conflicts between different library versions.
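If you want to confirm the environment is active before installing anything, one quick check is to compare sys.prefix with sys.base_prefix, which differ only inside a virtual environment. The sketch below assumes you save it as a separate script; the function name in_virtualenv is just an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """Hypothetical helper: True when this interpreter runs inside a venv.

    The venv module points sys.prefix at the environment's folder, while
    sys.base_prefix keeps pointing at the Python installation it was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activate command first.")
```

Run it with the same python command you'll use for the tutorial, after activating the environment.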

Step 2: Install Required Libraries

Now we'll install the libraries we need for our voice translation application:

pip install deepl
pip install pyaudio
pip install speechrecognition
pip install pyttsx3

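Why this step? Each library serves a specific purpose: deepl is DeepL's official Python client for translation, PyAudio gives SpeechRecognition access to your microphone, SpeechRecognition converts the recorded audio into text, and pyttsx3 speaks the translated text aloud. On macOS and Linux, PyAudio usually needs the PortAudio library installed first (for example, brew install portaudio on macOS or sudo apt install portaudio19-dev on Debian and Ubuntu).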
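Once the installs finish, you can check that each package can be found without actually importing it, using importlib.util.find_spec from the standard library. Note that the pip name and the import name differ for SpeechRecognition (you import speech_recognition). This is a sketch; check_dependencies and REQUIRED_MODULES are hypothetical names.

```python
import importlib.util

# Hypothetical table: pip package name -> module name used in "import ..."
REQUIRED_MODULES = {
    "deepl": "deepl",
    "pyaudio": "pyaudio",
    "SpeechRecognition": "speech_recognition",
    "pyttsx3": "pyttsx3",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {pip_name: True/False} depending on whether each module can be located."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in modules.items()
    }


if __name__ == "__main__":
    for pip_name, found in check_dependencies().items():
        status = "ok" if found else f"missing (try: pip install {pip_name})"
        print(f"{pip_name}: {status}")
```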

Step 3: Get Your DeepL API Key

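Visit DeepL's API page and sign up for the DeepL API Free plan. Once your account is set up, you'll find your API key (DeepL calls it an authentication key) in your account settings. Copy this key and keep it private, since anyone who has it can use your translation quota.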

Why this step? The API key authenticates your application with DeepL's servers, allowing you to use their translation services. Without it, you won't be able to access the translation functionality.
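Pasting the key into translator.py (as Step 4 does) is fine for a first experiment, but it's easy to share or commit by accident. A common alternative, sketched below, is to read it from an environment variable. The variable name DEEPL_AUTH_KEY and the helpers load_deepl_key and is_free_plan_key are assumptions for illustration. Keys for the DeepL API Free plan end in :fx, and the deepl library uses that suffix to pick the Free API endpoint automatically, so the second helper is purely informational.

```python
import os

FREE_KEY_SUFFIX = ":fx"  # DeepL API Free keys end with this suffix


def load_deepl_key(env_var: str = "DEEPL_AUTH_KEY") -> str:
    """Hypothetical helper: read the DeepL key from an (assumed) environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your DeepL API key.")
    return key


def is_free_plan_key(key: str) -> bool:
    """Hypothetical helper: True if the key belongs to the DeepL API Free plan."""
    return key.endswith(FREE_KEY_SUFFIX)
```

You could then write DEEPL_API_KEY = load_deepl_key() in translator.py, after running export DEEPL_AUTH_KEY="your-key" (macOS/Linux) or $env:DEEPL_AUTH_KEY="your-key" (Windows PowerShell) in the same terminal.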

Step 4: Create the Main Translation Script

Create a new file called translator.py and add this basic structure:

import speech_recognition as sr
import deepl
import pyttsx3

# Initialize the speech recognizer
recognizer = sr.Recognizer()

# Initialize text-to-speech engine
engine = pyttsx3.init()

# Set your DeepL API key here
DEEPL_API_KEY = 'YOUR_DEEPL_API_KEY_HERE'

Why this step? This sets up the basic structure of our application and imports all the necessary libraries we'll use for voice recognition, translation, and text-to-speech.

Step 5: Add Voice Input Functionality

Add this function to your translator.py file:

def listen_for_speech():
    with sr.Microphone() as source:
        print("Listening... Speak now!")
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    
    try:
        text = recognizer.recognize_google(audio)
        print(f"You said: {text}")
        return text
    except sr.UnknownValueError:
        print("Sorry, I couldn't understand what you said.")
        return None
    except sr.RequestError:
        print("Sorry, there was an error with the speech recognition service.")
        return None

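Why this step? This function records a phrase from your microphone, sends the audio to Google's free Web Speech API (through SpeechRecognition's recognize_google method) to turn it into text, and handles the two most common failures: speech that couldn't be understood (UnknownValueError) and a problem reaching the recognition service (RequestError). The call to adjust_for_ambient_noise first samples about a second of background noise so the recognizer can tell speech from silence. Note that this step uses Google's service, not DeepL, and it expects US English unless you pass a different language code.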
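recognize_google assumes US English (its language parameter defaults to "en-US"), so to translate from another spoken language you pass a different language tag, for example recognizer.recognize_google(audio, language="fr-FR"). You can also tell DeepL the source language through the source_lang argument of translate_text instead of relying on automatic detection, which can help with very short phrases. DeepL source codes are plain language codes such as FR, without a region, so a small conversion is needed. This is a sketch; deepl_source_code and SPOKEN_LANGUAGE are hypothetical names, and the simple split only suits tags shaped like fr-FR or de-DE.

```python
def deepl_source_code(recognition_locale: str) -> str:
    """Hypothetical helper: turn a recognition tag such as 'fr-FR' into a
    DeepL source-language code such as 'FR' (DeepL source codes carry no region)."""
    return recognition_locale.replace("_", "-").split("-")[0].upper()


SPOKEN_LANGUAGE = "fr-FR"  # hypothetical setting: the language you will speak

if __name__ == "__main__":
    print(deepl_source_code(SPOKEN_LANGUAGE))  # FR
```

In translate_text, the call would then look like translator.translate_text(text, source_lang=deepl_source_code(SPOKEN_LANGUAGE), target_lang=target_language).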

Step 6: Add Translation Functionality

Add this function to handle the translation:

def translate_text(text, target_language='DE'):
    try:
        # Initialize DeepL translator
        translator = deepl.Translator(DEEPL_API_KEY)
        
        # Perform translation
        result = translator.translate_text(text, target_lang=target_language)
        
        print(f"Translated text: {result.text}")
        return result.text
    except Exception as e:
        print(f"Translation error: {e}")
        return None

Why this step? This function connects to DeepL's API and translates the recognized text into your desired language. The 'DE' target language means German - you can change this to any supported language code.
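translate_text as written creates a new deepl.Translator on every call, and each repeated phrase counts against your character quota again. One option, sketched here under the assumption that keeping translations in memory for the session is acceptable, is to build the translator once and wrap it in functools.lru_cache. make_cached_translate is a hypothetical helper; it accepts any function shaped like translate(text, target_lang) -> str, so you can try it without an API key.

```python
from functools import lru_cache
from typing import Callable

TranslateFn = Callable[[str, str], str]


def make_cached_translate(translate: TranslateFn, maxsize: int = 256) -> TranslateFn:
    """Hypothetical helper: wrap translate(text, target_lang) so repeated
    requests reuse earlier results instead of calling the API again.

    lru_cache stores only successful return values; if translate raises,
    nothing is cached and the next call tries again.
    """
    @lru_cache(maxsize=maxsize)
    def cached(text: str, target_lang: str) -> str:
        return translate(text, target_lang)

    return cached
```

With DeepL, you might wire it up like this:

translator = deepl.Translator(DEEPL_API_KEY)  # create the client once
cached_translate = make_cached_translate(
    lambda text, lang: translator.translate_text(text, target_lang=lang).text
)

Unlike translate_text, the wrapped function lets errors propagate, so keep a try/except around calls to it.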

Step 7: Add Text-to-Speech Output

Add this function to speak the translated text:

def speak_text(text):
    if text:
        engine.say(text)
        engine.runAndWait()
        print(f"Spoken: {text}")

Why this step? This function uses your operating system's built-in speech engine, through pyttsx3 (which works offline), to read the translated text aloud, completing the loop from voice input to voice output.
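By default, pyttsx3 speaks with your system's default voice, which is often an English voice, so German output may be read with English pronunciation. pyttsx3 lists installed voices through engine.getProperty('voices'); each voice has id, name, and languages attributes, but what languages contains varies by platform (eSpeak on Linux may report bytes, while Windows voices often report an empty list). The helper below is a heuristic sketch; pick_voice and its name_hint fallback are hypothetical, and you may need to install a German voice in your operating system first.

```python
from typing import Optional, Sequence


def pick_voice(voices: Sequence, lang_prefix: str, name_hint: Optional[str] = None):
    """Hypothetical helper: return the first voice whose language starts with
    lang_prefix (e.g. 'de'), else the first whose name contains name_hint
    (e.g. 'German'). Returns None if nothing matches."""
    prefix = lang_prefix.lower()
    for voice in voices:
        for lang in getattr(voice, "languages", None) or []:
            # Some backends (e.g. eSpeak on Linux) report bytes such as b'\x05de'
            if isinstance(lang, bytes):
                lang = lang.decode("utf-8", errors="ignore")
            # Drop non-printable characters, then compare case-insensitively
            cleaned = "".join(ch for ch in str(lang) if ch.isprintable()).lower()
            if cleaned.startswith(prefix):
                return voice
    if name_hint:
        hint = name_hint.lower()
        for voice in voices:
            if hint in (getattr(voice, "name", None) or "").lower():
                return voice
    return None
```

To use it, you might add this after creating the engine:

voice = pick_voice(engine.getProperty('voices'), 'de', name_hint='German')
if voice:
    engine.setProperty('voice', voice.id)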

Step 8: Create the Main Program Loop

Add the main execution logic to your script:

def main():
    print("DeepL Voice Translator Started")
    print("Press Ctrl+C to exit")
    
    while True:
        try:
            # Listen for speech
            text = listen_for_speech()
            
            if text:
                # Translate the text
                translated_text = translate_text(text, 'DE')  # Translate to German
                
                if translated_text:
                    # Speak the translation
                    speak_text(translated_text)
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break

if __name__ == "__main__":
    main()

Why this step? This creates the main loop of our application, which continuously listens for speech, translates it, and speaks the result. Catching KeyboardInterrupt lets you stop the program with Ctrl+C and see a goodbye message instead of a traceback.
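Because the loop body depends on a microphone, the network, and speakers, it's hard to test directly. One option, shown as a sketch, is to pull a single listen-translate-speak cycle into a function that receives the three steps as arguments. translate_utterance is a hypothetical name; passing listen_for_speech, translate_text, and speak_text reproduces the original behavior, while passing simple stand-in functions lets you check the logic without any hardware.

```python
from typing import Callable, Optional


def translate_utterance(
    listen: Callable[[], Optional[str]],
    translate: Callable[[str, str], Optional[str]],
    speak: Callable[[str], None],
    target_lang: str = "DE",
) -> Optional[str]:
    """Hypothetical helper: run one listen -> translate -> speak cycle.

    Returns the translation, or None if nothing was recognized or translated.
    """
    text = listen()
    if not text:
        return None  # nothing recognized; the caller simply listens again
    translated = translate(text, target_lang)
    if translated:
        speak(translated)
    return translated
```

Inside main(), the body of the try block would then become translate_utterance(listen_for_speech, translate_text, speak_text).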

Step 9: Test Your Application

Before running, replace YOUR_DEEPL_API_KEY_HERE with your actual DeepL API key. Then run:

python translator.py

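Why this step? Running the program end to end confirms that your microphone, the speech recognition service, DeepL, and the text-to-speech engine all work together, and exposes problems such as a wrong API key or a missing microphone before you rely on it. Because the recognizer expects US English by default, start by speaking a short English phrase; you should hear the German translation.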

Step 10: Customize Your Translation Settings

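Experiment with different target language codes by changing the code passed to translate_text in main(). For English and Portuguese, the deepl Python library requires a regional variant and rejects a bare EN or PT:

EN-US - English (American)
EN-GB - English (British)
FR - French
ES - Spanish
IT - Italian
PT-PT - Portuguese (European)
PT-BR - Portuguese (Brazilian)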

Why this step? Different language codes allow you to translate between various languages, making your application more versatile for different use cases.
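The deepl Python library rejects a bare EN or PT target and asks for EN-US/EN-GB or PT-PT/PT-BR instead; in this tutorial's translate_text, that error would only be caught and printed as a translation error after you finished speaking. If you let users choose the target language, a small up-front check, sketched below, gives quicker feedback. check_target_lang and REGIONAL_VARIANTS are hypothetical names that cover only these two cases; for the full, current list, the library's translator.get_target_languages() method returns the supported targets.

```python
# Hypothetical table: target codes that need a regional variant in the deepl library
REGIONAL_VARIANTS = {"EN": ("EN-US", "EN-GB"), "PT": ("PT-PT", "PT-BR")}


def check_target_lang(code: str) -> str:
    """Hypothetical helper: upper-case a target code and reject bare EN/PT."""
    code = code.strip().upper()
    if code in REGIONAL_VARIANTS:
        options = " or ".join(REGIONAL_VARIANTS[code])
        raise ValueError(f"Target {code!r} needs a regional variant: use {options}.")
    return code
```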

Summary

Congratulations! You've built a basic voice translation application using DeepL's technology. The program chains speech recognition, DeepL text translation, and text-to-speech one phrase at a time, a simplified take on the kind of voice translation DeepL is developing for meeting tools like Zoom and Microsoft Teams. While this example is basic, it shows the fundamental concepts behind voice translation technology. You can expand it by adding features like language selection menus, saving translation history, or integrating with video conferencing platforms.