Building a Multilingual Text-to-Speech Tool with Python and gTTS

Text-to-speech (TTS) technology is useful for everything from accessibility to listening to content on the go. Today, I'll walk you through a practical Python command-line tool that converts text files into speech in multiple languages using gTTS, a Python library that interfaces with Google Translate's text-to-speech API.

Key Features

  • Support for 15 languages including English, Spanish, French, German, and more
  • Simple command-line interface
  • Handles UTF-8 encoded text files
  • Generates MP3 audio files
  • Easy installation and setup

Technical Implementation

The tool is built with Python and uses several key components:

  1. gTTS Library: The core functionality relies on Google's Text-to-Speech service through the gTTS Python package, which handles the text-to-speech conversion and supports multiple languages.
  2. Command Line Interface: Built using Python's argparse module, providing a user-friendly interface with options for:
    • Input text file path (-i or --input-file)
    • Output MP3 file path (-o or --output-file)
    • Language selection (-l or --language, defaults to es for Spanish)
    • Language list display (--list-languages)

Here's the core text-to-speech conversion function:

import os
from pathlib import Path

from gtts import gTTS


def text_to_speech(text: str, lang: str, output_path: str) -> str | None:
    """
    Convert text to speech using Google Text-to-Speech.

    Args:
        text: The text to convert to speech
        lang: Language code (e.g., 'es' for Spanish, 'en' for English)
        output_path: Full path for the output MP3 file

    Returns:
        Path to the generated audio file, or None if generation failed
    """
    try:
        # Verify language is supported
        if lang not in SUPPORTED_LANGUAGES:
            raise ValueError(
                f"Language code '{lang}' is not supported. Supported languages: {', '.join(f'{k} ({v})' for k, v in SUPPORTED_LANGUAGES.items())}")

        # Create output directory if it doesn't exist
        output_dir = os.path.dirname(output_path)
        if output_dir:
            Path(output_dir).mkdir(parents=True, exist_ok=True)

        # Generate speech using specified language
        tts = gTTS(text=text, lang=lang, slow=False)

        # Save the audio file
        tts.save(output_path)
        print(f"Successfully generated {SUPPORTED_LANGUAGES[lang]} audio file: {output_path}")
        return output_path

    except Exception as e:
        print(f"Error generating audio: {str(e)}")
        return None
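
Because gTTS needs network access to reach Google's service, it can help to exercise the pre-flight steps on their own. The following is a minimal sketch, not part of the original tool: prepare_output_path is a hypothetical helper that mirrors the validation and directory-creation steps above using pathlib only, and the dictionary is a shortened stand-in for SUPPORTED_LANGUAGES.

```python
from pathlib import Path

# Shortened stand-in for the tool's SUPPORTED_LANGUAGES dictionary (assumption).
SUPPORTED_LANGUAGES = {"en": "English", "es": "Spanish"}


def prepare_output_path(lang: str, output_path: str) -> Path:
    """Hypothetical helper: validate the language code and create the output directory."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Language code '{lang}' is not supported.")
    path = Path(output_path)
    # For a bare file name the parent is the current directory,
    # so mkdir with exist_ok=True does nothing.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Keeping these steps separate from the gTTS call means they can be tested offline, and a bad language code fails before any request is made.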

The supported languages are defined in a dictionary:

SUPPORTED_LANGUAGES = {
    'en': 'English',
    'es': 'Spanish',
    'fr': 'French',
    'de': 'German',
    'it': 'Italian',
    'pt': 'Portuguese',
    'pl': 'Polish',
    'ru': 'Russian',
    'nl': 'Dutch',
    'cs': 'Czech',
    'ja': 'Japanese',
    'ko': 'Korean',
    'zh': 'Chinese',
    'ar': 'Arabic',
    'hi': 'Hindi'
}
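
The same dictionary can drive the --list-languages output. The excerpt does not show how the tool prints the list, so the following is only a sketch: format_language_list is a hypothetical helper, and the column layout is an assumption.

```python
def format_language_list(languages: dict[str, str]) -> str:
    """Hypothetical helper: one 'code  name' line per language, sorted by code."""
    lines = [f"{code:<4}{name}" for code, name in sorted(languages.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    # Shortened stand-in for SUPPORTED_LANGUAGES (assumption).
    print(format_language_list({"es": "Spanish", "en": "English", "fr": "French"}))
```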

The command-line interface is implemented using argparse:

def main():
    parser = argparse.ArgumentParser(description='Convert text file to speech in multiple languages')
    parser.add_argument('--input-file', '-i', required=True,
                        help='Path to input text file (.txt)')
    parser.add_argument('--output-file', '-o', required=True,
                        help='Path to output audio file (.mp3)')
    parser.add_argument('--language', '-l', default='es',
                        help='Language code (e.g., es, en, fr). Use --list-languages to see all options')
    parser.add_argument('--list-languages', action='store_true',
                        help='List all supported languages and their codes')
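
The excerpt stops once the arguments are declared. One thing to keep in mind is that argparse rejects any command line that lacks an option marked required=True, so a bare --list-languages call needs some special handling; how the full script deals with this is not shown here. A minimal sketch of one approach, assuming the two file options are left optional and checked by hand (build_parser and parse_cli are hypothetical names):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert text file to speech in multiple languages")
    # Assumption: not marked required, so --list-languages can be used alone.
    parser.add_argument("--input-file", "-i",
                        help="Path to input text file (.txt)")
    parser.add_argument("--output-file", "-o",
                        help="Path to output audio file (.mp3)")
    parser.add_argument("--language", "-l", default="es",
                        help="Language code (e.g., es, en, fr)")
    parser.add_argument("--list-languages", action="store_true",
                        help="List all supported languages and their codes")
    return parser


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce the file options unless only listing languages."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.list_languages and not (args.input_file and args.output_file):
        # parser.error prints the usage message and exits with status 2
        parser.error("--input-file and --output-file are required "
                     "unless --list-languages is given")
    return args
```

Taking argv as a parameter, rather than reading sys.argv implicitly, keeps the parsing logic easy to test.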

Setup and Usage

Getting started with the tool is straightforward:

  1. Create and activate a Python virtual environment:

    python3.12 -m venv .venv
    source .venv/bin/activate

  2. Install dependencies:

    pip install -r requirements.txt

  3. Run the tool:

    # Convert English text to speech
    python text_to_speech.py -i data_examples/ibiza_en.txt -o output/ibiza_en.mp3 -l en
    
    # Convert Spanish text to speech
    python text_to_speech.py -i data_examples/ibiza_es.txt -o output/ibiza_es.mp3 -l es
    
    # List all supported languages
    python text_to_speech.py --list-languages

Example input text (ibiza_en.txt):

Ibiza is a beautiful Mediterranean island, part of Spain's Balearic Islands, famous for its vibrant nightlife, stunning beaches, and crystal-clear waters. The island is home to some of the world's most famous clubs, attracting top DJs and party-goers from around the globe.

Example Use Cases

The tool is particularly useful for:

  1. Language Learning: Create audio versions of study materials in different languages
  2. Content Creation: Convert written content into audio format for podcasts or audio books
  3. Accessibility: Make text content accessible to visually impaired users
  4. Travel Guides: Generate audio guides in multiple languages

Technical Details

The implementation uses several Python best practices:

  • Type Hints: Functions include type annotations for better code clarity
  • Modular Design: Functionality is split into focused functions
  • Clear Documentation: Each function includes detailed docstrings
  • Error Handling: Functions catch exceptions, print a message, and return None instead of crashing
  • Path Management: Uses pathlib for cross-platform compatibility

Here's an example of the file reading function with error handling:

def read_text_file(file_path: str) -> str | None:
    """Read text from a UTF-8 .txt file; return None if the file cannot be read."""
    if not file_path.endswith('.txt'):
        raise ValueError("Input file must have .txt extension")

    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read().strip()
    except Exception as e:
        print(f"Error reading file: {str(e)}")
        return None
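
Note that this function raises ValueError for a wrong extension but returns None for read failures, so callers have to handle both. As an alternative sketch, not the original code, a variant could let every failure surface as an exception; read_text_file_strict is a hypothetical name.

```python
from pathlib import Path


def read_text_file_strict(file_path: str) -> str:
    """Hypothetical variant: raise on any failure instead of returning None."""
    path = Path(file_path)
    if path.suffix.lower() != ".txt":
        raise ValueError("Input file must have .txt extension")
    # FileNotFoundError and UnicodeDecodeError propagate to the caller,
    # which can decide how to report them.
    return path.read_text(encoding="utf-8").strip()
```

With this shape the caller wraps a single try/except around the call and never passes None on to the speech step by accident.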

Conclusion

This text-to-speech tool demonstrates how powerful speech synthesis capabilities can be implemented with relatively simple Python code. Whether you're looking to create audio content in multiple languages or need a reliable way to convert text to speech, this tool provides a solid foundation.

The complete source code and documentation are available on GitHub: https://github.com/tomdwor/text-to-speech
