Building a Multilingual Text-to-Speech Tool with Python and gTTS
Text-to-speech (TTS) technology has become increasingly important in our digital world, serving various purposes from accessibility improvements to content consumption on the go. Today, I'll walk you through a practical Python command-line tool that converts text files into speech in multiple languages using Google's Text-to-Speech (gTTS) service.
Key Features
- Support for 15 languages including English, Spanish, French, German, and more
- Simple command-line interface
- Handles UTF-8 encoded text files
- Generates MP3 audio files
- Easy installation and setup
Technical Implementation
The tool is built with Python and uses several key components:
- gTTS Library: The core functionality relies on Google's Text-to-Speech service through the gTTS Python package, which handles the text-to-speech conversion and supports multiple languages.
- Command Line Interface: Built using Python's argparse module, providing a user-friendly interface with options for:
  - Input text file path (-i or --input-file)
  - Output MP3 file path (-o or --output-file)
  - Language selection (-l or --language)
  - Language list display (--list-languages)
Here's the core text-to-speech conversion function:
import os
from pathlib import Path
from typing import Optional

from gtts import gTTS


def text_to_speech(text: str, lang: str, output_path: str) -> Optional[str]:
    """
    Convert text to speech using Google Text-to-Speech.

    Args:
        text: The text to convert to speech
        lang: Language code (e.g., 'es' for Spanish, 'en' for English)
        output_path: Full path for the output MP3 file

    Returns:
        Path to the generated audio file, or None if generation failed
    """
    try:
        # Verify language is supported
        if lang not in SUPPORTED_LANGUAGES:
            supported = ', '.join(f'{k} ({v})' for k, v in SUPPORTED_LANGUAGES.items())
            raise ValueError(
                f"Language code '{lang}' is not supported. Supported languages: {supported}")
        # Create output directory if it doesn't exist
        output_dir = os.path.dirname(output_path)
        if output_dir:
            Path(output_dir).mkdir(parents=True, exist_ok=True)
        # Generate speech using specified language
        tts = gTTS(text=text, lang=lang, slow=False)
        # Save the audio file
        tts.save(output_path)
        print(f"Successfully generated {SUPPORTED_LANGUAGES[lang]} audio file: {output_path}")
        return output_path
    except Exception as e:
        # Any failure, including the unsupported-language check, is reported here
        print(f"Error generating audio: {str(e)}")
        return None
The supported languages are defined in a dictionary:
SUPPORTED_LANGUAGES = {
    'en': 'English',
    'es': 'Spanish',
    'fr': 'French',
    'de': 'German',
    'it': 'Italian',
    'pt': 'Portuguese',
    'pl': 'Polish',
    'ru': 'Russian',
    'nl': 'Dutch',
    'cs': 'Czech',
    'ja': 'Japanese',
    'ko': 'Korean',
    'zh': 'Chinese',
    'ar': 'Arabic',
    'hi': 'Hindi'
}
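The --list-languages option can be served straight from this dictionary. Here is a minimal sketch, assuming a helper called format_language_list (a hypothetical name, not necessarily what the repository uses) and a three-entry excerpt of the dictionary so the snippet runs on its own:

```python
from typing import Mapping

# Excerpt of the dictionary above, repeated so the sketch is self-contained.
SUPPORTED_LANGUAGES = {
    'en': 'English',
    'es': 'Spanish',
    'fr': 'French',
}


def format_language_list(languages: Mapping[str, str]) -> str:
    """Return one 'code  name' line per language, sorted by code.

    Hypothetical helper for illustration only.
    """
    return "\n".join(f"{code}  {name}" for code, name in sorted(languages.items()))


if __name__ == "__main__":
    print(format_language_list(SUPPORTED_LANGUAGES))
```

Sorting by code keeps the listing stable even if entries are later added to the dictionary in a different order.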
The command-line interface is implemented using argparse:
import argparse


def main():
    parser = argparse.ArgumentParser(
        description='Convert text file to speech in multiple languages')
    parser.add_argument('--input-file', '-i', required=True,
                        help='Path to input text file (.txt)')
    parser.add_argument('--output-file', '-o', required=True,
                        help='Path to output audio file (.mp3)')
    parser.add_argument('--language', '-l', default='es',
                        help='Language code (e.g., es, en, fr). Use --list-languages to see all options')
    parser.add_argument('--list-languages', action='store_true',
                        help='List all supported languages and their codes')
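The excerpt above stops before the arguments are parsed, so it does not show how --list-languages is handled. Since argparse rejects any command line that omits an option marked required=True, a bare --list-languages call needs special treatment somewhere. A minimal sketch of one way to do that, which may differ from what the repository does: drop required=True and validate the file options after parsing. The names build_parser and parse_cli are hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose file options are optional at the argparse level."""
    parser = argparse.ArgumentParser(
        description='Convert text file to speech in multiple languages')
    parser.add_argument('--input-file', '-i', help='Path to input text file (.txt)')
    parser.add_argument('--output-file', '-o', help='Path to output audio file (.mp3)')
    parser.add_argument('--language', '-l', default='es', help='Language code')
    parser.add_argument('--list-languages', action='store_true',
                        help='List all supported languages and their codes')
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments; require the file paths only when not listing languages."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.list_languages and not (args.input_file and args.output_file):
        # parser.error prints the usage message and exits with status 2.
        parser.error('--input-file and --output-file are required')
    return args


if __name__ == "__main__":
    args = parse_cli(['--list-languages'])
    print(args.list_languages)  # True
```

Passing argv explicitly, as in the last lines, also makes the parser easy to exercise without touching sys.argv.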
Setup and Usage
Getting started with the tool is straightforward:
- Create and activate a Python virtual environment:
    python3.12 -m venv .venv
    source .venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Run the tool:
    # Convert English text to speech
    python text_to_speech.py -i data_examples/ibiza_en.txt -o output/ibiza_en.mp3 -l en
    # Convert Spanish text to speech
    python text_to_speech.py -i data_examples/ibiza_es.txt -o output/ibiza_es.mp3 -l es
    # List all supported languages
    python text_to_speech.py --list-languages
Example input text (ibiza_en.txt):
Ibiza is a beautiful Mediterranean island, part of Spain's Balearic Islands, famous for its vibrant nightlife, stunning beaches, and crystal-clear waters. The island is home to some of the world's most famous clubs, attracting top DJs and party-goers from around the globe.
Example Use Cases
The tool is particularly useful for:
- Language Learning: Create audio versions of study materials in different languages
- Content Creation: Convert written content into audio format for podcasts or audio books
- Accessibility: Make text content accessible to visually impaired users
- Travel Guides: Generate audio guides in multiple languages
Technical Details
The implementation uses several Python best practices:
- Type Hints: Functions include type annotations for better code clarity
- Modular Design: Functionality is split into focused functions
- Clear Documentation: Each function includes detailed docstrings
- Error Handling: The functions shown here wrap their I/O in try/except, print the error, and return None on failure
- Path Management: Uses pathlib for cross-platform compatibility
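On the path management point, the directory creation step in text_to_speech combines os.path.dirname with pathlib. A minimal sketch of the same step written with pathlib alone, assuming a helper called ensure_parent_dir (a hypothetical name that does not come from the repository):

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(output_path: str) -> Path:
    """Create the parent directory of output_path if needed and return the path.

    Hypothetical helper; it uses pathlib only, in place of os.path.dirname.
    """
    path = Path(output_path)
    # For a bare filename the parent is Path('.'), which already exists,
    # so exist_ok=True makes this call a no-op in that case.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = ensure_parent_dir(str(Path(tmp) / "output" / "ibiza_en.mp3"))
        print(target.parent.is_dir())  # True
```

The helper only prepares the directory; writing the MP3 itself is still left to the save step.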
Here's an example of the file reading function with error handling:
from typing import Optional


def read_text_file(file_path: str) -> Optional[str]:
    """Read text from a UTF-8 file; return None if the file cannot be read."""
    # The extension check runs outside the try block, so this error propagates
    if not file_path.endswith('.txt'):
        raise ValueError("Input file must have .txt extension")
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read().strip()
    except Exception as e:
        print(f"Error reading file: {str(e)}")
        return None
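One detail worth noticing is that the function signals failure in two different ways: a wrong extension raises ValueError, while a file that cannot be opened prints a message and returns None. The sketch below repeats the function so it runs on its own and exercises the three cases against a temporary directory. The sample sentence and file names are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_text_file(file_path: str) -> Optional[str]:
    """Read text from a file (same logic as the function above)."""
    if not file_path.endswith('.txt'):
        raise ValueError("Input file must have .txt extension")
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read().strip()
    except Exception as e:
        print(f"Error reading file: {str(e)}")
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample_es.txt"
        sample.write_text("  Ibiza es una isla mediterránea.\n", encoding="utf-8")

        # Existing UTF-8 file: surrounding whitespace is stripped.
        print(read_text_file(str(sample)))

        # Missing .txt file: the error is printed and None is returned.
        print(read_text_file(str(Path(tmp) / "missing.txt")))

        # Wrong extension: ValueError propagates to the caller.
        try:
            read_text_file(str(Path(tmp) / "notes.md"))
        except ValueError as exc:
            print(f"Rejected: {exc}")
```

A caller therefore needs both a None check and a ValueError handler, or the function could be changed to report every failure the same way.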
Conclusion
This text-to-speech tool demonstrates how powerful speech synthesis capabilities can be implemented with relatively simple Python code. Whether you're looking to create audio content in multiple languages or need a reliable way to convert text to speech, this tool provides a solid foundation.
The complete source code and documentation are available on GitHub: https://github.com/tomdwor/text-to-speech