Text-to-Video in Python

Posted in category Tutorials
Eric David Smith
Software Engineer / Musician / Entrepreneur

Tutorial: Text-to-Video using Python

Video content is king. However, producing engaging video content can be time-consuming or require expensive software. Luckily, with Python and a few libraries, we can automate the creation of dynamic text-to-video content. This post will guide you through creating a Python script that transforms a text file into a video with a changing background color.


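Before we start, make sure to install the following Python libraries: Pillow, gTTS, MoviePy, and pydub. The colorsys module is part of the Python standard library, and NumPy is installed as a dependency of MoviePy, so neither needs a separate install.

You can install these libraries with pip. The script imports from moviepy.editor, which MoviePy 2.0 removed, so the command pins MoviePy to a 1.x release:

pip install pillow gTTS "moviepy<2" pydub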
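
If you want to confirm the install worked before running anything, a small check like the one below can help. This is only a sketch: the helper name missing_modules is made up for this example, and the list uses the import names (PIL, gtts, moviepy, pydub, numpy), which differ from some of the pip package names.

```python
from importlib.util import find_spec

# Import names (not pip names) used by the script in this post.
REQUIRED_MODULES = ["PIL", "gtts", "moviepy", "pydub", "numpy"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

An empty result means every module was found; anything listed still needs a pip install.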

The Python Script

Create a file called text_to_video.py (or any name you like) and copy the following code into it:

from PIL import ImageFont, ImageDraw, Image
from gtts import gTTS
from moviepy.editor import ImageSequenceClip, AudioFileClip
import argparse
from pydub import AudioSegment
import colorsys
import numpy as np
import os

# Variables for customization
TEXT_SPEED = 24  # frames per second
TEXT_COLOR = (255, 255, 255)
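FONT_PATH = "DMSerifDisplay-Regular.ttf"  # Path to .ttf font file (change this to your font file)
FONT_SIZE = 100  # Font size in points (assumed value: the script uses FONT_SIZE, but its definition is not shown in this post)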
BACKGROUND_SPEED = 0.8  # Background color change speed (lower value means slower)
TIMING_ADJUSTMENT = -0.3  # Adjusts the duration of each word in the video
START_BG_COLOR = "#000000"  # Start color in HEX
END_BG_COLOR = "#6638f0"  # End color in HEX

# Function to convert HEX color to RGB
def hex_to_rgb(hex_color):
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i : i + 2], 16) for i in (0, 2, 4))

# interpolate color
def interpolate_color(start_color, end_color, progress):
    start_color = hex_to_rgb(start_color)
    end_color = hex_to_rgb(end_color)

    start_h, start_s, start_v = colorsys.rgb_to_hsv(
        start_color[0] / 255, start_color[1] / 255, start_color[2] / 255
    )
    end_h, end_s, end_v = colorsys.rgb_to_hsv(
        end_color[0] / 255, end_color[1] / 255, end_color[2] / 255
    )

    interpolated_h = start_h + (end_h - start_h) * progress
    interpolated_s = start_s + (end_s - start_s) * progress
    interpolated_v = start_v + (end_v - start_v) * progress

    r, g, b = colorsys.hsv_to_rgb(interpolated_h, interpolated_s, interpolated_v)

    return int(r * 255), int(g * 255), int(b * 255)

def text_to_video(textfile, outputfile):
    with open(textfile, "r") as f:
        lines = f.read()

    words = lines.split()
    images = []
    durations = []

    fnt = ImageFont.truetype(FONT_PATH, FONT_SIZE)

    # Generate speech for the whole text and save as a temporary file
    tts = gTTS(text=lines, lang="en")
    tts.save("temp.mp3")

    # Measure the speech duration using pydub
    full_audio = AudioSegment.from_file("temp.mp3")
    full_audio_duration = len(full_audio) / 1000  # duration in seconds
    avg_word_duration = full_audio_duration / len(words)  # average duration per word
    frame_duration = (
        avg_word_duration + TIMING_ADJUSTMENT
    )  # Adjust frame duration based on average word duration and timing adjustment

    for i, word in enumerate(words):
        # Calculate text size and position only once per word
        # (getbbox replaces getsize, which was removed in Pillow 10)
        left, top, right, bottom = fnt.getbbox(word)
        text_width, text_height = right - left, bottom - top
        position = (
            (VIDEO_SIZE[0] - text_width) / 2 - left,
            (VIDEO_SIZE[1] - text_height) / 2 - top,
        )

        # Calculate background color based on word index and total number of words
        background_progress = i / len(words)
        background_color = interpolate_color(
            START_BG_COLOR, END_BG_COLOR, background_progress
        )

        img = Image.new(
            "RGB", VIDEO_SIZE, color=background_color
        )  # Set background color
        d = ImageDraw.Draw(img)
        d.text(position, word, font=fnt, fill=TEXT_COLOR)

        images.append(np.array(img))  # MoviePy accepts frames as NumPy arrays
        durations.append(
            frame_duration
        )  # Set frame duration based on average word duration

    audioclip = AudioFileClip("temp.mp3")
    clip = ImageSequenceClip(images, durations=durations)
    clip = clip.set_audio(audioclip)

    clip.fps = TEXT_SPEED
    clip.write_videofile(outputfile, codec="libx264")

    # Remove the temporary file
    os.remove("temp.mp3")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert text file to video")
    parser.add_argument("textfile", help="The name of the text file to convert")
    parser.add_argument("outputfile", help="The name of the output mp4 file")
    parser.add_argument(
        "--format-short",
        action="store_true",
        help="Set this flag if you want a short video",
    )
    args = parser.parse_args()

    VIDEO_SHORT = args.format_short
    VIDEO_SIZE = (1080, 1920) if VIDEO_SHORT else (1920, 1080)  # width, height

    text_to_video(args.textfile, args.outputfile)
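
The timing logic is worth looking at on its own. As the script reads, every word stays on screen for the same amount of time: the average word duration plus TIMING_ADJUSTMENT. The sketch below isolates that arithmetic so you can see how the adjustment changes the total length of the image sequence. The helper names and the numbers (25 words, 10 seconds of audio) are hypothetical and chosen only for illustration.

```python
def frame_duration(audio_seconds, word_count, timing_adjustment):
    """Seconds each word stays on screen: average word duration plus adjustment."""
    if word_count == 0:
        raise ValueError("the text contains no words")
    return audio_seconds / word_count + timing_adjustment


def video_length(audio_seconds, word_count, timing_adjustment):
    """Total length of the image sequence: one frame duration per word."""
    return word_count * frame_duration(audio_seconds, word_count, timing_adjustment)


if __name__ == "__main__":
    # Hypothetical numbers: 25 words spoken in 10 seconds.
    for adjustment in (0.0, -0.1, -0.3):
        per_word = frame_duration(10.0, 25, adjustment)
        total = video_length(10.0, 25, adjustment)
        print(f"adjustment={adjustment:+.1f}  per word={per_word:.2f}s  video={total:.2f}s")
```

With an adjustment of 0 the frames add up to the audio length. A negative value shortens every frame, so the frames can finish before the speech does; if your video ends early, try moving TIMING_ADJUSTMENT closer to 0.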

How to Run

To execute the script, you need to have a text file ready with the content you want to transform into video. This script will read the text file, create a video where each word appears in sync with a spoken version of the text (using Google's Text-to-Speech service), and save it as an MP4 file.

To run the script, use the following command in your terminal:

python script_name.py input_text.txt output_video_name.mp4

To create YouTube Shorts, pass the --format-short flag:

python script_name.py input_text.txt output_video_name_short.mp4 --format-short

Note: You need to have ffmpeg installed on your system to run this script. If you don't have it, you can install it with brew install ffmpeg on Mac or sudo apt install ffmpeg on Linux.
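
If you are not sure whether ffmpeg is installed, you can check from Python before running the script. This is a minimal sketch using the standard library; the function name ffmpeg_available is just an illustrative choice, and it only checks that an executable named ffmpeg is on your PATH.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found; install it before running the script")
```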

Running the script without the --format-short flag creates a 1920x1080 (landscape) video; with the flag, it creates a 1080x1920 (portrait) video. You can adjust the video dimensions by changing the VIDEO_SIZE variable in the script.
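
To make the flag behavior concrete, here is a small standalone sketch of how --format-short can map to the video dimensions. It assumes the flag is declared with action="store_true" (an assumption on my part), and the helper names build_parser and video_size are invented for this example. Note that argparse exposes --format-short as args.format_short.

```python
import argparse


def build_parser():
    """Build a parser with the same arguments the script expects."""
    parser = argparse.ArgumentParser(description="Convert text file to video")
    parser.add_argument("textfile")
    parser.add_argument("outputfile")
    # Assumption: store_true keeps the flag False unless it is passed.
    parser.add_argument("--format-short", action="store_true")
    return parser


def video_size(format_short):
    """Return (width, height): portrait for Shorts, landscape otherwise."""
    return (1080, 1920) if format_short else (1920, 1080)


if __name__ == "__main__":
    for argv in (["in.txt", "out.mp4"], ["in.txt", "short.mp4", "--format-short"]):
        args = build_parser().parse_args(argv)
        print(argv[1], video_size(args.format_short))
```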


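There are several variables at the top of the script that you can adjust to customize the output: TEXT_SPEED, TEXT_COLOR, FONT_PATH, BACKGROUND_SPEED, TIMING_ADJUSTMENT, START_BG_COLOR, and END_BG_COLOR. The comment next to each one describes what it controls.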

Feel free to experiment with these values to create a video that suits your needs.
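
Rendering a full video just to see a new color pair gets slow, so it can help to preview the gradient first. The sketch below follows the same idea as the script (blend the two HEX colors in HSV space) and prints a handful of intermediate colors. The preview helper is hypothetical, and this version uses round() where the script uses int(), so individual channels may differ by one.

```python
import colorsys


def hex_to_rgb(hex_color):
    """Convert a HEX string such as "#6638f0" to an (r, g, b) tuple."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def interpolate_color(start_hex, end_hex, progress):
    """Blend two HEX colors in HSV space; progress runs from 0.0 to 1.0."""
    start = colorsys.rgb_to_hsv(*(c / 255 for c in hex_to_rgb(start_hex)))
    end = colorsys.rgb_to_hsv(*(c / 255 for c in hex_to_rgb(end_hex)))
    h, s, v = (a + (b - a) * progress for a, b in zip(start, end))
    # round() instead of int() avoids losing a step to float truncation
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def preview(start_hex, end_hex, steps=5):
    """Return `steps` evenly spaced HEX colors from start to end (steps >= 2)."""
    return [
        "#%02x%02x%02x" % interpolate_color(start_hex, end_hex, i / (steps - 1))
        for i in range(steps)
    ]


if __name__ == "__main__":
    print(preview("#000000", "#6638f0"))
```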

Example Videos

Here are a few example videos created with this script:

1080p example
Short Vertical example

Text File Used

Create a text file with the content you want to transform into video. Name it input_text.txt or any name you like.

Here is the text file I used to create the example videos:

Technology is a useful servant but a dangerous master. This is a quote by Christian Lous Lange. Thank you for watching. Have a great day!

As you can see, the possibilities are endless. You can create engaging and dynamic videos from simple text files using this Python script.

Happy coding!
