ArtBox
ArtBox is a tool set for handling multimedia files.
- Repository: mediatoolbox-org/artbox
- License: BSD-3-Clause
Features
- Generate project configurations from presentation files (PDF + PPTX)
- Render narrated slide decks to MP4 from a YAML project file
- Convert text to speech and speech to text
- Download YouTube videos and captions
- Process audio and video files from the CLI
Setup
ArtBox depends on system packages that may vary by platform. A conda or mamba environment is recommended:
$ mamba create --name artbox "python>=3.10,<3.14" "pygobject>=3.44.1,<3.49" pip
$ conda activate artbox
$ pip install artbox
Examples
For the following examples, create a temporary folder:
$ mkdir /tmp/artbox
Generate a project configuration
If you exported your presentation slides as PDF and speaker notes as PPTX, you can scaffold a project file automatically:
$ artbox init \
--source-pdf /tmp/artbox/presentation.pdf \
--notes-pptx /tmp/artbox/presentation.pptx \
--output /tmp/artbox/project.yaml
Then render the project:
$ artbox render --project /tmp/artbox/project.yaml
Convert text to audio
By default, artbox speech uses
edge-tts, but you can switch to
gtts with --engine gtts.
$ echo "Are you ready to join Link and Zelda in fighting off this unprecedented threat to Hyrule?" > /tmp/artbox/text.md
$ artbox speech from-text \
--title artbox \
--input-path /tmp/artbox/text.md \
--output-path /tmp/artbox/speech.mp3 \
--engine edge-tts
If you need a different language:
$ echo "Bom dia, mundo!" > /tmp/artbox/text.md
$ artbox speech from-text \
--title artbox \
--input-path /tmp/artbox/text.md \
--output-path /tmp/artbox/speech.mp3 \
--lang pt
For edge-tts, you can also specify locale, rate, volume, and pitch:
$ echo "Do you want some coffee?" > /tmp/artbox/text.md
$ artbox speech from-text \
--title artbox \
--input-path /tmp/artbox/text.md \
--output-path /tmp/artbox/speech.mp3 \
--engine edge-tts \
--lang en-IN \
--rate +10% \
--volume -10% \
--pitch -5Hz
Convert audio to text
ArtBox uses speechrecognition for speech-to-text (currently google engine):
$ artbox speech to-text \
--input-path /tmp/artbox/speech.mp3 \
--output-path /tmp/artbox/text-from-speech.md \
--lang en
Download a YouTube video
$ artbox youtube download \
--url https://www.youtube.com/watch?v=zw47_q9wbBE \
--output-path /tmp/artbox/
To request a specific resolution:
$ artbox youtube download \
--url https://www.youtube.com/watch?v=zw47_q9wbBE \
--output-path /tmp/artbox/ \
--resolution 360p
If you encounter bot detection, enable OAuth:
$ artbox youtube download \
--url https://www.youtube.com/watch?v=zw47_q9wbBE \
--output-path /tmp/artbox/ \
--use-oauth
Download YouTube captions
$ artbox youtube cc \
--url https://www.youtube.com/watch?v=zw47_q9wbBE \
--output-path /tmp/artbox/cc.txt \
--lang en \
--format text
Create a song based on notes
$ echo '["E", "D#", "E", "D#", "E", "B", "D", "C", "A"]' > /tmp/artbox/notes.txt
$ artbox sound notes-to-audio \
--input-path /tmp/artbox/notes.txt \
--output-path /tmp/artbox/music.mp3 \
--duration 2
Generate an audio spectrogram
$ artbox sound spectrogram \
--input-path /tmp/artbox/music.mp3 \
--output-path /tmp/artbox/spectrogram.png
Remove audio from a video
$ artbox video remove-audio \
--input-path "/tmp/artbox/sample.mp4" \
--output-path /tmp/artbox/video-without-audio.mp4
Extract audio from a video
$ artbox video extract-audio \
--input-path "/tmp/artbox/sample.mp4" \
--output-path /tmp/artbox/video-audio.mp3
Get metadata from a video
$ artbox video get-metadata \
--input-path "/tmp/artbox/sample.mp4" \
--output-path /tmp/artbox/video-metadata.json
Combine audio and video files
$ artbox video combine-video-and-audio \
--video-path /tmp/artbox/video-without-audio.mp4 \
--audio-path /tmp/artbox/video-audio.mp3 \
--output-path /tmp/artbox/video-combined.mp4
Additional dependencies
If you want to play audio from Python, you can install playsound:
$ pip wheel --use-pep517 "playsound (==1.3.0)"