
VideoCaptioner: AI-Powered Video Subtitling Assistant

Complete setup guide for VideoCaptioner - an LLM-powered intelligent subtitle assistant for video. Includes CLI and GUI installation, ASR engines (Bijian, Jianying, Whisper), LLM-based subtitle optimization, translation services, TTS dubbing, and video synthesis with customizable subtitle styling.

Step 1
About VideoCaptioner
VideoCaptioner is an AI-powered video subtitling tool that handles the complete subtitle workflow: speech recognition (ASR), subtitle segmentation, LLM-based optimization, translation, and video synthesis. It supports multiple ASR engines including free cloud-based options (Bijian, Jianying) and local Whisper models. The tool features a modern PyQt5 GUI and a full-featured CLI for automation and scripting.
```
Key Features:
- Speech-to-text transcription via multiple ASR engines
- Semantic-based subtitle segmentation using LLM
- Context-aware translation with reflection mechanism
- Voice dubbing (TTS) with Edge, SiliconFlow, Gemini
- Video synthesis with customizable subtitle styling
- Free cloud services for ASR and translation (no API key needed)
- Cross-platform: Windows, macOS, Linux
```
Step 2
System Requirements
VideoCaptioner requires Python 3.10+ and FFmpeg for video processing. The free cloud-based features need nothing beyond these two. Local Whisper models need additional disk space (1-3 GB) and benefit from GPU acceleration. The PyQt5 GUI runs on all major platforms, and pre-built installers are available for Windows.
```
# Check Python version (3.10 or higher required)
python --version
python3 --version

# Check FFmpeg installation (required for video processing)
ffmpeg -version
ffprobe -version

# If FFmpeg is not installed:
# Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg

# macOS (with Homebrew):
brew install ffmpeg

# Windows (with Chocolatey):
choco install ffmpeg

# Windows (with Scoop):
scoop install ffmpeg

# Verify installation
ffmpeg -version | head -1
ffprobe -version | head -1
```
⚠ Heads up: FFmpeg is a hard requirement. Without it, video processing features will fail. The 'doctor' command can diagnose missing dependencies: `videocaptioner doctor`
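The same checks can be scripted. The following is a minimal sketch, not part of VideoCaptioner: `check_prereqs` and `MIN_PYTHON` are hypothetical names, and the script only tests whether the tools are on PATH, not whether they work.
```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def check_prereqs() -> dict[str, bool]:
    """Report whether Python is new enough and the FFmpeg tools are on PATH."""
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```
For a fuller diagnosis, `videocaptioner doctor` remains the tool to use once the package is installed.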
Step 3
Installation via pip (Recommended)
The easiest way to install VideoCaptioner is via pip. This installs both the CLI and GUI components. The package handles all Python dependencies automatically. No additional configuration is needed for free features (Bijian ASR, Bing/Google translation).
```
# Install VideoCaptioner (CLI + GUI)
pip install videocaptioner

# Verify installation
videocaptioner --version
videocaptioner --help

# Launch GUI (three equivalent ways)
videocaptioner
videocaptioner gui
videocaptioner-gui

```
Step 4
Installation via Windows Package
Windows users can download pre-built installers from GitHub Releases. This avoids the need to install Python separately and provides a traditional Windows installation experience with Start menu integration.
```
1. Visit: https://github.com/WEIFENG2333/VideoCaptioner/releases
2. Download the latest Windows installer (e.g., videocaptioner-{version}-windows-x64.exe)
3. Run the installer and follow the setup wizard
4. Launch from Start Menu or desktop shortcut
5. The application includes Python runtime and all dependencies
```
⚠ Heads up: The Windows installer may trigger a SmartScreen warning. This is common for unsigned or newly published installers. Download only from the official GitHub releases page.
Step 5
Installation via macOS Script
macOS users can use the one-line installation script which handles dependencies and configuration automatically. This is the recommended approach for macOS users who want a quick setup.
```
# One-line macOS installation
curl -fsSL https://raw.githubusercontent.com/WEIFENG2333/VideoCaptioner/master/scripts/run.sh | bash

# Alternative: install via pip (same package as in Step 3)
pip install videocaptioner

# macOS security: Remove quarantine attribute if app is blocked
xattr -d com.apple.quarantine "$(which videocaptioner-gui)" 2>/dev/null || true

# Launch the GUI
videocaptioner-gui
```
⚠ Heads up: macOS may block the application due to security settings. If the app doesn't launch, go to System Settings → Privacy & Security → click 'Open Anyway' next to the VideoCaptioner warning.

Step 6

Development Setup

For development, clone the repository and use uv (Python package manager) for dependency management. The project uses uv for fast, reproducible builds with proper virtual environment isolation.

```
# Clone the repository
git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync dependencies and create virtual environment
uv sync

# Run GUI
uv run videocaptioner

# Run CLI help
uv run videocaptioner --help

# Type checking
uv run pyright

# Run tests
uv run pytest tests/test_cli/ -q

# Lint with ruff
uv run ruff check videocaptioner
```

Step 7

Configuration Fundamentals

VideoCaptioner uses a layered configuration system with priority: CLI arguments > Environment variables > Config file > Defaults. The configuration file is located at ~/.config/videocaptioner/config.toml on Linux/macOS. Run 'config init' for interactive setup or 'doctor' to diagnose issues.

```
# Interactive configuration setup
videocaptioner config init

# Non-interactive setup with profile (for CI/automated environments)
videocaptioner config init --non-interactive --profile dubbing

# View current configuration
videocaptioner config show

# Get specific config value
videocaptioner config get llm.api_key

# Set configuration values
videocaptioner config set llm.api_key sk-your-api-key-here
videocaptioner config set llm.api_base https://api.openai.com/v1
videocaptioner config set llm.model gpt-4o-mini

# Find config file location
videocaptioner config path

# Diagnose environment and dependencies
videocaptioner doctor
videocaptioner doctor --json  # JSON output for scripting
```

⚠ Heads up: API keys should never be committed to version control. Use environment variables or secure config storage. The --json flag for doctor is useful for CI/CD diagnostics.
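
To make the priority order concrete, here is a small sketch of how a layered lookup of this kind can behave. It is an illustration only, not VideoCaptioner's implementation: `resolve_settings`, the dotted key names used as dictionary keys, and the `default-model` value are all assumptions.

```python
from collections import ChainMap


def resolve_settings(cli: dict, env: dict, file: dict, defaults: dict) -> ChainMap:
    """Earlier mappings win: CLI > environment > config file > defaults."""
    # Drop unset (None) values so they do not shadow lower-priority layers.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli, env, file, defaults)
    ]
    return ChainMap(*layers)


settings = resolve_settings(
    cli={"llm.model": None},  # flag not passed on the command line
    env={"llm.api_key": "sk-from-env"},
    file={"llm.api_key": "sk-from-file", "llm.model": "gpt-4o-mini"},
    defaults={"llm.api_base": "https://api.openai.com/v1", "llm.model": "default-model"},
)

print(settings["llm.api_key"])   # sk-from-env (environment beats config file)
print(settings["llm.model"])     # gpt-4o-mini (config file beats defaults)
print(settings["llm.api_base"])  # https://api.openai.com/v1 (only the default exists)
```

The practical consequence is that a key exported in the shell silently overrides the value stored in config.toml, which is worth remembering when `config show` and actual behavior seem to disagree.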

Step 8

Configuration File Format

The configuration file uses TOML format with sections for llm, transcribe, subtitle, translate, and dubbing. Each section contains provider-specific settings. Edit the config file directly or use the config CLI commands to modify settings.

```
# Config file: ~/.config/videocaptioner/config.toml

# LLM settings (for subtitle optimization and translation)
[llm]
api_key = "sk-your-api-key"
api_base = "https://api.openai.com/v1"
model = "gpt-4o-mini"

# Transcription settings
[transcribe]
asr = "bijian"  # Options: bijian, jianying, whisper-api, whisper-cpp, faster-whisper
language = "auto"  # Auto-detect or specify: zh, en, ja, etc.

# Subtitle processing
[subtitle]
optimize = true  # LLM-based text correction
split = true     # Semantic-based segmentation

# Translation settings
[translate]
service = "bing"  # Options: bing (free), google (free), llm (paid)

# Dubbing/TTS settings
[dubbing]
preset = "edge-cn-female"  # Edge TTS preset
api_key = ""  # For SiliconFlow/Gemini
voice = "xiaoxiao"
timing = "balanced"  # Options: strict, balanced, natural, none
audio_mode = "replace"  # Options: replace, mix, duck
tts_workers = 5  # Concurrent TTS workers
```
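
A file in this format can be read from a script with Python's TOML parser. The sketch below parses an inline sample rather than the real file; the `SAMPLE` text is a cut-down copy of the example above, and the `"bing"` fallback is an assumption for illustration, not a documented default.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.10 needs: pip install tomli
    import tomli as tomllib

SAMPLE = """
[llm]
model = "gpt-4o-mini"

[transcribe]
asr = "bijian"
language = "auto"

[dubbing]
tts_workers = 5
"""

config = tomllib.loads(SAMPLE)
asr = config["transcribe"]["asr"]
workers = config["dubbing"]["tts_workers"]
# .get() with a fallback avoids a KeyError for sections the file omits.
translator = config.get("translate", {}).get("service", "bing")

print(asr, workers, translator)
```

To read the real file, open it in binary mode and call `tomllib.load(f)` on the path printed by `videocaptioner config path`.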

Step 9

Quick Start: Speech-to-Text Transcription

The free Bijian ASR engine (requires network, no API key) provides high-quality transcription for Chinese and English audio. Simply run the transcribe command with your video or audio file. The output is an SRT subtitle file by default.

```
# Transcribe video using free Bijian ASR (Chinese/English)
videocaptioner transcribe video.mp4 --asr bijian

# Transcribe with explicit language
videocaptioner transcribe video.mp4 --asr bijian --language zh
videocaptioner transcribe video.mp4 --asr bijian --language en

# Transcribe audio files directly
videocaptioner transcribe audio.mp3 --asr bijian

# Output to custom path
videocaptioner transcribe video.mp4 --asr bijian -o output_dir/

# Generate word-level timestamps (for advanced processing)
videocaptioner transcribe video.mp4 --asr bijian --word-timestamps

# Output in different formats
videocaptioner transcribe video.mp4 --asr bijian --format json
videocaptioner transcribe video.mp4 --asr bijian --format ass  # Advanced Subtitles
```

⚠ Heads up: Bijian and Jianying ASR engines only support Chinese and English. For other languages, use whisper-api (requires API key) or whisper-cpp (local model).
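
When scripting transcription for mixed-language material, the language restriction can be encoded once. This is a rough sketch under stated assumptions: `choose_asr` and `transcribe_command` are hypothetical helpers, the function expects an explicit language code rather than `auto`, and it only builds the argument list from flags shown in this guide without running anything.

```python
# Languages the free cloud engines handle, per the note above.
FREE_ENGINE_LANGUAGES = {"zh", "en"}


def choose_asr(language: str, have_whisper_key: bool = False) -> str:
    """Pick an --asr value for an explicit language code such as 'zh' or 'ja'."""
    if language in FREE_ENGINE_LANGUAGES:
        return "bijian"
    # Other languages need Whisper: the API if a key is available, else local.
    return "whisper-api" if have_whisper_key else "whisper-cpp"


def transcribe_command(path: str, language: str, have_whisper_key: bool = False) -> list[str]:
    """Build the argv list; pass it to subprocess.run() to execute it."""
    return [
        "videocaptioner", "transcribe", path,
        "--asr", choose_asr(language, have_whisper_key),
        "--language", language,
    ]


print(transcribe_command("talk.mp4", "ja"))
```

Building an argument list instead of a shell string avoids quoting problems with filenames that contain spaces.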

Step 10

Subtitle Translation (Free Services)

VideoCaptioner provides free translation through Bing and Google (no API key required). The subtitle command can optimize existing subtitles, segment them semantically, and translate them into a target language. More than 100 languages are supported, specified as BCP 47 codes.

```
# Translate subtitle to English using free Bing translation
videocaptioner subtitle input.srt --translator bing --target-language en

# Translate to Japanese
videocaptioner subtitle input.srt --translator bing --target-language ja

# Translate to Korean
videocaptioner subtitle input.srt --translator bing --target-language ko

# Translate to Spanish
videocaptioner subtitle input.srt --translator bing --target-language es

# Use Google translation (also free)
videocaptioner subtitle input.srt --translator google --target-language fr

# Create bilingual subtitles (Chinese source + English target)
videocaptioner subtitle input.srt --translator bing --target-language en --layout target-above

# Optimize without translation (LLM-based correction)
videocaptioner subtitle input.srt --no-translate --api-key $OPENAI_API_KEY

# Skip optimization, only translate
videocaptioner subtitle input.srt --translator bing --target-language en --no-optimize

# List of common language codes:
# zh-Hans (Simplified Chinese), zh-Hant (Traditional Chinese)
# en (English), ja (Japanese), ko (Korean)
# fr (French), de (German), es (Spanish)
# ru (Russian), ar (Arabic), pt (Portuguese)
```

Step 11

LLM-Based Subtitle Translation

For higher quality results, use LLM-based optimization to correct ASR errors, improve punctuation, and enhance readability. LLM translation adds an optional reflection step for better context understanding. Both require an API key for an OpenAI-compatible endpoint.

```
# Configure LLM API (OpenAI or compatible)
videocaptioner config set llm.api_key $OPENAI_API_KEY
videocaptioner config set llm.api_base https://api.openai.com/v1
videocaptioner config set llm.model gpt-4o-mini

# Or use environment variables
export OPENAI_API_KEY="sk-your-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4o-mini"

# Optimize subtitles with LLM (correct ASR errors)
videocaptioner subtitle input.srt --api-key $OPENAI_API_KEY

# Translate with LLM (higher quality, context-aware)
videocaptioner subtitle input.srt --translator llm --target-language en --api-key $OPENAI_API_KEY

# Enable reflection-based translation (higher quality, slower)
videocaptioner subtitle input.srt --translator llm --target-language en --reflect --api-key $OPENAI_API_KEY

# Custom prompt for optimization
videocaptioner subtitle input.srt --prompt "Make subtitles more formal and professional" --api-key $OPENAI_API_KEY

# Alternative LLM providers (OpenAI-compatible):
# - SiliconCloud: https://cloud.siliconflow.cn
# - DeepSeek: https://platform.deepseek.com
# - Local: Ollama with OpenAI compatibility layer
```

⚠ Heads up: LLM-based features incur API costs. gpt-4o-mini is cost-effective. Reflection mode uses more tokens for better quality but is slower and more expensive.

Step 12

Video Synthesis: Burn Subtitles into Video

The synthesize command adds subtitles to a video either as hard subtitles (burned permanently into the frames) or as soft subtitles (a separate track that the player can toggle). VideoCaptioner supports two rendering modes: ASS (traditional, with outline and shadow) and rounded (modern, with a rounded background). The style command lists the available subtitle styles.

```
# Burn subtitles as hard-coded (visible in video)
videocaptioner synthesize video.mp4 -s subtitle.srt --subtitle-mode hard

# Add removable soft subtitles (track embedded)
videocaptioner synthesize video.mp4 -s subtitle.srt --subtitle-mode soft

# High quality output
videocaptioner synthesize video.mp4 -s subtitle.srt --subtitle-mode hard --quality ultra

# Use preset style (anime style)
videocaptioner synthesize video.mp4 -s subtitle.srt --subtitle-mode hard --style anime

# Custom subtitle styling via JSON override
videocaptioner synthesize video.mp4 -s subtitle.srt --subtitle-mode hard \
  --style-override '{"outline_color": "#ff0000", "font_size": 48}'

# Rounded background mode (modern look)
videocaptioner synthesize video.mp4 -s subtitle.srt --subtitle-mode hard --render-mode rounded

# Custom rounded style with white text and semi-transparent background
videocaptioner synthesize video.mp4 -s subtitle.srt --subtitle-mode hard \
  --render-mode rounded \
  --style-override '{"text_color": "#ffffff", "bg_color": "#ff000099", "corner_radius": 12}'

# View all available styles
videocaptioner style

# Output to custom path
videocaptioner synthesize video.mp4 -s subtitle.srt --subtitle-mode hard -o output.mp4
```
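
Hand-writing JSON inside shell quotes is error-prone, so it can help to generate the `--style-override` value. The sketch below is an illustration with hypothetical helper names (`parse_hex_color`, `style_override_arg`). It assumes 8-digit colors are ordered RRGGBBAA, as in CSS; this guide does not state the order the tool uses.

```python
import json
import shlex


def parse_hex_color(value: str) -> tuple[int, int, int, int]:
    """Split '#RRGGBB' or '#RRGGBBAA' into (r, g, b, a), each 0-255."""
    digits = value.lstrip("#")
    if len(digits) not in (6, 8):
        raise ValueError(f"expected 6 or 8 hex digits, got {value!r}")
    if len(digits) == 6:
        digits += "ff"  # no alpha given: treat as fully opaque
    r, g, b, a = (int(digits[i:i + 2], 16) for i in range(0, 8, 2))
    return r, g, b, a


def style_override_arg(**style) -> str:
    """Serialize keyword arguments as the JSON string for --style-override."""
    return json.dumps(style, separators=(",", ":"))


override = style_override_arg(text_color="#ffffff", bg_color="#ff000099", corner_radius=12)
print("--style-override", shlex.quote(override))
print(parse_hex_color("#ff000099"))  # (255, 0, 0, 153)
```

Under the RRGGBBAA assumption, `#ff000099` is red at roughly 60% opacity, so the rounded example above would draw a translucent red box behind white text.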

Step 13

Voice Dubbing (TTS)

Generate dubbed audio or video from subtitles using one of several TTS services. Edge TTS is free and works without an API key. SiliconFlow CosyVoice and Gemini TTS offer higher quality, and SiliconFlow also supports voice cloning. The dub command additionally supports multi-speaker attribution and per-speaker voice mapping.

```
# Generate audio using Edge TTS (free, no API key)
videocaptioner dub subtitle.srt --preset edge-cn-female -o dub.wav

# Chinese female voice (Edge TTS)
videocaptioner dub input.srt --preset edge-cn-female -o output.wav

# English friendly voice (Gemini, requires API key)
videocaptioner dub input.srt --preset gemini-en-friendly \
  --tts-api-key $VIDEOCAPTIONER_TTS_API_KEY -o output.wav

# SiliconFlow CosyVoice2
videocaptioner dub input.srt --preset siliconflow-cn-female \
  --tts-api-key $VIDEOCAPTIONER_TTS_API_KEY -o output.wav

# Multi-speaker with voice mapping
videocaptioner dub input.srt --video video.mp4 \
  --speaker-voice Alice=anna \
  --speaker-voice Bob=benjamin \
  -o video_dubbed.mp4

# Speaker syntax in subtitle file:
# [Alice] Hello, this is Alice speaking.
# Bob: This line uses another voice.

# Voice cloning with SiliconFlow
# (the value is quoted so the shell does not treat "|" as a pipe)
videocaptioner dub input.srt \
  --speaker-clone "Alice=reference_audio.mp3|This is the reference text" \
  --tts-api-key $VIDEOCAPTIONER_TTS_API_KEY \
  -o output.mp4

# Timing strategies
videocaptioner dub input.srt --preset edge-cn-female --timing strict -o output.wav  # Match subtitle timing
videocaptioner dub input.srt --preset edge-cn-female --timing natural -o output.wav  # Natural speech speed

# Audio output modes when embedding in video
videocaptioner dub input.srt --video video.mp4 --audio-mode replace -o output.mp4  # Replace original audio
videocaptioner dub input.srt --video video.mp4 --audio-mode mix -o output.mp4      # Mix with original
videocaptioner dub input.srt --video video.mp4 --audio-mode duck -o output.mp4      # Lower original as background
```

⚠ Heads up: Edge TTS requires network access and may have regional restrictions. Voice cloning with SiliconFlow requires API key and reference audio samples.
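
Before dubbing a multi-speaker file, it can be useful to check which speaker names the subtitle lines actually contain. The parser below is a sketch of the two prefix forms shown in the examples, `[Alice] ...` and `Bob: ...`. It is not the tool's own parser: `split_speaker` is a hypothetical name, and restricting names to word characters is an assumption.

```python
import re
from typing import Optional, Tuple

# "[Alice] text" or "Alice: text"; names are limited to word characters here.
BRACKET = re.compile(r"^\[(?P<name>\w+)\]\s*(?P<text>.*)$")
COLON = re.compile(r"^(?P<name>\w+):\s+(?P<text>.*)$")


def split_speaker(line: str) -> Tuple[Optional[str], str]:
    """Return (speaker, text); speaker is None when no prefix is found."""
    stripped = line.strip()
    for pattern in (BRACKET, COLON):
        match = pattern.match(stripped)
        if match:
            return match.group("name"), match.group("text")
    return None, stripped


for sample in ("[Alice] Hello, this is Alice speaking.",
               "Bob: This line uses another voice.",
               "No speaker here."):
    print(split_speaker(sample))
```

The colon form is ambiguous by nature: a line such as "Note: check the audio" would be read as a speaker named Note, so the bracket form is the safer choice when writing subtitle files by hand.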

Step 14

Full Pipeline: End-to-End Processing

The process command automates the complete workflow: video → transcription → optimization → translation → dubbing → synthesis. This is the most powerful command for automated video localization. All intermediate files are generated in the output directory.

```
# Full pipeline with free services (ASR + Translation)
videocaptioner process video.mp4 --asr bijian --translator bing --target-language en

# Full pipeline including dubbing (generate dubbed video)
videocaptioner process video.mp4 --asr bijian --translator bing --target-language ja --dub-only

# Full pipeline with Edge TTS dubbing
videocaptioner process video.mp4 \
  --asr bijian \
  --translator bing \
  --target-language zh-Hans \
  --dub-only \
  --timing strict

# Chinese video → English dubbed video with Gemini TTS
videocaptioner process video.mp4 \
  --translator bing \
  --target-language en \
  --dub-only \
  --preset gemini-en-friendly \
  --tts-api-key $VIDEOCAPTIONER_TTS_API_KEY

# Full pipeline with LLM optimization
videocaptioner process video.mp4 \
  --asr whisper-api \
  --whisper-api-key $WHISPER_API_KEY \
  --translator llm \
  --target-language fr \
  --api-key $OPENAI_API_KEY \
  -v  # Verbose output

# Output to custom directory
videocaptioner process video.mp4 --asr bijian --translator bing --to en -o ./output/

# Quiet mode for scripting
videocaptioner process video.mp4 --asr bijian --translator bing --to en -q
```

Step 15

Download Online Videos

VideoCaptioner integrates yt-dlp for downloading videos from YouTube, Bilibili, and many other platforms. This is useful for processing online content directly without manual downloading.

```
# Download from YouTube
videocaptioner download "https://youtube.com/watch?v=xxx"

# Download to specific directory
videocaptioner download "https://youtube.com/watch?v=xxx" -o ./downloads/

# Download from Bilibili
videocaptioner download "https://bilibili.com/video/BVxxxxx"

# Then process the downloaded video
videocaptioner process downloaded_video.mp4 --asr bijian --translator bing --to en
```

Step 16

GUI Desktop Application

The PyQt5-based GUI provides a user-friendly interface for all VideoCaptioner features. It includes visual progress tracking, drag-and-drop file upload, and integrated settings management. The GUI automatically opens when running videocaptioner without arguments.

Launch Methods:
- videocaptioner (opens GUI by default)
- videocaptioner gui (explicit GUI launch)
- videocaptioner-gui (GUI-only command)

GUI Features:
- Drag-and-drop video files
- Visual progress bars for each processing stage
- Configurable ASR/translation/TTS settings
- Built-in subtitle editor
- Preview rendered subtitles
- Batch processing queue
- Project management

Settings Location in GUI:
- File: Settings → API Configuration
- Or use CLI: videocaptioner config init

Supported File Formats:
- Video: MP4, MKV, AVI, MOV, WEBM
- Audio: MP3, WAV, FLAC, M4A
- Subtitle: SRT, VTT, ASS, SUB

Step 17

Troubleshooting Common Issues

The doctor command diagnoses most issues. Common problems include missing FFmpeg, API key misconfiguration, and macOS security blocking. Check the application logs and use verbose mode for detailed diagnostics.

```
# Run diagnostics
videocaptioner doctor
videocaptioner doctor --json  # For scripting/CI

# Check if FFmpeg is properly installed
which ffmpeg
ffmpeg -version

# Test API connectivity (OpenAI)
curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models

# View verbose logs
videocaptioner transcribe video.mp4 --asr bijian -v

# macOS security issues
xattr -d com.apple.quarantine /path/to/videocaptioner

# Port already in use (GUI)
videocaptioner config set server.port 8889

# Disk space for Whisper models
df -h  # Check available space

# Clear cache if models are corrupted
rm -rf ~/.cache/whisper

# Python version issues
python --version  # Must be 3.10-3.12
python3.11 --version  # Try specific version

# Virtual environment conflicts
which python
which pip
python -m pip list | grep videocaptioner
```

⚠ Heads up: Exit codes indicate error types: 0=success, 1=general error, 2=config error, 3=file not found, 4=missing dependency, 5=runtime error.
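
In wrapper scripts, these exit codes can be turned into readable messages. The sketch below mirrors the table in the note above. The `ExitCode` enum and both helper names are hypothetical, and the mapping is only as accurate as that note.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as listed in the note above."""
    SUCCESS = 0
    GENERAL_ERROR = 1
    CONFIG_ERROR = 2
    FILE_NOT_FOUND = 3
    MISSING_DEPENDENCY = 4
    RUNTIME_ERROR = 5


def describe(code: int) -> str:
    """Translate a numeric exit status into a short label."""
    try:
        return ExitCode(code).name.lower().replace("_", " ")
    except ValueError:
        return f"unknown exit code {code}"


def run_and_report(argv: list[str]) -> str:
    """Run a command and describe its exit status."""
    completed = subprocess.run(argv)
    return describe(completed.returncode)


print(describe(4))  # missing dependency
```

A call such as `run_and_report(["videocaptioner", "doctor"])` would then return one of these labels, provided the command is installed and on PATH.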

Step 18

Performance Optimization

Optimize processing speed by choosing appropriate ASR engines, enabling GPU acceleration for local Whisper models, and adjusting TTS worker concurrency. Use smaller LLM models for faster turnaround when quality requirements allow.

```
# ASR speed comparison (fastest to slowest):
# Bijian (cloud) - Fast, free, Chinese/English only
# Jianying (cloud) - Fast, free, Chinese/English only
# Whisper API - Requires payment, all languages
# Faster-Whisper (local) - GPU accelerated, requires VRAM
# Whisper-CPP (local) - CPU friendly, slower

# GPU acceleration for local Whisper
# NVIDIA GPU users:
nvidia-smi  # Verify GPU detection
# Install CUDA-enabled pytorch for faster processing

# Adjust TTS concurrency for faster dubbing
videocaptioner config set dubbing.tts_workers 10  # Default is 5

# Use cost-effective LLM models
videocaptioner config set llm.model gpt-4o-mini  # Cheaper than gpt-4

# Batch processing (process multiple videos)
for video in *.mp4; do
  videocaptioner process "$video" --asr bijian --translator bing --to en -q
done

# Parallel processing with xargs (null-delimited, so filenames with spaces or quotes are safe)
find . -maxdepth 1 -name '*.mp4' -print0 | xargs -0 -P 4 -I {} videocaptioner process {} --asr bijian --translator bing --to en -q
```
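
The same batch job can be driven from Python when per-file exit codes need to be collected. This is a sketch under stated assumptions: `process_command`, `run_one`, and `process_all` are hypothetical helpers, the flags are copied from the loop above, and the `runner` parameter exists so the logic can be exercised without invoking the real CLI.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def process_command(video: Path, target: str = "en") -> list[str]:
    """Build the argv for one video, using the flags from the loop above."""
    return ["videocaptioner", "process", str(video),
            "--asr", "bijian", "--translator", "bing", "--to", target, "-q"]


def run_one(argv: list[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(argv).returncode


def process_all(videos: Iterable[Path], workers: int = 4,
                runner: Callable[[list[str]], int] = run_one) -> dict[str, int]:
    """Process videos concurrently; map each filename to its exit code."""
    videos = list(videos)
    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = pool.map(runner, [process_command(v) for v in videos])
        return {video.name: code for video, code in zip(videos, codes)}
```

A typical call would be `process_all(Path(".").glob("*.mp4"))`. Free cloud services may throttle heavy parallel use, so a small `workers` value is the cautious choice.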

Step 19

Environment Variables Reference

Environment variables keep API keys out of config files and override default settings. This is the recommended approach for CI/CD pipelines and automated workflows. Environment variables take precedence over config file values and are themselves overridden by CLI arguments.

```
# LLM Configuration
export OPENAI_API_KEY="sk-your-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4o-mini"

# TTS/Dubbing Configuration
export VIDEOCAPTIONER_DUB_PRESET="edge-cn-female"
export VIDEOCAPTIONER_TTS_API_KEY="your-tts-api-key"
export VIDEOCAPTIONER_TTS_API_BASE="https://api.example.com"
export VIDEOCAPTIONER_TTS_MODEL="cosyvoice2"
export VIDEOCAPTIONER_TTS_VOICE="xiaoxiao"
export VIDEOCAPTIONER_TTS_WORKERS=5
export VIDEOCAPTIONER_DUB_TIMING="balanced"
export VIDEOCAPTIONER_DUB_AUDIO_MODE="replace"
export VIDEOCAPTIONER_TTS_MAX_SPEED=1.5
export VIDEOCAPTIONER_TTS_REWRITE_TOO_LONG=true

# Example: Run with all env vars
export OPENAI_API_KEY="sk-xxx"
export VIDEOCAPTIONER_TTS_API_KEY="your-key"
videocaptioner process video.mp4 --asr bijian --translator bing --to en
```
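
Scripts that wrap the CLI sometimes need to read the same variables, for example to log the effective dubbing settings. The sketch below shows one way to do that with typed fields. `DubSettings` and `dub_settings_from_env` are hypothetical, the fallback values simply mirror the example exports above, and parsing `true` as the only truthy string is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DubSettings:
    preset: str
    workers: int
    timing: str
    max_speed: float
    rewrite_too_long: bool


def dub_settings_from_env(env: Mapping[str, str] = os.environ) -> DubSettings:
    """Read VIDEOCAPTIONER_* variables; fallbacks mirror the examples above."""
    return DubSettings(
        preset=env.get("VIDEOCAPTIONER_DUB_PRESET", "edge-cn-female"),
        workers=int(env.get("VIDEOCAPTIONER_TTS_WORKERS", "5")),
        timing=env.get("VIDEOCAPTIONER_DUB_TIMING", "balanced"),
        max_speed=float(env.get("VIDEOCAPTIONER_TTS_MAX_SPEED", "1.5")),
        rewrite_too_long=env.get("VIDEOCAPTIONER_TTS_REWRITE_TOO_LONG", "true").lower() == "true",
    )


print(dub_settings_from_env({"VIDEOCAPTIONER_TTS_WORKERS": "8"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test and avoids mutating the real environment.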

Step 20

API Provider Options

VideoCaptioner supports multiple providers for each service stage. Cloud-based services require network access but no local compute. LLM services must be OpenAI API-compatible. TTS services vary in quality and pricing.

ASR (Speech Recognition) Providers:
- Bijian (必剪): FREE, cloud-based, Chinese/English, no API key
- Jianying (剪映): FREE, cloud-based, Chinese/English, no API key
- Whisper API: Paid (OpenAI), all languages, highest accuracy
- Faster-Whisper: Local models, offline, requires GPU for speed
- Whisper-CPP: Local models, CPU-friendly, slower

Translation Providers:
- Bing: FREE, good quality, all languages
- Google: FREE, good quality, all languages
- LLM: Paid (OpenAI-compatible), contextual, highest quality

TTS (Text-to-Speech) Providers:
- Edge TTS: FREE, Microsoft Edge's online TTS service, good quality voices
- SiliconFlow CosyVoice2: Paid, Chinese-focused, voice cloning
- Gemini TTS: Paid, Google, natural sounding
- Coqui TTS: Self-hosted, open-source alternatives

LLM Providers (for optimization/translation):
- OpenAI: gpt-4, gpt-4o-mini, gpt-3.5-turbo
- DeepSeek: deepseek-chat (cost-effective)
- SiliconCloud: Multiple model options
- Local: Ollama, LM Studio with OpenAI compatibility

Step 21

Project Structure and Resources

VideoCaptioner is built with Python using modern tooling. The project structure separates CLI, UI, and core processing modules. Resources include subtitle styles, translations, and assets bundled with the package.

Tech Stack:
- Core: Python 3.10+ with type hints
- CLI: Click-based command-line interface
- GUI: PyQt5 + PyQt-Fluent-Widgets
- Audio Processing: pydub (FFmpeg wrapper)
- Video Processing: FFmpeg/FFprobe
- TTS: edge-tts, SiliconFlow API, Gemini API
- LLM: openai SDK (works with any OpenAI-compatible API)
- Download: yt-dlp (supports 1000+ sites)
- Config: TOML files with platformdirs
- Package Manager: uv (modern, fast)
- Build: hatchling with hatch-vcs
- Testing: pytest with markers for integration/LLM tests
- Linting: Ruff + Pyright

Project Structure:
```
videocaptioner/
  ├── cli/          # Command-line interface
  ├── ui/           # PyQt5 GUI
  ├── core/         # Processing logic
  ├── asr/          # Speech recognition
  ├── subtitle/     # Subtitle processing
  ├── translate/    # Translation services
  ├── dubbing/      # TTS and voice generation
  ├── resource/     # Styles, translations, assets
  └── tests/        # Unit and integration tests
```

Key Dependencies:
- requests: HTTP client
- openai: LLM integration
- yt-dlp: Video downloading
- pydub: Audio manipulation
- PyQt5: GUI framework
- platformdirs: Cross-platform paths
- tenacity: Retry logic
- pillow: Image processing
- fonttools: Font manipulation

Step 22

Links and Resources

The GitHub repository, official documentation, and community channels below are the places to go for further reference and support.

Official Resources:
- GitHub: https://github.com/WEIFENG2333/VideoCaptioner
- Documentation: https://weifeng2333.github.io/VideoCaptioner/
- Online Demo: https://www.videocaptioner.cn
- Releases: https://github.com/WEIFENG2333/VideoCaptioner/releases

Related Technologies:
- Whisper (OpenAI): https://github.com/openai/whisper
- Faster-Whisper: https://github.com/guillaumekln/faster-whisper
- Edge TTS: Microsoft Edge's online text-to-speech service (used through the edge-tts package)
- yt-dlp: https://github.com/yt-dlp/yt-dlp
- PyQt5: https://www.riverbankcomputing.com/software/pyqt
- FFmpeg: https://ffmpeg.org

Community:
- Issues: https://github.com/WEIFENG2333/VideoCaptioner/issues
- Chinese Community: QQ Group (see GitHub README)
- License: GPL-3.0