IntermediateKrillinAIVideo TranslationAIGoWhisperLLMTTSDockerOpenAIVoice CloningFFmpeg

KrillinAI: Video Translation and Dubbing Setup

Complete setup guide for KrillinAI - an AI-powered video translation and dubbing tool that supports 100 languages. Includes installation for Windows/Linux/macOS, configuration of Whisper speech recognition, LLM translation, voice cloning with TTS, and Docker deployment options.

Step 1
System Prerequisites
KrillinAI is a Go-based video translation and dubbing tool that integrates speech recognition, LLM translation, and voice synthesis. It supports desktop and server deployment modes across all major platforms. The tool automatically handles dependency installation, but you'll need API access for speech recognition and translation services.
```
# Check system compatibility
uname -a  # Linux/macOS
systeminfo  # Windows

# Verify internet connection (required for API calls and model downloads)
ping -c 3 api.openai.com

# Check available disk space (models can be 1-3GB)
df -h  # Linux/macOS
wmic logicaldisk get size,freespace,caption  # Windows
```
⚠ Heads up: KrillinAI requires stable internet for API calls to speech recognition and LLM services. Local Whisper models (FasterWhisper, WhisperCpp) will download automatically on first use and require 1-3GB storage per model.
Step 2
Choose Your Deployment Mode
KrillinAI offers two deployment modes: Desktop (GUI application with built-in browser interface) and Server (lightweight web service accessed via browser). Desktop mode is recommended for most users as it provides a sleek interface with no additional configuration. Server mode is ideal for headless deployments, Docker containers, or remote access scenarios.
```
Desktop Mode:
- Pre-packaged with web UI
- One-click launch
- Automatic port management
- Ideal for: Local development, content creators

Server Mode:
- Minimal footprint
- Manual config.toml setup required
- Browser access at http://127.0.0.1:8888
- Ideal for: Production deployments, Docker, remote servers
```

Step 3

Download KrillinAI Executable

Download the pre-compiled executable for your operating system from the GitHub releases page. KrillinAI provides native builds for Windows (x64 and x86), Linux (x64 and ARM64), and macOS (Intel and Apple Silicon). Choose the desktop version (filename contains 'desktop') for GUI mode, or the standard version for server mode.

# Visit the releases page
https://github.com/krillinai/KrillinAI/releases

# Download for your platform:
# - Windows Desktop: krillinai-desktop-windows-amd64.exe
# - Linux Desktop: krillinai-desktop-linux-amd64
# - macOS Desktop (Intel): krillinai-desktop-darwin-amd64
# - macOS Desktop (M-series): krillinai-desktop-darwin-arm64
# - Server versions: Same names without "desktop"

# Linux/macOS: Make executable
chmod +x krillinai-desktop-linux-amd64

# Optional: Move to system PATH
sudo mv krillinai-desktop-linux-amd64 /usr/local/bin/krillinai
krillinai --version

Step 4

macOS Security Configuration

macOS blocks unsigned executables by default. You must manually trust the KrillinAI binary before first launch. This is a standard security measure for applications distributed outside the Mac App Store.

# Remove quarantine attribute from the downloaded executable
xattr -d com.apple.quarantine krillinai-desktop-darwin-arm64

# Alternative: Trust via System Settings
# 1. Try to open the app (will be blocked)
# 2. System Settings → Privacy & Security
# 3. Scroll to "Security" section
# 4. Click "Open Anyway" next to the KrillinAI message

# Verify executable is trusted
spctl -a -v krillinai-desktop-darwin-arm64

⚠ Heads up: This step is ONLY required on macOS. Windows and Linux users can skip directly to launching the application.

Step 5
Launch Desktop Version (Recommended)
For desktop mode, simply double-click the executable. KrillinAI will automatically start a local web server and open the interface in your default browser. No configuration file is required for basic usage with cloud-based services.
```
# Windows: Double-click the .exe file
# Or from Command Prompt:
krillinai-desktop-windows-amd64.exe

# Linux:
./krillinai-desktop-linux-amd64

# macOS:
./krillinai-desktop-darwin-arm64

# The web interface will automatically open at:
# http://127.0.0.1:8888
```
Step 6
Server Version Setup (Optional)
For server deployments, create a configuration directory and populate config.toml from the example template. The configuration file defines API credentials, model choices, and service endpoints. Server mode requires manual configuration before first launch.
```
# Create configuration directory
mkdir -p config
cd config

# Download example configuration
curl -O https://raw.githubusercontent.com/krillinai/KrillinAI/master/config-example.toml

# Rename and edit
mv config-example.toml config.toml
nano config.toml  # or vim, code, etc.
```

Step 7

Configure Speech Recognition (Transcribe)

KrillinAI supports multiple speech recognition engines: cloud-based OpenAI Whisper (all platforms), local FasterWhisper (Windows/Linux), WhisperKit (macOS M-series only), WhisperCpp (all platforms), and Alibaba Cloud ASR. Cloud providers require API keys; local engines download models automatically on first use.

[transcribe]
provider = "openai"  # Options: openai, faster-whisper, whisper-kit, whisper-cpp, aliyun

# OpenAI Whisper (cloud)
[transcribe.openai]
api_key = "sk-your-openai-api-key-here"
model = "whisper-1"  # Default and only option

# FasterWhisper (local, Windows/Linux)
[transcribe.faster-whisper]
model = "large-v2"  # Options: tiny, medium, large-v2
# Auto-downloads on first use; requires 1-3GB disk space

# WhisperKit (local, macOS M-series only)
[transcribe.whisper-kit]
model = "default"

# WhisperCpp (local, cross-platform)
[transcribe.whisper-cpp]
model = "base"  # Lightweight, fast

# Alibaba Cloud ASR (requires separate setup)
[transcribe.aliyun]
access_key_id = "your-access-key"
access_key_secret = "your-secret"
app_key = "your-app-key"

⚠ Heads up: OpenAI Whisper requires a paid API key. Free accounts have limited quota. Local Whisper engines (FasterWhisper, WhisperCpp) run offline but require significant CPU/GPU resources.

Step 8

Configure LLM Translation

KrillinAI is compatible with any LLM provider that implements the OpenAI API specification. This includes OpenAI GPT models, Google Gemini, DeepSeek, Alibaba Tongyi Qianwen, and locally-hosted models via OpenAI-compatible servers (e.g., LM Studio, Ollama with OpenAI compatibility layer).

[llm]
provider = "openai"  # Any OpenAI-compatible API

# OpenAI GPT
[llm.openai]
api_key = "sk-your-openai-api-key-here"
model = "gpt-4"  # Options: gpt-4, gpt-4-turbo, gpt-3.5-turbo
base_url = "https://api.openai.com/v1"  # Default

# DeepSeek (OpenAI-compatible)
[llm.deepseek]
api_key = "your-deepseek-api-key"
model = "deepseek-chat"
base_url = "https://api.deepseek.com/v1"

# Local model via Ollama (with OpenAI compatibility)
[llm.local]
api_key = "not-needed"  # Placeholder
model = "llama3.1:70b"
base_url = "http://localhost:11434/v1"  # Ollama OpenAI endpoint

# Google Gemini (via OpenAI-compatible proxy)
[llm.gemini]
api_key = "your-gemini-api-key"
model = "gemini-pro"
base_url = "https://generativelanguage.googleapis.com/v1beta/openai/"

Step 9

Configure Text-to-Speech (TTS)

TTS is optional but recommended for complete video dubbing. KrillinAI supports OpenAI TTS (simple, high-quality), Alibaba Cloud Voice Service (more voice options), and CosyVoice (advanced voice cloning). Voice cloning preserves the original speaker's characteristics in the target language.

[tts]
provider = "openai"  # Options: openai, aliyun, cosyvoice

# OpenAI TTS (recommended for simplicity)
[tts.openai]
api_key = "sk-your-openai-api-key-here"
model = "tts-1-hd"  # Options: tts-1, tts-1-hd
voice = "alloy"  # Options: alloy, echo, fable, onyx, nova, shimmer

# Alibaba Cloud Voice Service
[tts.aliyun]
access_key_id = "your-access-key"
access_key_secret = "your-secret"
app_key = "your-app-key"
voice = "xiaoyun"  # See Alibaba Cloud docs for full list

# CosyVoice (voice cloning)
[tts.cosyvoice]
endpoint = "http://localhost:50000"  # Self-hosted CosyVoice server
voice_preset = "default"  # Or path to voice sample for cloning

⚠ Heads up: Voice cloning with CosyVoice requires self-hosting a separate inference server. See the CosyVoice documentation for setup instructions.

Step 10

Launch Server Version

With configuration complete, start the KrillinAI server. The application will bind to port 8888 by default and serve the web interface. Access the UI in your browser to begin translating videos.

# Ensure config/config.toml exists
ls config/config.toml

# Start the server
./krillinai  # Linux/macOS
krillinai.exe  # Windows

# The server will output:
# Server listening on http://127.0.0.1:8888

# Open in browser
open http://127.0.0.1:8888  # macOS
xdg-open http://127.0.0.1:8888  # Linux
start http://127.0.0.1:8888  # Windows

Step 11

Docker Deployment (Alternative)

KrillinAI supports Docker for isolated, reproducible deployments. The Docker image includes all dependencies and can be configured via environment variables or a mounted config.toml file. This is the recommended approach for production deployments and CI/CD pipelines.

# Clone the repository to access docker-compose.yml
git clone https://github.com/krillinai/KrillinAI.git
cd KrillinAI

# Option 1: Docker Compose (recommended)
docker-compose up -d

# Option 2: Manual Docker run
docker run -d \
  -p 8888:8888 \
  -v $(pwd)/config:/app/config \
  -v $(pwd)/output:/app/output \
  --name krillinai \
  krillinai/krillinai:latest

# View logs
docker-compose logs -f
# or
docker logs -f krillinai

# Access the interface
http://localhost:8888

⚠ Heads up: Mount the config directory as a volume to persist configuration. Mount an output directory to access translated videos outside the container.

Step 12

Translate Your First Video

KrillinAI provides a web interface for uploading videos or fetching from URLs (including YouTube via yt-dlp integration). The workflow is: upload video → select source/target languages → choose output format (landscape/portrait) → start translation. Progress updates appear in real-time.

1. Open http://127.0.0.1:8888 in your browser
2. Click "New Translation Task"
3. Upload a video file OR paste a YouTube/video URL
4. Select source language (or auto-detect)
5. Select target language(s) - supports 100+ languages
6. Choose output format:
   - Landscape (16:9) for YouTube
   - Portrait (9:16) for TikTok, Instagram Reels
   - Auto-detect from source video
7. Configure options:
   - Enable/disable voice dubbing (TTS)
   - Enable/disable subtitle burning
   - Adjust subtitle styling
8. Click "Start Translation"
9. Monitor progress in the task list
10. Download completed video from the output panel

Step 13

Understanding the Translation Pipeline

KrillinAI executes a three-stage pipeline: Transcribe (speech to text via Whisper), Translate (text to text via LLM with context preservation), and Synthesize (text to speech via TTS). Each stage can be configured independently. The LLM translation stage uses context-aware chunking to maintain semantic coherence across subtitle boundaries.

Pipeline Stages:

1. Transcribe (Audio → Text)
   - Extract audio from video
   - Run Whisper speech recognition
   - Generate timestamped subtitles
   - Intelligent segmentation for natural breaks

2. Translate (Text → Text)
   - Chunk subtitles with context overlap
   - LLM translates with semantic awareness
   - Preserve timing and formatting
   - Adjust subtitle length for target language

3. Synthesize (Text → Audio)
   - Generate dubbed audio via TTS
   - Optional: Clone original voice characteristics
   - Align audio timing with video
   - Mix dubbed audio with video

4. Compose (Final Output)
   - Burn subtitles into video (optional)
   - Adjust aspect ratio for target platform
   - Export in optimized format

Step 14

Configure API Keys via Web Interface

The desktop version allows configuring API keys directly in the web interface without editing config.toml. Navigate to Settings → API Configuration to input your OpenAI, Alibaba Cloud, or other provider credentials. Changes are saved to the application's data directory.

Web UI Configuration Path:

1. Click the ⚙️ Settings icon (top-right)
2. Select "API Configuration" tab
3. Choose your provider:
   - OpenAI: Enter API key for Whisper + GPT + TTS
   - Alibaba Cloud: Enter AccessKey, SecretKey, AppKey
   - Custom: Enter base_url for OpenAI-compatible APIs
4. Test connection: Click "Validate" button
5. Save configuration
6. Restart if prompted (desktop version may require restart)

Settings are stored in:
- Windows: %APPDATA%\krillinai\config.toml
- macOS: ~/Library/Application Support/krillinai/config.toml
- Linux: ~/.config/krillinai/config.toml

Step 15

Troubleshooting Common Issues

Common issues include model download failures (check internet connection and disk space), API authentication errors (verify keys are valid and have sufficient quota), and video processing errors (ensure input video is not corrupted). Check the application logs for detailed error messages.

# View application logs
# Desktop version: Check the web UI console (F12 in browser)
# Server version: Logs print to terminal

# Check API connectivity
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer sk-your-api-key"

# Verify model downloads (local Whisper)
# Models stored in:
# - Windows: %USERPROFILE%\.cache\whisper
# - Linux/macOS: ~/.cache/whisper
ls ~/.cache/whisper

# Test Whisper locally (debug)
python -c "import whisper; whisper.load_model('base')"

# FFmpeg not found (rare, should auto-install)
# Manually install:
sudo apt install ffmpeg  # Ubuntu/Debian
brew install ffmpeg  # macOS
choco install ffmpeg  # Windows

# Port 8888 already in use
# Edit config.toml:
[server]
port = 8889  # Change to available port

# Docker container fails to start
docker-compose logs
docker inspect krillinai

⚠ Heads up: OpenAI API rate limits apply. Free tier accounts have strict quotas (3 requests/min). Upgrade to a paid account for production use.

Step 16

Performance Optimization

For faster processing, use local Whisper models (FasterWhisper on GPU-enabled systems), batch multiple videos, and choose smaller LLM models (gpt-3.5-turbo instead of gpt-4). GPU acceleration significantly improves Whisper transcription speed.

# Enable GPU acceleration for FasterWhisper (NVIDIA)
# Requires CUDA toolkit installed
nvidia-smi  # Verify GPU is detected

# In config.toml:
[transcribe.faster-whisper]
model = "large-v2"
device = "cuda"  # Options: cpu, cuda
compute_type = "float16"  # Faster on GPU

# CPU optimization (multi-threading)
[transcribe.faster-whisper]
device = "cpu"
compute_type = "int8"  # Quantized for speed
num_threads = 8  # Match your CPU core count

# Batch processing via API (future feature)
# For now, queue multiple tasks in the web UI

# Use smaller models for faster turnaround:
# - Whisper: tiny (fast, lower accuracy) vs large-v2 (slow, high accuracy)
# - LLM: gpt-3.5-turbo (fast, cheap) vs gpt-4 (slow, expensive, best quality)

Step 17

Production Deployment Checklist

For production deployments, enable HTTPS (via reverse proxy like Nginx or Caddy), implement authentication (basic auth or OAuth), set up monitoring and logging, configure automatic backups of configuration and output files, and use Docker Compose with health checks and restart policies.

# docker-compose.prod.yml
version: '3.8'
services:
  krillinai:
    image: krillinai/krillinai:latest
    restart: always
    ports:
      - "127.0.0.1:8888:8888"  # Bind to localhost only
    volumes:
      - ./config:/app/config:ro  # Read-only config
      - ./output:/app/output
      - ./logs:/app/logs
    environment:
      - LOG_LEVEL=info
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8888/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    mem_limit: 8g
    cpus: 4

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "443:443"
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      - krillinai

⚠ Heads up: Never expose the KrillinAI web interface directly to the internet without authentication. Use a reverse proxy with HTTPS and implement access controls.

Step 18

Community and Support Resources

KrillinAI maintains active community channels for troubleshooting and feature requests. The project documentation is hosted on DeepWiki, and the GitHub issues tracker is monitored by maintainers. For real-time support, join the QQ group (primarily Chinese-language community).

Official Resources:

- GitHub Repository: https://github.com/krillinai/KrillinAI
- Documentation: https://deepwiki.com/krillinai/KlicStudio
- Issue Tracker: https://github.com/krillinai/KrillinAI/issues
- QQ Group: 754069680 (Chinese community)

Related Projects:
- Whisper: https://github.com/openai/whisper
- FasterWhisper: https://github.com/guillaumekln/faster-whisper
- CosyVoice: https://github.com/FunAudioLLM/CosyVoice

Useful Documentation:
- Docker Deployment: See docker.md in repository
- Alibaba Cloud Setup: See aliyun.md in repository
- API Reference: Coming soon (project is actively developed)