TechSetupGuides
IntermediateKrillinAIVideo TranslationAIGoWhisperLLMTTSDockerOpenAIVoice CloningFFmpeg

KrillinAI: Video Translation and Dubbing Setup

Complete setup guide for KrillinAI - an AI-powered video translation and dubbing tool that supports 100 languages. Includes installation for Windows/Linux/macOS, configuration of Whisper speech recognition, LLM translation, voice cloning with TTS, and Docker deployment options.

  1. Step 1

    System Prerequisites

    KrillinAI is a Go-based video translation and dubbing tool that integrates speech recognition, LLM translation, and voice synthesis. It supports desktop and server deployment modes across all major platforms. The tool automatically handles dependency installation, but you'll need API access for speech recognition and translation services.

    # Check system compatibility
    uname -a  # Linux/macOS
    systeminfo  # Windows
    
    # Verify internet connection (required for API calls and model downloads)
    ping -c 3 api.openai.com
    
    # Check available disk space (models can be 1-3GB)
    df -h  # Linux/macOS
    wmic logicaldisk get size,freespace,caption  # Windows
    ⚠ Heads up: KrillinAI requires stable internet for API calls to speech recognition and LLM services. Local Whisper models (FasterWhisper, WhisperCpp) will download automatically on first use and require 1-3GB storage per model.
  2. Step 2

    Choose Your Deployment Mode

    KrillinAI offers two deployment modes: Desktop (GUI application with built-in browser interface) and Server (lightweight web service accessed via browser). Desktop mode is recommended for most users as it provides a sleek interface with no additional configuration. Server mode is ideal for headless deployments, Docker containers, or remote access scenarios.

    Desktop Mode:
    - Pre-packaged with web UI
    - One-click launch
    - Automatic port management
    - Ideal for: Local development, content creators
    
    Server Mode:
    - Minimal footprint
    - Manual config.toml setup required
    - Browser access at http://127.0.0.1:8888
    - Ideal for: Production deployments, Docker, remote servers
  3. Step 3

    Download KrillinAI Executable

    Download the pre-compiled executable for your operating system from the GitHub releases page. KrillinAI provides native builds for Windows (x64 and x86), Linux (x64 and ARM64), and macOS (Intel and Apple Silicon). Choose the desktop version (filename contains 'desktop') for GUI mode, or the standard version for server mode.

    # Visit the releases page
    https://github.com/krillinai/KrillinAI/releases
    
    # Download for your platform:
    # - Windows Desktop: krillinai-desktop-windows-amd64.exe
    # - Linux Desktop: krillinai-desktop-linux-amd64
    # - macOS Desktop (Intel): krillinai-desktop-darwin-amd64
    # - macOS Desktop (M-series): krillinai-desktop-darwin-arm64
    # - Server versions: Same names without "desktop"
    
    # Linux/macOS: Make executable
    chmod +x krillinai-desktop-linux-amd64
    
    # Optional: Move to system PATH
    sudo mv krillinai-desktop-linux-amd64 /usr/local/bin/krillinai
    krillinai --version
  4. Step 4

    macOS Security Configuration

    macOS blocks unsigned executables by default. You must manually trust the KrillinAI binary before first launch. This is a standard security measure for applications distributed outside the Mac App Store.

    # Remove quarantine attribute from the downloaded executable
    xattr -d com.apple.quarantine krillinai-desktop-darwin-arm64
    
    # Alternative: Trust via System Settings
    # 1. Try to open the app (will be blocked)
    # 2. System Settings → Privacy & Security
    # 3. Scroll to "Security" section
    # 4. Click "Open Anyway" next to the KrillinAI message
    
    # Verify executable is trusted
    spctl -a -v krillinai-desktop-darwin-arm64
    ⚠ Heads up: This step is ONLY required on macOS. Windows and Linux users can skip directly to launching the application.
  5. Step 5

    Launch Desktop Version (Recommended)

    For desktop mode, simply double-click the executable. KrillinAI will automatically start a local web server and open the interface in your default browser. No configuration file is required for basic usage with cloud-based services.

    # Windows: Double-click the .exe file
    # Or from Command Prompt:
    krillinai-desktop-windows-amd64.exe
    
    # Linux:
    ./krillinai-desktop-linux-amd64
    
    # macOS:
    ./krillinai-desktop-darwin-arm64
    
    # The web interface will automatically open at:
    # http://127.0.0.1:8888
  6. Step 6

    Server Version Setup (Optional)

    For server deployments, create a configuration directory and populate config.toml from the example template. The configuration file defines API credentials, model choices, and service endpoints. Server mode requires manual configuration before first launch.

    # Create configuration directory
    mkdir -p config
    cd config
    
    # Download example configuration
    curl -O https://raw.githubusercontent.com/krillinai/KrillinAI/master/config-example.toml
    
    # Rename and edit
    mv config-example.toml config.toml
    nano config.toml  # or vim, code, etc.
  7. Step 7

    Configure Speech Recognition (Transcribe)

    KrillinAI supports multiple speech recognition engines: cloud-based OpenAI Whisper (all platforms), local FasterWhisper (Windows/Linux), WhisperKit (macOS M-series only), WhisperCpp (all platforms), and Alibaba Cloud ASR. Cloud providers require API keys; local engines download models automatically on first use.

    [transcribe]
    provider = "openai"  # Options: openai, faster-whisper, whisper-kit, whisper-cpp, aliyun
    
    # OpenAI Whisper (cloud)
    [transcribe.openai]
    api_key = "sk-your-openai-api-key-here"
    model = "whisper-1"  # Default and only option
    
    # FasterWhisper (local, Windows/Linux)
    [transcribe.faster-whisper]
    model = "large-v2"  # Options: tiny, medium, large-v2
    # Auto-downloads on first use; requires 1-3GB disk space
    
    # WhisperKit (local, macOS M-series only)
    [transcribe.whisper-kit]
    model = "default"
    
    # WhisperCpp (local, cross-platform)
    [transcribe.whisper-cpp]
    model = "base"  # Lightweight, fast
    
    # Alibaba Cloud ASR (requires separate setup)
    [transcribe.aliyun]
    access_key_id = "your-access-key"
    access_key_secret = "your-secret"
    app_key = "your-app-key"
    ⚠ Heads up: OpenAI Whisper requires a paid API key. Free accounts have limited quota. Local Whisper engines (FasterWhisper, WhisperCpp) run offline but require significant CPU/GPU resources.
  8. Step 8

    Configure LLM Translation

    KrillinAI is compatible with any LLM provider that implements the OpenAI API specification. This includes OpenAI GPT models, Google Gemini, DeepSeek, Alibaba Tongyi Qianwen, and locally-hosted models via OpenAI-compatible servers (e.g., LM Studio, Ollama with OpenAI compatibility layer).

    [llm]
    provider = "openai"  # Any OpenAI-compatible API
    
    # OpenAI GPT
    [llm.openai]
    api_key = "sk-your-openai-api-key-here"
    model = "gpt-4"  # Options: gpt-4, gpt-4-turbo, gpt-3.5-turbo
    base_url = "https://api.openai.com/v1"  # Default
    
    # DeepSeek (OpenAI-compatible)
    [llm.deepseek]
    api_key = "your-deepseek-api-key"
    model = "deepseek-chat"
    base_url = "https://api.deepseek.com/v1"
    
    # Local model via Ollama (with OpenAI compatibility)
    [llm.local]
    api_key = "not-needed"  # Placeholder
    model = "llama3.1:70b"
    base_url = "http://localhost:11434/v1"  # Ollama OpenAI endpoint
    
    # Google Gemini (via OpenAI-compatible proxy)
    [llm.gemini]
    api_key = "your-gemini-api-key"
    model = "gemini-pro"
    base_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
  9. Step 9

    Configure Text-to-Speech (TTS)

    TTS is optional but recommended for complete video dubbing. KrillinAI supports OpenAI TTS (simple, high-quality), Alibaba Cloud Voice Service (more voice options), and CosyVoice (advanced voice cloning). Voice cloning preserves the original speaker's characteristics in the target language.

    [tts]
    provider = "openai"  # Options: openai, aliyun, cosyvoice
    
    # OpenAI TTS (recommended for simplicity)
    [tts.openai]
    api_key = "sk-your-openai-api-key-here"
    model = "tts-1-hd"  # Options: tts-1, tts-1-hd
    voice = "alloy"  # Options: alloy, echo, fable, onyx, nova, shimmer
    
    # Alibaba Cloud Voice Service
    [tts.aliyun]
    access_key_id = "your-access-key"
    access_key_secret = "your-secret"
    app_key = "your-app-key"
    voice = "xiaoyun"  # See Alibaba Cloud docs for full list
    
    # CosyVoice (voice cloning)
    [tts.cosyvoice]
    endpoint = "http://localhost:50000"  # Self-hosted CosyVoice server
    voice_preset = "default"  # Or path to voice sample for cloning
    ⚠ Heads up: Voice cloning with CosyVoice requires self-hosting a separate inference server. See the CosyVoice documentation for setup instructions.
  10. Step 10

    Launch Server Version

    With configuration complete, start the KrillinAI server. The application will bind to port 8888 by default and serve the web interface. Access the UI in your browser to begin translating videos.

    # Ensure config/config.toml exists
    ls config/config.toml
    
    # Start the server
    ./krillinai  # Linux/macOS
    krillinai.exe  # Windows
    
    # The server will output:
    # Server listening on http://127.0.0.1:8888
    
    # Open in browser
    open http://127.0.0.1:8888  # macOS
    xdg-open http://127.0.0.1:8888  # Linux
    start http://127.0.0.1:8888  # Windows
  11. Step 11

    Docker Deployment (Alternative)

    KrillinAI supports Docker for isolated, reproducible deployments. The Docker image includes all dependencies and can be configured via environment variables or a mounted config.toml file. This is the recommended approach for production deployments and CI/CD pipelines.

    # Clone the repository to access docker-compose.yml
    git clone https://github.com/krillinai/KrillinAI.git
    cd KrillinAI
    
    # Option 1: Docker Compose (recommended)
    docker-compose up -d
    
    # Option 2: Manual Docker run
    docker run -d \
      -p 8888:8888 \
      -v $(pwd)/config:/app/config \
      -v $(pwd)/output:/app/output \
      --name krillinai \
      krillinai/krillinai:latest
    
    # View logs
    docker-compose logs -f
    # or
    docker logs -f krillinai
    
    # Access the interface
    http://localhost:8888
    ⚠ Heads up: Mount the config directory as a volume to persist configuration. Mount an output directory to access translated videos outside the container.
  12. Step 12

    Translate Your First Video

    KrillinAI provides a web interface for uploading videos or fetching from URLs (including YouTube via yt-dlp integration). The workflow is: upload video → select source/target languages → choose output format (landscape/portrait) → start translation. Progress updates appear in real-time.

    1. Open http://127.0.0.1:8888 in your browser
    2. Click "New Translation Task"
    3. Upload a video file OR paste a YouTube/video URL
    4. Select source language (or auto-detect)
    5. Select target language(s) - supports 100+ languages
    6. Choose output format:
       - Landscape (16:9) for YouTube
       - Portrait (9:16) for TikTok, Instagram Reels
       - Auto-detect from source video
    7. Configure options:
       - Enable/disable voice dubbing (TTS)
       - Enable/disable subtitle burning
       - Adjust subtitle styling
    8. Click "Start Translation"
    9. Monitor progress in the task list
    10. Download completed video from the output panel
  13. Step 13

    Understanding the Translation Pipeline

    KrillinAI executes a three-stage pipeline: Transcribe (speech to text via Whisper), Translate (text to text via LLM with context preservation), and Synthesize (text to speech via TTS). Each stage can be configured independently. The LLM translation stage uses context-aware chunking to maintain semantic coherence across subtitle boundaries.

    Pipeline Stages:
    
    1. Transcribe (Audio → Text)
       - Extract audio from video
       - Run Whisper speech recognition
       - Generate timestamped subtitles
       - Intelligent segmentation for natural breaks
    
    2. Translate (Text → Text)
       - Chunk subtitles with context overlap
       - LLM translates with semantic awareness
       - Preserve timing and formatting
       - Adjust subtitle length for target language
    
    3. Synthesize (Text → Audio)
       - Generate dubbed audio via TTS
       - Optional: Clone original voice characteristics
       - Align audio timing with video
       - Mix dubbed audio with video
    
    4. Compose (Final Output)
       - Burn subtitles into video (optional)
       - Adjust aspect ratio for target platform
       - Export in optimized format
  14. Step 14

    Configure API Keys via Web Interface

    The desktop version allows configuring API keys directly in the web interface without editing config.toml. Navigate to Settings → API Configuration to input your OpenAI, Alibaba Cloud, or other provider credentials. Changes are saved to the application's data directory.

    Web UI Configuration Path:
    
    1. Click the ⚙️ Settings icon (top-right)
    2. Select "API Configuration" tab
    3. Choose your provider:
       - OpenAI: Enter API key for Whisper + GPT + TTS
       - Alibaba Cloud: Enter AccessKey, SecretKey, AppKey
       - Custom: Enter base_url for OpenAI-compatible APIs
    4. Test connection: Click "Validate" button
    5. Save configuration
    6. Restart if prompted (desktop version may require restart)
    
    Settings are stored in:
    - Windows: %APPDATA%\krillinai\config.toml
    - macOS: ~/Library/Application Support/krillinai/config.toml
    - Linux: ~/.config/krillinai/config.toml
  15. Step 15

    Troubleshooting Common Issues

    Common issues include model download failures (check internet connection and disk space), API authentication errors (verify keys are valid and have sufficient quota), and video processing errors (ensure input video is not corrupted). Check the application logs for detailed error messages.

    # View application logs
    # Desktop version: Check the web UI console (F12 in browser)
    # Server version: Logs print to terminal
    
    # Check API connectivity
    curl https://api.openai.com/v1/models \
      -H "Authorization: Bearer sk-your-api-key"
    
    # Verify model downloads (local Whisper)
    # Models stored in:
    # - Windows: %USERPROFILE%\.cache\whisper
    # - Linux/macOS: ~/.cache/whisper
    ls ~/.cache/whisper
    
    # Test Whisper locally (debug)
    python -c "import whisper; whisper.load_model('base')"
    
    # FFmpeg not found (rare, should auto-install)
    # Manually install:
    sudo apt install ffmpeg  # Ubuntu/Debian
    brew install ffmpeg  # macOS
    choco install ffmpeg  # Windows
    
    # Port 8888 already in use
    # Edit config.toml:
    [server]
    port = 8889  # Change to available port
    
    # Docker container fails to start
    docker-compose logs
    docker inspect krillinai
    ⚠ Heads up: OpenAI API rate limits apply. Free tier accounts have strict quotas (3 requests/min). Upgrade to a paid account for production use.
  16. Step 16

    Performance Optimization

    For faster processing, use local Whisper models (FasterWhisper on GPU-enabled systems), batch multiple videos, and choose smaller LLM models (gpt-3.5-turbo instead of gpt-4). GPU acceleration significantly improves Whisper transcription speed.

    # Enable GPU acceleration for FasterWhisper (NVIDIA)
    # Requires CUDA toolkit installed
    nvidia-smi  # Verify GPU is detected
    
    # In config.toml:
    [transcribe.faster-whisper]
    model = "large-v2"
    device = "cuda"  # Options: cpu, cuda
    compute_type = "float16"  # Faster on GPU
    
    # CPU optimization (multi-threading)
    [transcribe.faster-whisper]
    device = "cpu"
    compute_type = "int8"  # Quantized for speed
    num_threads = 8  # Match your CPU core count
    
    # Batch processing via API (future feature)
    # For now, queue multiple tasks in the web UI
    
    # Use smaller models for faster turnaround:
    # - Whisper: tiny (fast, lower accuracy) vs large-v2 (slow, high accuracy)
    # - LLM: gpt-3.5-turbo (fast, cheap) vs gpt-4 (slow, expensive, best quality)
  17. Step 17

    Production Deployment Checklist

    For production deployments, enable HTTPS (via reverse proxy like Nginx or Caddy), implement authentication (basic auth or OAuth), set up monitoring and logging, configure automatic backups of configuration and output files, and use Docker Compose with health checks and restart policies.

    # docker-compose.prod.yml
    version: '3.8'
    services:
      krillinai:
        image: krillinai/krillinai:latest
        restart: always
        ports:
          - "127.0.0.1:8888:8888"  # Bind to localhost only
        volumes:
          - ./config:/app/config:ro  # Read-only config
          - ./output:/app/output
          - ./logs:/app/logs
        environment:
          - LOG_LEVEL=info
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8888/health"]
          interval: 30s
          timeout: 10s
          retries: 3
        mem_limit: 8g
        cpus: 4
    
      nginx:
        image: nginx:alpine
        restart: always
        ports:
          - "443:443"
          - "80:80"
        volumes:
          - ./nginx.conf:/etc/nginx/nginx.conf:ro
          - ./ssl:/etc/nginx/ssl:ro
        depends_on:
          - krillinai
    ⚠ Heads up: Never expose the KrillinAI web interface directly to the internet without authentication. Use a reverse proxy with HTTPS and implement access controls.
  18. Step 18

    Community and Support Resources

    KrillinAI maintains active community channels for troubleshooting and feature requests. The project documentation is hosted on DeepWiki, and the GitHub issues tracker is monitored by maintainers. For real-time support, join the QQ group (primarily Chinese-language community).

    Official Resources:
    
    - GitHub Repository: https://github.com/krillinai/KrillinAI
    - Documentation: https://deepwiki.com/krillinai/KlicStudio
    - Issue Tracker: https://github.com/krillinai/KrillinAI/issues
    - QQ Group: 754069680 (Chinese community)
    
    Related Projects:
    - Whisper: https://github.com/openai/whisper
    - FasterWhisper: https://github.com/guillaumekln/faster-whisper
    - CosyVoice: https://github.com/FunAudioLLM/CosyVoice
    
    Useful Documentation:
    - Docker Deployment: See docker.md in repository
    - Alibaba Cloud Setup: See aliyun.md in repository
    - API Reference: Coming soon (project is actively developed)

Feature requests

Sign in to suggest features or vote on existing ones.

No feature requests yet.

Discussion

0 people marked this as worked·Sign in to mark your own.

Sign in to join the discussion.

No comments yet.