Files
miku-discord/README.md

19 KiB

🎤 Miku Discord Bot 💙

Miku Banner Docker Python Discord.py

The world's #1 Virtual Idol, now in your Discord server! 🌱

FeaturesQuick StartArchitectureAPIContributing


🌟 About

Meet Hatsune Miku - a fully-featured, AI-powered Discord bot that brings the cheerful, energetic virtual idol to life in your server! Powered by local LLMs (Llama 3.1), vision models (MiniCPM-V), and a sophisticated autonomous behavior system, Miku can chat, react, share content, and even change her profile picture based on her mood!

Why This Bot?

  • 🎭 14 Dynamic Moods - From bubbly to melancholy, Miku's personality adapts
  • 🤖 Smart Autonomous Behavior - Context-aware decisions without spamming
  • 👁️ Vision Capabilities - Analyzes images, videos, and GIFs in conversations
  • 🎨 Auto Profile Pictures - Fetches & crops anime faces from Danbooru based on mood
  • 💬 DM Support - Personal conversations with mood tracking
  • 🐦 Twitter Integration - Shares Miku-related tweets and figurine announcements
  • 🎮 ComfyUI Integration - Natural language image generation requests
  • 🔊 Voice Chat Ready - Fish.audio TTS integration (docs included)
  • 📊 RESTful API - Full control via HTTP endpoints
  • 🐳 Production Ready - Docker Compose with GPU support

Features

🧠 AI & LLM Integration

  • Local LLM Processing with Llama 3.1 8B (via llama.cpp + llama-swap)
  • Automatic Model Switching - Text ↔️ Vision models swap on-demand
  • OpenAI-Compatible API - Easy migration and integration
  • Conversation History - Per-user context with RAG-style retrieval
  • Smart Prompting - Mood-aware system prompts with personality profiles

🎭 Mood & Personality System

14 Available Moods (click to expand)
  • 😊 Neutral - Classic cheerful Miku
  • 😴 Asleep - Sleepy and minimally responsive
  • 😪 Sleepy - Getting tired, simple responses
  • 🎉 Excited - Extra energetic and enthusiastic
  • 💫 Bubbly - Playful and giggly
  • 🤔 Curious - Inquisitive and wondering
  • 😳 Shy - Blushing and hesitant
  • 🤪 Silly - Goofy and fun-loving
  • 😠 Angry - Frustrated or upset
  • 😤 Irritated - Mildly annoyed
  • 😢 Melancholy - Sad and reflective
  • 😏 Flirty - Playful and teasing
  • 💕 Romantic - Sweet and affectionate
  • 🎯 Serious - Focused and thoughtful
  • Per-Server Mood Tracking - Different moods in different servers
  • DM Mood Persistence - Separate mood state for private conversations
  • Automatic Mood Shifts - Responds to conversation sentiment

🤖 Autonomous Behavior System V2

The bot features a sophisticated context-aware decision engine that makes Miku feel alive:

  • Intelligent Activity Detection - Tracks message frequency, user presence, and channel activity
  • Non-Intrusive - Won't spam or interrupt important conversations
  • Mood-Based Personality - Behavioral patterns change with mood
  • Multiple Action Types:
    • 💬 General conversation starters
    • 👋 Engaging specific users
    • 🐦 Sharing Miku tweets
    • 💬 Joining ongoing conversations
    • 🎨 Changing profile pictures
    • 😊 Reacting to messages

Rate Limiting: Minimum 30-second cooldown between autonomous actions to prevent spam.

👁️ Vision & Media Processing

  • Image Analysis - Describe images shared in chat using MiniCPM-V 4.5
  • Video Understanding - Extracts frames and analyzes video content
  • GIF Support - Processes animated GIFs (converts to MP4 if needed)
  • Embed Content Extraction - Reads Twitter/X embeds without API
  • Face Detection - On-demand anime face detection service (GPU-accelerated)

🎨 Dynamic Profile Picture System

  • Danbooru Integration - Searches for Miku artwork
  • Smart Cropping - Automatic face detection and 1:1 crop
  • Mood-Based Selection - Filters by tags matching current mood
  • Quality Filtering - Only uses high-quality, safe-rated images
  • Fallback System - Graceful degradation if detection fails

🐦 Twitter Features

  • Tweet Sharing - Automatically fetches and shares Miku-related tweets
  • Figurine Notifications - DM subscribers about new Miku figurine releases
  • Embed Compatibility - Uses fxtwitter for better Discord previews
  • Duplicate Prevention - Tracks sent tweets to avoid repeats

🎮 ComfyUI Image Generation

  • Natural Language Detection - "Draw me as Miku swimming in a pool"
  • Workflow Integration - Connects to external ComfyUI instance
  • Smart Prompting - Enhances user requests with context

📡 REST API Dashboard

Full-featured FastAPI server with endpoints for:

  • Mood management (get/set/reset)
  • Conversation history
  • Autonomous actions (trigger manually)
  • Profile picture updates
  • Server configuration
  • DM analysis reports

🔧 Developer Features

  • Docker Compose Setup - One command deployment
  • GPU Acceleration - NVIDIA runtime for models and face detection
  • Health Checks - Automatic service monitoring
  • Volume Persistence - Conversation history and settings saved
  • Hot Reload - Update without restarting (for development)

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose installed
  • NVIDIA GPU with CUDA support (for model inference)
  • Discord Bot Token (Create one here)
  • At least 8GB VRAM recommended (4GB minimum)

Installation

  1. Clone the repository

    git clone https://github.com/yourusername/miku-discord.git
    cd miku-discord
    
  2. Set up your bot token

    Edit docker-compose.yml and replace the DISCORD_BOT_TOKEN:

    environment:
      - DISCORD_BOT_TOKEN=your_token_here
      - OWNER_USER_ID=your_discord_user_id  # For DM reports
    
  3. Add your models

    Place these GGUF models in the models/ directory:

    • Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf (text model)
    • MiniCPM-V-4_5-Q3_K_S.gguf (vision model)
    • MiniCPM-V-4_5-mmproj-f16.gguf (vision projector)
  4. Launch the bot

    docker-compose up -d
    
  5. Check logs

    docker-compose logs -f miku-bot
    
  6. Access the dashboard

    Open http://localhost:3939 in your browser

Optional: ComfyUI Integration

If you have ComfyUI running, update the path in docker-compose.yml:

volumes:
  - /path/to/your/ComfyUI/output:/app/ComfyUI/output:ro

Optional: Face Detection Service

Start the anime face detector when needed:

docker-compose --profile tools up -d anime-face-detector

Access Gradio UI at http://localhost:7860


🏗️ Architecture

Service Overview

┌─────────────────────────────────────────────────────────────┐
│                        Discord API                          │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                     Miku Bot (Python)                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Discord    │  │   FastAPI    │  │  Autonomous  │     │
│  │  Event Loop  │  │   Server     │  │    Engine    │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└───────────┬────────────────┬────────────────┬──────────────┘
            │                │                │
            ▼                ▼                ▼
┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐
│   llama-swap    │ │   ComfyUI       │ │ Face Detector│
│  (Model Server) │ │ (Image Gen)     │ │  (On-Demand) │
│                 │ │                 │ │              │
│  • Llama 3.1    │ │  • Workflows    │ │  • Gradio UI │
│  • MiniCPM-V    │ │  • GPU Accel    │ │  • FastAPI   │
│  • Auto-swap    │ │                 │ │              │
└─────────────────┘ └─────────────────┘ └──────────────┘
         │
         ▼
   ┌──────────┐
   │  Models  │
   │  (GGUF)  │
   └──────────┘

Tech Stack

Component Technology
Bot Framework Discord.py 2.0+
LLM Backend llama.cpp + llama-swap
Text Model Llama 3.1 8B Instruct
Vision Model MiniCPM-V 4.5
API Server FastAPI + Uvicorn
Image Gen ComfyUI (external)
Face Detection Anime-Face-Detector (Gradio)
Database JSON files (conversation history, settings)
Containerization Docker + Docker Compose
GPU Runtime NVIDIA Container Toolkit

Key Components

1. llama-swap (Model Server)

  • Automatically loads/unloads models based on requests
  • Prevents VRAM exhaustion by swapping between text and vision models
  • OpenAI-compatible /v1/chat/completions endpoint
  • Configurable TTL (time-to-live) per model

2. Autonomous Engine V2

  • Tracks message activity, user presence, and channel engagement
  • Calculates "engagement scores" per server
  • Makes context-aware decisions without LLM overhead
  • Personality profiles per mood (e.g., shy mood = less engaging)

3. Server Manager

  • Per-guild configuration (mood, sleep state, autonomous settings)
  • Scheduled tasks (bedtime reminders, autonomous ticks)
  • Persistent storage in servers_config.json

4. Conversation History

  • Vector-based RAG (Retrieval Augmented Generation)
  • Stores last 50 messages per user
  • Semantic search using FAISS
  • Context injection for continuity

📡 API Endpoints

The bot runs a FastAPI server on port 3939 with the following endpoints:

Mood Management

Endpoint Method Description
/servers/{guild_id}/mood GET Get current mood for server
/servers/{guild_id}/mood POST Set mood (body: {"mood": "excited"})
/servers/{guild_id}/mood/reset POST Reset to neutral mood
/mood GET Get DM mood (deprecated, use server-specific)

Autonomous Actions

Endpoint Method Description
/autonomous/general POST Make Miku say something random
/autonomous/engage POST Engage a random user
/autonomous/tweet POST Share a Miku tweet
/autonomous/reaction POST React to a recent message
/autonomous/custom POST Custom prompt (body: {"prompt": "..."})

Profile Pictures

Endpoint Method Description
/profile-picture/change POST Change profile picture (body: {"mood": "happy"})
/profile-picture/revert POST Revert to previous picture
/profile-picture/current GET Get current picture metadata

Utilities

Endpoint Method Description
/conversation/reset POST Clear conversation history for user
/logs GET View bot logs (last 1000 lines)
/prompt GET View current system prompt
/ GET Dashboard HTML page

Example Usage

# Set mood to excited
curl -X POST http://localhost:3939/servers/123456789/mood \
  -H "Content-Type: application/json" \
  -d '{"mood": "excited"}'

# Make Miku say something
curl -X POST http://localhost:3939/autonomous/general

# Change profile picture
curl -X POST http://localhost:3939/profile-picture/change \
  -H "Content-Type: application/json" \
  -d '{"mood": "flirty"}'

🎮 Usage Examples

Basic Interaction

User: Hey Miku! How are you today?
Miku: Miku's doing great! 💙 Thanks for asking! ✨

User: Can you see this? [uploads image]
Miku: Ooh! 👀 I see a cute cat sitting on a keyboard! So fluffy! 🐱

Mood Changes

User: /mood excited
Miku: YAYYY!!! 🎉✨ Miku is SO EXCITED right now!!! Let's have fun! 💙🎶

User: What's your favorite food?
Miku: NEGI!! 🌱🌱🌱 Green onions are THE BEST! Want some?! ✨

Image Generation

User: Draw yourself swimming in a pool
Miku: Ooh! Let me create that for you! 🎨✨ [generates image]

Autonomous Behavior

[After detecting activity in #general]
Miku: Hey everyone! 👋 What are you all talking about? 💙

🛠️ Configuration

Model Configuration (llama-swap-config.yaml)

models:
  llama3.1:
    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99
    ttl: 1800  # 30 minutes
    
  vision:
    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf
    ttl: 900   # 15 minutes

Environment Variables

Variable Default Description
DISCORD_BOT_TOKEN Required Your Discord bot token
OWNER_USER_ID Optional Your Discord user ID (for DM reports)
LLAMA_URL http://llama-swap:8080 Model server endpoint
TEXT_MODEL llama3.1 Text generation model name
VISION_MODEL vision Vision model name

Persistent Storage

All data is stored in bot/memory/:

  • servers_config.json - Per-server settings
  • autonomous_config.json - Autonomous behavior settings
  • conversation_history/ - User conversation data
  • profile_pictures/ - Downloaded profile pictures
  • dms/ - DM conversation logs
  • figurine_subscribers.json - Figurine notification subscribers

📚 Documentation

Detailed documentation available in the readmes/ directory:


🐛 Troubleshooting

Bot won't start

Check if models are loaded:

docker-compose logs llama-swap

Verify GPU access:

docker run --rm --runtime=nvidia nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

High VRAM usage

  • Lower the -ngl parameter in llama-swap-config.yaml (reduce GPU layers)
  • Reduce context size with -c parameter
  • Use smaller quantization (Q3 instead of Q4)

Autonomous actions not triggering

  • Check autonomous_config.json - ensure enabled and cooldown settings
  • Verify activity in server (bot tracks engagement)
  • Check logs for decision engine output

Face detection not working

  • Ensure GPU is available: docker-compose --profile tools up -d anime-face-detector
  • Check API health: curl http://localhost:6078/health
  • View Gradio UI: http://localhost:7860

Models switching too frequently

Increase TTL in llama-swap-config.yaml:

ttl: 3600  # 1 hour instead of 30 minutes

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

For local development without Docker:

# Install dependencies
cd bot
pip install -r requirements.txt

# Set environment variables
export DISCORD_BOT_TOKEN="your_token"
export LLAMA_URL="http://localhost:8080"

# Run the bot
python bot.py

Code Style

  • Use type hints where possible
  • Follow PEP 8 conventions
  • Add docstrings to functions
  • Comment complex logic

📝 License

This project is provided as-is for educational and personal use. Please respect:


🙏 Acknowledgments

  • Crypton Future Media - For creating Hatsune Miku
  • llama.cpp - For efficient local LLM inference
  • mostlygeek/llama-swap - For brilliant model management
  • Discord.py - For the excellent Discord API wrapper
  • OpenAI - For the API standard
  • MiniCPM-V Team - For the amazing vision model
  • Danbooru - For the artwork API
  • Fish.audio - For TTS integration (optional)

💙 Support

If you enjoy this project:

  • Star this repository
  • 🐛 Report bugs via Issues
  • 💬 Share your Miku bot setup in Discussions
  • 🎤 Listen to some Miku songs!

Made with 💙 by a Miku fan, for Miku fans

"The future begins now!" - Hatsune Miku 🎶

⬆ Back to Top