Koko210/miku-discord

Fork 0

Files

koko210Serve 37063b7f85 Add comprehensive README with features, setup, and documentation

2025-12-07 17:29:24 +02:00

19 KiB

Raw Blame History

🎤 Miku Discord Bot 💙

The world's #1 Virtual Idol, now in your Discord server! 🌱✨

Features • Quick Start • Architecture • API • Contributing

🌟 About

Meet Hatsune Miku - a fully-featured, AI-powered Discord bot that brings the cheerful, energetic virtual idol to life in your server! Powered by local LLMs (Llama 3.1), vision models (MiniCPM-V), and a sophisticated autonomous behavior system, Miku can chat, react, share content, and even change her profile picture based on her mood!

Why This Bot?

🎭 14 Dynamic Moods - From bubbly to melancholy, Miku's personality adapts
🤖 Smart Autonomous Behavior - Context-aware decisions without spamming
👁️ Vision Capabilities - Analyzes images, videos, and GIFs in conversations
🎨 Auto Profile Pictures - Fetches & crops anime faces from Danbooru based on mood
💬 DM Support - Personal conversations with mood tracking
🐦 Twitter Integration - Shares Miku-related tweets and figurine announcements
🎮 ComfyUI Integration - Natural language image generation requests
🔊 Voice Chat Ready - Fish.audio TTS integration (docs included)
📊 RESTful API - Full control via HTTP endpoints
🐳 Production Ready - Docker Compose with GPU support

✨ Features

🧠 AI & LLM Integration

Local LLM Processing with Llama 3.1 8B (via llama.cpp + llama-swap)
Automatic Model Switching - Text ↔️ Vision models swap on-demand
OpenAI-Compatible API - Easy migration and integration
Conversation History - Per-user context with RAG-style retrieval
Smart Prompting - Mood-aware system prompts with personality profiles

🎭 Mood & Personality System

14 Available Moods (click to expand)

😊 Neutral - Classic cheerful Miku
😴 Asleep - Sleepy and minimally responsive
😪 Sleepy - Getting tired, simple responses
🎉 Excited - Extra energetic and enthusiastic
💫 Bubbly - Playful and giggly
🤔 Curious - Inquisitive and wondering
😳 Shy - Blushing and hesitant
🤪 Silly - Goofy and fun-loving
😠 Angry - Frustrated or upset
😤 Irritated - Mildly annoyed
😢 Melancholy - Sad and reflective
😏 Flirty - Playful and teasing
💕 Romantic - Sweet and affectionate
🎯 Serious - Focused and thoughtful

Per-Server Mood Tracking - Different moods in different servers
DM Mood Persistence - Separate mood state for private conversations
Automatic Mood Shifts - Responds to conversation sentiment

🤖 Autonomous Behavior System V2

The bot features a sophisticated context-aware decision engine that makes Miku feel alive:

Intelligent Activity Detection - Tracks message frequency, user presence, and channel activity
Non-Intrusive - Won't spam or interrupt important conversations
Mood-Based Personality - Behavioral patterns change with mood
Multiple Action Types:
- 💬 General conversation starters
- 👋 Engaging specific users
- 🐦 Sharing Miku tweets
- 💬 Joining ongoing conversations
- 🎨 Changing profile pictures
- 😊 Reacting to messages

Rate Limiting: Minimum 30-second cooldown between autonomous actions to prevent spam.

👁️ Vision & Media Processing

Image Analysis - Describe images shared in chat using MiniCPM-V 4.5
Video Understanding - Extracts frames and analyzes video content
GIF Support - Processes animated GIFs (converts to MP4 if needed)
Embed Content Extraction - Reads Twitter/X embeds without API
Face Detection - On-demand anime face detection service (GPU-accelerated)

🎨 Dynamic Profile Picture System

Danbooru Integration - Searches for Miku artwork
Smart Cropping - Automatic face detection and 1:1 crop
Mood-Based Selection - Filters by tags matching current mood
Quality Filtering - Only uses high-quality, safe-rated images
Fallback System - Graceful degradation if detection fails

🐦 Twitter Features

Tweet Sharing - Automatically fetches and shares Miku-related tweets
Figurine Notifications - DM subscribers about new Miku figurine releases
Embed Compatibility - Uses fxtwitter for better Discord previews
Duplicate Prevention - Tracks sent tweets to avoid repeats

🎮 ComfyUI Image Generation

Natural Language Detection - "Draw me as Miku swimming in a pool"
Workflow Integration - Connects to external ComfyUI instance
Smart Prompting - Enhances user requests with context

📡 REST API Dashboard

Full-featured FastAPI server with endpoints for:

Mood management (get/set/reset)
Conversation history
Autonomous actions (trigger manually)
Profile picture updates
Server configuration
DM analysis reports

🔧 Developer Features

Docker Compose Setup - One command deployment
GPU Acceleration - NVIDIA runtime for models and face detection
Health Checks - Automatic service monitoring
Volume Persistence - Conversation history and settings saved
Hot Reload - Update without restarting (for development)

🚀 Quick Start

Prerequisites

Docker & Docker Compose installed
NVIDIA GPU with CUDA support (for model inference)
Discord Bot Token (Create one here)
At least 8GB VRAM recommended (4GB minimum)

Installation

Clone the repository

git clone https://github.com/yourusername/miku-discord.git
cd miku-discord

Set up your bot token

Edit docker-compose.yml and replace the DISCORD_BOT_TOKEN:

environment:
  - DISCORD_BOT_TOKEN=your_token_here
  - OWNER_USER_ID=your_discord_user_id  # For DM reports

Add your models

Place these GGUF models in the models/ directory:
- Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf (text model)
- MiniCPM-V-4_5-Q3_K_S.gguf (vision model)
- MiniCPM-V-4_5-mmproj-f16.gguf (vision projector)
Launch the bot
```
docker-compose up -d
```
Check logs
```
docker-compose logs -f miku-bot
```
Access the dashboard

Open http://localhost:3939 in your browser

Optional: ComfyUI Integration

If you have ComfyUI running, update the path in docker-compose.yml:

volumes:
  - /path/to/your/ComfyUI/output:/app/ComfyUI/output:ro

Optional: Face Detection Service

Start the anime face detector when needed:

docker-compose --profile tools up -d anime-face-detector

Access Gradio UI at http://localhost:7860

🏗️ Architecture

Service Overview

┌─────────────────────────────────────────────────────────────┐
│                        Discord API                          │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                     Miku Bot (Python)                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Discord    │  │   FastAPI    │  │  Autonomous  │     │
│  │  Event Loop  │  │   Server     │  │    Engine    │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└───────────┬────────────────┬────────────────┬──────────────┘
            │                │                │
            ▼                ▼                ▼
┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐
│   llama-swap    │ │   ComfyUI       │ │ Face Detector│
│  (Model Server) │ │ (Image Gen)     │ │  (On-Demand) │
│                 │ │                 │ │              │
│  • Llama 3.1    │ │  • Workflows    │ │  • Gradio UI │
│  • MiniCPM-V    │ │  • GPU Accel    │ │  • FastAPI   │
│  • Auto-swap    │ │                 │ │              │
└─────────────────┘ └─────────────────┘ └──────────────┘
         │
         ▼
   ┌──────────┐
   │  Models  │
   │  (GGUF)  │
   └──────────┘

Tech Stack

Component	Technology
Bot Framework	Discord.py 2.0+
LLM Backend	llama.cpp + llama-swap
Text Model	Llama 3.1 8B Instruct
Vision Model	MiniCPM-V 4.5
API Server	FastAPI + Uvicorn
Image Gen	ComfyUI (external)
Face Detection	Anime-Face-Detector (Gradio)
Database	JSON files (conversation history, settings)
Containerization	Docker + Docker Compose
GPU Runtime	NVIDIA Container Toolkit

Key Components

1. llama-swap (Model Server)

Automatically loads/unloads models based on requests
Prevents VRAM exhaustion by swapping between text and vision models
OpenAI-compatible /v1/chat/completions endpoint
Configurable TTL (time-to-live) per model

2. Autonomous Engine V2

Tracks message activity, user presence, and channel engagement
Calculates "engagement scores" per server
Makes context-aware decisions without LLM overhead
Personality profiles per mood (e.g., shy mood = less engaging)

3. Server Manager

Per-guild configuration (mood, sleep state, autonomous settings)
Scheduled tasks (bedtime reminders, autonomous ticks)
Persistent storage in servers_config.json

4. Conversation History

Vector-based RAG (Retrieval Augmented Generation)
Stores last 50 messages per user
Semantic search using FAISS
Context injection for continuity

📡 API Endpoints

The bot runs a FastAPI server on port 3939 with the following endpoints:

Mood Management

Endpoint	Method	Description
`/servers/{guild_id}/mood`	GET	Get current mood for server
`/servers/{guild_id}/mood`	POST	Set mood (body: `{"mood": "excited"}`)
`/servers/{guild_id}/mood/reset`	POST	Reset to neutral mood
`/mood`	GET	Get DM mood (deprecated, use server-specific)

Autonomous Actions

Endpoint	Method	Description
`/autonomous/general`	POST	Make Miku say something random
`/autonomous/engage`	POST	Engage a random user
`/autonomous/tweet`	POST	Share a Miku tweet
`/autonomous/reaction`	POST	React to a recent message
`/autonomous/custom`	POST	Custom prompt (body: `{"prompt": "..."}`)

Profile Pictures

Endpoint	Method	Description
`/profile-picture/change`	POST	Change profile picture (body: `{"mood": "happy"}`)
`/profile-picture/revert`	POST	Revert to previous picture
`/profile-picture/current`	GET	Get current picture metadata

Utilities

Endpoint	Method	Description
`/conversation/reset`	POST	Clear conversation history for user
`/logs`	GET	View bot logs (last 1000 lines)
`/prompt`	GET	View current system prompt
`/`	GET	Dashboard HTML page

Example Usage

# Set mood to excited
curl -X POST http://localhost:3939/servers/123456789/mood \
  -H "Content-Type: application/json" \
  -d '{"mood": "excited"}'

# Make Miku say something
curl -X POST http://localhost:3939/autonomous/general

# Change profile picture
curl -X POST http://localhost:3939/profile-picture/change \
  -H "Content-Type: application/json" \
  -d '{"mood": "flirty"}'

🎮 Usage Examples

Basic Interaction

User: Hey Miku! How are you today?
Miku: Miku's doing great! 💙 Thanks for asking! ✨

User: Can you see this? [uploads image]
Miku: Ooh! 👀 I see a cute cat sitting on a keyboard! So fluffy! 🐱

Mood Changes

User: /mood excited
Miku: YAYYY!!! 🎉✨ Miku is SO EXCITED right now!!! Let's have fun! 💙🎶

User: What's your favorite food?
Miku: NEGI!! 🌱🌱🌱 Green onions are THE BEST! Want some?! ✨

Image Generation

User: Draw yourself swimming in a pool
Miku: Ooh! Let me create that for you! 🎨✨ [generates image]

Autonomous Behavior

[After detecting activity in #general]
Miku: Hey everyone! 👋 What are you all talking about? 💙

🛠️ Configuration

Model Configuration (`llama-swap-config.yaml`)

models:
  llama3.1:
    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99
    ttl: 1800  # 30 minutes
    
  vision:
    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf
    ttl: 900   # 15 minutes

Environment Variables

Variable	Default	Description
`DISCORD_BOT_TOKEN`	Required	Your Discord bot token
`OWNER_USER_ID`	Optional	Your Discord user ID (for DM reports)
`LLAMA_URL`	`http://llama-swap:8080`	Model server endpoint
`TEXT_MODEL`	`llama3.1`	Text generation model name
`VISION_MODEL`	`vision`	Vision model name

Persistent Storage

All data is stored in bot/memory/:

servers_config.json - Per-server settings
autonomous_config.json - Autonomous behavior settings
conversation_history/ - User conversation data
profile_pictures/ - Downloaded profile pictures
dms/ - DM conversation logs
figurine_subscribers.json - Figurine notification subscribers

📚 Documentation

Detailed documentation available in the readmes/ directory:

AUTONOMOUS_V2_IMPLEMENTED.md - Autonomous system V2 details
VOICE_CHAT_IMPLEMENTATION.md - Fish.audio TTS integration guide
PROFILE_PICTURE_FEATURE.md - Profile picture system
FACE_DETECTION_API_MIGRATION.md - Face detection setup
DM_ANALYSIS_FEATURE.md - DM interaction analytics
MOOD_SYSTEM_ANALYSIS.md - Mood system deep dive
QUICK_REFERENCE.md - Ollama → llama.cpp migration guide

🐛 Troubleshooting

Bot won't start

Check if models are loaded:

docker-compose logs llama-swap

Verify GPU access:

docker run --rm --runtime=nvidia nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

High VRAM usage

Lower the -ngl parameter in llama-swap-config.yaml (reduce GPU layers)
Reduce context size with -c parameter
Use smaller quantization (Q3 instead of Q4)

Autonomous actions not triggering

Check autonomous_config.json - ensure enabled and cooldown settings
Verify activity in server (bot tracks engagement)
Check logs for decision engine output

Face detection not working

Ensure GPU is available: docker-compose --profile tools up -d anime-face-detector
Check API health: curl http://localhost:6078/health
View Gradio UI: http://localhost:7860

Models switching too frequently

Increase TTL in llama-swap-config.yaml:

ttl: 3600  # 1 hour instead of 30 minutes

🤝 Contributing

Contributions are welcome! Here's how you can help:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Setup

For local development without Docker:

# Install dependencies
cd bot
pip install -r requirements.txt

# Set environment variables
export DISCORD_BOT_TOKEN="your_token"
export LLAMA_URL="http://localhost:8080"

# Run the bot
python bot.py

Code Style

Use type hints where possible
Follow PEP 8 conventions
Add docstrings to functions
Comment complex logic

📝 License

This project is provided as-is for educational and personal use. Please respect:

Discord's Terms of Service
Crypton Future Media's Piapro Character License
Model licenses (Llama 3.1, MiniCPM-V)

🙏 Acknowledgments

Crypton Future Media - For creating Hatsune Miku
llama.cpp - For efficient local LLM inference
mostlygeek/llama-swap - For brilliant model management
Discord.py - For the excellent Discord API wrapper
OpenAI - For the API standard
MiniCPM-V Team - For the amazing vision model
Danbooru - For the artwork API
Fish.audio - For TTS integration (optional)

💙 Support

If you enjoy this project:

⭐ Star this repository
🐛 Report bugs via Issues
💬 Share your Miku bot setup in Discussions
🎤 Listen to some Miku songs!

Made with 💙 by a Miku fan, for Miku fans

"The future begins now!" - Hatsune Miku 🎶✨

⬆ Back to Top

19 KiB Raw Blame History