# 🎤 Miku Discord Bot 💙
<div align="center">
![Miku Banner](https://img.shields.io/badge/Virtual_Idol-Hatsune_Miku-00CED1?style=for-the-badge&logo=discord&logoColor=white)
[![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/)
[![Python](https://img.shields.io/badge/python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
[![Discord.py](https://img.shields.io/badge/discord.py-2.0+-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discordpy.readthedocs.io/)
*The world's #1 Virtual Idol, now in your Discord server! 🌱✨*
[Features](#-features) • [Quick Start](#-quick-start) • [Architecture](#-architecture) • [API](#-api-endpoints) • [Contributing](#-contributing)
</div>
---
## 🌟 About
Meet **Hatsune Miku** - a fully-featured, AI-powered Discord bot that brings the cheerful, energetic virtual idol to life in your server! Powered by a local LLM (Llama 3.1), a vision model (MiniCPM-V), and a sophisticated autonomous behavior system, Miku can chat, react, share content, and even change her profile picture based on her mood!
### Why This Bot?
- 🎭 **14 Dynamic Moods** - From bubbly to melancholy, Miku's personality adapts
- 🤖 **Smart Autonomous Behavior** - Context-aware decisions without spamming
- 👁️ **Vision Capabilities** - Analyzes images, videos, and GIFs in conversations
- 🎨 **Auto Profile Pictures** - Fetches & crops anime faces from Danbooru based on mood
- 💬 **DM Support** - Personal conversations with mood tracking
- 🐦 **Twitter Integration** - Shares Miku-related tweets and figurine announcements
- 🎮 **ComfyUI Integration** - Natural language image generation requests
- 🔊 **Voice Chat Ready** - Fish.audio TTS integration (docs included)
- 📊 **RESTful API** - Full control via HTTP endpoints
- 🐳 **Production Ready** - Docker Compose with GPU support
---
## ✨ Features
### 🧠 AI & LLM Integration
- **Local LLM Processing** with Llama 3.1 8B (via llama.cpp + llama-swap)
- **Automatic Model Switching** - Text ↔️ Vision models swap on-demand
- **OpenAI-Compatible API** - Easy migration and integration
- **Conversation History** - Per-user context with RAG-style retrieval
- **Smart Prompting** - Mood-aware system prompts with personality profiles
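Under the hood, every reply is a chat-completion request to llama-swap's OpenAI-compatible endpoint with a mood-flavored system prompt. A minimal sketch of that call (the prompt text and helper are illustrative; the URL and model name are the defaults listed in the Environment Variables table):
```python
import requests

LLAMA_URL = "http://llama-swap:8080"   # default from docker-compose
TEXT_MODEL = "llama3.1"                # model name registered in llama-swap-config.yaml

# Illustrative mood-flavored system prompts (the real personality profiles live in the bot)
MOOD_PROMPTS = {
    "neutral": "You are Hatsune Miku: cheerful, friendly, and helpful.",
    "excited": "You are Hatsune Miku and you are VERY excited. Use lots of energy and emoji!",
}

def ask_miku(user_message: str, mood: str = "neutral") -> str:
    """Send one chat turn to the OpenAI-compatible /v1/chat/completions endpoint."""
    resp = requests.post(
        f"{LLAMA_URL}/v1/chat/completions",
        json={
            "model": TEXT_MODEL,
            "messages": [
                {"role": "system", "content": MOOD_PROMPTS.get(mood, MOOD_PROMPTS["neutral"])},
                {"role": "user", "content": user_message},
            ],
            "max_tokens": 256,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```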
### 🎭 Mood & Personality System
<details>
<summary>14 Available Moods (click to expand)</summary>
- 😊 **Neutral** - Classic cheerful Miku
- 😴 **Asleep** - Sleepy and minimally responsive
- 😪 **Sleepy** - Getting tired, simple responses
- 🎉 **Excited** - Extra energetic and enthusiastic
- 💫 **Bubbly** - Playful and giggly
- 🤔 **Curious** - Inquisitive and wondering
- 😳 **Shy** - Blushing and hesitant
- 🤪 **Silly** - Goofy and fun-loving
- 😠 **Angry** - Frustrated or upset
- 😤 **Irritated** - Mildly annoyed
- 😢 **Melancholy** - Sad and reflective
- 😏 **Flirty** - Playful and teasing
- 💕 **Romantic** - Sweet and affectionate
- 🎯 **Serious** - Focused and thoughtful
</details>
- **Per-Server Mood Tracking** - Different moods in different servers
- **DM Mood Persistence** - Separate mood state for private conversations
- **Automatic Mood Shifts** - Responds to conversation sentiment
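A rough sketch of how per-server and DM moods can be persisted; the real `servers_config.json` schema is internal to the bot, so the field names here are assumptions:
```python
import json
from pathlib import Path

CONFIG_PATH = Path("bot/memory/servers_config.json")  # persistent storage path from this README

def load_config() -> dict:
    return json.loads(CONFIG_PATH.read_text()) if CONFIG_PATH.exists() else {}

def get_mood(guild_id: int | None) -> str:
    """Return the mood for a server, or the separate DM mood when guild_id is None."""
    config = load_config()
    key = str(guild_id) if guild_id is not None else "dm"
    return config.get(key, {}).get("mood", "neutral")

def set_mood(guild_id: int | None, mood: str) -> None:
    config = load_config()
    key = str(guild_id) if guild_id is not None else "dm"
    config.setdefault(key, {})["mood"] = mood
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    CONFIG_PATH.write_text(json.dumps(config, indent=2))
```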
### 🤖 Autonomous Behavior System V2
The bot features a sophisticated **context-aware decision engine** that makes Miku feel alive:
- **Intelligent Activity Detection** - Tracks message frequency, user presence, and channel activity
- **Non-Intrusive** - Won't spam or interrupt important conversations
- **Mood-Based Personality** - Behavioral patterns change with mood
- **Multiple Action Types**:
  - 💬 General conversation starters
  - 👋 Engaging specific users
  - 🐦 Sharing Miku tweets
  - 💬 Joining ongoing conversations
  - 🎨 Changing profile pictures
  - 😊 Reacting to messages
**Rate Limiting**: Minimum 30-second cooldown between autonomous actions to prevent spam.
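In sketch form, the gate looks roughly like this; only the 30-second cooldown comes from the behavior described above, and the probability weighting is purely illustrative:
```python
import random
import time

MIN_COOLDOWN_SECONDS = 30                   # minimum gap between autonomous actions
_last_action_time: dict[int, float] = {}    # per-guild timestamp of the last autonomous action

def may_act(guild_id: int, engagement_score: float, act_probability: float = 0.3) -> bool:
    """Gate an autonomous action on the cooldown and an engagement-weighted dice roll."""
    now = time.monotonic()
    if now - _last_action_time.get(guild_id, 0.0) < MIN_COOLDOWN_SECONDS:
        return False                          # still cooling down, never spam
    if random.random() > act_probability * engagement_score:
        return False                          # quiet channel or shy mood -> act less often
    _last_action_time[guild_id] = now
    return True
```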
### 👁️ Vision & Media Processing
- **Image Analysis** - Describe images shared in chat using MiniCPM-V 4.5 (see the sketch after this list)
- **Video Understanding** - Extracts frames and analyzes video content
- **GIF Support** - Processes animated GIFs (converts to MP4 if needed)
- **Embed Content Extraction** - Reads Twitter/X embeds without API
- **Face Detection** - On-demand anime face detection service (GPU-accelerated)
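Vision requests go through the same OpenAI-compatible endpoint, with the image attached in the multimodal message format that recent llama.cpp servers accept. A hedged sketch (model name from the Environment Variables table; everything else illustrative):
```python
import base64
from pathlib import Path

import requests

LLAMA_URL = "http://llama-swap:8080"
VISION_MODEL = "vision"   # model name from the Environment Variables table

def describe_image(path: str, question: str = "What is in this image?") -> str:
    """Ask the vision model about a local image using the OpenAI-style multimodal format."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    resp = requests.post(
        f"{LLAMA_URL}/v1/chat/completions",
        json={
            "model": VISION_MODEL,   # llama-swap loads MiniCPM-V and unloads the text model
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
            "max_tokens": 256,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```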
### 🎨 Dynamic Profile Picture System
- **Danbooru Integration** - Searches for Miku artwork (see the sketch after this list)
- **Smart Cropping** - Automatic face detection and 1:1 crop
- **Mood-Based Selection** - Filters by tags matching current mood
- **Quality Filtering** - Only uses high-quality, safe-rated images
- **Fallback System** - Graceful degradation if detection fails
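The flow, roughly: query Danbooru's public `posts.json` endpoint, pick a safe-rated post, download it, and crop to a square. The sketch below stands in a centre crop for the real face-detection crop and omits the mood-to-tag mapping:
```python
import random
from io import BytesIO

import requests
from PIL import Image

def fetch_miku_avatar() -> Image.Image:
    """Grab a safe-rated Miku post from Danbooru and return a square avatar.

    The real bot crops around a detected anime face via the detector service;
    this sketch falls back to a simple centre crop.
    """
    posts = requests.get(
        "https://danbooru.donmai.us/posts.json",
        params={"tags": "hatsune_miku rating:general", "limit": 50},
        timeout=30,
    ).json()
    post = random.choice([p for p in posts if p.get("file_url")])
    img = Image.open(BytesIO(requests.get(post["file_url"], timeout=30).content)).convert("RGB")
    side = min(img.size)
    left, top = (img.width - side) // 2, (img.height - side) // 2
    return img.crop((left, top, left + side, top + side)).resize((512, 512))
```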
### 🐦 Twitter Features
- **Tweet Sharing** - Automatically fetches and shares Miku-related tweets
- **Figurine Notifications** - DM subscribers about new Miku figurine releases
- **Embed Compatibility** - Uses fxtwitter for better Discord previews
- **Duplicate Prevention** - Tracks sent tweets to avoid repeats
### 🎮 ComfyUI Image Generation
- **Natural Language Detection** - "Draw me as Miku swimming in a pool"
- **Workflow Integration** - Connects to external ComfyUI instance
- **Smart Prompting** - Enhances user requests with context
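One way such a request can be queued against ComfyUI's HTTP API is to export a workflow in API format, patch its positive-prompt node, and POST it to `/prompt`. The URL, filename, and node id below are assumptions about your setup rather than values from this repo:
```python
import json

import requests

COMFYUI_URL = "http://comfyui:8188"          # wherever your ComfyUI instance listens
WORKFLOW_FILE = "miku_workflow_api.json"     # a workflow exported in ComfyUI's API format

def queue_image(prompt_text: str, positive_node_id: str = "6") -> str:
    """Patch the positive-prompt node of an exported workflow and queue it."""
    with open(WORKFLOW_FILE) as f:
        workflow = json.load(f)
    workflow[positive_node_id]["inputs"]["text"] = prompt_text   # node id depends on your workflow
    resp = requests.post(f"{COMFYUI_URL}/prompt", json={"prompt": workflow}, timeout=30)
    resp.raise_for_status()
    return resp.json()["prompt_id"]          # poll /history/<prompt_id> for the finished image
```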
### 📡 REST API Dashboard
Full-featured FastAPI server with endpoints for:
- Mood management (get/set/reset)
- Conversation history
- Autonomous actions (trigger manually)
- Profile picture updates
- Server configuration
- DM analysis reports
### 🔧 Developer Features
- **Docker Compose Setup** - One command deployment
- **GPU Acceleration** - NVIDIA runtime for models and face detection
- **Health Checks** - Automatic service monitoring
- **Volume Persistence** - Conversation history and settings saved
- **Hot Reload** - Update without restarting (for development)
---
## 🚀 Quick Start
### Prerequisites
- **Docker** & **Docker Compose** installed
- **NVIDIA GPU** with CUDA support (for model inference)
- **Discord Bot Token** ([Create one here](https://discord.com/developers/applications))
- At least **8GB VRAM** recommended (4GB minimum)
### Installation
1. **Clone the repository**
```bash
git clone https://github.com/yourusername/miku-discord.git
cd miku-discord
```
2. **Set up your bot token**
Edit `docker-compose.yml` and replace the `DISCORD_BOT_TOKEN`:
```yaml
environment:
  - DISCORD_BOT_TOKEN=your_token_here
  - OWNER_USER_ID=your_discord_user_id  # For DM reports
```
3. **Add your models**
Place these GGUF models in the `models/` directory:
- `Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf` (text model)
- `MiniCPM-V-4_5-Q3_K_S.gguf` (vision model)
- `MiniCPM-V-4_5-mmproj-f16.gguf` (vision projector)
4. **Launch the bot**
```bash
docker-compose up -d
```
5. **Check logs**
```bash
docker-compose logs -f miku-bot
```
6. **Access the dashboard**
Open http://localhost:3939 in your browser
### Optional: ComfyUI Integration
If you have ComfyUI running, update the path in `docker-compose.yml`:
```yaml
volumes:
  - /path/to/your/ComfyUI/output:/app/ComfyUI/output:ro
```
### Optional: Face Detection Service
Start the anime face detector when needed:
```bash
docker-compose --profile tools up -d anime-face-detector
```
Access Gradio UI at http://localhost:7860
---
## 🏗️ Architecture
### Service Overview
```
┌───────────────────────────────────────────────────────────┐
│                        Discord API                        │
└─────────────────────────────┬─────────────────────────────┘
                              │
┌─────────────────────────────▼─────────────────────────────┐
│                     Miku Bot (Python)                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Discord    │  │   FastAPI    │  │  Autonomous  │     │
│  │  Event Loop  │  │    Server    │  │    Engine    │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────┬───────────────────┬───────────────────┬─────────┘
          │                   │                   │
          ▼                   ▼                   ▼
┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐
│   llama-swap    │  │     ComfyUI     │  │ Face Detector│
│ (Model Server)  │  │   (Image Gen)   │  │ (On-Demand)  │
│                 │  │                 │  │              │
│ • Llama 3.1     │  │ • Workflows     │  │ • Gradio UI  │
│ • MiniCPM-V     │  │ • GPU Accel     │  │ • FastAPI    │
│ • Auto-swap     │  │                 │  │              │
└────────┬────────┘  └─────────────────┘  └──────────────┘
         │
         ▼
    ┌──────────┐
    │  Models  │
    │  (GGUF)  │
    └──────────┘
```
### Tech Stack
| Component | Technology |
|-----------|-----------|
| **Bot Framework** | Discord.py 2.0+ |
| **LLM Backend** | llama.cpp + llama-swap |
| **Text Model** | Llama 3.1 8B Instruct |
| **Vision Model** | MiniCPM-V 4.5 |
| **API Server** | FastAPI + Uvicorn |
| **Image Gen** | ComfyUI (external) |
| **Face Detection** | Anime-Face-Detector (Gradio) |
| **Database** | JSON files (conversation history, settings) |
| **Containerization** | Docker + Docker Compose |
| **GPU Runtime** | NVIDIA Container Toolkit |
### Key Components
#### 1. **llama-swap** (Model Server)
- Automatically loads/unloads models based on requests
- Prevents VRAM exhaustion by swapping between text and vision models
- OpenAI-compatible `/v1/chat/completions` endpoint
- Configurable TTL (time-to-live) per model
#### 2. **Autonomous Engine V2**
- Tracks message activity, user presence, and channel engagement
- Calculates "engagement scores" per server (see the sketch below)
- Makes context-aware decisions without LLM overhead
- Personality profiles per mood (e.g., shy mood = less engaging)
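An illustrative scoring heuristic (not the engine's actual formula) to show that the decision is plain bookkeeping rather than an LLM call:
```python
import time
from dataclasses import dataclass, field

@dataclass
class ChannelActivity:
    """Rolling 10-minute view of a channel, used to score engagement cheaply."""
    message_times: list[float] = field(default_factory=list)
    active_users: set[int] = field(default_factory=set)

    def record(self, user_id: int) -> None:
        now = time.time()
        self.message_times = [t for t in self.message_times if now - t < 600] + [now]
        self.active_users.add(user_id)

    def engagement_score(self) -> float:
        """0.0 = dead channel, 1.0 = very lively; plain arithmetic, no model call."""
        msgs_per_min = len(self.message_times) / 10
        return min(1.0, 0.1 * msgs_per_min + 0.05 * len(self.active_users))
```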
#### 3. **Server Manager**
- Per-guild configuration (mood, sleep state, autonomous settings)
- Scheduled tasks (bedtime reminders, autonomous ticks)
- Persistent storage in `servers_config.json`
#### 4. **Conversation History**
- Vector-based RAG (Retrieval-Augmented Generation)
- Stores last 50 messages per user
- Semantic search using FAISS
- Context injection for continuity
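A condensed sketch of the retrieval step; the embedding model and trimming policy below are assumptions, only the FAISS index and semantic lookup are described above:
```python
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # embedding model is an assumption
index = faiss.IndexFlatL2(embedder.get_sentence_embedding_dimension())
stored_messages: list[str] = []

def remember(message: str) -> None:
    """Embed one message and add it to the per-user index (keep only the last 50 in practice)."""
    index.add(embedder.encode([message]).astype("float32"))
    stored_messages.append(message)

def recall(query: str, k: int = 3) -> list[str]:
    """Return the k most semantically similar past messages for context injection."""
    if index.ntotal == 0:
        return []
    _, ids = index.search(embedder.encode([query]).astype("float32"), min(k, index.ntotal))
    return [stored_messages[i] for i in ids[0]]
```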
---
## 📡 API Endpoints
The bot runs a FastAPI server on port **3939** with the following endpoints:
### Mood Management
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/servers/{guild_id}/mood` | GET | Get current mood for server |
| `/servers/{guild_id}/mood` | POST | Set mood (body: `{"mood": "excited"}`) |
| `/servers/{guild_id}/mood/reset` | POST | Reset to neutral mood |
| `/mood` | GET | Get DM mood (deprecated, use server-specific) |
### Autonomous Actions
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/autonomous/general` | POST | Make Miku say something random |
| `/autonomous/engage` | POST | Engage a random user |
| `/autonomous/tweet` | POST | Share a Miku tweet |
| `/autonomous/reaction` | POST | React to a recent message |
| `/autonomous/custom` | POST | Custom prompt (body: `{"prompt": "..."}`) |
### Profile Pictures
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/profile-picture/change` | POST | Change profile picture (body: `{"mood": "happy"}`) |
| `/profile-picture/revert` | POST | Revert to previous picture |
| `/profile-picture/current` | GET | Get current picture metadata |
### Utilities
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/conversation/reset` | POST | Clear conversation history for user |
| `/logs` | GET | View bot logs (last 1000 lines) |
| `/prompt` | GET | View current system prompt |
| `/` | GET | Dashboard HTML page |
### Example Usage
```bash
# Set mood to excited
curl -X POST http://localhost:3939/servers/123456789/mood \
  -H "Content-Type: application/json" \
  -d '{"mood": "excited"}'

# Make Miku say something
curl -X POST http://localhost:3939/autonomous/general

# Change profile picture
curl -X POST http://localhost:3939/profile-picture/change \
  -H "Content-Type: application/json" \
  -d '{"mood": "flirty"}'
```
---
## 🎮 Usage Examples
### Basic Interaction
```
User: Hey Miku! How are you today?
Miku: Miku's doing great! 💙 Thanks for asking! ✨
User: Can you see this? [uploads image]
Miku: Ooh! 👀 I see a cute cat sitting on a keyboard! So fluffy! 🐱
```
### Mood Changes
```
User: /mood excited
Miku: YAYYY!!! 🎉✨ Miku is SO EXCITED right now!!! Let's have fun! 💙🎶
User: What's your favorite food?
Miku: NEGI!! 🌱🌱🌱 Green onions are THE BEST! Want some?! ✨
```
### Image Generation
```
User: Draw yourself swimming in a pool
Miku: Ooh! Let me create that for you! 🎨✨ [generates image]
```
### Autonomous Behavior
```
[After detecting activity in #general]
Miku: Hey everyone! 👋 What are you all talking about? 💙
```
---
## 🛠️ Configuration
### Model Configuration (`llama-swap-config.yaml`)
```yaml
models:
  llama3.1:
    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99
    ttl: 1800  # 30 minutes
  vision:
    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf
    ttl: 900  # 15 minutes
```
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `DISCORD_BOT_TOKEN` | *Required* | Your Discord bot token |
| `OWNER_USER_ID` | *Optional* | Your Discord user ID (for DM reports) |
| `LLAMA_URL` | `http://llama-swap:8080` | Model server endpoint |
| `TEXT_MODEL` | `llama3.1` | Text generation model name |
| `VISION_MODEL` | `vision` | Vision model name |
### Persistent Storage
All data is stored in `bot/memory/`:
- `servers_config.json` - Per-server settings
- `autonomous_config.json` - Autonomous behavior settings
- `conversation_history/` - User conversation data
- `profile_pictures/` - Downloaded profile pictures
- `dms/` - DM conversation logs
- `figurine_subscribers.json` - Figurine notification subscribers
---
## 📚 Documentation
Detailed documentation available in the `readmes/` directory:
- **[AUTONOMOUS_V2_IMPLEMENTED.md](readmes/AUTONOMOUS_V2_IMPLEMENTED.md)** - Autonomous system V2 details
- **[VOICE_CHAT_IMPLEMENTATION.md](readmes/VOICE_CHAT_IMPLEMENTATION.md)** - Fish.audio TTS integration guide
- **[PROFILE_PICTURE_FEATURE.md](readmes/PROFILE_PICTURE_FEATURE.md)** - Profile picture system
- **[FACE_DETECTION_API_MIGRATION.md](readmes/FACE_DETECTION_API_MIGRATION.md)** - Face detection setup
- **[DM_ANALYSIS_FEATURE.md](readmes/DM_ANALYSIS_FEATURE.md)** - DM interaction analytics
- **[MOOD_SYSTEM_ANALYSIS.md](readmes/MOOD_SYSTEM_ANALYSIS.md)** - Mood system deep dive
- **[QUICK_REFERENCE.md](readmes/QUICK_REFERENCE.md)** - Ollama → llama.cpp migration guide
---
## 🐛 Troubleshooting
### Bot won't start
**Check if models are loaded:**
```bash
docker-compose logs llama-swap
```
**Verify GPU access:**
```bash
docker run --rm --runtime=nvidia nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
### High VRAM usage
- Lower the `-ngl` parameter in `llama-swap-config.yaml` (reduce GPU layers)
- Reduce context size with `-c` parameter
- Use smaller quantization (Q3 instead of Q4)
### Autonomous actions not triggering
- Check `autonomous_config.json` - make sure autonomous behavior is enabled and the cooldown values are sensible
- Verify activity in server (bot tracks engagement)
- Check logs for decision engine output
### Face detection not working
- Ensure the detector container is running (it requires GPU access): `docker-compose --profile tools up -d anime-face-detector`
- Check API health: `curl http://localhost:6078/health`
- View Gradio UI: http://localhost:7860
### Models switching too frequently
Increase TTL in `llama-swap-config.yaml`:
```yaml
ttl: 3600 # 1 hour instead of 30 minutes
```
---
## 🤝 Contributing
Contributions are welcome! Here's how you can help:
1. **Fork the repository**
2. **Create a feature branch** (`git checkout -b feature/amazing-feature`)
3. **Commit your changes** (`git commit -m 'Add some amazing feature'`)
4. **Push to the branch** (`git push origin feature/amazing-feature`)
5. **Open a Pull Request**
### Development Setup
For local development without Docker:
```bash
# Install dependencies
cd bot
pip install -r requirements.txt
# Set environment variables
export DISCORD_BOT_TOKEN="your_token"
export LLAMA_URL="http://localhost:8080"
# Run the bot
python bot.py
```
### Code Style
- Use type hints where possible
- Follow PEP 8 conventions
- Add docstrings to functions
- Comment complex logic
---
## 📝 License
This project is provided as-is for educational and personal use. Please respect:
- Discord's [Terms of Service](https://discord.com/terms)
- Crypton Future Media's [Piapro Character License](https://piapro.net/intl/en_for_creators.html)
- Model licenses (Llama 3.1, MiniCPM-V)
---
## 🙏 Acknowledgments
- **Crypton Future Media** - For creating Hatsune Miku
- **llama.cpp** - For efficient local LLM inference
- **mostlygeek/llama-swap** - For brilliant model management
- **Discord.py** - For the excellent Discord API wrapper
- **OpenAI** - For the API standard
- **MiniCPM-V Team** - For the amazing vision model
- **Danbooru** - For the artwork API
- **Fish.audio** - For TTS integration (optional)
---
## 💙 Support
If you enjoy this project:
- ⭐ Star this repository
- 🐛 Report bugs via [Issues](https://github.com/yourusername/miku-discord/issues)
- 💬 Share your Miku bot setup in [Discussions](https://github.com/yourusername/miku-discord/discussions)
- 🎤 Listen to some Miku songs!
---
<div align="center">
**Made with 💙 by a Miku fan, for Miku fans**
*"The future begins now!" - Hatsune Miku* 🎶✨
[⬆ Back to Top](#-miku-discord-bot-)
</div>