Initial commit: Miku Discord Bot
# Quick Reference: Ollama → Llama.cpp Migration

## Environment Variables

| Old (Ollama) | New (llama.cpp) | Purpose |
|--------------|-----------------|---------|
| `OLLAMA_URL` | `LLAMA_URL` | Server endpoint |
| `OLLAMA_MODEL` | `TEXT_MODEL` | Text generation model name |
| N/A | `VISION_MODEL` | Vision model name |
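
For reference, a minimal sketch of how the bot might read these variables at startup. The default values are illustrative (they match the docker-compose example later in this document), not taken from the bot's code:

```python
import os

# Illustrative defaults; the bot's actual defaults may differ.
LLAMA_URL = os.getenv("LLAMA_URL", "http://llama-swap:8080")
TEXT_MODEL = os.getenv("TEXT_MODEL", "llama3.1")
VISION_MODEL = os.getenv("VISION_MODEL", "moondream")
```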

## API Endpoints

| Purpose | Old (Ollama) | New (llama.cpp) |
|---------|--------------|-----------------|
| Text generation | `/api/generate` | `/v1/chat/completions` |
| Vision | `/api/generate` | `/v1/chat/completions` |
| Health check | `GET /` | `GET /health` |
| Model management | Manual `switch_model()` | Automatic via llama-swap |

## Function Changes

| Old Function | New Function | Status |
|--------------|--------------|--------|
| `query_ollama()` | `query_llama()` | Aliased for compatibility |
| `analyze_image_with_qwen()` | `analyze_image_with_vision()` | Aliased for compatibility |
| `switch_model()` | **Removed** | llama-swap handles it automatically |
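
The aliasing can be as simple as module-level assignments; a hypothetical sketch (the bot's actual module layout may differ):

```python
# Hypothetical compatibility shims: the old names point at the new
# implementations, so call sites written against the Ollama-era API
# keep working unchanged.
query_ollama = query_llama
analyze_image_with_qwen = analyze_image_with_vision
```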

## Request Format

### Text Generation

**Before (Ollama):**

```python
payload = {
    "model": "llama3.1",
    "prompt": "Hello world",
    "system": "You are Miku",
    "stream": False
}
await session.post(f"{OLLAMA_URL}/api/generate", json=payload)
```

**After (OpenAI):**

```python
payload = {
    "model": "llama3.1",
    "messages": [
        {"role": "system", "content": "You are Miku"},
        {"role": "user", "content": "Hello world"}
    ],
    "stream": False
}
await session.post(f"{LLAMA_URL}/v1/chat/completions", json=payload)
```

### Vision Analysis

**Before (Ollama):**

```python
await switch_model("moondream")  # Manual switch!
payload = {
    "model": "moondream",
    "prompt": "Describe this image",
    "images": [base64_img],
    "stream": False
}
await session.post(f"{OLLAMA_URL}/api/generate", json=payload)
```

**After (OpenAI):**

```python
# No manual switch needed!
payload = {
    "model": "moondream",  # llama-swap auto-switches
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}}
        ]
    }],
    "stream": False
}
await session.post(f"{LLAMA_URL}/v1/chat/completions", json=payload)
```
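
Both vision snippets assume `base64_img` already holds a base64-encoded image. A minimal sketch of producing it, assuming a local file (the bot itself may instead read bytes from a Discord attachment):

```python
import base64

def encode_image(path: str) -> str:
    # Read raw bytes and return a base64 string suitable for the
    # data: URL in the vision payload above.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

base64_img = encode_image("example.jpg")  # hypothetical local file
```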

## Response Format

**Before (Ollama):**

```json
{
  "response": "Hello! I'm Miku!",
  "model": "llama3.1"
}
```

**After (OpenAI):**

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello! I'm Miku!"
    }
  }],
  "model": "llama3.1"
}
```
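
Extracting the reply text therefore changes from `data["response"]` to `data["choices"][0]["message"]["content"]`. A small sketch that handles both shapes (helper name is illustrative):

```python
def extract_reply(data: dict) -> str:
    # New OpenAI-style shape returned by llama.cpp / llama-swap.
    if "choices" in data:
        return data["choices"][0]["message"]["content"]
    # Old Ollama shape, kept only for comparison.
    return data["response"]
```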

## Docker Services

**Before:**

```yaml
services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes: ["ollama_data:/root/.ollama"]

  bot:
    environment:
      - OLLAMA_URL=http://ollama:11434
      - OLLAMA_MODEL=llama3.1
```

**After:**

```yaml
services:
  llama-swap:
    image: ghcr.io/mostlygeek/llama-swap:cuda
    ports: ["8080:8080"]
    volumes:
      - ./models:/models
      - ./llama-swap-config.yaml:/app/config.yaml

  bot:
    environment:
      - LLAMA_URL=http://llama-swap:8080
      - TEXT_MODEL=llama3.1
      - VISION_MODEL=moondream
```

## Model Management

| Feature | Ollama | llama.cpp + llama-swap |
|---------|--------|------------------------|
| Model loading | Manual `ollama pull` | Download GGUF files to `/models` |
| Model switching | Manual `switch_model()` call | Automatic based on request |
| Model unloading | Manual or never | Automatic after TTL (30 min text, 15 min vision) |
| VRAM management | Always loaded | Load on demand, unload when idle |
| Storage format | Ollama format | GGUF files |
| Location | Docker volume | Host directory `./models/` |

## Configuration Files

| File | Purpose | Format |
|------|---------|--------|
| `docker-compose.yml` | Service orchestration | YAML |
| `llama-swap-config.yaml` | Model configs, TTL settings | YAML |
| `models/llama3.1.gguf` | Text model weights | Binary GGUF |
| `models/moondream.gguf` | Vision model weights | Binary GGUF |
| `models/moondream-mmproj.gguf` | Vision projector | Binary GGUF |

## Monitoring

| Tool | URL | Purpose |
|------|-----|---------|
| llama-swap Web UI | http://localhost:8080/ui | Monitor models, logs, timers |
| Health endpoint | http://localhost:8080/health | Check if server is ready |
| Running models | http://localhost:8080/running | List currently loaded models |
| Metrics | http://localhost:8080/metrics | Prometheus-compatible metrics |
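
The `/health` endpoint is also useful as a startup gate in the bot. A minimal aiohttp sketch, assuming the base URL comes from `LLAMA_URL` (function name and timeout are illustrative):

```python
import asyncio
import aiohttp

async def wait_until_healthy(base_url: str, timeout_s: int = 60) -> bool:
    # Poll GET /health once per second until it returns 200 or we give up.
    async with aiohttp.ClientSession() as session:
        for _ in range(timeout_s):
            try:
                async with session.get(f"{base_url}/health") as resp:
                    if resp.status == 200:
                        return True
            except aiohttp.ClientError:
                pass
            await asyncio.sleep(1)
    return False

# Usage: await wait_until_healthy("http://localhost:8080")
```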

## Common Commands

```bash
# Check what's running
curl http://localhost:8080/running

# Check health
curl http://localhost:8080/health

# Manually unload all models
curl -X POST http://localhost:8080/models/unload

# View logs
docker-compose logs -f llama-swap

# Restart services
docker-compose restart

# Check model files
ls -lh models/
```

## Quick Troubleshooting

| Issue | Solution |
|-------|----------|
| "Model not found" | Verify files in `./models/` match the config |
| CUDA errors | Check: `docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi` |
| Slow responses | First load is slow; subsequent loads use cache |
| High VRAM usage | Models auto-unload after the TTL expires |
| Bot can't connect | Check: `curl http://localhost:8080/health` |

---

**Remember:** The migration maintains backward compatibility. Old function names are aliased, so existing code continues to work!