Implemented experimental production-ready voice chat and relegated the old flow to voice debug mode. Added a new Web UI panel for Voice Chat.

2026-01-20 23:06:17 +02:00
parent 362108f4b0
commit 2934efba22
31 changed files with 5408 additions and 357 deletions
--- a/VOICE_CHAT_CONTEXT.md
+++ b/VOICE_CHAT_CONTEXT.md
@@ -0,0 +1,225 @@
+# Voice Chat Context System
+
+## Implementation Complete ✅
+
+Added comprehensive voice chat context to give Miku awareness of the conversation environment.
+
+---
+
+## Features
+
+### 1. Voice-Aware System Prompt
+Miku now knows she's in a voice chat and adjusts her behavior:
+- ✅ Aware she's speaking via TTS
+- ✅ Knows who she's talking to (user names included)
+- ✅ Understands responses will be spoken aloud
+- ✅ Instructed to keep responses short (1-3 sentences)
+- ✅ **CRITICAL: Instructed to only use English** (TTS can't handle Japanese well)
+
+### 2. Conversation History (Last 8 Exchanges)
+- Stores last 16 messages (8 user + 8 assistant)
+- Maintains context across multiple voice interactions
+- Automatically trimmed to keep memory manageable
+- Each message includes username for multi-user context
+
+### 3. Personality Integration
+- Loads `miku_lore.txt` - Her background, personality, likes/dislikes
+- Loads `miku_prompt.txt` - Core personality instructions
+- Combines with voice-specific instructions
+- Maintains character consistency
+
+### 4. Reduced Log Spam
+- Set voice_recv logger to CRITICAL level
+- Suppresses routine CryptoErrors and RTCP packets
+- Only shows actual critical errors
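+
+A minimal sketch of how this suppression could look with the standard `logging` module. The logger name `discord.ext.voice_recv` is an assumption about the voice receive extension; check the prefix that appears in your own log output.
+
+```python
+import logging
+
+# Assumed logger name; adjust it to match the prefix shown in your logs.
+VOICE_RECV_LOGGER = "discord.ext.voice_recv"
+
+# Child loggers that have no level of their own inherit this threshold,
+# so only CRITICAL records from the package get through.
+logging.getLogger(VOICE_RECV_LOGGER).setLevel(logging.CRITICAL)
+```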
+
+---
+
+## System Prompt Structure
+
+```
+[miku_prompt.txt content]
+
+[miku_lore.txt content]
+
+VOICE CHAT CONTEXT:
+- You are currently in a voice channel speaking with {user.name} and others
+- Your responses will be spoken aloud via text-to-speech
+- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
+- Speak naturally as if having a real-time voice conversation
+- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
+- Be expressive and use casual language, but stay in character as Miku
+
+Remember: This is a live voice conversation, so be concise and engaging!
+```
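+
+The structure above could be assembled roughly as follows. This is a sketch, not the code in `voice_manager.py`: `build_system_prompt` and `VOICE_CONTEXT` are hypothetical names, and the voice context is abbreviated to two lines.
+
+```python
+# Abbreviated voice context; the full text is shown in the block above.
+VOICE_CONTEXT = (
+    "VOICE CHAT CONTEXT:\n"
+    "- You are currently in a voice channel speaking with {user_name} and others\n"
+    "- Your responses will be spoken aloud via text-to-speech"
+)
+
+
+def build_system_prompt(prompt_text: str, lore_text: str, user_name: str) -> str:
+    """Join personality prompt, lore, and voice context, in that order."""
+    voice_context = VOICE_CONTEXT.format(user_name=user_name)
+    # A blank line separates the three sections.
+    return "\n\n".join([prompt_text.strip(), lore_text.strip(), voice_context])
+```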
+
+---
+
+## Conversation Flow
+
+```
+User speaks → STT transcribes → Add to history
+                                      ↓
+                              [System Prompt]
+                              [Last 8 exchanges]
+                              [Current user message]
+                                      ↓
+                                  LLM generates
+                                      ↓
+                              Add response to history
+                                      ↓
+                              Stream to TTS → Speak
+```
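+
+The middle of this flow (system prompt, recent history, current message) can be sketched as a single list of chat messages. `build_messages` is a hypothetical helper, and the `username: text` prefix follows the history format used in this document.
+
+```python
+def build_messages(system_prompt, history, username, transcript):
+    """Return the message list for the LLM without mutating `history`."""
+    user_message = {"role": "user", "content": f"{username}: {transcript}"}
+    return [
+        {"role": "system", "content": system_prompt},
+        *history,       # earlier exchanges, oldest first
+        user_message,   # the utterance that was just transcribed
+    ]
+```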
+
+---
+
+## Message History Format
+
+```python
+conversation_history = [
+    {"role": "user", "content": "koko210: Hey Miku, how are you?"},
+    {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"},
+    {"role": "user", "content": "koko210: Can you sing something?"},
+    {"role": "assistant", "content": "I'd love to! What song would you like to hear?"},
+    # ... up to 16 messages total (8 exchanges)
+]
+```
+
+---
+
+## Configuration
+
+### Conversation History Limit
+**Current**: 16 messages (8 exchanges)
+
+To adjust, edit `bot/utils/voice_manager.py` and change both occurrences of `16` (two messages per exchange):
+```python
+# Keep only last 8 exchanges (16 messages = 8 user + 8 assistant)
+if len(self.conversation_history) > 16:
+    self.conversation_history = self.conversation_history[-16:]
+```
+
+**Recommendations**:
+- **8 exchanges**: Good balance (current setting)
+- **12 exchanges**: More context, slightly more tokens
+- **4 exchanges**: Minimal context, faster responses
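+
+To switch between these settings without editing two literals, the limit could be expressed in exchanges. This is a sketch: `MAX_EXCHANGES` and `trim_history` are hypothetical names, not part of the current code.
+
+```python
+MAX_EXCHANGES = 8  # one exchange = one user message + one assistant reply
+
+
+def trim_history(history, max_exchanges=MAX_EXCHANGES):
+    """Return the most recent messages, at most two per exchange."""
+    max_messages = max_exchanges * 2
+    if max_messages <= 0:
+        return []  # history[-0:] would return the whole list
+    return history[-max_messages:]
+```
+
+With this helper, `trim_history(history, 12)` keeps up to 24 messages and `trim_history(history, 4)` keeps up to 8.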
+
+### Response Length
+**Current**: max_tokens=200
+
+To adjust:
+```python
+payload = {
+    "max_tokens": 200  # Change this
+}
+```
+
+---
+
+## Language Enforcement
+
+### Why English-Only?
+The RVC TTS system is trained on English audio and struggles with:
+- Japanese characters (even though Miku is Japanese!)
+- Special characters
+- Mixed language text
+- Non-English phonetics
+
+### Implementation
+The system prompt explicitly tells Miku:
+> **IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.**
+
+This is reinforced in every voice chat interaction.
+
+---
+
+## Testing
+
+### Test 1: Basic Conversation
+```
+User: "Hey Miku!"
+Miku: "Hi there! Great to hear from you!" (should be in English)
+User: "How are you doing?"
+Miku: "I'm doing wonderful! How about you?" (remembers previous exchange)
+```
+
+### Test 2: Context Retention
+Have a multi-turn conversation and verify Miku remembers:
+- Previous topics discussed
+- User names
+- Conversation flow
+
+### Test 3: Response Length
+Verify responses are:
+- Short (1-3 sentences)
+- Conversational
+- Not truncated mid-sentence
+
+### Test 4: Language Enforcement
+Try asking in Japanese or requesting a Japanese response:
+- Miku should politely respond in English
+- Should explain she needs to use English for voice chat
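+
+This check can be partly automated by scanning a response for Japanese script. A rough sketch: `contains_japanese` is a hypothetical helper, and the ranges cover Hiragana, Katakana, and the main CJK ideograph block only, so romanized Japanese is not detected.
+
+```python
+import re
+
+# Hiragana, Katakana, and CJK Unified Ideographs (common kanji).
+JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")
+
+
+def contains_japanese(text: str) -> bool:
+    """Return True if the text contains any kana or kanji character."""
+    return JAPANESE_CHARS.search(text) is not None
+```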
+
+---
+
+## Monitoring
+
+### Check Conversation History
+```python
+# Add debug logging to voice_manager.py to see history
+logger.debug(f"Conversation history: {self.conversation_history}")
+```
+
+### Check System Prompt
+```bash
+docker exec miku-bot cat /app/miku_prompt.txt
+docker exec miku-bot cat /app/miku_lore.txt
+```
+
+### Monitor Responses
+```bash
+docker logs -f miku-bot 2>&1 | grep "Voice response complete"
+```
+
+---
+
+## Files Modified
+
+1. **bot/bot.py**
+   - Changed voice_recv logger level from WARNING to CRITICAL
+   - Suppresses CryptoError spam
+
+2. **bot/utils/voice_manager.py**
+   - Added `conversation_history` to `VoiceSession.__init__()`
+   - Updated `_generate_voice_response()` to load lore files
+   - Built comprehensive voice-aware system prompt
+   - Implemented conversation history tracking (last 8 exchanges)
+   - Added English-only instruction
+   - Saves both user and assistant messages to history
+
+---
+
+## Benefits
+
+✅ **Better Context**: Miku remembers previous exchanges  
+✅ **Cleaner Logs**: No more CryptoError spam  
+✅ **Natural Responses**: Knows she's in voice chat, responds appropriately  
+✅ **Language Consistency**: Enforces English for TTS compatibility  
+✅ **Personality Intact**: Still loads lore and personality files  
+✅ **User Awareness**: Knows who she's talking to  
+
+---
+
+## Next Steps
+
+1. **Test thoroughly** with multi-turn conversations
+2. **Adjust history length** if needed (currently 8 exchanges)
+3. **Fine-tune response length** based on TTS performance
+4. **Add conversation reset** command if needed (e.g., `!miku reset`)
+5. **Consider adding** conversation summaries for very long sessions
+
+---
+
+**Status**: ✅ **DEPLOYED AND READY FOR TESTING**
+
+Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement!