Voice Chat Context System
Implementation Complete ✅
Added comprehensive voice chat context to give Miku awareness of the conversation environment.
Features
1. Voice-Aware System Prompt
Miku now knows she's in a voice chat and adjusts her behavior:
- ✅ Aware she's speaking via TTS
- ✅ Knows who she's talking to (user names included)
- ✅ Understands responses will be spoken aloud
- ✅ Instructed to keep responses short (1-3 sentences)
- ✅ CRITICAL: Instructed to only use English (TTS can't handle Japanese well)
2. Conversation History (Last 8 Exchanges)
- Stores last 16 messages (8 user + 8 assistant)
- Maintains context across multiple voice interactions
- Automatically trimmed to keep memory manageable
- Each message includes username for multi-user context
3. Personality Integration
- Loads miku_lore.txt - Her background, personality, likes/dislikes
- Loads miku_prompt.txt - Core personality instructions
- Combines with voice-specific instructions
- Maintains character consistency
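As a rough illustration of the loading step, the sketch below reads the two personality files from a base directory. The function names, the `/app` default (taken from the `docker exec` paths in the Monitoring section), and the fall-back-to-empty-string behavior are all assumptions for illustration, not the actual code in voice_manager.py.

```python
import logging
from pathlib import Path
from typing import Tuple

logger = logging.getLogger(__name__)


def read_text_or_empty(path: Path) -> str:
    """Return the file's text, or an empty string if the file is missing."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        # Assumption: a missing file should degrade gracefully, not crash the bot.
        logger.warning("Personality file missing: %s", path)
        return ""


def load_personality(base_dir: Path = Path("/app")) -> Tuple[str, str]:
    """Load (prompt, lore) text from base_dir. Hypothetical helper."""
    prompt = read_text_or_empty(base_dir / "miku_prompt.txt")
    lore = read_text_or_empty(base_dir / "miku_lore.txt")
    return prompt, lore
```

Returning empty strings keeps voice chat working without personality text; raising instead would be the stricter choice if a missing file should block startup.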
4. Reduced Log Spam
- Set voice_recv logger to CRITICAL level
- Suppresses routine CryptoError and RTCP packet log messages
- Only shows actual critical errors
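A minimal sketch of the logger change, using only the standard `logging` module. The logger name `discord.ext.voice_recv` is an assumption; check the name that actually appears in the noisy log lines and substitute it.

```python
import logging

# Assumption: the voice receive extension logs under this logger name.
VOICE_RECV_LOGGER = "discord.ext.voice_recv"


def quiet_voice_recv(name: str = VOICE_RECV_LOGGER) -> logging.Logger:
    """Raise the logger threshold so only CRITICAL records pass."""
    voice_logger = logging.getLogger(name)
    voice_logger.setLevel(logging.CRITICAL)
    return voice_logger


voice_logger = quiet_voice_recv()
```

Child loggers (for example `discord.ext.voice_recv.reader`, if one exists) inherit this level as long as their own level is left unset.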
System Prompt Structure
[miku_prompt.txt content]
[miku_lore.txt content]
VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user.name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku
Remember: This is a live voice conversation, so be concise and engaging!
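The structure above could be assembled roughly as follows. `build_system_prompt` and `VOICE_CONTEXT_TEMPLATE` are hypothetical names, and the user name is passed as a plain string rather than a Discord user object.

```python
VOICE_CONTEXT_TEMPLATE = """VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user_name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku
Remember: This is a live voice conversation, so be concise and engaging!"""


def build_system_prompt(prompt_text: str, lore_text: str, user_name: str) -> str:
    """Join prompt, lore, and voice context in the order shown above."""
    voice_context = VOICE_CONTEXT_TEMPLATE.format(user_name=user_name)
    # Blank lines between sections keep them visually separate for the LLM.
    return "\n\n".join([prompt_text.strip(), lore_text.strip(), voice_context])
```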
Conversation Flow
User speaks → STT transcribes → Add to history
↓
[System Prompt]
[Last 8 exchanges]
[Current user message]
↓
LLM generates
↓
Add response to history
↓
Stream to TTS → Speak
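The flow above can be sketched as a single function. This is an illustration under stated assumptions: `run_turn` is a hypothetical name, `generate` stands in for the real LLM call, and the current user message is treated as the last history entry because the diagram adds it to history before the LLM runs.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_turn(
    system_prompt: str,
    history: List[Message],
    username: str,
    transcript: str,
    generate: Callable[[List[Message]], str],
) -> str:
    """Process one voice turn and return the text to hand to TTS."""
    # 1. The STT transcript goes into history, prefixed with the speaker's name.
    history.append({"role": "user", "content": f"{username}: {transcript}"})

    # 2. System prompt first, then history (which ends with the current message).
    messages = [{"role": "system", "content": system_prompt}, *history]

    # 3. The LLM generates a reply (stand-in callable, not a real API).
    reply = generate(messages)

    # 4. The reply is stored so the next turn can see it.
    history.append({"role": "assistant", "content": reply})

    # 5. The caller streams this text to TTS.
    return reply
```

Passing `generate` as a parameter makes the flow testable without a running LLM, for example `run_turn("...", [], "koko210", "Hey Miku", lambda messages: "Hi!")`.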
Message History Format
conversation_history = [
    {"role": "user", "content": "koko210: Hey Miku, how are you?"},
    {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"},
    {"role": "user", "content": "koko210: Can you sing something?"},
    {"role": "assistant", "content": "I'd love to! What song would you like to hear?"},
    # ... up to 16 messages total (8 exchanges)
]
Configuration
Conversation History Limit
Current: 16 messages (8 exchanges)
To adjust, edit voice_manager.py:
# Keep only last 8 exchanges (16 messages = 8 user + 8 assistant)
if len(self.conversation_history) > 16:
    self.conversation_history = self.conversation_history[-16:]
Recommendations:
- 8 exchanges: Good balance (current setting)
- 12 exchanges: More context, slightly more tokens
- 4 exchanges: Minimal context, faster responses
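To make switching between these settings a one-line change, the limit could be pulled into a constant. `MAX_EXCHANGES` and `trim_history` are hypothetical names used here for illustration; the current code hard-codes 16.

```python
from typing import Dict, List

MAX_EXCHANGES = 8  # hypothetical constant; 4 and 12 are the alternatives above


def trim_history(
    history: List[Dict[str, str]], max_exchanges: int = MAX_EXCHANGES
) -> List[Dict[str, str]]:
    """Return the most recent max_exchanges * 2 messages."""
    if max_exchanges <= 0:
        # Guard: history[-0:] would return the whole list, not an empty one.
        return []
    max_messages = max_exchanges * 2  # one user + one assistant per exchange
    if len(history) > max_messages:
        return history[-max_messages:]
    return history
```

Inside the session this would be called as `self.conversation_history = trim_history(self.conversation_history)` after each append.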
Response Length
Current: max_tokens=200
To adjust:
payload = {
    "max_tokens": 200  # Change this
}
Language Enforcement
Why English-Only?
The RVC TTS system is trained on English audio and struggles with:
- Japanese characters (even though Miku is Japanese!)
- Special characters
- Mixed language text
- Non-English phonetics
Implementation
The system prompt explicitly tells Miku:
IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
Because the instruction is part of the system prompt, it is sent with every voice chat request.
Testing
Test 1: Basic Conversation
User: "Hey Miku!"
Miku: "Hi there! Great to hear from you!" (should be in English)
User: "How are you doing?"
Miku: "I'm doing wonderful! How about you?" (remembers previous exchange)
Test 2: Context Retention
Have a multi-turn conversation and verify Miku remembers:
- Previous topics discussed
- User names
- Conversation flow
Test 3: Response Length
Verify responses are:
- Short (1-3 sentences)
- Conversational
- Not truncated mid-sentence
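These checks can be partly automated with a rough heuristic. The sketch below is an assumption-laden helper, not part of the bot: it counts sentence-ending punctuation and flags replies that do not end on it, which is what a reply cut off by `max_tokens` tends to look like. Abbreviations and replies ending in an emoji will confuse it.

```python
import re

# One or more of . ! ? followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def count_sentences(text: str) -> int:
    """Approximate sentence count based on terminal punctuation."""
    return len(_SENTENCE_END.findall(text.strip()))


def looks_truncated(text: str) -> bool:
    """True if non-empty text does not end on closing punctuation."""
    stripped = text.rstrip()
    return bool(stripped) and stripped[-1] not in ".!?\"')"


def check_voice_response(text: str, max_sentences: int = 3) -> bool:
    """True if the reply is 1..max_sentences sentences and looks complete."""
    return 1 <= count_sentences(text) <= max_sentences and not looks_truncated(text)
```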
Test 4: Language Enforcement
Try asking in Japanese or requesting a Japanese response:
- Miku should politely respond in English
- Should explain she needs to use English for voice chat
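When running this test, a small check can confirm that no Japanese script reached the reply text. This is a hypothetical test helper, not part of the described implementation, which relies on the system prompt alone. The ranges cover Hiragana, Katakana, and the main CJK ideograph block, so Chinese text would also match.

```python
import re

# Hiragana (U+3040-309F), Katakana (U+30A0-30FF), CJK Unified Ideographs (U+4E00-9FFF).
_JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(text: str) -> bool:
    """True if the text contains kana or CJK ideograph characters."""
    return _JAPANESE_CHARS.search(text) is not None
```

The same function could gate text before it is streamed to TTS, though what to do on a match (retry, strip, or skip) is a separate design decision.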
Monitoring
Check Conversation History
# Add debug logging to voice_manager.py to see history
logger.debug(f"Conversation history: {self.conversation_history}")
Check System Prompt
docker exec miku-bot cat /app/miku_prompt.txt
docker exec miku-bot cat /app/miku_lore.txt
Monitor Responses
docker logs -f miku-bot | grep "Voice response complete"
Files Modified
- bot/bot.py
  - Changed voice_recv logger level from WARNING to CRITICAL
  - Suppresses CryptoError spam
- bot/utils/voice_manager.py
  - Added conversation_history to VoiceSession.__init__()
  - Updated _generate_voice_response() to load lore files
  - Built comprehensive voice-aware system prompt
  - Implemented conversation history tracking (last 8 exchanges)
  - Added English-only instruction
  - Saves both user and assistant messages to history
Benefits
✅ Better Context: Miku remembers previous exchanges
✅ Cleaner Logs: No more CryptoError spam
✅ Natural Responses: Knows she's in voice chat, responds appropriately
✅ Language Consistency: Enforces English for TTS compatibility
✅ Personality Intact: Still loads lore and personality files
✅ User Awareness: Knows who she's talking to
Next Steps
- Test thoroughly with multi-turn conversations
- Adjust history length if needed (currently 8 exchanges)
- Fine-tune response length based on TTS performance
- Add conversation reset command if needed (e.g., !miku reset)
- Consider adding conversation summaries for very long sessions
Status: ✅ DEPLOYED AND READY FOR TESTING
Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement!