Implemented experimental real production ready voice chat, relegated old flow to voice debug mode. New Web UI panel for Voice Chat.
225
VOICE_CHAT_CONTEXT.md
Normal file
@@ -0,0 +1,225 @@
# Voice Chat Context System

## Implementation Complete ✅

Added comprehensive voice chat context to give Miku awareness of the conversation environment.

---

## Features

### 1. Voice-Aware System Prompt
Miku now knows she's in a voice chat and adjusts her behavior:
- ✅ Aware she's speaking via TTS
- ✅ Knows who she's talking to (user names included)
- ✅ Understands responses will be spoken aloud
- ✅ Instructed to keep responses short (1-3 sentences)
- ✅ **CRITICAL: Instructed to only use English** (TTS can't handle Japanese well)

### 2. Conversation History (Last 8 Exchanges)
- Stores the last 16 messages (8 user + 8 assistant)
- Maintains context across multiple voice interactions
- Trims older messages automatically to keep memory use bounded
- Prefixes each user message with the username for multi-user context

### 3. Personality Integration
- Loads `miku_lore.txt` - Her background, personality, likes/dislikes
- Loads `miku_prompt.txt` - Core personality instructions
- Combines with voice-specific instructions
- Maintains character consistency

### 4. Reduced Log Spam
- Set `voice_recv` logger to `CRITICAL` level
- Suppresses routine CryptoErrors and RTCP packets
- Only shows actual critical errors
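
The log suppression above can be sketched with the standard `logging` module. The logger name `discord.ext.voice_recv` and the helper `quiet_voice_recv` are assumptions made for illustration, so check the logger names that actually appear in your log output. Child loggers that have no level of their own inherit the threshold from the parent configured here.

```python
import logging

# Assumed logger name for the voice-receive extension; verify it against
# the names that actually show up in your logs.
VOICE_RECV_LOGGER = "discord.ext.voice_recv"


def quiet_voice_recv(name: str = VOICE_RECV_LOGGER) -> logging.Logger:
    """Raise the logger's threshold so only CRITICAL records pass."""
    voice_logger = logging.getLogger(name)
    voice_logger.setLevel(logging.CRITICAL)
    return voice_logger


voice_logger = quiet_voice_recv()
```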

---

## System Prompt Structure

```
[miku_prompt.txt content]

[miku_lore.txt content]

VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user.name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!
```
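
One way to assemble this structure is sketched below. It is not the code in `voice_manager.py`: the names `VOICE_CONTEXT_TEMPLATE`, `read_text_or_empty`, `build_system_prompt`, and the `base_dir` parameter are hypothetical, and falling back to an empty string when a file is missing is an assumption about the desired behavior.

```python
from pathlib import Path

# Voice-specific instructions, copied from the structure shown above.
VOICE_CONTEXT_TEMPLATE = """VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user_name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!"""


def read_text_or_empty(path: Path) -> str:
    """Return the file's text, or an empty string if the file is missing."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return ""  # assumed fallback: skip the part instead of failing


def build_system_prompt(user_name: str, base_dir: Path = Path(".")) -> str:
    """Join prompt, lore, and voice context, skipping any empty part."""
    parts = [
        read_text_or_empty(base_dir / "miku_prompt.txt"),
        read_text_or_empty(base_dir / "miku_lore.txt"),
        VOICE_CONTEXT_TEMPLATE.format(user_name=user_name),
    ]
    return "\n\n".join(part for part in parts if part)
```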

---

## Conversation Flow

```
User speaks → STT transcribes → Add to history
                     ↓
              [System Prompt]
             [Last 8 exchanges]
           [Current user message]
                     ↓
               LLM generates
                     ↓
          Add response to history
                     ↓
           Stream to TTS → Speak
```
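
The flow above can be sketched as a single function, under stated assumptions. `handle_utterance` and its parameters are hypothetical names, `generate` stands in for the LLM call, and the STT and TTS stages are left out. The user message is added to the history before the request is built, as in the diagram, so the history slice already contains the current message.

```python
def handle_utterance(history, system_prompt, username, text, generate,
                     max_messages=16):
    """Run one voice turn: record the user message, query the model,
    record the reply, and trim the history to the newest entries."""
    # Transcribed speech goes into the history, prefixed with the speaker.
    history.append({"role": "user", "content": f"{username}: {text}"})

    # System prompt first, then the most recent messages (current included).
    messages = [
        {"role": "system", "content": system_prompt},
        *history[-max_messages:],
    ]
    reply = generate(messages)  # hypothetical stand-in for the LLM request

    history.append({"role": "assistant", "content": reply})
    # Keep only the newest max_messages entries (no-op while shorter).
    del history[:-max_messages]
    return reply
```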

---

## Message History Format

```python
conversation_history = [
    {"role": "user", "content": "koko210: Hey Miku, how are you?"},
    {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"},
    {"role": "user", "content": "koko210: Can you sing something?"},
    {"role": "assistant", "content": "I'd love to! What song would you like to hear?"},
    # ... up to 16 messages total (8 exchanges)
]
```

---

## Configuration

### Conversation History Limit
**Current**: 16 messages (8 exchanges)

To adjust, edit `voice_manager.py`:
```python
# Inside VoiceSession: keep only the last 8 exchanges
# (16 messages = 8 user + 8 assistant)
if len(self.conversation_history) > 16:
    self.conversation_history = self.conversation_history[-16:]
```

**Recommendations**:
- **8 exchanges** (16 messages): Good balance (current setting)
- **12 exchanges** (24 messages): More context, slightly more tokens
- **4 exchanges** (8 messages): Minimal context, faster responses
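
If the limit needs to change often, a bounded `collections.deque` is an alternative to the slice-based trim. This is a swapped-in technique, not what `voice_manager.py` does, and `MAX_EXCHANGES` is a hypothetical constant introduced for the sketch.

```python
from collections import deque

MAX_EXCHANGES = 8  # hypothetical constant; the source hard-codes 16 messages

# A deque with maxlen drops the oldest entry automatically on append.
history = deque(maxlen=MAX_EXCHANGES * 2)

# Simulate 10 exchanges (20 messages); only the newest 16 survive.
for turn in range(10):
    history.append({"role": "user", "content": f"koko210: message {turn}"})
    history.append({"role": "assistant", "content": f"reply {turn}"})

# Convert to a plain list before placing it in a request payload.
messages = list(history)
```

Changing the window then means editing one constant instead of two literals that have to stay in sync.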

### Response Length
**Current**: max_tokens=200

To adjust:
```python
payload = {
    "max_tokens": 200  # Change this
}
```

---

## Language Enforcement

### Why English-Only?
The RVC TTS system is trained on English audio and struggles with:
- Japanese characters (even though Miku is Japanese!)
- Special characters
- Mixed-language text
- Non-English phonetics

### Implementation
The system prompt explicitly tells Miku:
> **IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.**

Because this instruction is part of the system prompt, it is repeated in every voice chat interaction.

---

## Testing

### Test 1: Basic Conversation
```
User: "Hey Miku!"
Miku: "Hi there! Great to hear from you!" (should be in English)
User: "How are you doing?"
Miku: "I'm doing wonderful! How about you?" (remembers previous exchange)
```

### Test 2: Context Retention
Have a multi-turn conversation and verify Miku remembers:
- Previous topics discussed
- User names
- Conversation flow

### Test 3: Response Length
Verify responses are:
- Short (1-3 sentences)
- Conversational
- Not truncated mid-sentence
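
The length and truncation checks can be partly automated with a rough heuristic, sketched below. The function names are hypothetical, and the sentence counter only looks at runs of `.`, `!`, or `?`, so a reply that ends in an emoji or a closing quote would be flagged even though it is complete.

```python
import re

# A sentence end: a run of . ! ? followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def looks_complete(text: str) -> bool:
    """True if the text ends with sentence-final punctuation."""
    return text.rstrip().endswith((".", "!", "?"))


def sentence_count(text: str) -> int:
    """Rough sentence count based on terminal punctuation."""
    return len(_SENTENCE_END.findall(text.strip()))


def check_voice_reply(text: str, max_sentences: int = 3) -> list:
    """Return a list of problems found; an empty list means none."""
    problems = []
    if not looks_complete(text):
        problems.append("possibly truncated")
    if sentence_count(text) > max_sentences:
        problems.append("too long")
    return problems
```

For example, `check_voice_reply("I'd love to! What song would you like to")` reports a possible truncation, which is what a reply cut off by `max_tokens` would look like.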

### Test 4: Language Enforcement
Ask a question in Japanese, or request a Japanese response:
- Miku should politely respond in English
- She should explain that she needs to use English for voice chat
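
To check replies for this test without reading every one, a small detector for Japanese script can help. This is a sketch with a hypothetical function name, not part of the bot. It covers Hiragana, Katakana, and CJK Unified Ideographs only, and the ideograph range is shared with Chinese, so it flags kanji-like characters in general.

```python
import re

# Hiragana (U+3040-309F), Katakana (U+30A0-30FF),
# CJK Unified Ideographs (U+4E00-9FFF, shared with Chinese).
_JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(text: str) -> bool:
    """True if the text contains at least one character from the ranges above."""
    return _JAPANESE_CHARS.search(text) is not None
```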

---

## Monitoring

### Check Conversation History
```python
# Add debug logging to voice_manager.py to see history
logger.debug(f"Conversation history: {self.conversation_history}")
```

### Check System Prompt
```bash
docker exec miku-bot cat /app/miku_prompt.txt
docker exec miku-bot cat /app/miku_lore.txt
```

### Monitor Responses
```bash
docker logs -f miku-bot | grep "Voice response complete"
```

---

## Files Modified

1. **bot/bot.py**
   - Changed voice_recv logger level from WARNING to CRITICAL
   - Suppresses CryptoError spam

2. **bot/utils/voice_manager.py**
   - Added `conversation_history` to `VoiceSession.__init__()`
   - Updated `_generate_voice_response()` to load lore files
   - Built comprehensive voice-aware system prompt
   - Implemented conversation history tracking (last 8 exchanges)
   - Added English-only instruction
   - Saves both user and assistant messages to history

---

## Benefits

- ✅ **Better Context**: Miku remembers previous exchanges
- ✅ **Cleaner Logs**: No more CryptoError spam
- ✅ **Natural Responses**: Knows she's in voice chat, responds appropriately
- ✅ **Language Consistency**: Enforces English for TTS compatibility
- ✅ **Personality Intact**: Still loads lore and personality files
- ✅ **User Awareness**: Knows who she's talking to

---

## Next Steps

1. **Test thoroughly** with multi-turn conversations
2. **Adjust history length** if needed (currently 8 exchanges)
3. **Fine-tune response length** based on TTS performance
4. **Add conversation reset** command if needed (e.g., `!miku reset`)
5. **Consider adding** conversation summaries for very long sessions

---

**Status**: ✅ **DEPLOYED AND READY FOR TESTING**

Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement!