# Voice Chat Context System

## Implementation Complete ✅

Added comprehensive voice chat context to give Miku awareness of the conversation environment.

---

## Features

### 1. Voice-Aware System Prompt
Miku now knows she's in a voice chat and adjusts her behavior:
- ✅ Aware she's speaking via TTS
- ✅ Knows who she's talking to (user names included)
- ✅ Understands responses will be spoken aloud
- ✅ Instructed to keep responses short (1-3 sentences)
- ✅ **CRITICAL: Instructed to only use English** (TTS can't handle Japanese well)

### 2. Conversation History (Last 8 Exchanges)
- Stores last 16 messages (8 user + 8 assistant)
- Maintains context across multiple voice interactions
- Automatically trimmed to keep memory manageable
- Each message includes username for multi-user context

### 3. Personality Integration
- Loads `miku_lore.txt` - Her background, personality, likes/dislikes
- Loads `miku_prompt.txt` - Core personality instructions
- Combines with voice-specific instructions
- Maintains character consistency

### 4. Reduced Log Spam
- Sets the `voice_recv` logger to CRITICAL level
- Suppresses routine CryptoErrors and RTCP packet messages
- Only shows actual critical errors

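The logger change can be sketched as follows. This is a minimal sketch, not the code in `bot/bot.py`: the logger name `discord.ext.voice_recv` and the helper `quiet_voice_recv` are assumptions, so check the name against the logger that is actually configured. Child loggers that have no level of their own inherit the raised threshold.

```python
import logging

# Assumed logger name for the voice receive extension; verify it in bot/bot.py.
VOICE_RECV_LOGGER = "discord.ext.voice_recv"


def quiet_voice_recv(logger_name: str = VOICE_RECV_LOGGER) -> logging.Logger:
    """Raise the logger threshold so only CRITICAL records get through."""
    voice_logger = logging.getLogger(logger_name)
    voice_logger.setLevel(logging.CRITICAL)
    return voice_logger


# Hypothetical call site: run once during bot startup.
quiet_voice_recv()
```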

---

## System Prompt Structure

```
[miku_prompt.txt content]

[miku_lore.txt content]

VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user.name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!
```

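One way to assemble a prompt in the order shown above is sketched here. It is an illustration under stated assumptions, not the contents of `_generate_voice_response()`: the helpers `read_text_or_empty` and `build_system_prompt` are hypothetical names, the file locations are assumed to be the working directory, and a missing file is treated as empty text.

```python
from pathlib import Path

# Voice-specific instructions, copied from the structure documented above.
VOICE_CONTEXT = """VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user_name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!"""


def read_text_or_empty(path: Path) -> str:
    """Return the file's text, or an empty string if the file is missing."""
    return path.read_text(encoding="utf-8").strip() if path.exists() else ""


def build_system_prompt(prompt_text: str, lore_text: str, user_name: str) -> str:
    """Join personality prompt, lore, and voice context in the documented order."""
    voice_context = VOICE_CONTEXT.format(user_name=user_name)
    sections = [prompt_text.strip(), lore_text.strip(), voice_context]
    # Skip empty sections so a missing file does not leave stray blank lines.
    return "\n\n".join(section for section in sections if section)


# Hypothetical usage; the paths are assumptions about where the files live.
system_prompt = build_system_prompt(
    read_text_or_empty(Path("miku_prompt.txt")),
    read_text_or_empty(Path("miku_lore.txt")),
    user_name="koko210",
)
```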

---

## Conversation Flow

```
User speaks → STT transcribes → Add to history
↓
[System Prompt]
[Last 8 exchanges]
[Current user message]
↓
LLM generates
↓
Add response to history
↓
Stream to TTS → Speak
```

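The flow above can be sketched in a few lines. This is a minimal sketch, not the code in `voice_manager.py`: `build_messages`, `handle_turn`, and the `generate` callback (a stand-in for the LLM request) are hypothetical names, and trimming once per completed turn is an assumption. Streaming the returned reply to TTS is left to the caller.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
MAX_MESSAGES = 16  # 8 exchanges = 8 user + 8 assistant messages


def build_messages(system_prompt: str, history: List[Message]) -> List[Message]:
    """Prepend the system prompt to the stored conversation history."""
    return [{"role": "system", "content": system_prompt}, *history]


def handle_turn(
    system_prompt: str,
    history: List[Message],
    username: str,
    transcript: str,
    generate: Callable[[List[Message]], str],
) -> str:
    """Run one voice turn: record the user message, ask the LLM, record the reply."""
    # Prefix the username so the model can tell speakers apart.
    history.append({"role": "user", "content": f"{username}: {transcript}"})
    reply = generate(build_messages(system_prompt, history))
    history.append({"role": "assistant", "content": reply})
    # Trim in place after the turn so user/assistant pairs stay aligned.
    del history[:-MAX_MESSAGES]
    return reply
```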

---

## Message History Format

```python
conversation_history = [
    {"role": "user", "content": "koko210: Hey Miku, how are you?"},
    {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"},
    {"role": "user", "content": "koko210: Can you sing something?"},
    {"role": "assistant", "content": "I'd love to! What song would you like to hear?"},
    # ... up to 16 messages total (8 exchanges)
]
```

---

## Configuration

### Conversation History Limit
**Current**: 16 messages (8 exchanges)

To adjust, edit `voice_manager.py`:
```python
# Keep only last 8 exchanges (16 messages = 8 user + 8 assistant)
if len(self.conversation_history) > 16:
    self.conversation_history = self.conversation_history[-16:]
```

**Recommendations**:
- **8 exchanges**: Good balance (current setting)
- **12 exchanges**: More context, slightly more tokens
- **4 exchanges**: Minimal context, faster responses

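If the limit is changed often, it may be easier to derive the message count from a single constant. The sketch below swaps the manual slice for a `collections.deque` with `maxlen`, which drops the oldest entries automatically; this is an alternative technique, not what `voice_manager.py` currently does, and `MAX_EXCHANGES` and `make_history` are hypothetical names.

```python
from collections import deque

MAX_EXCHANGES = 8  # try 4 for minimal context or 12 for more


def make_history(max_exchanges: int = MAX_EXCHANGES) -> deque:
    """Create a history buffer holding one user and one assistant message per exchange."""
    return deque(maxlen=max_exchanges * 2)


history = make_history()
for turn in range(10):
    history.append({"role": "user", "content": f"koko210: message {turn}"})
    history.append({"role": "assistant", "content": f"reply {turn}"})

# Only the newest 16 messages remain; convert to a list before building a request.
recent_messages = list(history)
```

A deque is not directly JSON-serializable, so the `list(...)` conversion is needed wherever the history goes into a request payload.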

### Response Length
**Current**: `max_tokens=200`

To adjust:
```python
payload = {
    "max_tokens": 200  # Change this
}
```

---

## Language Enforcement

### Why English-Only?
The RVC TTS system is trained on English audio and struggles with:
- Japanese characters (even though Miku is Japanese!)
- Special characters
- Mixed language text
- Non-English phonetics

### Implementation
The system prompt explicitly tells Miku:
> **IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.**

This is reinforced in every voice chat interaction.

---

## Testing

### Test 1: Basic Conversation
```
User: "Hey Miku!"
Miku: "Hi there! Great to hear from you!" (should be in English)
User: "How are you doing?"
Miku: "I'm doing wonderful! How about you?" (remembers previous exchange)
```

### Test 2: Context Retention
Have a multi-turn conversation and verify Miku remembers:
- Previous topics discussed
- User names
- Conversation flow

### Test 3: Response Length
Verify responses are:
- Short (1-3 sentences)
- Conversational
- Not truncated mid-sentence

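The length and truncation checks can be automated roughly as below. This is a heuristic sketch with hypothetical helper names: sentences are counted by terminal punctuation, so abbreviations, ellipses, and replies that end in a quotation mark can be miscounted.

```python
import re

# A run of ., !, or ? followed by whitespace or end of text marks a sentence end.
SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def count_sentences(text: str) -> int:
    """Count sentence endings; a rough proxy for the number of sentences."""
    return len(SENTENCE_END.findall(text.strip()))


def looks_complete(text: str) -> bool:
    """A reply cut off by the token limit usually lacks closing punctuation."""
    return text.rstrip().endswith((".", "!", "?"))


def check_voice_response(text: str, max_sentences: int = 3) -> bool:
    """True when the reply is complete and within the 1-3 sentence target."""
    return looks_complete(text) and 1 <= count_sentences(text) <= max_sentences
```

Running `check_voice_response` over replies captured from the logs gives a quick signal on whether `max_tokens` is cutting responses short.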

### Test 4: Language Enforcement
Try asking in Japanese or requesting a Japanese response:
- Miku should politely respond in English
- Should explain she needs to use English for voice chat

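To check replies for Japanese text during this test, a small detector like the one below may help. It is a sketch, not part of the bot: `contains_japanese` is a hypothetical helper, and it only covers the Hiragana, Katakana, and CJK Unified Ideographs blocks, so other scripts and romanized Japanese pass through.

```python
import re

# Hiragana (U+3040-309F), Katakana (U+30A0-30FF), CJK Unified Ideographs (U+4E00-9FFF).
JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(text: str) -> bool:
    """Return True if the text includes any kana or kanji character."""
    return JAPANESE_CHARS.search(text) is not None
```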

---

## Monitoring

### Check Conversation History
```python
# Add debug logging to voice_manager.py to see history
logger.debug(f"Conversation history: {self.conversation_history}")
```

### Check System Prompt
```bash
docker exec miku-bot cat /app/miku_prompt.txt
docker exec miku-bot cat /app/miku_lore.txt
```

### Monitor Responses
```bash
docker logs -f miku-bot | grep "Voice response complete"
```

---

## Files Modified

1. **bot/bot.py**
   - Changed voice_recv logger level from WARNING to CRITICAL
   - Suppresses CryptoError spam

2. **bot/utils/voice_manager.py**
   - Added `conversation_history` to `VoiceSession.__init__()`
   - Updated `_generate_voice_response()` to load lore files
   - Built comprehensive voice-aware system prompt
   - Implemented conversation history tracking (last 8 exchanges)
   - Added English-only instruction
   - Saves both user and assistant messages to history

---

## Benefits

✅ **Better Context**: Miku remembers previous exchanges
✅ **Cleaner Logs**: No more CryptoError spam
✅ **Natural Responses**: Knows she's in voice chat, responds appropriately
✅ **Language Consistency**: Enforces English for TTS compatibility
✅ **Personality Intact**: Still loads lore and personality files
✅ **User Awareness**: Knows who she's talking to

---

## Next Steps

1. **Test thoroughly** with multi-turn conversations
2. **Adjust history length** if needed (currently 8 exchanges)
3. **Fine-tune response length** based on TTS performance
4. **Add conversation reset** command if needed (e.g., `!miku reset`)
5. **Consider adding** conversation summaries for very long sessions

---

**Status**: ✅ **DEPLOYED AND READY FOR TESTING**

Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement!