# STT Fix Applied - Ready for Testing

## Summary
Fixed all three issues preventing the ONNX-based Parakeet STT from working:
- ✅ CUDA Support: Updated Docker base image to include cuDNN 9
- ✅ Port Configuration: Fixed bot to connect to port 8766 (found TWO places)
- ✅ Protocol Compatibility: Updated event handler for new ONNX format
## Files Modified

### 1. `stt-parakeet/Dockerfile`

```diff
- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
```
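To confirm the new base image actually exposes CUDA/cuDNN to ONNX Runtime, a quick check can be run inside the container (e.g. in an interactive `docker exec -it miku-stt python` session). This is a sketch and assumes `onnxruntime-gpu` is installed in the image under the usual `onnxruntime` module name:

```python
# Verify that ONNX Runtime can see the GPU after the base-image change.
# CUDAExecutionProvider only appears when the CUDA and cuDNN libraries load correctly.
import onnxruntime as ort

providers = ort.get_available_providers()
print("Available providers:", providers)
assert "CUDAExecutionProvider" in providers, "ONNX Runtime would fall back to CPU"
```

If the assertion fails, the pipeline will still run, but silently on CPU.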
### 2. `bot/utils/stt_client.py`

```diff
- stt_url: str = "ws://miku-stt:8000/ws/stt"
+ stt_url: str = "ws://miku-stt:8766/ws/stt"
```
Added new methods:

- `send_final()` - Request a final transcription
- `send_reset()` - Clear the audio buffer

Updated `_handle_event()` to support:

- New ONNX protocol: `{"type": "transcript", "is_final": true/false}`
- Legacy protocol: `{"type": "partial"}`, `{"type": "final"}` (backward compatibility)
### 3. `bot/utils/voice_receiver.py` ⚠️ KEY FIX

```diff
- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):
```
This was the missing piece! The voice_receiver was overriding the default URL.
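In outline, the shadowing looked like this. The `STTClient` construction and import path are a sketch of the pattern, not the literal code:

```python
# voice_receiver.py declares its *own* default URL and passes it down explicitly,
# so the default inside stt_client.py is never consulted on this code path.
from utils.stt_client import STTClient  # import path assumed for this sketch

class VoiceReceiver:
    def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):
        self.voice_manager = voice_manager
        # Updating only stt_client.py left this default pointing at port 8000.
        self.stt_client = STTClient(stt_url=stt_url)
```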
## Container Status

### STT Container ✅

```
$ docker logs miku-stt 2>&1 | tail -10
CUDA Version 12.6.2
INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
INFO:asr.asr_pipeline:Model loaded successfully
INFO:__main__:Server running on ws://0.0.0.0:8766
INFO:__main__:Active connections: 0
```
Status: ✅ Running with CUDA acceleration
### Bot Container ✅
- Files copied directly into running container (faster than rebuild)
- Python bytecode cache cleared
- Container restarted
## Testing Instructions

### Test 1: Basic Connection

- Join a voice channel in Discord
- Run `!miku listen`
- Expected: Bot connects without a "Connection Refused" error
- Check logs: `docker logs miku-bot 2>&1 | grep "STT"`
### Test 2: Transcription

- After running `!miku listen`, speak into your microphone
- Expected: Your speech is transcribed
- Check STT logs: `docker logs miku-stt 2>&1 | tail -20`
- Check bot logs: Look for "Partial transcript" or "Final transcript" messages
### Test 3: Performance

- Monitor GPU usage: `nvidia-smi -l 1`
- Expected: GPU utilization increases when transcribing
- Expected: Transcription completes in ~0.5-1 second
## Monitoring Commands

### Check Both Containers

```bash
# docker logs accepts a single container, so follow each in its own terminal
docker logs -f --tail=50 miku-bot
docker logs -f --tail=50 miku-stt
```
### Check STT Service Health

```bash
docker ps | grep miku-stt
docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running"
```

### Check for Errors

```bash
# Bot errors
docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20

# STT errors
docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20
```
### Test WebSocket Connection

```bash
# From host machine
curl -i -N \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: test" \
  http://localhost:8766/
```
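As an alternative smoke test that exercises the actual websocket handshake rather than just the HTTP upgrade, a minimal Python client can be used. This is a sketch that assumes the `websockets` package is installed, the service is published to the host on port 8766, and the `/ws/stt` path from the bot configuration:

```python
# ws_smoke_test.py - open the STT websocket and confirm the connection succeeds
import asyncio
import websockets  # pip install websockets

async def main() -> None:
    uri = "ws://localhost:8766/ws/stt"
    async with websockets.connect(uri) as ws:
        print(f"Connected to {uri}")
        try:
            # Wait briefly in case the server sends an initial status event
            message = await asyncio.wait_for(ws.recv(), timeout=2.0)
            print("Server said:", message)
        except asyncio.TimeoutError:
            print("No initial message (the connection itself is fine)")

asyncio.run(main())
```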
Known Issues & Workarounds
Issue: Bot Still Shows Old Errors
Symptom: After restart, logs still show port 8000 errors
Cause: Python module caching or log entries from before restart
Solution:
# Clear cache and restart
docker exec miku-bot find /app -name "*.pyc" -delete
docker restart miku-bot
# Wait 10 seconds for full restart
sleep 10
### Issue: Container Rebuild Takes 15+ Minutes

Cause: `playwright install` downloads Chromium/Firefox browsers (~500MB)
Workaround: Instead of a full rebuild, use `docker cp`:

```bash
docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py
docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py
docker restart miku-bot
```
## Next Steps

### For Full Deployment (after testing)

- Rebuild the bot container properly:

  ```bash
  docker-compose build miku-bot
  docker-compose up -d miku-bot
  ```

- Remove the old STT directory:

  ```bash
  mv stt stt.backup
  ```

- Update documentation to reflect the new architecture
### Optional Enhancements

- Add a `send_final()` call when the user stops speaking (VAD integration)
- Implement progressive transcription display
- Add transcription quality metrics/logging
- Test with multiple simultaneous users
## Quick Reference
| Component | Old (NeMo) | New (ONNX) |
|---|---|---|
| Port | 8000 | 8766 |
| VRAM | 4-5GB | 2-3GB |
| Speed | 2-3s | 0.5-1s |
| cuDNN | 8 | 9 |
| CUDA | 12.1 | 12.6.2 |
| Protocol | Auto VAD | Manual control |
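"Manual control" in the table means the client now decides when an utterance is finished instead of relying on server-side VAD. A sketch of that flow is below; `send_final()`/`send_reset()` come from the updated `stt_client.py`, while `send_audio()` and the surrounding setup are assumptions made for illustration:

```python
# Illustrative client-side flow for the manual-control protocol.
async def transcribe_utterance(stt_client, pcm_frames) -> None:
    await stt_client.send_reset()        # start from an empty audio buffer
    for frame in pcm_frames:             # stream raw audio chunks as they arrive
        await stt_client.send_audio(frame)
    await stt_client.send_final()        # ask the server for the final transcript
    # The result arrives as {"type": "transcript", "is_final": true}
    # and is dispatched by _handle_event() as shown earlier.
```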
Status: ✅ ALL FIXES APPLIED - READY FOR USER TESTING
Last Updated: January 18, 2026 20:47 EET