# STT Fix Applied - Ready for Testing

## Summary

Fixed all three issues preventing the ONNX-based Parakeet STT from working:

1. ✅ **CUDA Support**: Updated Docker base image to include cuDNN 9
2. ✅ **Port Configuration**: Fixed bot to connect to port 8766 (found TWO places)
3. ✅ **Protocol Compatibility**: Updated event handler for new ONNX format

---

## Files Modified

### 1. `stt-parakeet/Dockerfile`

```diff
- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
```

### 2. `bot/utils/stt_client.py`

```diff
- stt_url: str = "ws://miku-stt:8000/ws/stt"
+ stt_url: str = "ws://miku-stt:8766/ws/stt"
```

Added new methods:

- `send_final()` - Request final transcription
- `send_reset()` - Clear audio buffer

Updated `_handle_event()` to support:

- New ONNX protocol: `{"type": "transcript", "is_final": true/false}`
- Legacy protocol: `{"type": "partial"}`, `{"type": "final"}` (backward compatibility)

### 3. `bot/utils/voice_receiver.py` ⚠️ **KEY FIX**

```diff
- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):
```

**This was the missing piece!** The `voice_receiver` was overriding the default URL.

---

## Container Status

### STT Container ✅

```bash
$ docker logs miku-stt 2>&1 | tail -10
```

```
CUDA Version 12.6.2
INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
INFO:asr.asr_pipeline:Model loaded successfully
INFO:__main__:Server running on ws://0.0.0.0:8766
INFO:__main__:Active connections: 0
```

**Status**: ✅ Running with CUDA acceleration

### Bot Container ✅

- Files copied directly into the running container (faster than a rebuild)
- Python bytecode cache cleared
- Container restarted

---

## Testing Instructions

### Test 1: Basic Connection

1. Join a voice channel in Discord
2. Run `!miku listen`
3. **Expected**: Bot connects without a "Connection Refused" error
4.
   **Check logs**: `docker logs miku-bot 2>&1 | grep "STT"`

### Test 2: Transcription

1. After running `!miku listen`, speak into your microphone
2. **Expected**: Your speech is transcribed
3. **Check STT logs**: `docker logs miku-stt 2>&1 | tail -20`
4. **Check bot logs**: Look for "Partial transcript" or "Final transcript" messages

### Test 3: Performance

1. Monitor GPU usage: `nvidia-smi -l 1`
2. **Expected**: GPU utilization increases while transcribing
3. **Expected**: Transcription completes in ~0.5-1 second

---

## Monitoring Commands

### Check Both Containers

`docker logs` only accepts a single container, so use Compose to follow both at once:

```bash
docker-compose logs -f --tail=50 miku-bot miku-stt
```

### Check STT Service Health

```bash
docker ps | grep miku-stt
docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running"
```

### Check for Errors

```bash
# Bot errors
docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20

# STT errors
docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20
```

### Test WebSocket Connection

```bash
# From the host machine (the key must be valid base64, per RFC 6455)
curl -i -N \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://localhost:8766/
```

---

## Known Issues & Workarounds

### Issue: Bot Still Shows Old Errors

**Symptom**: After a restart, the logs still show port 8000 errors

**Cause**: Python module caching, or log entries written before the restart

**Solution**:

```bash
# Clear the bytecode cache and restart
docker exec miku-bot find /app -name "*.pyc" -delete
docker restart miku-bot

# Wait 10 seconds for the restart to complete
sleep 10
```

### Issue: Container Rebuild Takes 15+ Minutes

**Cause**: `playwright install` downloads the Chromium/Firefox browsers (~500MB)

**Workaround**: Instead of a full rebuild, copy the changed files with `docker cp`:

```bash
docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py
docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py
docker restart miku-bot
```

---

## Next Steps

### For Full Deployment (after testing)

1.
   Rebuild the bot container properly:

   ```bash
   docker-compose build miku-bot
   docker-compose up -d miku-bot
   ```

2. Move the old STT directory aside (rename rather than delete, so it can be restored):

   ```bash
   mv stt stt.backup
   ```

3. Update the documentation to reflect the new architecture

### Optional Enhancements

1. Add a `send_final()` call when the user stops speaking (VAD integration)
2. Implement progressive transcription display
3. Add transcription quality metrics/logging
4. Test with multiple simultaneous users

---

## Quick Reference

| Component | Old (NeMo) | New (ONNX) |
|--------------|------------|----------------|
| **Port** | 8000 | 8766 |
| **VRAM** | 4-5GB | 2-3GB |
| **Speed** | 2-3s | 0.5-1s |
| **cuDNN** | 8 | 9 |
| **CUDA** | 12.1 | 12.6.2 |
| **Protocol** | Auto VAD | Manual control |

---

**Status**: ✅ **ALL FIXES APPLIED - READY FOR USER TESTING**

Last Updated: January 18, 2026 20:47 EET
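### Appendix: Dual-Protocol Event Normalization (sketch)

The backward-compatible event handling added to `stt_client.py` can be sketched as a small normalizer. This is a minimal illustration, not the bot's actual `_handle_event()` implementation; in particular, the `text` field name and the `(text, is_final)` return shape are assumptions — only the `type` and `is_final` fields are confirmed by the protocol notes above.

```python
import json
from typing import Optional, Tuple

def parse_stt_event(raw: str) -> Optional[Tuple[str, bool]]:
    """Normalize an STT WebSocket event to (text, is_final).

    Handles both the new ONNX protocol
      {"type": "transcript", "is_final": true/false, ...}
    and the legacy protocol
      {"type": "partial"} / {"type": "final"}.
    Returns None for unrecognized event types.
    NOTE: the "text" field is an assumed name for illustration.
    """
    event = json.loads(raw)
    etype = event.get("type")
    text = event.get("text", "")
    if etype == "transcript":            # new ONNX protocol
        return text, bool(event.get("is_final", False))
    if etype in ("partial", "final"):    # legacy protocol
        return text, etype == "final"
    return None                          # unknown event type
```

A handler structured this way lets the rest of the bot treat both protocols identically: partial results update the display, and `is_final=True` triggers downstream processing.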