STT Fix Applied - Ready for Testing

Summary

Fixed all three issues preventing the ONNX-based Parakeet STT from working:

  1. CUDA Support: Updated Docker base image to include cuDNN 9
  2. Port Configuration: Updated the bot to connect to port 8766 (the old port 8000 was hard-coded in TWO places)
  3. Protocol Compatibility: Updated event handler for new ONNX format

Files Modified

1. stt-parakeet/Dockerfile

- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04

2. bot/utils/stt_client.py

- stt_url: str = "ws://miku-stt:8000/ws/stt"
+ stt_url: str = "ws://miku-stt:8766/ws/stt"

Added new methods:

  • send_final() - Request final transcription
  • send_reset() - Clear audio buffer

Updated _handle_event() to support:

  • New ONNX protocol: {"type": "transcript", "is_final": true/false}
  • Legacy protocol: {"type": "partial"}, {"type": "final"} (backward compatibility)
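
For reference, a minimal sketch of how the updated handler can dispatch both formats. Only the "type"/"is_final" shapes above come from the actual change; the "text" field and the on_partial/on_final callbacks are assumed names used for illustration:

import json

def _handle_event(self, raw: str):
    # Sketch only: payload shapes follow the protocols listed above;
    # the "text" field and the callback names are assumptions.
    event = json.loads(raw)
    etype = event.get("type")
    text = event.get("text", "")

    if etype == "transcript":
        # New ONNX protocol: one event type, finality carried by "is_final"
        if event.get("is_final"):
            self.on_final(text)
        else:
            self.on_partial(text)
    elif etype == "partial":   # legacy NeMo protocol
        self.on_partial(text)
    elif etype == "final":     # legacy NeMo protocol
        self.on_final(text)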

3. bot/utils/voice_receiver.py ⚠️ KEY FIX

- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):

This was the missing piece! voice_receiver.py had its own hard-coded default URL, which overrode the one already fixed in stt_client.py.
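
One way to keep the two defaults from drifting apart again is to read the endpoint from a single environment variable and let both modules default to it; a small sketch (STT_WS_URL is an assumed variable name, not something the repo currently defines):

import os

# Single source of truth for the STT endpoint; stt_client.py and
# voice_receiver.py could both default to this instead of separate literals.
STT_WS_URL = os.environ.get("STT_WS_URL", "ws://miku-stt:8766/ws/stt")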


Container Status

STT Container

$ docker logs miku-stt 2>&1 | tail -10
CUDA Version 12.6.2
INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
INFO:asr.asr_pipeline:Model loaded successfully
INFO:__main__:Server running on ws://0.0.0.0:8766
INFO:__main__:Active connections: 0

Status: Running with CUDA acceleration

Bot Container

  • Files copied directly into the running container (faster than a rebuild)
  • Python bytecode cache cleared
  • Container restarted

Testing Instructions

Test 1: Basic Connection

  1. Join a voice channel in Discord
  2. Run !miku listen
  3. Expected: Bot connects without "Connection Refused" error
  4. Check logs: docker logs miku-bot 2>&1 | grep "STT"

Test 2: Transcription

  1. After running !miku listen, speak into your microphone
  2. Expected: Your speech is transcribed
  3. Check STT logs: docker logs miku-stt 2>&1 | tail -20
  4. Check bot logs: Look for "Partial transcript" or "Final transcript" messages

Test 3: Performance

  1. Monitor GPU usage: nvidia-smi -l 1
  2. Expected: GPU utilization increases when transcribing
  3. Expected: Transcription completes in ~0.5-1 second

Monitoring Commands

Check Both Containers

# docker logs only accepts a single container, so follow both services via compose
docker-compose logs -f --tail=50 miku-bot miku-stt

Check STT Service Health

docker ps | grep miku-stt
docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running"

Check for Errors

# Bot errors
docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20

# STT errors
docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20

Test WebSocket Connection

# From host machine
curl -i -N \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: test" \
  http://localhost:8766/
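
Alternatively, a short Python check exercises a real handshake; this is a sketch that assumes the `websockets` package is installed and that the server accepts the same /ws/stt path the bot uses:

# ws_check.py - confirm the STT server accepts a websocket handshake
import asyncio
import websockets

async def main():
    async with websockets.connect("ws://localhost:8766/ws/stt") as ws:
        print("Handshake OK, server accepted the connection")

asyncio.run(main())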

Known Issues & Workarounds

Issue: Bot Still Shows Old Errors

Symptom: After restart, logs still show port 8000 errors

Cause: Python module caching or log entries from before restart

Solution:

# Clear cache and restart
docker exec miku-bot find /app -name "*.pyc" -delete
docker restart miku-bot

# Wait 10 seconds for full restart
sleep 10

Issue: Container Rebuild Takes 15+ Minutes

Cause: playwright install downloads chromium/firefox browsers (~500MB)

Workaround: Instead of full rebuild, use docker cp:

docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py
docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py
docker restart miku-bot

Next Steps

For Full Deployment (after testing)

  1. Rebuild bot container properly:

    docker-compose build miku-bot
    docker-compose up -d miku-bot
    
  2. Archive the old STT directory:

    mv stt stt.backup
    
  3. Update documentation to reflect new architecture

Optional Enhancements

  1. Add a send_final() call when the user stops speaking (VAD integration; see the sketch after this list)
  2. Implement progressive transcription display
  3. Add transcription quality metrics/logging
  4. Test with multiple simultaneous users
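
A hedged sketch of item 1: request a final transcript once no audio has arrived for a short silence window. send_final() comes from this document; send_audio(), the timeout value, and the wrapper class are illustrative assumptions, not code from the repo:

import asyncio
import time

SILENCE_TIMEOUT = 0.8  # seconds of silence before finalizing (tunable)

class SilenceFinalizer:
    def __init__(self, stt_client):
        self.stt_client = stt_client
        self.last_audio = time.monotonic()
        self._watcher = None

    def on_audio(self, pcm_chunk: bytes):
        # Call from the voice receive path, inside the bot's event loop.
        self.last_audio = time.monotonic()
        asyncio.create_task(self.stt_client.send_audio(pcm_chunk))
        if self._watcher is None or self._watcher.done():
            self._watcher = asyncio.create_task(self._finalize_after_silence())

    async def _finalize_after_silence(self):
        # Poll until the gap since the last packet exceeds the timeout.
        while time.monotonic() - self.last_audio < SILENCE_TIMEOUT:
            await asyncio.sleep(0.1)
        await self.stt_client.send_final()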

Quick Reference

| Component | Old (NeMo) | New (ONNX)     |
|-----------|------------|----------------|
| Port      | 8000       | 8766           |
| VRAM      | 4-5 GB     | 2-3 GB         |
| Speed     | 2-3 s      | 0.5-1 s        |
| cuDNN     | 8          | 9              |
| CUDA      | 12.1       | 12.6.2         |
| Protocol  | Auto VAD   | Manual control |

Status: ALL FIXES APPLIED - READY FOR USER TESTING

Last Updated: January 18, 2026 20:47 EET