STT Fix Applied - Ready for Testing

Summary

Fixed all three issues preventing the ONNX-based Parakeet STT from working:

  1. CUDA Support: Updated Docker base image to include cuDNN 9
  2. Port Configuration: Updated the bot to connect to port 8766 (the old port 8000 was hard-coded in TWO places)
  3. Protocol Compatibility: Updated event handler for new ONNX format

Files Modified

1. stt-parakeet/Dockerfile

- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04

2. bot/utils/stt_client.py

- stt_url: str = "ws://miku-stt:8000/ws/stt"
+ stt_url: str = "ws://miku-stt:8766/ws/stt"

Added new methods:

  • send_final() - Request final transcription
  • send_reset() - Clear audio buffer

Updated _handle_event() to support:

  • New ONNX protocol: {"type": "transcript", "is_final": true/false}
  • Legacy protocol: {"type": "partial"}, {"type": "final"} (backward compatibility)
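
For reference, a minimal sketch of how the updated handler can dispatch both formats. Only the "type"/"is_final" shapes above come from the actual change; the "text" field and the on_partial/on_final callbacks are assumed names used for illustration:

import json

def _handle_event(self, raw: str):
    # Sketch only: payload shapes follow the protocols listed above;
    # the "text" field and the callback names are assumptions.
    event = json.loads(raw)
    etype = event.get("type")
    text = event.get("text", "")

    if etype == "transcript":
        # New ONNX protocol: one event type, finality carried by "is_final"
        if event.get("is_final"):
            self.on_final(text)
        else:
            self.on_partial(text)
    elif etype == "partial":   # legacy NeMo protocol
        self.on_partial(text)
    elif etype == "final":     # legacy NeMo protocol
        self.on_final(text)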

3. bot/utils/voice_receiver.py ⚠️ KEY FIX

- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):

This was the missing piece! voice_receiver.py had its own hard-coded default URL, which overrode the one already fixed in stt_client.py.
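
One way to keep the two defaults from drifting apart again is to read the endpoint from a single environment variable and let both modules default to it; a small sketch (STT_WS_URL is an assumed variable name, not something the repo currently defines):

import os

# Single source of truth for the STT endpoint; stt_client.py and
# voice_receiver.py could both default to this instead of separate literals.
STT_WS_URL = os.environ.get("STT_WS_URL", "ws://miku-stt:8766/ws/stt")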


Container Status

STT Container

$ docker logs miku-stt 2>&1 | tail -10
CUDA Version 12.6.2
INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
INFO:asr.asr_pipeline:Model loaded successfully
INFO:__main__:Server running on ws://0.0.0.0:8766
INFO:__main__:Active connections: 0

Status: Running with CUDA acceleration

Bot Container

  • Files copied directly into the running container (faster than a rebuild)
  • Python bytecode cache cleared
  • Container restarted

Testing Instructions

Test 1: Basic Connection

  1. Join a voice channel in Discord
  2. Run !miku listen
  3. Expected: Bot connects without "Connection Refused" error
  4. Check logs: docker logs miku-bot 2>&1 | grep "STT"

Test 2: Transcription

  1. After running !miku listen, speak into your microphone
  2. Expected: Your speech is transcribed
  3. Check STT logs: docker logs miku-stt 2>&1 | tail -20
  4. Check bot logs: Look for "Partial transcript" or "Final transcript" messages

Test 3: Performance

  1. Monitor GPU usage: nvidia-smi -l 1
  2. Expected: GPU utilization increases when transcribing
  3. Expected: Transcription completes in ~0.5-1 second

Monitoring Commands

Check Both Containers

# docker logs only accepts a single container, so follow both services via compose
docker-compose logs -f --tail=50 miku-bot miku-stt

Check STT Service Health

docker ps | grep miku-stt
docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running"

Check for Errors

# Bot errors
docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20

# STT errors
docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20

Test WebSocket Connection

# From host machine
curl -i -N \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: test" \
  http://localhost:8766/
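
Alternatively, a short Python check exercises a real handshake; this is a sketch that assumes the `websockets` package is installed and that the server accepts the same /ws/stt path the bot uses:

# ws_check.py - confirm the STT server accepts a websocket handshake
import asyncio
import websockets

async def main():
    async with websockets.connect("ws://localhost:8766/ws/stt") as ws:
        print("Handshake OK, server accepted the connection")

asyncio.run(main())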

Known Issues & Workarounds

Issue: Bot Still Shows Old Errors

Symptom: After restart, logs still show port 8000 errors

Cause: Python module caching or log entries from before restart

Solution:

# Clear cache and restart
docker exec miku-bot find /app -name "*.pyc" -delete
docker restart miku-bot

# Wait 10 seconds for full restart
sleep 10

Issue: Container Rebuild Takes 15+ Minutes

Cause: playwright install downloads chromium/firefox browsers (~500MB)

Workaround: Instead of full rebuild, use docker cp:

docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py
docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py
docker restart miku-bot

Next Steps

For Full Deployment (after testing)

  1. Rebuild bot container properly:

    docker-compose build miku-bot
    docker-compose up -d miku-bot
    
  2. Archive the old STT directory:

    mv stt stt.backup
    
  3. Update documentation to reflect new architecture

Optional Enhancements

  1. Add a send_final() call when the user stops speaking (VAD integration; see the sketch after this list)
  2. Implement progressive transcription display
  3. Add transcription quality metrics/logging
  4. Test with multiple simultaneous users
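
A hedged sketch of item 1: request a final transcript once no audio has arrived for a short silence window. send_final() comes from this document; send_audio(), the timeout value, and the wrapper class are illustrative assumptions, not code from the repo:

import asyncio
import time

SILENCE_TIMEOUT = 0.8  # seconds of silence before finalizing (tunable)

class SilenceFinalizer:
    def __init__(self, stt_client):
        self.stt_client = stt_client
        self.last_audio = time.monotonic()
        self._watcher = None

    def on_audio(self, pcm_chunk: bytes):
        # Call from the voice receive path, inside the bot's event loop.
        self.last_audio = time.monotonic()
        asyncio.create_task(self.stt_client.send_audio(pcm_chunk))
        if self._watcher is None or self._watcher.done():
            self._watcher = asyncio.create_task(self._finalize_after_silence())

    async def _finalize_after_silence(self):
        # Poll until the gap since the last packet exceeds the timeout.
        while time.monotonic() - self.last_audio < SILENCE_TIMEOUT:
            await asyncio.sleep(0.1)
        await self.stt_client.send_final()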

Quick Reference

| Component | Old (NeMo) | New (ONNX)     |
|-----------|------------|----------------|
| Port      | 8000       | 8766           |
| VRAM      | 4-5 GB     | 2-3 GB         |
| Speed     | 2-3 s      | 0.5-1 s        |
| cuDNN     | 8          | 9              |
| CUDA      | 12.1       | 12.6.2         |
| Protocol  | Auto VAD   | Manual control |

Status: ALL FIXES APPLIED - READY FOR USER TESTING

Last Updated: January 18, 2026 20:47 EET