miku-discord/stt-parakeet/REFACTORING.md

# Refactoring Summary

## Overview

Successfully refactored the Parakeet ASR codebase to use the `onnx-asr` library with ONNX Runtime GPU support for NVIDIA GTX 1660.

## Changes Made

### 1. Dependencies (`requirements.txt`)
- **Removed**: `onnxruntime-gpu`, `silero-vad`
- **Added**: `onnx-asr[gpu,hub]`, `soundfile`
- **Kept**: `numpy<2.0`, `websockets`, `sounddevice`

### 2. ASR Pipeline (`asr/asr_pipeline.py`)
- Completely refactored to use `onnx_asr.load_model()`
- Added support for:
  - GPU acceleration via CUDA/TensorRT
  - Model quantization (int8, fp16)
  - Voice Activity Detection (VAD)
  - Batch processing
  - Streaming audio chunks
- Configurable execution providers for GPU optimization
- Automatic model download from Hugging Face

### 3. VAD Module (`vad/silero_vad.py`)
- Refactored to use `onnx_asr.load_vad()`
- Integrated Silero VAD via onnx-asr
- Simplified API for VAD operations
- Note: VAD is best used via `model.with_vad()` method

### 4. WebSocket Server (`server/ws_server.py`)
- Created from scratch for streaming ASR
- Features:
  - Real-time audio streaming
  - JSON-based protocol
  - Support for multiple concurrent connections
  - Buffer management for audio chunks
  - Error handling and logging

### 5. Microphone Client (`client/mic_stream.py`)
- Created streaming client using `sounddevice`
- Features:
  - Real-time microphone capture
  - WebSocket streaming to server
  - Audio device selection
  - Automatic format conversion (float32 to int16)
  - Async communication

### 6. Test Script (`tools/test_offline.py`)
- Completely rewritten for onnx-asr
- Features:
  - Command-line interface
  - Support for WAV files
  - Optional VAD and quantization
  - Audio statistics and diagnostics

### 7. Diagnostics Tool (`tools/diagnose.py`)
- New comprehensive system check tool
- Checks:
  - Python version
  - Installed packages
  - CUDA availability
  - ONNX Runtime providers
  - Audio devices
  - Model files

### 8. Setup Script (`setup_env.sh`)
- Automated setup script
- Features:
  - Virtual environment creation
  - Dependency installation
  - CUDA/GPU detection
  - System diagnostics
  - Optional model download

### 9. Documentation
- **README.md**: Comprehensive documentation with:
  - Installation instructions
  - Usage examples
  - Configuration options
  - Troubleshooting guide
  - Performance tips

- **QUICKSTART.md**: Quick start guide with:
  - 5-minute setup
  - Common commands
  - Troubleshooting
  - Performance optimization

- **example.py**: Simple usage example

## Key Benefits

### 1. GPU Optimization
- Native CUDA support via ONNX Runtime
- Configurable GPU memory limits
- Optional TensorRT for even faster inference
- Automatic fallback to CPU if GPU unavailable

### 2. Simplified Model Management
- Automatic model download from Hugging Face
- No manual ONNX export needed
- Pre-converted models ready to use
- Support for quantized versions

### 3. Better Performance
- Optimized ONNX inference
- GPU acceleration on GTX 1660
- ~50-100x realtime on GPU
- Reduced memory usage with quantization

### 4. Improved Usability
- Simpler API
- Better error handling
- Comprehensive logging
- Easy configuration

### 5. Modern Features
- WebSocket streaming
- Real-time transcription
- VAD integration
- Batch processing

## Model Information

- **Model**: Parakeet TDT 0.6B V3 (Multilingual)
- **Source**: https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx
- **Size**: ~600MB
- **Languages**: 25+ languages
- **Location**: `models/parakeet/` (auto-downloaded)

## File Structure

```
parakeet-test/
├── asr/
│   ├── __init__.py              ✓ Updated
│   └── asr_pipeline.py          ✓ Refactored
├── client/
│   ├── __init__.py              ✓ Updated
│   └── mic_stream.py            ✓ New
├── server/
│   ├── __init__.py              ✓ Updated
│   └── ws_server.py             ✓ New
├── vad/
│   ├── __init__.py              ✓ Updated
│   └── silero_vad.py            ✓ Refactored
├── tools/
│   ├── diagnose.py              ✓ New
│   └── test_offline.py          ✓ Refactored
├── models/
│   └── parakeet/                ✓ Auto-created
├── requirements.txt             ✓ Updated
├── setup_env.sh                 ✓ New
├── README.md                    ✓ New
├── QUICKSTART.md                ✓ New
├── example.py                   ✓ New
├── .gitignore                   ✓ New
└── REFACTORING.md               ✓ This file
```

## Migration from Old Code

### Old Code Pattern:
```python
# Manual ONNX session creation
import onnxruntime as ort
session = ort.InferenceSession("encoder.onnx", providers=["CUDAExecutionProvider"])
# Manual preprocessing and decoding
```

### New Code Pattern:
```python
# Simple onnx-asr interface
import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3")
text = model.recognize("audio.wav")
```

## Testing Instructions

### 1. Setup
```bash
./setup_env.sh
source venv/bin/activate
```

### 2. Run Diagnostics
```bash
python3 tools/diagnose.py
```

### 3. Test Offline
```bash
python3 tools/test_offline.py test.wav
```

### 4. Test Streaming
```bash
# Terminal 1
python3 server/ws_server.py

# Terminal 2
python3 client/mic_stream.py
```

## Known Limitations

1. **Audio Format**: Only WAV files with PCM encoding supported directly
2. **Segment Length**: Models work best with <30 second segments
3. **GPU Memory**: Requires at least 2-3GB GPU memory
4. **Sample Rate**: 16kHz recommended for best results

## Future Enhancements

Possible improvements:
- [ ] Add support for other audio formats (MP3, FLAC, etc.)
- [ ] Implement beam search decoding
- [ ] Add language selection option
- [ ] Support for speaker diarization
- [ ] REST API in addition to WebSocket
- [ ] Docker containerization
- [ ] Batch file processing script
- [ ] Real-time visualization of transcription

## References

- [onnx-asr GitHub](https://github.com/istupakov/onnx-asr)
- [onnx-asr Documentation](https://istupakov.github.io/onnx-asr/)
- [Parakeet ONNX Model](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
- [Original Parakeet Model](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
- [ONNX Runtime](https://onnxruntime.ai/)

## Support

For issues related to:
- **onnx-asr library**: https://github.com/istupakov/onnx-asr/issues
- **This implementation**: Check logs and run diagnose.py
- **GPU/CUDA issues**: Verify nvidia-smi and CUDA installation

---

**Refactoring completed on**: January 18, 2026
**Primary changes**: Migration to onnx-asr library for simplified ONNX inference with GPU support