# Refactoring Summary

## Overview

Successfully refactored the Parakeet ASR codebase to use the `onnx-asr` library with ONNX Runtime GPU support for NVIDIA GTX 1660.

## Changes Made

### 1. Dependencies (`requirements.txt`)

- **Removed**: `onnxruntime-gpu`, `silero-vad`
- **Added**: `onnx-asr[gpu,hub]`, `soundfile`
- **Kept**: `numpy<2.0`, `websockets`, `sounddevice`

### 2. ASR Pipeline (`asr/asr_pipeline.py`)

- Completely refactored to use `onnx_asr.load_model()`
- Added support for:
  - GPU acceleration via CUDA/TensorRT
  - Model quantization (int8, fp16)
  - Voice Activity Detection (VAD)
  - Batch processing
  - Streaming audio chunks
- Configurable execution providers for GPU optimization
- Automatic model download from Hugging Face

### 3. VAD Module (`vad/silero_vad.py`)

- Refactored to use `onnx_asr.load_vad()`
- Integrated Silero VAD via onnx-asr
- Simplified API for VAD operations
- Note: VAD is best used via `model.with_vad()` method

### 4. WebSocket Server (`server/ws_server.py`)

- Created from scratch for streaming ASR
- Features:
  - Real-time audio streaming
  - JSON-based protocol
  - Support for multiple concurrent connections
  - Buffer management for audio chunks
  - Error handling and logging

### 5. Microphone Client (`client/mic_stream.py`)

- Created streaming client using `sounddevice`
- Features:
  - Real-time microphone capture
  - WebSocket streaming to server
  - Audio device selection
  - Automatic format conversion (float32 to int16)
  - Async communication

### 6. Test Script (`tools/test_offline.py`)

- Completely rewritten for onnx-asr
- Features:
  - Command-line interface
  - Support for WAV files
  - Optional VAD and quantization
  - Audio statistics and diagnostics

### 7. Diagnostics Tool (`tools/diagnose.py`)

- New comprehensive system check tool
- Checks:
  - Python version
  - Installed packages
  - CUDA availability
  - ONNX Runtime providers
  - Audio devices
  - Model files

### 8. Setup Script (`setup_env.sh`)

- Automated setup script
- Features:
  - Virtual environment creation
  - Dependency installation
  - CUDA/GPU detection
  - System diagnostics
  - Optional model download

### 9. Documentation

- **README.md**: Comprehensive documentation with:
  - Installation instructions
  - Usage examples
  - Configuration options
  - Troubleshooting guide
  - Performance tips
- **QUICKSTART.md**: Quick start guide with:
  - 5-minute setup
  - Common commands
  - Troubleshooting
  - Performance optimization
- **example.py**: Simple usage example

## Key Benefits

### 1. GPU Optimization

- Native CUDA support via ONNX Runtime
- Configurable GPU memory limits
- Optional TensorRT for even faster inference
- Automatic fallback to CPU if GPU unavailable

### 2. Simplified Model Management

- Automatic model download from Hugging Face
- No manual ONNX export needed
- Pre-converted models ready to use
- Support for quantized versions

### 3. Better Performance

- Optimized ONNX inference
- GPU acceleration on GTX 1660
- ~50-100x realtime on GPU
- Reduced memory usage with quantization

### 4. Improved Usability

- Simpler API
- Better error handling
- Comprehensive logging
- Easy configuration

### 5. Modern Features

- WebSocket streaming
- Real-time transcription
- VAD integration
- Batch processing

## Model Information

- **Model**: Parakeet TDT 0.6B V3 (Multilingual)
- **Source**: https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx
- **Size**: ~600MB
- **Languages**: 25+ languages
- **Location**: `models/parakeet/` (auto-downloaded)

## File Structure

```
parakeet-test/
├── asr/
│   ├── __init__.py          ✓ Updated
│   └── asr_pipeline.py      ✓ Refactored
├── client/
│   ├── __init__.py          ✓ Updated
│   └── mic_stream.py        ✓ New
├── server/
│   ├── __init__.py          ✓ Updated
│   └── ws_server.py         ✓ New
├── vad/
│   ├── __init__.py          ✓ Updated
│   └── silero_vad.py        ✓ Refactored
├── tools/
│   ├── diagnose.py          ✓ New
│   └── test_offline.py      ✓ Refactored
├── models/
│   └── parakeet/            ✓ Auto-created
├── requirements.txt         ✓ Updated
├── setup_env.sh             ✓ New
├── README.md                ✓ New
├── QUICKSTART.md            ✓ New
├── example.py               ✓ New
├── .gitignore               ✓ New
└── REFACTORING.md           ✓ This file
```

## Migration from Old Code

### Old Code Pattern:

```python
# Manual ONNX session creation
import onnxruntime as ort
session = ort.InferenceSession("encoder.onnx", providers=["CUDAExecutionProvider"])
# Manual preprocessing and decoding
```

### New Code Pattern:

```python
# Simple onnx-asr interface
import onnx_asr
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3")
text = model.recognize("audio.wav")
```

## Testing Instructions

### 1. Setup

```bash
./setup_env.sh
source venv/bin/activate
```

### 2. Run Diagnostics

```bash
python3 tools/diagnose.py
```

### 3. Test Offline

```bash
python3 tools/test_offline.py test.wav
```

### 4. Test Streaming

```bash
# Terminal 1
python3 server/ws_server.py

# Terminal 2
python3 client/mic_stream.py
```

## Known Limitations

1. **Audio Format**: Only WAV files with PCM encoding supported directly
2. **Segment Length**: Models work best with <30 second segments
3. **GPU Memory**: Requires at least 2-3GB GPU memory
4. **Sample Rate**: 16kHz recommended for best results

## Future Enhancements

Possible improvements:

- [ ] Add support for other audio formats (MP3, FLAC, etc.)
- [ ] Implement beam search decoding
- [ ] Add language selection option
- [ ] Support for speaker diarization
- [ ] REST API in addition to WebSocket
- [ ] Docker containerization
- [ ] Batch file processing script
- [ ] Real-time visualization of transcription

## References

- [onnx-asr GitHub](https://github.com/istupakov/onnx-asr)
- [onnx-asr Documentation](https://istupakov.github.io/onnx-asr/)
- [Parakeet ONNX Model](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
- [Original Parakeet Model](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
- [ONNX Runtime](https://onnxruntime.ai/)

## Support

For issues related to:

- **onnx-asr library**: https://github.com/istupakov/onnx-asr/issues
- **This implementation**: Check logs and run diagnose.py
- **GPU/CUDA issues**: Verify nvidia-smi and CUDA installation

---

**Refactoring completed on**: January 18, 2026

**Primary changes**: Migration to onnx-asr library for simplified ONNX inference with GPU support
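
The configurable execution providers and the automatic CPU fallback described in this summary could be approached roughly as follows. This is a minimal sketch, not the code in `asr/asr_pipeline.py`: the helper `choose_providers` and its `use_tensorrt` flag are hypothetical names introduced for illustration. Only the three provider strings are standard ONNX Runtime identifiers.

```python
from typing import Iterable, List

# Standard ONNX Runtime provider names, ordered from most to least preferred.
PREFERRED_ORDER = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def choose_providers(available: Iterable[str], use_tensorrt: bool = False) -> List[str]:
    """Hypothetical helper: keep the preferred providers that are available.

    TensorRT is only considered when explicitly requested, and the CPU
    provider is always kept as the last entry so inference still runs
    when no GPU provider is present.
    """
    available_set = set(available)
    chosen = [
        name
        for name in PREFERRED_ORDER
        if name in available_set
        and (use_tensorrt or name != "TensorrtExecutionProvider")
    ]
    # Guarantee a CPU fallback even if the caller's list omitted it.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    # Example input: a machine with CUDA installed but without TensorRT.
    print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

In practice the `available` argument would typically come from `onnxruntime.get_available_providers()`, and the resulting list would be handed to whatever call loads the model. How the real pipeline wires this up is not stated here, so treat that wiring as an assumption.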