# Server & Client Usage Guide

## ✅ Server is Working!

The WebSocket server is running on port **8766** with GPU acceleration.

## Quick Start

### 1. Start the Server

```bash
./run.sh server/ws_server.py
```

The server will start on: `ws://localhost:8766`

### 2. Test with the Simple Client

```bash
./run.sh test_client.py test.wav
```

### 3. Use the Microphone Client

```bash
# List audio devices first
./run.sh client/mic_stream.py --list-devices

# Start streaming from the microphone
./run.sh client/mic_stream.py

# Or specify a device
./run.sh client/mic_stream.py --device 0
```

## Available Clients

### 1. **test_client.py** - Simple File Testing

```bash
./run.sh test_client.py your_audio.wav
```

- Sends an audio file to the server
- Shows real-time transcription
- Good for testing

### 2. **client/mic_stream.py** - Live Microphone

```bash
./run.sh client/mic_stream.py
```

- Captures from the microphone
- Streams to the server
- Displays transcription in real time

### 3. **Custom Client** - Your Own Script

```python
import asyncio
import json

import websockets


async def connect():
    async with websockets.connect("ws://localhost:8766") as ws:
        # Send audio as int16 PCM bytes.
        # your_audio_data is a placeholder for your own NumPy array.
        audio_bytes = your_audio_data.astype("int16").tobytes()
        await ws.send(audio_bytes)

        # Receive transcription
        response = await ws.recv()
        result = json.loads(response)
        print(result["text"])


asyncio.run(connect())
```

## Server Options

```bash
# Custom host/port
./run.sh server/ws_server.py --host 0.0.0.0 --port 9000

# Enable VAD (for long audio)
./run.sh server/ws_server.py --use-vad

# Different model
./run.sh server/ws_server.py --model nemo-parakeet-tdt-0.6b-v3

# Change sample rate
./run.sh server/ws_server.py --sample-rate 16000
```

## Client Options

### Microphone Client

```bash
# List devices
./run.sh client/mic_stream.py --list-devices

# Use a specific device
./run.sh client/mic_stream.py --device 2

# Custom server URL
./run.sh client/mic_stream.py --url ws://192.168.1.100:8766

# Adjust chunk duration (lower = lower latency)
./run.sh client/mic_stream.py --chunk-duration 0.05
```

## Protocol

The server uses a simple JSON-based protocol.

### Server → Client Messages

```json
{
  "type": "info",
  "message": "Connected to ASR server",
  "sample_rate": 16000
}
```

```json
{
  "type": "transcript",
  "text": "transcribed text here",
  "is_final": false
}
```

```json
{
  "type": "error",
  "message": "error description"
}
```

### Client → Server Messages

**Send audio:**

- Binary data (int16 PCM, little-endian)
- Sample rate: 16000 Hz
- Mono channel

**Send commands:**

```json
{"type": "final"}  // Process remaining buffer
{"type": "reset"}  // Reset audio buffer
```
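Putting the two message directions together, a full exchange looks like the sketch below. It is illustrative rather than definitive: it assumes the server greets new connections with an `info` message and answers `{"type": "final"}` with a transcript marked `is_final: true`, and it streams one second of silence purely to exercise the protocol.

```python
import asyncio
import json

import numpy as np
import websockets


async def protocol_demo():
    async with websockets.connect("ws://localhost:8766") as ws:
        # Greeting: an "info" message with the expected sample rate.
        info = json.loads(await ws.recv())
        print("server:", info.get("message"))

        # Stream one second of silence as int16 PCM at 16 kHz.
        await ws.send(np.zeros(16000, dtype=np.int16).tobytes())

        # Ask the server to process whatever remains in its buffer.
        await ws.send(json.dumps({"type": "final"}))

        # Read transcript messages until the final one arrives.
        while True:
            msg = json.loads(await ws.recv())
            if msg["type"] == "transcript":
                print("transcript:", msg["text"])
                if msg.get("is_final"):
                    break
            elif msg["type"] == "error":
                print("error:", msg["message"])
                break


asyncio.run(protocol_demo())
```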
## Audio Format Requirements

- **Format**: int16 PCM (bytes)
- **Sample Rate**: 16000 Hz
- **Channels**: Mono (1)
- **Byte Order**: Little-endian

### Convert Audio in Python

```python
import numpy as np
import soundfile as sf

# Load audio
audio, sr = sf.read("file.wav", dtype="float32")

# Convert to mono
if audio.ndim > 1:
    audio = audio[:, 0]

# Resample if needed (install resampy)
if sr != 16000:
    import resampy
    audio = resampy.resample(audio, sr, 16000)

# Convert to int16 for sending
audio_int16 = (audio * 32767).astype(np.int16)
audio_bytes = audio_int16.tobytes()
```

## Examples

### Browser Client (JavaScript)

```javascript
const ws = new WebSocket('ws://localhost:8766');

ws.onopen = () => {
  console.log('Connected!');

  // Capture from the microphone
  navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
      const audioContext = new AudioContext({ sampleRate: 16000 });
      const source = audioContext.createMediaStreamSource(stream);
      // ScriptProcessorNode is deprecated but still widely supported.
      const processor = audioContext.createScriptProcessor(4096, 1, 1);

      processor.onaudioprocess = (e) => {
        const audioData = e.inputBuffer.getChannelData(0);

        // Convert float32 to int16
        const int16Data = new Int16Array(audioData.length);
        for (let i = 0; i < audioData.length; i++) {
          int16Data[i] = Math.max(-32768, Math.min(32767, audioData[i] * 32768));
        }

        ws.send(int16Data.buffer);
      };

      source.connect(processor);
      processor.connect(audioContext.destination);
    });
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'transcript') {
    console.log('Transcription:', data.text);
  }
};
```

### Python Script Client

```python
#!/usr/bin/env python3
import asyncio
import json

import numpy as np
import sounddevice as sd
import websockets


async def stream_microphone():
    uri = "ws://localhost:8766"
    async with websockets.connect(uri) as ws:
        print("Connected!")
        loop = asyncio.get_running_loop()

        def audio_callback(indata, frames, time, status):
            # This callback runs on the audio thread, so schedule the
            # send on the event loop instead of calling it directly.
            audio = (indata[:, 0] * 32767).astype(np.int16)
            asyncio.run_coroutine_threadsafe(ws.send(audio.tobytes()), loop)

        # Start recording; blocksize=1600 gives 0.1-second chunks at 16 kHz
        with sd.InputStream(callback=audio_callback,
                            channels=1,
                            samplerate=16000,
                            blocksize=1600):
            while True:
                response = await ws.recv()
                data = json.loads(response)
                if data.get("type") == "transcript":
                    print(f"→ {data['text']}")


asyncio.run(stream_microphone())
```

## Performance

With GPU (GTX 1660):

- **Latency**: <100 ms per chunk
- **Throughput**: ~50-100x realtime
- **GPU Memory**: ~1.3 GB
- **Languages**: 25+ (auto-detected)

## Troubleshooting

### Server won't start

```bash
# Check if the port is in use
lsof -i :8766

# Kill the existing server
pkill -f ws_server.py

# Restart
./run.sh server/ws_server.py
```

### Client can't connect

```bash
# Check the server is running
ps aux | grep ws_server

# Check the firewall
sudo ufw allow 8766
```

### No transcription output

- Check the audio format (must be int16 PCM, 16 kHz, mono); see the sanity-check sketch at the end of this guide
- Check the chunk size (very small chunks may never produce output)
- Check the server logs for errors

### GPU not working

- The server falls back to CPU automatically
- Check `nvidia-smi` for GPU status
- Verify the CUDA libraries are loaded (should be automatic with `./run.sh`)

## Next Steps

1. **Test the server**: `./run.sh test_client.py test.wav`
2. **Try the microphone**: `./run.sh client/mic_stream.py`
3. **Build your own client** using the examples above

Happy transcribing! 🎤
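As promised in the troubleshooting section, here is a quick way to check whether a WAV file already matches the required format before streaming it. This is a minimal sketch using `soundfile`; the filename `test.wav` is just an example, and the check only inspects the file on disk, not what your client actually sends over the socket.

```python
import soundfile as sf


def check_audio(path):
    """Report whether a WAV file matches the server's expected format."""
    info = sf.info(path)
    ok = (info.samplerate == 16000
          and info.channels == 1
          and info.subtype == "PCM_16")
    verdict = "OK" if ok else "needs conversion (see Audio Format Requirements)"
    print(f"{path}: {info.samplerate} Hz, {info.channels} ch, {info.subtype} -> {verdict}")


check_audio("test.wav")
```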