# Server & Client Usage Guide

## ✅ Server is Working!
The WebSocket server is running on port 8766 with GPU acceleration.
## Quick Start

### 1. Start the Server

```bash
./run.sh server/ws_server.py
```

The server will start on: `ws://localhost:8766`

### 2. Test with Simple Client

```bash
./run.sh test_client.py test.wav
```

### 3. Use Microphone Client

```bash
# List audio devices first
./run.sh client/mic_stream.py --list-devices

# Start streaming from microphone
./run.sh client/mic_stream.py

# Or specify device
./run.sh client/mic_stream.py --device 0
```
## Available Clients

### 1. `test_client.py` - Simple File Testing

```bash
./run.sh test_client.py your_audio.wav
```

- Sends audio file to server
- Shows real-time transcription
- Good for testing

### 2. `client/mic_stream.py` - Live Microphone

```bash
./run.sh client/mic_stream.py
```

- Captures from microphone
- Streams to server
- Real-time transcription display
### 3. Custom Client - Your Own Script

```python
import asyncio
import json

import websockets

async def connect():
    async with websockets.connect("ws://localhost:8766") as ws:
        # Send audio as int16 PCM bytes
        # (your_audio_data: a mono 16 kHz NumPy array with int16-range samples)
        audio_bytes = your_audio_data.astype('int16').tobytes()
        await ws.send(audio_bytes)

        # Receive transcription
        response = await ws.recv()
        result = json.loads(response)
        print(result['text'])

asyncio.run(connect())
```
## Server Options

```bash
# Custom host/port
./run.sh server/ws_server.py --host 0.0.0.0 --port 9000

# Enable VAD (for long audio)
./run.sh server/ws_server.py --use-vad

# Different model
./run.sh server/ws_server.py --model nemo-parakeet-tdt-0.6b-v3

# Change sample rate
./run.sh server/ws_server.py --sample-rate 16000
```
## Client Options

### Microphone Client

```bash
# List devices
./run.sh client/mic_stream.py --list-devices

# Use specific device
./run.sh client/mic_stream.py --device 2

# Custom server URL
./run.sh client/mic_stream.py --url ws://192.168.1.100:8766

# Adjust chunk duration (lower = lower latency)
./run.sh client/mic_stream.py --chunk-duration 0.05
```
## Protocol

The server uses a simple JSON-based protocol.

### Server → Client Messages

```json
{
  "type": "info",
  "message": "Connected to ASR server",
  "sample_rate": 16000
}
```

```json
{
  "type": "transcript",
  "text": "transcribed text here",
  "is_final": false
}
```

```json
{
  "type": "error",
  "message": "error description"
}
```
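As a minimal sketch, a client can dispatch on these message types like this (field names are taken from the examples above; optional fields are read defensively in case a server build omits them):

```python
import json

def handle_message(raw: str) -> None:
    # Dispatch one server message based on its "type" field
    msg = json.loads(raw)
    if msg["type"] == "info":
        print(f"server: {msg['message']} (sample rate: {msg.get('sample_rate')})")
    elif msg["type"] == "transcript":
        marker = "final" if msg.get("is_final") else "partial"
        print(f"[{marker}] {msg['text']}")
    elif msg["type"] == "error":
        print(f"error: {msg['message']}")
```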
### Client → Server Messages

Send audio:

- Binary data (int16 PCM, little-endian)
- Sample rate: 16000 Hz
- Mono channel

Send commands:

```
{"type": "final"}   // Process remaining buffer
{"type": "reset"}   // Reset audio buffer
```
## Audio Format Requirements

- **Format:** int16 PCM (bytes)
- **Sample Rate:** 16000 Hz
- **Channels:** Mono (1)
- **Byte Order:** Little-endian
### Convert Audio in Python

```python
import numpy as np
import soundfile as sf

# Load audio
audio, sr = sf.read("file.wav", dtype='float32')

# Convert to mono
if audio.ndim > 1:
    audio = audio[:, 0]

# Resample if needed (install resampy)
if sr != 16000:
    import resampy
    audio = resampy.resample(audio, sr, 16000)

# Convert to int16 for sending
audio_int16 = (audio * 32767).astype(np.int16)
audio_bytes = audio_int16.tobytes()
```
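If `resampy` isn't available, `scipy.signal.resample_poly(audio, 160, 441)` covers the common 44.1 kHz → 16 kHz case (44100 × 160/441 = 16000).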
## Examples

### Browser Client (JavaScript)

```javascript
const ws = new WebSocket('ws://localhost:8766');

ws.onopen = () => {
  console.log('Connected!');

  // Capture from microphone
  navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
      const audioContext = new AudioContext({ sampleRate: 16000 });
      const source = audioContext.createMediaStreamSource(stream);
      const processor = audioContext.createScriptProcessor(4096, 1, 1);

      processor.onaudioprocess = (e) => {
        const audioData = e.inputBuffer.getChannelData(0);

        // Convert float32 to int16
        const int16Data = new Int16Array(audioData.length);
        for (let i = 0; i < audioData.length; i++) {
          int16Data[i] = Math.max(-32768, Math.min(32767, audioData[i] * 32768));
        }

        ws.send(int16Data.buffer);
      };

      source.connect(processor);
      processor.connect(audioContext.destination);
    });
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'transcript') {
    console.log('Transcription:', data.text);
  }
};
```
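Note: `createScriptProcessor` is deprecated in modern browsers; it still works and keeps the example short, but `AudioWorklet` is the recommended long-term replacement for capturing raw PCM.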
### Python Script Client

```python
#!/usr/bin/env python3
import asyncio
import json

import numpy as np
import sounddevice as sd
import websockets

async def stream_microphone():
    uri = "ws://localhost:8766"
    loop = asyncio.get_running_loop()

    async with websockets.connect(uri) as ws:
        print("Connected!")

        def audio_callback(indata, frames, time, status):
            # sounddevice invokes this from its own thread, so hand the
            # send coroutine to the event loop thread-safely
            audio = (indata[:, 0] * 32767).astype(np.int16)
            asyncio.run_coroutine_threadsafe(ws.send(audio.tobytes()), loop)

        # Start recording
        with sd.InputStream(callback=audio_callback,
                            channels=1,
                            samplerate=16000,
                            blocksize=1600):  # 0.1-second chunks
            while True:
                response = await ws.recv()
                data = json.loads(response)
                if data.get('type') == 'transcript':
                    print(f"→ {data['text']}")

asyncio.run(stream_microphone())
```
## Performance

With GPU (GTX 1660):

- Latency: <100 ms per chunk
- Throughput: ~50-100x realtime
- GPU memory: ~1.3 GB
- Languages: 25+ (auto-detected)
## Troubleshooting

### Server won't start

```bash
# Check if port is in use
lsof -i:8766

# Kill existing server
pkill -f ws_server.py

# Restart
./run.sh server/ws_server.py
```

### Client can't connect

```bash
# Check server is running
ps aux | grep ws_server

# Check firewall
sudo ufw allow 8766
```
### No transcription output

- Check audio format (must be int16 PCM, 16 kHz, mono) - a quick check is sketched below
- Check chunk size (not too small)
- Check server logs for errors
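A quick sanity check on the file's format (a sketch assuming `soundfile` is installed and the file is named `test.wav`):

```python
import soundfile as sf

# Inspect the file's format without loading the samples
info = sf.info("test.wav")
print(info.samplerate, info.channels, info.subtype)
# Expect: 16000 1 PCM_16 - anything else needs conversion
# (see "Convert Audio in Python" above)
```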
### GPU not working

- Server will fall back to CPU automatically
- Check `nvidia-smi` for GPU status
- Verify CUDA libraries are loaded (should be automatic with `./run.sh`)
## Next Steps

- Test the server: `./run.sh test_client.py test.wav`
- Try microphone: `./run.sh client/mic_stream.py`
- Build your own client using the examples above
Happy transcribing! 🎤