miku-discord/stt-parakeet/CLIENT_GUIDE.md


Server & Client Usage Guide

Server is Working!

The WebSocket server is running on port 8766 with GPU acceleration.

Quick Start

1. Start the Server

./run.sh server/ws_server.py

Server will start on: ws://localhost:8766

2. Test with Simple Client

./run.sh test_client.py test.wav

3. Use Microphone Client

# List audio devices first
./run.sh client/mic_stream.py --list-devices

# Start streaming from microphone
./run.sh client/mic_stream.py

# Or specify device
./run.sh client/mic_stream.py --device 0

Available Clients

1. test_client.py - Simple File Testing

./run.sh test_client.py your_audio.wav
  • Sends audio file to server
  • Shows real-time transcription
  • Good for testing

2. client/mic_stream.py - Live Microphone

./run.sh client/mic_stream.py
  • Captures from microphone
  • Streams to server
  • Real-time transcription display

3. Custom Client - Your Own Script

import asyncio
import websockets
import json

async def connect():
    async with websockets.connect("ws://localhost:8766") as ws:
        # Send audio as int16 PCM bytes.
        # your_audio_data is a placeholder for float samples in [-1, 1];
        # scale before casting, since .astype('int16') alone truncates
        # floats to 0 or ±1 and produces silence.
        audio_bytes = (your_audio_data * 32767).astype('int16').tobytes()
        await ws.send(audio_bytes)
        
        # Receive transcription
        response = await ws.recv()
        result = json.loads(response)
        print(result['text'])

asyncio.run(connect())

Server Options

# Custom host/port
./run.sh server/ws_server.py --host 0.0.0.0 --port 9000

# Enable VAD (for long audio)
./run.sh server/ws_server.py --use-vad

# Different model
./run.sh server/ws_server.py --model nemo-parakeet-tdt-0.6b-v3

# Change sample rate
./run.sh server/ws_server.py --sample-rate 16000

Client Options

Microphone Client

# List devices
./run.sh client/mic_stream.py --list-devices

# Use specific device
./run.sh client/mic_stream.py --device 2

# Custom server URL
./run.sh client/mic_stream.py --url ws://192.168.1.100:8766

# Adjust chunk duration in seconds (smaller chunks = lower latency, more messages)
./run.sh client/mic_stream.py --chunk-duration 0.05
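Because the wire format is fixed at 16 kHz int16 mono (see "Audio Format Requirements" below), the chunk duration maps directly to the size of each WebSocket message. A quick back-of-the-envelope helper (`chunk_bytes` is illustrative, not part of the client):

```python
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2  # int16

def chunk_bytes(chunk_duration: float) -> int:
    """Size in bytes of one audio chunk sent to the server."""
    return int(chunk_duration * SAMPLE_RATE) * BYTES_PER_SAMPLE

print(chunk_bytes(0.1))   # default 0.1 s chunk -> 3200 bytes
print(chunk_bytes(0.05))  # 0.05 s chunk -> 1600 bytes
```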

Protocol

The server uses a simple protocol: JSON text messages for control and results, raw binary frames for audio:

Server → Client Messages

{
  "type": "info",
  "message": "Connected to ASR server",
  "sample_rate": 16000
}
{
  "type": "transcript",
  "text": "transcribed text here",
  "is_final": false
}
{
  "type": "error",
  "message": "error description"
}
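These three message types can be handled with a small dispatcher. A sketch (the `handle_message` helper and its output format are illustrative, not part of the server API):

```python
import json

def handle_message(raw: str) -> str:
    """Parse one server message and return a printable line."""
    data = json.loads(raw)
    msg_type = data.get("type")
    if msg_type == "info":
        return f"[info] {data['message']} (sample_rate={data['sample_rate']})"
    if msg_type == "transcript":
        marker = "final" if data.get("is_final") else "partial"
        return f"[{marker}] {data['text']}"
    if msg_type == "error":
        return f"[error] {data['message']}"
    return f"[unknown] {raw}"

print(handle_message('{"type": "transcript", "text": "hello", "is_final": false}'))
# -> [partial] hello
```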

Client → Server Messages

Send audio:

  • Binary data (int16 PCM, little-endian)
  • Sample rate: 16000 Hz
  • Mono channel

Send commands:

{"type": "final"}   // Process remaining buffer
{"type": "reset"}   // Reset audio buffer
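A typical client sends `{"type": "final"}` when it stops recording and keeps partial transcripts separate from final ones. One way to fold a stream of parsed transcript messages into the final text (a sketch, not tied to any particular client in this repo):

```python
def collect_final_text(messages: list[dict]) -> str:
    """Join the text of final transcript messages, ignoring partials."""
    parts = [m["text"] for m in messages
             if m.get("type") == "transcript" and m.get("is_final")]
    return " ".join(parts)

msgs = [
    {"type": "transcript", "text": "hello wor", "is_final": False},
    {"type": "transcript", "text": "hello world", "is_final": True},
]
print(collect_final_text(msgs))  # -> hello world
```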

Audio Format Requirements

  • Format: int16 PCM (bytes)
  • Sample Rate: 16000 Hz
  • Channels: Mono (1)
  • Byte Order: Little-endian
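In NumPy terms this format is the `'<i2'` dtype (little-endian signed 16-bit). A quick check of what one second of audio looks like on the wire:

```python
import numpy as np

SAMPLE_RATE = 16000

# One second of silence in the required wire format.
samples = np.zeros(SAMPLE_RATE, dtype="<i2")  # little-endian int16, mono
payload = samples.tobytes()

print(len(payload))  # 16000 samples * 2 bytes = 32000 bytes per second
```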

Convert Audio in Python

import numpy as np
import soundfile as sf

# Load audio
audio, sr = sf.read("file.wav", dtype='float32')

# Downmix to mono by averaging channels
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Resample if needed (install resampy)
if sr != 16000:
    import resampy
    audio = resampy.resample(audio, sr, 16000)

# Convert to int16 for sending (clip first to avoid wraparound on overloud samples)
audio_int16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
audio_bytes = audio_int16.tobytes()
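A quick sanity check of the float32 → int16 conversion above, round-tripping a synthetic tone (standalone, no audio file needed):

```python
import numpy as np

# Synthetic 440 Hz tone, one second at 16 kHz, float32 in [-1, 1].
t = np.arange(16000, dtype=np.float32) / 16000.0
audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)

audio_int16 = (audio * 32767).astype(np.int16)
recovered = audio_int16.astype(np.float32) / 32767.0

# Truncation plus float32 rounding stays within a couple of
# quantization steps of the int16 grid.
max_err = float(np.abs(audio - recovered).max())
print(max_err < 2.0 / 32767.0)  # -> True
```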

Examples

Browser Client (JavaScript)

Note: ScriptProcessorNode is deprecated in favor of AudioWorklet but keeps this example self-contained. Browsers may also not honor the requested sampleRate of 16000, so check audioContext.sampleRate and resample before sending if it differs.

const ws = new WebSocket('ws://localhost:8766');

ws.onopen = () => {
    console.log('Connected!');
    
    // Capture from microphone
    navigator.mediaDevices.getUserMedia({ audio: true })
        .then(stream => {
            const audioContext = new AudioContext({ sampleRate: 16000 });
            const source = audioContext.createMediaStreamSource(stream);
            const processor = audioContext.createScriptProcessor(4096, 1, 1);
            
            processor.onaudioprocess = (e) => {
                const audioData = e.inputBuffer.getChannelData(0);
                // Convert float32 to int16
                const int16Data = new Int16Array(audioData.length);
                for (let i = 0; i < audioData.length; i++) {
                    int16Data[i] = Math.max(-32768, Math.min(32767, audioData[i] * 32768));
                }
                ws.send(int16Data.buffer);
            };
            
            source.connect(processor);
            processor.connect(audioContext.destination);
        });
};

ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'transcript') {
        console.log('Transcription:', data.text);
    }
};

Python Script Client

#!/usr/bin/env python3
import asyncio
import websockets
import sounddevice as sd
import numpy as np
import json

async def stream_microphone():
    uri = "ws://localhost:8766"
    loop = asyncio.get_running_loop()
    
    async with websockets.connect(uri) as ws:
        print("Connected!")
        
        def audio_callback(indata, frames, time, status):
            # sounddevice runs this callback on PortAudio's audio thread,
            # where no event loop is running, so asyncio.create_task() would
            # fail; hand the send coroutine to the main loop thread-safely
            audio = (indata[:, 0] * 32767).astype(np.int16)
            asyncio.run_coroutine_threadsafe(ws.send(audio.tobytes()), loop)
        
        # Start recording
        with sd.InputStream(callback=audio_callback,
                           channels=1,
                           samplerate=16000,
                           blocksize=1600):  # 0.1 second chunks
            
            while True:
                response = await ws.recv()
                data = json.loads(response)
                if data.get('type') == 'transcript':
                    print(f"→ {data['text']}")

asyncio.run(stream_microphone())

Performance

With GPU (GTX 1660):

  • Latency: <100ms per chunk
  • Throughput: ~50-100x realtime
  • GPU Memory: ~1.3GB
  • Languages: 25+ (auto-detected)
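"50-100x realtime" means the audio's duration divided by the time the model takes to transcribe it. What those figures imply in practice (the numbers are the estimates above, not fresh measurements):

```python
def processing_time(audio_seconds: float, realtime_factor: float) -> float:
    """Seconds the server needs to transcribe audio at a given realtime factor."""
    return audio_seconds / realtime_factor

print(processing_time(60.0, 50.0))   # 60 s of audio at 50x  -> 1.2 s
print(processing_time(60.0, 100.0))  # 60 s of audio at 100x -> 0.6 s
```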

Troubleshooting

Server won't start

# Check if port is in use
lsof -i:8766

# Kill existing server
pkill -f ws_server.py

# Restart
./run.sh server/ws_server.py

Client can't connect

# Check server is running
ps aux | grep ws_server

# Check firewall
sudo ufw allow 8766

No transcription output

  • Check audio format (must be int16 PCM, 16kHz, mono)
  • Check chunk size (not too small)
  • Check server logs for errors

GPU not working

  • Server will fall back to CPU automatically
  • Check nvidia-smi for GPU status
  • Verify CUDA libraries are loaded (should be automatic with ./run.sh)

Next Steps

  1. Test the server: ./run.sh test_client.py test.wav
  2. Try microphone: ./run.sh client/mic_stream.py
  3. Build your own client using the examples above

Happy transcribing! 🎤