miku-discord/stt-parakeet/CLIENT_GUIDE.md


Server & Client Usage Guide

Server is Working!

The WebSocket server is running on port 8766 with GPU acceleration.

Quick Start

1. Start the Server

./run.sh server/ws_server.py

Server will start on: ws://localhost:8766

2. Test with Simple Client

./run.sh test_client.py test.wav

3. Use Microphone Client

# List audio devices first
./run.sh client/mic_stream.py --list-devices

# Start streaming from microphone
./run.sh client/mic_stream.py

# Or specify device
./run.sh client/mic_stream.py --device 0

Available Clients

1. test_client.py - Simple File Testing

./run.sh test_client.py your_audio.wav
  • Sends audio file to server
  • Shows real-time transcription
  • Good for testing

2. client/mic_stream.py - Live Microphone

./run.sh client/mic_stream.py
  • Captures from microphone
  • Streams to server
  • Real-time transcription display

3. Custom Client - Your Own Script

import asyncio
import websockets
import json

async def connect():
    async with websockets.connect("ws://localhost:8766") as ws:
        # Send audio as int16 PCM bytes.
        # your_audio_data is a placeholder for float samples in [-1, 1];
        # scale before casting, since .astype('int16') alone truncates
        # floats to 0 or ±1 and produces silence.
        audio_bytes = (your_audio_data * 32767).astype('int16').tobytes()
        await ws.send(audio_bytes)
        
        # Receive transcription
        response = await ws.recv()
        result = json.loads(response)
        print(result['text'])

asyncio.run(connect())

Server Options

# Custom host/port
./run.sh server/ws_server.py --host 0.0.0.0 --port 9000

# Enable VAD (for long audio)
./run.sh server/ws_server.py --use-vad

# Different model
./run.sh server/ws_server.py --model nemo-parakeet-tdt-0.6b-v3

# Change sample rate
./run.sh server/ws_server.py --sample-rate 16000

Client Options

Microphone Client

# List devices
./run.sh client/mic_stream.py --list-devices

# Use specific device
./run.sh client/mic_stream.py --device 2

# Custom server URL
./run.sh client/mic_stream.py --url ws://192.168.1.100:8766

# Adjust chunk duration in seconds (smaller chunks = lower latency, more messages)
./run.sh client/mic_stream.py --chunk-duration 0.05
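Because the wire format is fixed at 16 kHz int16 mono (see "Audio Format Requirements" below), the chunk duration maps directly to the size of each WebSocket message. A quick back-of-the-envelope helper (`chunk_bytes` is illustrative, not part of the client):

```python
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2  # int16

def chunk_bytes(chunk_duration: float) -> int:
    """Size in bytes of one audio chunk sent to the server."""
    return int(chunk_duration * SAMPLE_RATE) * BYTES_PER_SAMPLE

print(chunk_bytes(0.1))   # default 0.1 s chunk -> 3200 bytes
print(chunk_bytes(0.05))  # 0.05 s chunk -> 1600 bytes
```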

Protocol

The server uses a simple protocol: JSON text messages for control and results, raw binary frames for audio:

Server → Client Messages

{
  "type": "info",
  "message": "Connected to ASR server",
  "sample_rate": 16000
}
{
  "type": "transcript",
  "text": "transcribed text here",
  "is_final": false
}
{
  "type": "error",
  "message": "error description"
}
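These three message types can be handled with a small dispatcher. A sketch (the `handle_message` helper and its output format are illustrative, not part of the server API):

```python
import json

def handle_message(raw: str) -> str:
    """Parse one server message and return a printable line."""
    data = json.loads(raw)
    msg_type = data.get("type")
    if msg_type == "info":
        return f"[info] {data['message']} (sample_rate={data['sample_rate']})"
    if msg_type == "transcript":
        marker = "final" if data.get("is_final") else "partial"
        return f"[{marker}] {data['text']}"
    if msg_type == "error":
        return f"[error] {data['message']}"
    return f"[unknown] {raw}"

print(handle_message('{"type": "transcript", "text": "hello", "is_final": false}'))
# -> [partial] hello
```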

Client → Server Messages

Send audio:

  • Binary data (int16 PCM, little-endian)
  • Sample rate: 16000 Hz
  • Mono channel

Send commands:

{"type": "final"}   // Process remaining buffer
{"type": "reset"}   // Reset audio buffer
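A typical client sends `{"type": "final"}` when it stops recording and keeps partial transcripts separate from final ones. One way to fold a stream of parsed transcript messages into the final text (a sketch, not tied to any particular client in this repo):

```python
def collect_final_text(messages: list[dict]) -> str:
    """Join the text of final transcript messages, ignoring partials."""
    parts = [m["text"] for m in messages
             if m.get("type") == "transcript" and m.get("is_final")]
    return " ".join(parts)

msgs = [
    {"type": "transcript", "text": "hello wor", "is_final": False},
    {"type": "transcript", "text": "hello world", "is_final": True},
]
print(collect_final_text(msgs))  # -> hello world
```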

Audio Format Requirements

  • Format: int16 PCM (bytes)
  • Sample Rate: 16000 Hz
  • Channels: Mono (1)
  • Byte Order: Little-endian
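In NumPy terms this format is the `'<i2'` dtype (little-endian signed 16-bit). A quick check of what one second of audio looks like on the wire:

```python
import numpy as np

SAMPLE_RATE = 16000

# One second of silence in the required wire format.
samples = np.zeros(SAMPLE_RATE, dtype="<i2")  # little-endian int16, mono
payload = samples.tobytes()

print(len(payload))  # 16000 samples * 2 bytes = 32000 bytes per second
```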

Convert Audio in Python

import numpy as np
import soundfile as sf

# Load audio
audio, sr = sf.read("file.wav", dtype='float32')

# Downmix to mono by averaging channels
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Resample if needed (install resampy)
if sr != 16000:
    import resampy
    audio = resampy.resample(audio, sr, 16000)

# Convert to int16 for sending (clip first to avoid wraparound on overloud samples)
audio_int16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
audio_bytes = audio_int16.tobytes()
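A quick sanity check of the float32 → int16 conversion above, round-tripping a synthetic tone (standalone, no audio file needed):

```python
import numpy as np

# Synthetic 440 Hz tone, one second at 16 kHz, float32 in [-1, 1].
t = np.arange(16000, dtype=np.float32) / 16000.0
audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)

audio_int16 = (audio * 32767).astype(np.int16)
recovered = audio_int16.astype(np.float32) / 32767.0

# Truncation plus float32 rounding stays within a couple of
# quantization steps of the int16 grid.
max_err = float(np.abs(audio - recovered).max())
print(max_err < 2.0 / 32767.0)  # -> True
```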

Examples

Browser Client (JavaScript)

Note: ScriptProcessorNode is deprecated in favor of AudioWorklet but keeps this example self-contained. Browsers may also not honor the requested sampleRate of 16000, so check audioContext.sampleRate and resample before sending if it differs.

const ws = new WebSocket('ws://localhost:8766');

ws.onopen = () => {
    console.log('Connected!');
    
    // Capture from microphone
    navigator.mediaDevices.getUserMedia({ audio: true })
        .then(stream => {
            const audioContext = new AudioContext({ sampleRate: 16000 });
            const source = audioContext.createMediaStreamSource(stream);
            const processor = audioContext.createScriptProcessor(4096, 1, 1);
            
            processor.onaudioprocess = (e) => {
                const audioData = e.inputBuffer.getChannelData(0);
                // Convert float32 to int16
                const int16Data = new Int16Array(audioData.length);
                for (let i = 0; i < audioData.length; i++) {
                    int16Data[i] = Math.max(-32768, Math.min(32767, audioData[i] * 32768));
                }
                ws.send(int16Data.buffer);
            };
            
            source.connect(processor);
            processor.connect(audioContext.destination);
        });
};

ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'transcript') {
        console.log('Transcription:', data.text);
    }
};

Python Script Client

#!/usr/bin/env python3
import asyncio
import websockets
import sounddevice as sd
import numpy as np
import json

async def stream_microphone():
    uri = "ws://localhost:8766"
    loop = asyncio.get_running_loop()
    
    async with websockets.connect(uri) as ws:
        print("Connected!")
        
        def audio_callback(indata, frames, time, status):
            # sounddevice runs this callback on PortAudio's audio thread,
            # where no event loop is running, so asyncio.create_task() would
            # fail; hand the send coroutine to the main loop thread-safely
            audio = (indata[:, 0] * 32767).astype(np.int16)
            asyncio.run_coroutine_threadsafe(ws.send(audio.tobytes()), loop)
        
        # Start recording
        with sd.InputStream(callback=audio_callback,
                           channels=1,
                           samplerate=16000,
                           blocksize=1600):  # 0.1 second chunks
            
            while True:
                response = await ws.recv()
                data = json.loads(response)
                if data.get('type') == 'transcript':
                    print(f"→ {data['text']}")

asyncio.run(stream_microphone())

Performance

With GPU (GTX 1660):

  • Latency: <100ms per chunk
  • Throughput: ~50-100x realtime
  • GPU Memory: ~1.3GB
  • Languages: 25+ (auto-detected)
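"50-100x realtime" means the audio's duration divided by the time the model takes to transcribe it. What those figures imply in practice (the numbers are the estimates above, not fresh measurements):

```python
def processing_time(audio_seconds: float, realtime_factor: float) -> float:
    """Seconds the server needs to transcribe audio at a given realtime factor."""
    return audio_seconds / realtime_factor

print(processing_time(60.0, 50.0))   # 60 s of audio at 50x  -> 1.2 s
print(processing_time(60.0, 100.0))  # 60 s of audio at 100x -> 0.6 s
```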

Troubleshooting

Server won't start

# Check if port is in use
lsof -i:8766

# Kill existing server
pkill -f ws_server.py

# Restart
./run.sh server/ws_server.py

Client can't connect

# Check server is running
ps aux | grep ws_server

# Check firewall
sudo ufw allow 8766

No transcription output

  • Check audio format (must be int16 PCM, 16kHz, mono)
  • Check chunk size (not too small)
  • Check server logs for errors

GPU not working

  • Server will fall back to CPU automatically
  • Check nvidia-smi for GPU status
  • Verify CUDA libraries are loaded (should be automatic with ./run.sh)

Next Steps

  1. Test the server: ./run.sh test_client.py test.wav
  2. Try microphone: ./run.sh client/mic_stream.py
  3. Build your own client using the examples above

Happy transcribing! 🎤