# Server & Client Usage Guide

## ✅ Server is Working!

The WebSocket server is running on port **8766** with GPU acceleration.

## Quick Start

### 1. Start the Server

```bash
./run.sh server/ws_server.py
```

The server will start on: `ws://localhost:8766`

### 2. Test with the Simple Client

```bash
./run.sh test_client.py test.wav
```

### 3. Use the Microphone Client

```bash
# List audio devices first
./run.sh client/mic_stream.py --list-devices

# Start streaming from the microphone
./run.sh client/mic_stream.py

# Or specify a device
./run.sh client/mic_stream.py --device 0
```

## Available Clients

### 1. **test_client.py** - Simple File Testing

```bash
./run.sh test_client.py your_audio.wav
```

- Sends an audio file to the server
- Shows real-time transcription
- Good for testing

### 2. **client/mic_stream.py** - Live Microphone

```bash
./run.sh client/mic_stream.py
```

- Captures from the microphone
- Streams to the server
- Displays transcription in real time

### 3. **Custom Client** - Your Own Script

```python
import asyncio
import json

import websockets


async def connect():
    async with websockets.connect("ws://localhost:8766") as ws:
        # Send audio as int16 PCM bytes.
        # your_audio_data is a placeholder for your own NumPy array.
        audio_bytes = your_audio_data.astype("int16").tobytes()
        await ws.send(audio_bytes)

        # Receive transcription
        response = await ws.recv()
        result = json.loads(response)
        print(result["text"])


asyncio.run(connect())
```

## Server Options

```bash
# Custom host/port
./run.sh server/ws_server.py --host 0.0.0.0 --port 9000

# Enable VAD (for long audio)
./run.sh server/ws_server.py --use-vad

# Different model
./run.sh server/ws_server.py --model nemo-parakeet-tdt-0.6b-v3

# Change sample rate
./run.sh server/ws_server.py --sample-rate 16000
```

## Client Options

### Microphone Client

```bash
# List devices
./run.sh client/mic_stream.py --list-devices

# Use a specific device
./run.sh client/mic_stream.py --device 2

# Custom server URL
./run.sh client/mic_stream.py --url ws://192.168.1.100:8766

# Adjust chunk duration (lower = lower latency)
./run.sh client/mic_stream.py --chunk-duration 0.05
```

## Protocol

The server uses a simple JSON-based protocol.

### Server → Client Messages

```json
{
  "type": "info",
  "message": "Connected to ASR server",
  "sample_rate": 16000
}
```

```json
{
  "type": "transcript",
  "text": "transcribed text here",
  "is_final": false
}
```

```json
{
  "type": "error",
  "message": "error description"
}
```

### Client → Server Messages

**Send audio:**

- Binary data (int16 PCM, little-endian)
- Sample rate: 16000 Hz
- Mono channel

**Send commands:**

```json
{"type": "final"}  // Process remaining buffer
{"type": "reset"}  // Reset audio buffer
```
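Putting the two message directions together, a full exchange looks like the sketch below. It is illustrative rather than definitive: it assumes the server greets new connections with an `info` message and answers `{"type": "final"}` with a transcript marked `is_final: true`, and it streams one second of silence purely to exercise the protocol.

```python
import asyncio
import json

import numpy as np
import websockets


async def protocol_demo():
    async with websockets.connect("ws://localhost:8766") as ws:
        # Greeting: an "info" message with the expected sample rate.
        info = json.loads(await ws.recv())
        print("server:", info.get("message"))

        # Stream one second of silence as int16 PCM at 16 kHz.
        await ws.send(np.zeros(16000, dtype=np.int16).tobytes())

        # Ask the server to process whatever remains in its buffer.
        await ws.send(json.dumps({"type": "final"}))

        # Read transcript messages until the final one arrives.
        while True:
            msg = json.loads(await ws.recv())
            if msg["type"] == "transcript":
                print("transcript:", msg["text"])
                if msg.get("is_final"):
                    break
            elif msg["type"] == "error":
                print("error:", msg["message"])
                break


asyncio.run(protocol_demo())
```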
## Audio Format Requirements

- **Format**: int16 PCM (bytes)
- **Sample Rate**: 16000 Hz
- **Channels**: Mono (1)
- **Byte Order**: Little-endian

### Convert Audio in Python

```python
import numpy as np
import soundfile as sf

# Load audio
audio, sr = sf.read("file.wav", dtype="float32")

# Convert to mono
if audio.ndim > 1:
    audio = audio[:, 0]

# Resample if needed (install resampy)
if sr != 16000:
    import resampy
    audio = resampy.resample(audio, sr, 16000)

# Convert to int16 for sending
audio_int16 = (audio * 32767).astype(np.int16)
audio_bytes = audio_int16.tobytes()
```

## Examples

### Browser Client (JavaScript)

```javascript
const ws = new WebSocket('ws://localhost:8766');

ws.onopen = () => {
  console.log('Connected!');

  // Capture from the microphone
  navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
      const audioContext = new AudioContext({ sampleRate: 16000 });
      const source = audioContext.createMediaStreamSource(stream);
      // ScriptProcessorNode is deprecated but still widely supported.
      const processor = audioContext.createScriptProcessor(4096, 1, 1);

      processor.onaudioprocess = (e) => {
        const audioData = e.inputBuffer.getChannelData(0);

        // Convert float32 to int16
        const int16Data = new Int16Array(audioData.length);
        for (let i = 0; i < audioData.length; i++) {
          int16Data[i] = Math.max(-32768, Math.min(32767, audioData[i] * 32768));
        }

        ws.send(int16Data.buffer);
      };

      source.connect(processor);
      processor.connect(audioContext.destination);
    });
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'transcript') {
    console.log('Transcription:', data.text);
  }
};
```

### Python Script Client

```python
#!/usr/bin/env python3
import asyncio
import json

import numpy as np
import sounddevice as sd
import websockets


async def stream_microphone():
    uri = "ws://localhost:8766"
    async with websockets.connect(uri) as ws:
        print("Connected!")
        loop = asyncio.get_running_loop()

        def audio_callback(indata, frames, time, status):
            # This callback runs on the audio thread, so schedule the
            # send on the event loop instead of calling it directly.
            audio = (indata[:, 0] * 32767).astype(np.int16)
            asyncio.run_coroutine_threadsafe(ws.send(audio.tobytes()), loop)

        # Start recording; blocksize=1600 gives 0.1-second chunks at 16 kHz
        with sd.InputStream(callback=audio_callback,
                            channels=1,
                            samplerate=16000,
                            blocksize=1600):
            while True:
                response = await ws.recv()
                data = json.loads(response)
                if data.get("type") == "transcript":
                    print(f"→ {data['text']}")


asyncio.run(stream_microphone())
```

## Performance

With GPU (GTX 1660):

- **Latency**: <100 ms per chunk
- **Throughput**: ~50-100x realtime
- **GPU Memory**: ~1.3 GB
- **Languages**: 25+ (auto-detected)

## Troubleshooting

### Server won't start

```bash
# Check if the port is in use
lsof -i :8766

# Kill the existing server
pkill -f ws_server.py

# Restart
./run.sh server/ws_server.py
```

### Client can't connect

```bash
# Check the server is running
ps aux | grep ws_server

# Check the firewall
sudo ufw allow 8766
```

### No transcription output

- Check the audio format (must be int16 PCM, 16 kHz, mono); see the sanity-check sketch at the end of this guide
- Check the chunk size (very small chunks may never produce output)
- Check the server logs for errors

### GPU not working

- The server falls back to CPU automatically
- Check `nvidia-smi` for GPU status
- Verify the CUDA libraries are loaded (should be automatic with `./run.sh`)

## Next Steps

1. **Test the server**: `./run.sh test_client.py test.wav`
2. **Try the microphone**: `./run.sh client/mic_stream.py`
3. **Build your own client** using the examples above

Happy transcribing! 🎤
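As promised in the troubleshooting section, here is a quick way to check whether a WAV file already matches the required format before streaming it. This is a minimal sketch using `soundfile`; the filename `test.wav` is just an example, and the check only inspects the file on disk, not what your client actually sends over the socket.

```python
import soundfile as sf


def check_audio(path):
    """Report whether a WAV file matches the server's expected format."""
    info = sf.info(path)
    ok = (info.samplerate == 16000
          and info.channels == 1
          and info.subtype == "PCM_16")
    verdict = "OK" if ok else "needs conversion (see Audio Format Requirements)"
    print(f"{path}: {info.samplerate} Hz, {info.channels} ch, {info.subtype} -> {verdict}")


check_audio("test.wav")
```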