
# Remote Microphone Streaming Setup

This guide shows how to run the ASR system across two machines: a client captures microphone audio and streams it over WebSocket to a server, which transcribes it and displays the result.

## Architecture

```
┌─────────────────┐                    ┌─────────────────┐
│  Client Machine │                    │  Server Machine │
│                 │                    │                 │
│  🎤 Microphone  │  ───WebSocket───▶  │  🖥️  Display    │
│                 │      (Audio)       │                 │
│  client/        │                    │  server/        │
│  mic_stream.py  │                    │  display_server │
└─────────────────┘                    └─────────────────┘
```

## Server Setup (Machine with GPU)

### 1. Start the server with live display

```bash
cd /home/koko210Serve/parakeet-test
source venv/bin/activate
PYTHONPATH=/home/koko210Serve/parakeet-test python server/display_server.py
```

**Options:**
```bash
python server/display_server.py --host 0.0.0.0 --port 8766
```

The server will:
- ✅ Bind to all network interfaces (0.0.0.0)
- ✅ Display transcriptions in real-time with color coding
- ✅ Show progressive updates as audio streams in
- ✅ Highlight final transcriptions when complete

### 2. Configure firewall (if needed)

Allow incoming connections on port 8766:
```bash
# Ubuntu/Debian
sudo ufw allow 8766/tcp

# CentOS/RHEL
sudo firewall-cmd --permanent --add-port=8766/tcp
sudo firewall-cmd --reload
```

### 3. Get the server's IP address

```bash
# Find your server's IP address
ip addr show | grep "inet " | grep -v 127.0.0.1
```

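The address appears after `inet` in the output, followed by a prefix length (for example `192.168.1.100/24`). In this guide the server's IP address is `192.168.1.100`.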

## Client Setup (Remote Machine)

### 1. Install dependencies on client machine

Create a minimal Python environment:

```bash
# Create virtual environment
python3 -m venv asr-client
source asr-client/bin/activate

# Install only client dependencies
pip install websockets sounddevice numpy
```

### 2. Copy the client script

Copy `client/mic_stream.py` to your client machine:

```bash
# On server machine
scp client/mic_stream.py user@client-machine:~/

# Or download it via your preferred method
```

### 3. List available microphones

```bash
python mic_stream.py --list-devices
```

Example output:
```
Available audio input devices:
--------------------------------------------------------------------------------
[0] Built-in Microphone
    Channels: 2
    Sample rate: 44100.0 Hz
[1] USB Microphone
    Channels: 1
    Sample rate: 48000.0 Hz
--------------------------------------------------------------------------------
```

### 4. Start streaming

```bash
python mic_stream.py --url ws://SERVER_IP:8766
```

Replace `SERVER_IP` with your server's IP address (e.g., `ws://192.168.1.100:8766`).

**Options:**
```bash
# Use specific microphone device
python mic_stream.py --url ws://192.168.1.100:8766 --device 1

# Change sample rate (if needed)
python mic_stream.py --url ws://192.168.1.100:8766 --sample-rate 16000

# Adjust chunk duration in seconds (larger chunks send fewer messages but add delay)
python mic_stream.py --url ws://192.168.1.100:8766 --chunk-duration 0.2
```
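
To make the effect of `--sample-rate` and `--chunk-duration` concrete, the sketch below computes how large each chunk is and how many chunks are sent per second. It assumes 16-bit mono audio and the 0.1 s default chunk duration mentioned in this guide; `chunk_layout` is a hypothetical helper, not part of `mic_stream.py`.

```python
def chunk_layout(sample_rate: int = 16000, chunk_duration: float = 0.1,
                 bytes_per_sample: int = 2, channels: int = 1) -> dict:
    """Return the size of one audio chunk and how many are sent per second.

    Assumes raw PCM with a fixed sample width (16-bit mono by default).
    """
    samples = round(sample_rate * chunk_duration)
    return {
        "samples_per_chunk": samples,
        "bytes_per_chunk": samples * bytes_per_sample * channels,
        "chunks_per_second": 1.0 / chunk_duration,
    }


# Compare the chunk durations used in this guide.
for duration in (0.05, 0.1, 0.2):
    print(duration, chunk_layout(chunk_duration=duration))
```

Shorter chunks reach the server sooner but produce more WebSocket messages per second; longer chunks do the opposite.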

## Usage Flow

### 1. Start Server
On the server machine:
```bash
cd /home/koko210Serve/parakeet-test
source venv/bin/activate
PYTHONPATH=/home/koko210Serve/parakeet-test python server/display_server.py
```

You'll see:
```
================================================================================
ASR Server - Live Transcription Display
================================================================================
Server: ws://0.0.0.0:8766
Sample Rate: 16000 Hz
Model: Parakeet TDT 0.6B V3
================================================================================

Server is running and ready for connections!
Waiting for clients...
```

### 2. Connect Client
On the client machine:
```bash
python mic_stream.py --url ws://192.168.1.100:8766
```

You'll see:
```
Connected to server: ws://192.168.1.100:8766
Recording started. Press Ctrl+C to stop.
```

### 3. Speak into Microphone
- Speak naturally into your microphone
- Watch the **server terminal** for real-time transcriptions
- Progressive updates appear in yellow as you speak
- Final transcriptions appear in green when you pause

### 4. Stop Streaming
Press `Ctrl+C` on the client to stop recording and disconnect.

## Display Color Coding

On the server display:

- **🟢 GREEN** = Final transcription (complete, accurate)
- **🟡 YELLOW** = Progressive update (in progress)
- **🔵 BLUE** = Connection events
- **⚪ WHITE** = Server status messages
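
As an illustration of how such color coding is typically produced in a terminal, here is a minimal sketch using standard ANSI escape sequences. The mapping and the `colorize` helper are assumptions for illustration; the actual codes used by `display_server.py` may differ.

```python
# Hypothetical mapping from message kind to ANSI color code (bright variants).
ANSI_COLORS = {
    "final": "\033[92m",       # green
    "partial": "\033[93m",     # yellow
    "connection": "\033[94m",  # blue
    "status": "\033[97m",      # white
}
ANSI_RESET = "\033[0m"


def colorize(kind: str, text: str) -> str:
    """Wrap text in the color for its message kind and reset afterwards."""
    return f"{ANSI_COLORS[kind]}{text}{ANSI_RESET}"


print(colorize("partial", "  → Hello this is"))
print(colorize("final", "  ✓ FINAL: Hello this is a test."))
```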

## Example Session

### Server Display:
```
================================================================================
✓ Client connected: 192.168.1.50:45232
================================================================================

[14:23:15] 192.168.1.50:45232
  → Hello this is

[14:23:17] 192.168.1.50:45232
  → Hello this is a test of the remote

[14:23:19] 192.168.1.50:45232
  ✓ FINAL: Hello this is a test of the remote microphone streaming system.

[14:23:25] 192.168.1.50:45232
  → Can you hear me

[14:23:27] 192.168.1.50:45232
  ✓ FINAL: Can you hear me clearly?

================================================================================
✗ Client disconnected: 192.168.1.50:45232
================================================================================
```

### Client Display:
```
Connected to server: ws://192.168.1.100:8766
Recording started. Press Ctrl+C to stop.

Server: Connected to ASR server with live display
[PARTIAL] Hello this is
[PARTIAL] Hello this is a test of the remote
[FINAL] Hello this is a test of the remote microphone streaming system.
[PARTIAL] Can you hear me
[FINAL] Can you hear me clearly?

^C
Stopped by user
Disconnected from server
Client stopped by user
```

## Network Considerations

### Bandwidth Usage
- Sample rate: 16000 Hz
- Bit depth: 16-bit (int16)
- Bandwidth: ~32 KB/s per client (16,000 samples/s × 2 bytes, single channel)
- Very low bandwidth - works well over WiFi or LAN
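
The bandwidth figure follows directly from the audio format. The sketch below reproduces the arithmetic and scales it to several clients; it assumes mono audio and ignores WebSocket framing overhead. `audio_bandwidth` is a hypothetical helper name.

```python
def audio_bandwidth(sample_rate: int = 16000, bit_depth: int = 16,
                    channels: int = 1, clients: int = 1) -> int:
    """Raw PCM payload in bytes per second (excludes protocol overhead)."""
    return sample_rate * (bit_depth // 8) * channels * clients


per_client = audio_bandwidth()
print(f"{per_client / 1000:.0f} KB/s per client")
print(f"{per_client * 8 / 1000:.0f} kbit/s per client")
print(f"{audio_bandwidth(clients=4) / 1000:.0f} KB/s for 4 clients")
```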

### Latency
- Progressive updates: Every ~2 seconds
- Final transcription: When audio stops or on demand
- Total latency: ~2-3 seconds (network + processing)

### Multiple Clients
The server supports multiple simultaneous clients:
- Each client gets its own session
- Transcriptions are tagged with client IP:port
- No interference between clients
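
One way to picture per-client isolation is a registry that keeps a separate audio buffer for each `IP:port` pair. This is a conceptual sketch only; `SessionRegistry` and its methods are hypothetical and do not describe the server's actual implementation.

```python
class SessionRegistry:
    """Keeps one independent audio buffer per connected client."""

    def __init__(self) -> None:
        self._buffers: dict[str, bytearray] = {}

    def open(self, ip: str, port: int) -> str:
        """Register a client and return its IP:port tag."""
        client_id = f"{ip}:{port}"
        self._buffers[client_id] = bytearray()
        return client_id

    def append_audio(self, client_id: str, chunk: bytes) -> None:
        """Add audio to this client's buffer only."""
        self._buffers[client_id].extend(chunk)

    def buffered_bytes(self, client_id: str) -> int:
        return len(self._buffers[client_id])

    def close(self, client_id: str) -> None:
        """Drop the client's buffer when it disconnects."""
        self._buffers.pop(client_id, None)


registry = SessionRegistry()
a = registry.open("192.168.1.50", 45232)
b = registry.open("192.168.1.51", 45233)
registry.append_audio(a, b"\x00\x01" * 1600)
print(a, registry.buffered_bytes(a))
print(b, registry.buffered_bytes(b))
```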

## Troubleshooting

### Client Can't Connect
```
Error: [Errno 111] Connection refused
```
**Solution:**
1. Check server is running
2. Verify firewall allows port 8766
3. Confirm server IP address is correct
4. Test that the host is reachable: `ping SERVER_IP` (this does not check the port itself)
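
`ping` confirms only that the host answers; it does not show whether port 8766 accepts connections. A short check with Python's standard `socket` module can test the port from the client machine. `port_is_reachable` is a hypothetical helper written for this guide.

```python
import socket


def port_is_reachable(host: str, port: int = 8766, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_reachable("192.168.1.100")` returning `False` while `ping` succeeds suggests that the server is not running or a firewall is blocking the port.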

### No Audio Being Captured
```
Recording started but no transcriptions appear
```
**Solution:**
1. Check microphone permissions
2. List devices: `python mic_stream.py --list-devices`
3. Try different device: `--device N`
4. Test microphone in other apps first
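
To check whether a device is actually delivering sound, you can record a couple of seconds and look at the signal level. The sketch below uses `sounddevice` (installed during client setup) and assumes 16-bit mono capture at 16 kHz; `rms_int16` and `record_level` are hypothetical helpers, not part of `mic_stream.py`.

```python
import math
from array import array


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of native-endian 16-bit PCM samples."""
    samples = array("h")
    samples.frombytes(pcm)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def record_level(device=None, seconds: float = 2.0, sample_rate: int = 16000) -> float:
    """Record briefly from `device` and return the RMS level (0 to 32767)."""
    import sounddevice as sd  # imported here so rms_int16 works without audio hardware

    frames = int(seconds * sample_rate)
    recording = sd.rec(frames, samplerate=sample_rate, channels=1,
                       dtype="int16", device=device)
    sd.wait()  # block until the recording has finished
    return rms_int16(recording.tobytes())
```

Calling `record_level(device=1)` while speaking should give a clearly higher value than silence. A level at or near zero suggests a muted input, a wrong device, or missing microphone permissions.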

### Poor Transcription Quality
**Solution:**
1. Move closer to microphone
2. Reduce background noise
3. Speak clearly and at normal pace
4. Check microphone quality/settings

### High Latency
**Solution:**
1. Use wired connection instead of WiFi
2. Reduce chunk duration: `--chunk-duration 0.05`
3. Check network latency: `ping SERVER_IP`

## Security Notes

⚠️ **Important:** This setup uses WebSocket without encryption (`ws://`), so audio and transcriptions cross the network in plaintext.

For production use:
- Use WSS (WebSocket Secure) with TLS certificates
- Add authentication (API keys, tokens)
- Restrict firewall rules to specific IP ranges
- Consider using VPN for remote access
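
On the client side, switching to WSS mainly means connecting to a `wss://` URL with a TLS context that verifies the server certificate. The sketch below shows one possible approach using the standard `ssl` module and the `ssl` argument of `websockets.connect`. The helper names are hypothetical, and it assumes the server has separately been configured to serve TLS, which this guide does not cover.

```python
import ssl


def make_client_tls_context(ca_file=None) -> ssl.SSLContext:
    """Create a context that verifies the server certificate and hostname.

    Pass `ca_file` to trust a private CA or a self-signed certificate.
    """
    return ssl.create_default_context(cafile=ca_file)


async def connect_secure(url: str, ca_file=None):
    """Open a TLS-protected WebSocket connection; refuses plain ws:// URLs."""
    if not url.startswith("wss://"):
        raise ValueError("expected a wss:// URL")
    import websockets  # imported lazily; installed in the client setup step

    return await websockets.connect(url, ssl=make_client_tls_context(ca_file))
```

Rejecting `ws://` URLs up front keeps a misconfigured client from silently falling back to an unencrypted connection.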

## Advanced: Auto-start Server

Create a systemd service (Linux):

```bash
sudo nano /etc/systemd/system/asr-server.service
```

```ini
[Unit]
Description=ASR WebSocket Server
After=network.target

[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/home/koko210Serve/parakeet-test
Environment="PYTHONPATH=/home/koko210Serve/parakeet-test"
ExecStart=/home/koko210Serve/parakeet-test/venv/bin/python server/display_server.py
Restart=always

[Install]
WantedBy=multi-user.target
```

Enable and start:
```bash
sudo systemctl enable asr-server
sudo systemctl start asr-server
sudo systemctl status asr-server
```

## Performance Tips

1. **Server:** Use GPU for best performance (~100ms latency)
2. **Client:** Use low chunk duration for responsiveness (0.1s default)
3. **Network:** Wired connection preferred, WiFi works fine
4. **Audio Quality:** 16kHz sample rate is optimal for speech

## Summary

- ✅ **Server displays transcriptions in real-time**
- ✅ **Client sends audio from remote microphone**
- ✅ **Progressive updates show live transcription**
- ✅ **Final results when speech pauses**
- ✅ **Multiple clients supported**
- ✅ **Low bandwidth, low latency**

Enjoy your remote ASR streaming system! 🎤 → 🌐 → 🖥️