Remote Microphone Streaming Setup

This guide shows how to use the ASR system with a client on one machine streaming audio to a server on another machine.

Architecture

┌─────────────────┐                    ┌─────────────────┐
│  Client Machine │                    │  Server Machine │
│                 │                    │                 │
│  🎤 Microphone  │  ───WebSocket───▶  │  🖥️  Display    │
│                 │      (Audio)       │                 │
│  client/        │                    │  server/        │
│  mic_stream.py  │                    │  display_server │
└─────────────────┘                    └─────────────────┘

Server Setup (Machine with GPU)

1. Start the server with live display

cd /home/koko210Serve/parakeet-test
source venv/bin/activate
PYTHONPATH=/home/koko210Serve/parakeet-test python server/display_server.py

Options:

python server/display_server.py --host 0.0.0.0 --port 8766

The server will:

  • Bind to all network interfaces (0.0.0.0)
  • Display transcriptions in real-time with color coding
  • Show progressive updates as audio streams in
  • Highlight final transcriptions when complete

2. Configure firewall (if needed)

Allow incoming connections on port 8766:

# Ubuntu/Debian
sudo ufw allow 8766/tcp

# CentOS/RHEL
sudo firewall-cmd --permanent --add-port=8766/tcp
sudo firewall-cmd --reload

3. Get the server's IP address

# Find your server's IP address
ip addr show | grep "inet " | grep -v 127.0.0.1

Example: an output line containing inet 192.168.1.100/24 means the server's IP address is 192.168.1.100.
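If you would rather look the address up from Python, the following is a minimal sketch. It asks the OS which local address it would use for outbound traffic; the function name get_lan_ip and the probe address 192.0.2.1 (a reserved documentation address) are illustrative choices, not part of this project.

```python
import socket


def get_lan_ip(probe_host: str = "192.0.2.1", probe_port: int = 80) -> str:
    """Return the local IPv4 address the OS would use to reach probe_host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()
```

On a machine with several network interfaces this reports only the address on the default route, so the ip addr command above remains the more complete check.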

Client Setup (Remote Machine)

1. Install dependencies on client machine

Create a minimal Python environment:

# Create virtual environment
python3 -m venv asr-client
source asr-client/bin/activate

# Install only client dependencies
pip install websockets sounddevice numpy

2. Copy the client script

Copy client/mic_stream.py to your client machine:

# On server machine
scp client/mic_stream.py user@client-machine:~/

# Or download it via your preferred method

3. List available microphones

python mic_stream.py --list-devices

Example output:

Available audio input devices:
--------------------------------------------------------------------------------
[0] Built-in Microphone
    Channels: 2
    Sample rate: 44100.0 Hz
[1] USB Microphone
    Channels: 1
    Sample rate: 48000.0 Hz
--------------------------------------------------------------------------------
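A listing like the one above can be produced with the sounddevice library. This is a sketch of one way to do it and may differ from what mic_stream.py actually does; the function names are hypothetical, while query_devices() and the dictionary keys used here come from sounddevice.

```python
def format_input_devices(devices) -> str:
    """Format device entries in the style of the example listing."""
    lines = ["Available audio input devices:", "-" * 80]
    for index, dev in enumerate(devices):
        if dev["max_input_channels"] < 1:
            continue  # skip output-only devices such as speakers
        lines.append(f"[{index}] {dev['name']}")
        lines.append(f"    Channels: {dev['max_input_channels']}")
        lines.append(f"    Sample rate: {dev['default_samplerate']} Hz")
    lines.append("-" * 80)
    return "\n".join(lines)


def list_input_devices() -> str:
    """Query the real audio devices (needs sounddevice and PortAudio)."""
    import sounddevice as sd  # imported lazily so the formatter works without it

    return format_input_devices(sd.query_devices())
```

The index printed in brackets is the position in the full device list, which is the number sounddevice expects when a device is selected by index.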

4. Start streaming

python mic_stream.py --url ws://SERVER_IP:8766

Replace SERVER_IP with your server's IP address (e.g., ws://192.168.1.100:8766).

Options:

# Use specific microphone device
python mic_stream.py --url ws://192.168.1.100:8766 --device 1

# Change sample rate (if needed)
python mic_stream.py --url ws://192.168.1.100:8766 --sample-rate 16000

# Adjust chunk duration in seconds (larger chunks mean fewer, bigger messages)
python mic_stream.py --url ws://192.168.1.100:8766 --chunk-duration 0.2
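To make the flags concrete, here is a sketch of an argument parser that accepts the options documented in this guide. It is reconstructed from the flags shown here, not copied from mic_stream.py. The default URL is an assumption, and the sample-rate and chunk-duration defaults follow the values stated elsewhere in this guide (16000 Hz and 0.1 s).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the client flags documented in this guide."""
    parser = argparse.ArgumentParser(description="Stream microphone audio to the ASR server")
    # Default URL is an assumption made for this sketch.
    parser.add_argument("--url", default="ws://localhost:8766", help="WebSocket URL of the server")
    parser.add_argument("--device", type=int, default=None, help="Input device index")
    parser.add_argument("--sample-rate", type=int, default=16000, help="Sample rate in Hz")
    parser.add_argument("--chunk-duration", type=float, default=0.1, help="Seconds of audio per message")
    parser.add_argument("--list-devices", action="store_true", help="List input devices and exit")
    return parser


def samples_per_chunk(sample_rate: int, chunk_duration: float) -> int:
    """Number of audio samples carried in one message."""
    return round(sample_rate * chunk_duration)
```

With --sample-rate 16000 and --chunk-duration 0.2, each message carries 3200 samples, so doubling the chunk duration halves the number of messages per second.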

Usage Flow

1. Start Server

On the server machine:

cd /home/koko210Serve/parakeet-test
source venv/bin/activate
PYTHONPATH=/home/koko210Serve/parakeet-test python server/display_server.py

You'll see:

================================================================================
ASR Server - Live Transcription Display
================================================================================
Server: ws://0.0.0.0:8766
Sample Rate: 16000 Hz
Model: Parakeet TDT 0.6B V3
================================================================================

Server is running and ready for connections!
Waiting for clients...

2. Connect Client

On the client machine:

python mic_stream.py --url ws://192.168.1.100:8766

You'll see:

Connected to server: ws://192.168.1.100:8766
Recording started. Press Ctrl+C to stop.

3. Speak into Microphone

  • Speak naturally into your microphone
  • Watch the server terminal for real-time transcriptions
  • Progressive updates appear in yellow as you speak
  • Final transcriptions appear in green when you pause

4. Stop Streaming

Press Ctrl+C on the client to stop recording and disconnect.

Display Color Coding

On the server display:

  • 🟢 GREEN = Final transcription (complete, accurate)
  • 🟡 YELLOW = Progressive update (in progress)
  • 🔵 BLUE = Connection events
  • WHITE = Server status messages
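Color coding of this kind is commonly done with ANSI escape sequences. The sketch below assumes that approach; the server may use a different mechanism or library, and the names ANSI and colorize are hypothetical.

```python
# Standard ANSI foreground color codes; the mapping follows the list above.
ANSI = {
    "final": "\033[32m",       # green
    "partial": "\033[33m",     # yellow
    "connection": "\033[34m",  # blue
    "status": "\033[37m",      # white
}
RESET = "\033[0m"


def colorize(kind: str, text: str) -> str:
    """Wrap text in the color for its message kind, then reset the terminal."""
    return f"{ANSI[kind]}{text}{RESET}"
```

For example, print(colorize("final", "✓ FINAL: Can you hear me clearly?")) would show the line in green on a terminal that supports ANSI colors.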

Example Session

Server Display:

================================================================================
✓ Client connected: 192.168.1.50:45232
================================================================================

[14:23:15] 192.168.1.50:45232
  → Hello this is

[14:23:17] 192.168.1.50:45232
  → Hello this is a test of the remote

[14:23:19] 192.168.1.50:45232
  ✓ FINAL: Hello this is a test of the remote microphone streaming system.

[14:23:25] 192.168.1.50:45232
  → Can you hear me

[14:23:27] 192.168.1.50:45232
  ✓ FINAL: Can you hear me clearly?

================================================================================
✗ Client disconnected: 192.168.1.50:45232
================================================================================

Client Display:

Connected to server: ws://192.168.1.100:8766
Recording started. Press Ctrl+C to stop.

Server: Connected to ASR server with live display
[PARTIAL] Hello this is
[PARTIAL] Hello this is a test of the remote
[FINAL] Hello this is a test of the remote microphone streaming system.
[PARTIAL] Can you hear me
[FINAL] Can you hear me clearly?

^C
Stopped by user
Disconnected from server
Client stopped by user

Network Considerations

Bandwidth Usage

  • Sample rate: 16000 Hz
  • Bit depth: 16-bit (int16)
  • Bandwidth: ~32 KB/s per client (16,000 samples/s × 2 bytes per sample, one channel)
  • Very low bandwidth - works well over WiFi or LAN
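The bandwidth figure follows directly from the audio format. A small worked calculation, assuming a single (mono) channel and ignoring WebSocket framing overhead:

```python
def bytes_per_second(sample_rate_hz: int = 16000, bit_depth: int = 16, channels: int = 1) -> int:
    """Raw PCM data rate for the given audio format."""
    return sample_rate_hz * (bit_depth // 8) * channels


def bytes_per_chunk(chunk_duration_s: float, sample_rate_hz: int = 16000,
                    bit_depth: int = 16, channels: int = 1) -> int:
    """Payload size of one streamed chunk."""
    return round(bytes_per_second(sample_rate_hz, bit_depth, channels) * chunk_duration_s)


if __name__ == "__main__":
    print(bytes_per_second())    # 32000 bytes/s, i.e. ~32 KB/s
    print(bytes_per_chunk(0.1))  # 3200 bytes per 0.1 s chunk
```

A stereo stream at the same rate would double these numbers, which is one reason to capture in mono for speech.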

Latency

  • Progressive updates: Every ~2 seconds
  • Final transcription: When audio stops or on demand
  • Total latency: ~2-3 seconds (network + processing)

Multiple Clients

The server supports multiple simultaneous clients:

  • Each client gets its own session
  • Transcriptions are tagged with client IP:port
  • No interference between clients
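One way to keep clients isolated is to give each connection its own state object keyed by its IP:port tag. The sketch below illustrates the idea only; ClientSession and SessionRegistry are hypothetical names, and the actual server may organize its sessions differently.

```python
from dataclasses import dataclass, field


@dataclass
class ClientSession:
    client_id: str  # "ip:port" tag used to label transcriptions
    audio: bytearray = field(default_factory=bytearray)  # per-client audio buffer
    finals: list = field(default_factory=list)           # finished transcriptions


class SessionRegistry:
    """Tracks one independent session per connected client."""

    def __init__(self) -> None:
        self._sessions = {}

    def open(self, host: str, port: int) -> ClientSession:
        session = ClientSession(f"{host}:{port}")
        self._sessions[session.client_id] = session
        return session

    def close(self, client_id: str) -> None:
        # Dropping the session discards that client's buffer only.
        self._sessions.pop(client_id, None)

    def __len__(self) -> int:
        return len(self._sessions)
```

Because each session owns its own buffer, audio appended for one client never appears in another client's transcription.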

Troubleshooting

Client Can't Connect

Error: [Errno 111] Connection refused

Solution:

  1. Check server is running
  2. Verify firewall allows port 8766
  3. Confirm server IP address is correct
  4. Test connectivity: ping SERVER_IP (this confirms the host is reachable, not that port 8766 is open)
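To check the port itself rather than just the host, a plain TCP connection attempt is enough. This is a minimal sketch using only the standard library; check_port is a hypothetical helper, not part of the client.

```python
import socket
import time


def check_port(host: str, port: int, timeout: float = 3.0):
    """Try a TCP connection; return (reachable, connect_time_ms or None)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000
            return True, elapsed_ms
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, None
```

Calling check_port("192.168.1.100", 8766) from the client machine returns (False, None) when the server is not running or the firewall drops the connection. The connect time it reports on success is also a rough indicator of network latency.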

No Audio Being Captured

Recording started but no transcriptions appear

Solution:

  1. Check microphone permissions
  2. List devices: python mic_stream.py --list-devices
  3. Try different device: --device N
  4. Test microphone in other apps first
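If the microphone appears to work but nothing is transcribed, it can help to check whether the captured chunks contain any signal at all. The sketch below computes the RMS level of a chunk of 16-bit PCM; the function names and the silence threshold of 100 are illustrative assumptions, and the bytes are assumed to be in the machine's native byte order.

```python
import array
import math


def rms_int16(pcm_bytes: bytes) -> float:
    """Root-mean-square level of 16-bit PCM samples (native byte order)."""
    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(pcm_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(pcm_bytes: bytes, threshold: float = 100.0) -> bool:
    """True when the chunk is quieter than the (assumed) threshold."""
    return rms_int16(pcm_bytes) < threshold
```

A level that stays near zero while you speak usually points to a muted input, missing permissions, or the wrong device index.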

Poor Transcription Quality

Solution:

  1. Move closer to microphone
  2. Reduce background noise
  3. Speak clearly and at normal pace
  4. Check microphone quality/settings

High Latency

Solution:

  1. Use wired connection instead of WiFi
  2. Reduce chunk duration: --chunk-duration 0.05
  3. Check network latency: ping SERVER_IP

Security Notes

⚠️ Important: This setup uses WebSocket without encryption (ws://)

For production use:

  • Use WSS (WebSocket Secure) with TLS certificates
  • Add authentication (API keys, tokens)
  • Restrict firewall rules to specific IP ranges
  • Consider using VPN for remote access
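On the client side, moving to WSS mostly means switching the URL scheme and supplying a TLS context that verifies the server certificate. The sketch below uses only the standard library; to_secure_url and make_tls_context are hypothetical helpers, and the server would also need to be configured with a certificate, which this guide does not cover.

```python
import ssl
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def to_secure_url(url: str) -> str:
    """Rewrite a ws:// URL to wss://; leave wss:// URLs unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "wss":
        return url
    if parts.scheme != "ws":
        raise ValueError(f"not a WebSocket URL: {url}")
    return urlunsplit(parts._replace(scheme="wss"))


def make_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that verifies the server certificate and hostname.

    Pass ca_file to trust a private CA or a self-signed certificate.
    """
    return ssl.create_default_context(cafile=ca_file)
```

The websockets library accepts such a context through the ssl argument of websockets.connect(), so a client could pass to_secure_url(url) together with ssl=make_tls_context().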

Advanced: Auto-start Server

Create a systemd service (Linux):

sudo nano /etc/systemd/system/asr-server.service

Add the following contents:

[Unit]
Description=ASR WebSocket Server
After=network.target

[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/home/koko210Serve/parakeet-test
Environment="PYTHONPATH=/home/koko210Serve/parakeet-test"
ExecStart=/home/koko210Serve/parakeet-test/venv/bin/python server/display_server.py
Restart=always

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable asr-server
sudo systemctl start asr-server
sudo systemctl status asr-server

Performance Tips

  1. Server: Use GPU for best performance (~100ms processing latency)
  2. Client: Use low chunk duration for responsiveness (0.1s default)
  3. Network: Wired connection preferred, WiFi works fine
  4. Audio Quality: 16kHz sample rate is optimal for speech

Summary

  • Server displays transcriptions in real-time
  • Client sends audio from remote microphone
  • Progressive updates show live transcription
  • Final results appear when speech pauses
  • Multiple clients supported
  • Low bandwidth, low latency

Enjoy your remote ASR streaming system! 🎤🌐🖥️