# Remote Microphone Streaming Setup
This guide shows how to run the ASR system across two machines: a client captures microphone audio and streams it over WebSocket to a server, which transcribes it and displays the result.
## Architecture
```
┌─────────────────┐ ┌─────────────────┐
│ Client Machine │ │ Server Machine │
│ │ │ │
│ 🎤 Microphone │ ───WebSocket───▶ │ 🖥️ Display │
│ │ (Audio) │ │
│ client/ │ │ server/ │
│ mic_stream.py │ │ display_server │
└─────────────────┘ └─────────────────┘
```
## Server Setup (Machine with GPU)
### 1. Start the server with live display
```bash
cd /home/koko210Serve/parakeet-test
source venv/bin/activate
PYTHONPATH=/home/koko210Serve/parakeet-test python server/display_server.py
```
**Options:**
```bash
python server/display_server.py --host 0.0.0.0 --port 8766
```
The server will:
- ✅ Bind to all network interfaces (0.0.0.0)
- ✅ Display transcriptions in real-time with color coding
- ✅ Show progressive updates as audio streams in
- ✅ Highlight final transcriptions when complete
### 2. Configure firewall (if needed)
Allow incoming connections on port 8766:
```bash
# Ubuntu/Debian
sudo ufw allow 8766/tcp
# CentOS/RHEL
sudo firewall-cmd --permanent --add-port=8766/tcp
sudo firewall-cmd --reload
```
### 3. Get the server's IP address
```bash
# Find your server's IP address
ip addr show | grep "inet " | grep -v 127.0.0.1
```
The command prints one `inet` line per interface. The address is the part before the `/`, for example `192.168.1.100`.
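If the machine has several interfaces, it can be unclear which address clients should use. The sketch below asks the kernel which source address it would pick for outbound traffic. The function name `guess_lan_ip` and the probe address are illustrative assumptions, not part of this project.
```python
import socket


def guess_lan_ip(probe_host: str = "192.0.2.1", probe_port: int = 80) -> str:
    """Return the local IPv4 address the OS would use for outbound traffic.

    Connecting a UDP socket sends no packets; it only makes the kernel
    choose a route, which reveals the source address for that route.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, an offline machine): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print(f"Client URL: ws://{guess_lan_ip()}:8766")
```
On a multi-homed server the result is the address of the default route, which may not be the interface your clients can reach, so compare it against the `ip addr` output.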
## Client Setup (Remote Machine)
### 1. Install dependencies on client machine
Create a minimal Python environment:
```bash
# Create virtual environment
python3 -m venv asr-client
source asr-client/bin/activate
# Install only client dependencies
pip install websockets sounddevice numpy
```
### 2. Copy the client script
Copy `client/mic_stream.py` to your client machine:
```bash
# On server machine
scp client/mic_stream.py user@client-machine:~/
# Or download it via your preferred method
```
### 3. List available microphones
```bash
python mic_stream.py --list-devices
```
Example output:
```
Available audio input devices:
--------------------------------------------------------------------------------
[0] Built-in Microphone
Channels: 2
Sample rate: 44100.0 Hz
[1] USB Microphone
Channels: 1
Sample rate: 48000.0 Hz
--------------------------------------------------------------------------------
```
### 4. Start streaming
```bash
python mic_stream.py --url ws://SERVER_IP:8766
```
Replace `SERVER_IP` with your server's IP address (e.g., `ws://192.168.1.100:8766`).
**Options:**
```bash
# Use specific microphone device
python mic_stream.py --url ws://192.168.1.100:8766 --device 1
# Change sample rate (if needed)
python mic_stream.py --url ws://192.168.1.100:8766 --sample-rate 16000
# Adjust chunk size for network latency
python mic_stream.py --url ws://192.168.1.100:8766 --chunk-duration 0.2
```
## Usage Flow
### 1. Start Server
On the server machine:
```bash
cd /home/koko210Serve/parakeet-test
source venv/bin/activate
PYTHONPATH=/home/koko210Serve/parakeet-test python server/display_server.py
```
You'll see:
```
================================================================================
ASR Server - Live Transcription Display
================================================================================
Server: ws://0.0.0.0:8766
Sample Rate: 16000 Hz
Model: Parakeet TDT 0.6B V3
================================================================================
Server is running and ready for connections!
Waiting for clients...
```
### 2. Connect Client
On the client machine:
```bash
python mic_stream.py --url ws://192.168.1.100:8766
```
You'll see:
```
Connected to server: ws://192.168.1.100:8766
Recording started. Press Ctrl+C to stop.
```
### 3. Speak into Microphone
- Speak naturally into your microphone
- Watch the **server terminal** for real-time transcriptions
- Progressive updates appear in yellow as you speak
- Final transcriptions appear in green when you pause
### 4. Stop Streaming
Press `Ctrl+C` on the client to stop recording and disconnect.
## Display Color Coding
On the server display:
- **🟢 GREEN** = Final transcription (complete, accurate)
- **🟡 YELLOW** = Progressive update (in progress)
- **🔵 BLUE** = Connection events
- **⚪ WHITE** = Server status messages
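Terminal coloring like this is usually done with ANSI escape sequences. The following is a minimal sketch of the idea. The specific codes, the `COLORS` table, and the `colorize` helper are assumptions for illustration and were not read from `server/display_server.py`.
```python
# ANSI escape sequences. These particular codes are an assumption,
# not values taken from the server script.
RESET = "\033[0m"
COLORS = {
    "final": "\033[92m",       # bright green
    "partial": "\033[93m",     # bright yellow
    "connection": "\033[94m",  # bright blue
    "status": "\033[97m",      # bright white
}


def colorize(kind: str, text: str) -> str:
    """Wrap text in the color for the given message kind.

    Unknown kinds are returned unchanged so they still print cleanly.
    """
    color = COLORS.get(kind, "")
    return f"{color}{text}{RESET}" if color else text


if __name__ == "__main__":
    print(colorize("partial", "→ Hello this is"))
    print(colorize("final", "✓ FINAL: Hello this is a test."))
```
If the colors do not show up, the terminal (or a log file the output is redirected to) may not interpret ANSI sequences.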
## Example Session
### Server Display:
```
================================================================================
✓ Client connected: 192.168.1.50:45232
================================================================================
[14:23:15] 192.168.1.50:45232
→ Hello this is
[14:23:17] 192.168.1.50:45232
→ Hello this is a test of the remote
[14:23:19] 192.168.1.50:45232
✓ FINAL: Hello this is a test of the remote microphone streaming system.
[14:23:25] 192.168.1.50:45232
→ Can you hear me
[14:23:27] 192.168.1.50:45232
✓ FINAL: Can you hear me clearly?
================================================================================
✗ Client disconnected: 192.168.1.50:45232
================================================================================
```
### Client Display:
```
Connected to server: ws://192.168.1.100:8766
Recording started. Press Ctrl+C to stop.
Server: Connected to ASR server with live display
[PARTIAL] Hello this is
[PARTIAL] Hello this is a test of the remote
[FINAL] Hello this is a test of the remote microphone streaming system.
[PARTIAL] Can you hear me
[FINAL] Can you hear me clearly?
^C
Stopped by user
Disconnected from server
Client stopped by user
```
## Network Considerations
### Bandwidth Usage
- Sample rate: 16000 Hz
- Bit depth: 16-bit (int16)
- Bandwidth: ~32 KB/s per client
- Very low bandwidth - works well over WiFi or LAN
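The ~32 KB/s figure follows directly from the sample rate and bit depth, assuming a single (mono) channel and ignoring WebSocket framing overhead. A small sketch of the arithmetic, with a hypothetical helper name:
```python
def pcm_bandwidth_bytes_per_s(
    sample_rate_hz: int = 16000,
    bytes_per_sample: int = 2,  # 16-bit (int16) samples
    channels: int = 1,          # mono is an assumption implied by the 32 KB/s figure
) -> int:
    """Raw PCM payload rate in bytes per second, excluding protocol overhead."""
    return sample_rate_hz * bytes_per_sample * channels


per_client = pcm_bandwidth_bytes_per_s()
print(f"{per_client} B/s per client")          # 16000 * 2 * 1
print(f"{per_client * 8 / 1000} kbit/s")       # same rate expressed in kilobits
print(f"{per_client * 5} B/s for 5 clients")   # scales linearly with client count
```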
### Latency
- Progressive updates: Every ~2 seconds
- Final transcription: When audio stops or on demand
- Total latency: ~2-3 seconds (network + processing)
### Multiple Clients
The server supports multiple simultaneous clients:
- Each client gets its own session
- Transcriptions are tagged with client IP:port
- No interference between clients
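One common way to keep clients isolated is to give each connection its own state object, keyed by the `IP:port` string shown in the display. The sketch below illustrates that pattern only. `ClientSession` and `SessionRegistry` are hypothetical names, and the real server may organize its sessions differently.
```python
from dataclasses import dataclass, field


@dataclass
class ClientSession:
    """Audio buffer and latest transcript for one connected client (hypothetical)."""
    client_id: str
    audio: bytearray = field(default_factory=bytearray)
    last_text: str = ""


class SessionRegistry:
    """Tracks one independent session per connected client."""

    def __init__(self) -> None:
        self._sessions = {}  # maps "ip:port" -> ClientSession

    def open(self, host: str, port: int) -> ClientSession:
        """Create and register a fresh session for a new connection."""
        client_id = f"{host}:{port}"
        session = ClientSession(client_id)
        self._sessions[client_id] = session
        return session

    def close(self, client_id: str) -> None:
        """Drop a session on disconnect; unknown ids are ignored."""
        self._sessions.pop(client_id, None)

    def __len__(self) -> int:
        return len(self._sessions)
```
Because every session owns its own `bytearray`, audio appended for one client never ends up in another client's buffer.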
## Troubleshooting
### Client Can't Connect
```
Error: [Errno 111] Connection refused
```
**Solution:**
1. Check server is running
2. Verify firewall allows port 8766
3. Confirm server IP address is correct
4. Test connectivity: `ping SERVER_IP` (this confirms the host is reachable, not that port 8766 is open)
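To check the port itself rather than just the host, a plain TCP connection attempt from the client machine is enough. This is a minimal sketch using only the standard library; `port_is_open` is an illustrative helper, not part of `mic_stream.py`.
```python
import socket


def port_is_open(host: str, port: int = 8766, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and unreachable hosts.
        return False
```
For example, `port_is_open("192.168.1.100")` returning `False` while `ping` succeeds suggests the server is not running or a firewall is blocking port 8766.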
### No Audio Being Captured
```
Recording started but no transcriptions appear
```
**Solution:**
1. Check microphone permissions
2. List devices: `python mic_stream.py --list-devices`
3. Try different device: `--device N`
4. Test microphone in other apps first
### Poor Transcription Quality
**Solution:**
1. Move closer to microphone
2. Reduce background noise
3. Speak clearly and at normal pace
4. Check microphone quality/settings
### High Latency
**Solution:**
1. Use wired connection instead of WiFi
2. Reduce chunk duration: `--chunk-duration 0.05`
3. Check network latency: `ping SERVER_IP`
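Chunk duration trades capture delay against message rate: shorter chunks reach the server sooner but mean more WebSocket messages per second. The sketch below tabulates that trade-off, assuming 16 kHz, 16-bit mono audio. `chunk_stats` is a hypothetical helper.
```python
def chunk_stats(
    chunk_duration_s: float,
    sample_rate_hz: int = 16000,
    bytes_per_sample: int = 2,  # int16, mono assumed
) -> dict:
    """Size, send rate, and capture delay for one audio chunk."""
    samples = round(chunk_duration_s * sample_rate_hz)
    return {
        "samples": samples,
        "bytes": samples * bytes_per_sample,
        "messages_per_s": 1 / chunk_duration_s,
        # A chunk cannot be sent until it has been fully recorded.
        "buffering_delay_ms": chunk_duration_s * 1000,
    }


for duration in (0.2, 0.1, 0.05):
    print(duration, chunk_stats(duration))
```
Since progressive updates arrive only every ~2 seconds, shaving tens of milliseconds off the chunk duration is likely to have a small effect on the total latency you perceive.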
## Security Notes
⚠️ **Important:** This setup uses unencrypted WebSocket (`ws://`), so audio and transcriptions cross the network in plaintext.
For production use:
- Use WSS (WebSocket Secure) with TLS certificates
- Add authentication (API keys, tokens)
- Restrict firewall rules to specific IP ranges
- Consider using VPN for remote access
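As an illustration of the authentication and IP-restriction points, the sketch below checks a client address against an allowlist and compares a shared token in constant time. Everything here is an assumption for illustration: `ALLOWED_NETWORKS`, `EXPECTED_TOKEN`, and `is_authorized` do not exist in this project, and how the token reaches the server (header, query string, or first message) is left open.
```python
import hmac
import ipaddress

# Example LAN range; adjust to your own network.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.168.1.0/24")]
# Placeholder only; load the real value from an environment variable or secret store.
EXPECTED_TOKEN = "change-me"


def is_authorized(client_ip: str, token: str) -> bool:
    """Accept a client only if its IP is allowlisted and its token matches."""
    addr = ipaddress.ip_address(client_ip)
    in_range = any(addr in net for net in ALLOWED_NETWORKS)
    # compare_digest avoids leaking how much of the token matched via timing.
    token_ok = hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode())
    return in_range and token_ok
```
A check like this complements TLS rather than replacing it, because a token sent over plain `ws://` can be read by anyone on the network path.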
## Advanced: Auto-start Server
Create a systemd service (Linux):
```bash
sudo nano /etc/systemd/system/asr-server.service
```
```ini
[Unit]
Description=ASR WebSocket Server
After=network.target
[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/home/koko210Serve/parakeet-test
Environment="PYTHONPATH=/home/koko210Serve/parakeet-test"
ExecStart=/home/koko210Serve/parakeet-test/venv/bin/python server/display_server.py
Restart=always
[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl enable asr-server
sudo systemctl start asr-server
sudo systemctl status asr-server
```
## Performance Tips
1. **Server:** Use a GPU for best performance (~100 ms latency)
2. **Client:** Use a low chunk duration for responsiveness (default 0.1 s)
3. **Network:** A wired connection is preferred, but WiFi works fine
4. **Audio Quality:** 16kHz sample rate is optimal for speech
## Summary
- **Server displays transcriptions in real-time**
- **Client sends audio from remote microphone**
- **Progressive updates show live transcription**
- **Final results when speech pauses**
- **Multiple clients supported**
- **Low bandwidth, low latency**
Enjoy your remote ASR streaming system! 🎤 → 🌐 → 🖥️