# Remote Microphone Streaming Setup

This guide shows how to use the ASR system with a client on one machine streaming audio to a server on another machine.

## Architecture

```
┌─────────────────┐                  ┌─────────────────┐
│ Client Machine  │                  │ Server Machine  │
│                 │                  │                 │
│  🎤 Microphone  │ ───WebSocket───▶ │  🖥️ Display     │
│                 │      (Audio)     │                 │
│  client/        │                  │  server/        │
│  mic_stream.py  │                  │  display_server │
└─────────────────┘                  └─────────────────┘
```

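
To make the data flow concrete, here is a minimal Python sketch of what a streaming client in this architecture could look like. It is not the actual `client/mic_stream.py`: the wire format (raw mono 16-bit little-endian PCM sent as binary WebSocket messages, with transcription updates returned as text messages) and the names `to_pcm16` and `stream` are assumptions for illustration. It relies on the same three client dependencies used in the client setup (`websockets`, `sounddevice`, `numpy`).

```python
import asyncio

import numpy as np

# Assumed stream parameters: the server reports "Sample Rate: 16000 Hz",
# and this guide gives 0.1 s as the client's default chunk duration.
SAMPLE_RATE = 16000
CHUNK_DURATION = 0.1


def to_pcm16(block: np.ndarray) -> bytes:
    """Convert float32 samples in [-1.0, 1.0] to little-endian int16 bytes."""
    clipped = np.clip(block, -1.0, 1.0)
    return (clipped * 32767).astype("<i2").tobytes()


async def stream(url: str, device=None) -> None:
    """Hypothetical client: send each audio chunk as one binary WebSocket message."""
    # Imported here so to_pcm16 can be used without audio hardware or network libraries.
    import sounddevice as sd
    import websockets

    loop = asyncio.get_running_loop()
    chunks = asyncio.Queue()

    def on_audio(indata, frames, time_info, status):
        # PortAudio invokes this on its own thread; hand the chunk to the event loop safely.
        loop.call_soon_threadsafe(chunks.put_nowait, to_pcm16(indata[:, 0]))

    async with websockets.connect(url) as ws:

        async def print_replies():
            # Assumption: the server pushes transcription updates back as text messages.
            async for message in ws:
                print(f"Server: {message}")

        receiver = asyncio.create_task(print_replies())
        with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32",
                            blocksize=int(SAMPLE_RATE * CHUNK_DURATION),
                            device=device, callback=on_audio):
            try:
                while True:
                    await ws.send(await chunks.get())
            finally:
                receiver.cancel()
```

Assuming a server that accepts this format, you would run it with `asyncio.run(stream("ws://192.168.1.100:8766"))` and stop it with `Ctrl+C`. The audio callback runs on PortAudio's thread, which is why chunks are passed to the event loop with `call_soon_threadsafe` instead of calling `ws.send` directly.
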
## Server Setup (Machine with GPU)

### 1. Start the server with live display

```bash
cd /home/koko210Serve/parakeet-test
source venv/bin/activate
PYTHONPATH=/home/koko210Serve/parakeet-test python server/display_server.py
```

**Options** (the values shown are the defaults):

```bash
python server/display_server.py --host 0.0.0.0 --port 8766
```

The server will:

- ✅ Bind to all network interfaces (0.0.0.0)
- ✅ Display transcriptions in real time with color coding
- ✅ Show progressive updates as audio streams in
- ✅ Highlight final transcriptions when complete

### 2. Configure firewall (if needed)

Allow incoming connections on port 8766:

```bash
# Ubuntu/Debian
sudo ufw allow 8766/tcp

# CentOS/RHEL
sudo firewall-cmd --permanent --add-port=8766/tcp
sudo firewall-cmd --reload
```

### 3. Get the server's IP address

```bash
# Find your server's IP address
ip addr show | grep "inet " | grep -v 127.0.0.1
```

Use the address that follows `inet` on your LAN interface, without the `/24`-style prefix length, for example `192.168.1.100`.

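
As a cross-platform alternative to `ip addr`, the Python sketch below asks the operating system which local address it would use to route outbound traffic. The helper name `lan_ip` and the probe address `192.0.2.1` (a reserved documentation address) are illustrative choices; calling `connect()` on a UDP socket sends no packets, it only selects a route and source address. On machines with several interfaces, double-check that the result is the one your client can reach.

```python
import socket


def lan_ip() -> str:
    """Return the IPv4 address of the interface the OS would use for outbound traffic."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only selects a route and source address.
        s.connect(("192.0.2.1", 9))
        return s.getsockname()[0]
    except OSError:
        # No usable route (e.g. the machine is offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        s.close()


print(f"Clients should connect to: ws://{lan_ip()}:8766")
```
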
## Client Setup (Remote Machine)

### 1. Install dependencies on the client machine

Create a minimal Python environment:

```bash
# Create virtual environment
python3 -m venv asr-client
source asr-client/bin/activate

# Install only client dependencies
pip install websockets sounddevice numpy
```

On Linux, `sounddevice` also needs the PortAudio library (on Debian/Ubuntu: `sudo apt install libportaudio2`).

### 2. Copy the client script

Copy `client/mic_stream.py` to your client machine:

```bash
# On server machine
scp client/mic_stream.py user@client-machine:~/

# Or download it via your preferred method
```

### 3. List available microphones

```bash
python mic_stream.py --list-devices
```

Example output:

```
Available audio input devices:
--------------------------------------------------------------------------------
[0] Built-in Microphone
Channels: 2
Sample rate: 44100.0 Hz
[1] USB Microphone
Channels: 1
Sample rate: 48000.0 Hz
--------------------------------------------------------------------------------
```

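
A listing like the one above can be built from `sounddevice.query_devices()`, which returns one dictionary per device with keys such as `name`, `max_input_channels`, and `default_samplerate`. The sketch below is an assumption about how such a listing could be produced, not the code in `mic_stream.py`; the helper `format_input_devices` is hypothetical and takes the device list as an argument so it can be tried without audio hardware.

```python
def format_input_devices(devices) -> str:
    """Format capture-capable devices in the style of `--list-devices`."""
    rule = "-" * 80
    lines = ["Available audio input devices:", rule]
    # The position in the list is the device index accepted by --device.
    for index, dev in enumerate(devices):
        # Output-only devices (speakers, HDMI) report zero input channels; skip them.
        if dev["max_input_channels"] < 1:
            continue
        lines.append(f"[{index}] {dev['name']}")
        lines.append(f"    Channels: {dev['max_input_channels']}")
        lines.append(f"    Sample rate: {dev['default_samplerate']} Hz")
    lines.append(rule)
    return "\n".join(lines)
```

With real hardware you would call `print(format_input_devices(sounddevice.query_devices()))`.
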
### 4. Start streaming

```bash
python mic_stream.py --url ws://SERVER_IP:8766
```

Replace `SERVER_IP` with your server's IP address (e.g., `ws://192.168.1.100:8766`).

**Options:**

```bash
# Use a specific microphone device (index from --list-devices)
python mic_stream.py --url ws://192.168.1.100:8766 --device 1

# Change the sample rate if needed (the server runs at 16000 Hz)
python mic_stream.py --url ws://192.168.1.100:8766 --sample-rate 16000

# Adjust the chunk duration in seconds (default 0.1) to suit network latency
python mic_stream.py --url ws://192.168.1.100:8766 --chunk-duration 0.2
```

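
`--chunk-duration` trades latency against message overhead: each chunk carries `sample_rate × chunk_duration` samples of 16-bit audio, and the client sends `1 / chunk_duration` messages per second. The short Python sketch below (the helper `chunk_size` is hypothetical) works this out for the durations used in this guide, assuming 16 kHz mono int16 audio.

```python
def chunk_size(sample_rate_hz: int, chunk_duration_s: float, bytes_per_sample: int = 2) -> tuple[int, int]:
    """Return (samples, bytes) carried by one chunk of mono PCM audio."""
    samples = round(sample_rate_hz * chunk_duration_s)
    return samples, samples * bytes_per_sample


for duration in (0.05, 0.1, 0.2):
    samples, nbytes = chunk_size(16000, duration)
    print(f"--chunk-duration {duration}: {samples} samples, {nbytes} bytes, {1 / duration:g} messages/s")
# --chunk-duration 0.05: 800 samples, 1600 bytes, 20 messages/s
# --chunk-duration 0.1: 1600 samples, 3200 bytes, 10 messages/s
# --chunk-duration 0.2: 3200 samples, 6400 bytes, 5 messages/s
```

Smaller chunks reach the server sooner but multiply per-message overhead; larger chunks do the opposite.
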
## Usage Flow

### 1. Start Server

On the server machine:

```bash
cd /home/koko210Serve/parakeet-test
source venv/bin/activate
PYTHONPATH=/home/koko210Serve/parakeet-test python server/display_server.py
```

You'll see:

```
================================================================================
ASR Server - Live Transcription Display
================================================================================
Server: ws://0.0.0.0:8766
Sample Rate: 16000 Hz
Model: Parakeet TDT 0.6B V3
================================================================================

Server is running and ready for connections!
Waiting for clients...
```

### 2. Connect Client

On the client machine:

```bash
python mic_stream.py --url ws://192.168.1.100:8766
```

You'll see:

```
Connected to server: ws://192.168.1.100:8766
Recording started. Press Ctrl+C to stop.
```

### 3. Speak into Microphone

- Speak naturally into your microphone
- Watch the **server terminal** for real-time transcriptions
- Progressive updates appear in yellow as you speak
- Final transcriptions appear in green when you pause

### 4. Stop Streaming

Press `Ctrl+C` on the client to stop recording and disconnect.

## Display Color Coding

On the server display:

- **🟢 GREEN** = Final transcription (complete utterance)
- **🟡 YELLOW** = Progressive update (in progress, may still change)
- **🔵 BLUE** = Connection events
- **⚪ WHITE** = Server status messages

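
This color scheme maps directly onto standard ANSI terminal escape codes. Below is a hypothetical Python sketch of such a mapping; the event names and the `colorize` helper are illustrative and not taken from `display_server.py`.

```python
# Standard ANSI foreground color codes; RESET restores the terminal's default style.
COLORS = {
    "final": "\033[32m",       # green: final transcription
    "partial": "\033[33m",     # yellow: progressive update
    "connection": "\033[34m",  # blue: connect/disconnect events
    "status": "\033[37m",      # white: server status messages
}
RESET = "\033[0m"


def colorize(kind: str, text: str) -> str:
    """Wrap text in the ANSI color for the given event kind."""
    return f"{COLORS[kind]}{text}{RESET}"


print(colorize("partial", "Hello this is"))
print(colorize("final", "FINAL: Hello this is a test of the remote microphone streaming system."))
```
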
## Example Session

### Server Display:

```
================================================================================
✓ Client connected: 192.168.1.50:45232
================================================================================

[14:23:15] 192.168.1.50:45232
→ Hello this is

[14:23:17] 192.168.1.50:45232
→ Hello this is a test of the remote

[14:23:19] 192.168.1.50:45232
✓ FINAL: Hello this is a test of the remote microphone streaming system.

[14:23:25] 192.168.1.50:45232
→ Can you hear me

[14:23:27] 192.168.1.50:45232
✓ FINAL: Can you hear me clearly?

================================================================================
✗ Client disconnected: 192.168.1.50:45232
================================================================================
```

### Client Display:

```
Connected to server: ws://192.168.1.100:8766
Recording started. Press Ctrl+C to stop.

Server: Connected to ASR server with live display
[PARTIAL] Hello this is
[PARTIAL] Hello this is a test of the remote
[FINAL] Hello this is a test of the remote microphone streaming system.
[PARTIAL] Can you hear me
[FINAL] Can you hear me clearly?

^C
Stopped by user
Disconnected from server
Client stopped by user
```

## Network Considerations

### Bandwidth Usage

- Sample rate: 16000 Hz
- Bit depth: 16-bit (int16)
- Bandwidth: ~32 KB/s per client (16000 samples/s × 2 bytes, mono), about 256 kbit/s
- Low enough to work well over WiFi or a wired LAN

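
The ~32 KB/s figure follows directly from the audio format: sample rate × bytes per sample × channels. A quick Python check, assuming mono audio and counting only the raw PCM payload (WebSocket, TCP, and IP framing add a little on top); the helper name is illustrative:

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw PCM payload rate in bytes per second (excludes protocol overhead)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


rate = pcm_bytes_per_second(16000, 16)
print(f"{rate} bytes/s = {rate / 1000:g} KB/s = {rate * 8 / 1000:g} kbit/s")
# 32000 bytes/s = 32 KB/s = 256 kbit/s
```
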
### Latency

- Progressive updates: Every ~2 seconds
- Final transcription: When audio stops or on demand
- Total latency: ~2-3 seconds (network + processing)

### Multiple Clients

The server supports multiple simultaneous clients:

- Each client gets its own session
- Transcriptions are tagged with client IP:port
- No interference between clients

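
One simple way to keep sessions independent is to key all per-client state by the peer's `host:port`, the same tag the server display shows. The Python sketch below is a hypothetical illustration, not the server's actual implementation; `client_tag` and `SessionRegistry` are invented names. With the `websockets` library, the peer address is available as `websocket.remote_address`.

```python
def client_tag(remote_address) -> str:
    """Format a peer address tuple, e.g. ('192.168.1.50', 45232), as 'host:port'."""
    host, port = remote_address[0], remote_address[1]  # IPv6 tuples carry extra fields
    return f"{host}:{port}"


class SessionRegistry:
    """Hypothetical per-client audio buffers, so one client's audio never mixes with another's."""

    def __init__(self) -> None:
        self._buffers: dict[str, bytearray] = {}

    def open(self, tag: str) -> None:
        self._buffers[tag] = bytearray()

    def append(self, tag: str, chunk: bytes) -> None:
        self._buffers[tag].extend(chunk)

    def buffered_bytes(self, tag: str) -> int:
        return len(self._buffers.get(tag, b""))

    def close(self, tag: str) -> None:
        self._buffers.pop(tag, None)
```

In a connection handler you would call `open(client_tag(websocket.remote_address))` when a client connects and `close(...)` in a `finally` block when it disconnects.
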
## Troubleshooting

### Client Can't Connect

```
Error: [Errno 111] Connection refused
```

**Solution:**

1. Check that the server is running
2. Verify the firewall allows port 8766
3. Confirm the server IP address is correct
4. Test connectivity: `ping SERVER_IP` (this confirms the host is reachable, not that port 8766 is open)

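
Because `ping` only shows that the host answers ICMP, a more direct test is to attempt a real TCP connection to port 8766. A small standard-library Python sketch (the helper name `port_open` is illustrative):

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (nothing listening, or rejected by a firewall)
        # and timeouts (packets silently dropped, or a wrong IP address).
        return False
```

Run `port_open("192.168.1.100", 8766)` from the client machine: `True` means the server port is reachable, so any remaining problem lies in the client itself.
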
### No Audio Being Captured

Recording starts, but no transcriptions appear.

**Solution:**

1. Check microphone permissions
2. List devices: `python mic_stream.py --list-devices`
3. Try a different device: `--device N`
4. Test the microphone in other apps first

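
To rule out a silent or wrong input device before involving the server, you can record a few seconds locally and measure the signal level. The Python sketch below uses `sounddevice.rec()` with the same device index you would pass to `--device`; `rms_level`, `record_level`, and the 3-second default are illustrative assumptions. A level near zero while you are speaking usually points to a muted input, missing permissions, or the wrong device.

```python
import numpy as np


def rms_level(samples: np.ndarray) -> float:
    """RMS of int16 samples as a fraction of full scale (0.0 = silence, 1.0 = maximum)."""
    if samples.size == 0:
        return 0.0
    x = samples.astype(np.float64) / 32768.0
    return float(np.sqrt(np.mean(x ** 2)))


def record_level(seconds: float = 3.0, sample_rate: int = 16000, device=None) -> float:
    """Record from the given device index (None = system default) and return its RMS level."""
    import sounddevice as sd  # imported here so rms_level works without audio hardware

    audio = sd.rec(int(seconds * sample_rate), samplerate=sample_rate,
                   channels=1, dtype="int16", device=device)
    sd.wait()  # block until the recording is finished
    return rms_level(audio[:, 0])
```

For example, `print(f"{record_level(device=1):.4f}")` while speaking into the USB microphone from the listing above.
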
### Poor Transcription Quality

**Solution:**

1. Move closer to the microphone
2. Reduce background noise
3. Speak clearly and at a normal pace
4. Check microphone quality/settings

### High Latency

**Solution:**

1. Use a wired connection instead of WiFi
2. Reduce chunk duration: `--chunk-duration 0.05`
3. Check network latency: `ping SERVER_IP`

## Security Notes

⚠️ **Important:** This setup uses WebSocket without encryption (`ws://`), so audio travels over the network unencrypted.

For production use:

- Use WSS (WebSocket Secure) with TLS certificates
- Add authentication (API keys, tokens)
- Restrict firewall rules to specific IP ranges
- Consider using a VPN for remote access

## Advanced: Auto-start Server

Create a systemd service (Linux):

```bash
sudo nano /etc/systemd/system/asr-server.service
```

```ini
[Unit]
Description=ASR WebSocket Server
After=network.target

[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/home/koko210Serve/parakeet-test
Environment="PYTHONPATH=/home/koko210Serve/parakeet-test"
ExecStart=/home/koko210Serve/parakeet-test/venv/bin/python server/display_server.py
Restart=always

[Install]
WantedBy=multi-user.target
```

Reload systemd so it picks up the new unit file, then enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable asr-server
sudo systemctl start asr-server
sudo systemctl status asr-server
```

## Performance Tips

1. **Server:** Use a GPU for best performance (~100ms processing latency)
2. **Client:** Use a low chunk duration for responsiveness (0.1s default)
3. **Network:** A wired connection is preferred, but WiFi works fine
4. **Audio Quality:** A 16kHz sample rate matches what the server expects and is sufficient for speech

## Summary

- ✅ **Server displays transcriptions in real time**
- ✅ **Client sends audio from a remote microphone**
- ✅ **Progressive updates show live transcription**
- ✅ **Final results when speech pauses**
- ✅ **Multiple clients supported**
- ✅ **Low bandwidth, low latency**

Enjoy your remote ASR streaming system! 🎤 → 🌐 → 🖥️