# Voice Chat Implementation with Fish.audio

## Overview

This document explains how to integrate the Fish.audio TTS API with the Miku Discord bot for voice channel conversations.

## Fish.audio API Setup

### 1. Get API Key

- Create an account at https://fish.audio/
- Get an API key from: https://fish.audio/app/api-keys/

### 2. Find Your Miku Voice Model ID

- Browse voices at https://fish.audio/
- Find your Miku voice model
- Copy the model ID from the URL (e.g., `8ef4a238714b45718ce04243307c57a7`)
- Or use the copy button on the voice page

## API Usage for Discord Voice Chat

### Basic TTS Request (REST API)

```python
import requests

def generate_speech(text: str, voice_id: str, api_key: str) -> bytes:
    """Generate speech using the Fish.audio TTS API."""
    url = "https://api.fish.audio/v1/tts"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "model": "s1"  # Recommended model
    }
    payload = {
        "text": text,
        "reference_id": voice_id,  # Your Miku voice model ID
        "format": "mp3",           # or "pcm" for raw audio
        "latency": "balanced",     # Lower latency for real-time use
        "temperature": 0.9,        # Controls randomness (0-1)
        "normalize": True          # Text normalization (see notes below)
    }
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    return response.content  # Audio bytes
```

### Real-time Streaming (WebSocket - Recommended for VC)

```python
from fish_audio_sdk import WebSocketSession, TTSRequest

def stream_to_discord(text: str, voice_id: str, api_key: str):
    """Stream audio directly to a Discord voice channel."""
    ws_session = WebSocketSession(api_key)

    # Text generator (can stream from LLM responses)
    def text_stream():
        # You can yield text as it is generated by your LLM
        yield text

    with ws_session:
        for audio_chunk in ws_session.tts(
            TTSRequest(
                text="",               # Empty when streaming
                reference_id=voice_id,
                format="pcm",          # Best for Discord
                sample_rate=48000      # Discord uses 48 kHz
            ),
            text_stream()
        ):
            # Send audio_chunk to the Discord voice channel
            yield audio_chunk
```

### Async Streaming (Better for Discord.py)

```python
from fish_audio_sdk import AsyncWebSocketSession, TTSRequest

async def async_stream_speech(text: str, voice_id: str, api_key: str) -> bytes:
    """Async streaming for Discord.py integration."""
    ws_session = AsyncWebSocketSession(api_key)

    async def text_stream():
        yield text

    async with ws_session:
        audio_buffer = bytearray()
        async for audio_chunk in ws_session.tts(
            TTSRequest(
                text="",
                reference_id=voice_id,
                format="pcm",
                sample_rate=48000
            ),
            text_stream()
        ):
            audio_buffer.extend(audio_chunk)
        return bytes(audio_buffer)
```

## Integration with Miku Bot

### Required Dependencies

Add to `requirements.txt`:

```
discord.py[voice]
PyNaCl
fish-audio-sdk
SpeechRecognition  # For STT (imported as speech_recognition)
pydub              # Audio processing
```

### Environment Variables

Add to your `.env` or docker-compose.yml:

```bash
FISH_API_KEY=your_api_key_here
MIKU_VOICE_ID=your_miku_model_id_here
```

### Discord Voice Channel Flow

```
1. User speaks in VC
        ↓
2. Capture audio → Speech Recognition (STT)
        ↓
3. Convert speech to text
        ↓
4. Process with Miku's LLM (existing bot logic)
        ↓
5. Generate response text
        ↓
6. Send to Fish.audio TTS API
        ↓
7. Stream audio back to Discord VC
```

## Key Implementation Details

### For Low Latency Voice Chat:

- Use WebSocket streaming instead of the REST API
- Set `latency: "balanced"` in requests
- Use `format: "pcm"` with `sample_rate: 48000` for Discord
- Stream LLM responses as they generate (don't wait for the full response)

### Audio Format for Discord:

- **Sample Rate**: 48000 Hz (Discord standard)
- **Channels**: 1 (mono)
- **Format**: PCM (raw audio) or Opus (compressed)
- **Bit Depth**: 16-bit

### Cost Considerations:

- **TTS**: $15.00 per million UTF-8 bytes
  - Example: ~$0.015 for 1000 characters
- Monitor usage at https://fish.audio/app/billing/

### API Features Available:

- **Temperature** (0-1): Controls speech randomness/expressiveness
- **Prosody**: Controls speed and volume

  ```python
  "prosody": {
      "speed": 1.0,   # 0.5-2.0 range
      "volume": 0     # -10 to 10 dB
  }
  ```

- **Chunk Length** (100-300): Affects streaming speed
- **Normalize**: Reduces latency but may affect number/date pronunciation

## Example: Integrate with Existing LLM

```python
from utils.llm import query_ollama
from fish_audio_sdk import AsyncWebSocketSession, TTSRequest

async def miku_voice_response(user_message: str):
    """Generate Miku's response and convert it to speech."""
    # 1. Get the text response from the existing LLM
    response_text = await query_ollama(
        prompt=user_message,
        model=globals.OLLAMA_MODEL
    )

    # 2. Convert to speech
    ws_session = AsyncWebSocketSession(globals.FISH_API_KEY)

    async def text_stream():
        # Can stream as the LLM generates, if needed
        yield response_text

    async with ws_session:
        async for audio_chunk in ws_session.tts(
            TTSRequest(
                text="",
                reference_id=globals.MIKU_VOICE_ID,
                format="pcm",
                sample_rate=48000
            ),
            text_stream()
        ):
            # Send to the Discord voice channel
            yield audio_chunk
```

## Rate Limits

Check the current rate limits at:
https://docs.fish.audio/developer-platform/models-pricing/pricing-and-rate-limits

## Additional Resources

- **API Reference**: https://docs.fish.audio/api-reference/introduction
- **Python SDK**: https://github.com/fishaudio/fish-audio-python
- **WebSocket Docs**: https://docs.fish.audio/sdk-reference/python/websocket
- **Discord Community**: https://discord.com/invite/dF9Db2Tt3Y
- **Support**: support@fish.audio

## Next Steps

1. Create a Fish.audio account and get an API key
2. Find/select a Miku voice model and get its ID
3. Install the required dependencies
4. Implement the voice channel connection in the bot
5. Add speech-to-text for user audio
6. Connect Fish.audio TTS to the output audio
7. Test latency and quality
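## Appendix: Feeding PCM to Discord.py

One wrinkle for step 6: Fish.audio is configured above to emit mono PCM, but discord.py's `discord.PCMAudio` source expects 48 kHz 16-bit *stereo* PCM. A minimal sketch of the conversion, assuming the mono output really is 16-bit little-endian samples (the `voice_client` usage below is a hypothetical illustration, not tested against a live bot):

```python
import io

def mono_to_stereo(pcm_mono: bytes) -> bytes:
    """Duplicate each 16-bit mono sample onto both channels."""
    out = bytearray()
    for i in range(0, len(pcm_mono) - 1, 2):
        sample = pcm_mono[i:i + 2]
        out += sample + sample  # same sample on left and right
    return bytes(out)

# Usage sketch inside the bot, after collecting PCM from Fish.audio:
#   stereo = mono_to_stereo(audio_bytes)
#   voice_client.play(discord.PCMAudio(io.BytesIO(stereo)))
```

Duplicating samples is the simplest approach; `pydub` (already in the dependency list) can do the same resampling/channel conversion if you prefer.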
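## Appendix: Estimating TTS Cost

The pricing figures above ($15.00 per million UTF-8 bytes) are easy to sanity-check before sending long responses to the API. A small helper (the function name and constant are ours, not part of the Fish.audio SDK):

```python
PRICE_PER_MILLION_BYTES = 15.00  # USD, per the TTS pricing above

def estimate_tts_cost(text: str) -> float:
    """Estimate the Fish.audio TTS cost of a piece of text, in USD."""
    byte_count = len(text.encode("utf-8"))
    return byte_count / 1_000_000 * PRICE_PER_MILLION_BYTES

# 1000 ASCII characters = 1000 UTF-8 bytes, matching the ~$0.015 example
print(round(estimate_tts_cost("a" * 1000), 6))  # → 0.015
```

Note that non-ASCII text (e.g., Japanese) uses multiple bytes per character, so character count alone underestimates the cost.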