Remove all Ollama remnants and complete migration to llama.cpp
- Remove Ollama-specific files (Dockerfile.ollama, entrypoint.sh)
- Replace all query_ollama imports and calls with query_llama
- Remove langchain-ollama dependency from requirements.txt
- Update all utility files (autonomous, kindness, image_generation, etc.)
- Update README.md documentation references
- Maintain backward compatibility alias in llm.py
@@ -1,222 +0,0 @@
# Voice Chat Implementation with Fish.audio

## Overview

This document explains how to integrate the Fish.audio TTS API with the Miku Discord bot for voice channel conversations.

## Fish.audio API Setup

### 1. Get API Key

- Create an account at https://fish.audio/
- Get an API key from: https://fish.audio/app/api-keys/

### 2. Find Your Miku Voice Model ID

- Browse voices at https://fish.audio/
- Find your Miku voice model
- Copy the model ID from the URL (e.g., `8ef4a238714b45718ce04243307c57a7`)
- Or use the copy button on the voice page

## API Usage for Discord Voice Chat

### Basic TTS Request (REST API)

```python
import requests

def generate_speech(text: str, voice_id: str, api_key: str) -> bytes:
    """Generate speech using the Fish.audio TTS API."""
    url = "https://api.fish.audio/v1/tts"

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "model": "s1"  # Recommended model
    }

    payload = {
        "text": text,
        "reference_id": voice_id,  # Your Miku voice model ID
        "format": "mp3",           # or "pcm" for raw audio
        "latency": "balanced",     # Lower latency for real-time use
        "temperature": 0.9,        # Controls randomness (0-1)
        "normalize": True          # Reduces latency
    }

    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()  # Fail loudly on API errors
    return response.content      # Raw audio bytes
```

### Real-time Streaming (WebSocket - Recommended for VC)

```python
from fish_audio_sdk import WebSocketSession, TTSRequest

def stream_to_discord(text: str, voice_id: str, api_key: str):
    """Stream audio directly to a Discord voice channel."""
    ws_session = WebSocketSession(api_key)

    # Text generator (can stream text from LLM responses as they arrive)
    def text_stream():
        yield text

    with ws_session:
        for audio_chunk in ws_session.tts(
            TTSRequest(
                text="",               # Empty when streaming text separately
                reference_id=voice_id,
                format="pcm",          # Best for Discord
                sample_rate=48000      # Discord uses 48 kHz
            ),
            text_stream()
        ):
            # Forward each chunk to the Discord voice channel
            yield audio_chunk
```

### Async Streaming (Better for Discord.py)

```python
from fish_audio_sdk import AsyncWebSocketSession, TTSRequest

async def async_stream_speech(text: str, voice_id: str, api_key: str) -> bytes:
    """Async streaming for discord.py integration."""
    ws_session = AsyncWebSocketSession(api_key)

    async def text_stream():
        yield text

    async with ws_session:
        audio_buffer = bytearray()
        async for audio_chunk in ws_session.tts(
            TTSRequest(
                text="",
                reference_id=voice_id,
                format="pcm",
                sample_rate=48000
            ),
            text_stream()
        ):
            audio_buffer.extend(audio_chunk)

    return bytes(audio_buffer)
```

## Integration with Miku Bot

### Required Dependencies

Add to `requirements.txt`:

```
discord.py[voice]
PyNaCl
fish-audio-sdk
speech_recognition  # For STT
pydub               # Audio processing
```

### Environment Variables

Add to your `.env` or `docker-compose.yml`:

```bash
FISH_API_KEY=your_api_key_here
MIKU_VOICE_ID=your_miku_model_id_here
```

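These variables can be loaded once at bot startup; a minimal sketch using only the standard library (`require_voice_config` is a hypothetical helper, not part of the existing bot):

```python
import os

# Read the Fish.audio credentials from the environment at startup
FISH_API_KEY = os.getenv("FISH_API_KEY", "")
MIKU_VOICE_ID = os.getenv("MIKU_VOICE_ID", "")

def require_voice_config() -> None:
    """Fail fast before joining a voice channel if credentials are missing."""
    if not FISH_API_KEY or not MIKU_VOICE_ID:
        raise RuntimeError("FISH_API_KEY and MIKU_VOICE_ID must be set")
```
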
### Discord Voice Channel Flow

```
1. User speaks in VC
        ↓
2. Capture audio → Speech Recognition (STT)
        ↓
3. Convert speech to text
        ↓
4. Process with Miku's LLM (existing bot logic)
        ↓
5. Generate response text
        ↓
6. Send to Fish.audio TTS API
        ↓
7. Stream audio back to Discord VC
```

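The flow above can be sketched as a single coroutine per conversational turn; `transcribe`, `query_llm`, and `speak` are hypothetical placeholders for the STT, LLM, and Fish.audio pieces covered elsewhere in this document:

```python
import asyncio

# Hypothetical stand-ins for the pipeline stages in the flow above
async def transcribe(audio: bytes) -> str:   # steps 2-3: STT
    return "hello miku"                      # placeholder transcription

async def query_llm(prompt: str) -> str:     # steps 4-5: LLM response
    return f"Miku says: {prompt}!"           # placeholder reply

async def speak(text: str) -> bytes:         # step 6: TTS (Fish.audio)
    return text.encode("utf-8")              # placeholder audio bytes

async def voice_pipeline(user_audio: bytes) -> bytes:
    """Run one turn of the voice-chat loop: STT -> LLM -> TTS."""
    text = await transcribe(user_audio)
    reply = await query_llm(text)
    return await speak(reply)
```
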
## Key Implementation Details

### For Low Latency Voice Chat:

- Use WebSocket streaming instead of the REST API
- Set `latency: "balanced"` in requests
- Use `format: "pcm"` with `sample_rate: 48000` for Discord
- Stream LLM responses as they generate (don't wait for the full response)

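Streaming the LLM output can be as simple as feeding sentence-sized chunks to the TTS text stream so speech starts before the full reply exists; a minimal standard-library sketch (the sentence-boundary rule is an assumption):

```python
import re
from typing import Iterable, Iterator

def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed LLM tokens and yield them at sentence
    boundaries, so TTS can start speaking mid-generation."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush whenever the buffer ends a sentence
        if re.search(r"[.!?]\s*$", buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush any trailing fragment
```
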
### Audio Format for Discord:

- **Sample Rate**: 48000 Hz (Discord standard)
- **Channels**: 2 (stereo; discord.py's Opus encoder expects stereo, so upmix mono sources)
- **Format**: PCM (raw audio) or Opus (compressed)
- **Bit Depth**: 16-bit

### Cost Considerations:

- **TTS**: $15.00 per million UTF-8 bytes
- Example: ~$0.015 for 1000 characters
- Monitor usage at https://fish.audio/app/billing/

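The quoted rate makes per-message cost easy to estimate; note that non-ASCII text (e.g. Japanese) costs more per character because it uses more UTF-8 bytes:

```python
RATE_PER_MILLION_BYTES = 15.00  # USD, from the pricing above

def tts_cost(text: str) -> float:
    """Estimate TTS cost in USD from the UTF-8 byte length of the text."""
    return len(text.encode("utf-8")) * RATE_PER_MILLION_BYTES / 1_000_000

# 1000 ASCII characters -> 1000 bytes -> $0.015
```
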
### API Features Available:

- **Temperature** (0-1): Controls speech randomness/expressiveness
- **Prosody**: Controls speed and volume

```python
"prosody": {
    "speed": 1.0,  # 0.5-2.0 range
    "volume": 0    # -10 to 10 dB
}
```

- **Chunk Length** (100-300): Affects streaming speed
- **Normalize**: Reduces latency but may affect number/date pronunciation

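Combined with the REST payload shown earlier, prosody is just one more key in the request body; a sketch of a merged payload (the values are illustrative, and `chunk_length` is assumed to be the API's field name for the Chunk Length setting):

```python
def build_tts_payload(text: str, voice_id: str) -> dict:
    """Assemble a Fish.audio TTS request body with prosody controls."""
    return {
        "text": text,
        "reference_id": voice_id,
        "format": "pcm",
        "latency": "balanced",
        "temperature": 0.9,
        "chunk_length": 200,   # 100-300: affects streaming speed
        "normalize": True,
        "prosody": {
            "speed": 1.1,      # slightly faster than default
            "volume": 0        # no gain change
        },
    }
```
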
## Example: Integrate with Existing LLM

Note: `query_ollama` is kept as a backward-compatible alias for `query_llama` in `utils/llm.py`.

```python
from utils.llm import query_ollama
from fish_audio_sdk import AsyncWebSocketSession, TTSRequest

# `globals` below is the bot's existing shared-config module

async def miku_voice_response(user_message: str):
    """Generate Miku's response and convert it to speech."""

    # 1. Get text response from the existing LLM
    response_text = await query_ollama(
        prompt=user_message,
        model=globals.OLLAMA_MODEL
    )

    # 2. Convert to speech
    ws_session = AsyncWebSocketSession(globals.FISH_API_KEY)

    async def text_stream():
        # Can stream as the LLM generates, if needed
        yield response_text

    async with ws_session:
        async for audio_chunk in ws_session.tts(
            TTSRequest(
                text="",
                reference_id=globals.MIKU_VOICE_ID,
                format="pcm",
                sample_rate=48000
            ),
            text_stream()
        ):
            # Send to the Discord voice channel
            yield audio_chunk
```

## Rate Limits

Check the current rate limits at:
https://docs.fish.audio/developer-platform/models-pricing/pricing-and-rate-limits

## Additional Resources

- **API Reference**: https://docs.fish.audio/api-reference/introduction
- **Python SDK**: https://github.com/fishaudio/fish-audio-python
- **WebSocket Docs**: https://docs.fish.audio/sdk-reference/python/websocket
- **Discord Community**: https://discord.com/invite/dF9Db2Tt3Y
- **Support**: support@fish.audio

## Next Steps

1. Create a Fish.audio account and get an API key
2. Find/select a Miku voice model and get its ID
3. Install the required dependencies
4. Implement the voice channel connection in the bot
5. Add speech-to-text for user audio
6. Connect Fish.audio TTS to the output audio
7. Test latency and quality