
Chat Interface Feature Documentation

Overview

A new "Chat with LLM" tab has been added to the Miku bot Web UI, allowing you to chat directly with the language models with full streaming support (similar to ChatGPT).

Features

1. Model Selection

  • 💬 Text Model (Fast): Chat with the text-based LLM for quick conversations
  • 👁️ Vision Model (Images): Use the vision model to analyze and discuss images

2. System Prompt Options

  • Use Miku Personality: Attach the standard Miku personality system prompt
    • Text model: Gets the full Miku character prompt (same as query_llama)
    • Vision model: Gets a simplified Miku-themed image analysis prompt
  • Raw LLM (No Prompt): Chat directly with the base LLM without any personality
    • Great for testing raw model responses
    • No character constraints

3. Real-time Streaming

  • Messages stream in character by character, as in ChatGPT
  • Shows typing indicator while waiting for response
  • Smooth, responsive interface

4. Vision Model Support

  • Upload images when using the vision model
  • Image preview before sending
  • Analyze images with Miku's personality or raw vision capabilities

5. Chat Management

  • Clear chat history button
  • Timestamps on all messages
  • Color-coded messages (user vs assistant)
  • Auto-scroll to latest message
  • Keyboard shortcut: Ctrl+Enter to send messages

Technical Implementation

Backend (api.py)

New Endpoint: POST /chat/stream

# Accepts:
{
  "message": "Your chat message",
  "model_type": "text" | "vision",
  "use_system_prompt": true | false,
  "image_data": "base64_encoded_image" (optional, for vision model)
}

# Returns: Server-Sent Events (SSE) stream
data: {"content": "streamed text chunk"}
data: {"done": true}
data: {"error": "error message"}

Key Features:

  • Uses Server-Sent Events (SSE) for streaming
  • Supports both TEXT_MODEL and VISION_MODEL from globals
  • Dynamically switches system prompts based on configuration
  • Integrates with llama.cpp's streaming API
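
The actual implementation lives in bot/api.py; what follows is only a minimal sketch of the general shape, assuming a llama.cpp-style /completion endpoint that streams data: {...} lines. Prompt assembly and vision handling are elided, and names like event_stream are illustrative, not the bot's actual code:

import json
from typing import Optional

import aiohttp
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()
LLAMA_URL = "http://localhost:8080"  # assumption: llama.cpp server address


class ChatMessage(BaseModel):
    message: str
    model_type: str = "text"          # "text" or "vision"
    use_system_prompt: bool = True
    image_data: Optional[str] = None  # base64 image, vision model only


@app.post("/chat/stream")
async def chat_stream(msg: ChatMessage):
    async def event_stream():
        # Prompt assembly (Miku personality, vision handling) elided here.
        payload = {"prompt": msg.message, "stream": True}
        try:
            async with aiohttp.ClientSession() as session:
                async with session.post(f"{LLAMA_URL}/completion",
                                        json=payload) as resp:
                    # llama.cpp emits SSE lines of the form: data: {...}
                    async for raw in resp.content:
                        line = raw.decode("utf-8").strip()
                        if not line.startswith("data: "):
                            continue
                        chunk = json.loads(line[len("data: "):])
                        if chunk.get("content"):
                            yield "data: " + json.dumps(
                                {"content": chunk["content"]}) + "\n\n"
                        if chunk.get("stop"):
                            break
            yield 'data: {"done": true}\n\n'
        except Exception as exc:
            yield "data: " + json.dumps({"error": str(exc)}) + "\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")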

Frontend (index.html)

New Tab: "💬 Chat with LLM"

Located in the main navigation tabs (tab6)

Components:

  1. Configuration Panel

    • Radio buttons for model selection
    • Radio buttons for system prompt toggle
    • Image upload section (shows/hides based on model)
    • Clear chat history button
  2. Chat Messages Container

    • Scrollable message history
    • Animated message appearance
    • Typing indicator during streaming
    • Color-coded messages with timestamps
  3. Input Area

    • Multi-line text input
    • Send button with loading state
    • Keyboard shortcuts

JavaScript Functions:

  • sendChatMessage(): Handles message sending and streaming reception
  • toggleChatImageUpload(): Shows/hides image upload for vision model
  • addChatMessage(): Adds messages to chat display
  • showTypingIndicator() / hideTypingIndicator(): Typing animation
  • clearChatHistory(): Clears all messages
  • handleChatKeyPress(): Keyboard shortcuts

Usage Guide

Basic Text Chat with Miku

  1. Go to "💬 Chat with LLM" tab
  2. Ensure "💬 Text Model" is selected
  3. Ensure "Use Miku Personality" is selected
  4. Type your message and click "📤 Send" (or press Ctrl+Enter)
  5. Watch as Miku's response streams in real time!

Raw LLM Testing

  1. Select "💬 Text Model"
  2. Select "Raw LLM (No Prompt)"
  3. Chat directly with the base language model without personality constraints

Vision Model Chat

  1. Select "👁️ Vision Model"
  2. Click "Upload Image" and select an image
  3. Type a message about the image (e.g., "What do you see in this image?")
  4. Click "📤 Send"
  5. The vision model will analyze the image and respond

Vision Model with Miku Personality

  1. Select "👁️ Vision Model"
  2. Keep "Use Miku Personality" selected
  3. Upload an image
  4. Miku will analyze and comment on the image with her cheerful personality!

System Prompts

Text Model (with Miku personality)

Uses the same comprehensive system prompt as query_llama():

  • Full Miku character context
  • Current mood integration
  • Character consistency rules
  • Natural conversation guidelines

Vision Model (with Miku personality)

Simplified prompt optimized for image analysis:

You are Hatsune Miku analyzing an image. Describe what you see naturally 
and enthusiastically as Miku would. Be detailed but conversational. 
React to what you see with Miku's cheerful, playful personality.

No System Prompt

Both models respond without personality constraints when this option is selected.
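
As a rough illustration of how the toggle can map onto prompts, here is a hypothetical helper; select_system_prompt, build_miku_prompt, and VISION_MIKU_PROMPT are assumed names for this sketch, not the bot's actual identifiers:

from typing import Optional

# The simplified vision prompt quoted above.
VISION_MIKU_PROMPT = (
    "You are Hatsune Miku analyzing an image. Describe what you see "
    "naturally and enthusiastically as Miku would. Be detailed but "
    "conversational. React to what you see with Miku's cheerful, "
    "playful personality."
)


def build_miku_prompt() -> str:
    # Assumption: the full character prompt is read from a context file the
    # bot already ships; query_llama() assembles a richer version of this.
    with open("miku_prompt.txt", encoding="utf-8") as f:
        return f.read()


def select_system_prompt(model_type: str, use_system_prompt: bool) -> Optional[str]:
    """Return the system prompt for a request, or None for raw-LLM mode."""
    if not use_system_prompt:
        return None                # Raw LLM: no personality constraints
    if model_type == "vision":
        return VISION_MIKU_PROMPT  # simplified image-analysis prompt
    return build_miku_prompt()     # full character prompt, as in query_llama()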

Streaming Technology

The interface uses Server-Sent Events (SSE) for real-time streaming:

  • Backend sends chunked responses from llama.cpp
  • Frontend receives and displays chunks as they arrive
  • Smooth, ChatGPT-like experience
  • Works with both text and vision models

UI/UX Features

Message Styling

  • User messages: Green accent, right-aligned styling
  • Assistant messages: Blue accent, left-aligned styling
  • Error messages: Red accent with error icon
  • Fade-in animation: Smooth appearance for new messages

Responsive Design

  • Chat container scrolls automatically
  • Image preview for vision model
  • Loading states on buttons
  • Typing indicators
  • Custom scrollbar styling

Keyboard Shortcuts

  • Ctrl+Enter: Send message quickly
  • Tab: Navigate between input fields

Configuration Options

All settings are preserved during the chat session:

  • Model type (text/vision)
  • System prompt toggle (Miku/Raw)
  • Uploaded image (for vision model)

Settings do NOT persist after page refresh (fresh session each time).

Error Handling

The interface handles various errors gracefully:

  • Connection failures
  • Model errors
  • Invalid image files
  • Empty messages
  • Timeout issues

All errors are displayed in the chat with clear error messages.

Performance Considerations

Text Model

  • Fast responses (typically 1-3 seconds)
  • Streaming starts almost immediately
  • Low latency

Vision Model

  • Slower due to image processing
  • First token may take 3-10 seconds
  • Streaming continues once started
  • Image is sent inline as base64 (simple to handle, though ~33% larger than the raw bytes)
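
For scripting against the API, image_data is just the base64 encoding of the raw file bytes; a minimal sketch (the file name is illustrative):

import base64

with open("photo.jpg", "rb") as f:  # any JPEG/PNG/GIF
    image_data = base64.b64encode(f.read()).decode("ascii")
# image_data now goes in the request body's "image_data" field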

Development Notes

File Changes

  1. bot/api.py

    • Added from fastapi.responses import StreamingResponse
    • Added ChatMessage Pydantic model
    • Added POST /chat/stream endpoint with SSE support
  2. bot/static/index.html

    • Added tab6 button in navigation
    • Added complete chat interface HTML
    • Added CSS styles for chat messages and animations
    • Added JavaScript functions for chat functionality

Dependencies

  • Uses existing aiohttp for HTTP streaming
  • Uses existing globals.TEXT_MODEL and globals.VISION_MODEL
  • Uses existing globals.LLAMA_URL for llama.cpp connection
  • No new dependencies required!

Future Enhancements (Ideas)

Potential improvements for future versions:

  • Save/load chat sessions
  • Export chat history to file
  • Multi-user chat history (separate sessions per user)
  • Temperature and max_tokens controls
  • Model selection dropdown (if multiple models available)
  • Token count display
  • Voice input support
  • Markdown rendering in responses
  • Code syntax highlighting
  • Copy message button
  • Regenerate response button

Troubleshooting

"No response received from LLM"

  • Check if llama.cpp server is running
  • Verify LLAMA_URL in globals is correct
  • Check bot logs for connection errors

"Failed to read image file"

  • Ensure the image is in a supported format (JPEG, PNG, GIF)
  • Check file size (large images may cause issues)
  • Try a different image

Streaming not working

  • Check browser console for JavaScript errors
  • Verify SSE is not blocked by proxy/firewall
  • Try refreshing the page

Model not responding

  • Check if correct model is loaded in llama.cpp
  • Verify model type matches what's configured
  • Check llama.cpp logs for errors

API Reference

POST /chat/stream

Request Body:

{
  "message": "string",          // Required: User's message
  "model_type": "text|vision",  // Required: Which model to use
  "use_system_prompt": boolean, // Required: Whether to add system prompt
  "image_data": "string|null"   // Optional: Base64 image for vision model
}

Response:

Content-Type: text/event-stream

data: {"content": "Hello"}
data: {"content": " there"}
data: {"content": "!"}
data: {"done": true}

Error Response:

data: {"error": "Error message here"}

Conclusion

The Chat Interface provides a powerful, user-friendly way to:

  • Test LLM responses interactively
  • Experiment with different prompting strategies
  • Analyze images with vision models
  • Chat with Miku's personality in real time
  • Debug and understand model behavior

All with a smooth, modern streaming interface that feels like ChatGPT! 🎉