
# Error Handling System

## Overview

The Miku bot includes an error handling system that catches errors from the llama-swap containers (both NVIDIA and AMD), replies to the user with a friendly, in-character message, and notifies the bot administrator.

## Features

### 1. Error Detection
The system automatically detects various types of errors including:
- HTTP error codes (502, 500, 503, etc.)
- Connection errors (refused, timeout, failed)
- LLM server errors
- Timeout errors
- Generic error messages (for example, "Sorry, there was an error")

### 2. User-Friendly Responses
When an error is detected, Miku does not show a technical message such as "Error 502" or "Sorry, there was an error". Instead, Miku responds with:

> **"Someone tell Koko-nii there is a problem with my AI."**

This keeps Miku in character and provides a better user experience.

### 3. Administrator Notifications
When an error occurs, a webhook notification is automatically sent to Discord with:
- **Error Message**: The full error text from the container
- **Context Information**:
  - User who triggered the error
  - Channel/Server where the error occurred
  - User's prompt that caused the error
  - Exception type (if applicable)
  - Full traceback (if applicable)
- **Mention**: Automatically mentions Koko-nii for immediate attention

### 4. Conversation History Protection
Error messages are NOT saved to conversation history, preventing errors from polluting the context for future interactions.

## Implementation Details

### Files Modified

1. **`bot/utils/error_handler.py`** (NEW)
   - Core error detection and webhook notification logic
   - `is_error_response()`: Detects error messages using regex patterns
   - `handle_llm_error()`: Handles exceptions from the LLM
   - `handle_response_error()`: Handles error responses from the LLM
   - `send_error_webhook()`: Sends formatted error notifications

2. **`bot/utils/llm.py`**
   - Integrated error handling into `query_llama()` function
   - Catches all exceptions and HTTP errors
   - Filters responses to detect error messages
   - Prevents error messages from being saved to history
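
The integration in `query_llama()` can be pictured as a guard around the LLM call. The sketch below is an assumption about the control flow, not the actual implementation: `guarded_query` and its injected callables (`call_llm`, `notify`, `looks_like_error`) are hypothetical stand-ins for the real request code, `send_error_webhook()`, and `is_error_response()`.

```python
import asyncio

FALLBACK_MESSAGE = "Someone tell Koko-nii there is a problem with my AI."


async def guarded_query(call_llm, prompt, notify, looks_like_error):
    """Return (reply, ok). ok is False when the fallback message was substituted.

    call_llm, notify and looks_like_error are hypothetical injected callables.
    """
    try:
        reply = await call_llm(prompt)
    except Exception as exc:  # deliberately broad: any failure gets the fallback
        await notify(str(exc), prompt, exc)
        return FALLBACK_MESSAGE, False
    if looks_like_error(reply):
        # The call succeeded but the text itself is an error message.
        await notify(reply, prompt, None)
        return FALLBACK_MESSAGE, False
    return reply, True
```

A caller would append the exchange to conversation history only when `ok` is true, which is one way to keep error text out of future context.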

### Webhook URL
```
https://discord.com/api/webhooks/1462216811293708522/4kdGenpxZFsP0z3VBgebYENODKmcRrmEzoIwCN81jCirnAxuU2YvxGgwGCNBb6TInA9Z
```

## Error Detection Patterns

The system detects errors using the following patterns:
- `Error: XXX` or `Error XXX` (with HTTP status codes)
- `XXX Error` format
- "Sorry, there was an error"
- "Sorry, the response took too long"
- Connection-related errors (refused, timeout, failed)
- Server errors (service unavailable, internal server error, bad gateway)
- HTTP status codes >= 400
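
As an illustration of how such patterns might be expressed, here is a minimal sketch of a detector. The regular expressions and the `status_code` parameter are assumptions modeled on the list above; the real patterns in `bot/utils/error_handler.py` may differ.

```python
import re

# Hypothetical patterns modeled on the list above.
ERROR_PATTERNS = [
    r"^\s*error:?\s*\d{3}\b",  # "Error: 502" or "Error 502"
    r"^\s*\d{3}\s+error\b",    # "502 Error"
    r"sorry, there was an error",
    r"sorry, the response took too long",
    r"connection (refused|timed out|timeout|failed)",
    r"service unavailable|internal server error|bad gateway",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in ERROR_PATTERNS]


def is_error_response(text, status_code=None):
    """Return True if the status code or the response text looks like an error."""
    if status_code is not None and status_code >= 400:
        return True
    if not text:
        # Assumption: an empty reply is not classified as an error here.
        return False
    return any(p.search(text) for p in _COMPILED)
```

Anchoring the status-code patterns to the start of the message is a design choice in this sketch: it keeps an ordinary reply that merely contains the word "error" from being misclassified.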

## Coverage

The error handler is automatically applied to:
- ✅ Direct messages to Miku
- ✅ Server messages mentioning Miku
- ✅ Autonomous messages (general, engaging users, tweets)
- ✅ Conversation joining
- ✅ All responses using `query_llama()`
- ✅ Both NVIDIA and AMD GPU containers

## Testing

A test suite is included in `bot/test_error_handler.py` that validates the error detection logic with 26 test cases covering:
- Various error message formats
- Normal responses (should NOT be detected as errors)
- HTTP status codes
- Edge cases

Run tests with:
```bash
cd /home/koko210Serve/docker/miku-discord/bot
python test_error_handler.py
```

## Example Scenarios

### Scenario 1: llama-swap Container Down
- **User**: "Hi Miku!"
- **Without Error Handler**: "Error: 502"
- **With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
- **Webhook Notification**: Sent with full error details

### Scenario 2: Connection Timeout
- **User**: "Tell me a story"
- **Without Error Handler**: "Sorry, the response took too long. Please try again."
- **With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
- **Webhook Notification**: Sent with timeout exception details

### Scenario 3: LLM Server Error
- **User**: "How are you?"
- **Without Error Handler**: "Error: Internal server error"
- **With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
- **Webhook Notification**: Sent with HTTP 500 error details

## Benefits

1. **Better User Experience**: Users see a friendly, in-character message instead of technical errors
2. **Immediate Notifications**: The administrator is notified via Discord webhook as soon as an error occurs
3. **Detailed Context**: Full error information is provided for debugging
4. **Clean History**: Errors don't pollute conversation history
5. **Consistent Handling**: All error types are handled uniformly
6. **Container Agnostic**: Works with both NVIDIA and AMD containers

## Future Enhancements

Potential improvements:
- Add retry logic for transient errors
- Track error frequency to detect systemic issues
- Restart containers automatically if errors persist
- Categorize errors (transient vs. critical)
- Rate-limit webhook notifications to prevent spam
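
As one possible direction for the last item, notifications could be gated by a simple cooldown. This is a sketch of an enhancement that does not exist in the bot yet; `NotificationCooldown`, its 60-second default, and the injectable `clock` are all hypothetical.

```python
import time


class NotificationCooldown:
    """Allow at most one notification per min_interval seconds (hypothetical helper)."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._last_sent = None

    def allow(self):
        """Return True and record the send time if a notification may go out now."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self.min_interval:
            return False
        self._last_sent = now
        return True
```

The error handler would call `allow()` before sending a webhook and skip the send when it returns `False`, so a container outage produces one alert per interval instead of one per message.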