# Embed Content Reading Feature

## Overview

Miku can now read and understand embedded content in Discord messages, including articles, images, videos, and other rich media that Discord automatically embeds when a link is shared.

## Supported Embed Types

### 1. **Article Embeds** (`rich`, `article`, `link`)

When you share a news article or blog post link, Discord automatically creates an embed with:

- **Title** - The article headline
- **Description** - A preview of the article content
- **Author** - The article author (if available)
- **Images** - Featured images or thumbnails
- **Custom Fields** - Additional metadata

Miku will:

- Extract and read the text content (title, description, fields)
- Analyze any embedded images
- Combine all this context to provide an informed response

### 2. **Image Embeds**

When links contain images that Discord auto-embeds:

- Miku downloads and analyzes the images using her vision model
- Provides descriptions and commentary based on what she sees

### 3. **Video Embeds**

For embedded videos from various platforms:

- Miku extracts multiple frames from the video
- Analyzes the visual content across frames
- Provides commentary on what's happening in the video

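
The document does not say how the frames are chosen. One common approach is to sample evenly across the clip so that the beginning, middle, and end are all represented. The sketch below shows that idea only; `pick_frame_indices` and the `max_frames` default are hypothetical and are not taken from Miku's code.

```python
def pick_frame_indices(total_frames, max_frames=6):
    """Return up to max_frames frame indices spread evenly over a clip.

    Hypothetical helper: the real bot may choose frames differently.
    """
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short clip: every frame can be analyzed.
        return list(range(total_frames))
    # Split the clip into max_frames equal segments and take the
    # frame in the middle of each segment.
    step = total_frames / max_frames
    return [int(step * i + step / 2) for i in range(max_frames)]


if __name__ == "__main__":
    print(pick_frame_indices(100, max_frames=4))
```

Sampling from the middle of each segment avoids always picking the very first frame, which is often a black or title frame.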
### 4. **Tenor GIF Embeds** (`gifv`)

Already supported, and now integrated with the new embed handling:

- Extracts frames from Tenor GIFs
- Analyzes the GIF content
- Provides playful responses about what's in the GIF

## How It Works

### Processing Flow

1. **Message Received** - User sends a message with an embedded link
2. **Embed Detection** - Miku detects the embed type
3. **Content Extraction**:
   - Text content (title, description, fields, footer)
   - Image URLs from embed
   - Video URLs from embed
4. **Media Analysis**:
   - Downloads and analyzes images with vision model
   - Extracts and analyzes video frames
5. **Context Building** - Combines all extracted content
6. **Response Generation** - Miku responds with full context awareness

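
Steps 3 to 5 of the flow above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the bot's actual code: the embed is modeled as a plain dictionary, and `process_embed`, `analyze_image`, and `analyze_video` are hypothetical names. The analyzers are passed in as callables so the sketch runs without a vision model.

```python
def process_embed(embed, analyze_image, analyze_video):
    """Run extraction and media analysis for one embed (hypothetical sketch).

    embed is a plain dict standing in for a Discord embed; the two
    analyzers are callables that take a URL and return a description.
    """
    # Step 3: content extraction (text plus media URLs).
    text_parts = [embed.get("title"), embed.get("description")]
    text = " - ".join(part for part in text_parts if part)
    image_urls = embed.get("image_urls", [])
    video_urls = embed.get("video_urls", [])

    # Step 4: media analysis, delegated to the injected callables.
    image_notes = [analyze_image(url) for url in image_urls]
    video_notes = [analyze_video(url) for url in video_urls]

    # Step 5: context building, returned as one structure for the LLM step.
    return {"text": text, "image_notes": image_notes, "video_notes": video_notes}


if __name__ == "__main__":
    demo = {
        "title": "Example headline",
        "description": "Example preview text",
        "image_urls": ["https://example.com/photo.jpg"],
    }
    result = process_embed(
        demo,
        analyze_image=lambda url: f"image at {url}",
        analyze_video=lambda url: f"video at {url}",
    )
    print(result)
```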
### Example Scenario

```
User: @Miku what do you think about this?
[Discord embeds article: "Bulgaria arrests mayor over €200,000 fine"]

Miku sees:
- Embedded title: "Bulgaria arrests mayor over €200,000 fine"
- Embedded description: "Town mayor Blagomir Kotsev charged with..."
- Embedded image: [analyzes photo of the mayor]

Miku responds with context-aware commentary about the news
```

## Technical Implementation

### New Functions

**`extract_embed_content(embed)`** - In `utils/image_handling.py`

- Extracts text from title, description, author, fields, footer
- Collects image URLs from `embed.image` and `embed.thumbnail`
- Collects video URLs from `embed.video`
- Returns structured dictionary with all content

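
As a rough illustration of the behavior listed above, here is a minimal sketch of what such a function might look like. It is not the code in `utils/image_handling.py`: the returned keys (`text`, `images`, `videos`) and the text layout are assumptions, and the embed is read through `getattr` so that the example runs against a simple stand-in object rather than a live Discord embed.

```python
from types import SimpleNamespace


def _nested(obj, outer, inner):
    """Read obj.<outer>.<inner>, returning None if either part is missing."""
    return getattr(getattr(obj, outer, None), inner, None)


def extract_embed_content(embed):
    """Collect text, image URLs, and video URLs from an embed-like object.

    Sketch only: the dictionary keys are assumed, not taken from the bot.
    """
    text_parts = []

    # Title and description are plain attributes.
    for attr in ("title", "description"):
        value = getattr(embed, attr, None)
        if value:
            text_parts.append(str(value))

    # Author name sits on a nested object.
    author_name = _nested(embed, "author", "name")
    if author_name:
        text_parts.append(f"Author: {author_name}")

    # Custom fields are name/value pairs.
    for field in getattr(embed, "fields", None) or []:
        name = getattr(field, "name", None)
        value = getattr(field, "value", None)
        if name and value:
            text_parts.append(f"{name}: {value}")
        elif name or value:
            text_parts.append(str(name or value))

    footer_text = _nested(embed, "footer", "text")
    if footer_text:
        text_parts.append(str(footer_text))

    # Image URLs from embed.image and embed.thumbnail, without duplicates.
    images = []
    for slot in ("image", "thumbnail"):
        url = _nested(embed, slot, "url")
        if url and url not in images:
            images.append(url)

    # Video URL from embed.video.
    video_url = _nested(embed, "video", "url")
    videos = [video_url] if video_url else []

    return {"text": "\n".join(text_parts), "images": images, "videos": videos}


if __name__ == "__main__":
    # Stand-in object with made-up values, used only to exercise the sketch.
    fake_embed = SimpleNamespace(
        title="Example headline",
        description="Example preview text",
        image=SimpleNamespace(url="https://example.com/photo.jpg"),
    )
    print(extract_embed_content(fake_embed))
```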
### Modified Bot Logic

**`on_message()`** - In `bot.py`

- Checks for embeds in messages
- Processes different embed types:
  - `gifv` - Tenor GIFs (existing functionality)
  - `rich`, `article`, `image`, `video`, `link` - new comprehensive handling
- Builds enhanced context with embed content
- Passes context to LLM for informed responses

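
The type-based branching described above might be sketched as follows. The embed type strings come from the list above; the function name `classify_embed` and the route labels it returns are hypothetical and only illustrate the split between the existing Tenor path and the new handling.

```python
# Embed types routed to the new comprehensive handling (from the list above).
RICH_EMBED_TYPES = {"rich", "article", "image", "video", "link"}


def classify_embed(embed_type):
    """Map an embed type string to a processing route (hypothetical labels)."""
    if embed_type == "gifv":
        # Existing Tenor GIF path.
        return "tenor_gif"
    if embed_type in RICH_EMBED_TYPES:
        # New path: text extraction plus image/video analysis.
        return "rich_content"
    # Anything else is left alone in this sketch.
    return "ignored"


if __name__ == "__main__":
    for kind in ("gifv", "article", "unknown"):
        print(kind, "->", classify_embed(kind))
```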
### Context Format

```
[Embedded content: <title and description>]
[Embedded image shows: <vision analysis>]
[Embedded video shows: <vision analysis>]

User message: <user's actual message>
```

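
A small helper that assembles a prompt in this format could look like the sketch below. The bracketed prefixes and the `User message:` line follow the format shown above; the function name `build_embed_context` and its parameters are assumptions made for illustration.

```python
def build_embed_context(user_message, embed_text="", image_notes=(), video_notes=()):
    """Assemble the LLM context string in the documented format (sketch)."""
    lines = []
    if embed_text:
        lines.append(f"[Embedded content: {embed_text}]")
    for note in image_notes:
        lines.append(f"[Embedded image shows: {note}]")
    for note in video_notes:
        lines.append(f"[Embedded video shows: {note}]")
    if lines:
        # Blank line between the embed context and the user's own words.
        lines.append("")
    lines.append(f"User message: {user_message}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_embed_context(
            "what do you think about this?",
            embed_text="Example headline - Example preview text",
            image_notes=["a photo of a town hall"],
        )
    )
```

When a message has no embed content, the helper returns only the `User message:` line, so the same call can be used for every message.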
## Logging

New log indicators:

- `📰 Processing {type} embed` - Starting embed processing
- `🖼️ Processing image from embed: {url}` - Analyzing embedded image
- `🎬 Processing video from embed: {url}` - Analyzing embedded video
- `💬 Server embed response` - Responding with embed context
- `💌 DM embed response` - DM response with embed context

## Supported Platforms

Any platform whose links Discord embeds should work:

- ✅ News sites (BBC, Reuters, etc.)
- ✅ Social media (Twitter/X embeds, Instagram, etc.)
- ✅ YouTube videos
- ✅ Blogs and Medium articles
- ✅ Image hosting sites
- ✅ Tenor GIFs
- ✅ Many other platforms with OpenGraph metadata

## Limitations

- Embed text is truncated to 500 characters to keep context manageable
- Some platforms may block bot requests for media
- Very large videos may take time to process
- Paywalled content only shows the preview text Discord provides

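
The 500-character limit could be applied with a helper along these lines. Only the limit itself comes from the list above; the name `truncate_embed_text` and the choice to end shortened text with an ellipsis are assumptions.

```python
MAX_EMBED_TEXT_CHARS = 500  # limit stated in the Limitations list


def truncate_embed_text(text, limit=MAX_EMBED_TEXT_CHARS):
    """Shorten text to at most `limit` characters (sketch).

    Assumption: shortened text ends with an ellipsis so the model can
    tell the content was cut off.
    """
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis so the result fits the limit.
    return text[: limit - 1].rstrip() + "…"


if __name__ == "__main__":
    shortened = truncate_embed_text("word " * 200)
    print(len(shortened), shortened[-20:])
```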
## Server/DM Support

- ✅ Works in server channels
- ✅ Works in DMs
- Respects server-specific moods
- Uses DM mood for direct messages
- Logs DM interactions including embed content

## Privacy

- Only processes embeds when Miku is addressed (@mentioned or in DMs)
- Respects blocked user list for DMs
- No storage of embed content beyond conversation history

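
The first two rules above amount to a simple gate that runs before any embed is touched. The sketch below expresses that gate with plain values instead of Discord objects; `should_process_embeds` and its parameter names are hypothetical.

```python
def should_process_embeds(is_dm, mentions_bot, author_id, blocked_user_ids):
    """Decide whether embeds in a message may be processed (sketch).

    DMs always address the bot, so only the block list applies there.
    In servers, the bot must be @mentioned.
    """
    if is_dm:
        return author_id not in blocked_user_ids
    return mentions_bot


if __name__ == "__main__":
    blocked = {42}
    print(should_process_embeds(True, False, 42, blocked))   # blocked user in a DM
    print(should_process_embeds(False, True, 7, blocked))    # mention in a server
```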
## Future Enhancements

Potential improvements:

- Audio transcription from embedded audio/video
- PDF content extraction
- Twitter/X thread reading
- Better handling of code snippets in embeds
- Embed source credibility assessment