Embed Content Reading Feature
Overview
Miku can now read and understand embedded content in Discord messages, including articles, images, videos, and other rich media that Discord automatically embeds when a link is shared.
Supported Embed Types
1. Article Embeds (rich, article, link)
When you share a news article or blog post link, Discord automatically creates an embed with:
- Title - The article headline
- Description - A preview of the article content
- Author - The article author (if available)
- Images - Featured images or thumbnails
- Custom Fields - Additional metadata
Miku will:
- Extract and read the text content (title, description, fields)
- Analyze any embedded images
- Combine all this context to provide an informed response
2. Image Embeds
When links contain images that Discord auto-embeds:
- Miku downloads and analyzes the images using her vision model
- Provides descriptions and commentary based on what she sees
3. Video Embeds
For embedded videos from various platforms:
- Miku extracts multiple frames from the video
- Analyzes the visual content across frames
- Provides commentary on what's happening in the video
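One way to pick "multiple frames" is to sample evenly across the video's length. The sketch below only illustrates that idea; the function name `sample_frame_indices` and the default of 6 frames are assumptions, not taken from Miku's code.

```python
from __future__ import annotations


def sample_frame_indices(total_frames: int, num_samples: int = 6) -> list[int]:
    """Return up to num_samples frame indices spread evenly across a video.

    Hypothetical helper: the real bot may choose frames differently.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Short clip: every frame can be used.
        return list(range(total_frames))
    # Split the video into num_samples equal segments and take each midpoint.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


print(sample_frame_indices(100, 4))
```

Sampling segment midpoints avoids the first and last frames, which are often black or title cards.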
4. Tenor GIF Embeds (gifv)
Tenor GIFs were already supported and are now integrated with the new embed handling:
- Extracts frames from Tenor GIFs
- Analyzes the GIF content
- Provides playful responses about what's in the GIF
How It Works
Processing Flow
- Message Received - User sends a message with an embedded link
- Embed Detection - Miku detects the embed type
- Content Extraction:
  - Text content (title, description, fields, footer)
  - Image URLs from embed
  - Video URLs from embed
- Media Analysis:
  - Downloads and analyzes images with vision model
  - Extracts and analyzes video frames
- Context Building - Combines all extracted content
- Response Generation - Miku responds with full context awareness
Example Scenario
User: @Miku what do you think about this?
[Discord embeds article: "Bulgaria arrests mayor over €200,000 fine"]
Miku sees:
- Embedded title: "Bulgaria arrests mayor over €200,000 fine"
- Embedded description: "Town mayor Blagomir Kotsev charged with..."
- Embedded image: [analyzes photo of the mayor]
Miku responds with context-aware commentary about the news
Technical Implementation
New Functions
extract_embed_content(embed) - In utils/image_handling.py
- Extracts text from title, description, author, fields, footer
- Collects image URLs from embed.image and embed.thumbnail
- Collects video URLs from embed.video
- Returns structured dictionary with all content
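As a rough illustration of the behaviour listed above, here is a minimal sketch of such an extractor. It is not the actual code in utils/image_handling.py: the returned keys (`type`, `text`, `images`, `videos`) are assumptions, and the function uses duck typing so it can be tried with a stand-in object instead of a real Discord embed.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


def extract_embed_content(embed: Any) -> Dict[str, Any]:
    """Collect text, image URLs, and video URLs from an embed-like object."""
    text_parts: List[str] = []

    if getattr(embed, "title", None):
        text_parts.append(str(embed.title))
    if getattr(embed, "description", None):
        text_parts.append(str(embed.description))

    author_name = getattr(getattr(embed, "author", None), "name", None)
    if author_name:
        text_parts.append(f"Author: {author_name}")

    for field in getattr(embed, "fields", None) or []:
        name = getattr(field, "name", None)
        value = getattr(field, "value", None)
        if name and value:
            text_parts.append(f"{name}: {value}")

    footer_text = getattr(getattr(embed, "footer", None), "text", None)
    if footer_text:
        text_parts.append(str(footer_text))

    def url_of(attr: str) -> Optional[str]:
        # embed.image, embed.thumbnail, and embed.video each expose a .url
        return getattr(getattr(embed, attr, None), "url", None)

    return {
        "type": getattr(embed, "type", None),  # assumed key names
        "text": "\n".join(text_parts),
        "images": [u for u in (url_of("image"), url_of("thumbnail")) if u],
        "videos": [u for u in (url_of("video"),) if u],
    }


# Stand-in for a Discord embed, for demonstration only.
fake_embed = SimpleNamespace(
    type="article",
    title="Bulgaria arrests mayor over €200,000 fine",
    description="Town mayor Blagomir Kotsev charged with...",
    author=SimpleNamespace(name=None),
    fields=[],
    footer=SimpleNamespace(text=None),
    image=SimpleNamespace(url="https://example.com/photo.jpg"),
    thumbnail=SimpleNamespace(url=None),
    video=SimpleNamespace(url=None),
)
content = extract_embed_content(fake_embed)
print(content["text"])
print(content["images"])
```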
Modified Bot Logic
on_message() - In bot.py
- Checks for embeds in messages
- Processes different embed types:
  - gifv - Tenor GIFs (existing functionality)
  - rich, article, image, video, link - NEW comprehensive handling
- Builds enhanced context with embed content
- Passes context to LLM for informed responses
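The type check described above can be pictured as a small routing function. This is a sketch under assumptions: `classify_embed` and the route labels are hypothetical names, and the real on_message() may structure this differently.

```python
from typing import Optional

# Embed types that get the new comprehensive handling, as listed above.
GENERAL_EMBED_TYPES = {"rich", "article", "image", "video", "link"}


def classify_embed(embed_type: Optional[str]) -> str:
    """Map a Discord embed type to a handling route (hypothetical labels)."""
    if embed_type == "gifv":
        return "tenor_gif"      # existing Tenor GIF path
    if embed_type in GENERAL_EMBED_TYPES:
        return "general"        # new text + image + video extraction path
    return "ignored"            # unknown type: no embed processing


for t in ("gifv", "article", "poll_result"):
    print(t, "->", classify_embed(t))
```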
Context Format
[Embedded content: <title and description>]
[Embedded image shows: <vision analysis>]
[Embedded video shows: <vision analysis>]
User message: <user's actual message>
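A prompt in this format could be assembled roughly as follows. The helper name `build_embed_context` and its parameters are illustrative assumptions; only the bracketed line formats come from the description above.

```python
from typing import Iterable


def build_embed_context(
    user_message: str,
    embed_text: str = "",
    image_descriptions: Iterable[str] = (),
    video_descriptions: Iterable[str] = (),
) -> str:
    """Combine embed text and vision results with the user's message."""
    lines = []
    if embed_text:
        lines.append(f"[Embedded content: {embed_text}]")
    for description in image_descriptions:
        lines.append(f"[Embedded image shows: {description}]")
    for description in video_descriptions:
        lines.append(f"[Embedded video shows: {description}]")
    # The user's own words always come last.
    lines.append(f"User message: {user_message}")
    return "\n".join(lines)


print(build_embed_context(
    "what do you think about this?",
    embed_text="Bulgaria arrests mayor over €200,000 fine",
    image_descriptions=["a man in a suit outside a courthouse"],
))
```

Sections with no content are skipped, so a plain message without embeds reduces to the single "User message:" line.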
Logging
New log indicators:
- 📰 Processing {type} embed - Starting embed processing
- 🖼️ Processing image from embed: {url} - Analyzing embedded image
- 🎬 Processing video from embed: {url} - Analyzing embedded video
- 💬 Server embed response - Responding with embed context
- 💌 DM embed response - DM response with embed context
Supported Platforms
Any platform that Discord embeds should work:
- ✅ News sites (BBC, Reuters, etc.)
- ✅ Social media (Twitter/X embeds, Instagram, etc.)
- ✅ YouTube videos
- ✅ Blogs and Medium articles
- ✅ Image hosting sites
- ✅ Tenor GIFs
- ✅ Many other platforms with OpenGraph metadata
Limitations
- Embed text is truncated to 500 characters to keep context manageable
- Some platforms may block bot requests for media
- Very large videos may take time to process
- Paywalled content only shows the preview text Discord provides
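The 500-character cap could be applied with a helper like the one below. This is a sketch: the name `truncate_embed_text`, the trailing "..." marker, and counting that marker toward the limit are all assumptions about details the description above does not specify.

```python
MAX_EMBED_TEXT_CHARS = 500  # limit stated above


def truncate_embed_text(text: str, limit: int = MAX_EMBED_TEXT_CHARS) -> str:
    """Cut text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    # Reserve three characters for the ellipsis so the result fits the limit.
    return text[: limit - 3].rstrip() + "..."


long_text = "a" * 600
print(len(truncate_embed_text(long_text)))
```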
Server/DM Support
- ✅ Works in server channels
- ✅ Works in DMs
- Respects server-specific moods
- Uses DM mood for direct messages
- Logs DM interactions including embed content
Privacy
- Only processes embeds when Miku is addressed (@mentioned or in DMs)
- Respects blocked user list for DMs
- No storage of embed content beyond conversation history
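The first two rules above amount to a gate that runs before any embed is processed. A minimal sketch, assuming a hypothetical `should_process_embeds` helper that receives plain values instead of Discord objects:

```python
from typing import Set


def should_process_embeds(
    is_dm: bool,
    is_mentioned: bool,
    author_id: int,
    blocked_user_ids: Set[int],
) -> bool:
    """Decide whether embeds in a message should be read at all."""
    if is_dm:
        # DMs always address Miku, but blocked users are skipped.
        return author_id not in blocked_user_ids
    # In servers, embeds are only read when Miku is @mentioned.
    return is_mentioned


blocked = {42}
print(should_process_embeds(True, False, 42, blocked))   # blocked DM user
print(should_process_embeds(False, True, 7, blocked))    # server mention
```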
Future Enhancements
Potential improvements:
- Audio transcription from embedded audio/video
- PDF content extraction
- Twitter/X thread reading
- Better handling of code snippets in embeds
- Embed source credibility assessment