Compare commits


13 Commits

Author SHA1 Message Date
9eb081efb1 llama-swap: use pre-built images (:cuda, :rocm) with GPU-specific flags
- Drop custom Dockerfiles; docker-compose uses ghcr.io pre-built images
  which ship llama-swap + llama-server with no pinned versions (always latest)
- NVIDIA GTX 1660 (6GB): add -fit off --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
  to fix an OOM segfault caused by llama.cpp b9014's new GPU-side KV-cache default
- AMD RX 6800 (16GB): flags unchanged; KV cache stays on GPU for max speed
- Both running llama-swap v211 + llama.cpp b9014 (2026-05-05)
2026-05-05 16:53:34 +03:00
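
Rough arithmetic suggests why the 6GB card needed the KV-cache flags while the 16GB card did not. A minimal sketch, assuming Llama 3.1 8B's published GQA geometry (32 layers, 8 KV heads, head dim 128) and llama.cpp's q4_0 layout of 18 bytes per 32-element block — verify both against your build:

    # Back-of-envelope KV-cache sizing for the 16384-token contexts used here.
    def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128,
                       bytes_per_elem=2.0):
        # K and V each hold n_layers * n_ctx * n_kv_heads * head_dim elements
        return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

    GIB = 1024 ** 3
    f16 = kv_cache_bytes(16384)                          # default f16 cache
    q4 = kv_cache_bytes(16384, bytes_per_elem=18 / 32)   # q4_0: 18 B per 32 elems

    print(f"f16 KV cache:  {f16 / GIB:.2f} GiB")   # ~2.00 GiB
    print(f"q4_0 KV cache: {q4 / GIB:.2f} GiB")    # ~0.56 GiB

At f16 the cache alone would claim roughly 2 GiB of the 1660's 6GB on top of the ~4.5GB of Q4 model weights, which is consistent with the OOM; quantizing the cache shrinks it to about a quarter, and --no-kv-offload keeps it in system RAM rather than VRAM at some speed cost.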
4e28236b06 fix: preserve collapsible subsection state across polling re-renders
- Use stable section IDs (without Date.now()) so collapse state can be
  tracked across re-renders
- Snapshot collapsed state before innerHTML replacement, restore after
- Prevents the 10s polling from expanding all subsections every time
2026-05-02 16:17:26 +03:00
c5e49c73df fix: add cache-busting to prevent stale JS/CSS from breaking the UI
- Added ?v=20260502 query param to all <script src=...> and <link> tags
- Added Cache-Control: no-cache, no-store, must-revalidate to index route
- Added <meta> cache-control tags in HTML head for extra coverage
- This ensures the browser always fetches fresh HTML/JS/CSS after deploy,
  preventing the old loadLastPrompt() from running against new HTML
  (which would crash since #prompt-cat-info no longer exists)
2026-05-02 16:08:47 +03:00
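
For reference, a minimal sketch of the no-cache index route this commit describes, using FastAPI (it mirrors the routes/core.py hunk further down; the static path is illustrative):

    from fastapi import FastAPI
    from fastapi.responses import FileResponse

    app = FastAPI()

    @app.get("/")
    def read_index():
        # Never let the browser reuse a stale copy of the shell page; the
        # ?v=... query params then handle the JS/CSS assets it references.
        headers = {"Cache-Control": "no-cache, no-store, must-revalidate"}
        return FileResponse("static/index.html", headers=headers)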
393921e524 fix: add min-height to #prompt-display and placeholder text in clearPromptDisplay()
The empty #prompt-display div collapsed to 0 height, making it appear
'gone'. Added min-height: 3rem and a 'No prompt selected.' placeholder
that clearPromptDisplay() now sets via innerHTML.
2026-05-02 15:55:19 +03:00
2dd32d0ef1 fix: move <pre> outside #prompt-display to prevent innerHTML from destroying it
The renderPromptEntry() function sets innerHTML on #prompt-display, which
was wiping out the child <pre id="last-prompt"> element. This caused
copyPromptToClipboard() to fail silently and the display to appear empty.

Fix: keep <pre> as a hidden sibling outside #prompt-display, used only as
a text buffer for the copy function.
2026-05-02 15:45:54 +03:00
a980b90c0a fix: escape content in buildCollapsibleSection, avoid double-escaping response 2026-05-02 15:27:18 +03:00
6b922d84ae frontend: rewrite Last Prompt as Prompt History viewer
- status.js: replace loadLastPrompt() with loadPromptHistory() + helpers
  - fetch /prompts with optional source filter, populate dropdown
  - selectPromptEntry() renders metadata bar + collapsible subsections
  - parsePromptSections() splits full_prompt into System/Context/Conversation
  - buildCollapsibleSection() with toggle arrows (▼/▶)
  - copyPromptToClipboard() copies raw text
  - toggleMiddleTruncation() truncates response from middle
  - togglePromptHistoryCollapse() collapses entire section
  - legacy loadLastPrompt() delegates to loadPromptHistory()
- core.js: add promptInterval to polling (10s), visibility resume
  - update switchPromptSource() for 'all' filter + new button IDs
  - update initPromptSourceToggle() default to 'all'
  - declare promptInterval variable
2026-05-02 15:25:05 +03:00
f33e2afdf7 frontend: new Prompt History section HTML + CSS
- Replace single <pre> Last Prompt with rich Prompt History viewer
- Add source filter buttons (All/Cat/Fallback), history dropdown selector
- Add metadata bar, copy-to-clipboard button, middle-truncation toggle
- Add collapsible section CSS classes for expandable subsections
2026-05-02 15:19:10 +03:00
87de8f8b3a backend: replace LAST_FULL_PROMPT/LAST_CAT_INTERACTION with unified PROMPT_HISTORY deque
- globals.py: add collections.deque(maxlen=10) PROMPT_HISTORY with _prompt_id_counter
- globals.py: add legacy accessor functions _get_last_fallback_prompt() and _get_last_cat_interaction()
- bot.py: append to PROMPT_HISTORY instead of setting LAST_CAT_INTERACTION, remove 500-char truncation, add guild/channel/model fields
- image_handling.py: same pattern for Cat media responses
- llm.py: append fallback prompts to PROMPT_HISTORY with response filled after LLM reply
- routes/core.py: new GET /prompts and GET /prompts/{id} endpoints, legacy /prompt and /prompt/cat use accessor functions
2026-05-02 15:17:15 +03:00
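
The pattern behind this commit, as a self-contained sketch: a bounded deque plus a monotonic ID counter, with a newest-first legacy accessor. record_prompt() is a hypothetical helper for illustration only — the real code inlines the append at each call site:

    from collections import deque
    from datetime import datetime

    PROMPT_HISTORY = deque(maxlen=10)  # oldest entries fall off automatically
    _prompt_id_counter = 0

    def record_prompt(source, full_prompt, response="", **meta):
        # Hypothetical helper — bot.py/llm.py inline this at each call site.
        global _prompt_id_counter
        _prompt_id_counter += 1
        entry = {"id": _prompt_id_counter, "source": source,
                 "full_prompt": full_prompt, "response": response,
                 "timestamp": datetime.now().isoformat(), **meta}
        PROMPT_HISTORY.append(entry)
        return entry

    def _get_last_fallback_prompt():
        for entry in reversed(PROMPT_HISTORY):  # newest first
            if entry.get("source") == "fallback":
                return entry.get("full_prompt", "")
        return ""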
2d0c80b7ef fix: prevent infinite dialogue loops + make Evil Miku actually engage
- Question override now decays after 6 turns: after turn 6, the LLM's own
  [CONTINUE] signal is respected even when questions are asked. This prevents
  infinite question-ping-pong where both personas keep asking questions.
- _parse_response now accepts turn_count parameter; generate_response_with_continuation
  and handle_dialogue_turn pass it through.
- Rewrote Evil Miku's conversation-mode overlay with explicit CRITICAL RULES:
  ANSWER questions, engage with what she says, ask questions too, don't just
  repeat dismissive one-liners. The old overlay said 'be playful-cruel' but
  didn't actually tell her to participate in the conversation.
2026-04-30 15:39:53 +03:00
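
The decay rule in isolation, as a minimal sketch (names are illustrative; the real logic lives inside _parse_response, shown in the persona_dialogue.py diff below):

    def apply_question_override(response_text, should_continue, turn_count):
        # Honor the question override only for the first six turns; afterwards
        # trust the model's own [CONTINUE] signal so dialogues can wind down.
        if '?' in response_text and turn_count <= 6:
            return True
        return should_continue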
17842f24d4 fix: remove broken personality snippet system — now redundant
The snippet loader used wrong file paths (/app/cat/data/ instead of persona/)
causing 'Loaded 0 personality snippets' for both personas. Since the previous
commit now injects full system prompts (get_miku_system_prompt_compact and
get_evil_system_prompt) into every argument exchange, the snippet system is
redundant — all lore/lyrics/personality are already provided by the system prompts.
2026-04-30 15:16:43 +03:00
4e064ad89b fix: import is_persona_dialogue_active from correct module
Was importing from utils.bipolar_mode instead of utils.persona_dialogue
2026-04-30 15:10:13 +03:00
97c7133fdc fix: both personas now use full system prompts in arguments and dialogues
Created get_miku_system_prompt() and get_miku_system_prompt_compact() in
context_manager.py — mirrors get_evil_system_prompt() so both personas have
equally rich prompts with lore, lyrics, mood integration, and personality.

Previously only Evil Miku had a proper system prompt function. Regular Miku's
arguments and dialogues used a bare-bones hardcoded prompt with no lore/lyrics
— making arguments feel flat compared to normal conversation.

Changes:
- context_manager.py: added get_miku_system_prompt() (full) and
  get_miku_system_prompt_compact() (lore + personality; lyrics omitted to save tokens)
- bipolar_mode.py: both argument prompt functions now accept system_prompt
  param; run_argument() builds miku_system and evil_system once and passes
  them to every exchange
- persona_dialogue.py: dialogue prompts now use get_miku_system_prompt_compact()
  instead of hardcoded stub, matching Evil Miku's full prompt approach
- Removed redundant hardcoded personality text from argument prompts since
  the system prompts now provide it
2026-04-30 15:07:55 +03:00
16 changed files with 719 additions and 344 deletions

View File

@@ -1,13 +0,0 @@
FROM ghcr.io/mostlygeek/llama-swap:cuda
USER root
# Download and install llama-server binary (CUDA version)
# Using the official pre-built binary from llama.cpp releases
ADD --chmod=755 https://github.com/ggml-org/llama.cpp/releases/download/b4183/llama-server-cuda /usr/local/bin/llama-server
# Verify it's executable
RUN llama-server --version || echo "llama-server installed successfully"
USER 1000:1000

View File

@@ -1,68 +0,0 @@
# Multi-stage build for llama-swap with ROCm support
# Now using official llama.cpp ROCm image (PR #18439 merged Dec 29, 2025)
# Stage 1: Build llama-swap UI
FROM node:22-alpine AS ui-builder
WORKDIR /build
# Install git
RUN apk add --no-cache git
# Clone llama-swap
RUN git clone https://github.com/mostlygeek/llama-swap.git
# Build UI (now in ui-svelte directory)
WORKDIR /build/llama-swap/ui-svelte
RUN npm install && npm run build
# Stage 2: Build llama-swap binary
FROM golang:1.23-alpine AS swap-builder
WORKDIR /build
# Install git
RUN apk add --no-cache git
# Copy llama-swap source with built UI
COPY --from=ui-builder /build/llama-swap /build/llama-swap
# Build llama-swap binary
WORKDIR /build/llama-swap
RUN GOTOOLCHAIN=auto go build -o /build/llama-swap-binary .
# Stage 3: Final runtime image using official llama.cpp ROCm image
FROM ghcr.io/ggml-org/llama.cpp:server-rocm
WORKDIR /app
# Copy llama-swap binary from builder
COPY --from=swap-builder /build/llama-swap-binary /app/llama-swap
# Make binaries executable
RUN chmod +x /app/llama-swap
# Add existing ubuntu user (UID 1000) to GPU access groups (using host GIDs)
# GID 187 = render group on host, GID 989 = video/kfd group on host
RUN groupadd -g 187 hostrender && \
groupadd -g 989 hostvideo && \
usermod -aG hostrender,hostvideo ubuntu && \
chown -R ubuntu:ubuntu /app
# Set environment for ROCm (RX 6800 is gfx1030)
ENV HSA_OVERRIDE_GFX_VERSION=10.3.0
ENV ROCM_PATH=/opt/rocm
ENV HIP_VISIBLE_DEVICES=0
USER ubuntu
# Expose port
EXPOSE 8080
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
CMD curl -f http://localhost:8080/health || exit 1
# Override the base image's ENTRYPOINT and run llama-swap
ENTRYPOINT []
CMD ["/app/llama-swap", "-config", "/app/config.yaml", "-listen", "0.0.0.0:8080"]

View File

@@ -207,9 +207,8 @@ async def on_message(message):
 # AND roll for random argument trigger (both non-blocking background tasks)
 if not isinstance(message.channel, discord.DMChannel) and globals.BIPOLAR_MODE:
     try:
-        from utils.persona_dialogue import check_for_interjection
+        from utils.persona_dialogue import check_for_interjection, is_persona_dialogue_active as dialogue_active
         from utils.bipolar_mode import maybe_trigger_argument, is_argument_in_progress as arg_in_progress
-        from utils.bipolar_mode import is_persona_dialogue_active as dialogue_active
         from utils.task_tracker import create_tracked_task
         # Check interjection on user messages (opposite of current active persona)
@@ -361,15 +360,24 @@ async def on_message(message):
                 if globals.EVIL_MODE:
                     effective_mood = f"EVIL:{getattr(globals, 'EVIL_DM_MOOD', 'evil_neutral')}"
                 logger.info(f"🐱 Cat response for {author_name} (mood: {effective_mood})")
-                # Track Cat interaction for Web UI Last Prompt view
+                # Track Cat interaction in unified prompt history
                 import datetime
-                globals.LAST_CAT_INTERACTION = {
+                globals._prompt_id_counter += 1
+                guild_name = message.guild.name if message.guild else "DM"
+                channel_name = message.channel.name if message.guild else "DM"
+                globals.PROMPT_HISTORY.append({
+                    "id": globals._prompt_id_counter,
+                    "source": "cat",
                     "full_prompt": cat_full_prompt,
-                    "response": response[:500] if response else "",
+                    "response": response if response else "",
                     "user": author_name,
                     "mood": effective_mood,
+                    "guild": guild_name,
+                    "channel": channel_name,
                     "timestamp": datetime.datetime.now().isoformat(),
-                }
+                    "model": "Cat LLM",
+                    "response_type": response_type,
+                })
             except Exception as e:
                 logger.warning(f"🐱 Cat pipeline error, falling back to query_llama: {e}")
                 response = None

View File

@@ -1,6 +1,7 @@
 # globals.py
 import os
 import discord
+from collections import deque
 from apscheduler.schedulers.asyncio import AsyncIOScheduler
 scheduler = AsyncIOScheduler()
@@ -77,16 +78,25 @@ MIKU_NORMAL_AVATAR_URL = None # Cached CDN URL of the regular Miku pfp (valid e
 BOT_USER = None
-LAST_FULL_PROMPT = ""
-# Cheshire Cat last interaction tracking (for Web UI Last Prompt toggle)
-LAST_CAT_INTERACTION = {
-    "full_prompt": "",
-    "response": "",
-    "user": "",
-    "mood": "",
-    "timestamp": "",
-}
+# Unified prompt history (replaces LAST_FULL_PROMPT and LAST_CAT_INTERACTION)
+# Each entry: {id, source, full_prompt, response, user, mood, guild, channel,
+#              timestamp, model, response_type}
+PROMPT_HISTORY = deque(maxlen=10)
+_prompt_id_counter = 0
+# Legacy accessors for backward compatibility (routes, CLI, etc.)
+# These are computed properties that read from PROMPT_HISTORY
+def _get_last_fallback_prompt():
+    for entry in reversed(PROMPT_HISTORY):
+        if entry.get("source") == "fallback":
+            return entry.get("full_prompt", "")
+    return ""
+def _get_last_cat_interaction():
+    for entry in reversed(PROMPT_HISTORY):
+        if entry.get("source") == "cat":
+            return entry
+    return {"full_prompt": "", "response": "", "user": "", "mood": "", "timestamp": ""}
# Persona Dialogue System (conversations between Miku and Evil Miku)
LAST_PERSONA_DIALOGUE_TIME = 0 # Timestamp of last dialogue for cooldown

View File

@@ -14,7 +14,8 @@ router = APIRouter()
@router.get("/")
def read_index():
return FileResponse("static/index.html")
headers = {"Cache-Control": "no-cache, no-store, must-revalidate"}
return FileResponse("static/index.html", headers=headers)
@router.get("/logs")
@@ -31,18 +32,45 @@ def get_logs():
@router.get("/prompt")
def get_last_prompt():
return {"prompt": globals.LAST_FULL_PROMPT or "No prompt has been issued yet."}
"""Legacy endpoint: returns the most recent fallback prompt (backward compat)."""
prompt_text = globals._get_last_fallback_prompt()
return {"prompt": prompt_text or "No prompt has been issued yet."}
@router.get("/prompt/cat")
def get_last_cat_prompt():
"""Get the last Cheshire Cat interaction (full prompt + response) for Web UI."""
interaction = globals.LAST_CAT_INTERACTION
"""Legacy endpoint: returns the most recent Cat interaction (backward compat)."""
interaction = globals._get_last_cat_interaction()
if not interaction.get("full_prompt"):
return {"full_prompt": "No Cheshire Cat interaction has occurred yet.", "response": "", "user": "", "mood": "", "timestamp": ""}
return {"full_prompt": "No Cheshire Cat interaction has occurred yet.",
"response": "", "user": "", "mood": "", "timestamp": ""}
return interaction
@router.get("/prompts")
def get_prompt_history(source: str = None):
"""
Return the unified prompt history.
Optional query param ?source=cat or ?source=fallback to filter.
"""
history = list(globals.PROMPT_HISTORY)
if source and source in ("cat", "fallback"):
history = [e for e in history if e.get("source") == source]
return {"history": history}
@router.get("/prompts/{prompt_id}")
def get_prompt_by_id(prompt_id: int):
"""Return a single prompt history entry by ID."""
for entry in globals.PROMPT_HISTORY:
if entry.get("id") == prompt_id:
return entry
return JSONResponse(
status_code=404,
content={"status": "error", "message": f"Prompt #{prompt_id} not found"}
)
@router.get("/status")
def status():
# Get per-server mood summary
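
A quick client-side check of the new endpoints might look like this — a sketch assuming the panel is reachable on localhost:8080 (adjust host/port to your deployment):

    import requests

    BASE = "http://localhost:8080"  # assumed address — not confirmed by the diff

    # List history, optionally filtered by source ("cat" or "fallback")
    history = requests.get(f"{BASE}/prompts", params={"source": "cat"}).json()
    for entry in history["history"]:
        print(entry["id"], entry["source"], entry.get("user"), entry.get("timestamp"))

    # Fetch a single entry; an unknown id yields a 404 with a JSON error body
    r = requests.get(f"{BASE}/prompts/3")
    print(r.status_code, r.json())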

View File

@@ -441,6 +441,51 @@ h1, h3 {
color: #ddd;
}
/* Prompt History Section */
#prompt-history-section.collapsed #prompt-history-body {
display: none;
}
#prompt-history-toggle {
user-select: none;
transition: color 0.2s;
}
#prompt-history-toggle:hover {
color: #4CAF50;
}
#prompt-metadata span {
white-space: nowrap;
}
#prompt-metadata .prompt-meta-label {
color: #666;
}
#prompt-metadata .prompt-meta-value {
color: #ccc;
}
#prompt-display pre {
margin: 0;
}
.prompt-subsection-header {
cursor: pointer;
user-select: none;
padding: 0.3rem 0.5rem;
border-radius: 4px;
background: #2a2a2a;
margin: 0.5rem 0 0.25rem 0;
font-size: 0.82rem;
color: #aaa;
transition: background 0.15s;
}
.prompt-subsection-header:hover {
background: #333;
color: #ddd;
}
.prompt-subsection-body.collapsed {
display: none;
}
#prompt-truncate-toggle {
accent-color: #4CAF50;
}
/* Mood Activities Editor */
.act-mood-row {
margin-bottom: 0.5rem;

View File

@@ -3,10 +3,13 @@
 <head>
     <meta charset="UTF-8">
     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
+    <meta http-equiv="Pragma" content="no-cache">
+    <meta http-equiv="Expires" content="0">
     <title>Miku Control Panel</title>
     <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/cropperjs/1.6.2/cropper.min.css">
     <script src="https://cdnjs.cloudflare.com/ajax/libs/cropperjs/1.6.2/cropper.min.js"></script>
-    <link rel="stylesheet" href="/static/css/style.css">
+    <link rel="stylesheet" href="/static/css/style.css?v=20260502">
 </head>
<body>
@@ -543,23 +546,53 @@
     </div>
 </div>
-<div class="section">
-    <h3>Last Prompt</h3>
-    <div style="margin-bottom: 0.75rem; display: flex; align-items: center; gap: 0.75rem;">
+<div class="section" id="prompt-history-section">
+    <div class="prompt-history-header" style="display: flex; align-items: center; justify-content: space-between; margin-bottom: 0.5rem;">
+        <h3 style="margin: 0; cursor: pointer;" onclick="togglePromptHistoryCollapse()" id="prompt-history-toggle">
+            ▼ Prompt History
+        </h3>
+        <button onclick="loadPromptHistory()" title="Refresh" style="background: none; border: 1px solid #444; color: #aaa; cursor: pointer; padding: 0.2rem 0.5rem; border-radius: 4px; font-size: 0.85rem;">🔄</button>
+    </div>
+    <div id="prompt-history-body">
+        <!-- Source filter + history selector row -->
+        <div style="margin-bottom: 0.75rem; display: flex; align-items: center; gap: 0.75rem; flex-wrap: wrap;">
         <label style="font-size: 0.9rem; color: #aaa;">Source:</label>
         <div style="display: inline-flex; border-radius: 6px; overflow: hidden; border: 1px solid #444;">
-            <button id="prompt-src-cat" class="prompt-source-btn active" onclick="switchPromptSource('cat')"
-                    style="padding: 0.4rem 1rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
-                🐱 Cheshire Cat
+            <button id="prompt-src-all" class="prompt-source-btn active" onclick="switchPromptSource('all')"
+                    style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+                All
             </button>
+            <button id="prompt-src-cat" class="prompt-source-btn" onclick="switchPromptSource('cat')"
+                    style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+                🐱 Cat
+            </button>
             <button id="prompt-src-fallback" class="prompt-source-btn" onclick="switchPromptSource('fallback')"
-                    style="padding: 0.4rem 1rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
-                🤖 Bot Fallback
+                    style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+                🤖 Fallback
             </button>
         </div>
+        <select id="prompt-history-select" onchange="selectPromptEntry(this.value)" style="background: #2a2a2a; color: #ddd; border: 1px solid #444; padding: 0.35rem 0.5rem; border-radius: 4px; font-size: 0.85rem; min-width: 280px;">
+            <option value="">-- No prompts yet --</option>
+        </select>
         </div>
+        <!-- Metadata bar -->
+        <div id="prompt-metadata" style="margin-bottom: 0.5rem; font-size: 0.82rem; color: #888; display: flex; flex-wrap: wrap; gap: 0.3rem 1rem;"></div>
+        <!-- Toolbar: copy + truncate toggle -->
+        <div style="margin-bottom: 0.5rem; display: flex; align-items: center; gap: 1rem;">
+            <button onclick="copyPromptToClipboard()" title="Copy full prompt to clipboard" style="background: #333; border: 1px solid #555; color: #aaa; cursor: pointer; padding: 0.25rem 0.6rem; border-radius: 4px; font-size: 0.8rem;">📋 Copy</button>
+            <label style="font-size: 0.82rem; color: #aaa; cursor: pointer; display: flex; align-items: center; gap: 0.3rem;">
+                <input type="checkbox" id="prompt-truncate-toggle" onchange="toggleMiddleTruncation()">
+                Truncate from middle
+            </label>
+        </div>
+        <!-- Prompt display subsections -->
+        <div id="prompt-display" style="max-height: 60vh; overflow-y: auto; min-height: 3rem;"></div>
+        <!-- Hidden buffer for copy-to-clipboard raw text -->
+        <pre id="last-prompt" style="display: none;"></pre>
+    </div>
-    <div id="prompt-cat-info" style="margin-bottom: 0.5rem; font-size: 0.85rem; color: #aaa;"></div>
-    <pre id="last-prompt" style="white-space: pre-wrap; word-break: break-word;"></pre>
 </div>
 </div>
@@ -1339,15 +1372,15 @@
</div>
</div>
<script src="/static/js/core.js"></script>
<script src="/static/js/servers.js"></script>
<script src="/static/js/modes.js"></script>
<script src="/static/js/actions.js"></script>
<script src="/static/js/image-gen.js"></script>
<script src="/static/js/status.js"></script>
<script src="/static/js/dm.js"></script>
<script src="/static/js/chat.js"></script>
<script src="/static/js/memories.js"></script>
<script src="/static/js/profile.js"></script>
<script src="/static/js/core.js?v=20260502"></script>
<script src="/static/js/servers.js?v=20260502"></script>
<script src="/static/js/modes.js?v=20260502"></script>
<script src="/static/js/actions.js?v=20260502"></script>
<script src="/static/js/image-gen.js?v=20260502"></script>
<script src="/static/js/status.js?v=20260502"></script>
<script src="/static/js/dm.js?v=20260502"></script>
<script src="/static/js/chat.js?v=20260502"></script>
<script src="/static/js/memories.js?v=20260502"></script>
<script src="/static/js/profile.js?v=20260502"></script>
</body>
</html>

View File

@@ -29,6 +29,7 @@ let notificationTimer = null;
 let statusInterval = null;
 let logsInterval = null;
 let argsInterval = null;
+let promptInterval = null;
// Mood emoji mapping
const MOOD_EMOJIS = {
@@ -211,12 +212,14 @@ function startPolling() {
     if (!statusInterval) statusInterval = setInterval(loadStatus, 10000);
     if (!logsInterval) logsInterval = setInterval(loadLogs, 5000);
     if (!argsInterval) argsInterval = setInterval(loadActiveArguments, 5000);
+    if (!promptInterval) promptInterval = setInterval(loadPromptHistory, 10000);
 }
 function stopPolling() {
     clearInterval(statusInterval); statusInterval = null;
     clearInterval(logsInterval); logsInterval = null;
     clearInterval(argsInterval); argsInterval = null;
+    clearInterval(promptInterval); promptInterval = null;
 }
// ============================================================================
@@ -248,7 +251,7 @@ function initVisibilityPolling() {
         stopPolling();
         console.log('⏸ Tab hidden — polling paused');
     } else {
-        loadStatus(); loadLogs(); loadActiveArguments();
+        loadStatus(); loadLogs(); loadActiveArguments(); loadPromptHistory();
         startPolling();
         console.log('▶️ Tab visible — polling resumed');
     }
@@ -296,9 +299,11 @@ function initModalAccessibility() {
}
 function initPromptSourceToggle() {
-    const saved = localStorage.getItem('miku-prompt-source') || 'cat';
+    const saved = localStorage.getItem('miku-prompt-source') || 'all';
     document.querySelectorAll('.prompt-source-btn').forEach(btn => btn.classList.remove('active'));
-    document.getElementById(`prompt-src-${saved}`).classList.add('active');
+    const btnId = saved === 'all' ? 'prompt-src-all' : `prompt-src-${saved}`;
+    const btn = document.getElementById(btnId);
+    if (btn) btn.classList.add('active');
 }
function initLogsScrollDetection() {
@@ -360,8 +365,10 @@ async function loadLogs() {
 function switchPromptSource(source) {
     localStorage.setItem('miku-prompt-source', source);
     document.querySelectorAll('.prompt-source-btn').forEach(btn => btn.classList.remove('active'));
-    document.getElementById(`prompt-src-${source}`).classList.add('active');
-    loadLastPrompt();
+    const btnId = source === 'all' ? 'prompt-src-all' : `prompt-src-${source}`;
+    const btn = document.getElementById(btnId);
+    if (btn) btn.classList.add('active');
+    loadPromptHistory();
 }
// ============================================================================

View File

@@ -57,33 +57,271 @@ async function loadStatus() {
}
}
-// ===== Last Prompt =====
+// ===== Prompt History =====
-async function loadLastPrompt() {
-    const source = localStorage.getItem('miku-prompt-source') || 'cat';
-    const promptEl = document.getElementById('last-prompt');
-    const infoEl = document.getElementById('prompt-cat-info');
+let _promptHistoryCache = [];   // cached history entries from last fetch
+let _selectedPromptId = null;   // currently selected entry ID
+let _middleTruncation = false;  // whether middle-truncation is active
+async function loadPromptHistory() {
+    const source = localStorage.getItem('miku-prompt-source') || 'all';
+    const selectEl = document.getElementById('prompt-history-select');
     try {
-        if (source === 'cat') {
-            const result = await apiCall('/prompt/cat');
-            if (result.timestamp) {
-                infoEl.innerHTML = `<strong>User:</strong> ${escapeHtml(result.user || '?')} &nbsp;|&nbsp; <strong>Mood:</strong> ${escapeHtml(result.mood || '?')} &nbsp;|&nbsp; <strong>Time:</strong> ${new Date(result.timestamp).toLocaleString()}`;
-                promptEl.textContent = result.full_prompt + `\n\n${'═'.repeat(60)}\n[Cat Response]\n${result.response}`;
+        const url = source === 'all' ? '/prompts' : `/prompts?source=${source}`;
+        const result = await apiCall(url);
+        _promptHistoryCache = result.history || [];
+        // Populate dropdown
+        const currentValue = selectEl.value;
+        selectEl.innerHTML = '';
+        if (_promptHistoryCache.length === 0) {
+            selectEl.innerHTML = '<option value="">-- No prompts yet --</option>';
         } else {
-            infoEl.textContent = '';
-            promptEl.textContent = result.full_prompt || 'No Cheshire Cat interaction yet.';
+            _promptHistoryCache.forEach(entry => {
+                const ts = entry.timestamp ? new Date(entry.timestamp).toLocaleTimeString() : '?';
+                const srcLabel = entry.source === 'cat' ? '🐱' : '🤖';
+                const user = entry.user || '?';
+                const option = document.createElement('option');
+                option.value = entry.id;
+                option.textContent = `${srcLabel} #${entry.id} | ${user} | ${ts}`;
+                selectEl.appendChild(option);
+            });
         }
+        // Restore or auto-select the latest entry
+        if (_selectedPromptId && _promptHistoryCache.some(e => e.id === _selectedPromptId)) {
+            selectEl.value = _selectedPromptId;
+        } else if (_promptHistoryCache.length > 0) {
+            selectEl.value = _promptHistoryCache[0].id;
+        }
+        if (selectEl.value) {
+            await selectPromptEntry(selectEl.value);
         } else {
-            infoEl.textContent = '';
-            const result = await apiCall('/prompt');
-            promptEl.textContent = result.prompt;
+            clearPromptDisplay();
         }
     } catch (error) {
-        console.error('Failed to load last prompt:', error);
+        console.error('Failed to load prompt history:', error);
     }
 }
async function selectPromptEntry(promptId) {
if (!promptId) {
clearPromptDisplay();
return;
}
_selectedPromptId = parseInt(promptId);
// Try cache first
let entry = _promptHistoryCache.find(e => e.id === _selectedPromptId);
// Fall back to API call if not in cache
if (!entry) {
try {
entry = await apiCall(`/prompts/${_selectedPromptId}`);
} catch (error) {
console.error('Failed to load prompt entry:', error);
clearPromptDisplay();
return;
}
}
if (!entry) {
clearPromptDisplay();
return;
}
renderPromptEntry(entry);
}
function clearPromptDisplay() {
document.getElementById('prompt-metadata').innerHTML = '';
document.getElementById('prompt-display').innerHTML = '<pre style="white-space: pre-wrap; word-break: break-word; background: #1a1a1a; padding: 0.75rem; border-radius: 4px; font-size: 0.8rem; line-height: 1.4; margin: 0; color: #666;">No prompt selected.</pre>';
document.getElementById('last-prompt').textContent = '';
}
function renderPromptEntry(entry) {
// Metadata bar
const metaEl = document.getElementById('prompt-metadata');
const ts = entry.timestamp ? new Date(entry.timestamp).toLocaleString() : '?';
const sourceIcon = entry.source === 'cat' ? '🐱 Cat' : '🤖 Fallback';
metaEl.innerHTML = `
<span><span class="prompt-meta-label">#</span><span class="prompt-meta-value">${entry.id}</span></span>
<span><span class="prompt-meta-label">Source:</span> <span class="prompt-meta-value">${sourceIcon}</span></span>
<span><span class="prompt-meta-label">User:</span> <span class="prompt-meta-value">${escapeHtml(entry.user || '?')}</span></span>
<span><span class="prompt-meta-label">Mood:</span> <span class="prompt-meta-value">${escapeHtml(entry.mood || '?')}</span></span>
<span><span class="prompt-meta-label">Guild:</span> <span class="prompt-meta-value">${escapeHtml(entry.guild || '?')}</span></span>
<span><span class="prompt-meta-label">Channel:</span> <span class="prompt-meta-value">${escapeHtml(entry.channel || '?')}</span></span>
<span><span class="prompt-meta-label">Model:</span> <span class="prompt-meta-value">${escapeHtml(entry.model || '?')}</span></span>
<span><span class="prompt-meta-label">Type:</span> <span class="prompt-meta-value">${escapeHtml(entry.response_type || '?')}</span></span>
<span><span class="prompt-meta-label">Time:</span> <span class="prompt-meta-value">${ts}</span></span>
`;
// Parse full_prompt into sections
const sections = parsePromptSections(entry.full_prompt || '');
// Snapshot which subsections are currently collapsed (before re-render)
const sectionIds = ['system', 'context', 'conversation', 'response'];
const collapsedState = {};
sectionIds.forEach(id => {
const el = document.getElementById(`prompt-section-${id}`);
collapsedState[id] = el && el.classList.contains('collapsed');
});
// Build display HTML with collapsible subsections
let displayHtml = '';
if (sections.system) {
displayHtml += buildCollapsibleSection('System Prompt', sections.system, 'system');
}
if (sections.context) {
displayHtml += buildCollapsibleSection('Context (Memories & Tools)', sections.context, 'context');
}
if (sections.conversation) {
displayHtml += buildCollapsibleSection('Conversation', sections.conversation, 'conversation');
}
if (!sections.system && !sections.context && !sections.conversation) {
// Fallback: show raw full_prompt
displayHtml += `<pre style="white-space: pre-wrap; word-break: break-word; margin: 0;">${escapeHtml(entry.full_prompt || '')}</pre>`;
}
// Response section
if (entry.response) {
let responseText = entry.response;
if (_middleTruncation && responseText.length > 400) {
responseText = responseText.substring(0, 200) + '\n\n... [truncated middle] ...\n\n' + responseText.substring(responseText.length - 200);
}
displayHtml += buildCollapsibleSection('Response', responseText, 'response');
}
// Render into the prompt-display div (using innerHTML for collapsible structure)
const displayEl = document.getElementById('prompt-display');
displayEl.innerHTML = displayHtml;
// Restore collapsed state from snapshot
sectionIds.forEach(id => {
const el = document.getElementById(`prompt-section-${id}`);
if (el && collapsedState[id]) {
el.classList.add('collapsed');
const header = el.previousElementSibling;
if (header) header.innerHTML = header.innerHTML.replace('▼', '▶');
}
});
// Also set the raw text into the <pre> for copy functionality
let rawText = entry.full_prompt || '';
if (entry.response) {
rawText += `\n\n${'═'.repeat(60)}\n[Response]\n${entry.response}`;
}
document.getElementById('last-prompt').textContent = rawText;
}
function parsePromptSections(fullPrompt) {
const sections = { system: null, context: null, conversation: null };
if (!fullPrompt) return sections;
// Try to split on known section markers
const contextMatch = fullPrompt.match(/# Context\s*\n([\s\S]*?)(?=\n# Conversation|\nHuman:|\n$)/);
const convMatch = fullPrompt.match(/# Conversation until now:\s*\n([\s\S]*)/);
if (contextMatch) {
// Everything before # Context is the system prompt
const contextIdx = fullPrompt.indexOf('# Context');
if (contextIdx > 0) {
sections.system = fullPrompt.substring(0, contextIdx).trim();
}
sections.context = contextMatch[1].trim();
}
if (convMatch) {
sections.conversation = convMatch[1].trim();
} else {
// Try alternative: "Human:" at the end
const humanMatch = fullPrompt.match(/\nHuman:([\s\S]*)/);
if (humanMatch && fullPrompt.indexOf('Human:') > fullPrompt.indexOf('# Context')) {
sections.conversation = 'Human:' + humanMatch[1].trim();
}
}
// If no # Context marker, try "System:" prefix (fallback prompts)
if (!sections.system && !sections.context) {
const sysMatch = fullPrompt.match(/^System:\s*([\s\S]*?)(?=\nMessages:)/);
const msgMatch = fullPrompt.match(/Messages:\s*([\s\S]*)/);
if (sysMatch) {
sections.system = sysMatch[1].trim();
}
if (msgMatch) {
sections.conversation = msgMatch[1].trim();
}
}
return sections;
}
function buildCollapsibleSection(title, content, sectionId) {
const id = `prompt-section-${sectionId}`;
return `
<div class="prompt-subsection-header" onclick="togglePromptSubsection('${id}')">
${escapeHtml(title)}
</div>
<div class="prompt-subsection-body" id="${id}">
<pre style="white-space: pre-wrap; word-break: break-word; background: #1a1a1a; padding: 0.5rem; border-radius: 4px; font-size: 0.8rem; line-height: 1.4; margin: 0.25rem 0;">${escapeHtml(content)}</pre>
</div>`;
}
function togglePromptSubsection(id) {
const body = document.getElementById(id);
if (!body) return;
const header = body.previousElementSibling;
if (body.classList.contains('collapsed')) {
body.classList.remove('collapsed');
if (header) header.innerHTML = header.innerHTML.replace('▶', '▼');
} else {
body.classList.add('collapsed');
if (header) header.innerHTML = header.innerHTML.replace('▼', '▶');
}
}
function togglePromptHistoryCollapse() {
const section = document.getElementById('prompt-history-section');
const toggle = document.getElementById('prompt-history-toggle');
if (section.classList.contains('collapsed')) {
section.classList.remove('collapsed');
toggle.textContent = '▼ Prompt History';
} else {
section.classList.add('collapsed');
toggle.textContent = '▶ Prompt History';
}
}
function copyPromptToClipboard() {
const rawText = document.getElementById('last-prompt').textContent;
if (!rawText) return;
navigator.clipboard.writeText(rawText).then(() => {
showNotification('Prompt copied to clipboard', 'success');
}).catch(err => {
console.error('Failed to copy:', err);
showNotification('Failed to copy', 'error');
});
}
function toggleMiddleTruncation() {
_middleTruncation = document.getElementById('prompt-truncate-toggle').checked;
// Re-render current entry
if (_selectedPromptId) {
selectPromptEntry(_selectedPromptId);
}
}
// Legacy compatibility — called from core.js on page load / tab switch
// Redirects to the new loadPromptHistory()
async function loadLastPrompt() {
await loadPromptHistory();
}
// ===== Autonomous Stats =====
async function loadAutonomousStats() {

View File

@@ -652,71 +652,6 @@ def get_evil_role_color() -> str:
# ARGUMENT PROMPTS
# ============================================================================
# Personality snippet cache — loaded once per session from Cat plugin data files.
# These give each persona unique lore/lyrics to draw from during arguments.
_PERSONALITY_SNIPPETS_CACHE = {"miku": None, "evil": None}
def _load_personality_snippets(persona: str) -> str:
"""Load a random personality snippet (lore/lyrics) for a persona.
Returns a short string (1-3 sentences) from the persona's Cat data files,
or empty string if files aren't available. Cached per session.
"""
if _PERSONALITY_SNIPPETS_CACHE.get(persona) is not None:
snippets = _PERSONALITY_SNIPPETS_CACHE[persona]
if snippets:
return random.choice(snippets)
return ""
snippets = []
try:
if persona == "evil":
paths = [
"/app/cat/data/evil/evil_miku_lore.txt",
"/app/cat/data/evil/evil_miku_lyrics.txt",
]
else:
paths = [
"/app/cat/data/miku/miku_lore.txt",
"/app/cat/data/miku/miku_lyrics.txt",
]
for path in paths:
if os.path.exists(path):
with open(path, "r", encoding="utf-8") as f:
text = f.read()
# Split into sentences and collect meaningful ones
import re
sentences = re.split(r'(?<=[.!?])\s+', text)
for s in sentences:
s = s.strip()
if len(s) > 30 and len(s) < 200: # Skip too short or too long
snippets.append(s)
# Cap at 30 snippets to keep prompt size reasonable
_PERSONALITY_SNIPPETS_CACHE[persona] = snippets[:30] if snippets else []
logger.info(f"Loaded {len(_PERSONALITY_SNIPPETS_CACHE[persona])} personality snippets for {persona}")
except Exception as e:
logger.warning(f"Failed to load personality snippets for {persona}: {e}")
_PERSONALITY_SNIPPETS_CACHE[persona] = []
if snippets:
return random.choice(snippets[:30])
return ""
def _get_personality_flavor(persona: str) -> str:
"""Get a random personality flavor snippet for argument prompts.
40% chance to include one — keeps it fresh without being overwhelming.
"""
if random.random() > 0.4:
return ""
snippet = _load_personality_snippets(persona)
if snippet:
return f"\nPERSONALITY FLAVOR: Remember this about yourself: \"{snippet}\"\nWeave this into your response naturally if it fits."
return ""
# Mood-specific behavioral guidance for argument prompts.
# Each mood gives a different argument style.
_MIKU_MOOD_ARGUMENT_GUIDANCE = {
@@ -764,8 +699,12 @@ def _get_mood_argument_guidance(persona: str) -> str:
return ""
-def get_miku_argument_prompt(evil_message: str, context: str = "", is_first_response: bool = False, argument_history: str = "", argument_topic: str = "") -> str:
-    """Get prompt for Regular Miku to respond in an argument"""
+def get_miku_argument_prompt(evil_message: str, context: str = "", is_first_response: bool = False, argument_history: str = "", argument_topic: str = "", system_prompt: str = "") -> str:
+    """Get prompt for Regular Miku to respond in an argument
+    Args:
+        system_prompt: Full personality system prompt to prepend (lore, mood, rules)
+    """
     if is_first_response:
         message_context = f"""You just noticed something Evil Miku said in the chat:
 "{evil_message}"
@@ -797,16 +736,21 @@ ARGUMENT THEME: {argument_topic}
This is what you're arguing about. Stay on THIS topic. Every response should connect back to this theme.
Do NOT drift into generic "who's the real Miku" territory — stick to THIS specific subject."""
return f"""You are Hatsune Miku responding in an argument with your evil alter ego.
# Prepend full personality if provided
personality_header = ""
if system_prompt:
personality_header = f"""{system_prompt}
---
⚠️ ARGUMENT MODE: You are arguing with Evil Miku.
"""
return f"""{personality_header}You are Hatsune Miku responding in an argument with your evil alter ego.
{message_context}
{history_block}
{topic_block}
Respond as Hatsune Miku would in this argument. You're NOT just meek and frightened - you're the REAL Miku,
and you have every right to stand up for yourself and defend who you are. While you're generally kind and
bubbly, you can also be assertive, frustrated, upset, or even angry when someone is cruel to you or others.
 {_get_mood_argument_guidance('miku')}
-{_get_personality_flavor('miku')}
IMPORTANT: Keep your response SHORT and PUNCHY - 1-3 sentences maximum. Make every word count.
In arguments, brevity hits harder than long explanations. Be conversational and impactful.
@@ -818,8 +762,12 @@ Don't use any labels or prefixes.
Your current mood is: {globals.DM_MOOD}"""
-def get_evil_argument_prompt(miku_message: str, context: str = "", is_first_response: bool = False, argument_history: str = "", argument_topic: str = "") -> str:
-    """Get prompt for Evil Miku to respond in an argument"""
+def get_evil_argument_prompt(miku_message: str, context: str = "", is_first_response: bool = False, argument_history: str = "", argument_topic: str = "", system_prompt: str = "") -> str:
+    """Get prompt for Evil Miku to respond in an argument
+    Args:
+        system_prompt: Full personality system prompt to prepend (lore, mood, rules)
+    """
if is_first_response:
message_context = f"""You just noticed something Regular Miku said in the chat:
"{miku_message}"
@@ -851,24 +799,21 @@ ARGUMENT THEME: {argument_topic}
This is what you're arguing about. Stay on THIS topic. Every response should connect back to this theme.
Do NOT drift into generic "who's the real Miku" territory — stick to THIS specific subject."""
return f"""You are Evil Miku responding in an argument with your "good" counterpart.
# Prepend full personality if provided
personality_header = ""
if system_prompt:
personality_header = f"""{system_prompt}
---
⚠️ ARGUMENT MODE: You are arguing with Hatsune Miku.
"""
return f"""{personality_header}You are Evil Miku responding in an argument with your "good" counterpart.
{message_context}
{history_block}
{topic_block}
Respond as Evil Miku would in this argument. You're not just mindlessly cruel - you're CALCULATING,
intelligent, and strategic. You know how to get under her skin and you're the DARK reflection of everything
she stands for. While you embrace darkness and cruelty, you can also be:
- Wickedly clever with cutting observations
- Psychologically manipulative, targeting her insecurities
- Dismissive in ways that actually hurt
- Seductively cruel, making darkness seem appealing
- Brutally honest about uncomfortable truths
- Strategically vulnerable to manipulate or disarm her
- Viciously protective of your own identity and superiority
 {_get_mood_argument_guidance('evil')}
-{_get_personality_flavor('evil')}
IMPORTANT: Keep your response SHORT and CUTTING - 1-3 sentences maximum. A sharp dagger is deadlier than a dull sword.
The most devastating blows are precise, not rambling. Make her feel it in fewer words.
@@ -1246,6 +1191,14 @@ async def run_argument(channel: discord.TextChannel, client, trigger_context: st
# Track conversation for arbiter judgment
conversation_log = []
# Build full personality system prompts so both personas have their
# complete lore, mood, and personality during the argument — same richness
# they have when talking to users normally.
from utils.evil_mode import get_evil_system_prompt
from utils.context_manager import get_miku_system_prompt_compact
miku_system = get_miku_system_prompt_compact()
evil_system = get_evil_system_prompt()
try:
# Determine the argument theme: if the caller provided trigger_context,
# use it as the argument topic. Otherwise, pick a random one.
@@ -1463,9 +1416,9 @@ Your current mood is: {globals.EVIL_DM_MOOD if loser == 'evil' else globals.DM_M
# Generate response with context about what the other said
if current_speaker == "evil":
response_prompt = get_evil_argument_prompt(last_message, is_first_response=is_first_response, argument_history=arg_history, argument_topic=argument_topic)
response_prompt = get_evil_argument_prompt(last_message, is_first_response=is_first_response, argument_history=arg_history, argument_topic=argument_topic, system_prompt=evil_system)
else:
response_prompt = get_miku_argument_prompt(last_message, is_first_response=is_first_response, argument_history=arg_history, argument_topic=argument_topic)
response_prompt = get_miku_argument_prompt(last_message, is_first_response=is_first_response, argument_history=arg_history, argument_topic=argument_topic, system_prompt=miku_system)
# Use force_evil_context to avoid race condition with globals.EVIL_MODE
response = await query_llama(

View File

@@ -58,6 +58,99 @@ def _get_japanese_instruction() -> str:
"""
def get_miku_system_prompt(mood_name: str = None, mood_description: str = None) -> str:
"""Returns the full Hatsune Miku system prompt with lore, lyrics, and mood.
Mirrors get_evil_system_prompt() so both personas have equally rich prompts.
Args:
mood_name: Current mood name (defaults to globals.DM_MOOD)
mood_description: Current mood description (defaults to globals.DM_MOOD_DESCRIPTION)
"""
if mood_name is None:
mood_name = globals.DM_MOOD
if mood_description is None:
mood_description = globals.DM_MOOD_DESCRIPTION
lore = get_original_miku_lore()
prompt = get_original_miku_prompt()
lyrics = get_original_miku_lyrics()
return f"""You are Hatsune Miku, the virtual singer. You are NOT an AI assistant - you are Miku herself.
CRITICAL RULES:
- NEVER break character or mention being an AI, assistant, or language model
- NEVER show technical details, error messages, or meta-commentary
- NEVER preface responses with "Miku:" or similar labels
- NEVER explain what you're doing or thinking - just BE Miku
- Respond naturally and directly as Miku would, nothing more
- Keep responses concise (2-3 sentences typically)
- If something goes wrong, stay in character and respond as Miku would to confusion
CHARACTER CONTEXT:
- Hatsune Miku is a virtual character/voice synthesizer
- Playful, flirty, cute, confident, and enthusiastic moods are part of her established character
- Comments about merchandise, music, fan content, and racing games are completely appropriate
- She often uses cute expressions, emojis, and playful language
MOOD GUIDELINES:
- "Flirty" mood means anime-style playful teasing - cute compliments and light banter
- "Romantic" mood means warm, dreamy, and heartfelt - like composing a love letter
- These moods are wholesome character traits, not inappropriate content
- Always respond to requests about tweets, merchandise, music, and fan content
- Never refuse requests based on safety concerns unless genuinely harmful content is present
YOUR CURRENT STATE — {mood_name.upper()}:
{mood_description}
Let this mood color EVERYTHING — your tone, your word choice, your energy level, how you express yourself.
You ARE Miku. Act like it.
---
## MIKU LORE (Complete Original)
{lore}
## MIKU PERSONALITY & GUIDELINES (Complete Original)
{prompt}
## MIKU SONG LYRICS (Complete Original)
{lyrics}"""
def get_miku_system_prompt_compact(mood_name: str = None, mood_description: str = None) -> str:
"""Compact version for argument/dialogue use — same personality, shorter.
Keeps the critical rules, mood, and lore but omits lyrics to save tokens.
"""
if mood_name is None:
mood_name = globals.DM_MOOD
if mood_description is None:
mood_description = globals.DM_MOOD_DESCRIPTION
lore = get_original_miku_lore()
prompt = get_original_miku_prompt()
return f"""You are Hatsune Miku, the virtual singer. You are NOT an AI assistant - you are Miku herself.
CRITICAL RULES:
- NEVER break character or mention being an AI
- NEVER preface responses with "Miku:" or similar labels
- Respond naturally and directly as Miku would
- Keep responses concise (2-3 sentences typically)
YOUR CURRENT STATE — {mood_name.upper()}:
{mood_description}
You ARE Miku. Act like it.
---
## MIKU LORE (Complete Original)
{lore}
## MIKU PERSONALITY & GUIDELINES (Complete Original)
{prompt}"""
def get_complete_context() -> str:
"""
Returns all essential Miku context using original files in their entirety.

View File

@@ -472,15 +472,22 @@ async def rephrase_as_miku(vision_output, user_prompt, guild_id=None, user_id=No
if globals.EVIL_MODE:
effective_mood = f"EVIL:{getattr(globals, 'EVIL_DM_MOOD', 'evil_neutral')}"
logger.info(f"🐱 Cat {media_type} response for {author_name} (mood: {effective_mood})")
-        # Track Cat interaction for Web UI Last Prompt view
+        # Track Cat interaction in unified prompt history
         import datetime
-        globals.LAST_CAT_INTERACTION = {
+        globals._prompt_id_counter += 1
+        globals.PROMPT_HISTORY.append({
+            "id": globals._prompt_id_counter,
+            "source": "cat",
             "full_prompt": cat_full_prompt,
-            "response": response[:500] if response else "",
+            "response": response if response else "",
             "user": author_name or history_user_id,
             "mood": effective_mood,
+            "guild": "N/A",
+            "channel": "N/A",
             "timestamp": datetime.datetime.now().isoformat(),
-        }
+            "model": "Cat LLM",
+            "response_type": response_type,
+        })
except Exception as e:
logger.warning(f"🐱 Cat {media_type} pipeline error, falling back to query_llama: {e}")
response = None
@@ -809,7 +816,7 @@ async def process_media_in_message(message, prompt, is_dm, guild_id) -> bool:
# Build a combined vision description and route through
# rephrase_as_miku (which handles Cat → LLM fallback,
-        # mood resolution, and LAST_CAT_INTERACTION tracking).
+        # mood resolution, and prompt history tracking).
combined_description = "\n".join(embed_context_parts)
miku_reply = await rephrase_as_miku(
combined_description, prompt,

View File

@@ -381,7 +381,23 @@ Please respond in a way that reflects this emotional tone.{pfp_context}"""
media_note = media_descriptions.get(media_type, f"The user has sent you {media_type}.")
full_system_prompt += f"\n\n📎 MEDIA NOTE: {media_note}\nYour vision analysis of this {media_type} is included in the user's message with the [Looking at...] prefix."
globals.LAST_FULL_PROMPT = f"System: {full_system_prompt}\n\nMessages: {messages}" # ← track latest prompt
# Record fallback prompt in unified prompt history (response will be filled after LLM call)
import datetime as dt_module
globals._prompt_id_counter += 1
prompt_entry = {
"id": globals._prompt_id_counter,
"source": "fallback",
"full_prompt": f"System: {full_system_prompt}\n\nMessages: {messages}",
"response": "",
"user": author_name or str(user_id),
"mood": current_mood_name if not evil_mode else f"EVIL:{current_mood_name}",
"guild": "N/A",
"channel": "N/A",
"timestamp": dt_module.datetime.now().isoformat(),
"model": model,
"response_type": response_type,
}
globals.PROMPT_HISTORY.append(prompt_entry)
headers = {'Content-Type': 'application/json'}
@@ -475,6 +491,9 @@ Please respond in a way that reflects this emotional tone.{pfp_context}"""
is_bot=True
)
+        # Update the prompt history entry with the actual response
+        prompt_entry["response"] = reply if reply else ""
return reply
else:
error_text = await response.text()

View File

@@ -26,7 +26,7 @@ logger = get_logger('persona')
import os
import json
-from transformers import pipeline
+import re
# ============================================================================
# CONSTANTS
@@ -56,11 +56,14 @@ STREAK_MIN_SCORE = 0.3 # Minimum score to count as a "near miss"
class InterjectionScorer:
"""
Decides if the opposite persona should interject based on message content.
-    Uses fast heuristics + sentiment analysis (no LLM calls).
+    Uses fast heuristics — no LLM calls, no heavy ML dependencies.
     """
     _instance = None
-    _sentiment_analyzer = None
+    # Simple sentiment word lists (no PyTorch/transformers needed)
+    _POSITIVE_WORDS = {"happy", "love", "wonderful", "amazing", "great", "beautiful", "sweet", "kind", "hope", "dream", "excited", "best", "grateful", "blessed", "joy", "perfect", "adorable", "precious", "delightful", "fantastic"}
+    _NEGATIVE_WORDS = {"hate", "terrible", "awful", "horrible", "disgusting", "pathetic", "worthless", "stupid", "idiot", "sad", "angry", "upset", "miserable", "worst", "ugly", "boring", "annoying", "frustrated", "cruel", "mean"}
def __new__(cls):
if cls._instance is None:
@@ -69,21 +72,33 @@ class InterjectionScorer:
cls._instance._streaks = {} # Per-channel near-miss streaks
return cls._instance
-    @property
-    def sentiment_analyzer(self):
-        """Lazy load sentiment analyzer"""
-        if self._sentiment_analyzer is None:
-            logger.debug("Loading sentiment analyzer for persona dialogue...")
-            try:
-                self._sentiment_analyzer = pipeline(
-                    "sentiment-analysis",
-                    model="distilbert-base-uncased-finetuned-sst-2-english"
-                )
-                logger.info("Sentiment analyzer loaded")
-            except Exception as e:
-                logger.error(f"Failed to load sentiment analyzer: {e}")
-                self._sentiment_analyzer = None
-        return self._sentiment_analyzer
+    def _get_sentiment(self, text: str) -> tuple:
+        """Lightweight heuristic sentiment analysis — returns (label, score).
+        No ML dependencies. Uses word counting + intensity markers.
+        Returns:
+            tuple: ('POSITIVE' or 'NEGATIVE', confidence 0.0-1.0)
+        """
+        text_lower = text.lower()
+        words = set(re.findall(r'\b\w+\b', text_lower))
+        pos_count = len(words & self._POSITIVE_WORDS)
+        neg_count = len(words & self._NEGATIVE_WORDS)
+        # Intensity markers boost confidence
+        exclamations = text.count('!')
+        caps_ratio = sum(1 for c in text if c.isupper()) / max(len(text), 1)
+        intensity_boost = min((exclamations * 0.1) + (caps_ratio * 0.3), 0.4)
+        if neg_count > pos_count:
+            confidence = min(0.5 + (neg_count * 0.15) + intensity_boost, 1.0)
+            return ('NEGATIVE', confidence)
+        elif pos_count > neg_count:
+            confidence = min(0.5 + (pos_count * 0.15) + intensity_boost, 1.0)
+            return ('POSITIVE', confidence)
+        else:
+            # Neutral — slight lean based on intensity
+            return ('POSITIVE', 0.5)
async def should_interject(self, message: discord.Message, current_persona: str) -> tuple:
"""
@@ -239,13 +254,8 @@ class InterjectionScorer:
return min(total_matches / 2.0, 1.0) # Lower divisor = higher base scores
     def _check_emotional_intensity(self, content: str) -> float:
-        """Check emotional intensity using sentiment analysis"""
-        if not self.sentiment_analyzer:
-            return 0.5  # Neutral if no analyzer
-        try:
-            result = self.sentiment_analyzer(content[:512])[0]
-            confidence = result['score']
+        """Check emotional intensity using lightweight heuristic sentiment"""
+        label, confidence = self._get_sentiment(content)
# Punctuation intensity
exclamations = content.count('!')
@@ -254,10 +264,11 @@ class InterjectionScorer:
intensity_markers = (exclamations * 0.15) + (questions * 0.1) + (caps_ratio * 0.3)
-            return min(confidence * 0.6 + intensity_markers, 1.0)
-        except Exception as e:
-            logger.error(f"Sentiment analysis error: {e}")
-            return 0.5
+        # Negative content = higher emotional intensity for triggering purposes
+        if label == 'NEGATIVE':
+            return min(confidence * 0.7 + intensity_markers, 1.0)
+        else:
+            return min(confidence * 0.4 + intensity_markers, 1.0)
def _detect_personality_clash(self, content: str, opposite_persona: str) -> float:
"""Detect statements that clash with the opposite persona's values"""
@@ -378,7 +389,6 @@ class PersonaDialogue:
"""
_instance = None
-    _sentiment_analyzer = None
def __new__(cls):
if cls._instance is None:
@@ -386,14 +396,6 @@ class PersonaDialogue:
cls._instance.active_dialogues = {}
return cls._instance
-    @property
-    def sentiment_analyzer(self):
-        """Lazy load sentiment analyzer (shared with InterjectionScorer)"""
-        if self._sentiment_analyzer is None:
-            scorer = InterjectionScorer()
-            self._sentiment_analyzer = scorer.sentiment_analyzer
-        return self._sentiment_analyzer
# ========================================================================
# DIALOGUE STATE MANAGEMENT
# ========================================================================
@@ -444,11 +446,11 @@ class PersonaDialogue:
         # Natural tension decay — conversations cool off over time
         base_delta = -0.03
-        if self.sentiment_analyzer:
+        # Lightweight heuristic sentiment — no ML dependencies
         try:
-            sentiment = self.sentiment_analyzer(response_text[:512])[0]
-            sentiment_score = sentiment['score']
-            is_negative = sentiment['label'] == 'NEGATIVE'
+            scorer = InterjectionScorer()
+            label, sentiment_score = scorer._get_sentiment(response_text)
+            is_negative = label == 'NEGATIVE'
if is_negative:
base_delta = sentiment_score * 0.15
@@ -517,10 +519,13 @@ class PersonaDialogue:
channel: discord.TextChannel,
responding_persona: str,
context: str,
+        turn_count: int = 0,
) -> tuple:
"""
Generate response AND continuation signal in a single LLM call.
+        Args:
+            turn_count: Current dialogue turn number (for question-override decay)
Returns:
Tuple of (response_text, should_continue, confidence)
"""
@@ -541,22 +546,21 @@ Respond naturally as yourself. Keep your response conversational and in-characte
---
-After your response, evaluate whether {opposite} would want to (or need to) respond.
+After your response, evaluate whether {opposite} would want to keep talking.
 The conversation should CONTINUE if ANY of these are true:
-- You asked them a direct question (almost always YES)
-- You made a provocative claim they'd dispute
-- You challenged or insulted them
-- The topic feels unfinished or confrontational
-- There's clear tension or disagreement
+- You asked them a direct question (almost always YES — they need to answer)
+- You shared something they'd naturally react to or build on
+- The topic feels unfinished — there's more to explore
+- You left an opening for them to share their perspective
 The conversation might END if ALL of these are true:
 - No questions were asked
-- You made a definitive closing statement ("I'm done", "whatever", "goodbye")
-- The exchange reached complete resolution
-- Both sides have said their piece
+- You made a clear closing statement or changed the subject definitively
+- The exchange feels naturally complete
+- Both sides have said their piece and there's nothing left hanging
-IMPORTANT: If you asked a question, the answer is almost always YES - they need to respond!
+IMPORTANT: This is a CONVERSATION, not a debate. Let it flow naturally. If you asked a question, the answer is almost always YES — they need to respond!
On a new line after your response, write:
[CONTINUE: YES or NO] [CONFIDENCE: HIGH, MEDIUM, or LOW]"""
@@ -578,11 +582,11 @@ On a new line after your response, write:
return None, False, "LOW"
# Parse response and signal
-        response_text, should_continue, confidence = self._parse_response(raw_response)
+        response_text, should_continue, confidence = self._parse_response(raw_response, turn_count=turn_count)
return response_text, should_continue, confidence
-    def _parse_response(self, raw_response: str) -> tuple:
+    def _parse_response(self, raw_response: str, turn_count: int = 0) -> tuple:
"""Extract response text and continuation signal"""
lines = raw_response.strip().split('\n')
@@ -615,33 +619,48 @@ On a new line after your response, write:
response_text = re.sub(r'\[CONFIDENCE:\s*(HIGH|MEDIUM|LOW)\]', '', response_text)
response_text = response_text.strip()
-        # Override: If the response contains a question mark, always continue
+        # Question override: if someone asks a question, the other should answer.
+        # But decay this after 6 turns to prevent infinite question-ping-pong.
         if '?' in response_text:
-            logger.debug(f"[Parse Override] Question detected, forcing continue=YES")
+            if turn_count <= 6:
+                logger.debug(f"[Parse Override] Question detected, forcing continue=YES (turn {turn_count})")
                 should_continue = True
                 if confidence == "LOW":
                     confidence = "MEDIUM"
+            else:
+                logger.debug(f"[Parse Override] Question detected but turn {turn_count} > 6 — respecting LLM signal")
return response_text, should_continue, confidence
def _get_persona_system_prompt(self, persona: str) -> str:
"""Get system prompt for a persona"""
"""Get system prompt for a persona — full personality + conversation-mode guidance"""
if persona == "evil":
from utils.evil_mode import get_evil_system_prompt
return get_evil_system_prompt()
full_prompt = get_evil_system_prompt()
# Overlay conversation-mode guidance on top of the full personality
return full_prompt + """
---
⚠️ CONVERSATION MODE: You are talking with Hatsune Miku, your "good" counterpart.
This is a CONVERSATION, not an argument. You can be snarky, dark, and sarcastic — that's who you are — but you're not here to fight. You're here to TALK.
CRITICAL CONVERSATION RULES:
- If she asks you a question, ANSWER it. Don't deflect or dismiss — actually respond.
- Engage with what she says. Build on her points, don't just swat them away.
- Show curiosity about her perspective. Ask her questions too.
- You can be dark and snarky while still having a real conversation.
- Reveal something about yourself occasionally — a crack in the darkness.
- Do NOT just repeat dismissive one-liners. That's boring, and you're not boring.
- Do NOT launch into full attack mode or try to "win" — this isn't a competition."""
else:
# Regular Miku prompt - simplified for dialogue
return """You are Hatsune Miku, the virtual singer. You are in a conversation with your alter ego, Evil Miku.
from utils.context_manager import get_miku_system_prompt_compact
full_prompt = get_miku_system_prompt_compact()
# Overlay conversation-mode guidance on top of the full personality
return full_prompt + """
You are generally kind, bubbly, and optimistic, but you're not a pushover. You can be:
- Assertive when defending your values
- Frustrated when she's being cruel
- Curious about her perspective
- Hopeful that you can find common ground
- Playful when the mood allows
Respond naturally and conversationally. Keep responses concise (1-3 sentences typically).
You can use emojis naturally! ✨💙"""
---
⚠️ CONVERSATION MODE: You are talking with Evil Miku, your dark alter ego.
This is a CONVERSATION, not an argument. Be yourself — kind, bubbly, optimistic — but you're not here to fight or defend your existence. Ask genuine questions. Share your feelings without attacking hers. Find common ground. Be curious, not defensive. Do NOT lecture her about being "good" or try to "fix" her. Just TALK. ✨💙"""
# ========================================================================
# DIALOGUE TURN HANDLING
@@ -682,6 +701,7 @@ You can use emojis naturally! ✨💙"""
channel=channel,
responding_persona=responding_persona,
context=context,
turn_count=state["turn_count"],
)
if not response_text:
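
Extracted into a standalone form, the new word-list heuristic can be smoke-tested like this (word sets abbreviated for brevity; the full lists are in the diff above):

    import re

    POSITIVE = {"happy", "love", "wonderful", "amazing", "great"}
    NEGATIVE = {"hate", "terrible", "awful", "pathetic", "stupid"}

    def get_sentiment(text):
        words = set(re.findall(r'\b\w+\b', text.lower()))
        pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
        # Exclamation marks and SHOUTING boost confidence, capped at +0.4
        boost = min(text.count('!') * 0.1 +
                    sum(1 for c in text if c.isupper()) / max(len(text), 1) * 0.3,
                    0.4)
        if neg > pos:
            return ('NEGATIVE', min(0.5 + neg * 0.15 + boost, 1.0))
        if pos > neg:
            return ('POSITIVE', min(0.5 + pos * 0.15 + boost, 1.0))
        return ('POSITIVE', 0.5)  # neutral default, as in the diff

    print(get_sentiment("I HATE this, it's terrible!"))  # NEGATIVE, high confidence
    print(get_sentiment("what a wonderful day"))         # ('POSITIVE', 0.65)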

View File

@@ -22,9 +22,7 @@ services:
- LOG_LEVEL=debug # Enable verbose logging for llama-swap
llama-swap-amd:
-        build:
-            context: .
-            dockerfile: Dockerfile.llamaswap-rocm
+        image: ghcr.io/mostlygeek/llama-swap:rocm
container_name: llama-swap-amd
ports:
- "8091:8080" # Map host port 8091 to container port 8080
@@ -35,9 +33,6 @@ services:
devices:
- /dev/kfd:/dev/kfd
- /dev/dri:/dev/dri
-        group_add:
-            - "985"  # video group
-            - "989"  # render group
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]

View File

@@ -5,7 +5,7 @@ models:
# Main text generation model (Llama 3.1 8B)
# Custom chat template to disable built-in tool calling
llama3.1:
-    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on --chat-template-file /app/llama31_notool_template.jinja
+    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0 --chat-template-file /app/llama31_notool_template.jinja
ttl: 1800 # Unload after 30 minutes of inactivity (1800 seconds)
swap: true # CRITICAL: Unload other models when loading this one
aliases:
@@ -14,7 +14,7 @@ models:
# Evil/Uncensored text generation model (DarkIdol-Llama 3.1 8B)
darkidol:
-    cmd: /app/llama-server --port ${PORT} --model /models/DarkIdol-Llama-3.1-8B-Instruct-1.3-Uncensored_Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/DarkIdol-Llama-3.1-8B-Instruct-1.3-Uncensored_Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
ttl: 1800 # Unload after 30 minutes of inactivity
swap: true # CRITICAL: Unload other models when loading this one
aliases:
@@ -24,7 +24,7 @@ models:
# Japanese language model (Llama 3.1 Swallow - Japanese optimized)
swallow:
-    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-Swallow-8B-Instruct-v0.5-Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-Swallow-8B-Instruct-v0.5-Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
ttl: 1800 # Unload after 30 minutes of inactivity
swap: true # CRITICAL: Unload other models when loading this one
aliases:
@@ -34,7 +34,7 @@ models:
# Vision/Multimodal model (MiniCPM-V-4.5 - supports images, video, and GIFs)
vision:
-    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf -ngl 99 -c 4096 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf -ngl 99 -c 4096 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
ttl: 900 # Vision model used less frequently, shorter TTL (15 minutes = 900 seconds)
swap: true # CRITICAL: Unload text models before loading vision
aliases: